Extracting Entities from a BIO Sequence (NER, Named Entity Recognition)

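1. In named entity recognition (NER), the network predicts a sequence of BIO tags. How do we decode that sequence and extract the entities?

Approach 1: When a B tag appears, store any entity that is already in progress. A run of consecutive I tags with no leading B is also treated as an entity, which covers the error case where the model predicted a B as an I. When the type labels inside one run disagree, the entity takes the most frequent label (the mode).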

# Decode BIO tags into entities (approach 1)
string="我是李明,我爱中国,我来自呼和浩特"
predict=["o","o","i-per","i-per","o","o","o","b-loc","i-loc","o","o","o","o","b-per","i-loc","i-loc","i-loc"]
item = {"string": string, "entities": []}
entity_name = ""  # characters of the entity being built
flag=[]  # type label of each character in that entity
for char, tag in zip(string, predict):
    if tag[0] == "b":
        # A B tag starts a new entity, so store the one in progress first.
        if entity_name!="":
            # Majority vote over the labels; the first label seen wins a tie.
            x=dict((a,flag.count(a)) for a in flag)
            y=[k for k,v in x.items() if max(x.values())==v]
            item["entities"].append({"word": entity_name,"type": y[0]})
            flag.clear()
            entity_name=""
        entity_name += char
        flag.append(tag[2:])
    elif tag[0]=="i":
        # An I tag extends the current entity, or starts one if the B was missed.
        entity_name += char
        flag.append(tag[2:])
    else:
        # An O tag ends the entity in progress.
        if entity_name!="":
            x=dict((a,flag.count(a)) for a in flag)
            y=[k for k,v in x.items() if max(x.values())==v]
            item["entities"].append({"word": entity_name,"type": y[0]})
        flag.clear()
        entity_name=""

# Store an entity that runs to the end of the sequence.
if entity_name!="":
    x=dict((a,flag.count(a)) for a in flag)
    y=[k for k,v in x.items() if max(x.values())==v]
    item["entities"].append({"word": entity_name,"type": y[0]})
print(item)
{'string': '我是李明,我爱中国,我来自呼和浩特', 'entities': [{'word': '李明', 'type': 'per'}, {'word': '中国', 'type': 'loc'}, {'word': '呼和浩特', 'type': 'loc'}]}
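
The majority vote that settles the type of an entity with mixed labels can also be written with `collections.Counter`. This is only a minimal sketch; `majority_type` is a hypothetical helper name, not part of the original code. In the example, 呼和浩特 is tagged `b-per`, `i-loc`, `i-loc`, `i-loc`, so `loc` wins three to one.

```python
from collections import Counter

def majority_type(labels):
    """Return the most frequent label; on a tie, the label seen first wins."""
    return Counter(labels).most_common(1)[0][0]

# Labels of 呼和浩特 in the example: one "per" against three "loc".
print(majority_type(["per", "loc", "loc", "loc"]))  # loc
# A tie is resolved in favor of the earlier label.
print(majority_type(["per", "loc"]))  # per
```

The tie rule matters for two-character entities whose labels disagree: whichever label comes first decides the type.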

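Approach 2: Keep only entities that start with a B tag and discard everything else, so a run of I tags with no leading B (such as 李明 in the example) is dropped. As before, the entity type is the most frequent label.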

# Decode BIO tags into entities (approach 2)
string="我是李明,我爱中国,我来自呼和浩特"
predict=["o","o","i-per","i-per","o","o","o","b-loc","i-loc","o","o","o","o","b-per","i-loc","i-loc","i-loc"]
item = {"string": string, "entities": []}
entity_name = ""  # characters of the entity being built
flag=[]  # type label of each character in that entity
visit=False  # True while inside an entity that was opened by a B tag
for char, tag in zip(string, predict):
    if tag[0] == "b":
        # A B tag starts a new entity, so store the one in progress first.
        if entity_name!="":
            # Majority vote over the labels; the first label seen wins a tie.
            x=dict((a,flag.count(a)) for a in flag)
            y=[k for k,v in x.items() if max(x.values())==v]
            item["entities"].append({"word": entity_name,"type": y[0]})
            flag.clear()
            entity_name=""
        visit=True
        entity_name += char
        flag.append(tag[2:])
    elif tag[0]=="i" and visit:
        # An I tag only extends an entity that a B tag has opened.
        entity_name += char
        flag.append(tag[2:])
    else:
        # An O tag, or an I tag with no preceding B, ends the entity in progress.
        if entity_name!="":
            x=dict((a,flag.count(a)) for a in flag)
            y=[k for k,v in x.items() if max(x.values())==v]
            item["entities"].append({"word": entity_name,"type": y[0]})
        flag.clear()
        visit=False
        entity_name=""

# Store an entity that runs to the end of the sequence.
if entity_name!="":
    x=dict((a,flag.count(a)) for a in flag)
    y=[k for k,v in x.items() if max(x.values())==v]
    item["entities"].append({"word": entity_name,"type": y[0]})
print(item)
{'string': '我是李明,我爱中国,我来自呼和浩特', 'entities': [{'word': '中国', 'type': 'loc'}, {'word': '呼和浩特', 'type': 'loc'}]}
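
The two approaches differ only in whether an I tag may open an entity, so they can be folded into one function. The following is a sketch under stated assumptions rather than the original code: the function name `extract_entities` and the `require_b` switch are hypothetical, and lowercasing the tag prefix (so that `B-PER` and `b-per` are both accepted) is an added assumption.

```python
from collections import Counter

def extract_entities(text, tags, require_b=False):
    """Decode BIO tags into a list of {"word", "type"} dicts.

    require_b=False follows approach 1: a run of I tags without a leading
    B tag still forms an entity. require_b=True follows approach 2: only
    entities opened by a B tag are kept.
    """
    entities = []
    chars, labels = [], []  # characters and type labels of the current entity

    def flush():
        # Store the entity in progress, typed by majority vote
        # (the first label seen wins a tie), then reset the buffers.
        if chars:
            best = Counter(labels).most_common(1)[0][0]
            entities.append({"word": "".join(chars), "type": best})
            chars.clear()
            labels.clear()

    for char, tag in zip(text, tags):
        prefix = tag[:1].lower()  # assumption: accept upper- or lowercase tags
        if prefix == "b":
            flush()
            chars.append(char)
            labels.append(tag[2:])
        elif prefix == "i" and (chars or not require_b):
            # With require_b, an I tag may only extend an entity already open.
            chars.append(char)
            labels.append(tag[2:])
        else:
            flush()
    flush()  # an entity may run to the end of the sequence
    return entities

text = "我是李明,我爱中国,我来自呼和浩特"
tags = ["o","o","i-per","i-per","o","o","o","b-loc","i-loc",
        "o","o","o","o","b-per","i-loc","i-loc","i-loc"]
print(extract_entities(text, tags))                  # approach 1
print(extract_entities(text, tags, require_b=True))  # approach 2
```

On the example sentence, the two calls should return the same entity lists as the two outputs shown above. With `require_b=True`, an entity buffer can only be non-empty after a B tag, so checking `chars` plays the role of the `visit` flag.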

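Copyright notice: This is an original article by hqh131360239, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.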