Keywords:
Wind turbine faults
CasRel
Cosine attention
SOTA
Abstract:
Relation extraction in the wind turbine fault domain faces several difficulties: a large volume of domain-specific terminology, single-entity and multi-entity overlap, many relations within a single sentence, and complex sentence structure. To address these, this paper proposes CA-CasRel, an improved version of the CasRel cascade binary tagging framework. Because the ERNIE model incorporates entity information and knowledge graph information during pre-training, this study first replaces BERT with ERNIE as the sentence encoder, so that entity-relation information in the text sequence is captured at a deeper level. In addition, after the head entity is predicted, the head entity vector is encoded with a cosine attention mechanism. Cosine attention is direction-sensitive and reinforces semantic purity, which allows it to extract essential semantic features efficiently and to capture sparse features without bias. Compared with the baseline CasRel model, the proposed architecture improves precision, recall, and F1 score by 1.71%, 6.09%, and 4.05%, respectively, achieving state-of-the-art performance in this domain.
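The abstract does not give the exact formulation of the cosine attention step, so the following is only a minimal sketch of one common form of it, under stated assumptions: attention scores are the cosine similarities between the head entity vector and each token encoding, normalized with a softmax. The function name `cosine_attention`, the `temperature` parameter, the random token encodings, and the span-averaged head entity vector are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def cosine_attention(query, keys, values, temperature=1.0, eps=1e-8):
    """Attend over `values` using cosine similarity between `query` and `keys`.

    query:  (d,)     e.g. a head entity vector (assumption)
    keys:   (n, d)   e.g. encoder outputs for n tokens (assumption)
    values: (n, d_v) vectors to be mixed by the attention weights
    """
    # L2-normalize so that scores depend only on direction, not magnitude.
    q = query / (np.linalg.norm(query) + eps)
    k = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + eps)
    scores = (k @ q) / temperature      # cosine similarities, each in [-1, 1]
    scores = scores - scores.max()      # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ values          # weighted sum of the value vectors
    return weights, context

# Hypothetical data: 6 token encodings of dimension 8.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))
# Hypothetical head entity vector: average of the tokens in its span.
head = tokens[1:3].mean(axis=0)

weights, context = cosine_attention(head, tokens, tokens)
# Rescaling the inputs changes magnitudes but not directions.
weights_scaled, _ = cosine_attention(10.0 * head, 3.0 * tokens, tokens)
```

In this sketch, rescaling the head entity vector or the token encodings leaves the attention weights essentially unchanged, which is one way to read the "direction sensitivity" mentioned above: only the angle between vectors affects the scores.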