Keywords:
anaphora resolution
natural language processing
machine learning
linguistic features
Abstract:
Anaphora resolution is a problem in natural language processing: the process of identifying the antecedent of a referring expression among the entities in a discourse. Because the correct interpretation of pronouns is important to the construction of meaning, resolving pronominal anaphors remains an important task for many natural language processing applications and plays an increasingly significant role in computational linguistics. However, much of the work on anaphora resolution focuses on English, and work on other languages, including Arabic, is still limited. In this paper, we present a new set of computational and linguistic features for resolving Arabic anaphors with a machine learning approach. We conducted an in-depth study of these features to assess their effectiveness and their effect on anaphora resolution. The aim was to integrate different feature sets and classification algorithms efficiently in order to obtain a more accurate classification procedure. Four well-known machine learning algorithms (k-nearest neighbor, maximum entropy, decision tree, and a meta-classifier) were employed as base classifiers for each of the feature sets. We carried out a wide range of comparative experiments on Quran datasets, discuss the findings, and draw conclusions. The experimental results show that our approach gives satisfactory results.