Term Extraction
Automatic term extraction is an important problem in natural language processing. Its goal is to extract word collocations from text. Term extraction can be used in machine translation, automatic indexing, information retrieval, information extraction, lexical knowledge base construction, and other fields. Current research on word collocation extraction, both domestic and international, relies mainly on statistical methods. Since terms are a special kind of word collocation, term extraction generally consists of two steps:
1. Candidate term extraction
2. Term selection from the candidates
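The two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of any particular system. It assumes the text is already tokenized and split into sentences, treats every 2- to 3-word n-gram inside a sentence as a candidate string, uses raw frequency with a hypothetical `min_freq` cut-off as the only statistic, and then keeps the `top_k` most frequent candidates. The function names `extract_candidates` and `select_terms` are hypothetical.

```python
from collections import Counter


def extract_candidates(sentences, min_len=2, max_len=3, min_freq=2):
    """Step 1 (candidate extraction): count every word n-gram of min_len..max_len
    words inside each sentence and keep those occurring at least min_freq times.
    Assumption: raw frequency is the only statistic used here."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for n in range(min_len, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return {gram: f for gram, f in counts.items() if f >= min_freq}


def select_terms(candidates, top_k):
    """Step 2 (term selection): sort candidates by frequency, highest first
    (ties broken alphabetically), and keep the first top_k."""
    ranked = sorted(candidates.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:top_k]]


# Hypothetical toy corpus: one pre-tokenized sentence per string.
corpus = [
    "automatic term extraction supports machine translation",
    "term extraction supports information retrieval",
    "automatic term extraction helps machine translation",
]
candidates = extract_candidates(corpus)
print(select_terms(candidates, top_k=3))
```

On this toy corpus the script prints `['term extraction', 'automatic term', 'automatic term extraction']`. Frequency alone also promotes fragments such as "automatic term", which is one motivation for the association measures and the morphological, syntactic, and semantic information discussed next.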
Candidate extraction usually computes a statistical measure of the internal association strength of a string to decide whether it is a candidate term. Common measures include frequency, mutual information, and the Dice coefficient. Mutual information performs best for extracting two-word terms, with an F-measure of 57.82%.

Among term selection methods, frequency-based ranking sorts the candidate terms in descending order of their frequency in the corpus and selects a fixed number of the top-ranked candidates as the resulting terms. Other methods select terms using the morphological, syntactic, and semantic information of the candidates.
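The association measures and the frequency-based ranking described above can be illustrated with a short Python sketch for two-word candidates. It uses pointwise mutual information (PMI), the form of mutual information commonly used for collocation extraction, estimated as PMI(x, y) = log2(f(x, y) * N / (f(x) * f(y))), and the Dice coefficient, Dice(x, y) = 2 * f(x, y) / (f(x) + f(y)), where f counts occurrences and N is the number of tokens. These estimates, the 0.75 Dice threshold, the rule that skips pairs containing punctuation, and the names `association_scores` and `select_by_frequency` are assumptions for illustration; the sketch does not reproduce the experiment behind the 57.82% figure.

```python
import math
from collections import Counter


def association_scores(tokens):
    """Score each adjacent pair of words (x, y) with three measures:
    freq = f(x, y)
    pmi  = log2(f(x, y) * N / (f(x) * f(y)))   where N = number of tokens
    dice = 2 * f(x, y) / (f(x) + f(y))
    Assumption: pairs containing punctuation (non-alphabetic tokens) are skipped."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(
        (x, y) for x, y in zip(tokens, tokens[1:]) if x.isalpha() and y.isalpha()
    )
    scores = {}
    for (x, y), f_xy in bigrams.items():
        scores[(x, y)] = {
            "freq": f_xy,
            "pmi": math.log2(f_xy * n / (unigrams[x] * unigrams[y])),
            "dice": 2 * f_xy / (unigrams[x] + unigrams[y]),
        }
    return scores


def select_by_frequency(scores, measure="dice", threshold=0.75, top_k=3):
    """Keep pairs whose chosen measure reaches the (illustrative) threshold as
    candidates, then rank them by corpus frequency, highest first, and keep top_k."""
    candidates = [(pair, s) for pair, s in scores.items() if s[measure] >= threshold]
    candidates.sort(key=lambda item: (-item[1]["freq"], item[0]))
    return [" ".join(pair) for pair, _ in candidates[:top_k]]


# Hypothetical toy corpus, already tokenized on whitespace.
text = (
    "machine translation uses term extraction . "
    "term extraction supports machine translation and information retrieval . "
    "information retrieval benefits from term extraction ."
)
scores = association_scores(text.split())
print(select_by_frequency(scores, measure="dice", threshold=0.75, top_k=3))
```

On this toy corpus the script prints `['term extraction', 'information retrieval', 'machine translation']`. The pair "benefits from" also passes the Dice threshold and has the highest PMI, only because both of its words occur once; ranking the candidates by frequency in the selection step pushes it below the three genuine terms.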