Natural Language Processing – Validating the Keyphrases Using Wikipedia

G. VIJAY, V. VINOTHA

Published in International Journal of Advanced Research in Computer Science Engineering and Information Technology

ISSN: 2321-3337          Impact Factor:1.521         Volume:3         Issue:2         Year: 25 August,2014         Pages:375-381

Abstract

The main objective of this work is to access Wikipedia's vast repository of information, both its structure and its content, using an open-source toolkit named Wikipedia Miner. Wikipedia content is a promising resource for natural language processing and many other research areas. An automated process is designed to validate an available list of keyphrases from different domains using this toolkit in two steps. First, KEA (Keyphrase Extraction Algorithm), an algorithm for extracting keyphrases from text documents, extracts different combinations of keyphrases: it cleans the input text, identifies candidate phrases, and finally stems them. Second, the phrases identified by KEA are validated by searching for them in Wikipedia's content structure. A phrase that is found in the Wikipedia content is treated as a valid keyphrase. The approach is useful for developing applications such as thesaurus creation, domain-specific indexing, and searching.
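The two steps described above can be illustrated with a minimal Python sketch. It is not the KEA or Wikipedia Miner implementation: the stopword list, the crude suffix-stripping `stem` function, and the in-memory `wikipedia_titles` set are all assumptions made for illustration, standing in for KEA's own candidate selection and stemmer and for a real lookup against Wikipedia's content.

```python
import re

# Hypothetical, deliberately small stopword list (KEA ships its own).
STOPWORDS = {"a", "an", "the", "of", "in", "for", "and", "or", "to",
             "is", "are", "from", "using", "on", "with"}


def clean(text):
    """Step 1a: lowercase the text and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_phrases(tokens, max_len=3):
    """Step 1b: collect n-grams (1..max_len words) that neither
    start nor end with a stopword."""
    phrases = []
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrases.append(" ".join(gram))
    return phrases


def stem(word):
    """Step 1c: crude suffix stripping, a stand-in for a real stemmer."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def normalize(phrase):
    """Stem every word of a phrase so that variants compare equal."""
    return " ".join(stem(w) for w in phrase.split())


def validate(phrases, wikipedia_titles):
    """Step 2: keep the phrases whose stemmed form matches a stemmed title.
    `wikipedia_titles` is an assumed in-memory sample, not a live lookup."""
    index = {normalize(title.lower()) for title in wikipedia_titles}
    valid = []
    for phrase in phrases:
        if normalize(phrase) in index and phrase not in valid:
            valid.append(phrase)
    return valid


if __name__ == "__main__":
    titles = {"Natural language processing", "Search engine", "Thesaurus"}
    text = ("Natural language processing supports search engines "
            "and thesaurus creation.")
    print(validate(candidate_phrases(clean(text)), titles))
```

In this sketch a candidate such as "search engines" is accepted because its stemmed form matches the stemmed title "Search engine", while "supports search" is rejected because no title in the sample corresponds to it.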

Keywords

NLP, Toolkit
