A Survey of Web Content Mining Tools and Future Aspects

C. Menaka, N. Nagadeepa

Published in International Journal of Advanced Research in Computer Science Engineering and Information Technology

ISSN: 2321-3337    Impact Factor: 1.521    Volume: 3    Issue: 1    Year: 13 August, 2014    Pages: 375-385


Abstract

As the data on the web grows day by day, web mining has become necessary to draw inferences from the huge volume of data available. In web mining, non-trivial patterns and useful information are retrieved from web data. Web mining is of three types: Web usage mining, Web content mining and Web structure mining. Today, there are several billion HTML documents, pictures and other multimedia files available on the Internet, and methods are needed to help us extract information from the content of web pages. One answer to this problem is to apply data mining techniques, an approach known as web content mining, which is defined as “the process of extracting useful information from the text, images and other forms of content that make up the pages”. Web mining, the application of data mining techniques to extract knowledge from Web content, structure, and usage, is the collection of technologies to fulfill this potential. Interest in Web mining has grown rapidly in its short existence, both in the research and practitioner communities. This paper provides a brief overview of web mining, of various content mining tools, and of the accomplishments of the field in terms of both technologies and applications.
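
As a rough illustration of the definition quoted above, the following minimal sketch pulls the visible text and the image references out of a single HTML page using only Python's standard library. It is not one of the tools surveyed in the paper; the names ContentExtractor, extract_content and SAMPLE_PAGE are hypothetical and chosen only for this example.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collects visible text and image URLs from one HTML page (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.text_parts = []   # visible text fragments, in document order
        self.image_urls = []   # values of <img src="...">
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        stripped = data.strip()
        # Keep only non-empty text that is not script or style source.
        if stripped and self._skip_depth == 0:
            self.text_parts.append(stripped)


def extract_content(html):
    """Return the page's text and image URLs as a small dictionary."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return {"text": " ".join(parser.text_parts), "images": parser.image_urls}


# Hypothetical sample page, used only to exercise the sketch.
SAMPLE_PAGE = """
<html><head><title>Demo</title><style>p { color: red; }</style></head>
<body><h1>Web mining</h1><p>Content mining extracts text.</p>
<img src="figure1.png" alt="diagram"><script>var x = 1;</script></body></html>
"""

result = extract_content(SAMPLE_PAGE)
print(result["text"])
print(result["images"])
```

A real content mining tool would go further, for example by fetching pages, handling malformed markup and structuring the extracted records; the sketch covers only the extraction step for one well-formed page.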

Keywords

Web mining, Web content mining, Types of web mining, Content mining tools.
