Clustering through Intelligent Crawling for Web Based Mechanism

Mohammed Aarif A, A. Deepak Kumar, G. Saranya

Published in International Journal of Advanced Research in Computer Science Engineering and Information Technology

ISSN: 2321-3337   Impact Factor: 1.521   Volume: 4   Issue: 3   Year: 07 May, 2016   Pages: 1046-1055

Abstract

Web forums are an important data source for many web applications. Because of their complex in-site link structure, forum crawling is a challenging task. Without carefully selecting or checking the traversal path, a generic crawler usually downloads many duplicate and invalid forum pages, wasting both bandwidth and storage space, which is a major drawback of typical crawlers. This paper therefore proposes an automatic approach that explores a suitable link traversal strategy to direct the crawling of a given target forum, so that forum information is crawled more effectively. The strategy identifies two kinds of links: skeleton links and page-flipping links. Skeleton links instruct the crawler to visit only the valuable pages, which avoids duplicate and uninformative pages. Page-flipping links help the crawler fully retrieve a long discussion thread, which web forums typically display across multiple pages. Using the discovered traversal strategy, the forum crawler archives informative pages more efficiently than earlier related work and a commercial generic crawler. A major drawback of the existing work is that it does not specify how frequently the system is updated. Duplicate crawling is largely avoided, which moves the approach toward focused crawling. The approach also addresses the relevance and clarity of the collected data, which is a major research area for developers.
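The traversal idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the URL patterns, the `IGNORED_PARAMS` set, the `SITE` link map, and the function names are all hypothetical, chosen only to show how following skeleton and page-flipping links while normalizing URLs keeps a crawler away from duplicate and uninformative pages. The paper discovers its traversal strategy automatically, whereas this sketch hard-codes the patterns for brevity.

```python
import re
from collections import deque
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical patterns for an imaginary forum (assumptions, not from the paper).
SKELETON = [re.compile(r"^/board\?id=\d+$"), re.compile(r"^/thread\?id=\d+$")]
PAGE_FLIP = [re.compile(r"^/(board|thread)\?id=\d+&page=\d+$")]
# Assumed noise parameters that make the same page appear under several URLs.
IGNORED_PARAMS = {"sid", "sort"}

# Toy in-memory link map standing in for downloaded pages (page -> outgoing links).
SITE = {
    "/": ["/board?id=1", "/login", "/board?id=1&sid=abc"],
    "/board?id=1": ["/thread?id=7", "/thread?id=7&sort=asc", "/profile?u=3"],
    "/thread?id=7": ["/thread?id=7&page=2", "/reply?t=7"],
    "/thread?id=7&page=2": ["/thread?id=7", "/thread?page=2&id=7&sid=xyz"],
}


def normalize(url):
    """Drop noise parameters and sort the rest so duplicate URLs collapse."""
    parts = urlsplit(url)
    query = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS)
    return parts.path + ("?" + urlencode(query) if query else "")


def is_wanted(url):
    """True if the URL looks like a skeleton link or a page-flipping link."""
    return any(p.match(url) for p in SKELETON + PAGE_FLIP)


def crawl(site, entry):
    """Breadth-first crawl from the entry page, visiting each wanted page once."""
    seen = {entry}
    order = []
    queue = deque([entry])
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in site.get(url, []):
            target = normalize(link)
            if target not in seen and is_wanted(target):
                seen.add(target)
                queue.append(target)
    return order


if __name__ == "__main__":
    for page in crawl(SITE, "/"):
        print(page)
```

In this toy map the login, profile, and reply pages are never queued, and the session-tagged and re-sorted variants of the board and thread pages collapse onto URLs already seen, so the crawl visits only the entry page, the board, the thread, and the thread's second page.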

Keywords

crawler, EIT, ITF, FoCUS
