Published in International Journal of Advanced Research in Computer Science Engineering and Information Technology
ISSN: 2321-3337, Impact Factor: 1.521, Volume: 4, Issue: 1, Year: 09 April 2015, Pages: 391-404
Empirical forum mining targets online discussion boards, where users request and exchange information. A forum holds a large volume of data, which a forum crawler can handle by working through its threads. A crawler follows the links between the pages that are traversed while searching for content. Many existing systems retrieve forum content, but they struggle to return the most appropriate data. The proposed system aims to obtain the most appropriate answer to a posted question out of thousands of responses, and to avoid wasting resources on responses that are unlikely to yield the desired result. The system distinguishes three page types: the index page, the entry page, and the thread page. On the entry page a user can ask or view a question; the index page holds information on the URLs pointing to the board; the thread page contains the posts that respond to a question. We use index/thread URL detection, page-flipping URL detection, and entry URL discovery algorithms to locate these pages. An FAQ generation technique is then applied: it mines the most frequently asked questions and finds the most suitable answer to each by discarding uninformative data. Similar URLs are converted into regular expressions, known as index-thread-flipping (ITF) regular expressions. The data are clustered with the k-means algorithm.
threads, crawler, entry page, index page, thread page, summarization, FAQ
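
The conversion of similar URLs into regular expressions, mentioned in the abstract, can be illustrated with a minimal sketch. It assumes, purely for illustration, that URLs of the same page type differ only in their numeric parts, so each run of digits is replaced by a digit wildcard and URLs with the same resulting pattern are grouped together. The host `forum.example` and the function names `url_to_pattern` and `group_by_pattern` are hypothetical and do not come from the paper; the way the paper learns its ITF regular expressions may well differ.

```python
import re
from collections import defaultdict


def url_to_pattern(url: str) -> str:
    """Generalize one URL into a regex: digit runs become a digit wildcard."""
    # re.split with a capturing group keeps the digit runs at odd indices.
    parts = re.split(r"(\d+)", url)
    return "".join(
        r"\d+" if i % 2 == 1 else re.escape(part)
        for i, part in enumerate(parts)
    )


def group_by_pattern(urls):
    """Group URLs that collapse to the same generalized regex."""
    groups = defaultdict(list)
    for url in urls:
        groups[url_to_pattern(url)].append(url)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical forum URLs (assumption, not taken from the paper).
    sample = [
        "http://forum.example/board.php?f=3",
        "http://forum.example/board.php?f=17",
        "http://forum.example/thread.php?t=101",
        "http://forum.example/thread.php?t=101&page=2",
        "http://forum.example/thread.php?t=2040",
    ]
    for pattern, members in group_by_pattern(sample).items():
        print(len(members), pattern)
```

In this sketch the board URLs, the thread URLs, and the page-flipping URL (the one carrying a `page` parameter) fall into three separate groups, which loosely mirrors the index, thread, and page-flipping distinction in the abstract. A real crawler would also need to decide which group corresponds to which page type, which this example does not attempt.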