Fast and Improved Feature Subset Selection Algorithm Based Clustering for High Dimensional Data

K. Vijayalakshmi, S. Anithaa, B. Raghu

Published in International Journal of Advanced Research in Computer Science Engineering and Information Technology

ISSN: 2321-3337          Impact Factor: 1.521         Volume: 2         Issue: 3         Year: 08 April, 2014         Pages: 154-163

Abstract

Clustering is a method of grouping data into groups, or clusters. As the dimensionality of the data increases, usually only a small number of dimensions are relevant to particular clusters, while data in the irrelevant dimensions may produce considerable noise and mask the actual clusters to be discovered. Attribute subset selection is frequently used for data reduction by removing irrelevant and redundant dimensions (attributes). Ant colony optimization is a technique for solving computational problems that can be reduced to finding good paths through graphs, such as the minimum spanning tree problem and the traveling salesman problem. In this paper, a feature subset selection algorithm and the ant colony optimization algorithm are employed together to improve feature subset selection. The proposed method yields an improved feature subset selection algorithm based on hierarchical clustering; it minimizes redundancy in the data set and improves the accuracy of the selected attribute subset.
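The idea of removing irrelevant and redundant attributes by clustering can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes absolute Pearson correlation as both the relevance and the redundancy measure, groups features into connected components of a thresholded correlation graph (which matches cutting a single-linkage hierarchy at that threshold), and leaves out the ant colony optimization step entirely. The names `pearson`, `select_features`, `min_relevance`, and `max_redundancy`, and the threshold values, are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_features(columns, target, min_relevance=0.3, max_redundancy=0.9):
    """Pick one representative feature per cluster of mutually redundant features.

    columns: dict mapping feature name -> list of numeric values.
    target:  list of numeric target values.
    Thresholds are illustrative assumptions, not values from the paper.
    """
    # Step 1: drop irrelevant features (weak correlation with the target).
    relevance = {name: abs(pearson(col, target)) for name, col in columns.items()}
    kept = [name for name in columns if relevance[name] >= min_relevance]

    # Step 2: cluster the remaining features with union-find. Two features
    # are linked when they are strongly correlated with each other.
    parent = {name: name for name in kept}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(kept):
        for b in kept[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= max_redundancy:
                parent[find(a)] = find(b)

    # Step 3: keep the most relevant feature from each cluster.
    best = {}
    for name in kept:
        root = find(name)
        if root not in best or relevance[name] > relevance[best[root]]:
            best[root] = name
    return sorted(best.values())


if __name__ == "__main__":
    target = [1, 2, 3, 4, 5, 6]
    columns = {
        "a": [1, 2, 3, 4, 5, 7],          # relevant, nearly duplicates a_copy
        "a_copy": [2, 4, 6, 8, 10, 12],   # relevant, redundant with a
        "b": [0, 0, 1, 0, 1, 1],          # relevant, not redundant with a
        "noise": [1, -1, -1, 1, 1, -1],   # irrelevant
    }
    print(select_features(columns, target))
```

In this toy data set, `noise` is dropped as irrelevant, `a` and `a_copy` fall into one cluster, and one representative is kept from each cluster.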

Keywords

Hierarchical clustering based algorithm-filter technique, Graph-based cluster
