Mining Signature of Heterogeneous Event Sequences Using Beta Divergence to Develop Highly Robust Data

D. Gayathri, F. Evelin Rosy, D. Vani

Published in International Journal of Advanced Research in Computer Science Engineering and Information Technology

ISSN: 2321-3337 | Impact Factor: 1.521 | Volume: 2 | Issue: 2 | Date: 08 March 2014 | Pages: 84-91


Abstract

Large collections of electronic clinical records now provide a vast source of information on medical practice. However, the use of these data for exploratory analysis in support of clinical decisions is still limited. Extracting useful patterns from such data is particularly challenging because the data are longitudinal, sparse and heterogeneous. In this paper, we propose a framework based on Nonnegative Matrix Factorization (NMF) that uses a convolutional approach for open-ended temporal pattern discovery over large collections of clinical records. We call the method One-Sided Convolutional NMF (OSC-NMF). Our framework can mine both common and individual shift-invariant temporal patterns from heterogeneous events across different patient groups, and it handles both sparsity and scalability problems well. Furthermore, we use an event-matrix representation that can quantitatively encode all key temporal concepts, including order, concurrency and synchronicity. We derive efficient multiplicative update rules for OSC-NMF and prove its convergence theoretically. Finally, we present experimental results on both synthetic and real-world electronic patient data that demonstrate the effectiveness of the proposed method.
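The abstract does not state the OSC-NMF update rules, so the following is only a minimal sketch of the ingredients it names, under stated assumptions. It builds a binary event matrix (rows are event types, columns are time steps; this layout is an assumption) and fits plain, non-convolutional NMF with the standard multiplicative updates for the beta-divergence, the heuristic form described by Févotte and Idier. It does not model shift invariance and is not the authors' OSC-NMF. The helper names `event_matrix`, `nmf_beta` and `beta_divergence` and the toy events are hypothetical.

```python
import numpy as np


def event_matrix(events, n_types, n_steps):
    """Binary event matrix (assumed layout): rows are event types, columns are time steps."""
    M = np.zeros((n_types, n_steps))
    for event_type, t in events:
        M[event_type, t] = 1.0
    return M


def beta_divergence(V, V_hat, beta, eps=1e-12):
    """Sum of elementwise beta-divergences d_beta(V | V_hat)."""
    V = V + eps
    V_hat = V_hat + eps
    if beta == 1:  # generalized Kullback-Leibler divergence
        return float(np.sum(V * np.log(V / V_hat) - V + V_hat))
    if beta == 0:  # Itakura-Saito divergence
        return float(np.sum(V / V_hat - np.log(V / V_hat) - 1))
    # General case; beta = 2 gives half the squared Euclidean distance.
    return float(np.sum(
        (V**beta + (beta - 1) * V_hat**beta - beta * V * V_hat**(beta - 1))
        / (beta * (beta - 1))
    ))


def nmf_beta(V, rank, beta=1.0, n_iter=200, seed=0, eps=1e-12):
    """Plain NMF V ~ W @ H via heuristic multiplicative updates for the beta-divergence."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Update H with W fixed; eps keeps powers and divisions finite.
        WH = W @ H + eps
        H *= (W.T @ (WH ** (beta - 2) * V)) / (W.T @ WH ** (beta - 1) + eps)
        # Update W with H fixed, using the refreshed reconstruction.
        WH = W @ H + eps
        W *= ((WH ** (beta - 2) * V) @ H.T) / (WH ** (beta - 1) @ H.T + eps)
    return W, H


if __name__ == "__main__":
    # Hypothetical toy record: event 0 is followed one step later by event 1,
    # while events 2 and 3 co-occur; the pattern repeats at two different times.
    events = [(0, 1), (1, 2), (2, 2), (3, 2),
              (0, 6), (1, 7), (2, 7), (3, 7)]
    V = event_matrix(events, n_types=4, n_steps=10)
    W, H = nmf_beta(V, rank=2, beta=1.0)
    print("KL divergence after fitting:", beta_divergence(V, W @ H, beta=1.0))
```

Setting `beta=2` gives the Euclidean cost and `beta=0` the Itakura-Saito divergence. For 1 ≤ beta ≤ 2 these heuristic updates are known to be monotone (the cost does not increase), which is why the sketch defaults to the Kullback-Leibler case. Capturing the repeated, time-shifted pattern as a single component would require a convolutional model such as the one the abstract describes.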

Keywords

Temporal pattern, Nonnegative matrix factorization, Synthetic data
