Data Leakage Prevention System by Context Based Keyword Matching and Encrypted Data Detection

Soumya S R, Smitha E S

Published in International Journal of Advanced Research in Computer Science Engineering and Information Technology

ISSN: 2321-3337    Impact Factor: 1.521    Volume: 3    Issue: 1    Year: 26 June, 2014    Pages: 375-384


Abstract

Data leakage is a major concern for business organizations in today's increasingly networked world. Unauthorized disclosure may have serious consequences for an organization in both the short term and the long term. The risk is heightened by the fact that transmitted data (both inbound and outbound), including emails, instant messages, website forms, and file transfers, among others, is largely unregulated and unmonitored on its way to its destination. The objective of this paper is to enhance the security of a Data Leakage Prevention (DLP) system in two ways: by using a context-based keyword matching method to find documents that contain confidential information even when most of the document consists of non-confidential content, and by using an entropy method to find encrypted information in word documents. The combined approach helps the DLP system detect sensitive content in text documents and in word documents that contain encrypted information.
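
The entropy idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that "entropy method" refers to Shannon entropy over byte frequencies, and the function names `shannon_entropy` and `looks_encrypted` and the threshold value `ENTROPY_THRESHOLD` are hypothetical choices made for illustration only.

```python
import math
from collections import Counter

# Hypothetical cut-off in bits per byte; a real system would tune this
# on samples of known plaintext and known ciphertext.
ENTROPY_THRESHOLD = 7.5


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_encrypted(data: bytes, threshold: float = ENTROPY_THRESHOLD) -> bool:
    """Flag data whose byte distribution is close to uniform."""
    return shannon_entropy(data) >= threshold
```

Ordinary natural-language text typically scores well below 8 bits per byte, while encrypted content approaches it. Compressed data also scores high, so a check of this kind would presumably need to be combined with other signals before a document is flagged.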

Keywords

Data Leakage Prevention, Context, Cluster Graph, Entropy Method.
