Dynamic Detection of Duplicate Content in Cloud Using Merkle Hash Tree

Aarthi V., Chakravarthi A. S., Karthik K., Dilli Prasad K.

Published in International Journal of Advanced Research in Computer Science Engineering and Information Technology

ISSN: 2321-3337          Impact Factor: 1.521          Volume: 6          Issue: 3          Year: 21 April 2024          Pages: 1838-1843


Abstract

Cloud computing is widely used by companies for effective data storage and retrieval. A cloud platform must be highly scalable to handle the load from its users, yet storage becomes a problem as the number of users and the volume of stored data grow without bound. Duplication of files is the main issue: without a process to filter out and remove identical files from the cloud server, storage is wasted on redundant copies. In our project, we detect and remove duplicate files on the server using the Merkle hash tree algorithm, allowing only non-duplicate files to be stored in the cloud server. This effectively deduplicates the files held by the cloud server.
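As a rough illustration of the idea, the Python sketch below builds a Merkle root over a file's fixed-size chunks and accepts the upload only when that root has not been seen before. It is a minimal sketch under assumed design choices (4 KB chunking, SHA-256 hashing, an in-memory index); the names `CHUNK_SIZE`, `merkle_root`, and `DedupStore` are illustrative, not taken from the paper.

    import hashlib

    CHUNK_SIZE = 4096  # bytes per leaf chunk (an assumed value)

    def sha256(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_root(chunks: list[bytes]) -> bytes:
        """Build a Merkle tree over the chunk hashes and return its root."""
        if not chunks:
            return sha256(b"")
        level = [sha256(c) for c in chunks]           # leaf hashes
        while len(level) > 1:
            if len(level) % 2 == 1:                   # odd level: duplicate the last node
                level.append(level[-1])
            level = [sha256(level[i] + level[i + 1])  # pairwise parent hashes
                     for i in range(0, len(level), 2)]
        return level[0]

    class DedupStore:
        """Stores a file only if its Merkle root has not been seen before."""
        def __init__(self):
            self.roots: dict[bytes, str] = {}  # root hash -> stored file name

        def put(self, name: str, data: bytes) -> bool:
            chunks = [data[i:i + CHUNK_SIZE]
                      for i in range(0, len(data), CHUNK_SIZE)]
            root = merkle_root(chunks)
            if root in self.roots:             # duplicate content: reject upload
                return False
            self.roots[root] = name            # new content: accept and index
            return True

    store = DedupStore()
    print(store.put("report.pdf", b"some file bytes"))   # True: first copy stored
    print(store.put("copy.pdf", b"some file bytes"))     # False: duplicate detected

One reason a Merkle tree is attractive here, rather than a single whole-file hash, is that keeping the per-chunk hashes would additionally let the server detect partial overlap and deduplicate at chunk granularity.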

Keywords

Data storage, duplicate files, storage issues, Merkle hash tree algorithm, deduplication, cloud server, data retrieval, effective system.
