Job Scheduling Based on Size to Hadoop

M MD Shahbaz Hussain,R Vidhya

Published in International Journal of Advanced Research in Computer Science Engineering and Information Technology

ISSN: 2321-3337          Impact Factor:1.521         Volume:6         Issue:3         Year: 20 December,2016         Pages:1169-1177


Abstract

Size-based job scheduling with aging is recognized as an effective approach to achieving near-optimal system response times. The HFSP scheduler brings this technique to Hadoop, a real, complex, multi-server, and widely used system. Size-based scheduling requires a priori job size information, which is not available in Hadoop; HFSP therefore estimates job sizes online during job execution. HFSP gives priority to small jobs so that they are not slowed down by large ones. It is a size-based, preemptive scheduler for Hadoop that is largely fault tolerant and tolerant of job size estimation errors. Scheduling decisions use the concept of virtual time: cluster resources are focused on jobs according to their priority, which is computed through aging. As a result, neither small nor large jobs suffer from starvation.
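The interplay of size-based priority, virtual time, and aging can be illustrated with a small sketch. This is not the HFSP implementation: it assumes a single execution slot, job sizes that are known in advance as whole work units, and unit time steps, whereas HFSP runs on a multi-server cluster and estimates sizes online. The names `Job` and `schedule` are hypothetical. A virtual system shares capacity equally among jobs (this is the aging), and the real slot always runs the job with the least virtual work remaining, preempting at each step if needed.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    arrival: int   # arrival time, in unit time steps
    size: int      # assumed known here; HFSP estimates it online
    virtual_left: float = field(init=False)  # work left in the virtual system
    real_left: float = field(init=False)     # work left in the real system

    def __post_init__(self):
        self.virtual_left = float(self.size)
        self.real_left = float(self.size)


def schedule(jobs):
    """Return job names in completion order on one slot (illustrative only)."""
    pending = sorted(jobs, key=lambda j: j.arrival)
    virtual, active, finished = [], [], []
    now = 0
    while pending or active:
        # Admit every job that has arrived by the current time.
        while pending and pending[0].arrival <= now:
            job = pending.pop(0)
            virtual.append(job)
            active.append(job)
        if not active:
            now = pending[0].arrival  # idle: jump to the next arrival
            continue
        # Aging: the virtual system splits one unit of capacity equally
        # among all jobs that have not yet finished virtually.
        virtual = [j for j in virtual if j.virtual_left > 0]
        for j in virtual:
            j.virtual_left -= 1.0 / len(virtual)
        # Real system: run the unfinished job with the least virtual work
        # left; a different job may be chosen next step (preemption).
        current = min(active, key=lambda j: j.virtual_left)
        current.real_left -= 1.0
        if current.real_left <= 0:
            active.remove(current)
            finished.append(current.name)
        now += 1
    return finished


if __name__ == "__main__":
    order = schedule([Job("A", 0, 10), Job("B", 1, 2), Job("C", 2, 3)])
    print(order)
```

In this sketch the small jobs B and C overtake the large job A, yet A's virtual remaining work keeps shrinking while it waits, so a steady stream of later small arrivals cannot postpone it indefinitely.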

Keywords

MapReduce, Performance, Data Analysis, Scheduling, Master Slave, SRPT, FCFS, Process Sharing.
