Cost-efficient dynamic scheduling of big data applications in apache spark on cloud
dc.contributor.author | Islam, Muhammed Tawfiqul | |
dc.contributor.author | Srirama, Satish Narayana | |
dc.contributor.author | Karunasekera, Shanika | |
dc.contributor.author | Buyya, Rajkumar | |
dc.date.accessioned | 2022-03-27T00:16:10Z | |
dc.date.available | 2022-03-27T00:16:10Z | |
dc.date.issued | 2020-04-01 | |
dc.description.abstract | Job scheduling is one of the most crucial components in managing resources and efficiently executing big data applications. Specifically, scheduling jobs in a cloud-deployed cluster is challenging, as the cloud offers different types of Virtual Machines (VMs) and jobs can be heterogeneous. The default schedulers of big data processing frameworks fail to reduce the cost of VM usage in the cloud environment while satisfying the performance constraints of each job. Existing works in cluster scheduling mainly focus on improving job performance and do not exploit the variety of VM types on the cloud to reduce cost. In this paper, we propose efficient scheduling algorithms that reduce the cost of resource usage in a cloud-deployed Apache Spark cluster. In addition, the proposed algorithms can prioritise jobs based on their given deadlines. Moreover, the proposed scheduling algorithms are online and adaptive to cluster changes. We have also implemented the proposed algorithms on top of Apache Mesos. Furthermore, we have performed extensive experiments on real datasets and compared the proposed algorithms with existing schedulers to showcase their superiority. The results indicate that our algorithms can reduce resource usage cost by up to 34% under different workloads and improve job performance. | |
dc.identifier.citation | Journal of Systems and Software. v.162 | |
dc.identifier.issn | 01641212 | |
dc.identifier.uri | 10.1016/j.jss.2019.110515 | |
dc.identifier.uri | https://www.sciencedirect.com/science/article/abs/pii/S0164121219302894 | |
dc.identifier.uri | https://dspace.uohyd.ac.in/handle/1/3087 | |
dc.subject | Apache spark | |
dc.subject | Cloud | |
dc.subject | Cost-efficiency | |
dc.subject | Scheduling | |
dc.title | Cost-efficient dynamic scheduling of big data applications in apache spark on cloud | |
dc.type | Journal. Article | |
dspace.entity.type |