NEWT - A resilient BSP framework for iterative algorithms on Hadoop YARN
Date
2014-09-18
Authors
Kromonov, Ilja
Jakovits, Pelle
Srirama, Satish Narayana
Abstract
The importance of fault tolerance for parallel computing is ever increasing. The mean time between failures (MTBF) is predicted to decrease significantly for future highly parallel systems. At the same time, the current trend of using commodity hardware to reduce the cost of clusters puts pressure on users to ensure the fault tolerance of their applications. Cloud-based resources are one of the environments where the latter holds true. When it comes to embarrassingly parallel data-intensive algorithms, MapReduce has gone a long way in ensuring users can easily utilize these resources without the fear of losing work. However, this does not apply to the iterative, communication-intensive algorithms common in the scientific computing domain. In this work we propose a new programming model, inspired by Bulk Synchronous Parallel (BSP), for creating a new fault-tolerant distributed computing framework. We strive to retain the advantages that MapReduce provides, yet efficiently support a larger assortment of algorithms, such as the aforementioned iterative ones. The model adopts an approach similar to continuation passing for implementing parallel algorithms and facilitates the fault tolerance inherent in the BSP program structure. Based on the model we created a distributed computing framework, NEWT, which we describe and use to validate the approach.
Keywords
Bulk Synchronous Parallel,
cloud computing,
fault tolerance,
Hadoop YARN,
iterative algorithms
Citation
Proceedings of the 2014 International Conference on High Performance Computing and Simulation, HPCS 2014