NEWT - A fault tolerant BSP framework on Hadoop YARN
Date
2013-01-01
Authors
Kromonov, Ilja
Jakovits, Pelle
Srirama, Satish Narayana
Abstract
The importance of fault tolerance for the parallel computing field is ever increasing, as the mean time between failures is predicted to decrease significantly for future highly parallel systems. The current trend of using commodity hardware to reduce the cost of clusters forces users to ensure that their applications are fault tolerant. When it comes to embarrassingly parallel data-intensive algorithms, MapReduce has gone a long way toward simplifying the creation of such applications. However, this does not apply to the iterative, communication-intensive algorithms common in the scientific computing domain. In this work we propose a new programming model inspired by Bulk Synchronous Parallel (BSP) for creating a new fault-tolerant distributed computing framework. We strive to retain the advantages that MapReduce provides, yet efficiently support a larger assortment of algorithms, such as the aforementioned iterative ones. © 2013 IEEE.
Keywords
BSP, distributed computing, fault tolerance, MapReduce, MPI, parallel computing
Citation
Proceedings - 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing, UCC 2013