A novel approach for mining patterns from large uncertain data using MapReduce model
Date
2017-11-21
Authors
Rathan, B. Rini
Rani, K. Swarupa
Abstract
Frequent pattern mining discovers associations among items in large data sets. In many real-world applications, the presence of an object or a characteristic cannot always be stated with certainty. It is better expressed as a probability, and such data is called uncertain data. Mining frequent patterns from uncertain data is challenging because every item carries an existential probability, and researchers have therefore turned their attention to this problem. Leung et al. proposed algorithms such as UF-Growth and PUF-Growth for mining patterns from uncertain data. These algorithms mine patterns sequentially and may not be efficient when dealing with very large amounts of data. Other algorithms mine patterns in a parallel and distributed environment, but they incur overheads for data distribution, parallelization, and similar tasks. The MapReduce framework handles all such overheads internally. The MR-Growth algorithm stores data in the form of a UF-Tree; however, when the same item appears with many different probabilities, the UF-Tree becomes large, which may affect overall efficiency. To overcome this limitation, we have modified and extended the work of Leung et al. [3] to represent uncertain data in a compact tree structure. We demonstrate the functionality and utility of the proposed MR-PUFGrowth algorithm and evaluate it experimentally on benchmark datasets including mushroom, connect, retail, and T10I4D100K.
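The abstract's central claim is that a UF-Tree grows when one item occurs with many different probabilities, while a PUF-Tree stays compact. The sketch below illustrates that contrast on a tiny hypothetical data set; it is not the authors' MR-PUFGrowth implementation. It assumes the standard definitions as we understand them: expected support is the sum, over transactions, of the product of the itemset's existential probabilities (items treated as independent); a UF-Tree shares a node only when both the item and its probability match; and a PUF-Tree shares nodes by item alone, accumulating a "prefixed item cap" (the item's probability times the largest probability earlier on its path). The `map_phase`/`reduce_phase` functions are a hypothetical in-process stand-in for a MapReduce job that computes single-item expected supports, not Hadoop code.

```python
from collections import defaultdict
from math import prod

# Hypothetical toy data: each transaction maps item -> existential probability.
TRANSACTIONS = [
    {"a": 0.9, "b": 0.7, "c": 0.5},
    {"a": 0.8, "b": 0.7, "c": 0.6},
    {"a": 0.7, "b": 0.7, "d": 0.4},
]
ORDER = ["a", "b", "c", "d"]  # assumed global item order used when inserting paths


def expected_support(itemset, transactions):
    """Sum of products of existential probabilities (independence assumed)."""
    return sum(
        prod(t[i] for i in itemset)
        for t in transactions
        if all(i in t for i in itemset)
    )


class Node:
    def __init__(self):
        self.children = {}  # key -> Node
        self.value = 0.0    # occurrence count (UF-Tree) or summed item cap (PUF-Tree)


def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())


def build_uf_tree(transactions, order):
    rank = {item: r for r, item in enumerate(order)}
    root = Node()
    for t in transactions:
        node = root
        for item in sorted(t, key=rank.__getitem__):
            # UF-Tree style: a node is shared only if item AND probability match.
            node = node.children.setdefault((item, t[item]), Node())
            node.value += 1
    return root


def build_puf_tree(transactions, order):
    rank = {item: r for r, item in enumerate(order)}
    root = Node()
    for t in transactions:
        node = root
        max_prefix = 0.0
        for pos, item in enumerate(sorted(t, key=rank.__getitem__)):
            p = t[item]
            # Prefixed item cap: p for the first item, else p * max prefix probability.
            cap = p if pos == 0 else p * max_prefix
            # PUF-Tree style: nodes are shared by item alone; caps are summed.
            node = node.children.setdefault(item, Node())
            node.value += cap
            max_prefix = max(max_prefix, p)
    return root


def map_phase(transaction):
    """Emit (item, probability) pairs, like a mapper over one transaction."""
    for item, p in transaction.items():
        yield item, p


def reduce_phase(pairs):
    """Sum probabilities per item, giving single-item expected supports."""
    sums = defaultdict(float)
    for item, p in pairs:
        sums[item] += p
    return dict(sums)


if __name__ == "__main__":
    print("UF-Tree nodes:", count_nodes(build_uf_tree(TRANSACTIONS, ORDER)))
    print("PUF-Tree nodes:", count_nodes(build_puf_tree(TRANSACTIONS, ORDER)))
    print("expSup({a, b}):", expected_support(["a", "b"], TRANSACTIONS))
    print(reduce_phase(pair for t in TRANSACTIONS for pair in map_phase(t)))
```

On this data, item `a` has three different probabilities, so the UF-Tree needs three separate branches (nine nodes), while the PUF-Tree merges them into a single path with one side branch (four nodes). The price is that PUF-Tree node values are upper bounds rather than exact expected supports, so candidate itemsets must still be checked against the data.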
Keywords
MapReduce,
PUF-Tree,
Uncertain Data
Citation
2017 International Conference on Computer Communication and Informatics, ICCCI 2017