Itemset mining is a well-known exploratory technique used to discover interesting correlations hidden in a data collection. Since ever increasing amounts of data are being collected and stored (e.g., business transactions, medical and biological data, context-aware applications), scalable and efficient approaches are needed to analyzing these large data collections. This paper proposes a parallel disk-based approach to efficiently supporting frequent itemset mining on a multi-core processor. Our parallel strategy is presented in the context of the VLDB-Mine persistent data structure. Different techniques have been proposed to optimize both data- and compute-intensive aspects of the mining algorithm. Preliminary experiments, performed on both real and synthetic datasets, show promising results in improving the efficiency and scalability of the mining activity on large datasets.
P-Mine: Parallel itemset mining on large datasets / Baralis, ELENA MARIA; Cerquitelli, Tania; Chiusano, SILVIA ANNA; Grand, Alberto. - STAMPA. - 1:(2013), pp. 266-271. (Intervento presentato al convegno 2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW) tenutosi a Brisbane, Queensland (Australia) nel April 8-12, 2013) [10.1109/ICDEW.2013.6547461].
P-Mine: Parallel itemset mining on large datasets
BARALIS, ELENA MARIA;CERQUITELLI, TANIA;CHIUSANO, SILVIA ANNA;GRAND, ALBERTO
2013
Abstract
Itemset mining is a well-known exploratory technique used to discover interesting correlations hidden in a data collection. Since ever increasing amounts of data are being collected and stored (e.g., business transactions, medical and biological data, context-aware applications), scalable and efficient approaches are needed to analyzing these large data collections. This paper proposes a parallel disk-based approach to efficiently supporting frequent itemset mining on a multi-core processor. Our parallel strategy is presented in the context of the VLDB-Mine persistent data structure. Different techniques have been proposed to optimize both data- and compute-intensive aspects of the mining algorithm. Preliminary experiments, performed on both real and synthetic datasets, show promising results in improving the efficiency and scalability of the mining activity on large datasets.Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/11583/2518607
Attenzione
Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo