This paper presents a new approach to efficiently discovering correlations among data items in a sequence of incoming data windows. The approach supports both on-line queries (e.g., mining only the most recent data) and off-line queries (e.g., analyzing aggregate data windows), as well as user-defined item and support constraints. Given a sequence of transactional data windows and a minimum support threshold, a projection of each of the most recent data windows is compactly stored in main memory; the projection includes all items that have been frequently observed in the latest windows. Users can perform constrained itemset extraction from either a single data window or multiple windows. A summary of the interesting itemsets mined from all available data is generated at regular intervals and compactly stored in a persistent data structure to efficiently support further analysis (e.g., investigating only a selected past data window). Experimental results on both real and synthetic data streams show the effectiveness and efficiency of the proposed approach in mining interesting itemsets through both on-line and off-line queries.
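
The in-memory part of the workflow described above can be illustrated with a minimal Python sketch. It is not the data structure proposed in the paper: the class `WindowStore`, its methods, the `max_len` bound, and the brute-force enumeration are all hypothetical, and it assumes an item is kept in a projection when it is frequent within that single window. Because items are pruned per window, supports computed over several windows are lower bounds. The persistent summary is not covered.

```python
from collections import Counter, deque
from itertools import combinations


class WindowStore:
    """Hypothetical store of projections for the most recent data windows."""

    def __init__(self, max_windows, min_support):
        self.min_support = min_support            # relative threshold in (0, 1]
        self.windows = deque(maxlen=max_windows)  # oldest window is evicted

    def add_window(self, transactions):
        # Assumption: keep only the items that are frequent in this window.
        transactions = [frozenset(t) for t in transactions]
        counts = Counter(item for t in transactions for item in t)
        threshold = self.min_support * len(transactions)
        frequent = {i for i, c in counts.items() if c >= threshold}
        # Store the original window size so supports stay relative to
        # all transactions, including those whose projection is empty.
        projected = [t & frequent for t in transactions if t & frequent]
        self.windows.append((len(transactions), projected))

    def mine(self, window_ids, min_support, required_items=(), max_len=3):
        """Return {itemset: support} over the selected windows.

        window_ids index the retained windows (0 = oldest, -1 = newest).
        required_items is an item constraint: every returned itemset
        must contain all of these items.
        """
        required = frozenset(required_items)
        selected = [self.windows[i] for i in window_ids]
        total = sum(size for size, _ in selected)
        counts = Counter()
        for _, projected in selected:
            for t in projected:
                items = list(t)
                # Brute-force enumeration, only suitable for small examples.
                for k in range(1, min(max_len, len(items)) + 1):
                    counts.update(frozenset(c) for c in combinations(items, k))
        return {
            itemset: count / total
            for itemset, count in counts.items()
            if count / total >= min_support and required <= itemset
        }


store = WindowStore(max_windows=2, min_support=0.5)
store.add_window([{"a", "b"}, {"a", "c"}, {"a", "b"}, {"d"}])
store.add_window([{"a", "b"}, {"b"}])

latest = store.mine([-1], min_support=0.5)                      # single window
both = store.mine([0, 1], min_support=0.5, required_items={"b"})  # two windows
```

Here `latest` queries only the newest window, while `both` aggregates the two retained windows and restricts the result to itemsets containing item `b`.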

An Efficient Itemset Mining Approach for Data Streams / Baralis, Elena Maria; Cerquitelli, Tania; Chiusano, Silvia Anna; Grand, Alberto; Grimaudo, Luigi. - Print. - 6882 (2011), pp. 515-523. (Paper presented at the 15th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, KES 2011, held in Kaiserslautern (DE), September 12-14, 2011) [10.1007/978-3-642-23863-5_53].

An Efficient Itemset Mining Approach for Data Streams

Baralis, Elena Maria; Cerquitelli, Tania; Chiusano, Silvia Anna; Grand, Alberto; Grimaudo, Luigi
2011

ISBN: 9783642238628


Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2460919