Twitter data has recently been considered to perform a large variety of advanced analysis. Analysis ofTwitter data imposes new challenges because the data distribution is intrinsically sparse, due to a large number of messages post every day by using a wide vocabulary. Aimed at addressing this issue, generalized itemsets – sets of items at different abstraction levels – can be effectively mined and used todiscover interesting multiple-level correlations among data supplied with taxonomies. Each generalizeditemset is characterized by a correlation type (positive, negative, or null) according to the strength of thecorrelation among its items.This paper presents a novel data mining approach to supporting different and interesting targetedanalysis – topic trend analysis, context-aware service profiling – by analyzing Twitter posts. We aim atdiscovering contrasting situations by means of generalized itemsets. Specifically, we focus on comparingitemsets discovered at different abstraction levels and we select large subsets of specific (descendant)itemsets that show correlation type changes with respect to their common ancestor. To this aim, a novelkind of pattern, namely the Strong Flipping Generalized Itemset (SFGI), is extracted from Twitter mes-sages and contextual information supplied with taxonomy hierarchies. Each SFGI consists of a frequentgeneralized itemset X and the set of its descendants showing a correlation type change with respect to X. Experiments performed on both real and synthetic datasets demonstrate the effectiveness of the pro-posed approach in discovering interesting and hidden knowledge from Twitter data.

Twitter data analysis by means of Strong Flipping Generalized Itemsets / Cagliero, Luca; Cerquitelli, Tania; Garza, Paolo; Grimaudo, Luigi. - In: THE JOURNAL OF SYSTEMS AND SOFTWARE. - ISSN 0164-1212. - 94:(2014), pp. 16-29. [10.1016/j.jss.2014.03.060]

Twitter data analysis by means of Strong Flipping Generalized Itemsets

CAGLIERO, LUCA;CERQUITELLI, TANIA;GARZA, PAOLO;GRIMAUDO, LUIGI
2014

Abstract

Twitter data has recently been considered to perform a large variety of advanced analysis. Analysis ofTwitter data imposes new challenges because the data distribution is intrinsically sparse, due to a large number of messages post every day by using a wide vocabulary. Aimed at addressing this issue, generalized itemsets – sets of items at different abstraction levels – can be effectively mined and used todiscover interesting multiple-level correlations among data supplied with taxonomies. Each generalizeditemset is characterized by a correlation type (positive, negative, or null) according to the strength of thecorrelation among its items.This paper presents a novel data mining approach to supporting different and interesting targetedanalysis – topic trend analysis, context-aware service profiling – by analyzing Twitter posts. We aim atdiscovering contrasting situations by means of generalized itemsets. Specifically, we focus on comparingitemsets discovered at different abstraction levels and we select large subsets of specific (descendant)itemsets that show correlation type changes with respect to their common ancestor. To this aim, a novelkind of pattern, namely the Strong Flipping Generalized Itemset (SFGI), is extracted from Twitter mes-sages and contextual information supplied with taxonomy hierarchies. Each SFGI consists of a frequentgeneralized itemset X and the set of its descendants showing a correlation type change with respect to X. Experiments performed on both real and synthetic datasets demonstrate the effectiveness of the pro-posed approach in discovering interesting and hidden knowledge from Twitter data.
File in questo prodotto:
File Dimensione Formato  
2543387_draft.pdf

accesso aperto

Tipologia: 1. Preprint / submitted version [pre- review]
Licenza: PUBBLICO - Tutti i diritti riservati
Dimensione 345.66 kB
Formato Adobe PDF
345.66 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2543387
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo