Analysis of Twitter Data Using a Multiple-level Clustering Strategy / Baralis, Elena Maria; Cerquitelli, Tania; Chiusano, Silvia Anna; Grimaudo, Luigi; Xiao, Xin. - Print. - 8216 (2013), pp. 13-24. (Paper presented at the Third International Conference on Model and Data Engineering (MEDI 2013), held in Amantea (Italy), September 25-27, 2013) [10.1007/978-3-642-41366-7].

Analysis of Twitter Data Using a Multiple-level Clustering Strategy

Baralis, Elena Maria; Cerquitelli, Tania; Chiusano, Silvia Anna; Grimaudo, Luigi; Xiao, Xin
2013

Abstract

Twitter, currently the leading microblogging social network, has attracted a large body of research. This paper proposes a data analysis framework to discover groups of similar Twitter messages posted about a given event. By analyzing these groups, it is possible to extract the user emotions or thoughts that appear to be associated with specific events, as well as the aspects that characterize an event according to user perception. To deal with the inherent sparseness of micro-messages, the proposed approach relies on a multiple-level strategy that can cluster text data with a variable distribution. Clusters are then characterized by the most representative words appearing in their messages, and association rules are used to highlight correlations among these words. To measure the relevance of specific words for a given event, the text data is represented in the Vector Space Model using TF-IDF weighting. As a case study, two real Twitter datasets have been analysed.
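The sketch below shows two steps named in the abstract. First, it represents short messages in the Vector Space Model with TF-IDF weights. Second, it characterizes a group of messages by its highest-weighted words. It is a minimal illustration under stated assumptions, not the authors' implementation:

- The tokenizer is an assumption.
- The TF-IDF variant (raw term frequency times ln(N/df)) is an assumption.
- The helper names `tokenize`, `tfidf_vectors`, `cosine` and `top_words` are hypothetical.
- The multiple-level clustering strategy itself is not reproduced, since its details are not given here.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase, keep words, hashtags and mentions.
    # A real pipeline would also drop stopwords/URLs and apply stemming.
    return re.findall(r"[#@]?\w+", text.lower())

def tfidf_vectors(docs):
    """Map each message to a sparse {term: weight} vector (hypothetical helper).

    Weight = raw term frequency * ln(N / df), one common TF-IDF variant,
    assumed here for illustration only.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term at most once per message
    return [
        {term: count * math.log(n_docs / df[term])
         for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def top_words(vectors, k=3):
    """Describe a group of messages by the k terms with the largest summed weight."""
    totals = Counter()
    for vec in vectors:
        totals.update(vec)  # adds float weights term by term
    return [term for term, _ in totals.most_common(k)]

msgs = ["goal goal great match", "great match tonight", "boring match"]
vecs = tfidf_vectors(msgs)
print(top_words(vecs[:2], k=2))  # ['goal', 'tonight']
```

A term that occurs in every message (here "match") receives weight ln(1) = 0. This is why TF-IDF favours event-specific words over ubiquitous ones when characterizing a cluster.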
ISBN: 9783642413650
Files in this item:
2518923_draft.pdf

Open access

Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 228.94 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11583/2518923
Warning! The data displayed have not been validated by the university.