Natural disasters have become more frequent during the past 20 years due to significant climate changes. These natural events are hotly debated on social networks like Twitter and a huge amount of short text messages are continuously and promptly exchanged with personal opinions, descriptions of the natural events and their corresponding consequences. The analysis of these large and complex data could help policy-makers to better understand the event as well as to set priorities. However, the correct configuration of the tweet mining process is still challenging due to variable data distribution and the availability of a large number of algorithms with different specific parameters. The analyst need to perform a large number of experiments to identify the best configuration for the overall knowledge discovery process. Innovative, scalable, and parameter-free solutions need to be explored to streamline the analytics process. This paper presents an enhanced version of PASTA (a distributed self-tuning engine) applied to a crisis tweet collection to group a corpus of tweets into cohesive and well-separated clusters with minimal analyst intervention. Experimental results performed on real data collected during natural disasters show the effectiveness of PASTA in discovering interesting groups of correlated tweets without selecting neither the algorithms nor their parameters.

All in a twitter: Self-tuning strategies for a deeper understanding of a crisis tweet collection / Di Corso, Evelina; Ventura, Francesco; Cerquitelli, Tania. - ELETTRONICO. - (2017), pp. 3722-3726. ((Intervento presentato al convegno Big Data (Big Data), 2017 IEEE International Conference on tenutosi a Boston (USA) nel 11-14 Dec. 2017 [10.1109/BigData.2017.8258369].

All in a twitter: Self-tuning strategies for a deeper understanding of a crisis tweet collection

Di Corso, Evelina;Ventura, Francesco;Cerquitelli, Tania
2017

Abstract

Natural disasters have become more frequent during the past 20 years due to significant climate changes. These natural events are hotly debated on social networks like Twitter and a huge amount of short text messages are continuously and promptly exchanged with personal opinions, descriptions of the natural events and their corresponding consequences. The analysis of these large and complex data could help policy-makers to better understand the event as well as to set priorities. However, the correct configuration of the tweet mining process is still challenging due to variable data distribution and the availability of a large number of algorithms with different specific parameters. The analyst need to perform a large number of experiments to identify the best configuration for the overall knowledge discovery process. Innovative, scalable, and parameter-free solutions need to be explored to streamline the analytics process. This paper presents an enhanced version of PASTA (a distributed self-tuning engine) applied to a crisis tweet collection to group a corpus of tweets into cohesive and well-separated clusters with minimal analyst intervention. Experimental results performed on real data collected during natural disasters show the effectiveness of PASTA in discovering interesting groups of correlated tweets without selecting neither the algorithms nor their parameters.
978-1-5386-2715-0
File in questo prodotto:
File Dimensione Formato  
CameraReadyDSEM2017.pdf

accesso aperto

Descrizione: Articolo principale
Tipologia: 2. Post-print / Author's Accepted Manuscript
Licenza: PUBBLICO - Tutti i diritti riservati
Dimensione 185.07 kB
Formato Adobe PDF
185.07 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

Caricamento pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2699172
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo