Useful ToPIC: Self-tuning strategies to enhance Latent Dirichlet Allocation / Proto, Stefano; Di Corso, Evelina; Ventura, Francesco; Cerquitelli, Tania. - Electronic. - (2018), pp. 33-40. (Paper presented at the BigData Congress 2018 conference, held in San Francisco (USA), July 2-7, 2018).
Useful ToPIC: Self-tuning strategies to enhance Latent Dirichlet Allocation
Stefano Proto; Evelina Di Corso; Francesco Ventura; Tania Cerquitelli
2018
Abstract
ToPIC (Tuning of Parameters for Inference of Concepts) is a distributed self-tuning engine that clusters collections of textual data into correlated groups of documents using a topic modeling methodology (Latent Dirichlet Allocation, LDA). ToPIC includes automatic strategies that relieve the end user of the burden of selecting proper parameter values throughout the analytics process. ToPIC's current implementation runs on Apache Spark, a state-of-the-art distributed computing framework. As a case study, ToPIC has been validated on three real-world collections of textual documents with different distributions. The experimental results show the effectiveness and efficiency of the proposed solution both in analyzing document collections without manual tuning of algorithm parameters and in discovering cohesive, well-separated groups of documents that share similar topics.
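The abstract states that ToPIC groups documents through LDA but does not say how each document is mapped to a group. As a minimal sketch, assuming each document is assigned to the topic with the highest probability in its LDA document-topic distribution (a common convention, not necessarily ToPIC's strategy), the grouping step could look like the following. The function name `group_by_dominant_topic` and the sample matrix are hypothetical and for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_dominant_topic(doc_topic: Sequence[Sequence[float]]) -> Dict[int, List[int]]:
    """Group documents by their most probable topic (hypothetical helper).

    doc_topic[d][k] is assumed to be P(topic k | document d), i.e. the
    per-document topic mixture produced by an already fitted LDA model.
    Returns a mapping: topic index -> list of document indices.
    """
    groups: Dict[int, List[int]] = defaultdict(list)
    for doc_id, dist in enumerate(doc_topic):
        if not dist:
            raise ValueError(f"document {doc_id} has an empty topic distribution")
        # argmax over topics; max() keeps the first maximum, so ties go to the lowest index
        best = max(range(len(dist)), key=lambda k: dist[k])
        groups[best].append(doc_id)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical document-topic matrix: 4 documents, 3 topics.
    theta = [
        [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.6, 0.3, 0.1],
        [0.2, 0.2, 0.6],
    ]
    print(group_by_dominant_topic(theta))  # {0: [0, 2], 1: [1], 2: [3]}
```

Hard assignment by argmax is the simplest choice. It discards the mixed-membership information LDA provides, so a document split evenly between two topics lands in only one group.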
| File | Description | Type | License | Size | Format |
|---|---|---|---|---|---|
| topic-tuning-strategies.pdf (open access) | Main article | Post-print / Author's Accepted Manuscript | Public - All rights reserved | 1.84 MB | Adobe PDF |
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/11583/2710501
Warning: the data displayed have not been validated by the university.