Semantic Enrichment for Recommendation of Primary Studies in a Systematic Literature Review / Rizzo, Giuseppe; Tomassetti, Federico Cesare Argentino; Vetro', Antonio; Ardito, Luca; Torchiano, Marco; Morisio, Maurizio; Troncy, Raphael. - In: DIGITAL SCHOLARSHIP IN THE HUMANITIES. - ISSN 2055-7671. - Print. - 32:1(2017), pp. 195-208. [10.1093/llc/fqv031]
Semantic Enrichment for Recommendation of Primary Studies in a Systematic Literature Review
RIZZO, GIUSEPPE;TOMASSETTI, FEDERICO CESARE ARGENTINO;VETRO', ANTONIO;ARDITO, LUCA;TORCHIANO, MARCO;MORISIO, MAURIZIO;
2017
Abstract
A Systematic Literature Review (SLR) identifies, evaluates, and synthesizes the literature available on a given topic. It generally requires a significant human workload and is subject to subjectivity bias that could affect its results. Automated document classification can be a valuable tool for recommending which studies to select. In this article, we propose an automated pre-selection approach based on text mining and semantic enrichment techniques. Each document is first processed by a named entity extractor. The DBpedia URIs produced by the entity linking process are used as external sources of information: our system collects the bag of words of those sources and adds it to the initial document. A Multinomial Naive Bayes classifier then decides whether the enriched document belongs to the positive example set. We used an existing, manually performed SLR as a benchmark data set, trained our system with different configurations of relevant documents, and evaluated the effectiveness of our approach empirically. Results show an 18% reduction in the manual workload a human researcher has to spend, while maintaining a recall of 95%, an essential condition given the very nature of SLRs. We also measured the effect of the enrichment process on the precision of the classifier and observed a gain of up to 5%.
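The pipeline summarized in the abstract (entity extraction and linking to DBpedia, appending the linked resources' bag of words to the document, then Multinomial Naive Bayes classification) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionary `ENTITY_INDEX` is a hypothetical stand-in for a real named entity extractor and linker, `URI_TEXT` holds hypothetical placeholder text rather than content actually retrieved from DBpedia, the classifier uses Laplace smoothing (`alpha=1.0`), and the `pick_threshold` and `workload_reduction` helpers are one assumed way to trade recall against screening effort.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL stand-in for the named entity extractor + entity linker:
# a lookup from surface forms to DBpedia URIs. A real pipeline would call
# an NER / entity-linking service instead of this dictionary.
ENTITY_INDEX = {
    "java": "http://dbpedia.org/resource/Java_(programming_language)",
    "eclipse": "http://dbpedia.org/resource/Eclipse_(software)",
}

# HYPOTHETICAL placeholder text standing in for what would be fetched from
# each DBpedia resource; these are not the real DBpedia abstracts.
URI_TEXT = {
    "http://dbpedia.org/resource/Java_(programming_language)":
        "object oriented programming language compiled to bytecode",
    "http://dbpedia.org/resource/Eclipse_(software)":
        "integrated development environment for software development",
}


def tokenize(text):
    """Lowercase and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def enrich(text):
    """Return the document's bag of words plus the words of each linked resource."""
    tokens = tokenize(text)
    uris = {ENTITY_INDEX[t] for t in tokens if t in ENTITY_INDEX}
    for uri in sorted(uris):
        tokens.extend(tokenize(URI_TEXT[uri]))
    return Counter(tokens)


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words Counters, Laplace-smoothed."""

    def fit(self, docs, labels, alpha=1.0):
        self.alpha = alpha
        self.classes = sorted(set(labels))
        self.vocab = set().union(*docs)
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = sum(class_docs, Counter())
            self.totals[c] = sum(self.counts[c].values())
        return self

    def joint_log_prob(self, doc, c):
        """log P(c) + sum over words of count * log P(word | c)."""
        denom = self.totals[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for word, n in doc.items():
            if word in self.vocab:  # words never seen in training are skipped
                score += n * math.log((self.counts[c][word] + self.alpha) / denom)
        return score

    def prob_positive(self, doc, positive=1):
        """Posterior P(positive | doc), normalized with the log-sum-exp trick."""
        scores = {c: self.joint_log_prob(doc, c) for c in self.classes}
        top = max(scores.values())
        z = sum(math.exp(s - top) for s in scores.values())
        return math.exp(scores[positive] - top) / z


def pick_threshold(probs, labels, target_recall=0.95):
    """Highest score cut-off that still keeps >= target_recall of the positives."""
    pos = sorted((p for p, y in zip(probs, labels) if y == 1), reverse=True)
    return pos[math.ceil(target_recall * len(pos)) - 1]


def workload_reduction(probs, threshold):
    """Share of documents scored below the cut-off, i.e. not sent to a reviewer."""
    return sum(p < threshold for p in probs) / len(probs)


# Toy usage on invented titles (not the paper's benchmark data).
train_texts = [
    "empirical study of java code smells in eclipse",
    "mining java repositories for defect prediction",
    "survey of cooking recipes",
    "history of medieval poetry",
]
train_labels = [1, 1, 0, 0]
docs = [enrich(t) for t in train_texts]
model = MultinomialNB().fit(docs, train_labels)
probs = [model.prob_positive(d) for d in docs]
cutoff = pick_threshold(probs, train_labels)
print(workload_reduction(probs, cutoff))  # prints 0.5
```

In this sketch the cut-off is taken from the lowest-scoring positive that must still be kept to reach the target recall, reflecting the priority the abstract gives to recall; documents scoring below it are the ones a reviewer could skip, which is what `workload_reduction` counts here.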
semanticreview.pdf
Open Access since 14/08/2017
Description: Post-print of the article
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 712.07 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2617310
Warning: the data displayed have not been validated by the university.