The ever-increasing number of published scientific articles has prompted the need for automated, data-driven approaches to summarizing their content. The Computational Linguistics Scientific Document Summarization Shared Task (CL-SciSumm 2019) has recently fostered the study and development of new text mining and machine learning solutions to the summarization problem, customized to the academic domain. In CL-SciSumm, a Reference Paper (RP) is associated with a set of Citing Papers (CPs), all containing citations to the RP. In each CP, the text spans (i.e., citances) that pertain to a particular citation to the RP have been identified. The task of identifying the spans of text in the RP that most accurately reflect the citance is addressed using supervised approaches. This paper proposes a new, more effective solution to the CL-SciSumm discourse facet classification task, which entails identifying, for each cited text span, which facet of the paper it belongs to from a predefined set of facets. It also proposes extending the set of traditional CL-SciSumm tasks with a new one, namely the discourse facet summarization task. The underlying idea is to extract facet-specific descriptions of each RP, each consisting of a fixed-length collection of the RP’s text spans. To tackle both the standard and the new tasks, we propose machine learning-based solutions that rely on the extraction of a selection of discriminating words, called pivot words. Predictive features based on pivot words are shown to be of great importance for rating the pertinence and relevance of a text span to a given facet. The newly proposed facet classification method performs significantly better than the best-performing CL-SciSumm 2019 participant (i.e., the classification accuracy increases by 8%), whereas regression methods achieve promising results on the newly proposed summarization task.
Exploiting pivot words to classify and summarize discourse facets of scientific papers / La Quatra, M.; Cagliero, L.; Baralis, E. - In: SCIENTOMETRICS. - ISSN 0138-9130. - PRINT. - (2020), pp. 1-19.
Title: | Exploiting pivot words to classify and summarize discourse facets of scientific papers |
Authors: | La Quatra, M.; Cagliero, L.; Baralis, E. |
Publication date: | 2020 |
Journal: | Scientometrics |
Digital Object Identifier (DOI): | http://dx.doi.org/10.1007/s11192-020-03532-3 |
Appears in types: | 1.1 Journal article |
Files in this item:
File | Description | Type | License | Access |
---|---|---|---|---|
post_print_SCIM_2020.pdf | author's post-print, post-ref | 2. Post-print / Author's Accepted Manuscript | PUBLIC - All rights reserved | Embargo: 13/06/2021 |
LaQuatra2020_Article_ExploitingPivotWordsToClassify.pdf | | 2a Post-print, publisher's version / Version of Record | Non-public - Private/restricted access | Administrator |
http://hdl.handle.net/11583/2837187