Exploiting pivot words to classify and summarize discourse facets of scientific papers

La Quatra, M.; Cagliero, L.; Baralis, E.

doi:10.1007/s11192-020-03532-3

The ever-increasing number of published scientific articles has prompted the need for automated, data-driven approaches to summarizing their content. The Computational Linguistics Scientific Document Summarization Shared Task (CL-SciSumm 2019) has recently fostered the study and development of new text mining and machine learning solutions to the summarization problem, customized to the academic domain. In CL-SciSumm, a Reference Paper (RP) is associated with a set of Citing Papers (CPs), all of which cite the RP. In each CP, the text spans that pertain to a particular citation of the RP (i.e., the citances) have been identified. The task of identifying the spans of text in the RP that most accurately reflect each citance is addressed using supervised approaches. This paper proposes a new, more effective solution to the CL-SciSumm discourse facet classification task, which entails identifying, for each cited text span, which facet of the paper it belongs to from a predefined set of facets. It also proposes extending the set of traditional CL-SciSumm tasks with a new one, namely the discourse facet summarization task. The idea behind it is to extract facet-specific descriptions of each RP, each consisting of a fixed-length collection of the RP's text spans. To tackle both the standard and the new tasks, we propose machine learning-based solutions built on the extraction of a selection of discriminating words, called pivot words. Predictive features based on pivot words are shown to be of great importance for rating the pertinence and relevance of a text span to a given facet. The newly proposed facet classification method performs significantly better than the best-performing CL-SciSumm 2019 participant (i.e., classification accuracy increases by +8%), whereas regression methods achieve promising results on the newly proposed summarization task.
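The abstract does not specify how pivot words are selected or how the resulting features are consumed. As an illustration only, the Python sketch below assumes one plausible recipe: for each facet, rank words by the smoothed log-ratio of their relative frequency inside versus outside that facet's training spans, keep the top ones as pivot words, and describe a span by the share of its tokens that are pivot words of each facet. The fixed-length facet summary then keeps the spans with the highest density for the chosen facet, standing in for the regression models the paper trains. The functions `pivot_words`, `pivot_features` and `facet_summary`, the tokenizer, the scoring rule and the toy data are all hypothetical and are not the authors' method.

```python
import math
from collections import Counter

PUNCT = ".,;:()"


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption for this sketch)."""
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]


def _log_ratio(word, inside, outside, n_in, n_out, vocab_size, alpha):
    """Add-alpha smoothed log-ratio of a word's relative frequency inside vs. outside a facet."""
    p_in = (inside[word] + alpha) / (n_in + alpha * vocab_size)
    p_out = (outside[word] + alpha) / (n_out + alpha * vocab_size)
    return math.log(p_in / p_out)


def pivot_words(labeled_spans, top_k=3, alpha=1.0):
    """Hypothetical pivot-word selection: the top_k most facet-specific words per facet."""
    facet_counts = {}
    for text, facet in labeled_spans:
        facet_counts.setdefault(facet, Counter()).update(tokenize(text))
    total = Counter()
    for counts in facet_counts.values():
        total.update(counts)
    vocab_size = len(total)
    pivots = {}
    for facet, inside in facet_counts.items():
        outside = total - inside  # word counts in all other facets
        n_in, n_out = sum(inside.values()), sum(outside.values())
        # Highest log-ratio first; ties broken alphabetically for determinism.
        ranked = sorted(
            total,
            key=lambda w: (-_log_ratio(w, inside, outside, n_in, n_out, vocab_size, alpha), w),
        )
        pivots[facet] = ranked[:top_k]
    return pivots


def pivot_features(text, pivots):
    """Share of a span's tokens that are pivot words of each facet."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)  # avoid division by zero on empty spans
    features = {}
    for facet, words in pivots.items():
        word_set = set(words)
        features[facet] = sum(t in word_set for t in tokens) / n
    return features


def facet_summary(spans, pivots, facet, length=2):
    """Fixed-length facet summary: the `length` spans with the highest pivot density.

    The density is a stand-in for a trained regression score; sorted() is stable,
    so ties keep the original span order.
    """
    ranked = sorted(spans, key=lambda s: -pivot_features(s, pivots)[facet])
    return ranked[:length]


# Toy, made-up training spans labeled with two example facets.
TRAIN = [
    ("We propose a new method based on a neural model", "Method"),
    ("The method uses a graph model", "Method"),
    ("Results show accuracy improves by a large margin", "Results"),
    ("Our results report higher accuracy", "Results"),
]
# Toy, made-up candidate spans from a reference paper.
SPANS = [
    "The method relies on a graph model",
    "Accuracy results are reported in Table 2",
    "We thank the reviewers",
]

if __name__ == "__main__":
    pivots = pivot_words(TRAIN, top_k=2)
    print(pivots)
    print(pivot_features("Our model reaches higher accuracy and better results", pivots))
    print(facet_summary(SPANS, pivots, "Results", length=1))
```

On the toy data, `pivot_words(TRAIN, top_k=2)` picks "method" and "model" for Method and "accuracy" and "results" for Results. In a real pipeline such densities would be one group of features fed to a trained classifier or regressor rather than used directly as scores.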

Exploiting pivot words to classify and summarize discourse facets of scientific papers / La Quatra, M., Cagliero, L., Baralis, E. - In: SCIENTOMETRICS. - ISSN 0138-9130 (print). - 125 (2020), pp. 3139-3157. [10.1007/s11192-020-03532-3]

Year: 2020
DOI: https://dx.doi.org/10.1007/s11192-020-03532-3
Journal: SCIENTOMETRICS
Type: 1.1 Journal article

Files in this record:

File	Size	Format
post_print_SCIM_2020.pdf (Open Access from 14/06/2021; Description: author's post-print, post-refereeing; Type: 2. Post-print / Author's Accepted Manuscript; License: Public - All rights reserved)	1.95 MB	Adobe PDF
s11192-020-03532-3.pdf (restricted access; Type: 2a Post-print publisher's version / Version of Record; License: Not public - Private/restricted access)	1.25 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2837187

PORTO @ Institutional Research Archive