Exploiting pivot words to classify and summarize discourse facets of scientific papers

La Quatra, M.; Cagliero, L.; Baralis, E.

doi:10.1007/s11192-020-03532-3

The ever-increasing number of published scientific articles has prompted the need for automated, data-driven approaches to summarizing their content. The Computational Linguistics Scientific Document Summarization Shared Task (CL-SciSumm 2019) has recently fostered the study and development of new text mining and machine learning solutions to the summarization problem tailored to the academic domain. In CL-SciSumm, a Reference Paper (RP) is associated with a set of Citing Papers (CPs), all containing citations to the RP. In each CP, the text spans (i.e., citances) that pertain to a particular citation to the RP have been identified. The task of identifying the spans of text in the RP that most accurately reflect the citance is addressed using supervised approaches. This paper proposes a new, more effective solution to the CL-SciSumm discourse facet classification task, which entails identifying, for each cited text span, the facet of the paper it belongs to from a predefined set of facets. It also proposes extending the set of traditional CL-SciSumm tasks with a new one, namely the discourse facet summarization task. The idea is to extract facet-specific descriptions of each RP, each consisting of a fixed-length collection of the RP's text spans. To tackle both the standard and the new tasks, we propose machine-learning-supported solutions based on the extraction of a selection of discriminating words, called pivot words. Predictive features based on pivot words are shown to be of great importance in rating the pertinence and relevance of a text span to a given facet. The newly proposed facet classification method performs significantly better than the best-performing CL-SciSumm 2019 participant (i.e., the classification accuracy has increased by +8%), whereas regression methods achieved promising results for the newly proposed summarization task.
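To make the notion of pivot words more concrete, the following is a minimal sketch of one way discriminating words could be selected per facet and turned into features for a text span. It is not the authors' method: the scoring function (a smoothed log-ratio of in-facet versus out-of-facet word frequency), the stop-word list, the facet labels, the toy spans, and all function names are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict
import math

# Hypothetical stop-word list, kept tiny on purpose for the toy example.
STOP_WORDS = {"a", "the", "and", "we", "from", "over"}


def tokenize(text):
    """Lowercase, strip basic punctuation, and drop stop words."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def select_pivot_words(labeled_spans, top_k=3):
    """Return, per facet, the top_k words that best separate it from the rest.

    Score (an assumption of this sketch): add-one smoothed log-ratio of a
    word's relative frequency inside the facet versus outside it.
    """
    counts = defaultdict(Counter)
    for text, facet in labeled_spans:
        counts[facet].update(tokenize(text))
    vocab = set().union(*counts.values())

    pivots = {}
    for facet, inside in counts.items():
        outside = Counter()
        for other, other_counts in counts.items():
            if other != facet:
                outside.update(other_counts)
        n_in, n_out = sum(inside.values()), sum(outside.values())

        def score(word):
            p_in = (inside[word] + 1) / (n_in + len(vocab))
            p_out = (outside[word] + 1) / (n_out + len(vocab))
            return math.log(p_in / p_out)

        # Ties are broken alphabetically so the result is deterministic.
        ranked = sorted(vocab, key=lambda w: (-score(w), w))
        pivots[facet] = set(ranked[:top_k])
    return pivots


def pivot_features(text, pivots):
    """Count, for each facet, how many tokens of the span are pivot words."""
    tokens = tokenize(text)
    return {facet: sum(t in words for t in tokens)
            for facet, words in pivots.items()}


def classify(text, pivots):
    """Assign the facet with the highest pivot-word count (toy decision rule)."""
    feats = pivot_features(text, pivots)
    return max(sorted(feats), key=lambda facet: feats[facet])


# Toy labeled spans; the facet labels here are illustrative only.
TOY_SPANS = [
    ("We train a classifier using a regression algorithm.", "method"),
    ("The algorithm uses features extracted from text.", "method"),
    ("Accuracy improved over the baseline.", "results"),
    ("The results show higher accuracy and precision.", "results"),
]

if __name__ == "__main__":
    pivots = select_pivot_words(TOY_SPANS, top_k=3)
    print(pivots)
    print(classify("Higher accuracy than the baseline.", pivots))
```

In a realistic setting, the per-facet counts would more plausibly feed a trained classifier or regressor alongside other features rather than a simple arg-max rule; the sketch only illustrates how discriminating words can be turned into facet-specific features.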

Exploiting pivot words to classify and summarize discourse facets of scientific papers / La Quatra, M.; Cagliero, L.; Baralis, E. - In: SCIENTOMETRICS. - ISSN 0138-9130. - Print. - (2020), pp. 1-19. [10.1007/s11192-020-03532-3]

Year of publication: 2020
DOI: https://dx.doi.org/10.1007/s11192-020-03532-3
Journal title: SCIENTOMETRICS
Publication type: 1.1 Journal article

Files in this record:

File	Description	Type	License	Availability	Size	Format
post_print_SCIM_2020.pdf	author's post-print, post-ref	2. Post-print / Author's Accepted Manuscript	Public - All rights reserved	Open Access since 14/06/2021	1.95 MB	Adobe PDF
LaQuatra2020_Article_ExploitingPivotWordsToClassify.pdf	-	2a Post-print, publisher's version / Version of Record	Non-public - Private/restricted access	Not available	1.23 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2837187

PORTO @ Archivio Istituzionale della Ricerca
