Semi-automatic knowledge extraction to enrich open linked data / Baralis, Elena Maria; Bruno, Giulia; Cerquitelli, Tania; Chiusano, Silvia Anna; Fiori, Alessandro; Grand, Alberto. - In: Cases on Open-Linked Data and Semantic Web Applications / Patricia Ordoñez de Pablos, Miltiadis D. Lytras, Robert Tennyson, Jose Emilio Labra Gayo. - Print. - [s.l] : IGI Global, 2013. - ISBN 9781466628274. - pp. 156-180 [10.4018/978-1-4666-2827-4.ch008]
Semi-automatic knowledge extraction to enrich open linked data
Baralis, Elena Maria; Bruno, Giulia; Cerquitelli, Tania; Chiusano, Silvia Anna; Fiori, Alessandro; Grand, Alberto
2013
Abstract
In this chapter we present the analysis of the Wikipedia collection by means of the ELiDa framework, with the aim of enriching linked data. ELiDa is based on association rule mining, an exploratory technique to discover relevant correlations hidden in the analyzed data. A persistent structure is used to compactly store the large volume of extracted knowledge and to retrieve it efficiently for further analysis. The domain expert is in charge of selecting the relevant knowledge by setting filtering parameters, assessing the quality of the extracted knowledge, and enriching the knowledge with the semantic expressiveness that cannot be automatically inferred. We consider, as representative document collections, seven datasets extracted from the Wikipedia collection. Each dataset has been analyzed from two points of view (i.e., transactions by documents, transactions by sentences) to highlight relevant knowledge at different levels of abstraction.
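To make the two transaction views concrete, here is a minimal sketch of association rule mining over a toy collection, once with one transaction per document and once with one transaction per sentence. It is not the ELiDa implementation, which this record does not describe. The function names, the toy documents, the naive tokenization, and the use of minimum support and minimum confidence as filtering parameters are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def transactions_by_document(docs):
    """One transaction per document: the set of distinct lowercase terms."""
    return [set(doc.lower().replace(".", " ").split()) for doc in docs]


def transactions_by_sentence(docs):
    """One transaction per sentence, splitting naively on full stops."""
    result = []
    for doc in docs:
        for sentence in doc.split("."):
            terms = set(sentence.lower().split())
            if terms:  # skip empty fragments
                result.append(terms)
    return result


def pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for 1-item -> 1-item rules."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        item_counts.update(t)
        pair_counts.update(combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # estimate of P(y | x)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


# Hypothetical toy collection (not taken from the Wikipedia datasets).
docs = ["turin italy. turin university", "rome italy. rome capital"]
doc_rules = pair_rules(transactions_by_document(docs), 0.5, 1.0)
sent_rules = pair_rules(transactions_by_sentence(docs), 0.25, 1.0)
```

In this toy setting, the rule from "turin" to "italy" holds with full confidence at the document level but not at the sentence level, where "turin" also appears in a sentence without "italy". This is one way the two granularities can surface different knowledge.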
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2502974