Due to the rapid growth of multimedia data and the diffusion of remote and mixed learning, teaching sessions are becoming more and more multi-modal. To deepen the knowledge of specific topics, learners can be interested in retrieving educational videos that complement the textual content of teaching books. However, retrieving educational videos can be particularly challenging when there is a lack of metadata information. To tackle the aforesaid issue, this paper explores the joint use of Deep Learning and Natural Language Processing techniques to retrieve cross-media educational resources (i.e., from text snippets to videos and vice versa). It applies NLP techniques to both the audio transcript of the videos and to the text snippets in the books in order to quantify the semantic relationships between pairs of educational resources of different media types. Then, it trains a Deep Learning model on top of the NLP-based features. The probabilities returned by the Deep Learning model are used to rank the candidate resources based on their relevance to a given query. The results achieved on a real collection of educational multimodal data show that the proposed approach performs better than state-of-the-art solutions. Furthermore, a preliminary attempt to apply the same approach to address a similar retrieval task (i.e., from text to image and vice versa) has shown promising results.

From teaching books to educational videos and vice versa: a cross-media content retrieval experience(2021), pp. 115-120. ((Intervento presentato al convegno COMPSAC nel 12-16 July 2021 [10.1109/COMPSAC51774.2021.00027].

From teaching books to educational videos and vice versa: a cross-media content retrieval experience

Canale, Lorenzo;Farinetti, Laura;Cagliero, Luca
2021

Abstract

Due to the rapid growth of multimedia data and the diffusion of remote and mixed learning, teaching sessions are becoming more and more multi-modal. To deepen the knowledge of specific topics, learners can be interested in retrieving educational videos that complement the textual content of teaching books. However, retrieving educational videos can be particularly challenging when there is a lack of metadata information. To tackle the aforesaid issue, this paper explores the joint use of Deep Learning and Natural Language Processing techniques to retrieve cross-media educational resources (i.e., from text snippets to videos and vice versa). It applies NLP techniques to both the audio transcript of the videos and to the text snippets in the books in order to quantify the semantic relationships between pairs of educational resources of different media types. Then, it trains a Deep Learning model on top of the NLP-based features. The probabilities returned by the Deep Learning model are used to rank the candidate resources based on their relevance to a given query. The results achieved on a real collection of educational multimodal data show that the proposed approach performs better than state-of-the-art solutions. Furthermore, a preliminary attempt to apply the same approach to address a similar retrieval task (i.e., from text to image and vice versa) has shown promising results.
978-1-6654-2463-9
File in questo prodotto:
File Dimensione Formato  
From_teaching_books_to_educational_videos_and_vice_versa_a_cross-media_content_retrieval_experience.pdf

non disponibili

Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Non Pubblico - Accesso privato/ristretto
Dimensione 1.94 MB
Formato Adobe PDF
1.94 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
IEEECOMPSAC2021_YouTubeVideoRetrieval(1).pdf

accesso aperto

Tipologia: 2. Post-print / Author's Accepted Manuscript
Licenza: PUBBLICO - Tutti i diritti riservati
Dimensione 606.34 kB
Formato Adobe PDF
606.34 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

Caricamento pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2928056