Exploring the Adaptability of Large Speech Models to Non-Verbal Vocalization Task / Márquez Villacis, Juan José; D'Asaro, Federico; Rizzo, Giuseppe; Bottino, Andrea. - (In press). (Paper presented at CLiC-it 2025 – Eleventh Italian Conference on Computational Linguistics, held in Cagliari (ITA), September 24-26, 2025).
Exploring the Adaptability of Large Speech Models to Non-Verbal Vocalization Task
Márquez Villacis, Juan José; D'Asaro, Federico; Rizzo, Giuseppe; Bottino, Andrea
In press
Abstract
Large Speech Models (LSMs), pre-trained on extensive speech corpora, have recently emerged as powerful foundations in the audio processing field, demonstrating strong transfer capabilities to downstream tasks such as speaker identification and emotion recognition. However, while these models excel on speech-centric tasks, limited research has investigated their adaptability to Non-Verbal Vocalization (NVV) tasks, which involve vocal bursts like laughter, sighs, shrieks, and moans. In this work, we examine how well LSMs, specifically Wav2Vec 2.0, HuBERT, WavLM, and Whisper, can be adapted to NVV tasks. We conduct experiments using both linear probing to evaluate the pre-trained knowledge relevant to NVVs, and Parameter-Efficient Fine-Tuning (PEFT) techniques, including LoRA, Adapters, and Prompt Tuning. Experimental results on several NVV datasets (ASVP-ESD, CNVVE, Non-Verbal Vocalization Dataset, ReCANVo, VIVAE) indicate that Whisper-based models consistently achieve superior performance, which is further enhanced through the application of LoRA. Additionally, our layer-wise analysis reveals that applying PEFT specifically to layers with lower NVV information is key to effective model adaptation, providing valuable insights for optimizing fine-tuning strategies in future work.
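As a rough illustration of the fine-tuning setup the abstract describes, the sketch below applies LoRA to a Whisper encoder and attaches a linear classification head for NVV labels, using the Hugging Face `transformers` and `peft` libraries. The checkpoint name, LoRA rank, target modules, and class count are illustrative assumptions, not the paper's reported configuration.

```python
import torch.nn as nn
from transformers import WhisperFeatureExtractor, WhisperModel
from peft import LoraConfig, get_peft_model

# Hypothetical configuration: checkpoint, LoRA rank, and class count are
# illustrative choices, not the paper's reported setup.
MODEL_NAME = "openai/whisper-base"
NUM_CLASSES = 6  # e.g. laughter, sigh, shriek, moan, ...

feature_extractor = WhisperFeatureExtractor.from_pretrained(MODEL_NAME)
encoder = WhisperModel.from_pretrained(MODEL_NAME).encoder
hidden_size = encoder.config.hidden_size  # d_model of the encoder

# Inject LoRA adapters into the attention projections; choosing *which*
# encoder layers to adapt is what the paper's layer-wise analysis probes.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],
)
encoder = get_peft_model(encoder, lora_config)  # freezes base weights

class NVVClassifier(nn.Module):
    """LoRA-adapted Whisper encoder + mean pooling + linear head."""
    def __init__(self, encoder, hidden_size, num_classes):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, input_features):
        hidden = self.encoder(input_features).last_hidden_state
        pooled = hidden.mean(dim=1)  # average over time frames
        return self.head(pooled)

model = NVVClassifier(encoder, hidden_size, NUM_CLASSES)

# Usage: log-mel features in, class logits out.
# inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
# logits = model(inputs.input_features)
```

With this setup only the LoRA matrices and the linear head are trainable, which is what makes the approach parameter-efficient; restricting `target_modules` to specific encoder layers would mirror the layer-wise adaptation the paper investigates.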
| File | Type | License | Size | Format |
|---|---|---|---|---|
| CLIC_it_2025_NVV.pdf (open access) | 2. Post-print / Author's Accepted Manuscript | Creative Commons | 837.36 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3002059