E-MIMIC: Empowering Multilingual Inclusive Communication / Attanasio, Giuseppe; Greco, Salvatore; La Quatra, Moreno; Cagliero, Luca; Tonti, Michela; Cerquitelli, Tania; Raus, Rachele. - ELECTRONIC. - (2021), pp. 4227-4234. (Paper presented at the First International Workshop on Data science for equality, inclusion and well-being challenges, held virtually online, 15-18 December 2021) [10.1109/BigData52589.2021.9671868].

E-MIMIC: Empowering Multilingual Inclusive Communication

Giuseppe Attanasio; Salvatore Greco; Moreno La Quatra; Luca Cagliero; Michela Tonti; Tania Cerquitelli; Rachele Raus
2021

Abstract

Preserving diversity and inclusion is becoming a compelling need in both industry and academia. The ability to use appropriate forms of writing, speaking, and gestures is not widespread, even in formal communications such as public calls, public announcements, official reports, and legal documents. The improper use of linguistic expressions can foment unacceptable forms of exclusion, stereotypes, and verbal violence against minorities, including women. Furthermore, existing machine translation tools are not designed to generate inclusive content. The present paper investigates a joint effort by the linguistics and Deep Learning Natural Language Understanding research communities to fight non-inclusive, prejudiced language forms. It presents a methodology for tackling the improper use of language in formal communication, with particular attention paid to Romance languages, especially Italian. State-of-the-art Deep Language Modeling architectures are exploited to automatically identify non-inclusive text snippets, suggest alternative forms, and produce inclusive text rephrasing. A preliminary evaluation conducted on a benchmark dataset shows promising results, namely 85% accuracy in classifying communications as inclusive or non-inclusive.

Files in this record:

E_MIMIC.pdf (open access)
Description: Post-print version of the manuscript
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 209.04 kB
Format: Adobe PDF

E-MIMIC_Empowering_Multilingual_Inclusive_Communication.pdf (restricted access)
Type: 2a. Post-print publisher's version / Version of Record
License: Non-public - Private/restricted access
Size: 769.08 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2946252