PoliTeam@AMI: Improving Sentence Embedding Similarity with Misogyny Lexicons for Automatic Misogyny Identification in Italian Tweets / Attanasio, Giuseppe; Pastor, Eliana. - Electronic. - 2765 (2020). (Paper presented at the Evaluation Campaign of Natural Language Processing and Speech tools for Italian, held online on 17 December 2020.)

PoliTeam@AMI: Improving Sentence Embedding Similarity with Misogyny Lexicons for Automatic Misogyny Identification in Italian Tweets

Attanasio, Giuseppe;Pastor, Eliana
2020

Abstract

We present a multi-agent classification solution for identifying misogynous and aggressive content in Italian tweets. A first agent encodes tweets with modern sentence embedding techniques and uses an SVM classifier to produce initial labels. A second agent, based on TF-IDF and Italian misogyny lexicons, is used alongside it to revise the first agent's uncertain predictions. We evaluate our approach in the Automatic Misogyny Identification Shared Task of the EVALITA 2020 campaign. Results show that TF-IDF and lexicons effectively improve the supervised agent trained on sentence embeddings.
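
The abstract does not say how uncertainty is measured or how the two agents are combined. As a minimal sketch only, the following assumes that the first agent exposes a signed SVM-style margin, that a prediction counts as uncertain when the absolute margin falls below a threshold, and that the second agent is a plain lexicon lookup (the TF-IDF component is omitted for brevity). The names `lexicon_agent`, `combine`, `threshold`, and the placeholder lexicon entries are hypothetical and do not come from the paper.

```python
from typing import Callable, Iterable

# Hypothetical placeholder terms; the actual Italian misogyny lexicons
# are not listed in this record.
PLACEHOLDER_LEXICON = {"insulto_a", "insulto_b"}


def lexicon_agent(tweet: str, lexicon: Iterable[str] = PLACEHOLDER_LEXICON) -> int:
    """Second agent (assumed form): return 1 if any lexicon term occurs in the tweet."""
    tokens = set(tweet.lower().split())
    return int(any(term in tokens for term in lexicon))


def combine(
    margin: float,
    tweet: str,
    threshold: float = 0.5,  # assumed uncertainty cut-off, not from the paper
    second_agent: Callable[[str], int] = lexicon_agent,
) -> int:
    """Keep the first agent's label when its margin is confident,
    otherwise defer to the second agent."""
    if abs(margin) >= threshold:
        return int(margin > 0)
    return second_agent(tweet)
```

For example, `combine(0.1, "insulto_a qui")` defers to the lexicon agent because the margin is below the assumed threshold, while `combine(1.2, "qualunque testo")` keeps the first agent's positive label.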

Files in this item:

PoliTeam_AMI2020_Attanasio_Pastor_Preprint.pdf
Access: open access
Description: Preprint
Type: 1. Preprint / submitted version [pre-review]
License: Public - All rights reserved
Size: 124.55 kB
Format: Adobe PDF

EVALITA2020_paper142.pdf
Access: open access
Description: Main article
Type: 2a. Post-print, publisher's version / Version of Record
License: Creative Commons
Size: 253.66 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2854132