NLPGuard: A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers

Greco, Salvatore; Zhou, Ke; Capra, Licia; Cerquitelli, Tania; Quercia, Daniele

doi:10.1145/3686924

AI regulations are expected to prohibit machine learning models from using sensitive attributes during training. However, the latest Natural Language Processing (NLP) classifiers, which rely on deep learning, operate as black-box systems, complicating the detection and remediation of such misuse. Traditional bias mitigation methods in NLP aim for comparable performance across different groups based on attributes like gender or race but fail to address the underlying issue of reliance on protected attributes. To partly fix that, we introduce NLPGuard, a framework for mitigating the reliance on protected attributes in NLP classifiers. NLPGuard takes an unlabeled dataset, an existing NLP classifier, and its training data as input, producing a modified training dataset that significantly reduces dependence on protected attributes without compromising accuracy. NLPGuard is applied to three classification tasks: identifying toxic language, sentiment analysis, and occupation classification. Our evaluation shows that current NLP classifiers heavily depend on protected attributes, with up to 23% of the most predictive words associated with these attributes. However, NLPGuard effectively reduces this reliance by up to 79%, while slightly improving accuracy.
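The input/output contract described in the abstract (training data in, modified training data out, with reliance measured as the share of the most predictive words tied to protected attributes) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: this record does not describe how NLPGuard finds predictive words, how it decides that a word is protected, or how it modifies the training data. The word list `PROTECTED_WORDS` and the functions `protected_share`, `remove_protected`, and `mitigate` are hypothetical names, and plain word removal is a stand-in technique rather than the authors' method.

```python
import string

# Hypothetical word list, for illustration only; not taken from the paper.
PROTECTED_WORDS = {"she", "he", "woman", "man"}


def protected_share(top_words, protected_words=PROTECTED_WORDS):
    """Fraction of a classifier's most predictive words that are protected.

    `top_words` is assumed to come from some explainability step that is
    outside the scope of this sketch.
    """
    if not top_words:
        return 0.0
    flagged = sum(1 for word in top_words if word.lower() in protected_words)
    return flagged / len(top_words)


def remove_protected(text, protected_words=PROTECTED_WORDS):
    """Drop whitespace-separated tokens that match a protected word.

    Surrounding punctuation is ignored when matching, so "She," matches "she".
    """
    kept = [
        token
        for token in text.split()
        if token.strip(string.punctuation).lower() not in protected_words
    ]
    return " ".join(kept)


def mitigate(training_texts, protected_words=PROTECTED_WORDS):
    """Return a modified copy of the training texts (assumed strategy: removal)."""
    return [remove_protected(text, protected_words) for text in training_texts]
```

In this sketch, a classifier would then be retrained on the output of `mitigate`, and `protected_share` would be recomputed on the retrained model's most predictive words to check whether the reliance has dropped.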

NLPGuard: A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers / Greco, Salvatore; Zhou, Ke; Capra, Licia; Cerquitelli, Tania; Quercia, Daniele. - 8:CSCW2(2024), pp. 1-25. [10.1145/3686924]

Year	2024
DOI	https://dx.doi.org/10.1145/3686924
Journal title	PROCEEDINGS OF THE ACM ON HUMAN-COMPUTER INTERACTION
Publication type	1.1 Journal article

Files in this item:

File	Access	Type	License	Size	Format
NLPGuard- A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers.pdf	Restricted access	2a Post-print, publisher's version / Version of Record	Non-public: private/restricted access	1.53 MB	Adobe PDF
2407.01697v1.pdf	Open access	2. Post-print / Author's Accepted Manuscript	Public: all rights reserved	1.69 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2990894

PORTO @ Archivio Istituzionale della Ricerca
