A data quality approach to the identification of discrimination risk in automated decision making systems / Vetrò, Antonio; Torchiano, Marco; Mecati, Mariachiara. - In: GOVERNMENT INFORMATION QUARTERLY. - ISSN 0740-624X. - Print. - 38:4(2021). [10.1016/j.giq.2021.101619]

A data quality approach to the identification of discrimination risk in automated decision making systems

Antonio Vetrò; Marco Torchiano; Mariachiara Mecati
2021

Abstract

Automated decision-making (ADM) systems may affect multiple aspects of our lives. In particular, they can result in systematic discrimination against specific population groups, in violation of the EU Charter of Fundamental Rights. One potential cause of discriminatory behavior, i.e., unfairness, lies in the quality of the data used to train such ADM systems. Using a data quality measurement approach combined with risk management, both defined in ISO standards, we focus on balance characteristics and aim to understand how well balance indexes (Gini, Simpson, Shannon, Imbalance Ratio) identify discrimination risk in six large datasets containing the classification output of ADM systems. The best result is achieved with the Imbalance Ratio index. The Gini and Shannon indexes tend to assume high values and therefore yield only modest results in both aspects; further experimentation with different thresholds is needed. In terms of policy, the risk-based approach is a core element of the EU approach to regulating algorithmic systems: in this context, balance measures can readily be adopted as risk indicators of the propagation, or even amplification, of bias in the input data of ADM systems.
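
As an illustration of the four balance indexes named in the abstract, the following minimal Python sketch computes them for a single categorical attribute. It is not taken from the paper. The formulas are commonly used normalized forms (each ranging from 0 for maximal imbalance to 1 for perfect balance) and are an assumption here, as are the function name `balance_indexes` and the choice to count only the categories actually observed in the data.

```python
from collections import Counter
from math import log


def balance_indexes(values):
    """Return normalized balance indexes for one categorical attribute.

    Assumption: m is the number of categories observed in `values`;
    a category that never occurs is not counted.
    """
    counts = Counter(values)
    n = sum(counts.values())
    m = len(counts)
    if m < 2:
        raise ValueError("need at least two observed categories")

    # Relative frequency of each observed category.
    freqs = [c / n for c in counts.values()]
    sum_sq = sum(f * f for f in freqs)

    return {
        # Gini heterogeneity, rescaled to [0, 1].
        "gini": m / (m - 1) * (1 - sum_sq),
        # Shannon entropy divided by its maximum, ln(m).
        "shannon": -sum(f * log(f) for f in freqs) / log(m),
        # Simpson diversity (inverse concentration), rescaled to [0, 1].
        "simpson": (1 / sum_sq - 1) / (m - 1),
        # Ratio of the rarest to the most frequent category.
        "imbalance_ratio": min(freqs) / max(freqs),
    }


if __name__ == "__main__":
    # Hypothetical attribute with a 90/10 split between two groups.
    sample = ["a"] * 90 + ["b"] * 10
    for name, value in balance_indexes(sample).items():
        print(f"{name}: {value:.3f}")
```

For the hypothetical 90/10 split, Gini (about 0.36) and Shannon (about 0.47) come out well above Simpson (about 0.22) and Imbalance Ratio (about 0.11). Under these assumed definitions, this is in line with the abstract's remark that Gini and Shannon tend to assume high values.
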

Files in this record:

Articolo_balance_GIQ___Revisione_1.pdf
Open access from 05/09/2023
Description: Post-print version
Type: 2. Post-print / Author's Accepted Manuscript
License: Creative Commons
Size: 975.12 kB
Format: Adobe PDF

1-s2.0-S0740624X21000551-main.pdf
Not available
Type: 2a. Post-print, publisher's version / Version of Record
License: Not public - private/restricted access
Size: 2.62 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2922214