Experience: Bridging Data Measurement and Ethical Challenges with Extended Data Briefs / Rondina, Marco; Vetro', Antonio; Fabris, Alessandro; Silvello, Gianmaria; Susto, Gian Antonio; Torchiano, Marco; De Martin, Juan Carlos. - In: ACM JOURNAL OF DATA AND INFORMATION QUALITY. - ISSN 1936-1955. - (2025). [10.1145/3726872]

Experience: Bridging Data Measurement and Ethical Challenges with Extended Data Briefs

Rondina, Marco; Vetro', Antonio; Torchiano, Marco; De Martin, Juan Carlos
2025

Abstract

To promote the responsible development and use of data-driven technologies, such as machine learning and artificial intelligence, principles of trustworthiness, accountability and fairness should be followed. The quality of the datasets on which these applications rely is crucial to achieving compliance with the required ethical principles. Quantitative approaches to measuring data quality are abundant in the literature and among practitioners; however, they are not sufficient to cover all the principles and ethical challenges involved. In this paper, we show that complementing data quality with measurable dimensions of data documentation and data balance helps to cover a wider range of ethical challenges connected to the use of datasets in algorithms. A summary report of the metrics applied (the Extended Data Brief) and a set of Risk Labels for the Ethical Challenges provide a practical overview of the potential ethical harms due to data composition. We believe that the proposed data labelling scheme will enable practitioners to improve the overall quality of datasets and to build more responsible data-driven software systems.
Files in this item:
File: JDIQ_ExperienceExtendedDataBrief_JustAccepted.pdf
Access: open access
Type: 2. Post-print / Author's Accepted Manuscript
License: Creative Commons
Size: 384.59 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2999294