Experience: Bridging Data Measurement and Ethical Challenges with Extended Data Briefs / Rondina, Marco; Vetro', Antonio; Fabris, Alessandro; Silvello, Gianmaria; Susto, Gian Antonio; Torchiano, Marco; De Martin, Juan Carlos. - In: ACM JOURNAL OF DATA AND INFORMATION QUALITY. - ISSN 1936-1955. - (2025). [10.1145/3726872]
Experience: Bridging Data Measurement and Ethical Challenges with Extended Data Briefs
Rondina, Marco; Vetro', Antonio; Torchiano, Marco; De Martin, Juan Carlos
2025
Abstract
To promote the responsible development and use of data-driven technologies, such as machine learning and artificial intelligence, principles of trustworthiness, accountability and fairness should be followed. The quality of the dataset on which these applications rely is crucial to achieving compliance with the required ethical principles. Quantitative approaches to measuring data quality are abundant in the literature and among practitioners; however, they are not sufficient to cover all the principles and ethical challenges involved. In this paper, we show that complementing data quality with measurable dimensions of data documentation and of data balance helps to cover a wider range of ethical challenges connected to the use of datasets in algorithms. A synthetic report of the metrics applied (the Extended Data Brief) and a set of Risk Labels for the Ethical Challenges provide a practical overview of the potential ethical harms due to data composition. We believe that the proposed data labelling scheme will enable practitioners to improve the overall quality of datasets and to build more responsible data-driven software systems.

File | Size | Format |
---|---|---|
JDIQ_ExperienceExtendedDataBrief_JustAccepted.pdf (open access; Type: 2. Post-print / Author's Accepted Manuscript; License: Creative Commons) | 384.59 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2999294