A Hierarchical Approach to Anomalous Subgroup Discovery / Pastor, Eliana; Baralis, Elena; de Alfaro, Luca. - (2023), pp. 2647-2659. (Paper presented at the 39th IEEE International Conference on Data Engineering (ICDE 2023), held in Anaheim, California (USA), April 3–7, 2023) [10.1109/ICDE55515.2023.00203].

A Hierarchical Approach to Anomalous Subgroup Discovery

Pastor, Eliana; Baralis, Elena; de Alfaro, Luca
2023

Abstract

Understanding the peculiar and anomalous behavior of machine learning models on specific data subgroups is a fundamental building block of model performance and fairness evaluation. Analyzing these subgroups can provide useful insights into a model's inner workings and highlight potentially discriminatory behavior. Current approaches to subgroup exploration ignore the presence of hierarchies in the data and can only be applied to discretized attributes. The discretization required for continuous attributes may significantly affect which relevant subgroups are identified. We propose a hierarchical subgroup exploration technique that identifies anomalous subgroup behavior at multiple granularity levels, along with a technique for the hierarchical discretization of data attributes. The hierarchical discretization produces, for each continuous attribute, a hierarchy of intervals. The subsequent hierarchical exploration exploits these data hierarchies, selecting for each attribute the optimal granularity to identify subgroups that are both anomalous and large enough to be statistically and practically significant. We show that, compared to nonhierarchical approaches, our hierarchical approach is more powerful in identifying anomalous subgroups and more stable with respect to discretization and exploration parameters.
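
The idea of a hierarchy of intervals for a continuous attribute can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state the split criterion, so the sketch assumes a simple median split, a hypothetical `min_count` threshold standing in for "enough elements", and divergence measured as the subgroup error rate minus the overall error rate. All function names are illustrative.

```python
from statistics import median


def build_interval_hierarchy(values, min_count, lo=float("-inf"), hi=float("inf")):
    """Recursively split [lo, hi) at the median (an assumed criterion).

    A node is split only if both children keep at least `min_count` elements.
    """
    inside = [v for v in values if lo <= v < hi]
    node = {"lo": lo, "hi": hi, "count": len(inside), "children": []}
    if len(inside) < 2 * min_count:
        return node
    cut = median(inside)
    left = [v for v in inside if v < cut]
    right = [v for v in inside if v >= cut]
    if len(left) < min_count or len(right) < min_count:
        return node
    node["children"] = [
        build_interval_hierarchy(inside, min_count, lo, cut),
        build_interval_hierarchy(inside, min_count, cut, hi),
    ]
    return node


def iter_intervals(node):
    """Yield every interval in the hierarchy, coarse levels first."""
    yield node["lo"], node["hi"]
    for child in node["children"]:
        yield from iter_intervals(child)


def interval_divergences(values, errors, hierarchy, min_count):
    """Rank intervals by |subgroup error rate - overall error rate|."""
    overall = sum(errors) / len(errors)
    results = []
    for lo, hi in iter_intervals(hierarchy):
        member_errors = [e for v, e in zip(values, errors) if lo <= v < hi]
        if len(member_errors) >= min_count:
            rate = sum(member_errors) / len(member_errors)
            results.append((lo, hi, len(member_errors), rate - overall))
    return sorted(results, key=lambda r: abs(r[3]), reverse=True)


# Toy data (made up for illustration): an attribute and per-instance model errors.
ages = [20, 25, 30, 35, 40, 45, 50, 55]
errors = [0, 0, 0, 0, 0, 0, 1, 1]

hierarchy = build_interval_hierarchy(ages, min_count=2)
for lo, hi, count, divergence in interval_divergences(ages, errors, hierarchy, min_count=2):
    print(f"[{lo}, {hi})  n={count}  divergence={divergence:+.2f}")
```

On this toy data the coarse interval covering the upper half of the ages is only mildly divergent, while a finer interval nested inside it isolates all the errors. This is the kind of effect that exploring several granularity levels is meant to expose.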
ISBN: 979-8-3503-2227-9
Files in this record:

ICDE_H-DivEplorer_Hierarchiacal_approach.pdf (open access)
Description: Camera-ready paper, author version
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 483.19 kB
Format: Adobe PDF

A_Hierarchical_Approach_to_Anomalous_Subgroup_Discovery.pdf (restricted access)
Type: 2a. Post-print, publisher's version / Version of Record
License: Non-public - Private/restricted access
Size: 1.12 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2976779