A Hierarchical Approach to Anomalous Subgroup Discovery / Pastor, Eliana; Baralis, Elena; de Alfaro, Luca. - (2023), pp. 2647-2659. (Paper presented at the 39th IEEE International Conference on Data Engineering (ICDE 2023), held in Anaheim, California (USA), April 3–7, 2023) [10.1109/ICDE55515.2023.00203].
A Hierarchical Approach to Anomalous Subgroup Discovery
Pastor, Eliana; Baralis, Elena; de Alfaro, Luca
2023
Abstract
Understanding peculiar and anomalous behavior of machine learning models for specific data subgroups is a fundamental building block of model performance and fairness evaluation. The analysis of these data subgroups can provide useful insights into a model's inner workings and highlight potentially discriminatory behavior. Current approaches to subgroup exploration ignore the presence of hierarchies in the data and can only be applied to discretized attributes. The discretization process required for continuous attributes may significantly affect the identification of relevant subgroups. We propose a hierarchical subgroup exploration technique to identify anomalous subgroup behavior at multiple granularity levels, along with a technique for the hierarchical discretization of data attributes. The hierarchical discretization produces, for each continuous attribute, a hierarchy of intervals. The subsequent hierarchical exploration can exploit data hierarchies, selecting for each attribute the optimal granularity to identify subgroups that are both anomalous and contain enough elements to be statistically and practically significant. Compared to nonhierarchical approaches, we show that our hierarchical approach is more powerful in identifying anomalous subgroups and more stable with respect to discretization and exploration parameters.

File | Access | Description | Type | License | Size | Format |
---|---|---|---|---|---|---|
ICDE_H-DivEplorer_Hierarchiacal_approach.pdf | Open access | Camera-ready paper, author version | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | 483.19 kB | Adobe PDF |
A_Hierarchical_Approach_to_Anomalous_Subgroup_Discovery.pdf | Restricted access | | 2a. Post-print publisher's version / Version of Record | Non-public - Private/restricted access | 1.12 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2976779