Machine learning models may perform differently on different data subgroups, which we represent as itemsets (i.e., conjunctions of simple predicates). The identification of these critical data subgroups plays an important role in many applications, for example model validation and testing, or evaluation of model fairness. Typically, domain expert help is required to identify relevant (or sensitive) subgroups. We propose the notion of divergence over itemsets as a measure of different classification behavior on data subgroups, and the use of frequent pattern mining techniques for their identification. A quantification of the contribution of different attribute values to divergence, based on the mathematical foundations provided by Shapley values, allows us to identify both critical and peculiar behaviors of attributes. Extended experiments show the effectiveness of the approach in identifying critical subgroup behaviors.
Looking for Trouble: Analyzing Classifier Behavior via Pattern Divergence / Pastor, Eliana; de Alfaro, Luca; Baralis, Elena. - ELETTRONICO. - (2021), pp. 1400-1412. (Intervento presentato al convegno SIGMOD/PODS '21: International Conference on Management of Data tenutosi a Virtual Event, China nel June 20–25, 2021) [10.1145/3448016.3457284].
Looking for Trouble: Analyzing Classifier Behavior via Pattern Divergence
Pastor, Eliana;Baralis, Elena
2021
Abstract
Machine learning models may perform differently on different data subgroups, which we represent as itemsets (i.e., conjunctions of simple predicates). The identification of these critical data subgroups plays an important role in many applications, for example model validation and testing, or evaluation of model fairness. Typically, domain expert help is required to identify relevant (or sensitive) subgroups. We propose the notion of divergence over itemsets as a measure of different classification behavior on data subgroups, and the use of frequent pattern mining techniques for their identification. A quantification of the contribution of different attribute values to divergence, based on the mathematical foundations provided by Shapley values, allows us to identify both critical and peculiar behaviors of attributes. Extended experiments show the effectiveness of the approach in identifying critical subgroup behaviors.File | Dimensione | Formato | |
---|---|---|---|
DivExplorer_SIGMOD_2021_Pastor_deAlfaro_Baralis.pdf
accesso aperto
Descrizione: Articolo principale (postprint referato)
Tipologia:
2. Post-print / Author's Accepted Manuscript
Licenza:
PUBBLICO - Tutti i diritti riservati
Dimensione
1.01 MB
Formato
Adobe PDF
|
1.01 MB | Adobe PDF | Visualizza/Apri |
3448016.3457284.pdf
non disponibili
Tipologia:
2a Post-print versione editoriale / Version of Record
Licenza:
Non Pubblico - Accesso privato/ristretto
Dimensione
1.58 MB
Formato
Adobe PDF
|
1.58 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/11583/2900694