Machine learning models may perform differently on different data subgroups, which we represent as itemsets (i.e., conjunctions of simple predicates). The identification of these critical data subgroups plays an important role in many applications, for example model validation and testing, or evaluation of model fairness. Typically, domain expert help is required to identify relevant (or sensitive) subgroups. We propose the notion of divergence over itemsets as a measure of different classification behavior on data subgroups, and the use of frequent pattern mining techniques for their identification. A quantification of the contribution of different attribute values to divergence, based on the mathematical foundations provided by Shapley values, allows us to identify both critical and peculiar behaviors of attributes. Extended experiments show the effectiveness of the approach in identifying critical subgroup behaviors.

Looking for Trouble: Analyzing Classifier Behavior via Pattern Divergence / Pastor, Eliana; de Alfaro, Luca; Baralis, Elena. - ELETTRONICO. - (2021), pp. 1400-1412. (Intervento presentato al convegno SIGMOD/PODS '21: International Conference on Management of Data tenutosi a Virtual Event, China nel June 20–25, 2021) [10.1145/3448016.3457284].

Looking for Trouble: Analyzing Classifier Behavior via Pattern Divergence

Pastor, Eliana;Baralis, Elena
2021

Abstract

Machine learning models may perform differently on different data subgroups, which we represent as itemsets (i.e., conjunctions of simple predicates). The identification of these critical data subgroups plays an important role in many applications, for example model validation and testing, or evaluation of model fairness. Typically, domain expert help is required to identify relevant (or sensitive) subgroups. We propose the notion of divergence over itemsets as a measure of different classification behavior on data subgroups, and the use of frequent pattern mining techniques for their identification. A quantification of the contribution of different attribute values to divergence, based on the mathematical foundations provided by Shapley values, allows us to identify both critical and peculiar behaviors of attributes. Extended experiments show the effectiveness of the approach in identifying critical subgroup behaviors.
2021
978-1-4503-8343-1
File in questo prodotto:
File Dimensione Formato  
DivExplorer_SIGMOD_2021_Pastor_deAlfaro_Baralis.pdf

accesso aperto

Descrizione: Articolo principale (postprint referato)
Tipologia: 2. Post-print / Author's Accepted Manuscript
Licenza: PUBBLICO - Tutti i diritti riservati
Dimensione 1.01 MB
Formato Adobe PDF
1.01 MB Adobe PDF Visualizza/Apri
3448016.3457284.pdf

non disponibili

Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Non Pubblico - Accesso privato/ristretto
Dimensione 1.58 MB
Formato Adobe PDF
1.58 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2900694