Evolutionary discovery of coresets for classification / Barbiero, Pietro; Squillero, Giovanni; Tonda, Alberto. - (2019), pp. 1747-1754. (Paper presented at the GECCO'19 conference, held in Prague (Czech Republic), July 13-17, 2019) [10.1145/3319619.3326846].

Evolutionary discovery of coresets for classification

Barbiero, Pietro; Squillero, Giovanni; Tonda, Alberto
2019

Abstract

When a machine learning algorithm obtains the same performance whether it is trained on a complete training set or on a small subset of samples drawn from it, the subset is termed a coreset. Because using a coreset speeds up training and, by reducing the number of samples to be examined, helps human experts better understand the data, coreset discovery is an active line of research. In the literature, the problem of coreset discovery is often framed as (i) single-objective, attempting to find the candidate coreset that best represents the training set, and (ii) independent of the machine learning algorithm used. This work presents an approach to evolutionary coreset discovery. Building on preliminary results, the proposed approach uses a multi-objective evolutionary algorithm to find compromises between two conflicting objectives: (i) minimizing the number of samples in a candidate coreset, and (ii) maximizing the accuracy of a target classifier, trained on the coreset, on the whole original training set. Experimental results on popular classification benchmarks show that the proposed approach identifies candidate coresets with better accuracy and generality than state-of-the-art coreset discovery algorithms from the literature.
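The two objectives described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`evaluate`, `dominates`, `pareto_front`, `evolve`) are hypothetical, a 1-nearest-neighbour rule stands in for the unspecified target classifier, and a plain mutate-and-filter loop stands in for the multi-objective evolutionary algorithm used in the paper. A candidate coreset is encoded as a tuple of booleans, one per training sample.

```python
import random

def nearest_label(point, samples, labels):
    """Label of the sample closest to `point` (1-NN, squared Euclidean distance)."""
    best = min(range(len(samples)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, samples[i])))
    return labels[best]

def evaluate(mask, X, y):
    """Return (coreset size, accuracy on the whole training set) for a candidate.

    Assumption: the target classifier is a 1-NN stand-in "trained" on the coreset.
    """
    core_X = [x for x, keep in zip(X, mask) if keep]
    core_y = [t for t, keep in zip(y, mask) if keep]
    if not core_X:
        return 0, 0.0  # an empty coreset cannot classify anything
    hits = sum(nearest_label(x, core_X, core_y) == t for x, t in zip(X, y))
    return len(core_X), hits / len(X)

def dominates(a, b):
    """True if a = (size, accuracy) is no worse than b on both objectives
    and strictly better on at least one (smaller size, higher accuracy)."""
    return a[0] <= b[0] and a[1] >= b[1] and a != b

def pareto_front(scored):
    """Keep the (mask, objectives) pairs that no other pair dominates."""
    return [c for c in scored if not any(dominates(o[1], c[1]) for o in scored)]

def evolve(X, y, generations=30, pop_size=20, seed=0):
    """Simplified loop: flip one bit per parent, keep non-dominated candidates."""
    rng = random.Random(seed)
    n = len(X)
    population = [tuple(rng.random() < 0.5 for _ in range(n))
                  for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for parent in population:
            i = rng.randrange(n)
            children.append(parent[:i] + (not parent[i],) + parent[i + 1:])
        # Deduplicate while preserving order, then filter by Pareto dominance.
        candidates = list(dict.fromkeys(population + children))
        scored = [(m, evaluate(m, X, y)) for m in candidates]
        population = [m for m, _ in pareto_front(scored)][:pop_size]
    return pareto_front([(m, evaluate(m, X, y)) for m in population])

if __name__ == "__main__":
    # Toy data: two well-separated clusters (illustrative only).
    X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    y = [0, 0, 0, 1, 1, 1]
    for mask, (size, acc) in evolve(X, y):
        print(size, round(acc, 2), mask)
```

Each entry of the returned front is one compromise between coreset size and accuracy; choosing among them is left to the user, which matches the multi-objective framing in the abstract.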
ISBN: 9781450367486
Files in this item:
File: 3319619.3326846.pdf (not available)
Type: 2a Post-print, publisher's version / Version of Record
License: Not public - private/restricted access
Size: 1.39 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2980287