When a machine learning algorithm is able to obtain the same performance given a complete training set, and a small subset of samples from the same training set, the subset is termed coreset. As using a coreset improves training speed and allows human experts to gain a better understanding of the data, by reducing the number of samples to be examined, coreset discovery is an active line of research. Often in literature the problem of coreset discovery is framed as i. single-objective, attempting to find the candidate coreset that best represents the training set, and ii. independent from the machine learning algorithm used. In this work, an approach to evolutionary coreset discovery is presented. Building on preliminary results, the proposed approach uses a multi-objective evolutionary algorithm to find compromises between two conflicting objectives, i. minimizing the number of samples in a candidate coreset, and ii. maximizing the accuracy of a target classifier, trained with the coreset, on the whole original training set. Experimental results on popular classification benchmarks show that the proposed approach is able to identify candidate coresets with better accuracy and generality than state-of-the-art coreset discovery algorithms found in literature.
Evolutionary discovery of coresets for classification / Barbiero, Pietro; Squillero, Giovanni; Tonda, Alberto. - (2019), pp. 1747-1754. (Intervento presentato al convegno GECCO'19 tenutosi a Prague (Czech Republic) nel July 13 - 17, 2019) [10.1145/3319619.3326846].
Evolutionary discovery of coresets for classification
Squillero, Giovanni;Tonda, Alberto
2019
Abstract
When a machine learning algorithm is able to obtain the same performance given a complete training set, and a small subset of samples from the same training set, the subset is termed coreset. As using a coreset improves training speed and allows human experts to gain a better understanding of the data, by reducing the number of samples to be examined, coreset discovery is an active line of research. Often in literature the problem of coreset discovery is framed as i. single-objective, attempting to find the candidate coreset that best represents the training set, and ii. independent from the machine learning algorithm used. In this work, an approach to evolutionary coreset discovery is presented. Building on preliminary results, the proposed approach uses a multi-objective evolutionary algorithm to find compromises between two conflicting objectives, i. minimizing the number of samples in a candidate coreset, and ii. maximizing the accuracy of a target classifier, trained with the coreset, on the whole original training set. Experimental results on popular classification benchmarks show that the proposed approach is able to identify candidate coresets with better accuracy and generality than state-of-the-art coreset discovery algorithms found in literature.File | Dimensione | Formato | |
---|---|---|---|
3319619.3326846.pdf
accesso riservato
Tipologia:
2a Post-print versione editoriale / Version of Record
Licenza:
Non Pubblico - Accesso privato/ristretto
Dimensione
1.39 MB
Formato
Adobe PDF
|
1.39 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/11583/2980287