Beyond coreset discovery: evolutionary archetypes / Barbiero, Pietro; Squillero, Giovanni; Tonda, Alberto. - (2019), pp. 47-48. (Paper presented at the GECCO'19 conference, held in Prague (Czech Republic), July 13-17, 2019) [10.1145/3319619.3326789].
Beyond coreset discovery: evolutionary archetypes
Barbiero, Pietro; Squillero, Giovanni; Tonda, Alberto
2019
Abstract
In machine learning, a coreset is a subset of the training set with which an algorithm achieves performance similar to what it would deliver if trained on the whole original dataset. Advantages of coresets include faster training and easier human interpretation. Coreset discovery remains an open line of research, as restricting the training set can also degrade the quality of the result. By contrast, virtual points, here called archetypes, might be far more informative for a machine learning algorithm. Building on this intuition, a novel evolutionary approach to archetype set discovery is presented: starting from a population seeded with candidate coresets, a multi-objective evolutionary algorithm modifies them and eventually creates archetype sets, minimizing both the number of points in the set and the classification error. Experimental results on popular benchmarks show that the proposed approach yields sets with which a classifier obtains lower error and better generalization to unseen data than with state-of-the-art coreset discovery techniques.
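The abstract describes the search only at a high level. The sketch that follows is a hedged illustration of its two ingredients (seeding with coresets, then evolving toward sets that trade off size against error), not the authors' implementation. It assumes a 1-nearest-neighbour classifier, error measured on the training data, a mutation that either drops a point or perturbs one with Gaussian noise, and survival by simple Pareto filtering as a stand-in for a full multi-objective EA such as NSGA-II. All function names (`objectives`, `dominates`, `mutate`, `pareto_front`, `evolve`) and parameters (`sigma`, the 0.2 drop probability) are hypothetical.

```python
# Hypothetical sketch, NOT the method from the paper: a toy two-objective
# search that starts from coresets and perturbs them into archetype sets.
import math
import random
from typing import Dict, List, Sequence, Tuple

# A labelled point: (feature vector, class label). Archetypes share this type,
# but their coordinates need not match any real training sample.
Point = Tuple[Tuple[float, ...], int]
Fitness = Tuple[int, float]


def nearest_label(x: Sequence[float], archetypes: List[Point]) -> int:
    """Predict with 1-nearest-neighbour (an assumed classifier)."""
    best = min(archetypes, key=lambda a: math.dist(x, a[0]))
    return best[1]


def objectives(archetypes: List[Point], data: List[Point]) -> Fitness:
    """Two objectives, both minimised: set size and classification error."""
    errors = sum(1 for x, y in data if nearest_label(x, archetypes) != y)
    return len(archetypes), errors / len(data)


def dominates(a: Fitness, b: Fitness) -> bool:
    """Pareto dominance for minimisation: no worse everywhere, better somewhere."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))


def mutate(archetypes: List[Point], sigma: float, rng: random.Random) -> List[Point]:
    """Either drop one point or shift one point by Gaussian noise; a shifted
    point becomes 'virtual', i.e. it no longer coincides with a training sample."""
    child = list(archetypes)
    if len(child) > 1 and rng.random() < 0.2:
        child.pop(rng.randrange(len(child)))
        return child
    i = rng.randrange(len(child))
    coords, label = child[i]
    child[i] = (tuple(c + rng.gauss(0.0, sigma) for c in coords), label)
    return child


def pareto_front(population: List[List[Point]], data: List[Point]) -> List[List[Point]]:
    """Keep non-dominated individuals, one per distinct fitness (later ones win,
    so offspring with equal fitness replace their parents)."""
    scored = [(objectives(ind, data), ind) for ind in population]
    front: Dict[Fitness, List[Point]] = {}
    for f, ind in scored:
        if not any(dominates(g, f) for g, _ in scored):
            front[f] = ind
    return list(front.values())


def evolve(seeds: List[List[Point]], data: List[Point], generations: int,
           sigma: float = 0.5, seed: int = 0) -> List[List[Point]]:
    """Start from candidate coresets and let mutation turn them into archetype sets."""
    rng = random.Random(seed)
    population = [list(s) for s in seeds]
    for _ in range(generations):
        offspring = [mutate(ind, sigma, rng) for ind in population]
        population = pareto_front(population + offspring, data)
    return population


if __name__ == "__main__":
    # Toy 1-D, two-class data set (illustrative only, not from the paper).
    data = [((0.0,), 0), ((1.0,), 0), ((2.0,), 0), ((8.0,), 1), ((9.0,), 1), ((10.0,), 1)]
    # Seeds are coresets: plain subsets of the training data.
    seeds = [data[:1], [data[0], data[5]], [data[1], data[2], data[3], data[4]]]
    for ind in evolve(seeds, data, generations=20):
        print(objectives(ind, data), [round(p[0][0], 2) for p in ind])
```

Seeding with plain subsets mirrors the abstract's idea of starting from candidate coresets; once a point has been perturbed, the set is no longer a coreset but an archetype set. Error is computed on the training data here only for brevity; assessing generalization, as the abstract does, would require a held-out split.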
File: 3319619.3326789.pdf (not available)
Type: 2a Post-print, publisher's version / Version of Record
License: Not public - private/restricted access
Size: 555.03 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/11583/2980285