In machine learning a coreset is defined as a subset of the training set using which an algorithm obtains performances similar to what it would deliver if trained over the whole original data. Advantages of coresets include improving training speed and easing human understanding. Coreset discovery is an open line of research as limiting the training might also impair the quality of the result. Differently, virtual points, here called archetypes, might be far more informative for a machine learning algorithm. Starting from this intuition, a novel evolutionary approach to archetype set discovery is presented: starting from a population seeded with candidate coresets, a multi-objective evolutionary algorithm is set to modify them and eventually create archetype sets, to minimize both number of points in the set and classification error. Experimental results on popular benchmarks show that the proposed approach is able to deliver results that allow a classifier to obtain lower error and better ability of generalizing on unseen data than state-of-the-art coreset discovery techniques.

Beyond coreset discovery: evolutionary archetypes / Barbiero, Pietro; Squillero, Giovanni; Tonda, Alberto. - (2019), pp. 47-48. (Intervento presentato al convegno GECCO'19 tenutosi a Prague (Czech Republic) nel July 13 - 17, 2019) [10.1145/3319619.3326789].

Beyond coreset discovery: evolutionary archetypes

Squillero, Giovanni;Tonda, Alberto
2019

Abstract

In machine learning a coreset is defined as a subset of the training set using which an algorithm obtains performances similar to what it would deliver if trained over the whole original data. Advantages of coresets include improving training speed and easing human understanding. Coreset discovery is an open line of research as limiting the training might also impair the quality of the result. Differently, virtual points, here called archetypes, might be far more informative for a machine learning algorithm. Starting from this intuition, a novel evolutionary approach to archetype set discovery is presented: starting from a population seeded with candidate coresets, a multi-objective evolutionary algorithm is set to modify them and eventually create archetype sets, to minimize both number of points in the set and classification error. Experimental results on popular benchmarks show that the proposed approach is able to deliver results that allow a classifier to obtain lower error and better ability of generalizing on unseen data than state-of-the-art coreset discovery techniques.
2019
9781450367486
File in questo prodotto:
File Dimensione Formato  
3319619.3326789.pdf

accesso riservato

Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Non Pubblico - Accesso privato/ristretto
Dimensione 555.03 kB
Formato Adobe PDF
555.03 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2980285