In machine learning, a coreset is a subset of the training set that yields a good approximation of the behavior a given algorithm would have on the whole training set. Using a coreset instead of the full training set speeds up training and makes the dataset easier for humans to understand. Not surprisingly, coreset discovery is an active line of research, with several notable contributions in the literature. Nevertheless, restricting the search for representative samples to the available data points might impair the final result. In this work, neural networks are used to create sets of virtual data points, named archetypes, with the objective of representing the information contained in a training set in the same way a coreset does. Starting from a given training set, a hierarchical clustering neural network is trained, and the weight vectors of its leaves are used as archetypes on which the classifiers are trained. Experimental results on several benchmarks show that the proposed approach is competitive with traditional coreset discovery techniques, delivering higher accuracy and a greater ability to generalize to unseen test data.
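The idea of training a classifier on virtual points rather than on selected samples can be illustrated with a minimal sketch. It is not the authors' method: the hierarchical clustering neural network is replaced here by plain k-means from scikit-learn, with cluster centroids standing in for the leaf weight vectors. Clustering each class separately so that every archetype inherits a class label is also an assumption, as are the helper name `build_archetypes`, the synthetic dataset, and the choice of logistic regression.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def build_archetypes(X, y, per_class=5, seed=0):
    """Return virtual points (k-means centroids) and their class labels.

    Stand-in for the leaf weight vectors of a hierarchical clustering
    network; clustering per class is an assumption made for labeling.
    """
    points, labels = [], []
    for cls in np.unique(y):
        X_cls = X[y == cls]
        k = min(per_class, len(X_cls))  # never ask for more clusters than samples
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_cls)
        points.append(km.cluster_centers_)  # centroids need not be real samples
        labels.append(np.full(k, cls))
    return np.vstack(points), np.concatenate(labels)


# Synthetic data, used only to make the sketch runnable.
X, y = make_blobs(n_samples=600, centers=3, cluster_std=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Train one classifier on the archetypes and one on the full training set.
A, a_labels = build_archetypes(X_train, y_train, per_class=5)
clf_arch = LogisticRegression(max_iter=1000).fit(A, a_labels)
clf_full = LogisticRegression(max_iter=1000).fit(X_train, y_train)

acc_arch = clf_arch.score(X_test, y_test)
acc_full = clf_full.score(X_test, y_test)
print(f"archetypes: {len(A)} points, test accuracy {acc_arch:.3f}")
print(f"full set:   {len(X_train)} points, test accuracy {acc_full:.3f}")
```

Comparing the two test accuracies shows how much information the small virtual set retains on this toy problem; it says nothing about the benchmarks reported in the chapter.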

Discovering Hierarchical Neural Archetype Sets / Ciravegna, Gabriele; Barbiero, Pietro; Cirrincione, Giansalvo; Squillero, Giovanni; Tonda, Alberto (SMART INNOVATION, SYSTEMS AND TECHNOLOGIES). - In: Progresses in Artificial Intelligence and Neural Systems. - Print. - Cham : Springer, 2020. - ISBN 978-981-15-5092-8. - pp. 255-267 [10.1007/978-981-15-5093-5_24]

Discovering Hierarchical Neural Archetype Sets

Ciravegna, Gabriele; Squillero, Giovanni
2020

978-981-15-5092-8
978-981-15-5093-5
Progresses in Artificial Intelligence and Neural Systems
File in this record:
civ2020.compressed.pdf (Adobe PDF, 5.51 MB)
Type: 2a Post-print, publisher's version / Version of Record
License: Non-public - private/restricted access

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2846432