Synthetic Ground Truth Generation of an Electricity Consumption Dataset / Mascali, Lorenzo; Eiraudo, Simone; Barbierato, Luca; Schiera, Daniele Salvatore; Giannantonio, Roberta; Patti, Edoardo; Bottaccioli, Lorenzo; Lanzini, Andrea. - (2022), pp. 1-6. (Paper presented at the 5th International Conference on Smart Energy Systems and Technologies (SEST 2022), held in Eindhoven (The Netherlands), 5-7 September 2022) [10.1109/SEST53650.2022.9898444].
Synthetic Ground Truth Generation of an Electricity Consumption Dataset
Mascali, Lorenzo; Eiraudo, Simone; Barbierato, Luca; Schiera, Daniele Salvatore; Giannantonio, Roberta; Patti, Edoardo; Bottaccioli, Lorenzo; Lanzini, Andrea
2022
Abstract
The training of supervised Machine Learning (ML) and Artificial Intelligence (AI) algorithms is strongly affected by the quality of the input data. To address this, this paper proposes an innovative synthetic ground truth generation algorithm. The methodology is based on data reduction with Symbolic Aggregate Approximation (SAX). In addition, a Classification And Regression Tree (CART) is employed to identify the best granularity of the data reduction. The proposed algorithm has been applied to a dataset of telecommunication (TLC) sites by analyzing their electricity consumption patterns. The presented approach substantially reduced the dispersion of the dataset relative to the raw data, thus reducing the effort required to train the supervised algorithms.

File | Access | Type | License | Size | Format
---|---|---|---|---|---
SEST_2022___Lorenzo_Mascali_NO COPYRIGHT.pdf | Open access | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | 1.11 MB | Adobe PDF
Synthetic_Ground_Truth_Generation_of_an_Electricity_Consumption_Dataset.pdf | Restricted access | 2a Post-print publisher's version / Version of Record | Non-public - Private/restricted access | 1.16 MB | Adobe PDF
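
The abstract names Symbolic Aggregate Approximation (SAX) as the data reduction step. As a rough illustration of what that step does, the following is a minimal sketch of textbook SAX (z-normalisation, piecewise aggregate approximation, then discretisation against Gaussian breakpoints). It is not taken from the paper: the function name `sax` and the parameters `n_segments` and `alphabet_size` are assumptions chosen for this example, and the CART-based choice of granularity described in the abstract is not shown.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet_size):
    """Return the SAX word for a numeric sequence (illustrative sketch only)."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26")

    # 1. z-normalise; a constant series is mapped to all zeros.
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

    # 2. Piecewise Aggregate Approximation: mean of each equal-width segment.
    width = len(z) // n_segments
    paa = [mean(z[i * width:(i + 1) * width]) for i in range(n_segments)]

    # 3. Breakpoints that split the standard normal into equiprobable regions.
    normal = NormalDist()
    breakpoints = [normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]

    # 4. Map each segment mean to the letter of the region it falls in.
    return "".join(chr(ord("a") + bisect_right(breakpoints, v)) for v in paa)


# Hypothetical input: a linear ramp of 8 samples reduced to a 4-letter word.
print(sax(list(range(8)), n_segments=4, alphabet_size=4))  # abcd
```

In this sketch, `n_segments` and `alphabet_size` together set the granularity of the reduction: fewer segments or fewer letters give a coarser symbolic representation of each consumption profile.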
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2971834