Synthetic Ground Truth Generation of an Electricity Consumption Dataset / Mascali, Lorenzo; Eiraudo, Simone; Barbierato, Luca; Schiera, Daniele Salvatore; Giannantonio, Roberta; Patti, Edoardo; Bottaccioli, Lorenzo; Lanzini, Andrea. - (2022), pp. 1-6. (Paper presented at the 5th International Conference on Smart Energy Systems and Technologies (SEST 2022), held in Eindhoven (The Netherlands), 5-7 September 2022) [10.1109/SEST53650.2022.9898444].

Synthetic Ground Truth Generation of an Electricity Consumption Dataset

Mascali, Lorenzo;Eiraudo, Simone;Barbierato, Luca;Schiera, Daniele Salvatore;Giannantonio, Roberta;Patti, Edoardo;Bottaccioli, Lorenzo;Lanzini, Andrea
2022

Abstract

The training of supervised Machine Learning (ML) and Artificial Intelligence (AI) algorithms is strongly affected by the quality of the input data. To improve that quality, this paper proposes an innovative synthetic ground truth generation algorithm. The methodology applies a data reduction based on Symbolic Aggregate Approximation (SAX), and a Classification And Regression Tree (CART) is employed to identify the best granularity of the data reduction. The proposed algorithm has been applied to a dataset of telecommunication (TLC) sites by analyzing their electricity consumption patterns. The presented approach substantially reduced the dispersion of the dataset compared to the raw data, thus reducing the effort required to train supervised algorithms.
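The abstract gives no implementation details, so the following is only a minimal sketch of the standard SAX procedure (z-normalisation, Piecewise Aggregate Approximation, equiprobable Gaussian breakpoints), not the authors' code. The function name `sax`, the four-letter alphabet, the number of segments, and the sample consumption values are all assumptions made for illustration. The number of segments and the alphabet size are presumably the kind of "granularity" the paper tunes with CART; that selection step is not attempted here.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word (one symbol per segment)."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")

    # 1. z-normalise the series; a constant series maps to all zeros.
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

    # 2. Piecewise Aggregate Approximation: mean of each equal-width segment.
    width = len(z) // n_segments
    paa = [mean(z[i * width:(i + 1) * width]) for i in range(n_segments)]

    # 3. Breakpoints that split the standard normal into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]

    # 4. Each segment mean becomes the symbol of the region it falls into.
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


# Hypothetical consumption readings (arbitrary units), 16 samples.
load = [2.0, 2.1, 2.0, 2.2, 3.5, 3.6, 3.4, 3.5,
        5.0, 5.2, 5.1, 4.9, 3.0, 2.9, 3.1, 3.0]

word = sax(load, n_segments=4)
print(word)  # acdb
```

Fewer segments or a smaller alphabet give a coarser word, more of either give a finer one; this is the trade-off a granularity search would explore.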
ISBN: 978-1-6654-0557-7
Files in this item:

File: SEST_2022___Lorenzo_Mascali_NO COPYRIGHT.pdf
Access: open access
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 1.11 MB
Format: Adobe PDF

File: Synthetic_Ground_Truth_Generation_of_an_Electricity_Consumption_Dataset.pdf
Access: not available
Type: 2a. Post-print, publisher's version / Version of Record
License: Non-public - Private/restricted access
Size: 1.16 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2971834