Towards a real-time unsupervised estimation of predictive model degradation / Cerquitelli, Tania; Proto, Stefano; Ventura, Francesco; Apiletti, Daniele; Baralis, Elena Maria. - Electronic. - (2019). (Paper presented at the BIRTE '19 International Workshop on Real-Time Business Intelligence and Analytics, held in Los Angeles, CA, USA, on August 26, 2019) [10.1145/3350489.3350494].

Towards a real-time unsupervised estimation of predictive model degradation

Tania Cerquitelli; Stefano Proto; Francesco Ventura; Daniele Apiletti; Elena Baralis
2019

Abstract

Automating predictive machine learning entails the capability of properly triggering updates of the trained models. To this end, the degradation of predictive models has to be continuously evaluated over time to detect data-distribution drifts between the original training set and the new data. Traditionally, prediction performance is used as a degradation metric. However, prediction-quality indices require ground-truth class labels to be known for the newly classified data, making them unsuitable for real-time applications, where ground-truth labels may be entirely absent or become available only later. In this paper, we propose a novel unsupervised methodology to automatically detect prediction-quality degradation of machine learning models. Thanks to the unsupervised approach and a novel scalable estimation technique, we provide an effective and efficient solution to this problem under soft real-time constraints. Specifically, our approach detects class-based concept drift, i.e., when new data contain samples that do not fit the set of class labels known by the currently trained predictive model. Experiments on synthetic and real-world public datasets show the effectiveness of the proposed methodology in automatically detecting and describing concept drift caused by changes in the class-label data distributions. Thanks to its scalability, the proposed approach is suitable for soft real-time applications such as predictive maintenance, Industry 4.0, and text mining.
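
The record does not detail the paper's estimation algorithm. As a rough illustration only, the following Python sketch shows one generic way class-based concept drift could be flagged without ground-truth labels: model each known class by a centroid and a radius covering most of its training samples, then count incoming samples that fall outside every known class region. The function names (fit_class_regions, drift_score, trigger_model_update) and the thresholds are hypothetical stand-ins and are not taken from the paper.

    # Minimal sketch of unsupervised, class-based drift detection.
    # NOTE: this is NOT the paper's algorithm; it only illustrates flagging
    # unlabeled samples that fit none of the known class labels.
    import numpy as np

    def fit_class_regions(X_train, y_train, quantile=0.99):
        """For each known class, store its centroid and a radius covering
        `quantile` of that class's training samples."""
        regions = {}
        for label in np.unique(y_train):
            pts = X_train[y_train == label]
            centroid = pts.mean(axis=0)
            dists = np.linalg.norm(pts - centroid, axis=1)
            regions[label] = (centroid, np.quantile(dists, quantile))
        return regions

    def drift_score(X_new, regions):
        """Fraction of new (unlabeled) samples falling outside every known
        class region -- a proxy for class-based concept drift."""
        outside = sum(
            1 for x in X_new
            if all(np.linalg.norm(x - c) > r for c, r in regions.values())
        )
        return outside / len(X_new)

    # Usage: trigger retraining when the score exceeds a tolerance.
    # regions = fit_class_regions(X_train, y_train)
    # if drift_score(X_batch, regions) > 0.05:   # threshold is application-specific
    #     trigger_model_update()                 # hypothetical retraining hook

A centroid-plus-radius region is only one possible stand-in; distribution tests (e.g., two-sample Kolmogorov-Smirnov per feature) or density-based novelty detectors would serve the same illustrative purpose.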
ISBN: 978-1-4503-7660-0

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2749759