Concept drift is a common phenomenon in data streams where the statistical properties of the target variable change over time. Traditionally, drift is assumed to occur globally, affecting the entire dataset uniformly. However, this assumption does not always hold true in real-world scenarios where only specific subpopulations within the data may experience drift. This paper explores the concept of localized drift and evaluates the performance of several drift detection techniques in identifying such localized changes. We introduce a synthetic dataset based on the Agrawal generator, where drift is induced in a randomly chosen subgroup. Our experiments demonstrate that commonly adopted drift detection methods may fail to detect drift when it is confined to a small subpopulation. We propose and test various drift detection approaches to quantify their effectiveness in this localized drift scenario. We make the source code for the generation of the synthetic benchmark available at https://github.com/fgiobergia/subgroup-agrawal-drift.

A Synthetic Benchmark to Explore Limitations of Localized Drift Detections / Giobergia, Flavio; Pastor, Eliana; de Alfaro, Luca; Baralis, Elena. - 15013:(2025), pp. 101-110. (Intervento presentato al convegno DELTA 2024 - Workshop at KDD 2024 tenutosi a Barcelona (ESP) nel August 26, 2024) [10.1007/978-3-031-82346-6_7].

A Synthetic Benchmark to Explore Limitations of Localized Drift Detections

Giobergia, Flavio;Pastor, Eliana;Baralis, Elena
2025

Abstract

Concept drift is a common phenomenon in data streams where the statistical properties of the target variable change over time. Traditionally, drift is assumed to occur globally, affecting the entire dataset uniformly. However, this assumption does not always hold true in real-world scenarios where only specific subpopulations within the data may experience drift. This paper explores the concept of localized drift and evaluates the performance of several drift detection techniques in identifying such localized changes. We introduce a synthetic dataset based on the Agrawal generator, where drift is induced in a randomly chosen subgroup. Our experiments demonstrate that commonly adopted drift detection methods may fail to detect drift when it is confined to a small subpopulation. We propose and test various drift detection approaches to quantify their effectiveness in this localized drift scenario. We make the source code for the generation of the synthetic benchmark available at https://github.com/fgiobergia/subgroup-agrawal-drift.
2025
9783031823459
9783031823466
File in questo prodotto:
File Dimensione Formato  
Giobergia_2025_a_synthetic.pdf

accesso riservato

Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Non Pubblico - Accesso privato/ristretto
Dimensione 1.29 MB
Formato Adobe PDF
1.29 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
2408.14687v1.pdf

accesso riservato

Tipologia: 1. Preprint / submitted version [pre- review]
Licenza: Non Pubblico - Accesso privato/ristretto
Dimensione 721.12 kB
Formato Adobe PDF
721.12 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2998005