DBSCOUT: A Density-based Method for Scalable Outlier Detection in Very Large Datasets / Corain, Matteo; Garza, Paolo; Asudeh, Abolfazl. - Electronic. - (2021), pp. 37-48. (Paper presented at the 2021 IEEE 37th International Conference on Data Engineering (ICDE), held in Chania, Greece, 19-22 April 2021) [10.1109/ICDE51399.2021.00011].

DBSCOUT: A Density-based Method for Scalable Outlier Detection in Very Large Datasets

Corain, Matteo; Garza, Paolo; Asudeh, Abolfazl
2021

Abstract

Recent technological advancements have made it possible to generate and collect huge amounts of data every day. This data is used for different purposes that may impact us on an unprecedented scale. Understanding the data, including detecting its outliers, is a critical step before using it. Outlier detection has been well studied in the literature, but existing approaches fail to scale to these very large settings. In this paper, we propose DBSCOUT, an efficient exact algorithm for outlier detection with linear complexity that can run in parallel over multiple independent machines, making it a good fit for settings with billions of tuples. Besides the theoretical analysis, our experimental results confirm orders-of-magnitude improvement over existing work, demonstrating the efficiency, scalability, and effectiveness of our approach.
ISBN: 978-1-7281-9184-3
Files in this item:

DBSCOUT.pdf

not available

Description: Post-print version of the article
Type: 2a Post-print, publisher's version / Version of Record
License: Non-public - Private/restricted access
Size: 1.83 MB
Format: Adobe PDF

DBSCOUTAcceptedVersion.pdf

open access

Description: Accepted version of the article
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 1.82 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11583/2912196