Most of existing cardinality estimation algorithms do not support natively interval queries under a sliding window model and are thereby insensitive to data recency. We present Staggered-HyperLogLog (ST-HLL), a probabilistic data structure that takes inspiration from HyperLogLog (HLL) and provides nearly continuous-time estimation of cardinality rates, rather than absolute counts. Our solution has zero-bit overhead with respect to vanilla HLL and negligible additional computational complexity. It is based on a periodic staggered reset of HLL registers and a register equalization operation at query times to compensate for staggered counting. We tested ST-HLL over both synthetic and real Internet traffic traces, showing its ability to track variations of the flow cardinality, quickly adapting to variations under non-stationary flow arrival processes. We show that for the same amount of memory footprint, our algorithm improves the accuracy up to a factor 2x with respect to the state-of-the-art solution, Sliding HLL.

Staggered HLL: Near-continuous-time cardinality estimation with no overhead / Cornacchia, Alessandro; Bianchi, Giuseppe; Bianco, Andrea; Giaccone, Paolo. - In: COMPUTER COMMUNICATIONS. - ISSN 0140-3664. - STAMPA. - 193:(2022), pp. 168-175. [10.1016/j.comcom.2022.06.038]

Staggered HLL: Near-continuous-time cardinality estimation with no overhead

Cornacchia, Alessandro;Bianco, Andrea;Giaccone, Paolo
2022

Abstract

Most of existing cardinality estimation algorithms do not support natively interval queries under a sliding window model and are thereby insensitive to data recency. We present Staggered-HyperLogLog (ST-HLL), a probabilistic data structure that takes inspiration from HyperLogLog (HLL) and provides nearly continuous-time estimation of cardinality rates, rather than absolute counts. Our solution has zero-bit overhead with respect to vanilla HLL and negligible additional computational complexity. It is based on a periodic staggered reset of HLL registers and a register equalization operation at query times to compensate for staggered counting. We tested ST-HLL over both synthetic and real Internet traffic traces, showing its ability to track variations of the flow cardinality, quickly adapting to variations under non-stationary flow arrival processes. We show that for the same amount of memory footprint, our algorithm improves the accuracy up to a factor 2x with respect to the state-of-the-art solution, Sliding HLL.
File in questo prodotto:
File Dimensione Formato  
1-s2.0-S0140366422002407-main.pdf

non disponibili

Descrizione: Published version
Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Non Pubblico - Accesso privato/ristretto
Dimensione 1.06 MB
Formato Adobe PDF
1.06 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
post-print.pdf

Open Access dal 28/06/2024

Descrizione: Accepted version
Tipologia: 2. Post-print / Author's Accepted Manuscript
Licenza: Creative commons
Dimensione 1.09 MB
Formato Adobe PDF
1.09 MB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2970067