In recent years, data-driven approaches have attracted the interest of the research community. Considering network monitoring, unsupervised machine learning solutions such as clustering are particularly appealing to let the network analysts observe patterns, and track the evolution of traffic over time. In this paper, we present a novel unsupervised methodology to automatically process and analyze batches of HTTP traffic, looking just at the URL structure. First, we describe IDBSCAN, Iterative-DBSCAN. We design it to obtain well-shaped clusters, and to simplify the choice of parameters — often a cumbersome step for the network analyst. Second, we show LENTA, Longitudinal Exploration for Network Traffic Analysis, which allows to automatically observe the evolution over time of traffic, naturally highlighting trends and pinpointing anomalies. We first evaluate IDBSCAN and LENTA on synthetic data to compare their performance against well-known algorithms. Then we apply them on a real case, facing the analysis of hundred thousands of URLs collected from a live network. Results show both the goodness of clusters produced by IDBSCAN and LENTA ability to highlight changes in traffic, facilitating the analyst job.

Clustering and evolutionary approach for longitudinal web traffic analysis / Morichetta, Andrea; Mellia, Marco. - In: PERFORMANCE EVALUATION. - ISSN 0166-5316. - STAMPA. - 135:(2019), p. 102033. [10.1016/j.peva.2019.102033]

Clustering and evolutionary approach for longitudinal web traffic analysis

Morichetta, Andrea;Mellia, Marco
2019

Abstract

In recent years, data-driven approaches have attracted the interest of the research community. Considering network monitoring, unsupervised machine learning solutions such as clustering are particularly appealing to let the network analysts observe patterns, and track the evolution of traffic over time. In this paper, we present a novel unsupervised methodology to automatically process and analyze batches of HTTP traffic, looking just at the URL structure. First, we describe IDBSCAN, Iterative-DBSCAN. We design it to obtain well-shaped clusters, and to simplify the choice of parameters — often a cumbersome step for the network analyst. Second, we show LENTA, Longitudinal Exploration for Network Traffic Analysis, which allows to automatically observe the evolution over time of traffic, naturally highlighting trends and pinpointing anomalies. We first evaluate IDBSCAN and LENTA on synthetic data to compare their performance against well-known algorithms. Then we apply them on a real case, facing the analysis of hundred thousands of URLs collected from a live network. Results show both the goodness of clusters produced by IDBSCAN and LENTA ability to highlight changes in traffic, facilitating the analyst job.
File in questo prodotto:
File Dimensione Formato  
2019_LENTA_PEVA.pdf

Open Access dal 31/08/2021

Descrizione: Preprint del camera ready
Tipologia: 2. Post-print / Author's Accepted Manuscript
Licenza: Creative commons
Dimensione 911.01 kB
Formato Adobe PDF
911.01 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2750474
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo