Incremental Federated Host Embeddings for Network Telescopes Traffic Analysis / Huang, Kai; Gioacchini, Luca; Mellia, Marco; Vassio, Luca. - ELECTRONIC. - (2024), pp. 41-46. (Paper presented at the IEEE 44th International Conference on Distributed Computing Systems (ICDCS), held in Jersey City, NJ (USA), on 23 July 2024) [10.1109/icdcsw63686.2024.00013].
Incremental Federated Host Embeddings for Network Telescopes Traffic Analysis
Huang, Kai; Gioacchini, Luca; Mellia, Marco; Vassio, Luca
2024
Abstract
Network telescopes are ranges of IP addresses to which no hosts are connected. They receive traffic from botnets and scanners searching for potential victims. Each telescope offers only a partial view, so merging its information with that from other telescopes is fundamental. Machine learning lets us build models that solve classification tasks automatically, but the continuous evolution of traffic requires those models to be updated continuously. This work explores collaborative Artificial Intelligence via Federated Learning (FL) to build a global model without sharing the raw (and sensitive) data, which also limits data exchange. We leverage a two-stage pipeline: (i) a self-supervised upstream task generates and incrementally updates a compact representation of the senders hitting the telescope; (ii) these embeddings serve as input to a downstream classification task that identifies possible offenders. We compare the embeddings that a single telescope generates with those obtained via FL from data collected by multiple telescopes, and we evaluate the benefits of the incremental approach. We show that FL can produce embeddings of better quality than a single network telescope can, increasing model accuracy (+6%) and coverage (+12%) while limiting the amount of data exchanged (from GBs to MBs).
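The abstract does not state which aggregation rule the FL setup uses. As a rough illustration only, the sketch below assumes a FedAvg-style weighted average in which each telescope sends its locally trained parameters (never its raw traffic) and the server weights them by the number of local samples. The function name `federated_average`, the toy matrices, and the sample counts are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def federated_average(local_models: Sequence[Matrix],
                      sample_counts: Sequence[int]) -> Matrix:
    """Weighted average of same-shaped parameter matrices (FedAvg-style).

    Assumption: every client (telescope) holds a matrix of identical shape,
    and a client's weight is proportional to its number of local samples.
    """
    if len(local_models) != len(sample_counts) or not local_models:
        raise ValueError("need one sample count per local model")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    rows, cols = len(local_models[0]), len(local_models[0][0])
    global_model = [[0.0] * cols for _ in range(rows)]
    for model, count in zip(local_models, sample_counts):
        weight = count / total  # share of the data seen by this client
        for i in range(rows):
            for j in range(cols):
                global_model[i][j] += weight * model[i][j]
    return global_model


# Hypothetical toy example: two telescopes, each with a 1x2 parameter matrix.
telescope_a = [[1.0, 1.0]]
telescope_b = [[3.0, 3.0]]
global_model = federated_average([telescope_a, telescope_b], [1, 3])
print(global_model)
```

Only the parameter matrices cross the network in such a scheme, which is consistent with the abstract's point that FL avoids sharing raw data and reduces the volume exchanged.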
File | Access | Description | Type | License | Size | Format
---|---|---|---|---|---|---
Incremental_Federated_Host_Embeddings_for_Network_Telescopes_Traffic_Analysis.pdf | Restricted access | Final version | 2a Post-print, publisher's version / Version of Record | Non-public - Private/restricted access | 319.93 kB | Adobe PDF
2024_AIDCS_DarkVec_federated.pdf | Open access | Camera ready | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | 521.58 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2992302