Incremental Federated Host Embeddings for Network Telescopes Traffic Analysis / Huang, Kai; Gioacchini, Luca; Mellia, Marco; Vassio, Luca. - ELECTRONIC. - (2024), pp. 41-46. (Paper presented at the IEEE 44th International Conference on Distributed Computing Systems (ICDCS), held in Jersey City, NJ (USA) on 23 July 2024) [10.1109/icdcsw63686.2024.00013].

Incremental Federated Host Embeddings for Network Telescopes Traffic Analysis

Huang, Kai; Gioacchini, Luca; Mellia, Marco; Vassio, Luca
2024

Abstract

Network telescopes are ranges of IP addresses with no hosts connected. They are contacted by botnets and scanners looking for possible victims. Each telescope exposes only a partial view, so merging its information with that from other telescopes is fundamental. Machine learning allows us to build models that solve classification tasks automatically. However, the continuous evolution of traffic calls for continuous updates of such models. This work explores collaborative Artificial Intelligence solutions based on Federated Learning (FL) to build a global model without sharing the raw (and sensitive) data, while also limiting data exchange. We leverage a two-stage pipeline: (i) a self-supervised upstream task generates and incrementally updates a compact representation of the senders hitting the telescope; (ii) these embeddings serve as input for a downstream classification task that identifies possible offenders. We compare the embeddings that a single telescope generates with those obtained via FL from data collected by multiple telescopes, and we evaluate the benefits of the incremental approach. We show that FL can produce embeddings of better quality than a single network telescope can, increasing model accuracy (+6%) and coverage (+12%) while limiting the amount of data exchanged (from GBs to MBs).
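
The abstract does not say which aggregation rule the authors use. As a minimal sketch only, the following assumes a FedAvg-style weighted average (a common FL aggregation rule) applied to per-sender embedding vectors, and assumes every telescope starts each round from the same global table so that the vector spaces are aligned. The function name `federated_average`, the table layout, and the example addresses and counts are hypothetical.

```python
from typing import Dict, List

Embedding = List[float]
Table = Dict[str, Embedding]  # sender IP -> embedding vector


def federated_average(tables: List[Table], counts: List[Dict[str, int]]) -> Table:
    """Merge per-telescope embedding tables into one global table.

    Each sender's global vector is the average of its local vectors,
    weighted by how often each telescope observed that sender
    (an assumed weighting, not taken from the paper).
    """
    sums: Dict[str, Embedding] = {}
    weights: Dict[str, float] = {}
    for table, count in zip(tables, counts):
        for sender, vec in table.items():
            w = float(count.get(sender, 0))
            if w == 0.0:
                continue  # no observations, nothing to contribute
            acc = sums.setdefault(sender, [0.0] * len(vec))
            for i, x in enumerate(vec):
                acc[i] += w * x
            weights[sender] = weights.get(sender, 0.0) + w
    return {s: [x / weights[s] for x in acc] for s, acc in sums.items()}


# Hypothetical local tables: only vectors and counts leave each telescope,
# never the raw packets.
telescope_a = {"198.51.100.7": [1.0, 0.0], "203.0.113.9": [0.0, 2.0]}
telescope_b = {"198.51.100.7": [3.0, 4.0]}
seen_a = {"198.51.100.7": 1, "203.0.113.9": 5}
seen_b = {"198.51.100.7": 3}

global_table = federated_average([telescope_a, telescope_b], [seen_a, seen_b])
```

In this toy run the global table covers both senders, including the one that only telescope A observed, which illustrates how merging can raise coverage. The merged table would then feed the downstream classifier described in step (ii).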
ISBN: 979-8-3503-5471-3
Files in this item:

Incremental_Federated_Host_Embeddings_for_Network_Telescopes_Traffic_Analysis.pdf (not available)
Description: Final version
Type: 2a Post-print, publisher's version / Version of Record
License: Non-public - Private/restricted access
Size: 319.93 kB
Format: Adobe PDF

2024_AIDCS_DarkVec_federated.pdf (open access)
Description: Camera ready
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 521.58 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2992302