Explainable Stacking Models based on Complementary Traffic Embeddings / Gioacchini, Luca; Santos, Welton; Lopes, Barbara; Drago, Idilio; Mellia, Marco; Almeida, Jussara M.; Gonçalves, Marcos André. - (2024), pp. 261-272. (Paper presented at the 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), held in Vienna (AUT), 8-12 July 2024) [10.1109/EuroSPW61312.2024.00035].

Explainable Stacking Models based on Complementary Traffic Embeddings

Gioacchini, Luca; Drago, Idilio; Mellia, Marco
2024

Abstract

Network security relies on effective measurement and analysis to identify malicious traffic. Recent proposals aim to automatically learn compact and informative representations (i.e., embeddings) of network traffic that capture its salient features. These representations can serve multiple downstream tasks, streamlining the machine learning pipeline. Researchers have proposed techniques borrowed from Natural Language Processing (NLP) and Graph Neural Networks (GNN) to learn such embeddings, with both lines of work delivering promising results. This paper investigates the benefits of combining complementary sources of information, represented by embeddings learnt via different techniques and from different data. We rely on classifiers based on traditional feature engineering and on automatic embedding generation (borrowing from NLP and GNN) to classify hosts observed from darknets and honeypots. We then stack the base classifiers, each trained on one embedding, through meta-learning, combining the complementary information sources to improve performance. Our results show that the meta-learner outperforms every individual classifier. Importantly, the proposed meta-learner provides explainability on the importance of the embedding types and the impact of each data source on the outcome. All in all, this work is a step forward in the search for more effective, general, understandable, and practical representations that can carry multiple traffic characteristics.
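The stacking idea summarised in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the abstract does not name the base classifiers or the meta-learner, so a nearest-centroid classifier and a small logistic regression stand in for them. The data is synthetic, and the view names "features" and "embedding" are hypothetical labels for two complementary representations of the same hosts.

```python
import math
import random

def fit_centroids(X, y):
    """Nearest-centroid base classifier: one mean vector per class."""
    cents = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def score(cents, x):
    """Signed score in [-1, 1]; positive means closer to the class-1 centroid."""
    d0, d1 = math.dist(x, cents[0]), math.dist(x, cents[1])
    return (d0 - d1) / (d0 + d1) if d0 + d1 > 0 else 0.0

def out_of_fold(X, y, k=4):
    """Score every sample with a base model trained on the other folds."""
    out = [0.0] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        cents = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(X), k):
            out[i] = score(cents, X[i])
    return out

def fit_meta(Z, y, epochs=300, lr=0.5):
    """Logistic-regression meta-learner trained by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            p = 1 / (1 + math.exp(-(b + sum(wj * zj for wj, zj in zip(w, z)))))
            gb += p - t
            for j, zj in enumerate(z):
                gw[j] += (p - t) * zj
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def accuracy(pred, y):
    """Fraction of predictions that match the labels."""
    return sum(p == t for p, t in zip(pred, y)) / len(y)

# Synthetic hosts (assumption): label 1 = "malicious". Each view is a noisy
# 2-D vector whose mean shifts with the label; noise is independent per view.
random.seed(0)
labels = [i % 2 for i in range(300)]
views = {
    name: [[random.gauss(1.5 * t, 1.0) for _ in range(2)] for t in labels]
    for name in ("features", "embedding")  # hypothetical view names
}
n_train = 200
y_tr, y_te = labels[:n_train], labels[n_train:]

# Level 0: out-of-fold scores on the training hosts, one column per view.
Z_tr = list(zip(*(out_of_fold(X[:n_train], y_tr) for X in views.values())))
# Level 1: the meta-learner combines the per-view scores.
weights, bias = fit_meta(Z_tr, y_tr)

# Evaluate on held-out hosts with base models refit on all training hosts.
models = {name: fit_centroids(X[:n_train], y_tr) for name, X in views.items()}
Z_te = [[score(models[name], X[i]) for name, X in views.items()]
        for i in range(n_train, len(labels))]
stacked_pred = [int(bias + sum(w * z for w, z in zip(weights, row)) > 0)
                for row in Z_te]
stacked_acc = accuracy(stacked_pred, y_te)
base_acc = {name: accuracy([int(row[j] > 0) for row in Z_te], y_te)
            for j, name in enumerate(views)}

print("base accuracy:", base_acc)
print("stacked accuracy:", stacked_acc)
print("meta-learner weights:", dict(zip(views, weights)))
```

Training the meta-learner on out-of-fold scores keeps it from seeing base predictions made on data the base models were fitted on. With a linear meta-learner, the learned weights give a rough, directly readable indication of how much each view contributes to the final decision, which is the kind of explainability the abstract refers to.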
ISBN: 979-8-3503-6729-4
Files in this record:

2024_WTMC_Stacking.pdf
Access: open access
Description: Post Print Paper Version
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 591.91 kB
Format: Adobe PDF

Explainable_Stacking_Models_based_on_Complementary_Traffic_Embeddings.pdf
Access: not available
Type: 2a. Post-print, publisher's version / Version of Record
License: Non-public - Private/restricted access
Size: 659.09 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2991923