Explainable Stacking Models based on Complementary Traffic Embeddings / Gioacchini, Luca; Santos, Welton; Lopes, Barbara; Drago, Idilio; Mellia, Marco; Almeida, Jussara M.; Gonçalves, Marcos André. - (2024), pp. 261-272. (Paper presented at the 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), held in Vienna (AUT), 8-12 July 2024) [10.1109/EuroSPW61312.2024.00035].
Explainable Stacking Models based on Complementary Traffic Embeddings
Gioacchini, Luca; Drago, Idilio; Mellia, Marco
2024
Abstract
Network security relies on effective measurements and analysis for identifying malicious traffic. Recent proposals aim at automatically learning compact and informative representations (i.e., embeddings) of network traffic that capture salient features. These representations can serve multiple downstream tasks, streamlining the machine learning pipeline. Researchers have proposed techniques borrowed from Natural Language Processing (NLP) and Graph Neural Networks (GNN) to learn such embeddings, with both lines of work delivering promising results. This paper investigates the benefits of combining complementary sources of information represented by embeddings learnt via different techniques and from different data. We rely on classifiers based on traditional feature engineering and on automatic embedding generation (borrowing from NLP and GNN) to classify hosts observed from darknets and honeypots. We then stack these base classifiers, each trained on one embedding, through meta-learning, combining the complementary information sources to improve performance. Our results show that meta-learning outperforms each single classifier. Importantly, the proposed meta-learner provides explainability on the importance of the embedding types and the impact of each data source on the outcome. All in all, this work is a step forward in the search for more effective, general, understandable, and practical representations that could carry multiple traffic characteristics.

File | Access | Description | Type | License | Size | Format
---|---|---|---|---|---|---
2024_WTMC_Stacking.pdf | Open access | Post Print Paper Version | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | 591.91 kB | Adobe PDF
Explainable_Stacking_Models_based_on_Complementary_Traffic_Embeddings.pdf | Not available | | 2a. Post-print publisher's version / Version of Record | Non-public - Private/restricted access | 659.09 kB | Adobe PDF
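The stacking idea summarised in the abstract (one base classifier per embedding, combined by a meta-learner whose weights indicate how much each source contributes) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the data are synthetic, the two views `view_a` and `view_b` are hypothetical stand-ins for embeddings from different techniques, the base classifiers are nearest-centroid models, the meta-learner is a plain logistic regression, and a simple hold-out split separates base training from meta training.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def fit_centroids(X, y):
    """Nearest-centroid base classifier: one mean vector per class (labels 0/1)."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def base_score(centroids, x):
    """Centred soft score in (-0.5, 0.5); positive means closer to class 1."""
    return sigmoid(math.dist(x, centroids[0]) - math.dist(x, centroids[1])) - 0.5

def fit_meta(Z, y, epochs=500, lr=0.5):
    """Logistic-regression meta-learner trained by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        errs = [sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) - t
                for z, t in zip(Z, y)]
        w = [wi - lr * sum(e * z[j] for e, z in zip(errs, Z)) / n
             for j, wi in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b

def make_views(n, rng):
    """Synthetic data (assumption): view_a separates the classes, view_b is noise."""
    y = [i % 2 for i in range(n)]
    views = {
        "view_a": [[rng.gauss(3.0 * t, 1.0) for _ in range(2)] for t in y],
        "view_b": [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in y],
    }
    return views, y

rng = random.Random(0)
views, y = make_views(600, rng)
names = sorted(views)
base_idx, meta_idx, test_idx = range(0, 200), range(200, 400), range(400, 600)

# Step 1: train one base classifier per view on the first split.
models = {v: fit_centroids([views[v][i] for i in base_idx],
                           [y[i] for i in base_idx]) for v in names}

def stack_features(i):
    """Base-classifier scores for sample i, one per view."""
    return [base_score(models[v], views[v][i]) for v in names]

# Step 2: train the meta-learner on base scores from a disjoint split.
w, b = fit_meta([stack_features(i) for i in meta_idx], [y[i] for i in meta_idx])
weights = dict(zip(names, w))

def predict(i):
    return int(b + sum(wi * zi for wi, zi in zip(w, stack_features(i))) > 0)

# Step 3: evaluate the stacked model on the held-out third split.
accuracy = sum(predict(i) == y[i] for i in test_idx) / len(test_idx)
print(f"stacked accuracy: {accuracy:.2f}")
print("meta-learner weights per view:", weights)
```

In this toy setting the learned weights give a rough, linear notion of which view the meta-learner relies on. The explainability analysis reported in the paper may use a different mechanism.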
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2991923