The widespread adoption of encryption in computer network traffic is increasing the difficulty of analyzing such traffic for security purposes. The data set presented in this data article is composed of network statistics computed on captures of TCP flows, originated by executing various network stress and web crawling tools, along with statistics of benign web browsing traffic. Furthermore, this data article describes a set of Machine Learning models, trained using the described data set, which can classify network traffic by the tool category (network stress tool, web crawler, web browser), the specific tool (e.g., Firefox), and also the tool version (e.g., Firefox 68) used to generate it. These models are compatible with the analysis of traffic with encrypted payload since statistics are evaluated only on the TCP headers of the packets. The data presented in this article can be useful to train and assess the performance of new Machine Learning models for tool classification.
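
As a rough illustration of why header-only statistics remain usable when payloads are encrypted, the following Python sketch computes a few per-flow features from TCP/IP header fields alone. The `TcpHeader` record, the `flow_statistics` helper, and the chosen features are assumptions made for illustration; they are not the statistics or the code used to build the data set.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class TcpHeader:
    """Header-only view of one captured packet (hypothetical record layout)."""
    timestamp: float     # capture time in seconds
    total_length: int    # packet length in bytes, as declared in the header
    syn: bool = False
    fin: bool = False
    rst: bool = False


def flow_statistics(packets):
    """Summarize one TCP flow using header fields only, never the payload."""
    if not packets:
        raise ValueError("a flow needs at least one packet")
    ordered = sorted(packets, key=lambda p: p.timestamp)
    sizes = [p.total_length for p in ordered]
    # Inter-arrival times between consecutive packets of the flow.
    gaps = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    return {
        "packet_count": len(ordered),
        "duration": ordered[-1].timestamp - ordered[0].timestamp,
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "syn_count": sum(p.syn for p in ordered),
        "fin_count": sum(p.fin for p in ordered),
        "rst_count": sum(p.rst for p in ordered),
    }


# Made-up flow, used only to exercise the helper.
flow = [
    TcpHeader(0.00, 60, syn=True),
    TcpHeader(0.05, 60, syn=True),
    TcpHeader(0.10, 1500),
    TcpHeader(0.30, 52, fin=True),
]
features = flow_statistics(flow)
```

A feature dictionary of this kind could, in principle, be passed to any standard classifier; the features and models actually used are those described in the article itself.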

Data set and machine learning models for the classification of network traffic originators / Canavese, D.; Regano, L.; Basile, C.; Ciravegna, G.; Lioy, A. - In: Data in Brief. - ISSN 2352-3409. - Electronic. - 41 (2022), p. 107968. [10.1016/j.dib.2022.107968]

Data set and machine learning models for the classification of network traffic originators

Canavese D.; Regano L.; Basile C.; Ciravegna G.; Lioy A.
2022

Files in this record:
File: 1-s2.0-S2352340922001792-main.pdf
Access: open access
Description: PDF of the publisher's version (open-access)
Type: 2a Post-print, publisher's version / Version of Record
License: Creative Commons
Size: 899.63 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2962162