Towards website domain name classification using graph based semi-supervised learning

Faroughi, Azadeh; Morichetta, Andrea; Vassio, Luca; Figueiredo, Flavio; Mellia, Marco; Javidan, Reza

doi:10.1016/j.comnet.2021.107865

In this work, we tackle the problem of classifying website domain names into categories, e.g., mapping bbc.com to the "News and Media" class. Domain name classification is challenging due to the high number of class labels and the highly skewed class distribution. Unlike prior efforts that need to crawl and use the actual content of web pages, we rely only on passively collected traffic logs, which observe the traffic regularly flowing in the network, without the burden of crawling and parsing web pages. We exploit the information carried by network logs, using only the names of the websites and the sequences of websites visited by users. To this end, we propose and evaluate different classification methods based on machine learning. Using a large dataset with hundreds of thousands of domain names and 25 different categories, we show that semi-supervised learning methods are more suitable for this task than traditional supervised approaches. Using graphs, we incorporate into the classifier information not strictly related to the labeled data, which lets us classify most of the unlabeled domains. However, in this framework, classification scores are lower than those usually obtained when exploiting page-specific content. To the best of our knowledge, our work is the first to perform an extensive evaluation of domain name classification using only passive flow-level logs.
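The abstract does not specify how the graph is built or which propagation rule is used. Purely as an illustration of the general idea of graph-based semi-supervised classification, the sketch below assumes a co-visit graph (two domains are linked when a user visits them within `window` steps of each other) and applies generic iterative label propagation: labeled domains keep their class, and every unlabeled domain repeatedly takes the edge-weighted average of its neighbours' class distributions. The function names, the `window` and `iterations` parameters, the toy sessions, and the "Sports" label are all hypothetical; this is not the authors' method.

```python
# Hypothetical sketch: generic label propagation over a domain co-visit graph.
# It is NOT the paper's exact graph construction or algorithm.


def build_covisit_graph(sessions, window=1):
    """Return {domain: {neighbour: weight}}, where weight counts how often two
    domains were visited within `window` steps of each other in a session."""
    graph = {}
    for seq in sessions:
        for domain in seq:
            graph.setdefault(domain, {})  # keep isolated domains as nodes
        for i, a in enumerate(seq):
            for b in seq[i + 1 : i + 1 + window]:
                if a == b:
                    continue  # ignore consecutive visits to the same domain
                graph[a][b] = graph[a].get(b, 0.0) + 1.0
                graph[b][a] = graph[b].get(a, 0.0) + 1.0
    return graph


def propagate_labels(graph, seeds, iterations=50):
    """Labeled (seed) domains keep a one-hot class distribution; every other
    domain repeatedly takes the edge-weighted average of its neighbours'
    distributions. Returns the argmax class for each unlabeled domain, or None
    if no labeled domain is reachable from it."""
    classes = sorted(set(seeds.values()))
    nodes = set(graph) | set(seeds)
    # One-hot vectors for seeds, all-zero vectors for unlabeled domains.
    dist = {n: {c: float(seeds.get(n) == c) for c in classes} for n in nodes}

    for _ in range(iterations):
        updated = {}
        for n in nodes:
            neighbours = graph.get(n, {})
            total = sum(neighbours.values())
            if n in seeds or total == 0:
                updated[n] = dist[n]  # clamp seeds; isolated nodes stay as-is
                continue
            updated[n] = {
                c: sum(w * dist[m][c] for m, w in neighbours.items()) / total
                for c in classes
            }
        dist = updated  # synchronous update: all nodes use the previous round

    predictions = {}
    for n in nodes - set(seeds):
        scores = dist[n]
        predictions[n] = max(scores, key=scores.get) if any(scores.values()) else None
    return predictions


# Toy usage with made-up browsing sessions and two labeled seed domains.
sessions = [
    ["bbc.com", "cnn.com", "reuters.com"],
    ["espn.com", "nba.com"],
    ["cnn.com", "nytimes.com"],
    ["nba.com", "uefa.com"],
    ["example.org"],
]
seeds = {"bbc.com": "News and Media", "espn.com": "Sports"}
preds = propagate_labels(build_covisit_graph(sessions), seeds)
print(preds["reuters.com"], preds["uefa.com"], preds["example.org"])
# prints: News and Media Sports None
```

In the toy run, reuters.com and nytimes.com inherit "News and Media" through cnn.com, uefa.com inherits "Sports" through nba.com, and example.org stays unclassified because no labeled domain is connected to it. This mirrors, in miniature, why a graph lets a classifier use information beyond the labeled set while still leaving some domains out of reach.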

Towards website domain name classification using graph based semi-supervised learning / Faroughi, A., Morichetta, A., Vassio, L., Figueiredo, F., Mellia, M., Javidan, R. - In: COMPUTER NETWORKS. - ISSN 1389-1286. - Electronic. - Vol. 188, article 107865 (2021). [10.1016/j.comnet.2021.107865]

Year: 2021
DOI: https://dx.doi.org/10.1016/j.comnet.2021.107865
Journal: COMPUTER NETWORKS
Publication type: 1.1 Journal article

Files in this record:

File	Size	Format
1-s2.0-S1389128621000384-main.pdf (restricted access). Description: publisher's version. Type: 2a Post-print, publisher's version / Version of Record. License: not public, private/restricted access.	1.66 MB	Adobe PDF
Website_classification_ComNet.pdf (Open Access from 29/01/2023). Type: 2. Post-print / Author's Accepted Manuscript. License: Creative Commons.	789.09 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2874806

PORTO @ Archivio Istituzionale della Ricerca