Gold Mining in a River of Internet Content Traffic / Ben Houidi, Zied; Scavo, Giuseppe; Ghamri Doudane, Samir; Finamore, Alessandro; Traverso, Stefano; Mellia, Marco. - Print. - 8406:(2014), pp. 91-103. (Paper presented at the 6th International Workshop on Traffic Monitoring and Analysis, TMA, held in London on 14 April 2014) [10.1007/978-3-642-54999-1_8].
Gold Mining in a River of Internet Content Traffic
Finamore, Alessandro; Traverso, Stefano; Mellia, Marco
2014
Abstract
With the advent of Over-The-Top content providers (OTTs), Internet Service Providers (ISPs) saw their portfolio of services shrink to the low-margin role of data transporters. In order to counter this effect, some ISPs started to follow big OTTs like Facebook and Google in trying to turn their data into a valuable asset. In this paper, we explore the questions of what meaningful information can be extracted from network data, and what interesting insights it can provide. To this end, we tackle the first challenge of detecting “user-URLs”, i.e., those links that were clicked by users as opposed to those objects automatically downloaded by browsers and applications. We devise algorithms to pinpoint such URLs, and validate them on manually collected ground-truth traces. We then apply them to a three-day-long traffic trace spanning more than 19,000 residential users who generated around 190 million HTTP transactions. We find that only 1.6% of these observed URLs were actually clicked by users. As a first application of our methods, we answer the question of which platforms participate most in promoting Internet content. Surprisingly, we find that, despite its notoriety, only 11% of user-URL visits come from Google Search.

| File | Size | Format |
|---|---|---|
| TMA-llncs.pdf (open access; Type: 1. Preprint / submitted version [pre-review]; License: Public - All rights reserved) | 271.15 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2539888