Understanding how people interact with the web is key for a variety of applications, e.g., from the design of effective web pages to the definition of successful online marketing campaigns. Browsing behavior has been traditionally represented and studied by means of clickstreams, i.e., graphs whose vertices are web pages, and edges are the paths followed by users. Obtaining large and representative data to extract clickstreams is, however, challenging. The evolution of the web questions whether browsing behavior is changing and, by consequence, whether properties of clickstreams are changing. This article presents a longitudinal study of clickstreams from 2013 to 2016. We evaluate an anonymized dataset of HTTP traces captured in a large ISP, where thousands of households are connected. We first propose a methodology to identify actual URLs requested by users from the massive set of requests automatically fired by browsers when rendering web pages. Then, we characterize web usage patterns and clickstreams, taking into account both the temporal evolution and the impact of the device used to explore the web. Our analyses precisely quantify various aspects of clickstreams and uncover interesting patterns, such as the typical short paths followed by people while navigating the web, the fast increasing trend in browsing from mobile devices, and the different roles of search engines and social networks in promoting content. Finally, we contribute a dataset of anonymized clickstreams to the community to foster new studies.

You, the Web, and Your Device: Longitudinal Characterization of Browsing Habits / Vassio, Luca; Drago, Idilio; Mellia, Marco; Ben Houidi, Zied; Lamali, Mohamed Lamine. - In: ACM TRANSACTIONS ON THE WEB. - ISSN 1559-1131. - STAMPA. - 12:4(2018), pp. 1-30. [10.1145/3231466]

You, the Web, and Your Device: Longitudinal Characterization of Browsing Habits

Vassio, Luca;Drago, Idilio;Mellia, Marco;
2018

Abstract

Understanding how people interact with the web is key for a variety of applications, e.g., from the design of effective web pages to the definition of successful online marketing campaigns. Browsing behavior has been traditionally represented and studied by means of clickstreams, i.e., graphs whose vertices are web pages, and edges are the paths followed by users. Obtaining large and representative data to extract clickstreams is, however, challenging. The evolution of the web questions whether browsing behavior is changing and, by consequence, whether properties of clickstreams are changing. This article presents a longitudinal study of clickstreams from 2013 to 2016. We evaluate an anonymized dataset of HTTP traces captured in a large ISP, where thousands of households are connected. We first propose a methodology to identify actual URLs requested by users from the massive set of requests automatically fired by browsers when rendering web pages. Then, we characterize web usage patterns and clickstreams, taking into account both the temporal evolution and the impact of the device used to explore the web. Our analyses precisely quantify various aspects of clickstreams and uncover interesting patterns, such as the typical short paths followed by people while navigating the web, the fast increasing trend in browsing from mobile devices, and the different roles of search engines and social networks in promoting content. Finally, we contribute a dataset of anonymized clickstreams to the community to foster new studies.
File in questo prodotto:
File Dimensione Formato  
a24-vassio.pdf

non disponibili

Descrizione: versione finale
Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Non Pubblico - Accesso privato/ristretto
Dimensione 1.77 MB
Formato Adobe PDF
1.77 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
1806.07158.pdf

accesso aperto

Descrizione: camera ready
Tipologia: 2. Post-print / Author's Accepted Manuscript
Licenza: PUBBLICO - Tutti i diritti riservati
Dimensione 1.22 MB
Formato Adobe PDF
1.22 MB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

Caricamento pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11583/2718494
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo