Big Data Methodologies and Applications to Privacy and Web Tracking in the Internet / Metwalley, Hassan. - (2017).
Big Data Methodologies and Applications to Privacy and Web Tracking in the Internet
METWALLEY, HASSAN
2017
Abstract
While on the Internet, individuals encounter invisible services that collect personal information, known as third-party Web trackers (trackers for short). Through advertising, social sharing, and analytics services in general, hundreds of companies de facto track people and build profiles of them. As a result, individuals leak personal and corporate information to trackers whose businesses, legitimate or not, revolve around the value of the collected data. The implications are serious, ranging from a person unwillingly exposing private information to an unknown third party, to a company being unable to control the flow of its information to the outside world. In short, users have lost control over their private data on the Internet. The goal of this thesis is threefold: first, to show how widespread Web trackers are and how users are involved in this phenomenon; second, to propose algorithms and methodologies that automatically pinpoint these services and, more generally, malicious traffic; and third, to introduce CROWDSURF, a platform for comprehensive and collaborative auditing of the data that flows to Internet services. The results paint a worrying picture. Web trackers are omnipresent: they are embedded in the large majority of websites (more than 70%), including the most popular ones, and some of them can continuously track 98% of Internet users. Users, who are often completely unaware of the phenomenon, rely on countermeasures that suffer from many problems and sometimes behave unclearly, with no transparency. To overcome the limitations of current solutions, I propose two automatic methodologies. Both show excellent results: using a very small dataset, the first methodology identifies 34 new third-party Web trackers that are not present in available blacklists, and the second algorithm perfectly clusters malicious traffic, e.g., malware, advertising services, and third-party tracking services.
These methodologies could easily be used to build a new generation of anti-tracking solutions, overcoming one of the biggest problems of the current generation: trackers are pinpointed manually. Finally, CROWDSURF embodies the features that an anti-tracking solution should have. The platform is still very preliminary and raises practical challenges that must be addressed, but it could be a milestone toward a new generation of solutions that give users back control over the information they exchange on the Internet.
https://hdl.handle.net/11583/2667668