Big Data for Traffic Monitoring and Management

Trevisan, Martino

doi:10.6092/polito/porto/2726624

The last two decades have witnessed tremendous advances in Information and Communications Technologies. Besides improvements in computational power and storage capacity, communication networks now carry an amount of data that was not envisaged only a few years ago. Together with their pervasiveness, network complexity has increased at the same pace, leaving operators and researchers with few instruments to understand what happens in the networks and, on the global scale, on the Internet. Fortunately, recent advances in data science and machine learning come to the rescue of network analysts, and allow analyses with a level of complexity and spatial/temporal scope not possible only 10 years ago. In my thesis, I take the perspective of an Internet Service Provider (ISP), and illustrate challenges and possibilities of analyzing the traffic coming from modern operational networks. I make use of big data and machine learning algorithms, and apply them to datasets coming from passive measurements of ISP and University Campus networks. The marriage between data science and network measurements is complicated by the complexity of machine learning algorithms, and by the intrinsic multi-dimensionality and variability of this kind of data. As such, my work proposes and evaluates novel techniques, inspired by popular machine learning approaches, but carefully tailored to operate with network traffic. In this thesis, I first provide a thorough characterization of Internet traffic from 2013 to 2018. I show the most important trends in the composition of traffic and users’ habits across the last 5 years, and describe how the network infrastructure of the big Internet players changed in order to support faster and larger traffic. Then, I show the challenges in classifying network traffic, with particular attention to encryption and to the convergence of the Internet around a few big players.
To overcome the limitations of classical approaches, I propose novel algorithms for traffic classification and management leveraging machine learning techniques, and, in particular, big data approaches. Exploiting temporal correlation among network events, and benefiting from large datasets of operational traffic, my algorithms learn common traffic patterns of web services, and use them for (i) traffic classification and (ii) fine-grained traffic management. My proposals are always validated in experimental environments and then deployed in real operational networks, from which I report the most interesting findings. I also focus on the Quality of Experience (QoE) of web users, as their satisfaction represents the final objective of computer networks. Again, I show that, using big data approaches, the network can gain visibility into the quality of users’ web browsing. In general, the algorithms I propose help ISPs obtain a detailed view of the traffic that flows in their network, allowing fine-grained traffic classification and management, and real-time monitoring of users’ QoE.

Big Data for Traffic Monitoring and Management / Trevisan, Martino. - (2019 Feb 27). [10.6092/polito/porto/2726624]