Classification algorithms for Big Data with applications in the urban security domain / Venturini, Luca. - (2018 Jun 22). [10.6092/polito/porto/2709959]
Classification algorithms for Big Data with applications in the urban security domain
VENTURINI, LUCA
2018
Abstract
A classification algorithm is a versatile tool that can serve as a predictor of the future or as an analytical tool to understand the past. Several obstacles prevent classification from scaling to large Volume, Velocity, Variety or Value. The aim of this thesis is to scale distributed classification algorithms beyond current limits, to assess the state of practice of Big Data machine learning frameworks, and to validate the effectiveness of a data science process in improving urban safety. Massive datasets with several large-domain categorical features proved a difficult challenge for existing classification algorithms. We propose associative classification as a possible answer and develop several novel techniques to distribute the training of an associative classifier among parallel workers and to improve the final quality of the model. Experiments on a real large-scale dataset with more than 4 billion records confirmed the quality of the approach. To assess the state of practice of Big Data machine learning frameworks and to streamline the integration and fine-tuning of their building blocks, we developed a generic, self-tuning tool that extracts knowledge from network traffic measurements. The result is a system that offers human-readable models of the data with minimal user intervention, validated by experiments on large collections of real-world passive network measurements. A large part of this dissertation is dedicated to the study of a data science process to improve urban safety. First, we shed some light on the feasibility of a system that monitors social messages from a city for emergency relief. We then propose a methodology to mine temporal patterns in social issues such as crime. Finally, we propose a system that integrates data science findings on citizens' perception of safety and communicates the results to decision makers in a timely manner.
We applied and tested the system in a real Smart City scenario in Turin, Italy.

File | Description | Type | License | Access | Size | Format
---|---|---|---|---|---|---
conv_thesisventurini_v3-gs-conv.pdf | Thesis | Doctoral thesis | Public - All rights reserved | Open access | 9.57 MB | Adobe PDF
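The abstract names associative classification as the core technique. For readers unfamiliar with it, here is a minimal, single-machine Python sketch of a generic classifier in the style of the classic CBA algorithm. It is not the distributed algorithm developed in the thesis. The sketch makes the following assumptions:

* Records are purely categorical.
* Itemsets contain at most two items.
* The support and confidence thresholds are illustrative.
* All function names and the toy data are hypothetical.

The sketch also hints at why training can be split among workers. Support and confidence are ratios of counts, and counts computed on separate partitions can simply be summed before the rules are built.

```python
from collections import Counter
from itertools import combinations

# Hypothetical helper names and toy data, for illustration only.


def count_partition(records, max_len=2):
    """Map step: count itemsets, (itemset, label) pairs and labels in one partition.

    Each record is (features, label), where features maps a categorical
    feature name to its value. An item is a (feature, value) pair and an
    itemset is a sorted tuple of up to max_len items.
    """
    pair_counts, itemset_counts, label_counts = Counter(), Counter(), Counter()
    for features, label in records:
        label_counts[label] += 1
        items = sorted(features.items())
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                itemset_counts[itemset] += 1
                pair_counts[(itemset, label)] += 1
    return pair_counts, itemset_counts, label_counts


def merge(partials):
    """Reduce step: counts are additive, so partition results can be summed."""
    pair_counts, itemset_counts, label_counts = Counter(), Counter(), Counter()
    for pairs, itemsets, labels in partials:
        pair_counts.update(pairs)
        itemset_counts.update(itemsets)
        label_counts.update(labels)
    return pair_counts, itemset_counts, label_counts


def build_classifier(counts, min_support=0.1, min_confidence=0.6):
    """Turn merged counts into an ordered rule list plus a default class."""
    pair_counts, itemset_counts, label_counts = counts
    n = sum(label_counts.values())
    rules = []
    for (itemset, label), count in pair_counts.items():
        support = count / n
        confidence = count / itemset_counts[itemset]
        if support >= min_support and confidence >= min_confidence:
            rules.append((itemset, label, confidence, support))
    # CBA-style precedence: higher confidence, then higher support, then
    # shorter rules; itemset and label only break any remaining ties.
    rules.sort(key=lambda r: (-r[2], -r[3], len(r[0]), r[0], r[1]))
    default = label_counts.most_common(1)[0][0]  # most frequent class
    return rules, default


def predict(classifier, features):
    """Return the label of the first rule whose itemset the record contains."""
    rules, default = classifier
    items = set(features.items())
    for itemset, label, _, _ in rules:
        if items.issuperset(itemset):
            return label
    return default


# Made-up toy records, split into two "worker" partitions.
data = [
    ({"district": "A", "hour": "night"}, "theft"),
    ({"district": "A", "hour": "night"}, "theft"),
    ({"district": "A", "hour": "day"}, "noise"),
    ({"district": "B", "hour": "night"}, "noise"),
    ({"district": "B", "hour": "day"}, "noise"),
    ({"district": "B", "hour": "day"}, "noise"),
]
partitions = [data[:3], data[3:]]
clf = build_classifier(merge(count_partition(p) for p in partitions))
print(predict(clf, {"district": "A", "hour": "night"}))  # theft
```

Counting every itemset up to a fixed length is what makes large-domain categorical features expensive: the number of candidate itemsets grows with the number of distinct values. A real system would prune infrequent itemsets while mining (for example with Apriori or FP-Growth) rather than enumerating them all.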
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/11583/2709959
Warning: the data displayed have not been validated by the university.