Recurrent neural networks: methods and applications to nonlinear predictions / Bay, Alessandro (2017). [10.6092/polito/porto/2677460]
Recurrent neural networks: methods and applications to nonlinear predictions
BAY, ALESSANDRO
2017
Abstract
This thesis deals with recurrent neural networks, a particular class of artificial neural networks that can learn a generative model of input sequences. The input is mapped, through a feedback loop and a nonlinear activation function, into a hidden state, which is then projected into the output space, yielding either a probability distribution or the input for the next time step. This work consists mainly of two parts: a theoretical study aimed at improving the understanding of the recurrent neural network framework, which has not yet been deeply investigated, and an application to nonlinear prediction problems, since recurrent neural networks are powerful models suitable for solving practical tasks in several fields. Regarding the theoretical part, we analyse the weaknesses of state-of-the-art models and tackle them in order to improve the performance of a recurrent neural network. Firstly, we contribute to the understanding of the dynamical properties of a recurrent neural network, highlighting the close relation between the definition of stable limit cycles and the echo state property of an echo state network. We provide sufficient conditions for the convergence of the hidden state to a trajectory that is uniquely determined by the input signal, independently of the initial states. This may help extend the memory of the network and increase its design options. Moreover, we develop a novel approach to the main problem in training recurrent neural networks, the so-called vanishing gradient problem. Our new method allows us to train a very simple recurrent neural network while preventing the gradient from vanishing even after many time steps. Exploiting the singular value decomposition of the vanishing factors in the gradient, together with random matrix theory, we find that the singular values have to be confined in a narrow interval, and we derive conditions on their root-mean-square value.
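The hidden-state recurrence described above can be sketched in a few lines. This is a minimal illustration of a vanilla recurrent network step, not the thesis's actual model; all names and dimensions are chosen for the example.

```python
import numpy as np

def rnn_step(x, h, W_in, W_rec, W_out, b):
    """One time step of a vanilla RNN: the input and the previous hidden
    state are combined through a feedback loop and a nonlinear activation
    (tanh), then the new hidden state is projected into the output space."""
    h_new = np.tanh(W_in @ x + W_rec @ h + b)  # hidden state update
    y = W_out @ h_new                          # projection to output space
    return h_new, y

# Toy dimensions and random weights (illustrative only).
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 5, 2
W_in = rng.standard_normal((n_hid, n_in)) * 0.1
W_rec = rng.standard_normal((n_hid, n_hid)) * 0.1
W_out = rng.standard_normal((n_out, n_hid)) * 0.1
b = np.zeros(n_hid)

# Run a short input sequence, feeding the hidden state back at each step.
h = np.zeros(n_hid)
for t in range(4):
    x = rng.standard_normal(n_in)
    h, y = rnn_step(x, h, W_in, W_rec, W_out, b)
```

In a generative setting, `y` would be passed through a softmax to obtain a probability distribution, or fed back as the input for the next time step.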
We also improve the efficiency of training a recurrent neural network, defining a new method to speed up this process. Thanks to a least-squares regularization, we can initialize the parameters of the network, setting them closer to the minimum so that fewer epochs of classical training algorithms are needed. Moreover, it is also possible to completely train the network with our initialization method, running more iterations of it without losing performance with respect to classical training algorithms. Finally, it can also be used as a real-time learning algorithm, adjusting the parameters to new data through one iteration of our initialization. In the last part of this thesis, we apply recurrent neural networks to nonlinear prediction problems. We consider the prediction of numerical sequences, estimating the next input by choosing it from a probability distribution. We study an automatic text generation problem, where we need to predict the next character in order to compose words and sentences, and the path prediction of walking mobile users in the central area of a city, modelled as a sequence of crossroads. Then, we analyse the prediction of video frames, uncovering a wide range of applications related to the prediction of movement. We study the collision problem of bouncing balls, taking into account only the sequence of video frames without any knowledge of the physical characteristics of the problem, and the distribution over days of mobile users in a city and in a whole region. Finally, we address the state-of-the-art problem of missing data imputation, analysing the incomplete spectrograms of audio signals. We restore audio signals with missing time-frequency data, demonstrating via numerical experiments that a performance improvement can be achieved by involving recurrent neural networks.
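The least-squares initialization mentioned above can be illustrated with a regularized (ridge) least-squares fit of the output weights, in the spirit of echo-state-network readout training. This is a hedged sketch, not the thesis's actual algorithm; the recurrence, the input signal, and all names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_hid = 200, 20

# Fixed random recurrent and input weights (illustrative).
W_rec = rng.standard_normal((n_hid, n_hid)) * 0.1
W_in = rng.standard_normal(n_hid) * 0.5

# Toy next-sample prediction task on a sinusoidal input signal.
u = np.sin(np.linspace(0, 8 * np.pi, T))
target = np.roll(u, -1).reshape(-1, 1)  # predict the next sample

# Collect the hidden states H (T x n_hid) by running the recurrence.
H = np.zeros((T, n_hid))
h = np.zeros(n_hid)
for t in range(T):
    h = np.tanh(W_in * u[t] + W_rec @ h)
    H[t] = h

# Closed-form regularized least-squares solution for the output weights:
# W_out = (H^T H + lam * I)^{-1} H^T target
lam = 1e-6
W_out = np.linalg.solve(H.T @ H + lam * np.eye(n_hid), H.T @ target)
pred = H @ W_out
mse = np.mean((pred - target) ** 2)
```

A single closed-form solve like this can seed the output parameters near a good minimum before (or instead of) iterative gradient-based training.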

File: PhdThesis_BAY.pdf (open access)
Description: Doctoral thesis
Type: Doctoral thesis (Tesi di dottorato)
License: Public - All rights reserved
Size: 4.96 MB
Format: Adobe PDF
https://hdl.handle.net/11583/2677460