Thesis etd-09092025-235122
Thesis type
Master's degree thesis
Author
MALEKNIA, ALEX ALI'
URN
etd-09092025-235122
Title
Dynamical structure of vanishing gradient and overfitting in multi-layer perceptrons
Department
MATHEMATICS
Degree programme
MATHEMATICS
Supervisors
supervisor Prof. Agazzi, Andrea
supervisor Prof. Sato, Yuzuru
Keywords
- dynamical systems
- gradient descent
- minimal model
- multi-layer perceptrons
- overfitting
- vanishing gradient
Defence session start date
26/09/2025
Availability
Thesis not available for consultation
Abstract
In this work, our goal is to study two of the most notable problems in neural network training, the vanishing gradient and overfitting, from a dynamical systems perspective, and to find a minimal setting in which they can be observed.
To this aim, we focus our research on multi-layer perceptrons with a single hidden layer trained with the gradient descent algorithm. By means of numerical experiments, we observe that the critical phenomena observed in more complex models also arise in this simple setting. In particular, it is sufficient to take a one-neuron MLP as the target function and train a two-neuron MLP to observe plateaus and overfitting in the learning curve.
In the more interesting case of two target neurons approximated by a four-neuron network, we observe a phenomenon that, to our knowledge, has not yet been reported in the literature: in the overfitting phase, the learning curve of the generalization error exhibits plateaus while increasing.
Subsequently, a more theoretical analysis is conducted to unravel the dynamical properties of different regions of parameter space: the plateau regions, the optimal region, and the overfitting region.
The first two sets of parameters attract the dynamics and slow it down, despite not being critical regions; the latter, on the other hand, is an invariant set for the empirical error, which we prove to be, up to symmetries, the unique global minimum.
These results support the idea that overfitting is due to a change in the stability of the optimal region caused by the presence of noise in the data points.
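As a rough illustration of the minimal setting described above, the following sketch (not taken from the thesis) trains a two-neuron student MLP by plain gradient descent on noisy data generated by a one-neuron teacher, printing the empirical and generalization errors along training; the activation function, loss, noise level, sample sizes, and learning rate are all assumptions made for illustration only.

```python
# Minimal sketch of the teacher-student setting: a one-neuron teacher MLP
# approximated by a two-neuron student trained with gradient descent.
# Activation (tanh), squared loss, noise level and hyperparameters are
# illustrative assumptions, not the choices made in the thesis.
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w, a):
    """Single-hidden-layer MLP: sum_k a_k * tanh(w_k * x)."""
    return np.tanh(np.outer(x, w)) @ a

# Teacher: one hidden neuron; student: two hidden neurons.
w_star, a_star = np.array([1.5]), np.array([1.0])
w = rng.normal(scale=0.1, size=2)
a = rng.normal(scale=0.1, size=2)

# Small noisy training set; large clean test set for the generalization error.
n = 20
x_train = rng.normal(size=n)
y_train = mlp(x_train, w_star, a_star) + 0.1 * rng.normal(size=n)
x_test = rng.normal(size=5000)
y_test = mlp(x_test, w_star, a_star)

lr = 0.05
for step in range(100001):
    h = np.tanh(np.outer(x_train, w))      # hidden activations, shape (n, 2)
    err = h @ a - y_train                   # residuals on the training set
    grad_a = h.T @ err / n
    grad_w = ((err[:, None] * (1 - h**2) * x_train[:, None]) * a).sum(0) / n
    a -= lr * grad_a
    w -= lr * grad_w
    if step % 10000 == 0:
        train_err = np.mean(err**2)
        gen_err = np.mean((mlp(x_test, w, a) - y_test)**2)
        print(f"step {step:6d}  empirical error {train_err:.3e}  "
              f"generalization error {gen_err:.3e}")
```

Plotting the two printed errors against the step count would give the learning curves in which plateaus and a late increase of the generalization error (overfitting) can be looked for.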
File
| File name | Size |
|---|---|
| Thesis not available for consultation. | |