Thesis etd-09092025-235122
Thesis type
Master's degree thesis
Author
MALEKNIA, ALEX ALI'
URN
etd-09092025-235122
Title
Dynamical structure of vanishing gradient and overfitting in multi-layer perceptrons
Department
MATHEMATICS
Degree programme
MATHEMATICS
Supervisors
supervisor Prof. Agazzi, Andrea
supervisor Prof. Sato, Yuzuru
Keywords
- dynamical systems
- gradient descent
- minimal model
- multi-layer perceptrons
- overfitting
- vanishing gradient
Defence session start date
26/09/2025
Availability
Thesis not available for consultation
Abstract
In this work, our goal is to study two of the most notable problems in neural network training, the vanishing gradient and overfitting, from a dynamical systems perspective, and to find a minimal setting in which they can be observed.
To this aim, we focus our research on multi-layer perceptrons with a single hidden layer trained with the gradient descent algorithm. By means of numerical experiments, we observe that the critical phenomena observed in more complex models also arise in this simple setting. In particular, it is sufficient to take a one-neuron MLP as the target function and train a two-neuron MLP to observe plateaus and overfitting in the learning curve.
In the more interesting case of two target neurons approximated by a four-neuron network, we observe a phenomenon that, to our knowledge, has not yet been reported in the literature: in the overfitting phase, the learning curve of the generalization error exhibits plateaus while increasing.
Subsequently, a more theoretical analysis is conducted to unravel the dynamical properties of different regions of parameter space: the plateau regions, the optimal region, and the overfitting region.
The first two sets of parameters attract the dynamics and slow it down, despite not being critical regions; the latter, on the other hand, is an invariant set for the empirical error, which we prove to be, up to symmetries, the unique global minimum.
These results support the idea that overfitting is due to a change in the stability of the optimal region caused by the presence of noise in the data points.
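As a rough illustration of the minimal setting described above, the following sketch (not taken from the thesis) trains a two-neuron student MLP by plain gradient descent on noisy data generated by a one-neuron teacher, printing the empirical and generalization errors along training; the activation function, loss, noise level, sample sizes, and learning rate are all assumptions made for illustration only.

```python
# Minimal sketch of the teacher-student setting: a one-neuron teacher MLP
# approximated by a two-neuron student trained with gradient descent.
# Activation (tanh), squared loss, noise level and hyperparameters are
# illustrative assumptions, not the choices made in the thesis.
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w, a):
    """Single-hidden-layer MLP: sum_k a_k * tanh(w_k * x)."""
    return np.tanh(np.outer(x, w)) @ a

# Teacher: one hidden neuron; student: two hidden neurons.
w_star, a_star = np.array([1.5]), np.array([1.0])
w = rng.normal(scale=0.1, size=2)
a = rng.normal(scale=0.1, size=2)

# Small noisy training set; large clean test set for the generalization error.
n = 20
x_train = rng.normal(size=n)
y_train = mlp(x_train, w_star, a_star) + 0.1 * rng.normal(size=n)
x_test = rng.normal(size=5000)
y_test = mlp(x_test, w_star, a_star)

lr = 0.05
for step in range(100001):
    h = np.tanh(np.outer(x_train, w))      # hidden activations, shape (n, 2)
    err = h @ a - y_train                   # residuals on the training set
    grad_a = h.T @ err / n
    grad_w = ((err[:, None] * (1 - h**2) * x_train[:, None]) * a).sum(0) / n
    a -= lr * grad_a
    w -= lr * grad_w
    if step % 10000 == 0:
        train_err = np.mean(err**2)
        gen_err = np.mean((mlp(x_test, w, a) - y_test)**2)
        print(f"step {step:6d}  empirical error {train_err:.3e}  "
              f"generalization error {gen_err:.3e}")
```

Plotting the two printed errors against the step count would give the learning curves in which plateaus and a late increase of the generalization error (overfitting) can be looked for.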
File
| File name | Size |
|---|---|
| Thesis not available for consultation. | |