Thesis etd-10042020-220325

Thesis type

Master's thesis

Author

BOMBARI, SIMONE

URN

etd-10042020-220325

Title

The dynamics of Stochastic Gradient Descent in the loss landscape of Deep Neural Networks

Department

PHYSICS

Degree program

PHYSICS

Supervisors

supervisor Prof. Soatto, Stefano
supervisor Prof. Cataldo, Enrico

Keywords

computer vision
deep learning
loss landscape
machine learning
optimization
stochastic gradient descent

Defense session start date

26/10/2020

Availability

Thesis not available for consultation

Abstract

The deep learning optimization community has observed that a neural network's ability to generalize is strongly related to the flatness of the loss landscape at the point to which the optimization algorithm converges. Experiments show that stochastic gradient descent (SGD) is more likely to converge to flat minima than its deterministic counterpart, gradient descent (GD). In this work we try to build a mathematical model that clarifies this phenomenon, using a variation of the Eyring-Kramers law, a formula used in physics to describe the mean transition time of a Brownian particle between local minima in a potential landscape.
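
For reference, the classical one-dimensional form of the Eyring-Kramers law can be written as below. This is the textbook statement, not necessarily the variation developed in the thesis; the symbols V (potential), ε (noise intensity), x (starting minimum) and z (the local maximum separating the two minima) are notation assumed for this sketch.

```latex
% Overdamped Langevin dynamics in a potential V with noise intensity \varepsilon:
%   dX_t = -V'(X_t)\,dt + \sqrt{2\varepsilon}\,dW_t
% Mean transition time from the minimum x over the barrier at z, as \varepsilon \to 0:
\mathbb{E}[\tau] \simeq \frac{2\pi}{\sqrt{V''(x)\,\lvert V''(z)\rvert}}
  \exp\!\left(\frac{V(z) - V(x)}{\varepsilon}\right)
```

In this form, at equal barrier height, a flatter starting minimum (smaller V''(x)) has a larger prefactor and therefore a longer mean escape time.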

We then discuss the validity of the continuous approach for these purposes, showing that the SGD dynamics do not fulfill the necessary requirements for our architecture, since SGD is in essence a strongly discrete process. This result casts doubt on the validity of the continuous-time approximation commonly used to analyze SGD dynamics through the theory of stochastic differential equations.

Finally, we use empirical experiments to investigate more closely the loss landscape and the SGD trajectory of a real training process on a real neural network. This gives us an overview of the loss landscape topology, which we claim is analogous to a tower of colanders. In particular, we find a natural constraint between the loss and the highest eigenvalue of its Hessian: low values of the loss function cannot be reached without entering narrow areas of the landscape.
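
As an illustration of how the highest Hessian eigenvalue can be measured without forming the Hessian, the following minimal sketch uses power iteration on finite-difference Hessian-vector products. It is not the procedure used in the thesis, which is not described here; the function name `top_hessian_eigenvalue`, the toy quadratic loss, and all parameter values are hypothetical choices for this example. Power iteration returns the eigenvalue of largest magnitude, which coincides with the highest eigenvalue only when that eigenvalue dominates in absolute value, as for the positive-definite toy Hessian below.

```python
import numpy as np


def top_hessian_eigenvalue(grad, w, iters=200, eps=1e-4, seed=0):
    """Estimate the largest-magnitude Hessian eigenvalue at w by power iteration.

    Hessian-vector products are approximated by central finite differences
    of the gradient function `grad`, so the Hessian is never built explicitly.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(w.shape)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        # H v ~ (grad(w + eps v) - grad(w - eps v)) / (2 eps)
        hv = (grad(w + eps * v) - grad(w - eps * v)) / (2.0 * eps)
        lam = float(v @ hv)  # Rayleigh quotient for the current unit vector
        norm = np.linalg.norm(hv)
        if norm == 0.0:
            break
        v = hv / norm
    return lam


# Toy loss L(w) = 0.5 * w^T A w (hypothetical); its Hessian is A everywhere.
A = np.diag([5.0, 1.0, 0.2])


def toy_grad(w):
    return A @ w


lam_max = top_hessian_eigenvalue(toy_grad, np.zeros(3))
```

For a real network, `grad` would be replaced by the gradient of the training loss with respect to the flattened parameters, and a larger estimate would indicate a narrower region of the landscape along the sharpest direction.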

File

File name	Size
Thesis not available for consultation. Contact the author

ETD

Digital archive of theses defended at the University of Pisa
