
ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-10032019-105944


Thesis type
Master's thesis
Author
PAPINI, ANDREA
URN
etd-10032019-105944
Title
A Mathematical Framework for Stochastic Gradient Descent Algorithms
Department
MATHEMATICS
Degree programme
MATHEMATICS
Supervisors
supervisor Dr. Bacciu, Davide
co-supervisor Prof. Romito, Marco
external examiner Dr. Trevisan, Dario
Keywords
  • deep learning
  • dynamical system
  • non-convex optimization
  • stochastic differential equation
  • stochastic gradient descent
Defense session date
25/10/2019
Availability
Full
Abstract
We develop the mathematical foundations of the stochastic modified equations (SME) framework for analyzing the dynamics of stochastic gradient algorithms, in which the latter are approximated by a class of stochastic differential equations with a small noise parameter. We prove that this approximation can be understood mathematically as a weak approximation, which leads to a number of precise and useful results on the approximation of stochastic gradient descent (SGD), momentum SGD, and stochastic Nesterov's accelerated gradient method in the general setting of stochastic objectives. We also demonstrate through explicit calculations that this continuous-time approach can uncover important analytical insights into the stochastic gradient algorithms under consideration that may not be easy to obtain in a purely discrete-time setting. In particular, we prove that SGD minimizes an average potential over the posterior distribution of weights, together with an entropic regularization term. In general, however, this potential is not the original loss function. SGD therefore does perform variational inference, but for a different loss than the one used to compute the gradients. We conclude the thesis with new insights into the gradient noise in stochastic gradient descent, questioning the Gaussianity assumption in the large-data regime.
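The SME approximation described in the abstract can be made concrete with a small numerical sketch. The example below (an illustration under assumed choices, not code from the thesis) runs SGD with additive Gaussian gradient noise on the one-dimensional quadratic f(x) = x²/2 and, alongside it, the Euler–Maruyama discretization of the corresponding first-order SME dX_t = −∇f(X_t) dt + √η σ dW_t. With time step Δt = η the two updates coincide in distribution, which is the intuition behind viewing the SDE as a weak approximation of SGD. The learning rate η, noise scale σ, and initial point are arbitrary illustrative values.

```python
import numpy as np

# Illustration only (not code from the thesis): SGD on f(x) = x^2 / 2
# with additive Gaussian gradient noise, next to the Euler-Maruyama
# scheme for the first-order SME  dX_t = -X_t dt + sqrt(eta) * sigma dW_t.
eta, sigma, steps, x0 = 0.1, 0.5, 200, 2.0  # assumed illustrative values
rng = np.random.default_rng(0)

# SGD iteration: x_{k+1} = x_k - eta * (f'(x_k) + sigma * xi_k)
x_sgd = x0
for _ in range(steps):
    x_sgd -= eta * (x_sgd + sigma * rng.standard_normal())

# Euler-Maruyama with time step dt = eta:
# X_{k+1} = X_k - X_k * dt + sqrt(eta) * sigma * sqrt(dt) * xi_k,
# which for dt = eta has the same law as one SGD step above.
x_sme = x0
for _ in range(steps):
    x_sme += -x_sme * eta + np.sqrt(eta) * sigma * np.sqrt(eta) * rng.standard_normal()

# Both iterates fluctuate around the minimizer x = 0 with standard
# deviation of order sqrt(eta) * sigma.
print(x_sgd, x_sme)
```

Matching the SDE time step to the learning rate is what makes the discrete scheme reproduce the SGD update; for smaller η both processes concentrate more tightly around the minimizer, consistent with the small-noise regime of the SME framework.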