ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-10072020-090015


Thesis type
Master's degree thesis
Author
DEMELAS, FRANCESCO
URN
etd-10072020-090015
Title
A mean field analysis of two-layers neural networks with general convex loss function
Department
MATHEMATICS
Course of study
MATHEMATICS
Supervisors
Supervisor Romito, Marco
Keywords
  • convex loss functions
  • distributional dynamics
  • stochastic gradient descent
  • neural networks
  • propagation of chaos
  • mean field
Date of defence
23/10/2020
Availability
Full
Abstract
Nowadays neural networks are a powerful tool, even though there are few mathematical results that explain the effectiveness of this approach. Until a few years ago, one of the most powerful results guaranteed that any continuous function can be well approximated by a two-layer neural network with convex activation functions and enough hidden nodes. However, this tells us nothing about the practical choice of the parameters. Typically the Stochastic Gradient Descent (SGD), or one of its variants, is used to update them. In recent years several results have been obtained on the convergence of the parameters under SGD, in particular via the mean field approach. The key idea is to consider a risk function defined over a set of distributions of the parameters. This allows us to study the convergence through a PDE, known as the distributional dynamics (DD), using common tools of mathematical analysis. Many results use a quadratic loss function, and thus optimize the mean square error. In this work we extend this analysis to a general convex loss function. This generalization is fundamental, because the success of a learning problem can be enhanced by choosing the most suitable loss function. We start by proving that the empirical distributions converge weakly to the solution of the DD on any finite time horizon. Then we analyse the long-time behaviour of the distributions, finding that, under suitable assumptions, the continuous distributions converge weakly to a fixed point of the DD.
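
For orientation only, the mean field setting the abstract refers to can be sketched as follows, in the notation common in the two-layer mean field literature; the specific symbols below (\sigma_*, \rho_t, \Psi, \xi) are illustrative choices, not taken from the thesis itself. A two-layer network with N hidden units and parameters \theta_1, \dots, \theta_N is written as

  f_N(x) = \frac{1}{N} \sum_{i=1}^{N} \sigma_*(x; \theta_i),

so the risk depends on the parameters only through their empirical distribution \hat\rho_N = \frac{1}{N} \sum_{i=1}^{N} \delta_{\theta_i}. In the quadratic-loss case treated in much of the literature, as N \to \infty the SGD trajectory of \hat\rho_N is described by the distributional dynamics

  \partial_t \rho_t = 2\,\xi(t)\, \nabla_\theta \cdot \big( \rho_t\, \nabla_\theta \Psi(\theta; \rho_t) \big),
  \qquad
  \Psi(\theta; \rho) = V(\theta) + \int U(\theta, \theta')\, \rho(d\theta'),

where \xi(t) is the step-size schedule and V, U are expectations of the activation against the data distribution. Under the generalization studied in the thesis, the quadratic loss behind \Psi is replaced by a general convex loss, so that \Psi(\theta; \rho) plays the role of the first variation of the risk functional at \rho.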