
ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-02012024-174031


Thesis type
Master's degree thesis
Author
ANGILE', EMANUELE
URN
etd-02012024-174031
Title
A Study of Implicit Bias in the Training Algorithms of Diagonal Linear Networks
Department
MATHEMATICS
Degree programme
MATHEMATICS
Supervisors
Supervisor: Agazzi, Andrea
Keywords
  • machine learning
  • stochastic differential equation
  • implicit bias
  • stochastic gradient descent
Defense session start date
23/02/2024
Availability
Thesis not available for consultation
Abstract
Understanding the performance of neural networks poses one of the most intriguing challenges in the current machine learning landscape. One question that arises is: why do they achieve good generalization performance even without explicit regularization? In particular, in overparameterized models where the training objective has many global minima, optimizing with a specific algorithm (typically gradient-based) implicitly biases the solutions toward certain special global minima.
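As a standard illustration of this phenomenon (a textbook example, not a result taken from the thesis), consider overparameterized least squares with $n < d$: the interpolating solutions form an affine subspace, and gradient descent initialized at the origin keeps its iterates in the row space of $X$, so it converges to the minimum-$\ell_2$-norm interpolator,

\[
\min_{\beta \in \mathbb{R}^d} \tfrac{1}{2}\|X\beta - y\|_2^2
\quad\leadsto\quad
\beta_{\mathrm{GD}} = \arg\min\{\|\beta\|_2 : X\beta = y\},
\]

even though no regularizer appears in the objective: the selection is due entirely to the algorithm and its initialization.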

Our work centers on the analysis of Diagonal Linear Networks (DLNs), a reparametrization of linear predictors. Despite its simplicity, this framework provides insights into how stochasticity contributes to generalization, leading to sparser solutions. We compare the dynamics of Gradient Descent (GD) and Stochastic Gradient Descent (SGD) by examining the continuous processes that model them: in the deterministic case, the ODE associated with the gradient flow; for the stochastic counterpart, the corresponding SDE, which we construct with particular emphasis on the diffusion term. Motivated by experimental results, we further investigate Momentum SGD (MSGD), as it exhibits superior generalization performance. We show that the SDE associated with MSGD introduces an effective parameter that decreases further than its SGD counterpart.
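For concreteness, a common formulation of this setup (the thesis may use a different but equivalent parametrization) writes the DLN predictor as $\beta = u \odot v$ and compares the two continuous-time models

\[
L(u,v) = \frac{1}{2n}\|X(u \odot v) - y\|_2^2, \qquad \theta = (u,v),
\]
\[
\text{gradient flow:}\quad \dot{\theta}_t = -\nabla L(\theta_t),
\qquad
\text{SGD:}\quad d\theta_t = -\nabla L(\theta_t)\,dt + \sqrt{\eta}\,\Sigma(\theta_t)^{1/2}\,dW_t,
\]

where $\eta$ is the step size and $\Sigma(\theta)$ is the covariance of the minibatch gradient noise. In the related literature on stochastic gradient flows for DLNs, the diffusion term effectively shrinks the initialization scale of the flow, which is what biases SGD toward sparser solutions; the statement above about MSGD says that momentum shrinks this effective parameter further still.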

As our work centers on the analysis of continuous processes, we conclude by establishing the connection between the discrete process (the iterates of the algorithm) and the continuous one. We present a classical weak-convergence result relating the solution of an SDE to its discretization as the step size tends to 0.
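A representative statement of this kind of result (a generic Euler–Maruyama weak-convergence bound; the exact assumptions used in the thesis may differ) reads: for the SDE $dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t$ and its Euler discretization with step size $\eta$,

\[
X^{\eta}_{k+1} = X^{\eta}_k + \eta\, b(X^{\eta}_k) + \sqrt{\eta}\,\sigma(X^{\eta}_k)\,\xi_k,
\qquad \xi_k \sim \mathcal{N}(0, I),
\]

one has, for every smooth test function $g$ of polynomial growth and fixed horizon $T$,

\[
\bigl|\mathbb{E}\,g\bigl(X^{\eta}_{\lfloor T/\eta \rfloor}\bigr) - \mathbb{E}\,g(X_T)\bigr| \le C(T, g)\,\eta,
\]

i.e. weak convergence of order one as $\eta \to 0$.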