Thesis etd-09072025-171516
  
Thesis type
Master's thesis

Author
MAZZUCCO, LUCA

URN
etd-09072025-171516

Title
Large deviations for deep transformer models

Department
MATHEMATICS

Degree programme
MATHEMATICS

Supervisor
Prof. Agazzi, Andrea

Keywords
- Bayesian neural networks
- large deviation principle
- transformers

Defense session date
26/09/2025

Availability
Full

Abstract

This thesis investigates the Large Deviation Principle (LDP) for Transformer models, a central architecture in modern deep learning. Large deviation theory provides a rigorous framework for quantifying the probability of rare events; applied to neural networks, it captures fluctuations around their deterministic Gaussian process limits. These rare fluctuations are key to understanding the stability of learning dynamics and the universality of large-scale models.
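For reference, a minimal sketch of what a large deviation principle asserts (the standard textbook form, not a statement from the thesis); the sequence (X_n), the rate function I, and the event A are generic placeholders.

```latex
% Heuristic form of a large deviation principle (LDP) at speed n:
% the probability of a rare event decays exponentially, with the cost
% governed by a lower semicontinuous rate function I >= 0.
\[
  \mathbb{P}\bigl(X_n \in A\bigr) \;\approx\; \exp\Bigl(-\,n \inf_{x \in A} I(x)\Bigr),
  \qquad n \to \infty,
\]
% made rigorous by an upper bound with the infimum of I over the closure of A
% and a lower bound with the infimum of I over the interior of A.
```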
We focus on a simplified transformer-like architecture, where query and key weights are assumed fixed or pre-trained, so that the model takes the form of a deep linear network. We build on recent results in the literature: it is known that in a Bayesian setting with Gaussian priors and Gaussian noise, and in the double large-scale limit (neurons, samples, and input dimension diverging at fixed ratios), the posterior covariance kernel can be expressed through the minimizer of an action functional.
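The reduction to a deep linear network can be pictured as follows; this is a hypothetical sketch of one way to freeze the attention mechanism (the notation A^(l), W_V^(l), H^(l) and the exact architecture are assumptions for illustration, not taken from the thesis).

```latex
% One attention layer on T tokens of width d:
%   Attn(X) = softmax( X W_Q (X W_K)^T / sqrt(d) ) X W_V  =:  A \, X \, W_V .
% If the query/key weights W_Q, W_K are fixed or pre-trained and the resulting
% attention matrices A^{(\ell)} are treated as frozen, each layer is linear in
% its value weights, and the stack
\[
  H^{(\ell)} \;=\; A^{(\ell)} H^{(\ell-1)} W_V^{(\ell)},
  \qquad H^{(0)} = X, \qquad \ell = 1, \dots, L,
\]
% composes to  H^{(L)} = (A^{(L)} \cdots A^{(1)}) \, X \, W_V^{(1)} \cdots W_V^{(L)},
% i.e. a deep linear network in the value weights W_V^{(1)}, ..., W_V^{(L)}.
```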
The main contribution of this thesis is to reinterpret such results within the large deviations framework, providing a partial unification of perspectives developed in distinct contexts. In particular, when the layer width diverges while the input dimension and dataset size remain finite, we identify the action as a rate function, thereby connecting this line of work with the broader literature on large deviations for covariance processes.
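As the simplest illustration of this wide-layer regime (a standard one-layer Gaussian computation under assumed notation w_i, Sigma, K_n, d, not a result stated in the thesis), the empirical covariance of n i.i.d. Gaussian features satisfies an LDP whose rate function is a Gaussian relative entropy — the prototype of an action playing the role of a rate function for a covariance kernel.

```latex
% Width-n layer with i.i.d. features w_1, ..., w_n ~ N(0, Sigma) in R^d,
% where d (input dimension / number of data points) stays fixed.
% Empirical covariance kernel:
\[
  K_n \;=\; \frac{1}{n} \sum_{i=1}^{n} w_i w_i^{\top} .
\]
% By Cramér's theorem, (K_n) satisfies an LDP at speed n with rate function
\[
  I(K) \;=\; \tfrac{1}{2} \Bigl( \operatorname{tr}\bigl(\Sigma^{-1} K\bigr) - d
            - \log\det\bigl(\Sigma^{-1} K\bigr) \Bigr), \qquad K \succ 0,
\]
% which equals the relative entropy D( N(0,K) \,\|\, N(0,\Sigma) ).  Its unique
% minimizer K = Sigma recovers the law-of-large-numbers (Gaussian process) limit,
% while I quantifies the exponential cost of fluctuations of the kernel around it.
```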
Files
  
| File name | Size |
|---|---|
| Tesi_Mazzucco.pdf | 636.87 KB |