
ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-09072025-171516


Thesis type
Master's degree thesis
Author
MAZZUCCO, LUCA
URN
etd-09072025-171516
Title
Large deviations for deep transformer models
Department
MATHEMATICS
Course of study
MATHEMATICS
Supervisors
Supervisor Prof. Agazzi, Andrea
Keywords
  • Bayesian neural networks
  • large deviation principle
  • transformers
Date of thesis defense
26/09/2025
Availability
Full
Abstract
This thesis investigates the Large Deviation Principle (LDP) for Transformer models, an architecture central to modern deep learning. Large deviation theory provides a rigorous framework for quantifying the probability of rare events and, when applied to neural networks, captures fluctuations around their deterministic Gaussian process limits. These rare fluctuations are key to understanding the stability of learning dynamics and the universality of large-scale models.
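For orientation, a large deviation principle in its standard, generic form (notation chosen here for illustration, not taken from the thesis) states that a family of random variables $(X_n)$ satisfies an LDP with speed $n$ and rate function $I$ when, informally,

% Schematic statement of a large deviation principle (LDP):
% probabilities of rare events decay exponentially in n, with the
% exponent governed by the rate function I.
\[
  \mathbb{P}\bigl(X_n \in A\bigr) \;\approx\; \exp\Bigl(-\,n \inf_{x \in A} I(x)\Bigr),
  \qquad n \to \infty .
\]

In the neural-network setting described above, $n$ plays the role of a width-type parameter, and the zero set of $I$ singles out the deterministic Gaussian process limit around which the fluctuations are measured.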
We focus on a simplified transformer-like architecture in which the query and key weights are assumed fixed or pre-trained, so that the model takes the form of a deep linear network. We build on recent results from the literature: in a Bayesian setting with Gaussian priors and Gaussian noise, and in the double large-scale limit (neurons, samples, and input dimension diverging at fixed ratios), the posterior covariance kernel can be expressed through the minimizer of an action functional.
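As a rough sketch of this setting (illustrative notation only; the symbols $W_\ell$, $n_\ell$, $\sigma$ below are assumptions, not the thesis' own), freezing the attention weights leaves a product of Gaussian layer matrices applied to the inputs and observed under Gaussian noise:

% Illustrative Bayesian deep linear model: Gaussian priors on the layer
% matrices W_ell and Gaussian observation noise; the frozen query/key
% weights are absorbed into the fixed input representation X.
\[
  f(X) \;=\; \Bigl(\prod_{\ell=1}^{L-1} n_\ell^{-1/2}\Bigr)\, W_L W_{L-1} \cdots W_1 X,
  \qquad (W_\ell)_{ij} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1),
\]
\[
  Y \;=\; f(X) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}\bigl(0, \sigma^2 I\bigr).
\]

In the double large-scale limit mentioned above, the posterior law of $f$ is then characterized through the covariance kernel that minimizes the action functional.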
The main contribution of this thesis is to reinterpret such results within the large deviations framework, providing a partial unification of perspectives developed in distinct contexts. In particular, when the layer width diverges while the input dimension and dataset size remain finite, we identify the action as a rate function, thereby connecting this line of work with the broader literature on large deviations for covariance processes.
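Schematically, and again in notation chosen only for illustration, the identification reads as follows: writing $\mathcal{S}$ for the action functional and $K_n$ for the (random) covariance kernel at layer width $n$,

% Schematic LDP for the covariance kernel at diverging width n
% (input dimension and dataset size held fixed): the action S acts as
% the rate function, and its minimizer is the Gaussian process kernel.
\[
  \mathbb{P}\bigl(K_n \approx K\bigr) \;\approx\; \exp\bigl(-\,n\,\mathcal{S}(K)\bigr),
  \qquad n \to \infty,
\]

so that deviations of the kernel from its Gaussian process limit are exponentially unlikely, at a rate governed by the action.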