Thesis etd-09112024-144322
Thesis type
Master's thesis
Author
BRUNO, GIUSEPPE
URN
etd-09112024-144322
Title
Clustering behavior in a mean-field transformer model
Department
MATHEMATICS
Degree programme
MATHEMATICS
Supervisors
supervisor Agazzi, Andrea
Keywords
- clustering
- machine learning
- mean-field
Defense session start date
27/09/2024
Availability
Full
Abstract
Transformers have become a cornerstone in the architecture of large language models, primarily due to their self-attention mechanism. Building on the framework established by Geshkovski et al. (2023), this thesis studies the evolution of tokens within a deep stack of Transformer layers as a continuous-time flow on the unit sphere. More specifically, the token dynamics are modeled as a mean-field interacting particle system: their empirical measure obeys a mean-field partial differential equation (PDE) with a Wasserstein gradient flow structure.
We provide a mathematical investigation of the long-term behavior of this system, with a particular focus on the emergence and persistence of metastable phases and clustering phenomena, key elements in applications like next-token prediction.
The analysis begins with the empirical measure of tokens uniformly sampled on the sphere and traces the system’s evolution through three distinct phases. In the linear phase, perturbations collapse toward a dominant mode. This is followed by a quasi-linear phase, where the nonlinear evolution of this mode leads to non-vanishing deviations from the initial uniform distribution. Finally, in the collapsing phase, the system transitions into multiple clusters before ultimately converging to a single point.
Crucial to this analysis is establishing long-term bounds on the solution of the associated mean-field PDE in negative Sobolev spaces, achieved through the spectral properties of the linearized PDE, Lagrangian flow techniques, and Grenier's iterative method. This multi-phase approach reveals explicit relationships between parameters such as temperature, number of tokens, and dimensionality, and the resulting symmetries, number of clusters, and their time scales.
Furthermore, most of these results extend to a broader class of interaction potentials. The thesis also examines the stability of the uniform measure under the influence of noise, and numerical simulations are performed to validate the theoretical predictions, demonstrating that the metastable clusters exhibit the properties anticipated by the analysis.
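As an illustration of the kind of particle system described in the abstract, the following is a minimal sketch, not the thesis's own code. It assumes the unnormalized self-attention dynamics on the sphere from Geshkovski et al. (2023), in which each token moves along the tangential projection of (1/n) Σ_j exp(β⟨x_i, x_j⟩) x_j, with β the inverse temperature. The exact normalization used in the thesis is not stated in the abstract, and the function names, the explicit Euler step, and the default parameters are all hypothetical choices made for this example.

```python
import math
import random


def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Rescale a nonzero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [c / norm for c in v]


def sample_sphere(n, d, rng):
    """Draw n points uniformly on the unit sphere in R^d (normalized Gaussians)."""
    return [normalize([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(n)]


def velocity(tokens, beta):
    """Tangential velocity of each token under the assumed attention dynamics.

    For token x the drift is (1/n) * sum_y exp(beta * <x, y>) * y,
    and the radial part is removed so the velocity is tangent to the sphere at x.
    """
    n, d = len(tokens), len(tokens[0])
    out = []
    for x in tokens:
        drift = [0.0] * d
        for y in tokens:
            w = math.exp(beta * dot(x, y)) / n
            for k in range(d):
                drift[k] += w * y[k]
        radial = dot(drift, x)
        out.append([drift[k] - radial * x[k] for k in range(d)])
    return out


def simulate(n=32, d=3, beta=2.0, dt=0.05, steps=200, seed=0):
    """Explicit Euler steps, each followed by renormalization back onto the sphere."""
    rng = random.Random(seed)
    tokens = sample_sphere(n, d, rng)
    for _ in range(steps):
        vel = velocity(tokens, beta)
        tokens = [
            normalize([x[k] + dt * v[k] for k in range(d)])
            for x, v in zip(tokens, vel)
        ]
    return tokens
```

Calling `simulate()` and inspecting the pairwise inner products `dot(x, y)` of the returned tokens is one simple way to look for clusters. Whether the clustered states appear within a given time horizon depends on β, n, and d, which is the kind of dependence the thesis analyzes.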
File
File name | Size |
---|---|
TesiFinale.pdf | 2.57 Mb |