Thesis etd-09112024-144322
Thesis type
Master's thesis
Author
BRUNO, GIUSEPPE
URN
etd-09112024-144322
Thesis title
Clustering behavior in a mean-field transformer model
Department
Mathematics
Course of study
Mathematics
Supervisors
Supervisor: Agazzi, Andrea
Keywords
- clustering
- machine learning
- mean-field
Graduation session start date
27/09/2024
Availability
Full
Summary
Transformers have become a cornerstone of large language model architectures, primarily due to their self-attention mechanism. Building on the framework established by Geshkovski et al. (2023), this thesis studies the evolution of tokens within a deep stack of Transformer layers as a continuous-time flow on the unit sphere. More specifically, the token dynamics are modeled as a mean-field interacting particle system: the empirical measure of the tokens obeys a mean-field partial differential equation (PDE) with a Wasserstein gradient-flow structure.
We provide a mathematical investigation of the long-term behavior of this system, with a particular focus on the emergence and persistence of metastable phases and clustering phenomena, key elements in applications like next-token prediction.
The analysis begins with the empirical measure of tokens uniformly sampled on the sphere and traces the system’s evolution through three distinct phases. In the linear phase, perturbations collapse toward a dominant mode. This is followed by a quasi-linear phase, where the nonlinear evolution of this mode leads to non-vanishing deviations from the initial uniform distribution. Finally, in the collapsing phase, the system transitions into multiple clusters before ultimately converging to a single point.
Crucial to this analysis is establishing long-term bounds on the solution of the associated mean-field PDE in negative Sobolev spaces, achieved through the spectral properties of the linearized PDE, Lagrangian flow techniques, and Grenier's iterative method. This multi-phase approach reveals explicit relationships between parameters such as temperature, number of tokens, and dimensionality, and the resulting symmetries, number of clusters, and their time scales.
Furthermore, most of these results extend to a broader class of interaction potentials. The thesis also examines the stability of the uniform measure under the influence of noise, and numerical simulations are performed to validate the theoretical predictions, demonstrating that the metastable clusters exhibit the properties anticipated by the analysis.
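The summary does not state the equations, so the following is only a minimal sketch of the kind of simulation it describes. It assumes the unnormalized self-attention dynamics on the unit sphere from Geshkovski et al. (2023), in which each token x_i moves with velocity P_{x_i}((1/n) Σ_j exp(β⟨x_i, x_j⟩) x_j), where P_x projects onto the tangent space at x and β plays the role of an inverse temperature. The function names, the explicit Euler step with renormalization, and all parameter values are illustrative assumptions, not the numerical scheme used in the thesis.

```python
import numpy as np


def sample_sphere(n, d, rng):
    """Draw n points uniformly on the unit sphere in R^d."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def velocity(x, beta):
    """Tangential velocity of each token (assumed unnormalized attention).

    x has shape (n, d) with unit-norm rows; beta is the inverse temperature.
    """
    n = x.shape[0]
    weights = np.exp(beta * (x @ x.T))   # pairwise attention weights, (n, n)
    drift = weights @ x / n              # attention-weighted average, (n, d)
    # Remove the radial component so the motion stays tangent to the sphere.
    radial = np.sum(drift * x, axis=1, keepdims=True)
    return drift - radial * x


def simulate(n=64, d=3, beta=2.0, dt=0.02, steps=3000, seed=0):
    """Explicit Euler integration with renormalization after each step.

    Returns the final tokens and the history of the smallest pairwise
    cosine similarity, a crude indicator of how clustered the tokens are.
    """
    rng = np.random.default_rng(seed)
    x = sample_sphere(n, d, rng)
    min_cos = np.empty(steps)
    for k in range(steps):
        x = x + dt * velocity(x, beta)
        x /= np.linalg.norm(x, axis=1, keepdims=True)  # project back to sphere
        min_cos[k] = (x @ x.T).min()
    return x, min_cos


if __name__ == "__main__":
    tokens, min_cos = simulate()
    print("smallest pairwise cosine at final time:", min_cos[-1])
```

Under these assumptions, plotting `min_cos` against the step index gives a rough view of the phases: values close to 1 indicate that all tokens have merged into a single cluster, while long plateaus below 1 would be consistent with metastable multi-cluster configurations.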
File
| File name | Size |
|---|---|
| TesiFinale.pdf | 2.57 Mb |