Thesis etd-10022024-201851
Thesis type
Master's thesis
Author
SCARDERA, SEBASTIANO
URN
etd-10022024-201851
Title
NTK Analysis of Knowledge Distillation
Department
MATHEMATICS
Degree program
MATHEMATICS
Supervisors
supervisor Prof. Trevisan, Dario
supervisor Dott. Cassarà, Pietro
Keywords
- knowledge distillation
- machine learning
- NTK
- statistical learning
Exam session start date
25/10/2024
Availability
Full
Abstract
Our work analyzes knowledge distillation (KD) in the overparameterized regime.
In a preliminary section, we give an overview of the KD technique. Then we introduce the model and describe its dynamics in the lazy training regime. We show that the dynamics of the student model can be described by a non-symmetric kernel and, in particular, by its spectral properties.
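To see why the spectrum of a non-symmetric kernel governs convergence, consider a minimal sketch under assumptions that are ours, not the thesis's: the student's training-set residual r = f - y (with hypothetical targets y) follows the linear flow dr/dt = -K r, and K is a hypothetical non-symmetric matrix built as a positive definite part plus a skew-symmetric part. The residual is r(t) = exp(-K t) r(0), so it vanishes as t grows whenever every eigenvalue of K has positive real part. The specific kernel and targets studied in the thesis are not reproduced here.

```python
import numpy as np

def residual_at(K, r0, t):
    """Residual r(t) = exp(-K t) r0 of the linear flow dr/dt = -K r.

    Computed through an eigendecomposition, so K is assumed diagonalizable
    (true almost surely for a random matrix)."""
    lam, V = np.linalg.eig(K)
    coeffs = np.linalg.solve(V, r0.astype(complex))  # r0 expressed in the eigenbasis
    return (V @ (np.exp(-lam * t) * coeffs)).real

# Hypothetical non-symmetric kernel: symmetric positive definite part plus a skew part.
# For any eigenpair K v = lam v, Re(lam) = v* P v / v* v >= 1, because v* S v is
# purely imaginary when S is skew-symmetric.
rng = np.random.default_rng(1)
n = 5
B = rng.standard_normal((n, n))
S = rng.standard_normal((n, n))
P = B @ B.T + np.eye(n)            # symmetric, eigenvalues >= 1
K = P + 0.5 * (S - S.T)            # non-symmetric

r0 = np.ones(n)
print("min Re(eigenvalue):", np.linalg.eigvals(K).real.min())
for t in (0.0, 1.0, 5.0, 20.0):
    print(t, np.linalg.norm(residual_at(K, r0, t)))
```

The skew-symmetric perturbation makes K non-symmetric without moving the real parts of its eigenvalues below the smallest eigenvalue of P, which is why the printed residual norms shrink toward zero.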
Because of its structure and lack of symmetry, the spectrum of this kernel is difficult to study theoretically. We first apply a block Gershgorin theorem to localize the eigenvalues of the kernel. We then refine this result through an application of the Courant-Fischer theorem, which allows us to compute the intervals in which the eigenvalues lie. Integrated into the study of the dynamics in the NTK framework, this tool yields new results on the convergence of the dynamics of the student model.
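The idea behind eigenvalue localization can be illustrated with the classical (scalar) Gershgorin circle theorem, a simpler relative of the block version used in the thesis: every eigenvalue of a square matrix K lies in the union of the discs centered at K[i, i] with radius equal to the sum of the absolute off-diagonal entries of row i. The sketch below is an assumption-laden illustration, not the thesis's construction; the matrix is a hypothetical random non-symmetric matrix, and the helper names are ours.

```python
import numpy as np

def gershgorin_discs(K):
    """Return (centers, radii) of the row Gershgorin discs of a square matrix K."""
    K = np.asarray(K, dtype=float)
    centers = np.diag(K).copy()
    # Radius of disc i: sum of |K[i, j]| over the off-diagonal entries of row i.
    radii = np.abs(K).sum(axis=1) - np.abs(centers)
    return centers, radii

def in_union_of_discs(z, centers, radii, tol=1e-9):
    """True if the complex number z lies in at least one disc (small numerical tolerance)."""
    return bool(np.any(np.abs(z - centers) <= radii + tol))

# Hypothetical non-symmetric matrix standing in for a kernel (not the thesis's kernel).
rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
K = A @ A.T + 0.3 * rng.standard_normal((n, n))  # PSD part plus a non-symmetric perturbation

centers, radii = gershgorin_discs(K)
eigs = np.linalg.eigvals(K)
print("eigenvalues:", eigs)
print("all inside discs:", all(in_union_of_discs(lam, centers, radii) for lam in eigs))
# For real eigenvalues, the discs also give a crude enclosing interval.
print("enclosing interval:", (centers - radii).min(), (centers + radii).max())
```

Gershgorin bounds are cheap but can be loose; this is the kind of limitation that motivates sharper interval estimates such as those obtained via Courant-Fischer.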
Files
File name | Size |
---|---|
Frontespizio.pdf | 240.21 Kb |
Tesi.pdf | 842.82 Kb |