
ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-01192026-144120


Thesis type
PhD thesis
Author
COLEMAN, ERIC NUERTEY
URN
etd-01192026-144120
Title
Beyond Static Models: Enabling Continual Parameter Efficient Finetuning for Large Foundation Models
Scientific-disciplinary sector
INF/01 - Computer Science
Degree programme
Computer Science
Supervisors
Tutor: Prof. Bacciu, Davide
Supervisor: Prof. Lomonaco, Vincenzo
Keywords
  • computer vision
  • continual learning
  • machine learning
Defense session date
27/01/2026
Availability
Full
Abstract
Large-scale pre-trained models have become the backbone of modern computer vision, yet their deployment in dynamic environments remains constrained by computational limitations and catastrophic forgetting. While these models demonstrate remarkable capabilities on individual tasks, adapting them to continuously evolving visual domains requires prohibitively expensive retraining or leads to severe performance degradation on previously learned tasks. Traditional fine-tuning approaches consume extensive resources, making continual adaptation impractical for most practitioners and environmentally unsustainable.

This thesis addresses the fundamental challenge of enabling efficient continual learning for large vision models through novel parameter-efficient approaches. The work begins with a comprehensive survey that establishes the theoretical foundations and identifies the critical need for parameter-efficient continual fine-tuning (PECFT) as a distinct research area. Early investigations into interference patterns in large language models reveal the phenomenon of in-context interference, providing crucial insights that inform subsequent methodological developments.

Building on these foundations, the thesis introduces three complementary methodologies that maintain performance while dramatically reducing computational requirements. First, an adaptive LoRA merging technique is developed that dynamically computes optimal combination weights for different visual domains, eliminating the need for manual hyperparameter tuning while achieving superior adaptation performance. Second, Hierarchical Adapters Merging (HAM) is presented, a framework that organizes learned adaptations into similarity-based groups, enabling efficient scaling to long sequences of visual tasks while maintaining a fixed parameter budget. Third, GRAD-BEN (Gradient Aligned Distillation and Beta Ensembling) extends these principles to challenging multimodal scenarios, specifically Few-Shot Domain-Incremental Learning. GRAD-BEN leverages vision-language models and integrates multi-modal prompting, gradient-aligned distillation, and Beta-based temporal ensembling to enable robust adaptation under severe domain shifts with minimal supervision, without relying on memory buffers or task identifiers.
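The core idea behind adapter merging can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the thesis's actual algorithm: the softmax-weighted combination of low-rank updates stands in for the adaptive weight computation, which the thesis derives dynamically per visual domain.

```python
import numpy as np

def merge_lora_deltas(deltas, scores):
    """Merge several LoRA updates into one weight delta.

    deltas: list of (A, B) low-rank factor pairs, one per adapter;
            each adapter's full-rank update is B @ A.
    scores: per-adapter relevance scores (hypothetical placeholder
            for the adaptively computed combination weights).
    """
    # Softmax over scores so the combination weights sum to 1.
    weights = np.exp(scores - np.max(scores))
    weights /= weights.sum()
    # Weighted sum of the full-rank updates Delta_i = B_i @ A_i.
    return sum(w * (B @ A) for w, (A, B) in zip(weights, deltas))

# Toy usage: two rank-2 adapters for a 4x4 weight matrix.
rng = np.random.default_rng(0)
deltas = [(rng.normal(size=(2, 4)), rng.normal(size=(4, 2)))
          for _ in range(2)]
merged = merge_lora_deltas(deltas, np.array([1.0, 0.5]))
print(merged.shape)  # (4, 4)
```

Because each update stays low-rank until merge time, the parameter budget grows with the adapter rank rather than the full model size, which is what makes grouping and merging strategies such as HAM scalable to long task sequences.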

Through comprehensive evaluation on standard computer vision benchmarks including CIFAR-100, CUB-200, Tiny-ImageNet, DomainNet, CORe50, and CDDB-Hard, this work demonstrates state-of-the-art performance in long-sequence continual learning scenarios and resource-constrained multimodal settings. The proposed methods achieve accuracy comparable or superior to full fine-tuning while requiring only a fraction of the computational resources. These contributions advance the field toward practical continual learning systems capable of adapting to the dynamic nature of real-world computer vision applications while maintaining computational sustainability.