
ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-01192026-144120


Thesis type
PhD thesis
Author
COLEMAN, ERIC NUERTEY
URN
etd-01192026-144120
Title
Beyond Static Models: Enabling Continual Parameter Efficient Finetuning for Large Foundation Models
Scientific-disciplinary sector
INF/01 - Computer Science
Degree programme
Computer Science
Supervisors
Tutor: Prof. Bacciu, Davide
Supervisor: Prof. Lomonaco, Vincenzo
Keywords
  • computer vision
  • continual learning
  • machine learning
Defense session date
27/01/2026
Availability
Full
Abstract
Large-scale pre-trained models have become the backbone of modern computer vision, yet their deployment in dynamic environments remains constrained by computational limitations and catastrophic forgetting. While these models demonstrate remarkable capabilities on individual tasks, adapting them to continuously evolving visual domains requires prohibitively expensive retraining or leads to severe performance degradation on previously learned tasks. Traditional fine-tuning approaches consume extensive resources, making continual adaptation impractical for most practitioners and environmentally unsustainable.

This thesis addresses the fundamental challenge of enabling efficient continual learning for large vision models through novel parameter-efficient approaches. The work begins with a comprehensive survey that establishes the theoretical foundations and identifies the critical need for parameter-efficient continual fine-tuning (PECFT) as a distinct research area. Early investigations into interference patterns in large language models reveal the phenomenon of in-context interference, providing crucial insights that inform subsequent methodological developments.

Building on these foundations, the thesis introduces three complementary methodologies that maintain performance while dramatically reducing computational requirements. First, an adaptive LoRA merging technique is developed that dynamically computes optimal combination weights for different visual domains, eliminating the need for manual hyperparameter tuning while achieving superior adaptation performance. Second, Hierarchical Adapters Merging (HAM) is presented, a framework that organizes learned adaptations into similarity-based groups, enabling efficient scaling to long sequences of visual tasks while maintaining a fixed parameter budget. Third, GRAD-BEN (Gradient Aligned Distillation and Beta Ensembling) extends these principles to challenging multimodal scenarios, specifically Few-Shot Domain-Incremental Learning. GRAD-BEN leverages vision-language models and integrates multi-modal prompting, gradient-aligned distillation, and Beta-based temporal ensembling to enable robust adaptation under severe domain shifts with minimal supervision, without relying on memory buffers or task identifiers.
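The core idea behind adapter merging can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the thesis's actual algorithm: the softmax-weighted combination of low-rank updates stands in for the adaptive weight computation, which the thesis derives dynamically per visual domain.

```python
import numpy as np

def merge_lora_deltas(deltas, scores):
    """Merge several LoRA updates into one weight delta.

    deltas: list of (A, B) low-rank factor pairs, one per adapter;
            each adapter's full-rank update is B @ A.
    scores: per-adapter relevance scores (hypothetical placeholder
            for the adaptively computed combination weights).
    """
    # Softmax over scores so the combination weights sum to 1.
    weights = np.exp(scores - np.max(scores))
    weights /= weights.sum()
    # Weighted sum of the full-rank updates Delta_i = B_i @ A_i.
    return sum(w * (B @ A) for w, (A, B) in zip(weights, deltas))

# Toy usage: two rank-2 adapters for a 4x4 weight matrix.
rng = np.random.default_rng(0)
deltas = [(rng.normal(size=(2, 4)), rng.normal(size=(4, 2)))
          for _ in range(2)]
merged = merge_lora_deltas(deltas, np.array([1.0, 0.5]))
print(merged.shape)  # (4, 4)
```

Because each update stays low-rank until merge time, the parameter budget grows with the adapter rank rather than the full model size, which is what makes grouping and merging strategies such as HAM scalable to long task sequences.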

Through comprehensive evaluation on standard computer vision benchmarks including CIFAR-100, CUB-200, Tiny-ImageNet, DomainNet, CORe50, and CDDB-Hard, this work demonstrates state-of-the-art performance in long-sequence continual learning scenarios and resource-constrained multimodal settings. The proposed methods achieve accuracy comparable or superior to full fine-tuning while requiring only a fraction of the computational resources. These contributions advance the field toward practical continual learning systems capable of adapting to the dynamic nature of real-world computer vision applications while maintaining computational sustainability.