Thesis etd-06302025-110805
Thesis type
Master's degree thesis
Author
NAWAZ, NIMRA
URN
etd-06302025-110805
Title
BrainWash Adversarial Attack on Continual Learning to Maximize Forgetting
Department
COMPUTER SCIENCE
Degree programme
DATA SCIENCE AND BUSINESS INFORMATICS
Supervisors
advisor Bacciu, Davide
supervisor Carta, Antonio
Keywords
- adversarial attack
- black-box attacks
- brainwash
- catastrophic forgetting
- continual learning
- data poisoning
Defence session start date
18/07/2025
Availability
Full
Abstract
In this work, I implemented BrainWash, a poisoning attack that targets continual learning systems. The attack poisons a task so that, once the task is learned, the system forgets previously learned tasks more severely. The original paper evaluated the attack on regularization-based continual learning frameworks; I broadened the evaluation to include replay-based methods, specifically Experience Replay (ER) and Experience Replay with Asymmetric Cross-Entropy (ER-ACE). These methods mitigate forgetting by retaining a limited buffer of past examples and replaying them during training; they are computationally efficient and preferred in practice because of their small memory footprint, their suitability for on-device learning, and the privacy benefit of not storing large amounts of data. A minimal sketch of the replay mechanism is shown below.
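As a rough illustration of the replay mechanism discussed above, the following is a minimal ER-style training step in PyTorch. The buffer capacity, reservoir sampling scheme, replay batch size, and the `model`, `optimizer`, and `criterion` objects are illustrative assumptions, not the exact configuration used in the thesis.

```python
import random
import torch

class ReplayBuffer:
    """Fixed-size memory of past examples, filled with reservoir sampling."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []   # list of (x, y) example tensors
        self.seen = 0    # total examples observed so far

    def add(self, x, y):
        for xi, yi in zip(x, y):
            self.seen += 1
            if len(self.data) < self.capacity:
                self.data.append((xi, yi))
            else:
                j = random.randrange(self.seen)
                if j < self.capacity:
                    self.data[j] = (xi, yi)

    def sample(self, batch_size):
        batch = random.sample(self.data, min(batch_size, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)

def er_training_step(model, optimizer, criterion, buffer, x_new, y_new, replay_bs=32):
    """One ER step: train on the current batch plus a batch replayed from memory."""
    optimizer.zero_grad()
    loss = criterion(model(x_new), y_new)
    if len(buffer.data) > 0:
        x_mem, y_mem = buffer.sample(replay_bs)
        loss = loss + criterion(model(x_mem), y_mem)
    loss.backward()
    optimizer.step()
    buffer.add(x_new, y_new)
    return loss.item()
```

ER-ACE keeps this replay mechanism but changes how the loss on the current batch is computed (an asymmetric cross-entropy that restricts which classes' logits compete with the incoming samples), which reduces interference with the replayed classes.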
My main objective is to present a data-poisoning approach that specifically induces forgetting in continual learning models. Using the BrainWash attack, I show that across a broad range of continual learning baselines, including ER and ER-ACE, the adversarial noise achieves its forgetting goal even against well-established methods. The results indicate that, given an appropriate noise budget, a trained continual learner catastrophically forgets previously learned tasks. These findings highlight underlying weaknesses in current replay-based strategies.
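The degree of forgetting in experiments of this kind is typically summarized from a task-accuracy matrix. The sketch below computes the standard average-forgetting measure (the drop from each task's best past accuracy to its accuracy after the final task); the variable names and the example numbers are purely illustrative.

```python
import numpy as np

def average_forgetting(acc):
    """acc[i, j] = accuracy on task j after training on task i (T x T matrix).

    Forgetting for task j is the gap between the best accuracy it ever
    reached and its accuracy after the last task was learned.
    """
    acc = np.asarray(acc)
    T = acc.shape[0]
    per_task = [acc[:T - 1, j].max() - acc[T - 1, j] for j in range(T - 1)]
    return float(np.mean(per_task))

# Illustrative numbers only: 3 tasks, task 0 and 1 degrade after the final (poisoned) task.
acc = [[0.90, 0.00, 0.00],
       [0.85, 0.88, 0.00],
       [0.40, 0.40, 0.87]]
print(average_forgetting(acc))  # (0.90-0.40 + 0.88-0.40)/2 = 0.49
```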
Furthermore, I extended the original work, which assumes the attacker has access to the victim's model (white-box setting), to black-box attack settings, where the attacker has no direct knowledge of the victim model's architecture or parameters. The attacker trains an independent surrogate model and runs the BrainWash noise optimization on it; when the victim model later learns a task containing noise optimized on this surrogate, it still forgets the previously learned tasks.
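The black-box pipeline can be sketched roughly as follows. This is a highly simplified, single-inner-step approximation of the idea, not the actual BrainWash algorithm from the paper; the function name, the budget and step-size parameters, and the use of `torch.func.functional_call` are assumptions made for illustration only.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def brainwash_like_noise(surrogate, x_new, y_new, x_old, y_old,
                         eps=8 / 255, steps=50, inner_lr=0.1, noise_lr=1e-2):
    """Schematic black-box poisoning loop (one-step inner update approximation).

    The noise `delta` on the new task's images is optimized so that, after the
    surrogate takes one simulated gradient step on the poisoned batch, its loss
    on held-out old-task examples increases, i.e. forgetting is encouraged.
    The poisoned batch x_new + delta is then handed to the (unseen) victim.
    """
    delta = torch.zeros_like(x_new, requires_grad=True)
    names, params = zip(*[(n, p) for n, p in surrogate.named_parameters()
                          if p.requires_grad])

    for _ in range(steps):
        # Inner step: simulate the learner taking one SGD step on the poisoned batch.
        inner_loss = F.cross_entropy(surrogate(x_new + delta), y_new)
        grads = torch.autograd.grad(inner_loss, params, create_graph=True)
        updated = {n: p - inner_lr * g for n, p, g in zip(names, params, grads)}

        # Outer objective: after that update, how badly does the model do on old tasks?
        old_logits = functional_call(surrogate, updated, (x_old,))
        old_loss = F.cross_entropy(old_logits, y_old)

        # Gradient *ascent* on the old-task loss w.r.t. the noise, then project to the budget.
        noise_grad, = torch.autograd.grad(old_loss, delta)
        with torch.no_grad():
            delta += noise_lr * noise_grad.sign()
            delta.clamp_(-eps, eps)
    return delta.detach()
```

In the black-box setting the surrogate stands in for the victim throughout this loop: the noise is never optimized against the victim itself, and the attack succeeds to the extent that the perturbation transfers.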
File
File name | Size
---|---
Nimra_Th...Final.pdf | 2.73 MB