Thesis etd-02092026-143230
Thesis type
Master's thesis
Author
MANNONI, GIANLUCA
URN
etd-02092026-143230
Title
Curiosity-based Reinforcement Learning
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
INGEGNERIA ROBOTICA E DELL'AUTOMAZIONE
Supervisors
Supervisor: Prof. Buttazzo, Giorgio C.
Keywords
- artificial intelligence
- control
- curiosity-based reinforcement learning
- manipulation
- navigation
- reinforcement learning
- sparse reward
Defence session start date
24/02/2026
Availability
Full
Abstract (English)
This thesis first surveys the main Reinforcement Learning methods, highlighting their principal limitations and their historical evolution. It then analyzes in detail the sparse reward problem, in which the current policy receives too few updates because the environment provides feedback only rarely. To address this issue, several curiosity-based methods from the literature were examined; two of them were implemented and experimentally compared against the selected Reinforcement Learning algorithm on applications characterized by sparse rewards and increasing levels of complexity.
Three types of task were considered: the control of an inverted pendulum, a manipulation task based on visual observations only, and a navigation task among obstacles that combines visual observations with other sensory modalities. In the first experiment, integrating the curiosity modules brought no significant benefit while incurring a higher computational cost. In the second experiment, by contrast, the contribution of the modules was more evident, and curiosity-based methods proved decisive for improving performance in that application scenario.
In the final experiment, introducing multimodal observations markedly degraded the performance of one of the two methods, mainly because of that module's higher sensitivity to visual observations. This behavior can be attributed to a component that is fixed at the beginning of training and therefore cannot adapt the relative importance of the received observations. The second method, conversely, updates the feature extractor inside the model, avoiding this limitation and again delivering superior performance with respect to the other evaluated methods.
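The "component fixed at the beginning of training" described above suggests a curiosity bonus built on a fixed, randomly initialised feature network, as in prediction-error methods of the Random Network Distillation family. The sketch below illustrates that idea only; it is not the thesis's implementation, and all names (`W_target`, `intrinsic_reward`, the linear networks, the dimensions) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, FEAT_DIM, LR = 8, 4, 1e-2

# Fixed, randomly initialised target network: its weights never change,
# mirroring the component that is frozen at the start of training.
W_target = rng.normal(size=(OBS_DIM, FEAT_DIM))

# Trainable predictor network, updated to imitate the target's features.
W_pred = np.zeros((OBS_DIM, FEAT_DIM))

def intrinsic_reward(obs):
    """Prediction error on the fixed features: high for novel observations."""
    err = obs @ W_pred - obs @ W_target
    return float((err ** 2).mean())

def update_predictor(obs):
    """One gradient step reducing the prediction error for this observation."""
    global W_pred
    err = obs @ W_pred - obs @ W_target          # shape (FEAT_DIM,)
    W_pred -= LR * np.outer(obs, err) * 2 / FEAT_DIM

obs = rng.normal(size=OBS_DIM)
r_before = intrinsic_reward(obs)
for _ in range(200):
    update_predictor(obs)
r_after = intrinsic_reward(obs)
# The bonus for a repeatedly visited observation decays toward zero,
# so exploration pressure concentrates on novel states.
```

Because the target features here are a frozen linear map of the raw observation, every input dimension contributes with a weight drawn once at initialisation; this is the limitation the abstract attributes to the first method, which the second method avoids by training its feature extractor.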
File
| File name | Size |
|---|---|
| Tesi_Mag...nnoni.pdf | 23.61 MB |