Thesis etd-02092026-143230
Thesis type
Master's thesis
Author
MANNONI, GIANLUCA
URN
etd-02092026-143230
Title
Curiosity-based Reinforcement Learning
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
INGEGNERIA ROBOTICA E DELL'AUTOMAZIONE
Supervisors
Supervisor: Prof. Buttazzo, Giorgio C.
Keywords
- artificial intelligence
- control
- curiosity-based reinforcement learning
- manipulation
- navigation
- reinforcement learning
- sparse reward
Defence session start date
24/02/2026
Availability
Full
Abstract (English)
This thesis first surveys the main Reinforcement Learning methods, highlighting their principal limitations and their historical evolution. It then analyzes in detail the sparse reward problem, in which the current policy receives too few updates because the environment provides feedback only rarely. To address this issue, several curiosity-based methods from the literature were examined; two of them were implemented and experimentally compared against the selected Reinforcement Learning algorithm on applications characterized by sparse rewards and increasing levels of complexity.
Three types of task were considered: the control of an inverted pendulum, a manipulation task based on visual observations only, and a navigation task among obstacles that combines visual observations with other sensory modalities. In the first experiment, integrating the curiosity modules brought no significant benefit while incurring a higher computational cost. In the second experiment, by contrast, the contribution of the modules was more evident, and curiosity-based methods proved decisive for improving performance in that application scenario.
In the final experiment, introducing multimodal observations markedly degraded the performance of one of the two methods, mainly because of that module's higher sensitivity to visual observations. This behavior can be attributed to a component that is fixed at the beginning of training and therefore cannot adapt the relative importance of the received observations. The second method, conversely, updates the feature extractor inside the model, avoiding this limitation and again delivering superior performance with respect to the other evaluated methods.
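The "component fixed at the beginning of training" described above suggests a curiosity bonus built on a fixed, randomly initialised feature network, as in prediction-error methods of the Random Network Distillation family. The sketch below illustrates that idea only; it is not the thesis's implementation, and all names (`W_target`, `intrinsic_reward`, the linear networks, the dimensions) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, FEAT_DIM, LR = 8, 4, 1e-2

# Fixed, randomly initialised target network: its weights never change,
# mirroring the component that is frozen at the start of training.
W_target = rng.normal(size=(OBS_DIM, FEAT_DIM))

# Trainable predictor network, updated to imitate the target's features.
W_pred = np.zeros((OBS_DIM, FEAT_DIM))

def intrinsic_reward(obs):
    """Prediction error on the fixed features: high for novel observations."""
    err = obs @ W_pred - obs @ W_target
    return float((err ** 2).mean())

def update_predictor(obs):
    """One gradient step reducing the prediction error for this observation."""
    global W_pred
    err = obs @ W_pred - obs @ W_target          # shape (FEAT_DIM,)
    W_pred -= LR * np.outer(obs, err) * 2 / FEAT_DIM

obs = rng.normal(size=OBS_DIM)
r_before = intrinsic_reward(obs)
for _ in range(200):
    update_predictor(obs)
r_after = intrinsic_reward(obs)
# The bonus for a repeatedly visited observation decays toward zero,
# so exploration pressure concentrates on novel states.
```

Because the target features here are a frozen linear map of the raw observation, every input dimension contributes with a weight drawn once at initialisation; this is the limitation the abstract attributes to the first method, which the second method avoids by training its feature extractor.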
File
| File name | Size |
|---|---|
| Tesi_Mag...nnoni.pdf | 23.61 MB |