Thesis etd-06112025-171234
Thesis type
Master's thesis
Author
RUSSO, FEDERICA
URN
etd-06112025-171234
Title
Reinforcement Learning algorithms for obstacle avoidance of an autonomous vehicle
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
INGEGNERIA ROBOTICA E DELL'AUTOMAZIONE
Supervisors
Supervisor: Prof. Buttazzo, Giorgio C.
Supervisor: Ing. Nesti, Federico
Keywords
- autonomous driving
- obstacle avoidance
- Reinforcement Learning
Defense session start date
18/07/2025
Availability
Full
Abstract
Autonomous vehicles are an advanced and rapidly evolving field of research in robotics and artificial intelligence. In particular, obstacle avoidance is one of the most frequently addressed problems in autonomous-vehicle navigation. Among the possible approaches, alongside traditional methods, is Reinforcement Learning (RL). RL algorithms have the advantage of moving a vehicle without a priori knowledge of maps of the environment. This thesis explores the use of Reinforcement Learning algorithms to give a four-wheeled rover, equipped with a LiDAR sensor, the ability to avoid obstacles autonomously. The rover was modelled as a unicycle, and three different Reinforcement Learning algorithms were implemented: Q-learning, Deep Q-learning (DQN) and Deep Deterministic Policy Gradient (DDPG). The aim of the thesis is twofold: on the one hand, to show that these algorithms enable the vehicle to avoid obstacles; on the other, to carry out comparative analyses of the RL algorithms, highlighting their performance differences. The RL agents were trained on numerous randomly generated tracks, each defined by a start point, an end point and four obstacles randomly placed along the track. The objective of the training phase was to develop algorithms that reach the end point of a track in the presence of obstacles. To verify the success of training and evaluate each algorithm, 100 test tracks were created, and several metrics were used to quantify performance: the success rate (i.e. reaching the end point), the minimum distance of the rover from the obstacles and from the track along its path, the average angular velocity as an indicator of control effort, and the length of the trajectory travelled by the vehicle relative to the length of the minimum-length trajectory. The minimum-length trajectory was computed with an optimisation framework that produced better results than the A* algorithm. The results show a success rate of 98% for the Q-learning agent, 99% for DQN and 92% for DDPG; Q-learning is the agent that tends to move closest to obstacles, while DDPG requires the least control effort. There are no significant differences among the three agents in terms of the length of the trajectory travelled by the rover. Finally, the RL agents were tested experimentally on an Agilex Scout Mini rover to verify, on real hardware, the behaviour of the algorithms extensively tested in simulation. The experimental tests, carried out in a controlled environment, showed a good response of the algorithms both in the absence and in the presence of obstacles.
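The abstract states that the rover was modelled as a unicycle. For reference, below is a minimal Python sketch of the standard unicycle kinematics; the Euler integration step, the speeds, and the function name are illustrative assumptions, not details taken from the thesis.

```python
import math

def unicycle_step(x, y, theta, v, omega, dt):
    """One Euler step of the unicycle model:
    dx/dt = v*cos(theta), dy/dt = v*sin(theta), dtheta/dt = omega."""
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)

# Example: rover driving at 0.5 m/s while turning at 0.2 rad/s for 1 s.
state = (0.0, 0.0, 0.0)
for _ in range(100):          # 100 steps of dt = 0.01 s
    state = unicycle_step(*state, v=0.5, omega=0.2, dt=0.01)
print(state)
```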
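Of the three agents, tabular Q-learning is the simplest to summarise. The sketch below shows the standard Q-learning update with an epsilon-greedy policy; the state count, action count and hyperparameters are hypothetical, since the abstract does not specify how LiDAR readings were discretised.

```python
import numpy as np

# Hypothetical discretization: sizes below are illustrative only.
n_states, n_actions = 1000, 5           # e.g. binned LiDAR scans x 5 steering commands
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount factor, exploration rate

Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def select_action(s):
    """Epsilon-greedy action selection over the Q-table."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def q_update(s, a, r, s_next):
    """Tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

# Illustrative single transition: a negative reward for approaching an obstacle.
a = select_action(0)
q_update(s=0, a=a, r=-1.0, s_next=1)
```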
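The evaluation metrics named in the abstract (success rate, minimum obstacle distance, average angular velocity, trajectory-length ratio) can all be computed from a logged trajectory. The following sketch illustrates one plausible per-episode computation; the data layout, the goal_radius threshold and the shortest_len input are assumptions for illustration, not the thesis's actual evaluation code.

```python
import numpy as np

def episode_metrics(trajectory, obstacles, goal, omegas, shortest_len, goal_radius=0.3):
    """Per-episode versions of the metrics listed in the abstract.
    trajectory: (T, 2) rover positions; obstacles: (K, 2) obstacle centres;
    omegas: (T,) commanded angular velocities; shortest_len: length of the
    minimum-length trajectory, taken here as a given input."""
    traj = np.asarray(trajectory, dtype=float)
    obs = np.asarray(obstacles, dtype=float)
    # Success: final position within goal_radius of the goal point.
    success = bool(np.linalg.norm(traj[-1] - np.asarray(goal)) < goal_radius)
    # Minimum rover-obstacle distance over the whole run.
    min_obstacle_dist = float(
        np.linalg.norm(traj[:, None, :] - obs[None, :, :], axis=-1).min())
    # Mean |omega| as a proxy for control effort.
    mean_abs_omega = float(np.mean(np.abs(omegas)))
    # Length of the driven path relative to the minimum-length trajectory.
    driven_len = float(np.linalg.norm(np.diff(traj, axis=0), axis=-1).sum())
    return success, min_obstacle_dist, mean_abs_omega, driven_len / shortest_len
```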
File
File name | Size
---|---
Tesi_Fed...Russo.pdf | 6.56 MB