Supervisor: Cimino, Mario Giovanni Cosimo Antonio
Supervisor: Russo, Alessandra
Keywords
Applied Logic
Isaac Simulator
Nav2
Neuro-Symbolic Approach
Python
Reinforcement Learning
Reward Machine
Robotics
ROS2
Unitree
WebRTC
YOLO
Defense date
21/02/2025
Availability
Not available
Release date
21/02/2065
Abstract
In my thesis, I developed and implemented a novel algorithm for neuro-symbolic reinforcement learning, focusing on simulation-based robotic tasks in a complex environment. Specifically, I used the Isaac simulation environment to train a Unitree Go2 robot to perform tasks characterized by non-Markovian properties. The primary objective was to combine symbolic reasoning with reinforcement learning so that the robot could handle sequential decision-making and adapt to varying environmental conditions while improving its overall efficiency and accuracy.

The research began with the design of a simulated environment that mimics a room containing four distinct boxes, two labeled "dangerous" and two labeled "safe". This environment served as the foundation for testing and validating the algorithm. To create a realistic scenario, additional environmental factors, such as varying lighting conditions, were introduced; these elements increased the complexity of the environment and tested the robot's ability to adapt under dynamic conditions.

The robot's first task was to construct a detailed map of its surroundings using a SLAM (Simultaneous Localization and Mapping) framework (a launch sketch is given after the abstract). SLAM enabled the robot to identify spatial features and localize itself within the environment, which was essential for subsequent navigation. The mapping process was carried out iteratively, allowing the robot to refine its understanding of the environment as it explored new areas and encountered new obstacles.

Once the environment was mapped, the robot used the Nav2 framework to autonomously compute optimal paths between the boxes (see the navigation sketch below). Nav2 provided robust path-planning capabilities, allowing the robot to move efficiently and adapt to dynamic changes within the environment. The navigation process was non-deterministic: the robot was instructed to approach the boxes in random order, which added an element of unpredictability and ensured that the algorithm was tested under diverse conditions.

When the robot came within proximity of a box, it had to identify the box's type, either safe or dangerous. If the box was deemed dangerous, the robot computed a probability associated with its danger level and recorded the event in a sequence referred to as the trace (see the detection sketch below). The trace played a crucial role in the learning process: it captured the robot's interactions with the environment, including the identification and categorization of boxes.

The trace was then used by a Noisy Learner algorithm to robustly train a Reward Machine, a symbolic representation of the task designed to generalize the robot's behavior and decision-making beyond the specific scenarios encountered during training (an illustrative machine is sketched below). By integrating symbolic reasoning into the learning process, the algorithm aimed to enhance the robot's ability to adapt to new and complex tasks, even when faced with partial or noisy information. This approach also allowed the system to account for long-term dependencies in sequential decision-making tasks, a well-known challenge for traditional reinforcement learning methods.

Finally, the algorithm and the Reward Machine were tested in a real-world setting on the physical robot. The simulated environment was recreated in the physical world with careful attention to detail, ensuring fidelity between simulation and reality. The algorithm executed successfully, demonstrating its practical applicability and robustness in a real-world scenario. This step not only validated the effectiveness of the proposed approach but also highlighted its potential for future applications in fields such as industrial automation, rescue operations, and autonomous exploration.
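
The abstract does not name the SLAM implementation, so the following is a minimal sketch of how the mapping step could be launched, assuming the commonly used slam_toolbox package for ROS2; the frame names and parameter values are illustrative, not the thesis's actual configuration.

from launch import LaunchDescription
from launch_ros.actions import Node

def generate_launch_description():
    # Start the asynchronous SLAM node; the robot drives around while the
    # map is refined incrementally, matching the iterative mapping step.
    return LaunchDescription([
        Node(
            package="slam_toolbox",
            executable="async_slam_toolbox_node",
            name="slam_toolbox",
            output="screen",
            parameters=[{
                "use_sim_time": True,   # clock comes from the Isaac simulation
                "odom_frame": "odom",
                "map_frame": "map",
                "base_frame": "base_link",
            }],
        ),
    ])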
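The random-order navigation loop could look like the sketch below, which assumes Nav2's Python simple-commander API (BasicNavigator); the box names and map-frame coordinates are hypothetical placeholders.

import random
import rclpy
from geometry_msgs.msg import PoseStamped
from nav2_simple_commander.robot_navigator import BasicNavigator, TaskResult

# Hypothetical map-frame coordinates of the four boxes.
BOX_POSES = {
    "box_1": (1.0, 2.0),
    "box_2": (3.5, 0.5),
    "box_3": (-1.0, 1.5),
    "box_4": (2.0, -2.0),
}

def make_pose(navigator, x, y):
    # Build a map-frame goal pose; a fixed heading is enough for this task.
    pose = PoseStamped()
    pose.header.frame_id = "map"
    pose.header.stamp = navigator.get_clock().now().to_msg()
    pose.pose.position.x = x
    pose.pose.position.y = y
    pose.pose.orientation.w = 1.0
    return pose

def main():
    rclpy.init()
    navigator = BasicNavigator()
    navigator.waitUntilNav2Active()  # wait for localization and planners

    # Non-deterministic task order: visit the boxes in a random sequence.
    for name in random.sample(list(BOX_POSES), k=len(BOX_POSES)):
        x, y = BOX_POSES[name]
        navigator.goToPose(make_pose(navigator, x, y))
        while not navigator.isTaskComplete():
            navigator.getFeedback()  # isTaskComplete() spins the node internally
        if navigator.getResult() == TaskResult.SUCCEEDED:
            print(f"Reached {name}; ready to classify it.")

    rclpy.shutdown()

if __name__ == "__main__":
    main()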
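The detection step might be implemented as in the sketch below, assuming a YOLO model fine-tuned on two hypothetical classes, safe_box and dangerous_box; the weights file and the (label, probability) trace encoding are illustrative, not the thesis's exact artifacts.

from ultralytics import YOLO

model = YOLO("box_detector.pt")  # hypothetical fine-tuned weights

trace = []  # the event sequence later consumed by the Noisy Learner

def classify_box(frame):
    # Run YOLO on a camera frame and append a labeled event to the trace.
    result = model(frame)[0]
    if len(result.boxes) == 0:
        return  # nothing detected in this frame
    # Take the most confident detection currently in view.
    best = max(result.boxes, key=lambda b: float(b.conf))
    label = result.names[int(best.cls)]
    confidence = float(best.conf)
    if label == "dangerous_box":
        # Record the event together with its associated danger probability.
        trace.append(("dangerous", confidence))
    else:
        trace.append(("safe", confidence))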
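A Reward Machine can be represented as a finite-state machine whose transitions fire on symbolic events and emit rewards. The toy machine below is written from the abstract's description alone; its states, events, and reward values are assumptions, not the machine actually learned by the Noisy Learner.

class RewardMachine:
    def __init__(self, transitions, initial_state):
        # transitions: (state, event) -> (next_state, reward)
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        # Advance on a symbolic event; unknown events leave the state unchanged.
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward

# Toy machine: reward visiting a safe box, penalize a dangerous one,
# and give a larger bonus once both safe boxes have been visited.
rm = RewardMachine(
    transitions={
        ("u0", "safe"): ("u1", 1.0),
        ("u0", "dangerous"): ("u0", -1.0),
        ("u1", "safe"): ("u2", 2.0),
        ("u1", "dangerous"): ("u1", -1.0),
    },
    initial_state="u0",
)

for event, _prob in [("safe", 0.9), ("dangerous", 0.8), ("safe", 0.95)]:
    print(event, "->", rm.step(event))

Keeping the machine as a plain transition table is one natural design choice here: it lets a learner propose, score, and revise candidate machines against noisy traces without touching the underlying reinforcement-learning loop.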