
ETD

Digital archive of theses discussed at the University of Pisa

Thesis etd-02062026-123950


Thesis type
Master's thesis
Author
BURGISI, MARTINA
URN
etd-02062026-123950
Title
YOLO-based Multimodal Rock Detection for a Lunar Rover with Sim-to-Real Domain Bridging
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
supervisor Prof. Cimino, Mario Giovanni Cosimo Antonio
supervisor Ing. Cowley, Aidan
Keywords
  • autonomous driving
  • autonomous navigation
  • curriculum learning
  • depth
  • EAC
  • ESA
  • European Astronaut Centre
  • European Space Agency
  • hybrid dataset
  • LUNA Analog Facility
  • lunar surface
  • moon
  • moon's surface
  • multimodal
  • object detection
  • obstacle detection
  • planetary exploration
  • rgbd
  • robot
  • robotic
  • rock detection
  • rover
  • sim-to-real
  • space exploration
  • style transfer
  • YOLO
  • YOLO-based
Date of thesis defense
27/02/2026
Availability
Not available
Release date
27/02/2096
Abstract (English)
This work addresses the problem of obstacle detection in lunar scenarios characterized by extreme illumination conditions, investigating the combined role of sim-to-real domain adaptation and multimodal RGB–Depth perception. The study first establishes strong RGB-only baselines trained under controlled illumination, analyzing the impact of dataset composition on generalization performance. Real images, synthetic data, and sim-to-real stylized samples are evaluated to quantify their contribution to detection accuracy. Results show that, while style-transferred synthetic data improves generalization under favorable lighting, RGB-only detectors suffer significant performance degradation under strong illumination variations and in low-visibility scenarios.

To address these limitations, a mid-fusion RGB–Depth detection architecture is proposed, integrating geometric information into a YOLO-based framework. A progressive fine-tuning strategy is introduced to stabilize training and mitigate catastrophic forgetting when extending RGB-pretrained models to multimodal inputs. Extensive experiments demonstrate that the proposed RGB–Depth approach significantly improves detection robustness compared to RGB-only baselines. In particular, depth information enables reliable object detection in scenarios where photometric cues become unreliable, without degrading performance under standard illumination conditions.
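The mid-fusion idea described above can be sketched as follows. This is an illustrative PyTorch toy, not the thesis architecture: the class name `RGBDMidFusion`, the stem widths, and the fusion stage are assumptions, and the final 1×1 convolution stands in for a full YOLO detection head. The key point is that RGB and depth pass through separate modality-specific stems and are fused by channel concatenation at an intermediate feature stage, rather than at the input (early fusion) or at the predictions (late fusion).

```python
# Hedged sketch of mid-fusion RGB-Depth feature extraction (assumed names/sizes).
import torch
import torch.nn as nn


def conv_block(c_in, c_out, stride=2):
    # Conv -> BatchNorm -> SiLU: the basic unit used in YOLO-style backbones.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )


class RGBDMidFusion(nn.Module):
    """Two modality-specific stems; features are concatenated at an
    intermediate stage (mid fusion) and processed by a shared trunk."""

    def __init__(self, fused_ch=64, num_outputs=1):
        super().__init__()
        self.rgb_stem = nn.Sequential(conv_block(3, 16), conv_block(16, 32))
        self.depth_stem = nn.Sequential(conv_block(1, 8), conv_block(8, 32))
        self.trunk = nn.Sequential(
            conv_block(64, fused_ch), conv_block(fused_ch, fused_ch)
        )
        # Stand-in for a detection head (per-cell score map).
        self.head = nn.Conv2d(fused_ch, num_outputs, 1)

    def forward(self, rgb, depth):
        # Channel-wise concatenation of mid-level features: 32 + 32 = 64.
        fused = torch.cat([self.rgb_stem(rgb), self.depth_stem(depth)], dim=1)
        return self.head(self.trunk(fused))


model = RGBDMidFusion()
rgb = torch.randn(2, 3, 64, 64)    # batch of RGB frames
depth = torch.randn(2, 1, 64, 64)  # aligned single-channel depth maps
out = model(rgb, depth)
print(tuple(out.shape))  # → (2, 1, 4, 4): four stride-2 stages give 64/16 = 4
```

A progressive fine-tuning scheme of the kind mentioned above would typically start from RGB-pretrained stem/trunk weights, freeze them while the depth stem is trained, and then unfreeze the full network at a reduced learning rate to limit catastrophic forgetting.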