Thesis etd-03162026-101954
Thesis type
Master's degree thesis
Author
DRAGONI, ENEA
URN
etd-03162026-101954
Title
pAirception: Language-Guided Landing Site Detection in Unstructured Environments
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
INGEGNERIA ROBOTICA E DELL'AUTOMAZIONE
Supervisors
supervisor Avizzano, Carlo Alberto
co-supervisor D'Avella, Salvatore
tutor Busam, Benjamin
Keywords
- 3D point clouds
- autonomous landing
- human-robot interaction
- large language models
- neuro-symbolic artificial intelligence
- semantic segmentation
- unmanned aerial vehicles
- vision-language models
Defence session start date
10/04/2026
Availability
Not available for consultation
Release date
10/04/2029
Abstract (English)
The autonomous deployment of Unmanned Aerial Vehicles (UAVs) in unstructured environments requires a delicate balance between high-level semantic scene understanding and strict physical safety guarantees. Traditional geometry-based landing systems fail to distinguish between structurally flat terrain and hazardous materials (e.g., bodies of water or fragile glass roofs), while purely deep-learning approaches lack the deterministic spatial certainty required for critical flight operations. Furthermore, standard graphical human-robot interfaces often prove too rigid and demanding for pilots during high-stress emergency scenarios. To address these fundamental limitations, this thesis presents pAirception, a novel hybrid neuro-symbolic framework for autonomous UAV landing and language-driven pilot guidance.
The proposed architecture tightly couples the probabilistic reasoning capabilities of modern Foundation Models with the deterministic safety of 3D geometric processing. A 4-bit quantized Large Language Model (Phi-3-Mini) acts as an interactive reasoning agent, translating unstructured natural language commands into structured robotic actions. To establish semantic awareness, a Vision-Language Model (Qwen2-VL) autonomously identifies safe environmental priors, which condition a zero-shot Promptable Segmentation model (SAM3) to extract dense 2D Regions of Interest. These semantic hypotheses are subsequently projected into 3D space and rigorously validated by a high-frequency C++ geometric core. Utilizing temporal ego-motion deskewing, robust planar extraction via RANSAC, and constant-time spatial queries via Integral Images, the system deterministically evaluates the physical slope, surface roughness, and obstacle clearance of the targeted terrain. Validated landing spots are then ranked dynamically and presented to the human operator through an intuitive Augmented Telemetry overlay.
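To illustrate the constant-time spatial queries mentioned above, the following is a minimal C++ sketch (not the thesis code) of how integral images over a rasterized terrain height map can provide O(1) window statistics such as surface roughness; the `IntegralGrid` type, the grid layout, and all parameters are illustrative assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal sketch: integral images over a rasterized height map, giving O(1)
// mean / roughness (std. deviation) queries for any candidate landing window.
// Grid layout and type names are assumptions, not the thesis implementation.
struct IntegralGrid {
    std::size_t W, H;                 // grid width and height in cells
    std::vector<double> sum, sumSq;   // (W+1) x (H+1) prefix sums of z and z^2

    IntegralGrid(const std::vector<double>& z, std::size_t width, std::size_t height)
        : W(width), H(height),
          sum((width + 1) * (height + 1), 0.0),
          sumSq((width + 1) * (height + 1), 0.0) {
        for (std::size_t y = 0; y < H; ++y)
            for (std::size_t x = 0; x < W; ++x) {
                const double v = z[y * W + x];
                const std::size_t i = (y + 1) * (W + 1) + (x + 1);
                sum[i]   = v     + sum[i - 1]   + sum[i - (W + 1)]   - sum[i - (W + 2)];
                sumSq[i] = v * v + sumSq[i - 1] + sumSq[i - (W + 1)] - sumSq[i - (W + 2)];
            }
    }

    // Mean and standard deviation of height over cells [x0,x1) x [y0,y1), in O(1).
    void stats(std::size_t x0, std::size_t y0, std::size_t x1, std::size_t y1,
               double& mean, double& stddev) const {
        const auto box = [&](const std::vector<double>& t) {
            return t[y1 * (W + 1) + x1] - t[y0 * (W + 1) + x1]
                 - t[y1 * (W + 1) + x0] + t[y0 * (W + 1) + x0];
        };
        const double n = static_cast<double>((x1 - x0) * (y1 - y0));
        mean = box(sum) / n;
        const double var = box(sumSq) / n - mean * mean;
        stddev = std::sqrt(var > 0.0 ? var : 0.0);  // guard against round-off
    }
};
```

In such a scheme, a candidate window could be rejected whenever `stddev` exceeds a roughness threshold, with the slope and obstacle-clearance checks described above reusing the same constant-time query primitive.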
To prove the viability of the architecture under authentic operational conditions, the framework was evaluated exclusively using real-world multimodal sensor data. The datasets were acquired in unstructured rural environments utilizing a custom UAV payload equipped with a Livox Mid-360 LiDAR and a global-shutter FLIR camera, with hardware time synchronization managed by a Raspberry Pi edge logger. The results demonstrate that delegating open-set semantic target selection to advanced neural networks, while strictly reserving physical safety validation for deterministic spatial algorithms, provides a highly robust, interactive, and reliable autonomous landing pipeline.
Abstract (Italian)
File
| File name | Size |
|---|---|

The thesis is not available for consultation.