Thesis etd-03162026-101954
Thesis type
Master's degree thesis
Author
DRAGONI, ENEA
URN
etd-03162026-101954
Title
pAirception: Language-Guided Landing Site Detection in Unstructured Environments
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
INGEGNERIA ROBOTICA E DELL'AUTOMAZIONE
Supervisors
supervisor Avizzano, Carlo Alberto
co-supervisor D'Avella, Salvatore
tutor Busam, Benjamin
Keywords
- 3D point clouds
- autonomous landing
- human-robot interaction
- large language models
- neuro-symbolic artificial intelligence
- semantic segmentation
- unmanned aerial vehicles
- vision-language models
Defence session start date
10/04/2026
Availability
Not available for consultation
Release date
10/04/2029
Abstract (English)
The autonomous deployment of Unmanned Aerial Vehicles (UAVs) in unstructured environments requires a delicate balance between high-level semantic scene understanding and strict physical safety guarantees. Traditional geometry-based landing systems fail to distinguish between structurally flat terrain and hazardous materials (e.g., bodies of water or fragile glass roofs), while purely deep-learning approaches lack the deterministic spatial certainty required for critical flight operations. Furthermore, standard graphical human-robot interfaces often prove too rigid and demanding for pilots during high-stress emergency scenarios. To address these fundamental limitations, this thesis presents pAirception, a novel hybrid neuro-symbolic framework for autonomous UAV landing and language-driven pilot guidance.
The proposed architecture tightly couples the probabilistic reasoning capabilities of modern Foundation Models with the deterministic safety of 3D geometric processing. A 4-bit quantized Large Language Model (Phi-3-Mini) acts as an interactive reasoning agent, translating unstructured natural language commands into structured robotic actions. To establish semantic awareness, a Vision-Language Model (Qwen2-VL) autonomously identifies safe environmental priors, which condition a zero-shot Promptable Segmentation model (SAM3) to extract dense 2D Regions of Interest. These semantic hypotheses are subsequently projected into 3D space and rigorously validated by a high-frequency C++ geometric core. Utilizing temporal ego-motion deskewing, robust planar extraction via RANSAC, and constant-time spatial queries via Integral Images, the system deterministically evaluates the physical slope, surface roughness, and obstacle clearance of the targeted terrain. Validated landing spots are then ranked dynamically and presented to the human operator through an intuitive Augmented Telemetry overlay.
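To illustrate the constant-time spatial queries mentioned above, the following is a minimal C++ sketch (not the thesis code) of how integral images over a rasterized terrain height map can provide O(1) window statistics such as surface roughness; the `IntegralGrid` type, the grid layout, and all parameters are illustrative assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal sketch: integral images over a rasterized height map, giving O(1)
// mean / roughness (std. deviation) queries for any candidate landing window.
// Grid layout and type names are assumptions, not the thesis implementation.
struct IntegralGrid {
    std::size_t W, H;                 // grid width and height in cells
    std::vector<double> sum, sumSq;   // (W+1) x (H+1) prefix sums of z and z^2

    IntegralGrid(const std::vector<double>& z, std::size_t width, std::size_t height)
        : W(width), H(height),
          sum((width + 1) * (height + 1), 0.0),
          sumSq((width + 1) * (height + 1), 0.0) {
        for (std::size_t y = 0; y < H; ++y)
            for (std::size_t x = 0; x < W; ++x) {
                const double v = z[y * W + x];
                const std::size_t i = (y + 1) * (W + 1) + (x + 1);
                sum[i]   = v     + sum[i - 1]   + sum[i - (W + 1)]   - sum[i - (W + 2)];
                sumSq[i] = v * v + sumSq[i - 1] + sumSq[i - (W + 1)] - sumSq[i - (W + 2)];
            }
    }

    // Mean and standard deviation of height over cells [x0,x1) x [y0,y1), in O(1).
    void stats(std::size_t x0, std::size_t y0, std::size_t x1, std::size_t y1,
               double& mean, double& stddev) const {
        const auto box = [&](const std::vector<double>& t) {
            return t[y1 * (W + 1) + x1] - t[y0 * (W + 1) + x1]
                 - t[y1 * (W + 1) + x0] + t[y0 * (W + 1) + x0];
        };
        const double n = static_cast<double>((x1 - x0) * (y1 - y0));
        mean = box(sum) / n;
        const double var = box(sumSq) / n - mean * mean;
        stddev = std::sqrt(var > 0.0 ? var : 0.0);  // guard against round-off
    }
};
```

In such a scheme, a candidate window could be rejected whenever `stddev` exceeds a roughness threshold, with the slope and obstacle-clearance checks described above reusing the same constant-time query primitive.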
To prove the viability of the architecture under authentic operational conditions, the framework was evaluated exclusively using real-world multimodal sensor data. The datasets were acquired in unstructured rural environments utilizing a custom UAV payload equipped with a Livox Mid-360 LiDAR and a global-shutter FLIR camera, with hardware time synchronization managed by a Raspberry Pi edge logger. The results demonstrate that delegating open-set semantic target selection to advanced neural networks, while strictly reserving physical safety validation for deterministic spatial algorithms, provides a highly robust, interactive, and reliable autonomous landing pipeline.
Abstract (Italian)
File
| File name | Size |
|---|---|

The thesis is not available for consultation.