Thesis etd-04032025-150712
Thesis type
Master's thesis
Author
GIANNICO, ENRICO FRANCESCO
URN
etd-04032025-150712
Title
Integrating obstacle detection and distance estimation in railway environments by deep learning
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
EMBEDDED COMPUTING SYSTEMS
Supervisors
supervisor Prof. Buttazzo, Giorgio C.
supervisor Prof. Cococcioni, Marco
tutor Dr. de Gioia, Francesco
Keywords
- autonomous trains
- class remapping
- convolutional neural networks
- deep learning
- distance estimation
- distance sensors
- fine-tuning
- MiDaS
- monocular depth estimation
- neural networks
- neural networks integration
- object detection
- obstacle detection
- PyTorch
- railway environment
- railway safety
- scene understanding
- semantic segmentation
- synthetic dataset
- YOLO
Graduation session start date
27/05/2025
Availability
Not available
Release date
27/05/2028
Abstract
This thesis is part of an industrial project involving the ReTiS Lab (Real-Time Systems Laboratory) at the Sant’Anna School of Advanced Studies, aimed at researching new artificial intelligence algorithms for obstacle detection in railway environments.
The work began with an analysis of the most recent state-of-the-art approaches, which generally address three main tasks: rail track identification, obstacle detection, and distance estimation. This thesis addresses the latter two.
Earlier approaches to rail track identification and obstacle detection relied on computer vision methods and hand-crafted features, which often proved unreliable due to the high variability of railway scenarios. Today, thanks to advances in artificial intelligence and increased computational power, numerous neural network-based models have been developed that are better suited to tackle these tasks.
Among object detection neural networks, YOLO (You Only Look Once) has distinguished itself for its flexibility, accuracy, and real-time performance. However, it is trained on datasets such as Microsoft COCO, which covers 80 object classes, many of which are not relevant to the railway sector. This issue was addressed by fine-tuning the network so that it detects only the classes of interest.
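As an illustration of the class remapping that such fine-tuning relies on, the following minimal sketch filters and renames annotations before training. The mapping table, the merged class names, and the annotation layout are hypothetical and are not taken from the thesis.

```python
# Hypothetical mapping from COCO class names to a reduced railway label set.
# Both the selected classes and the merged names are illustrative assumptions.
RAILWAY_CLASS_MAP = {
    "person": "person",
    "car": "vehicle",
    "truck": "vehicle",
    "bus": "vehicle",
    "cow": "animal",
    "horse": "animal",
    "dog": "animal",
}


def remap_annotations(annotations, class_map=RAILWAY_CLASS_MAP):
    """Keep only annotations whose class is in class_map and rename them."""
    remapped = []
    for ann in annotations:
        new_name = class_map.get(ann["name"])
        if new_name is None:
            continue  # class not relevant to the railway scenario: drop it
        remapped.append({**ann, "name": new_name})
    return remapped


# Assumed annotation layout: class name plus (x_min, y_min, x_max, y_max) box.
example = [
    {"name": "truck", "box": (10, 20, 110, 90)},
    {"name": "toaster", "box": (0, 0, 5, 5)},
    {"name": "cow", "box": (200, 150, 260, 210)},
]
railway_annotations = remap_annotations(example)
```

The remapped annotations could then serve as the label set for fine-tuning, so the detector's output head only needs to cover the reduced classes.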
The other task addressed in this thesis is object distance estimation. Until now, the primary approach has been to use dedicated sensors, either active (e.g., LiDAR, radar) or passive (e.g., stereo camera), which are costly and require periodic calibration and maintenance. Therefore, available monocular depth estimation neural networks were studied to allow the distance of potential obstacles to be estimated with a single camera. This is an ill-posed problem, as depth estimation from a single frame introduces scale and generalization issues. Indeed, most monocular depth estimation models perform well on the dataset they were trained on but lose significant accuracy on other datasets. MiDaS addresses these generalization issues by training on multiple diverse datasets and compensating for their differences with a custom loss function. This yields good generalization, at the cost of producing relative depth estimates (i.e., not in metric form). To solve this issue, the model was fine-tuned on a synthetic dataset (provided by the ReTiS Lab) that contains railway images compatible with real-world railway scenarios together with absolute (i.e., metric) depth ground truth.
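To make the scale ambiguity concrete, the sketch below fits a least-squares scale and shift between a prediction and a target before measuring the error, in the spirit of scale-and-shift-invariant losses. It is a simplified illustration under stated assumptions (flat lists of values, mean absolute error); it is neither the MiDaS loss itself nor the training code used in the thesis, and all function names are hypothetical.

```python
def align_scale_shift(pred, target):
    """Least-squares scale s and shift t minimizing sum((s*p + t - y)**2)."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_y = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    cov = sum((p - mean_p) * (y - mean_y) for p, y in zip(pred, target))
    s = cov / var_p if var_p > 0 else 0.0  # constant prediction: no scale info
    t = mean_y - s * mean_p
    return s, t


def scale_shift_invariant_error(pred, target):
    """Mean absolute error after aligning pred to target with scale and shift."""
    s, t = align_scale_shift(pred, target)
    return sum(abs(s * p + t - y) for p, y in zip(pred, target)) / len(pred)


target = [3.0, 5.0, 7.0, 9.0]               # assumed ground-truth values
pred_a = [1.0, 2.0, 3.0, 4.0]               # one relative prediction
pred_b = [10.0 * p + 4.0 for p in pred_a]   # same prediction, rescaled and shifted

scale_a, shift_a = align_scale_shift(pred_a, target)
err_a = scale_shift_invariant_error(pred_a, target)
err_b = scale_shift_invariant_error(pred_b, target)
```

Both predictions receive essentially the same error even though their raw values differ, which is why a model trained under such a criterion has no incentive to output metric values and why supervision with metric ground truth is needed afterwards.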
Finally, the two neural network models were integrated into a single system capable of simultaneously detecting an obstacle in the scene and estimating its distance. This system can be used either on its own, providing a coarse distance estimate for the object, or in combination with a more precise distance sensor (e.g., LiDAR), resulting in a more reliable system.
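One possible way to combine the two outputs is to read the metric depth map inside each detected bounding box and summarize it with a robust statistic. The sketch below uses the median; this fusion rule, the data layout, and the function name are assumptions made for illustration, since the abstract does not state how the thesis performs the integration.

```python
from statistics import median


def obstacle_distance(depth_map, box):
    """Median depth in metres inside box = (x_min, y_min, x_max, y_max).

    depth_map is a list of rows; x_max and y_max are exclusive.
    """
    x_min, y_min, x_max, y_max = box
    values = [
        depth_map[y][x]
        for y in range(y_min, y_max)
        for x in range(x_min, x_max)
    ]
    if not values:
        raise ValueError("empty bounding box")
    return median(values)


# Toy 4x4 metric depth map: background at 50 m, an object at about 12 m.
depth_map = [
    [50.0, 50.0, 50.0, 50.0],
    [50.0, 12.0, 12.5, 50.0],
    [50.0, 11.5, 12.0, 50.0],
    [50.0, 50.0, 50.0, 50.0],
]
# Assumed detector output: class name plus pixel bounding box.
detections = [{"name": "person", "box": (1, 1, 3, 3)}]

results = [
    {**det, "distance_m": obstacle_distance(depth_map, det["box"])}
    for det in detections
]
```

The median is chosen here because bounding boxes usually include some background pixels, which would bias a plain mean toward larger distances.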
File
File name | Size |
---|---|
The thesis is not available for consultation. |