
ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-06142022-120833


Thesis type
Master's thesis
Author
OLIVOTTO, VALENTINA
URN
etd-06142022-120833
Title
Small object detection on high-resolution images: a comparison of Slice & Merge approaches
Department
COMPUTER SCIENCE
Degree programme
DATA SCIENCE AND BUSINESS INFORMATICS
Supervisors
supervisor Prof. Gallicchio, Claudio
Keywords
  • object detection
  • yolo
  • high-resolution images
  • fabric defect detection
Examination session start date
01/07/2022
Availability
Not available for consultation
Release date
01/07/2092
Abstract
This work compares three new approaches, based on the YOLOv5 architecture, for a defect detection task in an industrial laundry. Automated technologies in this sector are valuable for reducing costs and processing time and for increasing customer satisfaction. Current approaches rely on traditional Computer Vision techniques that consider pixel characteristics and their statistical properties. These systems have many limitations in terms of robustness and hardware complexity. The idea is therefore to exploit Artificial Intelligence models based on Convolutional Neural Networks to overcome these issues. YOLO was chosen for this purpose because it achieves state-of-the-art performance in real-time object detection, in terms of both speed and accuracy.

The main issue in this case study is that the dataset contains high-resolution images, while the defects to be detected are only a few pixels in size. Small object detection on large images is an open problem in Computer Vision, especially for industrial tasks, where the available hardware is limited. As the results of the two proposed baselines show, a good trade-off must be found between detection accuracy and computational cost. In this thesis, three different Slice & Merge approaches have therefore been implemented and analyzed. The strategy is to slice the original input images into tiles, reducing computational costs while preserving the details of the high-resolution data. Three slicing approaches have been considered to create the tiles, and the image_bbox_slicer library has been used to crop the images and generate their corresponding annotation files. YOLOv5s was then used to train a model and to detect bounding boxes on the test set. Finally, since the expected output is the original whole image with all the predicted boxes, two merging methodologies are proposed. Since many merging heuristics are possible, all the cases considered and the decisions taken are reported and explained.
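The slice and merge steps described above can be illustrated with a minimal, library-free Python sketch. It is not the thesis's implementation and does not use the image_bbox_slicer API. The function names, the (x1, y1, x2, y2) pixel box format, the rule that a clipped annotation is kept only if at least 30% of its area falls inside the tile, and the use of greedy non-maximum suppression (NMS) as the merge heuristic are all illustrative assumptions.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed format)


def tile_origins(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis; the last tile is flush with the image border."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    if tile >= length:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)
    return starts


def make_tiles(width: int, height: int, tile: int, overlap: int) -> List[Box]:
    """Hypothetical helper: grid of overlapping square tiles covering the image."""
    xs = tile_origins(width, tile, overlap)
    ys = tile_origins(height, tile, overlap)
    return [(x, y, min(x + tile, width), min(y + tile, height)) for y in ys for x in xs]


def clip_box_to_tile(box: Box, tile: Box, min_area_frac: float = 0.3) -> Optional[Box]:
    """Return the annotation in tile-local coordinates, or None if too little is visible."""
    tx1, ty1, tx2, ty2 = tile
    x1, y1, x2, y2 = box
    cx1, cy1, cx2, cy2 = max(x1, tx1), max(y1, ty1), min(x2, tx2), min(y2, ty2)
    if cx2 <= cx1 or cy2 <= cy1:
        return None  # no overlap with this tile
    if (cx2 - cx1) * (cy2 - cy1) < min_area_frac * (x2 - x1) * (y2 - y1):
        return None  # only a small fragment of the defect falls inside the tile
    return (cx1 - tx1, cy1 - ty1, cx2 - tx1, cy2 - ty1)


def to_global(box: Box, tile: Box) -> Box:
    """Shift a tile-local prediction back to whole-image coordinates."""
    return (box[0] + tile[0], box[1] + tile[1], box[2] + tile[0], box[3] + tile[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter = max(0.0, min(a[2], b[2]) - max(a[0], b[0])) * max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_detections(per_tile, iou_thr: float = 0.5):
    """Merge (tile, [(local_box, score), ...]) pairs with greedy NMS in global coordinates."""
    dets = [(to_global(b, t), s) for t, preds in per_tile for b, s in preds]
    dets.sort(key=lambda d: d[1], reverse=True)  # highest confidence first
    kept = []
    for box, score in dets:
        if all(iou(box, k) < iou_thr for k, _ in kept):
            kept.append((box, score))
    return kept
```

For example, on a 100x100 image with 60-pixel tiles and a 20-pixel overlap, `make_tiles` yields four tiles, and a defect at (45, 10, 55, 20) appears in two of them. If the detector finds it in both, `merge_detections` keeps a single box with the higher score. Plain NMS is only one possible heuristic: it discards duplicates from overlapping tiles but does not join the fragments of a defect split across a tile border, which is one of the cases a dedicated merge strategy has to handle.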

The experimental analysis shows that the introduced approaches are competitive: they reach performance comparable to that obtained on the original high-resolution images while reducing the computational requirements of the deep learning algorithms.