ETD

Digital archive of theses discussed at the University of Pisa

 

Thesis etd-06142022-120833


Thesis type
Master's thesis
Author
OLIVOTTO, VALENTINA
URN
etd-06142022-120833
Thesis title
Small object detection on high-resolution images: a comparison of Slice & Merge approaches
Department
INFORMATICA
Course of study
DATA SCIENCE AND BUSINESS INFORMATICS
Supervisors
supervisor Prof. Gallicchio, Claudio
Keywords
  • fabric defect detection
  • high-resolution images
  • object detection
  • yolo
Graduation session start date
01/07/2022
Availability
Withheld
Release date
01/07/2092
Summary
This work compares three new approaches, based on the YOLOv5 architecture, for a defect detection task in an industrial laundry. Automated technologies in this sector are valuable for reducing costs and processing time and for increasing customer satisfaction. Current approaches rely on traditional Computer Vision techniques that consider pixel characteristics and their statistical properties. These systems have many limitations in terms of robustness and hardware complexity. The idea is therefore to use Artificial Intelligence models based on Convolutional Neural Networks to overcome these issues. In particular, YOLO was considered the best choice for this purpose, since it achieves state-of-the-art performance on real-time object detection tasks in terms of both speed and accuracy.

The main issue in this case study is that the dataset consists of high-resolution images, while the defects to detect are only a few pixels in size. Small object detection on large images is an open problem in the Computer Vision field, especially for industrial tasks, where the available hardware is limited. As the results of the two proposed baselines show, it is necessary to find a good tradeoff between detection accuracy and computational cost. In this thesis, therefore, three different Slice & Merge approaches have been implemented and analyzed. The strategy is to slice the original input images in order to reduce computational costs while preserving the details of the high-resolution data. In particular, three slicing approaches have been considered to create the tiles, and the image_bbox_slicer library has been used to crop the images and generate the corresponding annotation files. YOLOv5s has then been used to train a model and detect the bounding boxes in the test set. Finally, since the final outcome should be the original whole image with all the predicted boxes, two merge methodologies have been proposed. Many merging heuristics are possible, so all the cases considered and the decisions taken are reported and explained.
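The general Slice & Merge idea can be illustrated with a minimal Python sketch. It is not the thesis implementation: the summary does not specify the three slicing strategies or the two merge methodologies, so the overlapping square grid, the greedy IoU-based suppression, and every name and threshold below (`tile_origins`, `merge_predictions`, `iou_thr`) are assumptions chosen for illustration. The sketch also does not use image_bbox_slicer or YOLOv5; it only shows the coordinate bookkeeping around the detector.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2, score), in pixels.
Box = Tuple[float, float, float, float, float]
Origin = Tuple[int, int]


def tile_origins(width: int, height: int, tile: int, overlap: int) -> List[Origin]:
    """Top-left corners of square tiles covering the image (assumes overlap < tile)."""
    stride = tile - overlap

    def starts(size: int) -> List[int]:
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # last tile sits flush with the border
        return positions

    return [(x, y) for y in starts(height) for x in starts(width)]


def to_image_coords(box: Box, origin: Origin) -> Box:
    """Shift a box predicted inside a tile back to full-image coordinates."""
    x1, y1, x2, y2, score = box
    ox, oy = origin
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy, score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_predictions(per_tile: List[Tuple[Origin, List[Box]]],
                      iou_thr: float = 0.5) -> List[Box]:
    """Map tile predictions to the full image and drop duplicates from overlaps.

    Uses greedy suppression: the highest-scoring box is kept, and any later
    box overlapping a kept one by at least iou_thr is discarded.
    """
    boxes = [to_image_coords(b, origin) for origin, preds in per_tile for b in preds]
    boxes.sort(key=lambda b: b[4], reverse=True)
    kept: List[Box] = []
    for box in boxes:
        if all(iou(box, k) < iou_thr for k in kept):
            kept.append(box)
    return kept


# Usage: one defect seen by two overlapping tiles is reported once.
origins = tile_origins(width=1000, height=400, tile=400, overlap=100)
per_tile = [
    ((0, 0), [(310.0, 10.0, 330.0, 30.0, 0.9)]),
    ((300, 0), [(10.0, 10.0, 30.0, 30.0, 0.8)]),
]
merged = merge_predictions(per_tile)
```

Overlapping tiles are one common way to avoid cutting a small defect in half at a tile border, at the price of duplicate detections that the merge step must resolve; other heuristics, such as joining boxes that touch across a border, are equally plausible.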

The experimental analysis performed in the thesis shows the competitiveness of the introduced approaches, which reach performance comparable to that obtained on the high-resolution images while reducing the computational requirements of the deep learning algorithms.