Thesis etd-03272025-093928
Thesis type
Master's thesis
Author
MARINO, GABRIELE
URN
etd-03272025-093928
Titolo
Automated Extraction of Components from Technical Drawings using Computer Vision and Deep Learning
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
supervisor Prof. Cimino, Mario Giovanni Cosimo Antonio
supervisor Prof. Galatolo, Federico Andrea
supervisor Dr. Consoloni, Marco
supervisor Prof. Giordano, Vito
Keywords
- Components Extraction
- SAM
- Technical Drawing
- YOLO
Defense date
14/04/2025
Availability
Not available for consultation
Release date
14/04/2095
Abstract
Extracting individual components from technical drawings in patents is significantly more complex than segmenting objects in natural images.
This complexity arises from the lack of standardization in how designers create technical drawings, which leads to high variability in their representation.
Despite their significance, technical drawings remain difficult to analyze and extract information from because of their complexity,
diverse styles, and varied formats. The absence of color, overlapping elements, and the presence of noise further complicate automated analysis.
This work presents a computer vision and deep learning-based pipeline to address this challenge. The process begins with a preprocessing phase where a YOLO model classifies
patent images to identify those containing technical drawings. A second YOLO model then segments multiple drawings that may appear within a single patent image.
Once a technical drawing is isolated, a computer vision algorithm leveraging image gradients extracts the endpoints of lead curves—annotation lines connecting
reference numbers to their corresponding components. These extracted points, along with the image, are then fed into the Segment Anything Model (SAM), a vision
transformer originally designed for segmenting natural images, to provide context for segmenting components within the technical drawing.
The performance of each stage in the pipeline has been evaluated, revealing limitations but also promising results that highlight the potential for further research
and improvements in this domain.
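The thesis extracts lead-curve endpoints with an image-gradient algorithm whose details are not given in this abstract. As a rough, hypothetical illustration of the endpoint-detection step, the sketch below uses a simpler heuristic on a binarized, thin stroke: a foreground pixel with exactly one 8-connected foreground neighbour is an endpoint. This is an assumed simplification for illustration, not the gradient-based method described above.

```python
import numpy as np

def find_endpoints(binary):
    """Return (row, col) of foreground pixels with exactly one 8-connected neighbour."""
    padded = np.pad(binary.astype(np.uint8), 1)
    endpoints = []
    for r in range(1, padded.shape[0] - 1):
        for c in range(1, padded.shape[1] - 1):
            if padded[r, c]:
                # neighbours = sum of the 3x3 window minus the pixel itself
                n = padded[r - 1:r + 2, c - 1:c + 2].sum() - 1
                if n == 1:
                    endpoints.append((r - 1, c - 1))
    return endpoints

# Synthetic 1-pixel-wide horizontal stroke: endpoints at its two ends.
img = np.zeros((5, 7), dtype=bool)
img[2, 1:6] = True
print(find_endpoints(img))  # → [(2, 1), (2, 5)]
```

In the pipeline described above, such endpoint coordinates (together with the drawing) would serve as point prompts for SAM, which accepts foreground points as segmentation hints.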
File
Filename | Size |
---|---|
The thesis is not available for consultation. |