Thesis etd-06302025-122029

Thesis type

Master's thesis

URN

etd-06302025-122029

Title

Developing Multimodal Deep Learning Models for Technical Documents Representation and Retrieval

Department

INGEGNERIA DELL'INFORMAZIONE

Degree program

ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING

Supervisors

supervisor Prof. Cimino, Mario Giovanni Cosimo Antonio
supervisor Prof. Galatolo, Federico Andrea
supervisor Consoloni, Marco

Keywords

CLIP
embeddings
information retrieval
multimodal retrieval
technical documents
technical drawings

Graduation session start date

23/07/2025

Availability

Not available for consultation

Release date

23/07/2095

Abstract

Existing approaches to information retrieval on technical documents have overlooked the potential of analyzing technical drawings together with textual data to improve the accuracy of prior art searches. Integrating visual and textual information, bridging the gap between visual perception and language understanding, can significantly support the Engineering Design (ED) process. The ED process is the series of steps that engineers follow to find a solution to a problem, including activities such as determining objectives and constraints, prototyping, testing, and evaluation. Images provide a synthetic representation of a design artifact or process, and they emerge as the primary mode of communication among innovators, engineers, and designers throughout the ED phases. Previous work did not perform tasks such as image rotation or segmentation, nor did it specialize in a specific domain; this increases the complexity of the input given to the retrieval system and makes it more difficult to locate problems when they arise. The proposed solution comprises a full dataset creation pipeline that cleans the input and creates descriptions aligning images with sentences in the document text. A model architecture specialized for this problem is also defined. The results show that performing retrieval with embeddings generated by a CLIP-like model may not be enough, and that more complex information retrieval pipelines are needed to handle technical documents.

File

File name	Size
The thesis is not available for consultation. Contact the author

ETD

Digital archive of theses defended at the Università di Pisa
