Thesis etd-06222023-204437

Thesis type

Master's thesis

Author

MASSETTI, MATTEO

URN

etd-06222023-204437

Title

Leveraging Physical Cues for Learned Representations in Visual Question Answering

Department

COMPUTER SCIENCE

Degree programme

COMPUTER SCIENCE

Supervisors

Supervisor Prof. Bacciu, Davide
Supervisor Dr. Valenti, Andrea

Keywords

Machine Learning
Multi Task Learning
Representation Learning
Visual Question Answering

Graduation session start date

21/07/2023

Availability

Full

Abstract

Inferring knowledge from heterogeneous sources, such as natural language
and visual data, is challenging. Several tasks have been proposed to this end;
however, the goal is not merely to solve the task but to assess
the models’ ability to ground natural language information in the visual world.
GuessWhat?! is an evaluation framework for assessing the performance
of multi-modal conversational models. It is structured as a game in which two
players collaborate to reach a common objective by
generating and answering questions about a visual scene.
This work presents a new version of the Imagination Module, which is part
of both player architectures and helps them improve their understanding of
textual and visual information. The presented version integrates information
about object attributes into the learned representation to further improve the
generalization and grounding capabilities of the models.

File

File name	Size
thesis_Massetti.pdf	7.10 MB

ETD

Digital archive of theses defended at the Università di Pisa
