ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-06222023-204437


Thesis type
Master's thesis
Author
MASSETTI, MATTEO
URN
etd-06222023-204437
Title
Leveraging Physical Cues for Learned Representations in Visual Question Answering
Department
COMPUTER SCIENCE
Degree programme
COMPUTER SCIENCE
Supervisors
supervisor Prof. Bacciu, Davide
supervisor Dr. Valenti, Andrea
Keywords
  • Machine Learning
  • Representation Learning
  • Multi Task Learning
  • Visual Question Answering
Session start date
21/07/2023
Availability
Full
Abstract
Inferring knowledge from heterogeneous sources, such as natural language
and visual data, is challenging. Several tasks have been proposed to this end;
however, the goal is not merely to solve a given task but to assess
the models' ability to ground natural language information in the visual world.
GuessWhat?! is an evaluation framework for assessing the performance
of multi-modal conversational models. It is structured as a game in which two
players collaborate to reach a common objective by
generating and answering questions about a visual scene.
This work presents a new version of the Imagination Module, which is part
of both players' architectures and helps them improve their understanding of
textual and visual information. The presented version integrates information
about object attributes into the learned representation, to further improve the
generalization and grounding capabilities of the models.