
ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-07042023-172612


Thesis type
Master's thesis
Author
AMARANTE, TOMMASO
URN
etd-07042023-172612
Title
Design and Development of a System for Counting-Related Visual Question Answering
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
Supervisor Prof. Cimino, Mario Giovanni Cosimo Antonio
Supervisor Prof. Falchi, Fabrizio
Supervisor Dott. Messina, Nicola
Supervisor Dott. Ciampi, Luca
Keywords
  • visual question answering
  • benchmark
  • dataset
  • countingVQA
Defense date
21/07/2023
Availability
Full
Abstract
Riassunto
The challenging task of Visual Question Answering (VQA) requires a thorough comprehension of both visual content and natural language. Open-ended counting is a special case of VQA in which the goal is to answer specific, and possibly complex, questions about the number of objects present in images. However, even though counting is essential in many real-world applications, the development and assessment of counting algorithms within the VQA domain are limited by the scarcity of dedicated annotations for counting-related questions in existing VQA datasets. To fill this gap, in this dissertation we present ObjectCountingVQA, a new dataset focused on the CountingVQA task. This benchmark comprises more than 2000 images with more than 5500 associated question-answer pairs. A distinctive feature of ObjectCountingVQA is that, compared to other benchmarks, it contains more complex questions involving adjectives and spatial relationships, which provides a challenging setup for current VQA algorithms. We build these question-answer pairs automatically using ChatGPT, a popular artificial intelligence chatbot developed by OpenAI, starting from the structured scene graphs of Visual Genome, a popular dataset for object detection and visual understanding. Then, we use GroundingDINO, a powerful open-set object detector, to automatically validate the pairs and perform a first selection of good candidates. Finally, to ensure that all the questions and answers are accurate enough to serve as a reliable benchmark, we manually check the generated annotations.
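The core idea of the generation step can be sketched in miniature as follows. This is an illustrative toy example, not the thesis code: the actual questions are produced by ChatGPT rather than a template, the GroundingDINO validation stage is omitted, and the scene-graph fields used here (`objects`, `name`, `attributes`) are simplified assumptions rather than the exact Visual Genome schema.

```python
# Toy sketch: derive a counting question-answer pair from a Visual
# Genome-style scene graph by counting objects that match a target
# name and, optionally, an attribute (e.g. a color adjective).

def make_counting_qa(scene_graph, name, attribute=None):
    """Return a (question, answer) pair counting objects in `scene_graph`
    whose name matches `name` and, if given, carry `attribute`."""
    matches = [
        obj for obj in scene_graph["objects"]
        if obj["name"] == name
        and (attribute is None or attribute in obj.get("attributes", []))
    ]
    # Naive pluralization ("car" -> "cars"); a real pipeline would
    # handle irregular plurals.
    subject = f"{attribute} {name}s" if attribute else f"{name}s"
    return f"How many {subject} are in the image?", len(matches)

# Hypothetical scene graph with two cars (one red, one blue) and a person.
graph = {
    "objects": [
        {"name": "car", "attributes": ["red"]},
        {"name": "car", "attributes": ["blue"]},
        {"name": "person", "attributes": []},
    ]
}
print(make_counting_qa(graph, "car", "red"))
# → ("How many red cars are in the image?", 1)
```

Attribute-qualified queries like the one above are what make such questions harder than plain counting: the model must ground both the category and the adjective before it can count.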
To demonstrate the potential of our ObjectCountingVQA dataset, we conduct an experimental evaluation using a state-of-the-art VQA model, MOVIE+MCAN. Our findings show that this newly introduced benchmark poses fresh challenges for current VQA models, underlining the need for dedicated counting methods and more demanding benchmarks.