ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-07072024-193713


Thesis type
Master's thesis
Author
PIERUCCI, MATTEO
Email address
m.pierucci5@studenti.unipi.it, matteo.pierucci4@gmail.com
URN
etd-07072024-193713
Title
A Novel Benchmark for Prompt-Guided Class-Agnostic Counting: Assessing Models' Understanding of Textual Prompts
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
supervisor Prof. Avvenuti, Marco
supervisor Dott. Falchi, Fabrizio
supervisor Dott. Ciampi, Luca
supervisor Dott. Messina, Nicola
Keywords
  • class-agnostic counting
  • computer vision
  • object counting
  • zero-shot counting
Graduation session start date
26/07/2024
Availability
Full
Abstract
Object counting is the computer vision task of estimating the number of object instances within an image. Traditional class-specific counting methods rely on regression models and density map estimation. However, these methods often require extensive datasets and fail to generalize across object classes.

Recent research in object counting has increasingly focused on reducing the annotation burden of dataset creation. This has motivated class-agnostic object counting, a new task in which a network is trained to count object instances of any class at test time, even if these classes differ from those seen during training. Typically, these networks are trained on images with density maps as targets, and may also use single examples of the objects to be counted, called exemplars, as prototypes for counting at test time. Unlike traditional counting methods that rely on class-specific datasets, class-agnostic counting uses multi-class datasets to build a versatile model applicable to unseen categories with minimal additional data.

This work explores prompt-guided zero-shot counting, where textual prompts replace visual exemplars to guide the counting process. Despite the advances in this field, numerous state-of-the-art models ignore the textual information when estimating the object count. To address this gap, we develop a novel benchmark and metrics to assess models' understanding of textual prompts in the counting process. Additionally, we introduce a variant of an existing dataset that includes cases where the objects specified by the prompt are absent from the query image. Finally, we train a state-of-the-art counting model on this new dataset and test its performance on the novel benchmark.

Our experimental results show that the proposed model understands textual prompts and counts objects accurately, even in challenging conditions where the objects of interest are not present in the query image. These results provide a baseline for comparison with future models. This work advances the field of prompt-guided zero-shot counting, offering insights into the capabilities and limitations of current models and providing a foundation for future research in this area.
File