Thesis etd-07072024-193713

Thesis type

Master's thesis

Author

PIERUCCI, MATTEO

Email address

m.pierucci5@studenti.unipi.it, matteo.pierucci4@gmail.com

URN

etd-07072024-193713

Title

A Novel Benchmark for Prompt-Guided Class-Agnostic Counting: Assessing Models' Understanding of Textual Prompts

Department

INGEGNERIA DELL'INFORMAZIONE

Degree programme

ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING

Supervisors

supervisor Prof. Avvenuti, Marco
supervisor Dr. Falchi, Fabrizio
supervisor Dr. Ciampi, Luca
supervisor Dr. Messina, Nicola

Keywords

class-agnostic counting
computer vision
object counting
zero-shot counting

Exam session start date

26/07/2024

Availability

Full

Abstract

The computer vision task of object counting involves estimating the number of object instances within an image. Traditional class-specific object counting methods rely on regression models and density map estimation to count objects. However, these methods often require extensive datasets and fail to generalize across different object classes.

Recent research in object counting has increasingly focused on reducing the annotation burden of dataset creation. This has motivated class-agnostic object counting, a newer task in which a network is trained to count object instances of any class at test time, even if these classes differ from those seen during training. Typically, these networks are trained on images with density maps as targets and may also use single examples of the objects to be counted, called exemplars, as prototypes for counting at test time. Unlike traditional counting methods that rely on class-specific datasets, class-agnostic counting uses multi-class datasets to build a versatile model applicable to unseen categories with minimal additional data.

This work explores prompt-guided zero-shot counting, where textual prompts replace visual exemplars to guide the counting process. Despite advances in this field, many state-of-the-art models ignore the textual information when estimating the object count. To address this gap, we develop a novel benchmark and metrics to assess models' understanding of textual prompts in the counting process. Additionally, we introduce a variation of an existing dataset that includes cases where the prompt-specified objects are absent from the query images. Finally, we train a state-of-the-art counting model on this new dataset and test its performance on the novel benchmark.

Our experimental results demonstrate that the proposed model effectively understands textual prompts and accurately counts objects, even in challenging conditions where the objects of interest are not present in the query image, and that it provides a comparative baseline for future models. This work advances the field of prompt-guided zero-shot counting, offering insights into the capabilities and limitations of current models and providing a foundation for future research in this area.

File

File name	Size
MasterTh...atteo.pdf	9.17 Mb

ETD

Digital archive of theses defended at the Università di Pisa
