Thesis etd-09262012-113229

Thesis type

Master's thesis

URN

etd-09262012-113229

Title

A Framework to compare text annotators and its applications

Department

COMPUTER SCIENCE

Degree programme

COMPUTER SCIENCE

Supervisors

co-examiner Prof. Bernasconi, Anna
supervisor Prof. Ferragina, Paolo

Keywords

benchmarking metrics
information retrieval
topic annotators
topic retrieval

Exam session start date

12/10/2012

Availability

Full

Abstract (English)

Texts in human languages have little logical structure and are inherently ambiguous. For this reason, the typical Information Retrieval approach to text documents has been based on the bag-of-words model, in which documents are analyzed only by the occurrence of terms, discarding any structure. A more recent line of research, however, is devoted to adding structure to unstructured text by recognizing the topics contained in a text and annotating them.

Topic annotators are systems that link a natural language document to the topics that are relevant for describing its content. These systems can be applied to many classic problems of Information Retrieval: a document can be categorized based on its topics; a set of documents can be clustered by using their topics to find similarities; and a search engine could find relevant pages more easily if it knew the topics that a query expresses and could search for them in the cached web pages.

In this thesis, we present a formal framework that describes the problems related to topic retrieval, the algorithms that solve those problems, and the way they can be benchmarked.
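
As an illustration of the contrast drawn in the abstract, the following minimal Python sketch shows how a bag-of-words representation discards word order, and how documents could instead be compared through sets of annotated topics. It is not taken from the thesis: the function names, the topic identifiers, and the use of Jaccard overlap as a similarity measure are all assumptions chosen for the example.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count term occurrences; word order and structure are discarded."""
    return Counter(text.lower().split())


def jaccard(a: set, b: set) -> float:
    """Overlap between two topic sets (assumed measure, not the thesis's)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two sentences with opposite meanings share exactly the same bag of words.
same_bag = bag_of_words("dog bites man") == bag_of_words("man bites dog")

# Hypothetical annotator output: these topic identifiers are invented
# for illustration and do not come from any real annotator.
topics_doc1 = {"Jaguar_(animal)", "Rainforest"}
topics_doc2 = {"Jaguar_Cars", "Automobile"}
topics_doc3 = {"Jaguar_(animal)", "Predation"}

# Documents 1 and 2 both mention "jaguar" but share no topic;
# documents 1 and 3 share one topic out of three distinct ones.
sim_1_2 = jaccard(topics_doc1, topics_doc2)
sim_1_3 = jaccard(topics_doc1, topics_doc3)
```

In this sketch, term counts cannot tell "dog bites man" from "man bites dog", while the topic sets separate two documents that share the ambiguous word "jaguar" but refer to different things.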

File

File name	Size
tesi.pdf	3.04 Mb

ETD

Digital archive of the theses defended at the Università di Pisa
