ETD

Digital archive of the theses defended at the Università di Pisa

Thesis etd-09262012-113229


Thesis type
Master's thesis
Author
CORNOLTI, MARCO
URN
etd-09262012-113229
Title
A Framework to compare text annotators and its applications
Department
COMPUTER SCIENCE
Degree program
COMPUTER SCIENCE
Supervisors
co-examiner Prof. Bernasconi, Anna
supervisor Prof. Ferragina, Paolo
Keywords
  • benchmarking metrics
  • information retrieval
  • topic annotators
  • topic retrieval
Defense session start date
12/10/2012
Availability
Full
Abstract
Texts in human languages have little logical structure and are inherently ambiguous. For this reason, the typical Information Retrieval approach to text documents has been based on the bag-of-words model, in which documents are analyzed only by the occurrence of terms, discarding any possible structure. A recently developing line of research, however, is devoted to adding structure to unstructured text by recognizing the topics contained in a text and annotating them.

Topic annotators are systems that link a natural-language document to the topics relevant for describing its content. These systems can be applied to many classic Information Retrieval problems: a document can be categorized by its topics; a set of documents can be clustered by using their topics to find similarities; and a search engine could find relevant pages more easily if it knew the topics a query expresses and searched for them in the cached web pages.

In this thesis, we present a formal framework that describes the problems related to topic retrieval, the algorithms that solve those problems, and the way they can be benchmarked.