ETD system

Electronic theses and dissertations repository

 

Thesis etd-09262012-113229


Thesis type
Master's thesis
Author
CORNOLTI, MARCO
URN
etd-09262012-113229
Title
A Framework to compare text annotators and its applications
Department
COMPUTER SCIENCE
Course of study
COMPUTER SCIENCE
Committee
co-examiner Prof. Bernasconi, Anna
supervisor Prof. Ferragina, Paolo
Keywords
  • topic annotators
  • benchmarking metrics
  • topic retrieval
  • information retrieval
Graduation session start date
12/10/2012
Availability
Full
Abstract
Texts in human languages have little logical structure and are inherently ambiguous. For this reason, the typical approach of Information Retrieval to text documents has been based on the Bag-of-words model, in which documents are analyzed only by the occurrence of terms, discarding any possible structure. A recent line of research, however, is devoted to adding structure to unstructured text by recognizing the topics contained in a text and annotating them.

Topic annotators are systems whose purpose is to link a natural language document to the topics that are relevant for describing its content. These systems can be applied to many classic problems of Information Retrieval: the categorization of a document can be based on its topics; the clustering of a set of documents can use their topics to find similarities; and a search engine could find relevant pages more easily if it knew the topics that a query expresses and could search for them in the cached web pages.

In this thesis, we present a formal framework that describes the problems related to topic retrieval, the algorithms that solve those problems, and the way they can be benchmarked.
File