Thesis etd-09262012-113229
Thesis type
Master's thesis
Author
CORNOLTI, MARCO
URN
etd-09262012-113229
Title
A Framework to compare text annotators and its applications
Department
COMPUTER SCIENCE
Degree program
COMPUTER SCIENCE
Supervisors
co-examiner Prof. Bernasconi, Anna
supervisor Prof. Ferragina, Paolo
Keywords
- benchmarking metrics
- information retrieval
- topic annotators
- topic retrieval
Defense session start date
12/10/2012
Availability
Full
Abstract
Texts in human languages have little logical structure and are inherently ambiguous. For this reason, the typical Information Retrieval approach to text documents has been based on the bag-of-words model, in which documents are analyzed only by the occurrence of terms, discarding any possible structure. A recently developing line of research, however, is devoted to adding structure to unstructured text by recognizing the topics contained in a text and annotating them.
Topic annotators are systems that link a natural-language document to the topics relevant for describing its content. These systems can be applied to many classic Information Retrieval problems: a document can be categorized based on its topics; a set of documents can be clustered by using their topics to find similarities; and a search engine could find relevant pages more easily if it knew the topics that a query expresses and searched for them in the cached web pages.
In this thesis, we present a formal framework that describes the problems related to topic retrieval, the algorithms that solve those problems, and the way they can be benchmarked.
File
File name | Size |
---|---|
tesi.pdf | 3.04 MB |