logo SBA

ETD

Archivio digitale delle tesi discusse presso l’Università di Pisa

Tesi etd-09202016-105923


Tipo di tesi
Tesi di laurea magistrale
Autore
PORCIANI, ELIA
URN
etd-09202016-105923
Titolo
Improving Top-K document retrieval algorithms using dynamic programming
Dipartimento
INFORMATICA
Corso di studi
INFORMATICA
Relatori
relatore Venturini, Rossano
Parole chiave
  • Block max wand
  • Compression algorithms
  • Dynamic Programming
  • Early termination
  • Interpolative
  • Top-K Retrieval
Data inizio appello
07/10/2016
Consultabilità
Completa
Riassunto
This Thesis aims to improve state-of-the-art algorithms for Top-K document retrieval problem. This problem asks for finding the most relevant k documents for a given query in collection of Web Pages.
After implementing and testing several state of the art algorithms for Top-K retrieval, we focus on Block-Max-Wand that is the best known solution for this problem. Block-Max-Wand is based on the idea of splitting posting lists into fixed-size blocks and approximating the scores with the maximum value of the block which they belong to. This improves query time but increases space usage.
In this Thesis we propose to use variable-size blocks in the above algorithm.
Indeed, we observe that the algorithm is faster if the scores of the postings are better approximated by their block maximum value. The problem of finding the best partitioning is an optimization problem where, fixing the number of partitions, we want to minimize the average distance by each score to its block-maximum value. The solution we suggest uses dynamic programming
in order to find an approximation of the optimal partitioning in linear time.
Experiments show that our solution improves the performance by 1.8x than the algorithm with fixed-size blocks over the standard datasets.
Finally, we introduce a compression approach able to reduce the space usage of Block-Max-Wand additional data by 57%, worsening the performance by 10% in the worst case.
File