ETD

Archivio digitale delle tesi discusse presso l'Università di Pisa

Tesi etd-02212020-144245


Tipo di tesi
Tesi di dottorato di ricerca
Autore
TRANI, ROBERTO
URN
etd-02212020-144245
Titolo
Efficiency-Effectiveness Trade-offs in Modern Query Processing
Settore scientifico disciplinare
INF/01
Corso di studi
INFORMATICA
Relatori
tutor Prof. Venturini, Rossano
relatore Dott. Nardini, Franco Maria
Parole chiave
  • tournament graph
  • top-1 retrieval
  • search results filtering
  • relevance-aware filtering
  • query reformulation
  • query expansion
  • efficiency-effectiveness trade-offs
Data inizio appello
04/03/2020
Consultabilità
Completa
Riassunto
Modern search engines face enormous performance challenges. The most popular ones process tens of thousands of queries per second and manage billions of documents. Besides being efficient in processing the queries, they have also to be effective in satisfying the information needs of the users. However, techniques that improve the quality of the results can also require longer processing time. Search engines have thus often to attain a compromise between effectiveness and efficiency when employing new query processing solutions. In this thesis, we propose three new solutions for improving the efficiency-effectiveness trade-offs in several different tasks of modern query processors, namely: query expansion, search results filtering, and top-1 retrieval.
Query expansion is the task of augmenting the user query with additional terms so to overcome some of the difficulties arising from the natural language, such as synonymy and polysemy. To this regard, we propose a thesaurus-based query expansion framework relying on structured Conjunctive Normal Form queries. We also propose three novel term selection supervised models capturing efficiency and effectiveness of the expansion candidates to include into the expanded query.
Search results filtering refers to the task of discarding some search results from an attribute-sorted list, e.g., by recency or by price, to improve the effectiveness of the returned results while preserving the ordering. To this end, we propose one efficient optimal algorithm and one faster approximate algorithm with strong approximation guarantees, which enable filtering in scenarios with tight time constraints.
Top-1 retrieval regards the identification of the utmost relevant result. It is critical in many applications, such as conversational AI and question answering, returning only one utmost relevant result to each user query. To this regard, we propose an efficient algorithm for finding the result winning the highest number of pairwise comparisons, given by a pairwise machine learning classifier, while performing the minimum number of comparisons needed to solve this problem.
In this thesis, we show that by trading efficiency for effectiveness it is possible to achieve a huge improvement of efficiency on these tasks at the cost of a negligible loss of effectiveness. We also demonstrate that new efficient algorithms can enable effective yet inefficient techniques to be efficiently used, thus providing new appealing solutions in the efficiency-effectiveness trade-off space.
File