ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-11142016-162840


Thesis type
Master's thesis
Author
CONTE, FEDERICO
URN
etd-11142016-162840
Title
Optimizing Gradient-Boosted Regression Tree Learning through load-balancing and cache-aware data layout
Department
COMPUTER SCIENCE
Degree programme
COMPUTER SCIENCE AND NETWORKING
Supervisors
supervisor Prof. Venturini, Rossano
supervisor Dr. Lucchese, Claudio
second reader Prof. Brogi, Antonio
Keywords
  • Learning to Rank
  • Information Retrieval
  • Machine Learning
Defense session start date
02/12/2016
Availability
Full
Abstract
Learning-to-Rank (LtR) is the state-of-the-art methodology used in modern Web search engines to devise effective document ranking functions. State-of-the-art LtR algorithms are based on Gradient-Boosted Regression Trees (GBRT) and typically generate thousands of large trees by processing large training datasets. In this master's thesis, we address efficiency issues of GBRT algorithms and propose a new implementation named FASTFOREST. We introduce two major optimizations. First, we improve the load balancing of the proposed multi-threaded algorithm through a two-step reordering of the document features. Second, we propose a cache-efficient representation of the training data, together with strategies aimed at reducing the cache miss ratio. Experiments show that FASTFOREST achieves a speedup of up to 2.36x.