Thesis type
Master's thesis
Title
Optimizing Gradient-Boosted Regression Tree Learning through load-balancing and cache-aware data layout
Degree programme
INFORMATICA E NETWORKING
Keywords
- Information Retrieval
- Learning to Rank
- Machine Learning
Graduation session start date
02/12/2016
Abstract
Learning-to-Rank (LtR) is the state-of-the-art methodology used by modern Web search engines to devise effective document ranking functions. The leading algorithms are based on Gradient-Boosted Regression Trees (GBRT) and typically generate thousands of large trees by processing large training datasets. In this master's thesis, we address the efficiency of GBRT algorithms and propose a new implementation named FASTFOREST. We introduce two major optimizations. First, we improve the load balancing of the proposed multi-threaded algorithm through a two-step reordering of the document features. Second, we propose a cache-efficient representation of the training data, together with strategies aimed at reducing the cache miss ratio. Experiments show that FASTFOREST achieves a speedup of up to 2.36x.
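
To illustrate why feature ordering matters for load balancing in a multi-threaded tree learner, the following is a minimal sketch of a generic greedy heuristic (longest processing time first). It is not the two-step reordering used by FASTFOREST, whose details are not given in this abstract. The function names and the per-feature cost values are hypothetical, assuming that the work per feature can be estimated in advance (for example, by its number of candidate split points).

```python
import heapq


def balance_features(costs, n_threads):
    """Assign feature indices to threads with a greedy LPT heuristic.

    costs[i] is an assumed, precomputed work estimate for feature i.
    Returns one list of feature indices per thread.
    """
    # Min-heap of (current load, thread id): the least-loaded thread is on top.
    heap = [(0, t) for t in range(n_threads)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(n_threads)]
    # Visit the most expensive features first, so cheap ones fill the gaps.
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    for f in order:
        load, t = heapq.heappop(heap)
        buckets[t].append(f)
        heapq.heappush(heap, (load + costs[f], t))
    return buckets


def max_load(costs, buckets):
    """Load of the busiest thread, which bounds the parallel running time."""
    return max(sum(costs[f] for f in b) for b in buckets)


if __name__ == "__main__":
    costs = [9, 7, 6, 5, 4, 3, 2]  # hypothetical per-feature costs
    buckets = balance_features(costs, n_threads=3)
    print(buckets, max_load(costs, buckets))
```

In this sketch the busiest thread determines the time of each split-finding pass, so lowering the maximum load is what reduces the running time.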