Thesis etd-02072024-161543
Thesis type
Master's thesis
Author
VOLPI, LORENZO
URN
etd-02072024-161543
Title
Predicting Classifier Accuracy under Prior Probability Shift
Department
COMPUTER SCIENCE
Degree programme
COMPUTER SCIENCE
Supervisors
supervisor Esuli, Andrea
supervisor Moreo Fernández, Alejandro
supervisor Sebastiani, Fabrizio
Keywords
- class prior estimation
- classifier accuracy prediction
- dataset shift
- learning to quantify
- machine learning
- prior probability shift
- quantification
Defence session start date
23/02/2024
Availability
Full
Abstract
The accuracy that a classifier will have on unseen data (i.e., on unlabelled data that were not available at training time) can be predicted via k-fold cross-validation (kFCV). However, kFCV returns reliable predictions only when the training data and the unseen data are independent and identically distributed (IID), i.e., were randomly sampled from the same distribution. Unfortunately, in real-world applications the training data and the unseen data are often not IID, i.e., we want to deploy the trained model on unseen data that exhibit some kind of dataset shift with respect to the training data. In this work we deal with the problem of predicting classifier accuracy on unseen data characterised by prior probability shift (PPS), an important type of dataset shift. We propose a class of methods built on top of quantification algorithms robust to PPS, i.e., algorithms devised for estimating the prevalence values of the classes in unseen data characterised by PPS. The methods we propose are based on the idea of viewing the cells of the contingency table (on which classifier accuracy is computed) as classes. We perform systematic experiments in which we test the prediction accuracy of our methods against state-of-the-art classifier accuracy prediction methods from the machine learning literature.
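To make the idea of treating contingency-table cells as quantities to be estimated more concrete, the following is a minimal illustrative sketch for the binary case. It is not taken from the thesis and should not be read as the author's method. It assumes that, under PPS, the classifier's true positive rate and false positive rate measured on held-out labelled data carry over to the unseen data, and it swaps in a simple, well-known quantifier (Adjusted Classify & Count) to estimate the positive prevalence. All function names (`estimate_rates`, `acc_prevalence`, `predict_cells`, `predict_accuracy`) are hypothetical.

```python
import numpy as np


def estimate_rates(y_true, y_pred):
    """Estimate (tpr, fpr) of a binary classifier on held-out labelled data."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tpr = float(y_pred[y_true == 1].mean())  # P(pred=1 | true=1)
    fpr = float(y_pred[y_true == 0].mean())  # P(pred=1 | true=0)
    return tpr, fpr


def acc_prevalence(y_pred_unseen, tpr, fpr):
    """Adjusted Classify & Count: correct the raw fraction of predicted
    positives using tpr and fpr, then clip the estimate to [0, 1]."""
    cc = float(np.mean(y_pred_unseen))  # plain classify-and-count estimate
    if tpr == fpr:
        # The correction is undefined; fall back to the raw estimate.
        return cc
    return float(np.clip((cc - fpr) / (tpr - fpr), 0.0, 1.0))


def predict_cells(p, tpr, fpr):
    """Estimated prevalence of each contingency-table cell on unseen data,
    assuming tpr and fpr are unchanged under prior probability shift."""
    return {
        "TP": p * tpr,
        "FN": p * (1.0 - tpr),
        "FP": (1.0 - p) * fpr,
        "TN": (1.0 - p) * (1.0 - fpr),
    }


def predict_accuracy(cells):
    """Predicted accuracy is the estimated mass of the correct cells."""
    return cells["TP"] + cells["TN"]


if __name__ == "__main__":
    # Toy held-out data: 10 positives (8 predicted positive),
    # 10 negatives (1 predicted positive).
    y_val = [1] * 10 + [0] * 10
    pred_val = [1] * 8 + [0] * 2 + [1] * 1 + [0] * 9
    tpr, fpr = estimate_rates(y_val, pred_val)

    # Unlabelled data: only the classifier's predictions are available.
    pred_unseen = [1] * 9 + [0] * 11
    p_hat = acc_prevalence(pred_unseen, tpr, fpr)
    cells = predict_cells(p_hat, tpr, fpr)
    print(p_hat, cells, predict_accuracy(cells))
```

In this simplified sketch the cell prevalences are derived from a class-prevalence estimate. The abstract describes viewing the cells themselves as classes, which suggests that a quantifier could instead be applied directly to the cells; the details of that are in the thesis, not here.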
File
File name | Size |
---|---|
tesi_Lor...22024.pdf | 3.42 Mb |