Thesis etd-06192024-135255
Thesis type
Master's thesis
Author
SALOGNI, ILARIA
URN
etd-06192024-135255
Title
From Classification to Quantification: New Methods for Estimating Disease Prevalence in Verbal Autopsies Datasets
Department
FILOLOGIA, LETTERATURA E LINGUISTICA
Degree programme
INFORMATICA UMANISTICA
Supervisors
supervisor Prof. Sebastiani, Fabrizio
supervisor Prof. Dell'Orletta, Felice
supervisor Dr. Moreo Fernández, Alejandro
supervisor Prof. Esuli, Andrea
Keywords
- hierarchical dataframe
- machine learning
- NLP
- prior estimation
- quantification
- text quantification
- verbal autopsies
Exam session start date
05/07/2024
Availability
Full
Abstract
Verbal autopsies are standardised textual questionnaires that gather information about the condition and symptoms experienced by the deceased in the days preceding death. They were developed to address the need for basic registration of deaths and their causes, including in countries with weak death registration systems.
The aim is therefore to obtain estimates of disease prevalence across different geographical regions, time periods, and age groups. The focus shifts from individual data points to aggregate data, with the main goal of keeping the prevalence estimate of each class accurate even when the class distribution in the training data is very different from that of the future, unlabelled data.
Epidemiological objectives such as these align more closely with the task of quantification than with classification, where classification is understood as assigning the correct pathology to each individual death with as few errors as possible.
Quantification algorithms can also be used to estimate the prevalence distribution of data that are intrinsically hierarchical, where the data are organised in a tree-like structure, as is the case for verbal autopsies.
This work examines methods for estimating prevalence in mortality statistics datasets and aims to explore cause-of-death assignment from a new perspective. It develops novel methodologies that leverage both flat and hierarchical quantification techniques and that exploit the textual portion of verbal autopsy datasets, which previous work has often overlooked.
The work started from the premise that quantification methods could be better suited to prior-shift settings. Surprisingly, many of the supposedly more sophisticated quantification methods fail to improve over the performance of CC (Classify and Count). We also tested hierarchical quantification algorithms in various settings; although the approach is interesting, it did not show a marked improvement over the results obtained with quantification algorithms that disregard the hierarchical nature of the labels.
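As a rough illustration of the CC baseline mentioned in the abstract, the sketch below estimates the prevalence of a single cause of death by counting positive predictions. It then applies one well-known correction, adjusted classify and count, chosen here only for illustration, since the abstract does not say which methods were tested. The toy data, the function names, and the true-positive and false-positive rates are all hypothetical and are not taken from the thesis.

```python
def classify_and_count(predictions):
    """CC: estimated prevalence is the fraction of items predicted positive."""
    return sum(predictions) / len(predictions)


def adjusted_classify_and_count(predictions, tpr, fpr):
    """Adjusted CC: correct the CC estimate using the classifier's
    true-positive and false-positive rates, assumed to have been
    estimated beforehand on held-out labelled data."""
    cc = classify_and_count(predictions)
    if tpr == fpr:
        # The correction is undefined; fall back to the raw count.
        return cc
    corrected = (cc - fpr) / (tpr - fpr)
    # Clip to the valid prevalence range [0, 1].
    return min(1.0, max(0.0, corrected))


# Hypothetical unlabelled sample of 100 deaths, 20 of which are truly due
# to the cause of interest (true prevalence 0.20).
# Hypothetical classifier with tpr = 0.8 and fpr = 0.1:
#   20 true positives -> 16 predicted positive, 4 predicted negative
#   80 true negatives ->  8 predicted positive, 72 predicted negative
predictions = [1] * 16 + [0] * 4 + [1] * 8 + [0] * 72

cc_estimate = classify_and_count(predictions)
acc_estimate = adjusted_classify_and_count(predictions, tpr=0.8, fpr=0.1)

print(f"CC estimate:          {cc_estimate:.2f}")
print(f"Adjusted CC estimate: {acc_estimate:.2f}")
```

In this idealised toy case the correction recovers the true prevalence because the assumed rates match the classifier exactly. On real data those rates must themselves be estimated, which is one reason a corrected method need not beat CC, consistent with what the abstract reports.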
File
File name | Size |
---|---|
Salogni_...ation.pdf | 1.14 Mb |