Thesis etd-06192024-135255

Thesis type

Master's degree thesis

Author

SALOGNI, ILARIA

URN

etd-06192024-135255

Title

From Classification to Quantification: New Methods for Estimating Disease Prevalence in Verbal Autopsies Datasets

Department

FILOLOGIA, LETTERATURA E LINGUISTICA

Degree programme

INFORMATICA UMANISTICA

Supervisors

supervisor Prof. Sebastiani, Fabrizio
supervisor Prof. Dell'Orletta, Felice
supervisor Dr. Moreo Fernández, Alejandro
supervisor Prof. Esuli, Andrea

Keywords

hierarchical dataframe
machine learning
NLP
prior estimation
quantification
text quantification
verbal autopsies

Exam session start date

05/07/2024

Availability

Full

Abstract

Verbal autopsies are standardised textual questionnaires that gather information about the condition and symptoms experienced by the deceased in the days preceding death. They were developed to meet the need for basic registration of deaths and their causes, including in countries with weak death registration systems.
The aim is therefore to obtain estimates of disease prevalence across different geographical regions, time periods, and age groups. The focus shifts from individual data points to aggregate data, and the main goal is to keep the prevalence estimate of each class accurate even when the class distribution in the training data is very different from that of the future, unlabelled data.
Epidemiological objectives such as these align more closely with the task of quantification than with classification, where classification is understood as assigning the correct pathology to each individual death with as few errors as possible.
Quantification algorithms can also be used to predict the prevalence distribution of data with an intrinsically hierarchical nature, where data are organised in a tree-like structure, as is the case for verbal autopsies.
This work examines methods for estimating prevalence in mortality statistics datasets and aims to explore cause-of-death assignment from a new perspective. It develops novel methodologies that use both flat and hierarchical quantification techniques and that exploit the textual portion of verbal autopsy datasets, which previous work has often overlooked.
The work started from the premise that quantification methods should be better suited to the prior-shift setting. Surprisingly, many of the supposedly more sophisticated quantification methods fail to improve on the performance of CC (Classify and Count). We also tested hierarchical quantification algorithms in various settings; although interesting, they did not show a notable improvement over quantification algorithms that disregard the hierarchical nature of the labels.
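
To make the difference between classification and quantification concrete, the following is a minimal Python sketch of the Classify and Count (CC) baseline mentioned above, which estimates prevalence by counting predicted labels. It is an illustration under stated assumptions and not the thesis's implementation: the class names, the toy predictions, and the `tpr`/`fpr` values are hypothetical, and the binary adjusted-count correction is included only as one well-known example of a more sophisticated quantifier; the abstract does not say which methods were evaluated.

```python
from collections import Counter


def classify_and_count(predicted_labels, classes):
    """CC: estimate each class prevalence as the share of items predicted in it."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {c: counts[c] / total for c in classes}


def adjusted_count_binary(cc_prevalence, tpr, fpr):
    """Binary adjusted count: correct a CC estimate for the positive class using
    the classifier's true/false positive rates (assumed to have been estimated
    on held-out labelled data)."""
    if tpr == fpr:
        # The correction is undefined; fall back to the uncorrected estimate.
        return cc_prevalence
    adjusted = (cc_prevalence - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, adjusted))  # clip to a valid proportion


def mean_absolute_error(true_prev, estimated_prev):
    """Average absolute difference between true and estimated prevalences."""
    return sum(abs(true_prev[c] - estimated_prev[c]) for c in true_prev) / len(true_prev)


# Hypothetical toy data: 100 unlabelled records, 38 of them predicted as "malaria".
classes = ["malaria", "other"]
predicted = ["malaria"] * 38 + ["other"] * 62

cc_estimate = classify_and_count(predicted, classes)

# Hypothetical classifier rates; with a true prevalence of 0.40 they would
# produce exactly 38 positive predictions out of 100 (0.4 * 0.8 + 0.6 * 0.1).
acc_malaria = adjusted_count_binary(cc_estimate["malaria"], tpr=0.8, fpr=0.1)
acc_estimate = {"malaria": acc_malaria, "other": 1.0 - acc_malaria}

true_prevalence = {"malaria": 0.40, "other": 0.60}
cc_error = mean_absolute_error(true_prevalence, cc_estimate)
acc_error = mean_absolute_error(true_prevalence, acc_estimate)
```

In this idealised toy case the correction recovers the true prevalence because the assumed rates are exact. With real data the rates are themselves noisy estimates, which is one reason a correction of this kind does not necessarily beat plain CC.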

File

File name	Size
Salogni_...ation.pdf	1.14 Mb

ETD

Digital archive of theses defended at the Università di Pisa
