Thesis etd-06012023-111629

Thesis type

Master's thesis

Author

PARRAVANO, MICHELA

URN

etd-06012023-111629

Title

Development of an unsupervised machine learning algorithm for the computation of survival analysis from clinical databases

Department

INFORMATION ENGINEERING

Degree programme

BIOMEDICAL ENGINEERING

Supervisors

supervisor Prof. Vozzi, Giovanni
supervisor Prof. Positano, Vincenzo
supervisor Ing. Meloni, Antonella

Keywords

hierarchical clustering
MIOT
principal components
R software
random forest
survival analysis
thalassemia

Start date of the graduation session

20/06/2023

Availability

Not available for consultation

Release date

20/06/2093

Abstract

The analysis of clinical databases can help identify factors that increase the clinical risk of specific diseases. Unsupervised machine learning approaches can be used to investigate unknown associations in order to understand complex diseases, design tailored therapeutic pathways, and identify subjects for clinical trials. In this work, an unsupervised machine learning algorithm for survival analysis was developed using the MIOT (Myocardial Iron Overload in Thalassemia) database and R software.
The first step of the algorithm reduces the dimensionality of the problem with a principal component method; specifically, a Multiple Factor Analysis (MFA) is performed to handle both continuous and categorical variables. The next step is a Hierarchical Clustering on Principal Components (HCPC), which aims to determine distinct phenogroups in the set of patients with Thalassemia Major (TM). Survival analysis is then computed on the phenogroups, with the objective of defining the risk of occurrence of cardiac disease associated with each phenogroup. Identifying the main risk factors for cardiac complications in patients with TM could allow better identification of high-risk patients and, subsequently, better prevention, in order to reduce mortality in TM patients. Finally, identifying phenogroups characterized by different classes of risk could make it possible to develop a classifier that assigns new TM patients to one of the risk classes.
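
As a rough illustration of the survival-analysis step, the sketch below estimates one survival curve per phenogroup. It is written in plain Python rather than the R software used in the thesis, and it rests on assumptions that the abstract does not state: the choice of the Kaplan-Meier estimator, the hypothetical function names `kaplan_meier` and `survival_by_group`, and the toy records, which are invented for illustration and are not MIOT data. The phenogroup labels are taken as given, as if produced by an earlier clustering step.

```python
from collections import defaultdict


def kaplan_meier(durations, events):
    """Kaplan-Meier estimate for one group.

    durations: follow-up time of each patient
    events:    1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events observed exactly at time t
        leaving = 0    # patients leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            # Product-limit update: multiply by the conditional survival at t
            survival *= 1.0 - observed / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


def survival_by_group(records):
    """records: iterable of (group_label, duration, event) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for group, duration, event in records:
        grouped[group][0].append(duration)
        grouped[group][1].append(event)
    return {g: kaplan_meier(d, e) for g, (d, e) in grouped.items()}


# Invented toy data (NOT from the MIOT database):
# (phenogroup, follow-up in months, 1 = cardiac event observed, 0 = censored)
toy_records = [
    ("A", 12, 1), ("A", 30, 0), ("A", 45, 1), ("A", 60, 0),
    ("B", 5, 1), ("B", 9, 1), ("B", 20, 1), ("B", 40, 0),
]

curves = survival_by_group(toy_records)
for group, curve in sorted(curves.items()):
    print(group, [(t, round(s, 3)) for t, s in curve])
```

In this toy example the curve of group B drops faster than that of group A, which is the kind of contrast that would mark B as a higher-risk phenogroup. A formal comparison between groups (for instance a log-rank test) is not shown here.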

File

File name	Size
Thesis not available for consultation. Contact the author

ETD

Digital archive of theses defended at the Università di Pisa
