Thesis etd-06012023-111629
Thesis type
Master's thesis
Author
PARRAVANO, MICHELA
URN
etd-06012023-111629
Title
Development of an unsupervised machine learning algorithm for the computation of survival analysis from clinical databases
Department
INFORMATION ENGINEERING
Degree programme
BIOMEDICAL ENGINEERING
Supervisors
supervisor Prof. Vozzi, Giovanni
supervisor Prof. Positano, Vincenzo
supervisor Ing. Meloni, Antonella
Keywords
- hierarchical clustering
- MIOT
- principal components
- R software
- random forest
- survival analysis
- thalassemia
Graduation session start date
20/06/2023
Availability
Not available
Release date
20/06/2093
Abstract
The analysis of clinical databases can help identify factors that increase the clinical risk of specific diseases. Unsupervised machine learning approaches can be used to investigate unknown associations in order to understand complex diseases, design tailored therapeutic pathways, and identify subjects for clinical trials. In this work, an unsupervised machine learning algorithm for survival analysis was developed using the MIOT (Myocardial Iron Overload in Thalassemia) database and R software.
The first step of the algorithm reduces the dimensionality of the problem with a principal component method: specifically, a Multiple Factor Analysis (MFA) is performed to handle both continuous and categorical variables. The next step is a Hierarchical Clustering on Principal Components (HCPC), which aims to determine distinct phenogroups within the set of patients with Thalassemia Major (TM). A survival analysis is then computed on the phenogroups, with the objective of defining the risk of occurrence of cardiac disease associated with each phenogroup. Identifying the main risk factors for cardiac complications in patients with TM could allow better identification of high-risk patients and, subsequently, better prevention, in order to reduce mortality in TM patients. Finally, identifying phenogroups characterized by different risk classes could make it possible to develop a classifier that assigns new TM patients to one of the risk classes.
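As an illustration of the final step only (survival analysis per phenogroup), the following is a minimal sketch and not the thesis implementation. The thesis was carried out in R and is not available for consultation, so everything here is an assumption: the choice of the Kaplan-Meier estimator (the abstract does not name the estimator used), the function names `kaplan_meier` and `survival_by_group`, the group labels, and the toy follow-up data. Python's standard library is used in place of R purely for illustration.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (t, S(t)) at each distinct event time.

    times  -- follow-up time of each patient
    events -- 1 if the cardiac event was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        removed = 0
        # Group all patients that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed > 0:
            # Product-limit update: S(t) = S(t-) * (1 - d_t / n_t)
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set.
        at_risk -= removed
    return curve


def survival_by_group(times, events, groups):
    """Compute one Kaplan-Meier curve per phenogroup label."""
    by_group = defaultdict(lambda: ([], []))
    for t, e, g in zip(times, events, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(e)
    return {g: kaplan_meier(ts, es) for g, (ts, es) in by_group.items()}


# Hypothetical toy data: two phenogroups, "A" and "B", five patients each.
toy_times = [2, 3, 3, 5, 8, 1, 4, 6, 7, 9]
toy_events = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
toy_groups = ["A"] * 5 + ["B"] * 5

curves = survival_by_group(toy_times, toy_events, toy_groups)
for group, curve in sorted(curves.items()):
    print(group, curve)
```

In this sketch the phenogroup labels are taken as given. In the pipeline described above they would come from the HCPC step applied to the MFA coordinates.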
File
File name | Size |
---|---|
Thesis not available for consultation. |