Master's thesis
Radiomics and Machine Learning in Medical Image Analysis
Course of study
Supervisor Dr. Retico, Alessandra
- medical imaging
- machine learning
Graduation session start date
Radiomics is an emerging field of research in the context of medical image analysis. It is based on the extraction and analysis of quantitative imaging features from medical images in order to exploit them for clinical decision support.
In daily clinical practice, medical images are typically assessed only visually by radiologists. As a result, a large amount of potentially meaningful information that cannot be appreciated by the human eye is lost.
Radiomics aims to use this information to help clinicians in different tasks, such as making a diagnosis and predicting the prognosis and therapeutic response of patients.
Descriptive quantities extracted from images are defined as radiomic features.
Very recently, an additional type of radiomic approach, called dosiomics, has been introduced. Dosiomic features are extracted from the dose distribution delivered in a radiotherapy treatment.
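To make the notion of a feature concrete, the following is a minimal sketch of how a few first-order quantities could be computed from the voxels inside a region of interest. It is an illustration only and not the extraction code used in the thesis: the function name `first_order_features`, the choice of 32 histogram bins and the synthetic volume are all assumptions. The same computation applies whether the 3D array holds image intensities (radiomics) or dose values (dosiomics).

```python
import numpy as np

def first_order_features(volume, mask, n_bins=32):
    """Compute a few first-order features of the voxels selected by a mask."""
    values = volume[mask.astype(bool)].astype(float)
    mean = values.mean()
    std = values.std()
    # Skewness: third standardized moment (set to 0 for a constant region).
    skewness = ((values - mean) ** 3).mean() / std ** 3 if std > 0 else 0.0
    # Shannon entropy (in bits) of the discretized value histogram.
    counts, _ = np.histogram(values, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    entropy = -(p * np.log2(p)).sum()
    return {"mean": mean, "std": std, "skewness": skewness, "entropy": entropy}

# Synthetic 3D volume and a cubic region of interest (illustrative data only).
rng = np.random.default_rng(0)
volume = rng.normal(loc=100.0, scale=15.0, size=(20, 20, 20))
mask = np.zeros(volume.shape, dtype=bool)
mask[5:15, 5:15, 5:15] = True

features = first_order_features(volume, mask)
```

In practice, texture and shape features are usually added to first-order ones, and a dedicated library is typically used to compute them in a standardized way.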
Machine learning (ML) and deep learning (DL) algorithms are successfully applied in many different fields, due to their capability to make predictions without being explicitly programmed.
Generally, ML and DL techniques are widely employed in radiomics to build predictive models.
In this thesis we discuss the analysis workflow that goes from radiomic feature extraction to the development of predictive models. In particular, we explore the applicability and robustness of ML methods when working with small datasets.
In fact, in the field of medical imaging, it is often not easy to collect large annotated datasets.
Operationally, we considered three clinical problems in which we tried to address clinical questions using radiomics: the investigation of the predictive role of dosiomic and radiomic features for radiotherapy treatment outcome; the evaluation of the predictive power of radiomic features extracted from CT in tumor staging and histology prediction; and the evaluation of the predictive power of radiomic features extracted from multiparametric MRI in tumor grade prediction.
Concerning the first task, we implemented the feature extraction step from dose distributions available in a dataset collected by the pediatric Hospital Meyer and the Radiotherapy Unit of the University of Florence within the Artificial Intelligence in Medicine (AIM) INFN project. The dataset is composed of patients affected by medulloblastoma and treated with radiotherapy. Currently, the dataset consists of 55 subjects.
Regarding the second question, we build predictive models to classify histology and tumor stage using features extracted from thoracic CT scans of Non-Small Cell Lung Cancer (NSCLC) patients.
For this task, we consider a subset of 130 subjects from the public dataset Lung1 Maastro NSCLC and a private dataset of 47 subjects collected in a collaboration between A.R.N.A.S. Ospedale Civico Di Cristina Benfratelli, Università degli Studi di Palermo and INFN Catania.
To address the last question, we consider a publicly available dataset of 167 patients from The Cancer Imaging Archive (TCIA) affected by glioma, a tumor of the central nervous system. Starting from features extracted from
multi-parametric MRI within heterogeneous tumor sub-regions, we build predictive models aimed at distinguishing the two grades of glioma, labeled as low grade glioma and high grade glioma.
Feature extraction, feature analysis and machine learning models have been developed and implemented in Python.
In our workflow, dimensionality reduction techniques, such as principal component analysis (PCA), linear discriminant analysis (LDA) and mutual information (MI)-based feature selection, are introduced to limit overfitting.
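As an illustration of this step, the sketch below places each of the three reduction methods in front of a classifier inside a scikit-learn pipeline, so that the reduction is fitted on the training folds only. This is an assumption-laden example rather than the thesis code: the synthetic data, the number of retained components or features (10), the linear SVM and the 5-fold split are all hypothetical choices.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a small radiomic dataset: few samples, many features.
X, y = make_classification(n_samples=120, n_features=100, n_informative=8,
                           random_state=0)

reducers = {
    "PCA": PCA(n_components=10),
    # With two classes, LDA can project onto at most one discriminant axis.
    "LDA": LinearDiscriminantAnalysis(n_components=1),
    # MI is used here as a univariate score to keep the 10 best features.
    "MI": SelectKBest(score_func=partial(mutual_info_classif, random_state=0),
                      k=10),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for name, reducer in reducers.items():
    # Scaling and reduction are re-fitted on each training fold by the pipeline.
    model = make_pipeline(StandardScaler(), reducer, SVC(kernel="linear"))
    fold_scores = cross_val_score(model, X, y, cv=cv,
                                  scoring="balanced_accuracy")
    scores[name] = fold_scores.mean()
```

Keeping the reduction step inside the pipeline matters with small datasets: fitting PCA, LDA or MI on the full dataset before splitting would let information from the test folds leak into the model.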
The classifiers considered are: Support Vector Machines (SVM), Random Forest, AdaBoost, and Nearest Neighbors.
The hyper-parameters of the algorithms are optimized through an exhaustive search. However, the optimization process turned out to be unstable; therefore, we assess performance using nested cross-validation.
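A nested cross-validation of this kind can be sketched as follows: an inner loop performs the exhaustive hyper-parameter search, and an outer loop estimates performance on data never seen by that search. The grid values, the number of folds, the SVM classifier and the synthetic data are assumptions made for the example and are not taken from the thesis.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary classification data standing in for a feature table.
X, y = make_classification(n_samples=100, n_features=20, n_informative=5,
                           random_state=0)

# Inner folds select hyper-parameters; outer folds estimate performance.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

pipeline = make_pipeline(StandardScaler(), SVC())
# Hypothetical grid; every combination is evaluated (exhaustive search).
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}

search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="roc_auc")

# Each outer fold re-runs the whole search on its own training portion.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")
mean_auc, std_auc = outer_scores.mean(), outer_scores.std()
```

The spread of `outer_scores` gives an uncertainty of the kind reported as mean +/- standard deviation, and it also reflects how much the selected hyper-parameters change from one outer fold to the next.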
The metrics chosen to report results are the balanced accuracy and the area under the receiver operating characteristic curve (AUC).
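For reference, both metrics can be written in a few lines. The sketch below uses their standard definitions for a binary problem: balanced accuracy is the mean of the per-class recalls, and the AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The function names and the toy labels and scores are made up for the example.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of the recall obtained on each class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # all positive-negative score pairs
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

# Imbalanced toy example: six negatives and two positives.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1]
scores = [0.1, 0.2, 0.3, 0.2, 0.1, 0.6, 0.4, 0.9]

accuracy = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))  # 0.875
bal_acc = balanced_accuracy(y_true, y_pred)                          # 0.75
auc = roc_auc(y_true, scores)                                        # 11/12
```

In this example the plain accuracy is 0.875 even though half of the positive cases are missed, while the balanced accuracy drops to 0.75. This is why balanced accuracy is the more informative choice when classes are imbalanced.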
The best performance in stage classification of NSCLC
is reached by the Nearest Neighbors classifier: AUC=0.80+/-0.05 and balanced accuracy=0.70+/-0.16.
In histology classification of NSCLC, the result obtained with the Random Forest classifier is AUC=0.60+/-0.07.
Despite the issues due to small datasets, in some cases we achieve encouraging results. In particular, in glioma grade classification, the Random Forest classifier obtained AUC=0.94+/-0.03 and balanced accuracy=0.82+/-0.03.
In those cases in which the classifiers achieved satisfactory performance, we developed a ranking system to highlight the most important features in the classification problem.
This point, which concerns the explainability of machine learning algorithms, is crucial for possible future translation of radiomic approaches into the clinical diagnostic pathway.
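The abstract does not specify how the ranking is built. Purely as an illustration of one common option, and not necessarily the approach adopted in the thesis, the sketch below averages the impurity-based importances of a Random Forest over cross-validation folds and sorts the features accordingly. The synthetic data, the feature names and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Synthetic data: with shuffle=False the first 3 columns are the informative ones.
X, y = make_classification(n_samples=150, n_features=12, n_informative=3,
                           n_redundant=0, class_sep=2.0, shuffle=False,
                           random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
importances = []
for train_idx, _ in cv.split(X, y):
    forest = RandomForestClassifier(n_estimators=200, random_state=0)
    forest.fit(X[train_idx], y[train_idx])
    # Impurity-based importances; they sum to 1 for each fitted forest.
    importances.append(forest.feature_importances_)

# Averaging over folds makes the ranking less dependent on a single split.
mean_importance = np.mean(importances, axis=0)
order = np.argsort(mean_importance)[::-1]
ranking = [feature_names[i] for i in order]
```

Averaging over several splits is a natural precaution with small datasets, where the importance assigned to a feature can vary noticeably from one training subset to another.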