Thesis etd-04062022-123529

Thesis type

Master's thesis

Author

BALDINI, CHIARA

URN

etd-04062022-123529

Title

Generating realistic depth images of preterm infants in NICUs to unravel pose estimation

Department

INGEGNERIA DELL'INFORMAZIONE (Information Engineering)

Degree programme

INGEGNERIA BIOMEDICA (Biomedical Engineering)

Supervisors

supervisor Prof. Micera, Silvestro
co-supervisor Ing. Moccia, Sara

Keywords

conditional generative adversarial networks
depth-image generation
pose estimation
preterm infants’ monitoring

Defence session start date

22/04/2022

Availability

Not available for consultation

Release date

22/04/2092

Abstract

The World Health Organization (WHO) defines preterm birth as any live birth before 37 completed weeks of gestation. An estimated 15 million infants are born preterm every year, i.e. more than 1 in 10 babies. Since preterm birth is a significant risk factor for cerebral palsy (CP), monitoring preterm infants' movement is crucial to detect the onset of short- and long-term complications. The current gold-standard test is the General Movement Assessment (GMA), in which a neonatologist unobtrusively observes the calm, alert infant for 3-5 minutes. It is performed from birth to 20 weeks of age. However, this test is sensitive to human factors, purely qualitative, and time-consuming.
Deep learning (DL) is gaining interest in preterm infants' movement monitoring as a powerful tool to support clinicians in promptly diagnosing disorders related to preterm birth. The hardest challenge in translating DL algorithms into actual clinical practice is the lack of large annotated datasets.
In this thesis, a framework capable of generating realistic images of preterm infants in desired poses is proposed. The framework consists of two stages. The first stage produces a coarse generated image through a U-Net-like autoencoder, starting from a condition image and a target image, while the second generates a refined result using a conditional Generative Adversarial Network (GAN). The Moving INfants In RGB-D (MINI-RGBD) and babyPose datasets, both consisting of depth images acquired with RGB-D cameras in actual clinical practice, were chosen for the qualitative and quantitative evaluations. When tested on MINI-RGBD, the proposed framework showed good results in terms of conventional metrics: focused-on-body Structural SIMilarity (mask SSIM) = 0.998, Inception Score (IS) = 2.817 and Fréchet Inception Distance (FID) = 1.791. Because these metrics were not designed for depth images, a pose estimation framework comprising two consecutive convolutional neural networks (CNNs), for detection and regression respectively, was trained on 1600 generated images together with 400 real images and compared with the same framework trained exclusively on real images. The median Root Mean Square Error (RMSE), equal to 10.789 (right arm), 12.172 (left arm), 12.172 (right leg) and 12.599 (left leg), was used as a contextualized pose estimation metric. When tested on the babyPose dataset, the framework obtained lower performance (mask SSIM = 0.936, IS = 2.072 and FID = 2.539).
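For readers unfamiliar with the FID reported above, the following is a minimal sketch of the Fréchet distance between two sets of feature vectors. It assumes the features (normally Inception activations) have already been extracted; the function name `frechet_distance` and the NumPy-only matrix square root via eigenvalues are illustrative choices, not the thesis's implementation.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features), assumed to be
    pre-extracted feature vectors (hypothetical inputs for this sketch).
    """
    feats_a = np.asarray(feats_a, dtype=float)
    feats_b = np.asarray(feats_b, dtype=float)
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative in theory;
    # tiny negative or imaginary parts from round-off are discarded.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)
```

Lower values indicate that the generated and real feature distributions are closer; two identical feature sets give a distance of approximately zero.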
The images generated for MINI-RGBD are therefore of excellent quality, and the good pose estimation results pave the way for their possible use to increase the dataset size. Given the multiple challenges that the babyPose dataset presents (e.g., variability, occlusions, and highly variable pixel intensity levels), the results of the generative framework on this dataset encourage further research to improve image quality. Future work will also address the generation of temporal sequences rather than still frames, as well as the development of a user interface that clinicians can use to draw custom poses.
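As an illustration of the per-limb pose estimation metric mentioned in the abstract, a median RMSE could be computed roughly as follows. This is a sketch under stated assumptions: the 12-joint layout with three joints per limb, the `LIMBS` mapping, and the choice to compute one RMSE per image before taking the median are all hypothetical, since the abstract does not specify them.

```python
import numpy as np

# Hypothetical joint layout: 12 joints, 3 per limb (not stated in the abstract).
LIMBS = {
    "right_arm": [0, 1, 2],
    "left_arm": [3, 4, 5],
    "right_leg": [6, 7, 8],
    "left_leg": [9, 10, 11],
}

def median_rmse_per_limb(pred, true):
    """Median over images of the per-image RMSE of each limb's joints.

    pred, true: arrays of shape (n_images, n_joints, 2) holding predicted and
    ground-truth joint coordinates in the same units (e.g. pixels).
    """
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    # Squared Euclidean distance per joint, shape (n_images, n_joints).
    sq_dist = np.sum((pred - true) ** 2, axis=-1)
    result = {}
    for limb, joint_idx in LIMBS.items():
        # RMSE over the limb's joints for each image, then median over images.
        rmse_per_image = np.sqrt(sq_dist[:, joint_idx].mean(axis=1))
        result[limb] = float(np.median(rmse_per_image))
    return result
```

The median is less sensitive than the mean to a few badly estimated frames, which is one plausible reason to report it for pose estimation.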

File

File name	Size
Thesis not available for consultation. Contact the author

ETD

Digital archive of theses defended at the Università di Pisa
