
ETD

Digital archive of theses discussed at the University of Pisa

 

Thesis etd-04062022-123529


Thesis type
Master's thesis
Author
BALDINI, CHIARA
URN
etd-04062022-123529
Thesis title
Generating realistic depth images of preterm infants in NICUs to unravel pose estimation
Department
INGEGNERIA DELL'INFORMAZIONE
Course of study
INGEGNERIA BIOMEDICA
Supervisors
supervisor Prof. Micera, Silvestro
co-supervisor Ing. Moccia, Sara
Keywords
  • conditional generative adversarial networks
  • depth-image generation
  • pose estimation
  • preterm infants’ monitoring
Graduation session start date
22/04/2022
Availability
Withheld
Release date
22/04/2092
Summary
The World Health Organization (WHO) defines preterm birth as any live birth before 37 completed weeks of gestation. An estimated 15 million infants are born preterm every year, i.e. more than 1 in 10 babies. Since preterm birth is a significant risk factor for cerebral palsy (CP), monitoring preterm infants' movement is crucial to detect the onset of short- and long-term complications. The current gold-standard test is the General Movement Assessment (GMA), a non-obtrusive observation of the calm and alert infant by a neonatologist for 3-5 minutes, performed from birth to 20 weeks of age. However, this test is sensitive to human factors, purely qualitative, and time-consuming.
Deep learning (DL) is gaining interest in the field of preterm infants' movement monitoring as a powerful tool to support clinicians in promptly diagnosing disorders related to preterm birth. The hardest challenge in translating DL algorithms into actual clinical practice is the lack of large annotated datasets.
In this thesis, a framework capable of generating realistic images of preterm infants in desired poses is proposed. The framework consists of two stages. The first stage produces a coarse generated image through a U-Net-like autoencoder starting from a condition image and a target image, while the second generates a refined result using a conditional Generative Adversarial Network (GAN). The Moving INfants In RGB-D (MINI-RGBD) and babyPose datasets, both consisting of depth images acquired from RGB-D cameras in actual clinical practice, were chosen for the qualitative and quantitative evaluations. When tested on MINI-RGBD, the proposed framework showed good results in terms of conventional metrics: focused-on-body Structural SIMilarity (mask SSIM) = 0.998, Inception Score (IS) = 2.817 and Fréchet Inception Distance (FID) = 1.791. Because these metrics were not designed for depth images, a pose estimation framework consisting of two consecutive convolutional neural networks (CNNs), for detection and regression respectively, was trained on 1600 generated images together with 400 real images and compared with the same pipeline trained exclusively on real images. The median Root Mean Square Error (RMSE), equal to 10.789 (right arm), 12.172 (left arm), 12.172 (right leg) and 12.599 (left leg), was used as a contextualized pose estimation metric. When tested on the babyPose dataset, the framework obtained lower performance (mask SSIM = 0.936, IS = 2.072 and FID = 2.539).
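The per-limb median RMSE reported above can be illustrated with a minimal sketch. It assumes that each pose is a list of 2D joint coordinates and that joints are grouped into limbs as in the hypothetical `LIMBS` table below. The summary specifies neither the joint layout nor the exact RMSE definition, so the grouping, the function names and the per-image aggregation are illustrative assumptions, not the thesis implementation.

```python
import math
from statistics import median

# Hypothetical joint indices per limb (assumption: three joints per limb).
# The thesis summary does not state the actual joint layout.
LIMBS = {
    "right_arm": [0, 1, 2],
    "left_arm": [3, 4, 5],
    "right_leg": [6, 7, 8],
    "left_leg": [9, 10, 11],
}


def limb_rmse(pred, true, joint_ids):
    """RMSE over the Euclidean joint errors of one limb in one image."""
    squared_errors = [
        (pred[j][0] - true[j][0]) ** 2 + (pred[j][1] - true[j][1]) ** 2
        for j in joint_ids
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def median_rmse_per_limb(preds, trues):
    """Median, across images, of the per-image RMSE of each limb."""
    return {
        name: median(limb_rmse(p, t, ids) for p, t in zip(preds, trues))
        for name, ids in LIMBS.items()
    }
```

Calling `median_rmse_per_limb(predicted_poses, annotated_poses)` on two equally long lists of poses returns one value per limb. A comparison in the spirit of the one described above would then contrast these values for a pose estimator trained on mixed generated and real images with those for one trained on real images only.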
The generated MINI-RGBD images are therefore of excellent quality, and the good pose estimation results pave the way for their possible use to increase dataset size. Given the multiple challenges that the babyPose dataset presents (e.g., variability, occlusions, and highly variable pixel intensity levels), the results of the generative framework on this dataset encourage further research to improve image quality. Future work will also address the generation of temporal sequences rather than still frames, as well as the development of a user interface that clinicians can use to draw custom poses.