Thesis etd-02112026-174051
Thesis type
Master's thesis
Author
CRISPINO, FRANCESCO PIO
URN
etd-02112026-174051
Title
Privacy-preserving human pose estimation from homothetic segmentation maps
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
supervisor Dott. Pistolesi, Francesco
co-supervisor Ing. Mugnai, Matteo
co-supervisor Ing. Baldassini, Michele
Keywords
- avatar normalization
- coco dataset
- computer vision
- de-identification
- deep learning
- densepose
- domain adaptation
- fine-tuning
- homothetic segmentation map
- kinematic reconstruction
- morphological standardization
- privacy preservation
- privacy preserving computer vision
- privacy-by-design
- privacy-preserving human pose estimation
- segmentation
Graduation session start date
27/02/2026
Availability
Not available for consultation
Release date
27/02/2066
Abstract (English)
This thesis addresses Privacy-Preserving Human Pose Estimation (PPHPE) by introducing a novel people normalization algorithm that transforms human figures into standardized avatars while preserving the kinematic information needed for accurate pose estimation. The work proposes a privacy-by-design architecture whose output can feed HPE, action recognition, and similar tasks without privacy violations. The system remained privacy-preserving even when an attacker collected its output and ran state-of-the-art (SOTA) re-identification models on it.
Human Pose Estimation has become fundamental in applications such as healthcare monitoring, fitness and rehabilitation, fall detection, human-robot collaboration, and activity recognition. The SOTA has evolved in several directions, with RGB-based approaches being the dominant one. These methods capture detailed visual information that reveals Sensitive Personal Information (SPI), including facial features, identity, ethnicity, and environmental context. Existing Privacy-Preserving HPE methods employ strategies such as input degradation (ultra-low resolution, silhouettes), learned anonymization (adversarial frameworks, GAN-based inpainting), non-visual sensing (thermal cameras, depth sensors, mmWave radar, WiFi CSI, LiDAR), or secure computation protocols, but most research has focused on facial anonymization. The literature reveals a profound gap: individuals can be re-identified through soft biometrics such as gait, posture, body proportions, clothing, and contextual information. Robust, reversible, and utility-preserving frameworks for full-body anonymization remain critically underexplored.
This work introduces a comprehensive people normalization algorithm that operates through segment-wise geometric transformations based on DensePose annotations. The algorithm performs controlled morphological standardization to make subjects indistinguishable while maintaining anthropometric consistency, so that HPE can still run on the anonymized content. The algorithm proceeds in three stages: perspective-aware grouping, hybrid scaling of body segments, and hierarchical kinematic reconstruction.
The algorithm exploits DensePose dense surface correspondence, which maps every human pixel to a segmented body model with UV surface coordinates. The method merges the original 24 segments into a simplified 12-segment model, treating anterior and posterior surfaces as unified anatomical regions. This semantic regularization removes unimportant nuances and ensures consistent color-to-anatomy mapping regardless of subject orientation.
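The front/back merge can be pictured as a lookup-table relabeling of the part-index map. The sketch below is illustrative only: the part ids, the `MERGE_TABLE` entries, and the `merge_segments` helper are hypothetical placeholders, not the actual DensePose indices or the thesis's 24-to-12 grouping, which the abstract does not spell out.

```python
# Hypothetical part ids: each (anterior, posterior) pair collapses to one region.
MERGE_TABLE = {
    0: 0,        # background stays background
    1: 1, 2: 1,  # e.g. torso back / torso front -> torso
    3: 2, 4: 2,  # e.g. upper-leg back / upper-leg front -> upper leg
}

def merge_segments(label_map, table=MERGE_TABLE):
    """Relabel a 2D part-index map (list of rows) through a lookup table."""
    return [[table[label] for label in row] for row in label_map]

parts = [
    [0, 1, 2],
    [3, 4, 0],
]
merged = merge_segments(parts)
```

Because every pixel of a region ends up with the same id whichever side of the body is visible, a fixed id-to-color palette then yields the same color for that region in any orientation.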
A perspective-aware clustering mechanism groups subjects by spatial proximity and bounding-box height similarity, so that normalization preserves the depth cues that are essential for action recognition.
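One way to picture such a grouping is a union-find pass over person bounding boxes that links two subjects when their centers are close and their heights are similar. This is a minimal sketch under stated assumptions: the function name `group_subjects`, the `(x, y, w, h)` box format, and the thresholds `dist_factor` and `height_tol` are all hypothetical, and the thesis may use a different clustering rule.

```python
import math

def group_subjects(boxes, dist_factor=1.5, height_tol=0.25):
    """Group boxes (x, y, w, h) that are close together and similar in height.

    dist_factor and height_tol are hypothetical thresholds.
    """
    parent = list(range(len(boxes)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (xi, yi, wi, hi) in enumerate(boxes):
        for j in range(i + 1, len(boxes)):
            xj, yj, wj, hj = boxes[j]
            # Distance between box centers.
            dist = math.hypot((xi + wi / 2) - (xj + wj / 2),
                              (yi + hi / 2) - (yj + hj / 2))
            mean_h = (hi + hj) / 2
            # Similar apparent height suggests similar distance from the camera.
            similar = abs(hi - hj) / max(hi, hj) <= height_tol
            if similar and dist <= dist_factor * mean_h:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

boxes = [(0, 0, 40, 100), (50, 5, 40, 95), (60, 40, 15, 30), (400, 0, 40, 100)]
groups = group_subjects(boxes)
```

In this toy input the first two boxes form one group, the small box (a subject farther from the camera) stays alone, and the last box is too far away to join anyone.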
The hybrid scaling strategy is based on PCA. For each body segment, the algorithm extracts the principal eigenvectors (the axes of elongation and thickness), so that it measures the segment's actual dimensions rather than relying on its bounding box. Scaling factors blend multiple objectives to prevent the unnatural distortions that area-based scaling alone would produce.
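The PCA step can be sketched for a single segment with the closed-form eigen-decomposition of a 2x2 covariance matrix. The helper name `principal_axes` and the choice to measure length and thickness as projected extents are assumptions made for illustration; how the thesis blends these measurements into scaling factors is not shown here.

```python
import math

def principal_axes(points):
    """Return (major_axis, length, thickness) for a list of (x, y) pixels."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x, _ in points) / n
    c = sum((y - my) ** 2 for _, y in points) / n
    b = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the eigenvector with the largest eigenvalue.
    theta = 0.5 * math.atan2(2 * b, a - c)
    major = (math.cos(theta), math.sin(theta))
    minor = (-math.sin(theta), math.cos(theta))

    def extent(axis):
        # Range of the centered points projected onto the given axis.
        proj = [(x - mx) * axis[0] + (y - my) * axis[1] for x, y in points]
        return max(proj) - min(proj)

    return major, extent(major), extent(minor)

# A thin diagonal segment: its bounding box is 9 x 9 pixels wide.
diagonal = [(i, i) for i in range(10)]
axis, length, thickness = principal_axes(diagonal)
```

For the diagonal segment the bounding box suggests a square shape, while the principal axes report a long, thin segment, which is the distinction the paragraph above draws.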
The algorithm models the human body as a kinematic tree with a contact graph derived from dilation-based overlap detection. Contact points are computed through bidirectional mask dilation and overlap analysis, then mapped to the normalized image coordinates, ensuring that joint connectivity is maintained.
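Dilation-based contact detection can be illustrated on masks stored as sets of pixel coordinates: both masks are grown by a small radius and the centroid of their overlap is taken as the contact point. This is a sketch under assumptions: the helpers `dilate` and `contact_point`, the square kernel, and the use of the overlap centroid are illustrative choices, not details confirmed by the abstract.

```python
def dilate(mask, radius=1):
    """Grow a set of (x, y) pixels by `radius` using a square (8-connected) kernel."""
    offsets = range(-radius, radius + 1)
    return {(x + dx, y + dy) for x, y in mask for dx in offsets for dy in offsets}

def contact_point(mask_a, mask_b, radius=1):
    """Centroid of the overlap of the two dilated masks, or None if they do not meet."""
    overlap = dilate(mask_a, radius) & dilate(mask_b, radius)
    if not overlap:
        return None
    n = len(overlap)
    return (sum(x for x, _ in overlap) / n, sum(y for _, y in overlap) / n)

# Two hypothetical segments separated by a one-pixel gap at x = 5.
upper_arm = {(x, 0) for x in range(0, 5)}
lower_arm = {(x, 0) for x in range(6, 11)}
joint = contact_point(upper_arm, lower_arm)
```

Dilating both masks (rather than only one) makes the test symmetric, so the result does not depend on which segment is treated as the parent in the kinematic tree.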
Since normalization alters body geometry, keypoints must be recomputed for the normalized image. Each keypoint is mapped to its enclosing DensePose segment using signed distance for robust assignment near boundaries. The keypoint is then converted to normalized UV coordinates within the segment's bounding box, decoupling it from absolute image space. These local coordinates are projected into the target segment geometry defined by the transformation plan, ensuring that semantic consistency is preserved.
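The coordinate change described here can be sketched as two steps: express the keypoint as fractions of its source segment's bounding box, then re-project those fractions into the target box. The function name `remap_keypoint` and the `(x, y, w, h)` box format are assumptions, and the signed-distance segment assignment is taken as already done.

```python
def remap_keypoint(kp, src_box, dst_box):
    """Map a keypoint from a source segment box to a target segment box.

    Boxes are (x, y, w, h); the keypoint is assumed to belong to the source segment.
    """
    x, y = kp
    sx, sy, sw, sh = src_box
    dx, dy, dw, dh = dst_box
    # Local coordinates in [0, 1] relative to the source box.
    u = (x - sx) / sw
    v = (y - sy) / sh
    # Same relative position inside the target box.
    return (dx + u * dw, dy + v * dh)

new_kp = remap_keypoint((15, 30), (10, 20, 20, 40), (100, 100, 40, 80))
```

A keypoint a quarter of the way across and down the source segment lands a quarter of the way across and down the target segment, whatever the change in size or position.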
To validate that high-accuracy HPE remains feasible on normalized data, five SOTA models were evaluated: HRNet-w32 (both ImageNet and COCO pre-trained variants), MediaPipe with MobileNetV3-Large backbone, YOLOv8s, and YOLO26x. Models were fine-tuned on both the original domain (natural segmented figures) and normalized domain (standardized avatars) using a subset of 1553 images from COCO val2014.
The fine-tuning protocol employed custom adaptation heads with a stacked refinement architecture, a weighted MSE loss that prioritizes joint localization over the black background, and progressive backbone unfreezing strategies (frozen, unfreeze 1/2/3 stages) with adaptive learning rates. Training was conducted over 100 epochs with AdamW optimization and cosine annealing.
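Two ingredients of this protocol can be written down in a few lines. The sketch below assumes a heatmap-style target and up-weights pixels where the target is active; the names `weighted_mse` and `cosine_lr`, the `fg_weight` and `threshold` values, and the learning-rate bounds are hypothetical, since the abstract does not give the actual weighting scheme or hyperparameters.

```python
import math

def weighted_mse(pred, target, fg_weight=10.0, threshold=0.5):
    """Mean squared error with a larger weight where the target heatmap is active."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            # Hypothetical rule: joint pixels count fg_weight times more than background.
            w = fg_weight if t > threshold else 1.0
            total += w * (p - t) ** 2
            count += 1
    return total / count

def cosine_lr(epoch, total_epochs, lr_max, lr_min=0.0):
    """Standard cosine annealing from lr_max at epoch 0 to lr_min at total_epochs."""
    cos_term = 1 + math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term

# An all-zero prediction that misses the single joint pixel.
loss = weighted_mse([[0, 0], [0, 0]], [[1, 0], [0, 0]])
lr_mid = cosine_lr(50, 100, lr_max=1e-3)
```

With a mostly black input, an unweighted MSE is dominated by background pixels that are trivially correct, so weighting the joint region keeps the gradient focused on localization.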
Models achieved comparable performance between domains and with respect to SOTA results. Statistical analysis revealed that normalization does not inherently degrade HPE performance. The normalized domain represents a valid alternative feature space where certain architectures excel, validating that avatar rendering acts as an effective privacy filter preserving behavioral information while removing biometric identifiers.
This thesis establishes: (1) a robust full-body anonymization pipeline addressing re-identification beyond facial features, (2) comprehensive empirical evidence that privacy-by-design does not preclude high-accuracy pose estimation, (3) insights into optimal backbone unfreezing strategies for domain adaptation, and (4) empirical evidence that privacy is guaranteed even in catastrophic scenarios where an attacker attempts re-identification on anonymized images with SOTA methods. The work demonstrates that privacy-preserving HPE is achievable through morphological standardization, opening pathways for ethically compliant computer vision systems in surveillance, healthcare, and human-robot collaboration contexts where privacy regulations mandate data minimization principles.
File
| File name | Size |
|---|---|
| The thesis is not available for consultation. | |