
ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-09132025-175358


Thesis type
Master's thesis
Author
NOCELLA, FRANCESCO
URN
etd-09132025-175358
Title
Development of deep learning based data augmentation in medical imaging classification
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
supervisor Prof. Cimino, Mario Giovanni Cosimo Antonio
co-supervisor Dr. Parola, Marco
Keywords
  • AI
  • artificial intelligence
  • data augmentation
  • deep learning
  • diffusion models
  • GAN
  • generative adversarial networks
  • IA
  • image classification
  • medical imaging
  • oral squamous cell carcinoma
  • OSCC
  • ResNet
  • Stable Diffusion
  • synthetic data generation
  • Vision Transformer
  • ViT
Start date of defense session
2 October 2025
Availability
Not available for consultation
Release date
2 October 2028
Abstract
The efficacy of Deep Learning for the early diagnosis of Oral Squamous Cell Carcinoma (OSCC) is critically constrained by the scarcity of large, annotated medical image datasets. This limitation hinders the development of robust and generalizable models, which are essential for improving patient outcomes. This thesis confronts this challenge by systematically investigating synthetic data augmentation as a strategy to enrich training data for OSCC classification.
To this end, we employed a suite of generative techniques, including Generative Adversarial Networks (StyleGAN3) and latent diffusion models (Stable Diffusion) with various conditioning mechanisms, to create synthetic oral lesion images. The impact of these augmentations was evaluated on two distinct architectures, a Convolutional Neural Network (ResNet-50) and a Vision Transformer (ViT), using two datasets: the newly introduced PhotoMOCI and the public KOCD.
Our findings reveal that the utility of synthetic augmentation is not universal but depends on the classifier's architectural paradigm. For ResNet-50, which has strong inductive biases, high-fidelity synthetic data provided a modest but consistent improvement in classification performance. The Vision Transformer, by contrast, showed a notable performance degradation with the same synthetic data. We interpret this not as a fundamental failure of the augmentation technique but as a consequence of the ViT's well-documented need for large training sets. The volume of synthetic data, while sufficient for the more data-efficient CNN, was likely too small for the ViT to overcome the subtle distributional shifts introduced by the generative models.
This thesis concludes that the success of generative data augmentation in this domain is a complex interplay between data quality, the scale of generation, and the inductive biases of the downstream model. Its primary contributions are threefold: (1) the introduction and public release of the PhotoMOCI dataset to foster reproducible research; (2) the establishment of a nuanced, architecture-aware benchmark for synthetic data in OSCC diagnostics; and (3) the finding that for CNNs, a bimodal diffusion model is an effective augmentation strategy, while for ViTs, the scale of augmentation is a critical, underexplored variable.
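To make the "scale of augmentation" variable concrete, the following is a minimal sketch of how a training set might be assembled at different synthetic-to-real ratios. It is not the thesis's pipeline: the function name `build_augmented_train_set`, the `(path, label)` sample representation, and the `synth_ratio` parameter are all hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical sample representation: (image path, class label).
Sample = Tuple[str, int]


def build_augmented_train_set(
    real: Sequence[Sample],
    synthetic: Sequence[Sample],
    synth_ratio: float,
    seed: int = 0,
) -> List[Sample]:
    """Mix real and synthetic samples (illustrative assumption, not the thesis's method).

    Adds round(synth_ratio * len(real)) synthetic samples, capped at the
    number of synthetic samples available, then shuffles the result.
    """
    if synth_ratio < 0:
        raise ValueError("synth_ratio must be non-negative")
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    n_synth = min(len(synthetic), round(synth_ratio * len(real)))
    chosen = rng.sample(list(synthetic), n_synth)  # draw without replacement
    mixed = list(real) + chosen
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Placeholder file names; no real dataset is referenced here.
    real = [(f"real_{i}.png", i % 2) for i in range(10)]
    synthetic = [(f"synth_{i}.png", i % 2) for i in range(20)]
    for ratio in (0.0, 0.5, 1.0, 2.0):
        train = build_augmented_train_set(real, synthetic, ratio)
        print(ratio, len(train))
```

Sweeping `synth_ratio` while keeping the real data and the classifier fixed is one way to isolate the effect of augmentation scale from the effect of architecture.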
File