Thesis etd-09132025-175358

Thesis type

Master's thesis

URN

etd-09132025-175358

Title

Development of deep learning based data augmentation in medical imaging classification

Department

INGEGNERIA DELL'INFORMAZIONE

Degree programme

ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING

Supervisors

supervisor Prof. Cimino, Mario Giovanni Cosimo Antonio
co-supervisor Dr. Parola, Marco

Keywords

AI
artificial intelligence
data augmentation
deep learning
diffusion models
GAN
generative adversarial networks
IA
image classification
medical imaging
oral squamous cell carcinoma
OSCC
ResNet
Stable Diffusion
synthetic data generation
Vision Transformer
ViT

Graduation session start date

02/10/2025

Availability

Not available

Release date

02/10/2028

Abstract

The efficacy of Deep Learning for the early diagnosis of Oral Squamous Cell Carcinoma (OSCC) is critically constrained by the scarcity of large, annotated medical image datasets. This limitation hinders the development of robust and generalizable models, which are essential for improving patient outcomes. This thesis confronts this challenge by systematically investigating synthetic data augmentation as a strategy to enrich training data for OSCC classification.
To this end, we employed a suite of generative techniques, including Generative Adversarial Networks (StyleGAN3) and latent diffusion models (Stable Diffusion) with various conditioning mechanisms, to create synthetic oral lesion images. The impact of these augmentations was evaluated on two distinct architectures, a Convolutional Neural Network (ResNet-50) and a Vision Transformer (ViT), using two datasets: the newly introduced PhotoMOCI and the public KOCD.
Our findings reveal that the utility of synthetic augmentation is not universal, but is instead critically dependent on the classifier's architectural paradigm. For the ResNet-50, which possesses strong inductive biases, high-fidelity synthetic data provided a modest but consistent improvement in classification performance. Conversely, the Vision Transformer exhibited a notable performance degradation with the same synthetic data. This is not interpreted as a fundamental failure of the augmentation technique, but rather as an illustration of the ViT's well-documented data-hungry nature. The volume of synthetic data, while sufficient for the more data-efficient CNN, was likely insufficient for the ViT to learn past the subtle distributional shifts introduced by the generative models.
This thesis concludes that the success of generative data augmentation in this domain is a complex interplay between data quality, the scale of generation, and the inductive biases of the downstream model. Its primary contributions are threefold: (1) the introduction and public release of the PhotoMOCI dataset to foster reproducible research; (2) the establishment of a nuanced, architecture-aware benchmark for synthetic data in OSCC diagnostics; and (3) the finding that for CNNs, a bimodal diffusion model is an effective augmentation strategy, while for ViTs, the scale of augmentation is a critical, underexplored variable.

Files

File name	Size
The thesis is not available. Contact the author

ETD

Digital archive of theses defended at the Università di Pisa
