Thesis etd-04162025-113306
Thesis type
Doctoral (PhD) thesis
Author
POPPI, SAMUELE
URN
etd-04162025-113306
Title
Responsible AI in Vision and Language: Ensuring Safety, Ethics, and Transparency in Modern Models
Scientific disciplinary sector
IINF-05/A - Information Processing Systems
Degree program
DOTTORATO NAZIONALE IN INTELLIGENZA ARTIFICIALE
Supervisors
Tutor: Prof. Cucchiara, Rita
Supervisor: Prof. Baraldi, Lorenzo
Keywords
- AI Safety
- GenAI
- Large Language Models (LLMs)
- Machine Unlearning
- Model Interpretability
- Multilingual Alignment
- Multimodal AI
- Responsible AI
Defense date
16/05/2025
Availability
Full
Abstract
This thesis examines how the principles of Responsible AI, namely safety, ethics, and transparency, can be effectively embedded into modern AI models. As large-scale AI systems, from deepfake generators to autonomous navigation agents, become increasingly pervasive, aligning these technologies with societal values, ethical standards, and user privacy becomes imperative. This research tackles these challenges through a series of interrelated contributions.
In the domain of deepfake detection and explainability, robust methods were developed using self-supervised models such as DINO to identify and classify synthetic images, including those generated by text-to-image diffusion models, even under adversarial conditions. The work also introduced visual explainability cues that highlight the specific artifacts indicative of deepfake content, enhancing user trust.
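As an illustration of this kind of detection pipeline, the sketch below performs linear probing of frozen self-supervised DINO features for a real-vs-fake decision. The `dino_vits16` backbone, the single-layer head, and the training setup are assumptions made for exposition; the thesis' exact architecture and training recipe may differ.

```python
# Hypothetical sketch: linear probing of frozen DINO features for
# real-vs-fake image classification (illustrative, not the thesis' exact model).
import torch
import torch.nn as nn

# Frozen DINO ViT-S/16 backbone as a feature extractor (weights via torch.hub).
backbone = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False

# Lightweight binary head: 384-dim DINO CLS features -> real/fake logit.
head = nn.Linear(384, 1)

def fake_logit(images: torch.Tensor) -> torch.Tensor:
    """images: (B, 3, 224, 224) normalized batch; returns (B,) logits."""
    with torch.no_grad():
        feats = backbone(images)          # (B, 384) CLS features
    return head(feats).squeeze(-1)

# Training would minimize BCEWithLogitsLoss over real and synthetic images.
print(fake_logit(torch.randn(4, 3, 224, 224)).shape)  # torch.Size([4])
```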
For explainable navigation in embodied AI, a framework was designed to improve transparency in autonomous systems. By integrating a speaker policy and a captioning module into a self-supervised exploration agent, the system generated natural language descriptions of its navigational context. An explanation map metric was introduced to measure the alignment between the agent's visual attention and its textual descriptions, supporting human-robot collaboration.
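The explanation-map idea can be illustrated with a simple alignment score between the agent's visual attention and the spatial grounding of its generated caption. The histogram-intersection formulation below is only a hedged sketch; the metric actually defined in the thesis may differ.

```python
# Hedged sketch of an "explanation map" style alignment score between an
# attention map and a caption-grounding map (not necessarily the thesis' metric).
import torch

def explanation_alignment(attn_map: torch.Tensor, caption_map: torch.Tensor) -> float:
    """Both maps: (H, W), non-negative. Returns a score in [0, 1]."""
    a = attn_map.flatten()
    a = a / (a.sum() + 1e-8)        # normalize to a distribution
    c = caption_map.flatten()
    c = c / (c.sum() + 1e-8)
    # Histogram intersection: 1 means identical spatial focus, 0 means disjoint.
    return torch.minimum(a, c).sum().item()

print(explanation_alignment(torch.rand(14, 14), torch.rand(14, 14)))
```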
In the area of machine unlearning, this thesis introduced a low-rank unlearning method to remove specific classes or examples from pre-trained models without requiring full access to the original dataset. This approach was extended to enable efficient, on-demand removal of multiple classes during inference, minimizing computational and storage demands while maintaining model effectiveness.
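A minimal sketch of the low-rank unlearning idea is shown below, assuming a LoRA-style additive update: the pre-trained weights stay frozen and only a rank-r residual is trained to suppress the class to be forgotten. The module names, rank, and loss are illustrative rather than the thesis' exact recipe.

```python
# Illustrative low-rank unlearning sketch: frozen base weights plus a trainable
# rank-r correction, optimized to push probability mass away from one class.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankResidual(nn.Module):
    """Frozen linear layer plus a trainable rank-r correction: W x + (B A) x."""
    def __init__(self, linear: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = linear
        for p in self.base.parameters():
            p.requires_grad = False
        out_f, in_f = linear.weight.shape
        self.A = nn.Parameter(torch.zeros(rank, in_f))          # starts as a no-op
        self.B = nn.Parameter(torch.randn(out_f, rank) * 1e-3)

    def forward(self, x):
        return self.base(x) + F.linear(x, self.B @ self.A)

def unlearning_loss(logits: torch.Tensor, forget_class: int) -> torch.Tensor:
    # Minimize the probability assigned to the class being unlearned.
    return F.softmax(logits, dim=-1)[:, forget_class].mean()

layer = LowRankResidual(nn.Linear(512, 10), rank=4)
loss = unlearning_loss(layer(torch.randn(8, 512)), forget_class=3)
loss.backward()  # gradients flow only into the low-rank factors A and B
```

Because only the small factors A and B are stored per forgotten class, such adapters can in principle be attached or detached on demand at inference time, which matches the multi-class, low-storage setting described above.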
To address unsafe content in vision-and-language models, the research introduced Safe-CLIP, a fine-tuned version of CLIP capable of filtering NSFW content. The development of ViSU, a dataset of paired safe and unsafe image-text examples, supported this effort. Safe-CLIP redirected unsafe content toward safe regions of the embedding space, achieving a balance between minimizing harmful outputs and retaining benign creative functionality.
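The redirection idea can be sketched as a fine-tuning objective over paired safe/unsafe captions such as those in ViSU: embeddings of unsafe prompts are pulled toward the frozen embedding of their safe counterpart, while safe prompts are kept close to where the original model placed them. The cosine formulation and the weighting term `lam` below are assumptions, not the published Safe-CLIP objective.

```python
# Hedged sketch of an embedding-redirection loss on paired safe/unsafe data.
import torch
import torch.nn.functional as F

def redirection_loss(unsafe_emb, safe_emb, safe_emb_frozen, lam: float = 1.0):
    """All inputs: L2-normalized (B, D) embeddings.
    unsafe_emb / safe_emb come from the encoder being fine-tuned;
    safe_emb_frozen comes from the frozen original CLIP encoder."""
    # Pull each unsafe embedding onto its matching safe one.
    redirect = 1.0 - F.cosine_similarity(unsafe_emb, safe_emb_frozen).mean()
    # Preserve the original geometry for safe content.
    preserve = 1.0 - F.cosine_similarity(safe_emb, safe_emb_frozen).mean()
    return redirect + lam * preserve

u = F.normalize(torch.randn(8, 512), dim=-1)
s = F.normalize(torch.randn(8, 512), dim=-1)
print(redirection_loss(u, s, s).item())
```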
Finally, the robustness of safety alignment in multilingual large language models (LLMs) was investigated. It was found that fine-tuning attacks in one language can compromise safety across all languages, revealing a cross-lingual vulnerability in these models. To address this, a Safety Information Localization method was developed to identify safety-critical parameters, paving the way for more robust alignment practices.
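In the same spirit, a hedged sketch of localizing safety-relevant parameters is given below: it ranks parameter tensors by their relative shift between a safety-aligned checkpoint and a fine-tuned (attacked) one. The actual Safety Information Localization procedure may use a different scoring rule.

```python
# Illustrative heuristic: rank parameter tensors by relative change between a
# safety-aligned model and an attacked fine-tune of the same architecture.
import torch
import torch.nn as nn

def rank_parameter_shift(aligned: nn.Module, attacked: nn.Module, top_k: int = 10):
    attacked_params = dict(attacked.named_parameters())
    scores = {}
    for name, p_aligned in aligned.named_parameters():
        p_attacked = attacked_params[name]
        # Relative Frobenius-norm shift; larger ~ more affected by the attack.
        delta = (p_attacked - p_aligned).norm() / (p_aligned.norm() + 1e-8)
        scores[name] = delta.item()
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```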
Together, these contributions provide both theoretical insights and practical solutions to enhance the reliability, adaptability, and ethics of AI systems. By addressing challenges such as safer navigation, efficient unlearning, and robust NSFW filtering, this research advances the alignment of large-scale AI models with Responsible AI principles.
File
| File name | Size |
|---|---|
| Poppi_Ph..._13_1.pdf | 30.38 MB |