Thesis etd-04162025-113306
Thesis type
Doctoral (PhD) thesis
Author
POPPI, SAMUELE
URN
etd-04162025-113306
Title
Responsible AI in Vision and Language: Ensuring Safety, Ethics, and Transparency in Modern Models
Scientific disciplinary sector
IINF-05/A - Information Processing Systems
Degree program
DOTTORATO NAZIONALE IN INTELLIGENZA ARTIFICIALE
Supervisors
Tutor: Prof. Cucchiara, Rita
Supervisor: Prof. Baraldi, Lorenzo
Keywords
- AI Safety
- GenAI
- Large Language Models (LLMs)
- Machine Unlearning
- Model Interpretability
- Multilingual Alignment
- Multimodal AI
- Responsible AI
Defense date
16/05/2025
Availability
Full
Abstract
This thesis examines how the principles of Responsible AI, namely safety, ethics, and transparency, can be effectively embedded into modern AI models. As large-scale AI systems, from deepfake generators to autonomous navigation agents, become increasingly pervasive, aligning these technologies with societal values, ethical standards, and user privacy becomes imperative. This research tackles these challenges through a series of interrelated contributions.
In the domain of deepfake detection and explainability, robust methods were developed using self-supervised models such as DINO to identify and classify synthetic images, including those generated by text-to-image diffusion models, even under adversarial conditions. The work also introduced visual explainability cues that highlight the specific artifacts indicative of deepfake content, enhancing user trust.
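As an illustration of this kind of detection pipeline, the sketch below performs linear probing of frozen self-supervised DINO features for a real-vs-fake decision. The `dino_vits16` backbone, the single-layer head, and the training setup are assumptions made for exposition; the thesis' exact architecture and training recipe may differ.

```python
# Hypothetical sketch: linear probing of frozen DINO features for
# real-vs-fake image classification (illustrative, not the thesis' exact model).
import torch
import torch.nn as nn

# Frozen DINO ViT-S/16 backbone as a feature extractor (weights via torch.hub).
backbone = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False

# Lightweight binary head: 384-dim DINO CLS features -> real/fake logit.
head = nn.Linear(384, 1)

def fake_logit(images: torch.Tensor) -> torch.Tensor:
    """images: (B, 3, 224, 224) normalized batch; returns (B,) logits."""
    with torch.no_grad():
        feats = backbone(images)          # (B, 384) CLS features
    return head(feats).squeeze(-1)

# Training would minimize BCEWithLogitsLoss over real and synthetic images.
print(fake_logit(torch.randn(4, 3, 224, 224)).shape)  # torch.Size([4])
```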
For explainable navigation in embodied AI, a framework was designed to improve transparency in autonomous systems. By integrating a speaker policy and a captioning module into a self-supervised exploration agent, the system generated natural language descriptions of its navigational context. An explanation map metric was introduced to measure the alignment between the agent's visual attention and its textual descriptions, supporting human-robot collaboration.
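The explanation-map idea can be illustrated with a simple alignment score between the agent's visual attention and the spatial grounding of its generated caption. The histogram-intersection formulation below is only a hedged sketch; the metric actually defined in the thesis may differ.

```python
# Hedged sketch of an "explanation map" style alignment score between an
# attention map and a caption-grounding map (not necessarily the thesis' metric).
import torch

def explanation_alignment(attn_map: torch.Tensor, caption_map: torch.Tensor) -> float:
    """Both maps: (H, W), non-negative. Returns a score in [0, 1]."""
    a = attn_map.flatten()
    a = a / (a.sum() + 1e-8)        # normalize to a distribution
    c = caption_map.flatten()
    c = c / (c.sum() + 1e-8)
    # Histogram intersection: 1 means identical spatial focus, 0 means disjoint.
    return torch.minimum(a, c).sum().item()

print(explanation_alignment(torch.rand(14, 14), torch.rand(14, 14)))
```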
In the area of machine unlearning, this thesis introduced a low-rank unlearning method to remove specific classes or examples from pre-trained models without requiring full access to the original dataset. This approach was extended to enable efficient, on-demand removal of multiple classes during inference, minimizing computational and storage demands while maintaining model effectiveness.
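A minimal sketch of the low-rank unlearning idea is shown below, assuming a LoRA-style additive update: the pre-trained weights stay frozen and only a rank-r residual is trained to suppress the class to be forgotten. The module names, rank, and loss are illustrative rather than the thesis' exact recipe.

```python
# Illustrative low-rank unlearning sketch: frozen base weights plus a trainable
# rank-r correction, optimized to push probability mass away from one class.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankResidual(nn.Module):
    """Frozen linear layer plus a trainable rank-r correction: W x + (B A) x."""
    def __init__(self, linear: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = linear
        for p in self.base.parameters():
            p.requires_grad = False
        out_f, in_f = linear.weight.shape
        self.A = nn.Parameter(torch.zeros(rank, in_f))          # starts as a no-op
        self.B = nn.Parameter(torch.randn(out_f, rank) * 1e-3)

    def forward(self, x):
        return self.base(x) + F.linear(x, self.B @ self.A)

def unlearning_loss(logits: torch.Tensor, forget_class: int) -> torch.Tensor:
    # Minimize the probability assigned to the class being unlearned.
    return F.softmax(logits, dim=-1)[:, forget_class].mean()

layer = LowRankResidual(nn.Linear(512, 10), rank=4)
loss = unlearning_loss(layer(torch.randn(8, 512)), forget_class=3)
loss.backward()  # gradients flow only into the low-rank factors A and B
```

Because only the small factors A and B are stored per forgotten class, such adapters can in principle be attached or detached on demand at inference time, which matches the multi-class, low-storage setting described above.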
To address unsafe content in vision-and-language models, the research introduced Safe-CLIP, a fine-tuned version of CLIP capable of filtering NSFW content. The development of ViSU, a dataset of paired safe and unsafe image-text examples, supported this effort. Safe-CLIP redirected unsafe content toward safe regions of the embedding space, achieving a balance between minimizing harmful outputs and retaining benign creative functionality.
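The redirection idea can be sketched as a fine-tuning objective over paired safe/unsafe captions such as those in ViSU: embeddings of unsafe prompts are pulled toward the frozen embedding of their safe counterpart, while safe prompts are kept close to where the original model placed them. The cosine formulation and the weighting term `lam` below are assumptions, not the published Safe-CLIP objective.

```python
# Hedged sketch of an embedding-redirection loss on paired safe/unsafe data.
import torch
import torch.nn.functional as F

def redirection_loss(unsafe_emb, safe_emb, safe_emb_frozen, lam: float = 1.0):
    """All inputs: L2-normalized (B, D) embeddings.
    unsafe_emb / safe_emb come from the encoder being fine-tuned;
    safe_emb_frozen comes from the frozen original CLIP encoder."""
    # Pull each unsafe embedding onto its matching safe one.
    redirect = 1.0 - F.cosine_similarity(unsafe_emb, safe_emb_frozen).mean()
    # Preserve the original geometry for safe content.
    preserve = 1.0 - F.cosine_similarity(safe_emb, safe_emb_frozen).mean()
    return redirect + lam * preserve

u = F.normalize(torch.randn(8, 512), dim=-1)
s = F.normalize(torch.randn(8, 512), dim=-1)
print(redirection_loss(u, s, s).item())
```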
Finally, the robustness of safety alignment in multilingual large language models (LLMs) was investigated. It was found that fine-tuning attacks in one language can compromise safety across all languages, revealing a cross-lingual vulnerability in these models. To address this, a Safety Information Localization method was developed to identify safety-critical parameters, paving the way for more robust alignment practices.
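In the same spirit, a hedged sketch of localizing safety-relevant parameters is given below: it ranks parameter tensors by their relative shift between a safety-aligned checkpoint and a fine-tuned (attacked) one. The actual Safety Information Localization procedure may use a different scoring rule.

```python
# Illustrative heuristic: rank parameter tensors by relative change between a
# safety-aligned model and an attacked fine-tune of the same architecture.
import torch
import torch.nn as nn

def rank_parameter_shift(aligned: nn.Module, attacked: nn.Module, top_k: int = 10):
    attacked_params = dict(attacked.named_parameters())
    scores = {}
    for name, p_aligned in aligned.named_parameters():
        p_attacked = attacked_params[name]
        # Relative Frobenius-norm shift; larger ~ more affected by the attack.
        delta = (p_attacked - p_aligned).norm() / (p_aligned.norm() + 1e-8)
        scores[name] = delta.item()
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```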
Together, these contributions provide both theoretical insights and practical solutions to enhance the reliability, adaptability, and ethics of AI systems. By addressing challenges such as safer navigation, efficient unlearning, and robust NSFW filtering, this research advances the alignment of large-scale AI models with Responsible AI principles.
File
| File name | Size |
|---|---|
| Poppi_Ph..._13_1.pdf | 30.38 MB |