Thesis etd-05112025-191738

Thesis type

Master's thesis

Author

VERSARI, ALESSANDRO

URN

etd-05112025-191738

Title

Enhancing Software Supply Chain Security: Leveraging AI for Suspicious Code Detection in Open Source Repositories

Department

INGEGNERIA DELL'INFORMAZIONE

Degree programme

ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING

Supervisors

supervisor Lettieri, Giuseppe
supervisor Galatolo, Federico Andrea

Keywords

Backdoor Detection
LLMs
Open Source
Security
Software Supply Chain
Zero-Shot Classification

Graduation session start date

27/05/2025

Availability

Full

Abstract

The XZ backdoor incident, where malicious code was stealthily introduced into a widely used compression library, exposed the growing risks in the open source software supply chain and highlighted the enormous challenges faced by open source maintainers. These maintainers are often burdened with reviewing massive volumes of code contributions, which makes it difficult to manually catch subtle security threats such as backdoors. This thesis proposes to detect suspicious code in open source repositories using AI, helping maintainers by automating the identification of potentially harmful code.
This thesis focuses mainly on Trojan horse attacks, the techniques used to obfuscate them, and how AI, specifically Large Language Models (LLMs), can be employed to detect this kind of attack.
The study also explores the use of the Perplexity measure. Initial experiments suggested that Perplexity could effectively flag anomalies. However, the results also revealed its limitations, particularly in distinguishing between truly malicious behavior and unconventional, yet benign, coding styles.
Lastly, state-of-the-art models were evaluated for their ability to detect malicious patterns in codebases using zero-shot classification. Among them, the Llama Guard 3 model, despite its compact size, demonstrated remarkable capabilities, outperforming larger models. It achieved an accuracy of 98.04% and an F1-score of 91.73% on the evaluated dataset.
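As a rough illustration of the Perplexity measure mentioned in the abstract, the sketch below computes perplexity as the exponential of the negative mean token log-probability and flags snippets above a threshold. It is not the thesis's implementation: the function names, the threshold, and the log-probability values are all hypothetical, and in practice the log-probabilities would come from a language model scoring the code. The made-up numbers are chosen so that an unusual but benign snippet is flagged alongside an obfuscated one, which is the limitation the abstract describes.

```python
import math
from typing import Dict, List, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean natural-log probability of the tokens)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def flag_suspicious(snippets: Dict[str, Sequence[float]], threshold: float) -> List[str]:
    """Return the names of snippets whose perplexity exceeds the threshold."""
    return [name for name, lps in snippets.items() if perplexity(lps) > threshold]


# Hypothetical per-token log-probabilities (illustrative only, not measured data).
HYPOTHETICAL_LOGPROBS: Dict[str, List[float]] = {
    "plain_loop": [math.log(0.5)] * 4,            # predictable code, perplexity 2
    "odd_but_benign": [math.log(0.125)] * 4,      # unusual style, perplexity 8
    "obfuscated_payload": [math.log(0.1)] * 4,    # obfuscated code, perplexity 10
}

if __name__ == "__main__":
    for name, lps in HYPOTHETICAL_LOGPROBS.items():
        print(f"{name}: perplexity = {perplexity(lps):.2f}")
    # A single threshold cannot separate the benign outlier from the payload.
    print(flag_suspicious(HYPOTHETICAL_LOGPROBS, threshold=5.0))
```

With these invented values, any threshold low enough to catch the obfuscated snippet also catches the merely unconventional one, so a perplexity score alone would need a second signal to tell them apart.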

File

File name	Size
thesis_versari.pdf	895.69 Kb

ETD

Digital archive of theses defended at the Università di Pisa
