Thesis etd-01212025-173309
Thesis type
Master's thesis
Author
KAROUI, HAMZA
URN
etd-01212025-173309
Title
On the Design of a Code-to-Code Search Engine
Department
COMPUTER SCIENCE
Degree program
COMPUTER SCIENCE
Supervisors
supervisor Prof. Ferragina, Paolo
Keywords
- code embedding
- code search
- code snippets
- HuggingFace
- machine learning
- search engine
- semantic similarity
- Software Heritage
- syntactic similarity
- University of Bologna
Graduation session start date
28/02/2025
Availability
Not available for consultation
Release date
28/02/2028
Abstract
This thesis addresses code search: the task of locating all, or only the top-k, code snippets in a repository that are syntactically or semantically similar to a given query. Traditional code search methods often rely on syntactic matching, which is useful but limited in capturing deeper semantic relationships between code fragments.
The goal is to develop a search engine that leverages machine learning models to tackle the similarity search problem. Such an engine can significantly improve developers' productivity by reducing the time and effort needed to find similar code snippets. More accurate and relevant results allow programmers to identify reusable components and integrate existing solutions into their projects. This can also enhance code quality, as developers can learn from well-structured, tested, and documented snippets.
This work is part of a larger project involving three key partners: HuggingFace, Software Heritage, and the University of Bologna. The project aims to design an advanced code-to-code search engine, based on code embedding techniques, that can efficiently retrieve similar code snippets across various repositories and programming languages.
File
File name | Size |
---|---|
The thesis is not available for consultation. |