Thesis etd-09182023-153912

Thesis type

Master's thesis

Author

LARI, FILIPPO

URN

etd-09182023-153912

Title

A Search Engine For Source Code

Department

COMPUTER SCIENCE

Degree programme

COMPUTER SCIENCE

Supervisors

supervisor Prof. Ferragina, Paolo

Keywords

clone search
code search
locality-sensitive hashing
minhash

Defense session start date

6 October 2023

Availability

Not available for consultation

Release date

6 October 2026

Abstract

Software plays a central role in modern society, and source code is a kind of information produced in enormous quantities. So much source code already exists that most code a developer is about to write has either already been written elsewhere or is at least similar to existing code. Software Heritage, an ambitious initiative launched in 2015 by INRIA and supported by prestigious sponsors such as Google, Microsoft, GitHub, and the universities of Bologna and Pisa, is collecting all publicly available software in order to preserve it as part of our cultural heritage. At the time of writing, Software Heritage is the world’s largest archive of source code, with more than 16 billion source files and over 3 billion commits from more than 250 million projects. Although Software Heritage is highly effective at storing source code, it lacks a method for searching its enormous collection. This challenge motivated this thesis, in which we propose a novel method for efficiently indexing large repositories of Java code and effectively answering queries over them. The proposed solution has been tested on a well-known benchmark, achieving results comparable with the state of the art while maintaining fast query times and low memory consumption.

File

File name	Size
The thesis is not available for consultation. Contact the author

ETD

Digital archive of theses defended at the Università di Pisa
