Software plays a central role in our era, and source code is a particular kind of information produced in enormous quantities. So much source code already exists that most of the code a developer writes has either already been written elsewhere or is, at least, similar to some existing code. Software Heritage, an ambitious initiative launched in 2015 by INRIA and supported by prestigious sponsors such as Google, Microsoft, GitHub, and the universities of Bologna and Pisa, collects all publicly available software in order to preserve it as part of our cultural heritage. At the time of writing, Software Heritage is the world’s largest archive of source code, with more than 16 billion source files and over 3 billion commits coming from more than 250 million projects. Although Software Heritage stores source code very efficiently, it lacks a method for searching its enormous collection. This gap motivated the present thesis, in which we propose a novel method for efficiently indexing large repositories of Java code and effectively answering queries over them. The proposed solution has been evaluated on a well-known benchmark, achieving results comparable with the state of the art while maintaining fast query times and low memory consumption.