Thesis etd-02092026-223609
Thesis type
Master's thesis
Author
ESPOSITO, PASQUALE
URN
etd-02092026-223609
Title
Data Structures for the Representation of Pangenomes
Department
COMPUTER SCIENCE
Degree programme
COMPUTER SCIENCE
Supervisors
supervisor Prof. Pisanti, Nadia
Keywords
- Bioinformatics
- Burrows-Wheeler Transform
- Computational Pangenomics
- Data Structures
- de Bruijn Graph
- Graph Indexing
- Variation Graph
Exam session start date
27/02/2026
Availability
Full
Abstract (English)
The advent of Next-Generation Sequencing (NGS) technologies has made the linear reference genome paradigm obsolete: a single linear reference is inherently affected by "reference bias" and cannot represent global genetic diversity. The thesis analyzes the evolution of data structures for pangenomics, a model that redefines the genome as a dynamic network of shared variations rather than a static sequence.
The work examines the two main graph models: de Bruijn Graphs, with a focus on the Bifrost algorithm, and Variation Graphs, exploring reference-free algorithms such as Minigraph and PGGB. Additionally, the transition from the VCF format to the topological GFA format is discussed.
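To make the de Bruijn graph model concrete, here is a minimal sketch of a *colored* de Bruijn graph in which nodes are k-mers, edges are implicit (k-mer `x` links to `x[1:] + c` whenever that k-mer exists), and each k-mer records which input genomes contain it. It is only an illustration under simplifying assumptions: the function names `build_colored_dbg` and `successors` are hypothetical, reverse complements and canonical k-mers are ignored, and no compaction into unitigs is performed, so this is not Bifrost's actual implementation.

```python
from collections import defaultdict

def build_colored_dbg(genomes, k):
    # Map every k-mer to the set of genome indices ("colors") in which it occurs.
    # Hypothetical helper; ignores reverse complements and unitig compaction.
    colors = defaultdict(set)
    for gid, seq in enumerate(genomes):
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(gid)
    return dict(colors)

def successors(colors, kmer):
    # Implicit edges: kmer -> kmer[1:] + c for every base c whose k-mer is present.
    return sorted(kmer[1:] + c for c in "ACGT" if kmer[1:] + c in colors)

if __name__ == "__main__":
    genomes = ["ACGTAC", "ACGGAC"]  # toy haplotypes differing by one base
    g = build_colored_dbg(genomes, k=3)
    print(successors(g, "ACG"))  # ['CGG', 'CGT']: the variant creates a branching node
    print(g["ACG"], g["CGT"])    # {0, 1} {0}
```

The shared k-mer `ACG` carries both colors, while the single-base difference splits the graph into two branches, each colored by the genome that contains it.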
Central to this work is the analysis of indexing techniques that make these data queryable. The thesis traces the path from the Positional Burrows-Wheeler Transform (PBWT) to the Graph BWT (GBWT), which overcomes the limitations of a linear model by indexing haplotypes as paths within the graph. Finally, the r-index is described as a scalable solution for the compression of highly repetitive pangenomes. These structures lay the computational foundations for future precision medicine.
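The r-index takes its name from r, the number of maximal runs of equal characters in the BWT, and its size grows with r rather than with the total text length n. The following is a minimal sketch that builds a BWT naively by sorting rotations (quadratic, for illustration only) and counts r, showing how a highly repetitive input keeps r tiny. The names `bwt` and `count_runs` are hypothetical, the single `$` terminator is an assumed convention, and the r-index itself (run-length BWT plus suffix-array samples) is not implemented here.

```python
def bwt(text):
    # Burrows-Wheeler Transform by sorting all rotations of text + "$".
    # "$" sorts before A, C, G, T in ASCII. Quadratic: illustration only.
    if "$" in text:
        raise ValueError("text must not contain the terminator '$'")
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def count_runs(s):
    # r = number of maximal runs of equal consecutive characters.
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])

if __name__ == "__main__":
    print(bwt("banana"), count_runs(bwt("banana")))  # annb$aa 5
    # A highly repetitive toy input: 50 copies of the same 4-mer.
    rep = "ACGT" * 50
    print(len(rep) + 1, count_runs(bwt(rep)))  # 201 5
```

Here the BWT of a 201-character string collapses into only 5 runs; collections of closely related haplotypes behave similarly, which is why an index sized in r scales to large pangenomes.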
File
| File name | Size |
|---|---|
| tesi_PE.pdf | 2.22 MB |