logo SBA

ETD

Archivio digitale delle tesi discusse presso l’Università di Pisa

Tesi etd-10022025-173453


Tipo di tesi
Tesi di laurea magistrale
Autore
CASTAGNA, ANDREA
URN
etd-10022025-173453
Titolo
NP-Hardness of minimizing BWT runs for consecutive texts under collection permutations
Dipartimento
INFORMATICA
Corso di studi
INFORMATICA
Relatori
relatore Prof. Grossi, Roberto
relatore Dott.ssa Guerrini, Veronica
Parole chiave
  • Extended Burrows Wheeler Transform
  • NP-Hard
Data inizio appello
17/10/2025
Consultabilità
Completa
Riassunto
Investigation of the differences between real and theoretical compressors, focusing on how compressing portions of a collection (rather than the whole concatenated text) affects compressibility and how reordering data can improve it. We analyze approaches from the literature, in particular phylogenetic re-ordering: the fact that EBWT positional clustering can be used to construct phylogenetic trees motivates framing the problem in terms of BWT runs. We formalize the task as optimizing compression rate by minimizing BWT runs over consecutive texts under collection reordering; the number of consecutive texts spanned by a sliding window is an additional parameter that we call the window size k. We prove the problem is NP-hard: first for the simple case k = 2, and then for every fixed k>1 via a reduction from the Hamiltonian Path problem. Concretely, given a directed graph we construct a set of texts whose minimum total EBWT runs equals a value that depends only on the graph’s number of vertices and on k if and only if the original graph has a Hamiltonian path. This construction establishes the problem’s intractability under standard assumptions and motivates further analysis and the search for practical heuristic methods.
File