Tipo di tesi
Tesi di laurea magistrale
Titolo
NP-Hardness of minimizing BWT runs for consecutive texts under collection permutations
Corso di studi
INFORMATICA
Parole chiave
- Extended Burrows Wheeler Transform
- NP-Hard
Data inizio appello
17/10/2025
Riassunto (Italiano)
Investigation of the differences between real and theoretical compressors, focusing on how compressing portions of a collection (rather than the whole concatenated text) affects compressibility and how reordering data can improve it. We analyze approaches from the literature, in particular phylogenetic re-ordering: the fact that EBWT positional clustering can be used to construct phylogenetic trees motivates framing the problem in terms of BWT runs. We formalize the task as optimizing compression rate by minimizing BWT runs over consecutive texts under collection reordering; the number of consecutive texts spanned by a sliding window is an additional parameter that we call the window size k. We prove the problem is NP-hard: first for the simple case k = 2, and then for every fixed k>1 via a reduction from the Hamiltonian Path problem. Concretely, given a directed graph we construct a set of texts whose minimum total EBWT runs equals a value that depends only on the graph’s number of vertices and on k if and only if the original graph has a Hamiltonian path. This construction establishes the problem’s intractability under standard assumptions and motivates further analysis and the search for practical heuristic methods.