Tesi etd-01252020-182713 |
Link copiato negli appunti
Tipo di tesi
Tesi di laurea magistrale
Autore
SCHIRINZI, SIMONE
URN
etd-01252020-182713
Titolo
On-line Elastic Degenerate Strings q-gram distance with Suffix Trees
Dipartimento
INFORMATICA
Corso di studi
INFORMATICA
Relatori
relatore Prof.ssa Pisanti, Nadia
Parole chiave
- algorithm
- bioinformatica
- bioinformatics
- ED-string
- elastic degenerate
- genoma
- genome
- pangenome
- q-gram distance
- strings
- suffix tree
Data inizio appello
06/03/2020
Consultabilità
Tesi non consultabile
Riassunto
An elastic-degenerate (ED) string is a sequence of n sets of strings of total length N, which was recently proposed to model a set of p genomes of average length w. ED-strings find an interesting application in the representation of the genome of a population: a pan-genome. In this view, comparing two ED-strings is a way to discover similarities and differences between distinct (sub)populations.
In this thesis we study the ED-strings distance problem based on the so-called q-grams. We first define two distance notions, and we give three algorithms. The first two, one per each distance notion as a comparison base, are an extension of Ukkonen's algorithm and run in time O(Np) and O(Np log Np), respectively. The third is able to compute both q-gram distances between ED-strings in time O(N p^{q/w}). This is achieved using a data structure we devised named DIGIT: eD-strIng q-Gram suffIx Tree: a generalized suffix tree that keeps tracks of q-grams. To our knowledge, we are the first to define and apply q-gram distance notions with ED-strings.
In this thesis we study the ED-strings distance problem based on the so-called q-grams. We first define two distance notions, and we give three algorithms. The first two, one per each distance notion as a comparison base, are an extension of Ukkonen's algorithm and run in time O(Np) and O(Np log Np), respectively. The third is able to compute both q-gram distances between ED-strings in time O(N p^{q/w}). This is achieved using a data structure we devised named DIGIT: eD-strIng q-Gram suffIx Tree: a generalized suffix tree that keeps tracks of q-grams. To our knowledge, we are the first to define and apply q-gram distance notions with ED-strings.
File
Nome file | Dimensione |
---|---|
Tesi non consultabile. |