logo SBA

ETD

Archivio digitale delle tesi discusse presso l’Università di Pisa

Tesi etd-02272019-190210


Tipo di tesi
Tesi di laurea magistrale
Autore
PUNZI, GIULIA
URN
etd-02272019-190210
Titolo
Combinatorial Properties of Maximal Common Subsequences and Efficient Algorithms for Their Enumeration
Dipartimento
MATEMATICA
Corso di studi
MATEMATICA
Relatori
relatore Prof. Grossi, Roberto
Parole chiave
  • maximal
  • MCS
  • LCS
  • massimali
  • sequences
  • sequenze
  • enumeration
  • enumerazione
  • strings
  • stringhe
Data inizio appello
15/03/2019
Consultabilità
Non consultabile
Data di rilascio
15/03/2089
Riassunto
A Maximal Common Subsequence (MCS) S between two strings X and Y is a subsequence of both strings such that inserting any character at any position of S no longer yields a common subsequence of X and Y. MCS are a natural generalization of the classical bioinformatics concept of Longest Common Subsequence (LCS), which is simply a MCS of maximum length. LCS have been widely employed by the community ever since the 1970s to perform string comparison operations. LCS, however, have serious computational limitations: there is a conditional quadratic lower bound on the time to find one LCS. This bound does not hold for MCS, which can be found almost linearly. Because of this, their enumeration holds major interest, as it can give us major insight on the strings in a shorter time than the currently employed LCS. In this thesis, we study in detail the properties of MCS, some of which are rather counterintuitive. Based on them, we develop two different algorithms for their enumeration. The first, dynamic programming algorithm, is based on an incremental approach, and turns out to be time-inefficient in the worst case. The second, more refined, binary partition algorithm builds increasingly long prefixes of MCS, and proves to be an efficient polynomial delay enumeration algorithm.
File