ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-12052011-215104


Thesis type
PhD thesis
Author
BATTAGLIA, GIOVANNI
URN
etd-12052011-215104
Title
Discovery of Unconventional Patterns for Sequence Analysis: Theory and Algorithms
Scientific disciplinary sector
INF/01
Degree programme
COMPUTER SCIENCE
Supervisors
tutor Prof. Grossi, Roberto
Keywords
  • transposons
  • permutation patterns
  • pattern discovery
  • mask patterns
Defence session start date
22/12/2011
Availability
Full
Abstract
The biology community is collecting a large amount of raw data, such as the genome sequences of organisms, microarray data, and interaction data such as gene-protein and protein-protein interactions. This amount is rapidly increasing, and the process of understanding the data is lagging behind the process of acquiring it. An inevitable first step towards making sense of the data is to study its regularities, focusing on the non-random structures that appear surprisingly often in the input sequences: patterns.

In this thesis we discuss three incarnations of the pattern discovery task, exploring three types of patterns that can model different regularities of the input dataset.

Mask patterns are designed to model short repeated biological sequences whose content is highly conserved at some specific positions. Permutation patterns are designed to detect repeated patterns whose parts maintain their physical adjacency, but not their ordering, across all occurrences.
Transposons, instead, model mobile sequences in the input dataset, which can be discovered by comparing different copies of the same input string and detecting large insertions and deletions in their alignment.
File