ETD

Digital archive of the theses defended at the University of Pisa

Thesis etd-09272017-111428


Thesis type
Master's thesis
Author
SPINELLI, ANDREA
URN
etd-09272017-111428
Title
Towards Improved Handling of CNV Data for Genetic Analysis - development of a dedicated module for the Gemini toolset
Department
INFORMATION ENGINEERING
Degree programme
BIOMEDICAL ENGINEERING
Supervisors
supervisor Prof. Bechini, Alessio
supervisor Dr. D'Aurizio, Romina
co-examiner Prof. Mangione, Maurizio
Keywords
  • bioinformatics
  • python
  • genome analysis
  • copy-number variants
  • gemini
  • software development
Defense session start date
13/10/2017
Availability
Full
Abstract
Over the years, researchers have shown that all kinds of DNA variation play a role in susceptibility to genetic disease: not only single-nucleotide polymorphisms (SNPs), but also larger structural variants called copy-number variants (CNVs). A CNV is an alteration of DNA, either a duplication or a deletion, whose length ranges from 50 bp to millions of bases; the affected sequences span large portions of the genome and contain many repetitions of a base pattern, possibly encompassing several genes. CNVs are also known to modulate different aspects of genetic disease.
In this study, variants are extracted from raw sequencing data and stored in a tab-separated text file in Variant Call Format (VCF). To evaluate whether a single variant or a set of variants may confer risk, the variants are compared against different reference resources.
Gemini is a free and open-source framework for exploring genome variation, based on Python. This thesis work aims to extend the functionalities of Gemini, creating additional tools to handle CNV data.
Unlike existing software, Gemini integrates genetic variation with a diverse set of genome annotations (e.g. ENCODE, UCSC, ClinVar) into a unified, portable database (based on SQLite). Its portability and flexibility, together with the possibility of integrating other genome annotations, the ability to query variant information, and its extensibility with other Python tools, make Gemini an extremely interesting piece of software to use and to extend.
As the main result of the presented work, support for CNV data has been added to Gemini: it is now possible to load VCF files and create a database that integrates some existing genome annotations. To filter the variants, a tool was developed that overlaps them with a track of the Database of Genomic Variants (DGV) containing a CNV map of benign variants. This tool allows filtering by overlap fraction, alteration (deletion or duplication), length of overlap, and sample. A tool was also created to annotate variants with the Gemini gene map (a track of Ensembl v.75). This tool also provides options to load a custom gene map and to select the sample on which to run the annotation. The annotation tool also produces a heatmap that shows the correlation between variant alterations and the genes affected by the variants. The browser-based interface of Gemini has been extended with the components needed for each new tool, including a wizard for loading VCF files.
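The overlap-based filtering mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the code of the thesis module or of Gemini: the names `Interval`, `overlap_length`, and `matches_benign`, the half-open coordinates, the "DEL"/"DUP" labels, and the default thresholds are all assumptions made for illustration. The overlap fraction is measured here relative to the length of the query variant, which is only one possible convention, and the per-sample filter is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic interval; start is inclusive, end is exclusive (assumed convention)."""
    chrom: str
    start: int
    end: int
    alteration: str  # "DEL" or "DUP" (hypothetical labels)

    @property
    def length(self) -> int:
        return self.end - self.start


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of bases shared by a and b; 0 if they are on different chromosomes."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def matches_benign(variant: Interval, benign: Interval,
                   min_fraction: float = 0.5,
                   min_overlap_bp: int = 1,
                   same_alteration: bool = True) -> bool:
    """True if the variant overlaps the benign entry strongly enough.

    The three criteria mirror the ones named in the abstract:
    alteration type, length of overlap, and overlap fraction.
    """
    if same_alteration and variant.alteration != benign.alteration:
        return False
    shared = overlap_length(variant, benign)
    if variant.length <= 0 or shared < min_overlap_bp:
        return False
    return shared / variant.length >= min_fraction


variant = Interval("chr1", 1000, 2000, "DEL")
benign_del = Interval("chr1", 1500, 3000, "DEL")
benign_dup = Interval("chr1", 1500, 3000, "DUP")

print(matches_benign(variant, benign_del))  # 500 of 1000 bases shared
print(matches_benign(variant, benign_dup))  # alteration types differ
```

In this sketch a variant that matches a benign entry would typically be filtered out; raising `min_fraction` makes the filter stricter about what counts as a match.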