Digital archive of theses discussed at the University of Pisa


Thesis etd-03032014-181603

Thesis type
PhD thesis
Thesis title
The peculiar structure of the olive (Olea europaea L.) genome as shown by massively parallel sequencing data.
Academic discipline
Course of study
supervisor Dr. Natali, Lucia
committee member Zuccolo, Andrea
committee member Pistelli, Laura
committee member Martini, Claudia
committee member Lupo, Giuseppe
Keywords
  • assembly of NGS reads
  • genome landscape
  • Olea europaea
  • repetitive DNA
  • retrotransposons
  • SINEs
  • tandem repeats
Graduation session start date
Abstract
The olive tree (Olea europaea L.) is poorly characterized at the genetic and genomic levels compared to other fruit tree crops. Within the framework of the Italian project OLEA, which aims to obtain the complete sequence of the olive genome, we performed an in-depth analysis of the repetitive component of this genome using NGS techniques (454-Roche and Illumina).
In a first study, we described different computational procedures for isolating and characterizing olive repeated sequences. These procedures were used to determine the structure of the genome and the composition of its repetitive fraction. Our analyses revealed the peculiar landscape of the olive genome, which results from the massive amplification of tandem repeats; these represent about 31% of the whole genome, more than has been reported for any other sequenced plant genome. Tandem repeats comprise six main families of different lengths, two of which were first discovered in these experiments. The other large redundant class in the olive genome consists of transposable elements, especially LTR-retrotransposons (LTR-REs). Similar procedures were concurrently applied to the genome of another species, the sunflower, and an article on this species is included as Appendix 2.
In a second study, we characterized 255 unique full-length LTR-REs, identified by scanning a number of BAC clone sequences. Copia elements were more numerous than Gypsy ones (162 vs. 81); 12 elements could not be assigned to either superfamily because they lacked distinctive domains. Mapping a large set of Illumina reads onto the LTR-REs revealed that Gypsy families comprise more members than Copia ones. Four RE families consisted mainly of solo-LTRs. The insertion time of intact retroelements, estimated from the divergence between sister LTRs, showed that the mean insertion age of the isolated REs is around 18 million years (MY), although some of the isolated elements inserted relatively recently. Gypsy and Copia REs showed different waves of transposition, with Gypsy elements especially active between 10 and 25 MY ago and nearly inactive in the last 7 MY.
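Dating an insertion from sister LTR divergence rests on the fact that the two LTRs of an element are identical when it inserts and then accumulate substitutions independently. The following is a minimal sketch of that calculation, not the procedure used in the thesis: it assumes the two LTRs are already aligned, uses the Kimura two-parameter distance as one common choice of divergence measure, and takes the substitution rate as a purely illustrative placeholder (`ASSUMED_RATE`), since the abstract does not state which rate or distance model was applied. The function names are hypothetical.

```python
import math

# Illustrative placeholder only (substitutions per site per year);
# the abstract does not say which rate was used.
ASSUMED_RATE = 1e-8


def kimura_2p_distance(ltr5: str, ltr3: str) -> float:
    """Kimura two-parameter distance between two aligned sister LTRs."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTR sequences must be aligned to the same length")
    purines, pyrimidines = {"A", "G"}, {"C", "T"}
    transitions = transversions = compared = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= purines or {a, b} <= pyrimidines:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions in the alignment")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def insertion_age_my(distance: float, rate: float = ASSUMED_RATE) -> float:
    """Insertion age in million years: T = K / (2r).

    The factor 2 accounts for both LTRs diverging independently.
    """
    return distance / (2 * rate) / 1e6


if __name__ == "__main__":
    # Toy alignment with a single transition (A <-> G) in ten positions.
    k = kimura_2p_distance("ACGTACGTAC", "GCGTACGTAC")
    print(f"K = {k:.4f}, age = {insertion_age_my(k):.2f} MY")
```

Because the age scales inversely with the rate, any absolute date obtained this way is only as reliable as the substitution rate chosen for the species.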
In a third study, using a specific bioinformatic pipeline on olive BAC clone sequences, we identified 418 olive Short Interspersed Nuclear Elements (SINEs), which constitute one of the first SINE collections in a dicotyledonous species. The identified SINEs represent 0.48% of the olive genome, and their length ranges from 62 to 588 bp. The vast majority of the identified SINEs showed low or medium redundancy and were often associated with genic sequences. Sequence similarity analysis allowed the identification of ten major families. Our results demonstrate the suitability of the pipeline for SINE identification and should facilitate further analyses of these relatively unknown elements in other plant species.