ETD system

Electronic theses and dissertations repository


Thesis etd-09232015-150904

Thesis type
Master's degree thesis
Long non-coding RNAs, a novel class of regulatory RNAs: identification and characterization in the model species Brachypodium distachyon
Degree course
supervisor Dott. Bernardi, Rodolfo
supervisor Prof. Pè, Mario Enrico
Keywords
  • Brachypodium
  • Bioinformatics
  • RNA-Seq
  • Long non-coding RNAs
Graduation session start date
Abstract
Ninety percent of the eukaryotic genome is transcribed, although only a small part corresponds to protein-coding mRNAs; a large proportion of transcribed RNAs therefore do not code for proteins and are classified as non-coding RNAs (ncRNAs). High-throughput sequencing technology has allowed the identification and characterization of several classes of ncRNAs with key roles in various biological processes. Among ncRNAs, long ncRNAs (lncRNAs) are transcripts typically longer than 200 nucleotides that tend to be expressed at low levels and to exhibit tissue-specific, cell-specific or stress-responsive expression profiles. LncRNAs have been identified in both animals and plants, where they are involved in different regulatory pathways in development and in stress responses, although the molecular basis of these mechanisms remains largely unexplored.
My thesis project aims at identifying lncRNAs in Brachypodium distachyon (Bd), a wild grass belonging to the Pooideae and a model species for temperate cereals, such as wheat and barley.
A whole-genome annotation and a detailed analysis of lncRNA expression patterns were performed for the first time in Brachypodium. Moreover, the potential lncRNA targets were investigated to highlight new regulatory networks and cross-talk between different RNA molecules.
Public and proprietary RNA-Seq data sets from 15 different experiments conducted in the reference inbred line Bd21 were analysed in this study. Public RNA-Seq data from different experiments, including several plant organs, were downloaded from the Sequence Read Archive. Proprietary RNA-Seq libraries were previously produced by the lab from three developmental areas of the leaf (proliferation, expansion and mature), grown in control and drought stress conditions. For each proprietary RNA-Seq sample, three biological replicates were produced.
The dataset comprises a total of 705 million reads, which were subjected to quality analysis. Each experiment was aligned independently to the Bd21 reference genome (v.2.1) using the spliced read aligner TopHat2 and, subsequently, the transcriptome of each experiment was assembled de novo using Cufflinks.
To identify Bd lncRNAs, an in-house bioinformatic pipeline was used. Briefly, this pipeline applies five filters based on the main lncRNA features: size selection, an Open Reading Frame filter, a known protein domain filter, the Coding Potential Calculator, and a filter for housekeeping lncRNAs and precursors of small RNAs. Starting from the whole set of de novo reconstructed loci/isoforms (99141), 2507 bona fide lncRNAs were identified.
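The first two filters can be illustrated with a minimal Python sketch. It is not the in-house pipeline itself: the 200-nucleotide cutoff follows the lncRNA definition given above, whereas the 100-codon ORF cutoff, the restriction to complete ATG-to-stop ORFs and all function names are assumptions made for illustration.

```python
# Hypothetical sketch of the size-selection and ORF filters.
# MIN_LENGTH_NT follows the ">200 nucleotides" definition in the abstract;
# MAX_ORF_CODONS is an assumed threshold, not a value taken from the thesis.
MIN_LENGTH_NT = 200
MAX_ORF_CODONS = 100

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_codons(seq):
    """Longest complete ORF (ATG up to, but excluding, a stop codon),
    measured in codons, over the three frames of both strands."""
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, (i - start) // 3)
                    start = None
    return best


def passes_size_and_orf_filters(seq):
    """Keep transcripts longer than the size cutoff with no long ORF."""
    seq = seq.upper()
    return len(seq) > MIN_LENGTH_NT and longest_orf_codons(seq) <= MAX_ORF_CODONS
```

In this sketch a transcript is retained only if it passes both checks; the remaining filters (protein domains, coding potential, housekeeping and small RNA precursors) rely on external databases and tools and are not modelled here.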
Differential expression analysis of the bona fide lncRNAs was performed for the datasets with replicates, i.e. the proprietary libraries from different developing areas of the third leaf. This analysis revealed that several lncRNAs are differentially expressed during leaf cell differentiation and during drought treatment. Some lncRNAs were more abundant in specific plant stages, tissues or organs.
Moreover, a computational method developed to identify endogenous microRNA target mimics (eTMs) made it possible to investigate the link between lncRNAs and microRNAs through target mimicry, a regulatory mechanism for miRNA function in plants in which decoy RNAs bind to miRNAs via complementary sequences and can therefore interfere with the interaction between miRNAs and their authentic targets.
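The complementarity on which target mimicry relies can be illustrated with a deliberately simplified Python sketch. It only scans a transcript for ungapped windows that pair with a miRNA, allowing a limited number of mismatches. It is not the eTM prediction method used in the thesis: it ignores bulges and G:U wobble pairs, and the function names and the `max_mismatches` parameter are hypothetical.

```python
# Simplified, hypothetical complementarity scan; not the thesis eTM method.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq):
    """Upper-case a sequence and convert RNA (U) to DNA (T) notation."""
    return seq.upper().replace("U", "T")


def find_complementary_sites(transcript, mirna, max_mismatches=3):
    """Return (start, mismatches) for every ungapped window of the transcript
    that pairs with the miRNA with at most max_mismatches mismatches."""
    transcript = to_dna(transcript)
    # A site that pairs with the miRNA reads as its reverse complement.
    site = to_dna(mirna).translate(COMPLEMENT)[::-1]
    hits = []
    for start in range(len(transcript) - len(site) + 1):
        window = transcript[start:start + len(site)]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

For example, with the toy miRNA `UGCCAAAGGAGA` (an invented sequence), a transcript that contains `TCTCCTTTGGCA` is reported as a perfectly complementary site at the position where that stretch begins.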