ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-06302016-120838


Thesis type
Master's thesis
Author
TARSIA, ANNA
URN
etd-06302016-120838
Title
Deriving Rules for microRNA Regulation from Patterns in RNA Sequencing Data
Department
INFORMATICA (Computer Science)
Degree programme
INFORMATICA (Computer Science)
Supervisors
supervisor Prof. Pisanti, Nadia
supervisor Prof. Sætrom, Pål
Keywords
  • short reads
  • microRNA targets
  • microRNA
  • expression level
  • correlation
  • strand selection
Exam session start date
22/07/2016
Availability
Not available for consultation
Release date
22/07/2086
Abstract
MicroRNAs (miRNAs) are single-stranded non-coding RNA molecules, approximately 22 nucleotides (nts) in length, that can regulate gene expression in animals, plants and some viruses.
This thesis explores the process of miRNA biogenesis with the aim of building a predictive model for miRNA strand selection. This phenomenon is largely unexplored, and a further objective of the thesis is to add to what is known about it.
During miRNA biogenesis, the new miRNA strands are excised from longer double-stranded regions of RNA (precursor miRNA). One strand is chosen to join the silencing complex that will affect the production of proteins. The other (the passenger strand, or miRNA* strand) is degraded.
The thermodynamic stability of the duplex, as well as other factors, appears to play an important role in this decision, but the mechanism behind strand selection is still not fully understood. Recent studies have shown, for example, that for some miRNAs both strands are included with equal probability. This suggests the existence of other mechanisms that control the selection of mature miRNAs.

Some properties, such as expression level and the presence of short reads, are recognized indicators of functional miRNAs. We hypothesize that the miRNAs most likely to be selected are those that are highly expressed and have a larger number of short reads, and we ultimately confirm this hypothesis.

The thesis consists of five parts, starting with a descriptive survey of DNA, RNA, proteins, gene expression and miRNA, which centers the focus on miRNA biology.
Cancer research is a wide and diverse field, so a narrow focus is necessary, and it goes hand in hand with the computer science angle taken on the subject. Chapter 3 then presents the tools we used to support our model. Chapter 4 discusses the methods employed in the research and the data sets on which we tested them: matching human samples of miRNAs and mRNAs from the FANTOM5 (F5) consortium.

The most important methods are those that compute the correlation coefficients of the data and those that define the expression-level and short-read classifications of the miRNA sequences, because these offer a high likelihood of relevant results.
Finally, the results are presented and discussed in chapter 5. Distribution charts show that highly expressed miRNAs, as well as miRNAs with short reads, have a more negative correlation with their predicted mRNA targets. The group of miRNAs with detectable short reads has the greatest potential for further research, because short reads can align to the miRNA sequences with different levels of accuracy (offset) and at different positions (start/end).
Indeed, an additional method compares miRNAs whose short reads are positioned at the start of the sequence with the others. MiRNAs with "start" reads were shown to have a more negative correlation with their predicted mRNA targets.
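
As a rough illustration of the correlation analysis described above, the following Python sketch computes the correlation between one miRNA's expression profile and the profiles of its predicted mRNA targets across matched samples. It is only a sketch under stated assumptions: the abstract does not say which correlation coefficient the thesis uses, so Pearson's is assumed here, and the function names and toy expression values are hypothetical and not taken from the thesis or from FANTOM5.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumption: the thesis may use a different coefficient (e.g. Spearman).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def mean_target_correlation(mirna_expr, target_exprs):
    """Average correlation between one miRNA and each predicted target.

    mirna_expr: expression of the miRNA across matched samples.
    target_exprs: one expression profile per predicted mRNA target,
    measured over the same samples in the same order.
    """
    return mean(pearson(mirna_expr, t) for t in target_exprs)


# Hypothetical toy data: four matched samples, two predicted targets.
mirna = [1.0, 2.0, 3.0, 4.0]
targets = [
    [8.0, 6.0, 4.0, 2.0],
    [5.0, 5.5, 4.0, 3.0],
]

score = mean_target_correlation(mirna, targets)
print(f"mean miRNA-target correlation: {score:.3f}")
```

A more negative value would be consistent with the miRNA repressing its predicted targets. Comparing the distribution of such values between groups of miRNAs (for example, highly expressed versus not, or with "start" reads versus without) is one plausible way to arrive at the kind of distribution charts mentioned above.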

In this thesis, we chose to focus on finding good parameters to classify miRNA sequences, sometimes by setting specific constraints, such as a very precise offset value for the short-read alignment, and disregarding the others. This allowed us to arrive at more accurate and consistent conclusions, with higher relevance for further research, than a wider scope would have.
The research can be refined and there is room for improvement. The model needs to be tested on matching samples that are not dependent on one another (donor, tissue, time series and replicates) and also on different sets of samples.
Furthermore, it is reasonable to assume that the experience, methods and results might, at least in part, be transferable to other areas of bioinformatics and cancer research.
File