Thesis etd-11222010-110708

Thesis type

PhD thesis

Author

CANGELOSI, DAVIDE

Email address

cangelo@di.unipi.it

URN

etd-11222010-110708

Title

On Improving Stochastic Simulation for Systems Biology

Scientific disciplinary sector

INF/01

Degree programme

COMPUTER SCIENCE

Supervisors

tutor Prof. Degano, Pierpaolo
supervisor Dr. Marangoni, Roberto

Keywords

Adaptive Simulation Stochastic tau-leaping Paralle

Defense session start date

17/12/2010

Availability

Full

Abstract

Mathematical modeling and computer simulation are powerful approaches for understanding the complexity of biological systems.
In particular, computer simulation is a strong tool for validation and fast hypothesis verification. Over the years,
several successful attempts have been made to simulate complex biological processes such as metabolic pathways, gene regulatory networks and cell signaling pathways. These processes are stochastic in nature; furthermore, they are characterized by evolution on multiple time scales and by great variability in molecular population sizes. The best-known method for capturing the random time evolution of well-stirred chemically reacting systems is Gillespie's Stochastic Simulation Algorithm (SSA). This Monte Carlo method
generates exact realizations of the state of the system by stochastically determining when the next reaction will occur and which reaction it will be. Most of its assumptions and hypotheses are
clearly simplifications, but in many cases the method has proved useful for capturing the randomness typical of realistic biological
systems. Unfortunately, Gillespie's stochastic simulation method is often slow in practice. This poses a great challenge and provides a
motivation for the development of new, efficient methods able to simulate stochastic and multiscale biological systems. In this
thesis we address the problems of simulating metabolic experiments and develop efficient simulation methods for well-stirred chemically
reacting systems. We showed how a Systems Biology approach can provide a cheap, fast and powerful method for validating models proposed in the literature. In the present case, we specified the model of the SRI photocycle proposed by Hoff et al. in a suitably developed simulator. This simulator was specifically designed to reproduce in silico the wet-lab experiments performed on metabolic networks, including the several possible controls exerted on them by the operator. With it, we showed that the screened model correctly explains many light responses but is unable to explain some critical experiments, owing to some unresolvable time scale problems. This confirms that our simulator is useful for simulating metabolic experiments. The simulator can be downloaded at the URL
http://sourceforge.net/projects/gillespie-qdc.

To accelerate SSA simulation, we first proposed a data-parallel implementation, on General Purpose Graphics Processing Units, of a revised version of Gillespie's First Reaction Method. Simulations performed on a GeForce 8600M GS graphics card with 16 stream processors showed that the parallel computation halves the execution time, and that this gain scales with the number of simulation steps. We also highlighted some specific problems of the programming environment in executing non-trivial general-purpose applications. In conclusion, we demonstrated the
extreme computational power of these low-cost and widespread technologies, but the limitations that emerged show that we are still far from general-purpose applications on GPUs.

In our investigation we also sought higher simulation speed by focusing on tau-leaping methods. We found that these methods share a common basic algorithmic convention: the pre-computation of the information needed to estimate the size of the
leap and the number of reactions that will fire in it. These pre-processing operations are often used to avoid negative populations. Their computational cost is often proportional to the size of the model (i.e. the number of reactions), so larger models incur a larger computational cost. The pre-processing operations yield very efficient simulations when leaps are long and many reactions fire. On the contrary, they are a burden when leaps are short and few reactions occur. To deal efficiently with
the latter cases, we proposed a method that departs from this trend. The SSALeaping method, SSAL for short, is a new method that lies midway between the direct method (DM) and tau-leaping. SSAL adaptively builds leaps and updates the system state stepwise. Unlike methods such as Modified tau-leaping (MTL), SSAL neither switches from tau-leaping to DM nor pre-selects the largest leap time consistent with the leap condition. Additionally, whereas MTL
prevents negative populations by separating critical from non-critical reactions, SSAL generates the reactions to fire sequentially,
verifying the leap condition after each reaction selection. We proved that a reaction overdraws one of its reactants if and only if the leap
condition is violated. This makes it impossible for populations to become negative, because SSAL stops the leap
generation in advance. To test the accuracy and performance of our method, we performed a large number of simulations on realistic
biological models. The tests aimed to span, sometimes by orders of magnitude,
both the number of reactions fired in a leap and the number of reactions in the system. Results showed that our method performs better
than MTL in many of the tested cases, but not in all. To increase the number of models that can be simulated efficiently, we
exploited the complementarity that emerged between SSAL and MTL and proposed a new adaptive method, called Adaptive Modified SSALeaping (AMS). During the simulation, AMS switches between SSALeaping (SSAL) and Modified tau-leaping (MTL) according to conditions on the number of reactions in the model and the predicted number of reactions firing in a
leap. We determined, both theoretically and experimentally, how to estimate the number of reactions that will fire in a leap and the
threshold that triggers the switch from one method to the other and vice versa. Results obtained on realistic biological models showed that in practice AMS performs better than SSAL and MTL, increasing the number of models
that can be simulated efficiently. In fact, the method correctly selects the better algorithm between SSAL and MTL in each case.

In this thesis we also investigated other new
parallelization techniques. The parallel simulation of biological systems has attracted the interest of many researchers because these systems are parallel, and sometimes distributed, by nature.
However, Gillespie's SSA is strictly sequential by nature. We presented a novel exact formulation of SSA based on the idea of
partitioning the volume. We proved the equivalence between our method and DM, and we gave a simple test to show its accuracy in practice. We then proposed a variant of
SSALeaping based on the partitioning of the volume, called Partitioned SSALeaping. The main feature we pointed out is that the
dynamics of a system in a leap can be obtained by composing the dynamics processed by each sub-volume of the partition. This form of independence offers a different view with respect to existing methods. We tested the method only on a simple model, and we showed that it accurately matched the results of DM, independently of the number of sub-volumes in the partition. This confirmed that the method works and that the independence is effective. We have not yet provided a parallel implementation of this method, because this work is still in progress and much remains to be done.
Nevertheless, Partitioned SSALeaping is a promising approach for future parallelization on multi-core (e.g. GPUs) or many-core
(e.g. clusters) technologies.
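
For readers unfamiliar with the direct method (DM) that the abstract uses as its exact reference, the following is a minimal Python sketch of the textbook algorithm; it is not the thesis code. The function name gillespie_direct, the toy reversible isomerization A <-> B and its rate constants c1 and c2 are illustrative assumptions that do not come from the thesis.

```python
import math
import random


def gillespie_direct(x0, stoich, propensity, t_end, seed=0):
    """Simulate one trajectory with Gillespie's direct method (textbook form).

    x0         -- initial populations, one integer per species
    stoich     -- stoich[j][i] is the change of species i when reaction j fires
    propensity -- function mapping a state to the list of propensities a_j(x)
    t_end      -- simulation stops before the first event later than t_end
    """
    rng = random.Random(seed)
    t, x = 0.0, list(x0)
    trajectory = [(t, tuple(x))]
    while True:
        a = propensity(x)
        a0 = sum(a)
        if a0 <= 0.0:  # no reaction can fire any more
            break
        # "When": the waiting time is exponential with rate a0.
        t += -math.log(1.0 - rng.random()) / a0
        if t > t_end:
            break
        # "Which": reaction j is chosen with probability a[j] / a0.
        threshold = rng.random() * a0
        acc, chosen = 0.0, None
        for j, aj in enumerate(a):
            if aj <= 0.0:
                continue  # a reaction with zero propensity is never chosen
            chosen = j
            acc += aj
            if threshold < acc:
                break
        # Apply the state change of the chosen reaction.
        x = [xi + d for xi, d in zip(x, stoich[chosen])]
        trajectory.append((t, tuple(x)))
    return trajectory


# Hypothetical toy model: A -> B with constant c1, B -> A with constant c2.
def isomerization(x, c1=1.0, c2=0.5):
    a_count, b_count = x
    return [c1 * a_count, c2 * b_count]


STOICH = [(-1, +1), (+1, -1)]

if __name__ == "__main__":
    traj = gillespie_direct((100, 0), STOICH, isomerization, t_end=5.0, seed=1)
    print(len(traj) - 1, "reactions fired; final state:", traj[-1][1])
```

Every iteration recomputes all propensities and fires exactly one reaction, which is why the method is exact but slow on large or fast systems. That cost is what the leaping methods described in the abstract aim to reduce.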

File

File name	Size
Davide_C...ITTED.pdf	1.52 Mb

ETD

Digital archive of theses defended at the Università di Pisa
