## Thesis etd-11222010-110708

Thesis type

PhD thesis

Author

CANGELOSI, DAVIDE

email address

cangelo@di.unipi.it

URN

etd-11222010-110708

Title

On Improving Stochastic Simulation for Systems Biology

Scientific disciplinary sector

INF/01

Degree programme

INFORMATICA

Supervisors

**tutor** Prof. Degano, Pierpaolo

**supervisor** Dr. Marangoni, Roberto

Keywords

- Adaptive Simulation Stochastic tau-leaping Paralle

Exam session start date

17/12/2010

Availability

Full

Abstract

Mathematical modeling and computer simulation are powerful approaches for understanding the complexity of biological systems.

In particular, computer simulation is a strong tool for validating models and quickly verifying hypotheses. Over the years, several successful attempts have been made to simulate complex biological processes such as metabolic pathways, gene regulatory networks and cell signaling pathways. These processes are stochastic in nature; furthermore, they are characterized by evolution on multiple time scales and by great variability in the population sizes of molecules. The best-known method for capturing the random time evolution of well-stirred chemically reacting systems is Gillespie's Stochastic Simulation Algorithm (SSA). This Monte Carlo method generates exact realizations of the state of the system by stochastically determining when a reaction will occur and which reaction it will be. Most of its assumptions and hypotheses are clearly simplifications, but in many cases the method has proved useful for capturing the randomness typical of realistic biological systems. Unfortunately, Gillespie's stochastic simulation method is often slow in practice. This poses a great challenge and motivates the development of new, efficient methods able to simulate stochastic and multiscale biological systems.

In this thesis we address the problem of simulating metabolic experiments and develop efficient simulation methods for well-stirred chemically reacting systems. We showed how a Systems Biology approach can provide a cheap, fast and powerful method for validating models proposed in the literature. In the present case, we specified the model of the SRI photocycle proposed by Hoff et al. in a purpose-built simulator. This simulator was specifically designed to reproduce in silico wet-lab experiments performed on metabolic networks, with several possible controls exerted on them by the operator. Thanks to this, we showed that the screened model correctly explains many light responses, but that it is unable to explain some critical experiments because of some unresolvable time-scale problems. This confirms that our simulator is useful for simulating metabolic experiments. Furthermore, it can be downloaded at the URL http://sourceforge.net/projects/gillespie-qdc.

In order to accelerate SSA simulation, we first proposed a data-parallel implementation, on General Purpose Graphics Processing Units, of a revised version of Gillespie's First Reaction Method. Simulations performed on a GeForce 8600M GS graphics card with 16 stream processors showed that the parallel computation halves the execution time, and that this performance scales with the number of steps of the simulation. We also highlighted some specific problems of the programming environment when executing non-trivial general-purpose applications. In conclusion, we demonstrated the extreme computational power of these low-cost and widespread technologies, but the limitations that emerged show that we are still far from a general-purpose application for GPUs.

In our investigation we also attempted to achieve higher simulation speed by focusing on tau-leaping methods. We found that these methods share a common basic algorithmic convention: the pre-computation of the information necessary to estimate the size of the leap and the number of reactions that will fire in it. Often these pre-processing operations are also used to avoid negative populations. Their computational cost is often proportional to the size of the model (i.e. the number of reactions), so larger models involve a larger computational cost. The pre-processing operations result in very efficient simulation when leaps are long and many reactions can fire. On the contrary, they become a burden when leaps are short and few reactions occur.

To deal efficiently with the latter cases, we proposed a method that departs from this trend. The SSALeaping method, SSAL for short, is a new method that lies midway between the direct method (DM) and tau-leaping. SSAL adaptively builds leaps and updates the system state stepwise. Unlike methods such as Modified tau-leaping (MTL), SSAL neither switches from tau-leaping to DM nor pre-selects the largest leap time consistent with the leap condition. Additionally, whereas MTL prevents negative populations by separating critical from non-critical reactions, SSAL generates the reactions to fire sequentially, verifying the leap condition after each reaction selection. We proved that a reaction overdraws one of its reactants if and only if the leap condition is violated. Therefore, populations cannot become negative, because SSAL stops the leap generation in advance.

To test the accuracy and performance of our method, we performed a large number of simulations on realistic biological models. The tests aimed to span as wide a range as possible, sometimes orders of magnitude, of both the number of reactions fired in a leap and the number of reactions in the system. Results showed that our method performs better than MTL in many of the tested cases, but not in all.

Then, to increase the number of models that can be simulated efficiently, we exploited the complementarity that emerged between SSAL and MTL and proposed a new adaptive method, called Adaptive Modified SSALeaping (AMS). During the simulation, AMS switches between SSALeaping (SSAL) and Modified tau-leaping according to conditions on the number of reactions in the model and the predicted number of reactions firing in a leap. We were able to determine, both theoretically and experimentally, how to estimate the number of reactions that will fire in a leap and the threshold that triggers the switch from one method to the other and vice versa. Results obtained on realistic biological models showed that in practice AMS performs better than SSAL and MTL, increasing the number of models that can be simulated efficiently. In fact, the method correctly selects the better algorithm between SSAL and MTL in each case.
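As a point of reference for the methods compared above, the following is a minimal Python sketch of the two classical exact SSA variants named in the abstract: the direct method (DM) and the First Reaction Method. It is not the thesis's simulator, its revised GPU version, SSAL, MTL or AMS. The toy model (a reversible isomerization A <-> B), the rate constants and all function names are illustrative assumptions.

```python
import math
import random

# Hypothetical toy model (not from the thesis): reversible isomerization A <-> B.
# Each reaction is (rate constant, reactant stoichiometry, net state change).
REACTIONS = [
    (1.0, {"A": 1}, {"A": -1, "B": +1}),
    (0.5, {"B": 1}, {"A": +1, "B": -1}),
]


def propensities(state, reactions):
    """Mass-action propensities: rate constant times the number of reactant combinations."""
    return [
        c * math.prod(math.comb(state[s], n) for s, n in reactants.items())
        for c, reactants, _ in reactions
    ]


def direct_step(a, rng):
    """Direct method: waiting time ~ Exp(sum(a)); reaction j chosen with probability a[j] / sum(a)."""
    tau = rng.expovariate(sum(a))
    j = rng.choices(range(len(a)), weights=a)[0]
    return tau, j


def first_reaction_step(a, rng):
    """First Reaction Method: one tentative time per enabled reaction; the earliest one fires."""
    candidates = [(rng.expovariate(aj), j) for j, aj in enumerate(a) if aj > 0]
    return min(candidates)


def simulate(state, reactions, t_end, rng, step=direct_step):
    """Run an exact SSA until t_end; return the list of (time, state) pairs visited."""
    state, t = dict(state), 0.0
    trajectory = [(t, dict(state))]
    while True:
        a = propensities(state, reactions)
        if sum(a) == 0:
            break  # no reaction is enabled any more
        tau, j = step(a, rng)
        if t + tau > t_end:
            break  # the next event would fall beyond the time horizon
        t += tau
        for species, delta in reactions[j][2].items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory
```

Calling `simulate({"A": 100, "B": 0}, REACTIONS, 5.0, random.Random(0))` runs the direct method; passing `step=first_reaction_step` runs the First Reaction Method on the same model. In the latter, the tentative times are drawn independently for each reaction, which is one reason the method lends itself to data-parallel hardware. Both variants process one reaction event per iteration, which is the cost that leaping methods try to reduce.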

In this thesis we also investigated other new parallelization techniques. The parallelization of biological systems has stimulated the interest of many researchers because these systems are parallel, and sometimes distributed, in nature.

However, Gillespie's SSA is strictly sequential in nature. We presented a novel exact formulation of SSA based on the idea of partitioning the volume. We proved the equivalence between our method and DM, and we gave a simple test to show its accuracy in practice. Then we proposed a variant of SSALeaping based on partitioning the volume, called Partitioned SSALeaping. The main feature we pointed out is that the dynamics of a system in a leap can be obtained by composing the dynamics processed by each sub-volume of the partition. This form of independence offers a different view with respect to existing methods. We tested the method only on a simple model, and we showed that it accurately matched the results of DM, independently of the number of sub-volumes in the partition. This confirmed that the method works and that the independence is effective. We have not yet provided a parallel implementation of this method, because the work is still in progress and much remains to be done.

Nevertheless, Partitioned SSALeaping is a promising approach for future parallelization on multi-core (e.g. GPUs) or many-core (e.g. cluster) technologies.
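The abstract does not spell out how the volume is partitioned or how equivalence with DM is proved. As a hedged illustration only, and assuming (this is not stated in the source) that the total propensity $a_0$ splits additively into per-sub-volume contributions $a_0^{(k)}$, a standard property of independent exponential waiting times shows why running one clock per sub-volume can reproduce the waiting-time law of DM:

```latex
% Assumption (not from the source): a_0 = \sum_{k=1}^{K} a_0^{(k)} over K sub-volumes.
\begin{align*}
\tau_k &\sim \mathrm{Exp}\bigl(a_0^{(k)}\bigr) \quad \text{independently for } k = 1, \dots, K, \\
\Pr\Bigl(\min_{i} \tau_i > t\Bigr) &= \prod_{i=1}^{K} e^{-a_0^{(i)} t} = e^{-a_0 t}, \\
\Pr\Bigl(\arg\min_{i} \tau_i = k\Bigr) &= \frac{a_0^{(k)}}{a_0}.
\end{align*}
```

Under that assumption, the earliest of the per-sub-volume times is distributed as $\mathrm{Exp}(a_0)$, exactly like the DM waiting time, and the sub-volume that fires first is selected in proportion to its share of the total propensity. Whether the thesis's construction takes this form is not confirmed by the abstract.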

File

File name | Size |
---|---|
Davide_C...ITTED.pdf | 1.52 MB |
