ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-10082008-113937


Thesis type
PhD thesis
Author
BERTOLLI, CARLO
URN
etd-10082008-113937
Title
Fault Tolerance for High-Performance Applications Using Structured Parallelism Models
Scientific-disciplinary sector
INF/01
Degree programme
COMPUTER SCIENCE
Supervisors
Supervisor Prof. Vanneschi, Marco
Keywords
  • Structured Parallel Programming
  • High-Performance Computing
  • High-Level Programming Models
  • Fault Tolerance
Examination session start date
15/12/2008
Availability
Full
Abstract
In recent years, parallel computing has increasingly exploited the high-level models of structured parallel programming, of which algorithmic skeletons are an example. This trend has been motivated by the properties of structured parallelism models, which can be used to derive several (static and dynamic) optimizations at various implementation levels. In this thesis we study the properties of structured parallel models that are useful for providing a fault tolerance support oriented towards High-Performance applications. This issue has traditionally been addressed in two ways: (i) in the context of unstructured parallelism models (e.g. MPI), whose computation model is essentially a distributed set of processes communicating through message passing, with approaches based on checkpointing and rollback recovery or on software replication; (ii) in the context of high-level models based on a specific parallelism model (e.g. data-flow) and/or a specific implementation model (e.g. master-slave), by introducing techniques that exploit the properties of the programming and computation models themselves. In this thesis we take a step towards a more abstract viewpoint and highlight the properties of structured parallel models that are relevant for fault tolerance. We consider two classes of parallel programs (namely task parallel and data parallel) and introduce a fault tolerance support based on checkpointing and rollback recovery. The support is derived from the high-level properties of the parallel models: we call this derivation specialization of fault tolerance techniques, to highlight the difference from classical solutions that support structure-unaware computations. As a consequence of this specialization, the introduced fault tolerance techniques can be configured and optimized to meet specific needs at different implementation levels.
That is, the supports we present do not target a single computing platform or a specific class of platforms. Rather, the specializations are the mechanism for addressing specific issues of the target environment and of the implemented applications, through appropriate choices of protocols and of their configurations.
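As a rough illustration of how a structure-aware checkpoint can differ from a structure-unaware one, the following Python sketch assumes a task-parallel farm whose tasks are independent and side-effect free. Under that assumption, a checkpoint can simply be the set of results already produced, and rollback recovery re-executes only the tasks whose results were not saved, with no coordinated global snapshot of all processes. This is a hypothetical simplification, not the protocol developed in the thesis. The names `FarmCheckpoint`, `run_farm` and `fail_at` are invented here, and the workers are simulated sequentially.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FarmCheckpoint:
    """Hypothetical farm checkpoint: results of completed tasks, keyed by task index."""
    done: Dict[int, int] = field(default_factory=dict)


def run_farm(tasks: List[int], worker: Callable[[int], int],
             ckpt: FarmCheckpoint, fail_at: int = -1) -> List[int]:
    """Run independent tasks, saving each result in the checkpoint as it completes.

    Tasks already recorded in the checkpoint are skipped, so a rerun after a
    fault resumes from the saved state. `fail_at` is a hypothetical knob that
    simulates a crash just before the task with that index is processed.
    """
    for i, task in enumerate(tasks):
        if i in ckpt.done:
            continue  # result survived the fault: no re-execution needed
        if i == fail_at:
            raise RuntimeError(f"simulated fault at task {i}")
        ckpt.done[i] = worker(task)  # tasks are independent, so this is a valid checkpoint
    return [ckpt.done[i] for i in range(len(tasks))]


if __name__ == "__main__":
    calls: List[int] = []

    def square(x: int) -> int:
        calls.append(x)  # record every execution of the worker function
        return x * x

    ckpt = FarmCheckpoint()
    try:
        run_farm([1, 2, 3, 4, 5], square, ckpt, fail_at=3)
    except RuntimeError as err:
        print(err)                                   # simulated fault at task 3
    print(ckpt.done)                                 # {0: 1, 1: 4, 2: 9}
    print(run_farm([1, 2, 3, 4, 5], square, ckpt))   # [1, 4, 9, 16, 25]
    print(calls)                                     # [1, 2, 3, 4, 5]
```

In this run, the checkpoint holds the results of tasks 0 to 2 after the simulated fault, and the recovery run executes only tasks 3 and 4, so no task is computed twice. A structure-unaware scheme would instead have to restore a consistent state of every process and channel, because it cannot assume that tasks are independent.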