logo SBA

ETD

Archivio digitale delle tesi discusse presso l’Università di Pisa

Tesi etd-11292015-145039


Tipo di tesi
Tesi di dottorato di ricerca
Autore
LAMETTI, SILVIA
URN
etd-11292015-145039
Titolo
Structured Parallel Programming and Cache Coherence in Multicore Architectures
Settore scientifico disciplinare
INF/01
Corso di studi
SCIENZE DI BASE "GALILEO GALILEI"
Relatori
tutor Prof. Vanneschi, Marco
Parole chiave
  • cache coherence
  • cmp
  • cost models
  • multicore
  • parallel patterns
  • parallelism forms
  • performance models
  • structured parallel programming
  • Tilera TilePro64
Data inizio appello
20/12/2015
Consultabilità
Completa
Riassunto
It is clear that multicore processors have become the building blocks of today’s high-performance computing platforms. The advent of massively parallel single-chip microprocessors further emphasizes the gap that exists between parallel architectures and parallel programming maturity. Our research group, starting from the experiences on distributed and shared memory multiprocessor, was one of the first to propose a Structured Parallel Programming approach to bridge this gap. In this scenario, one of the biggest problems is that an application’s performance is often affected by the sharing pattern of data and its impact on Cache Coherence. Currently multicore platforms rely on hardware or automatic cache coherence techniques that allow programmers to develop programs without taking into account the problem. It is well known that standard coherency protocols are inefficient for certain data communication patterns and these inefficiencies will be amplified by the increased core number and the complex memory hierarchies.
Following a structured parallelism approach, our methodology to attack these problems is based on two interrelated issues: structured parallelism paradigms and cost models (or performance models).
Evaluating the performance of a program, although widely studied, is still an open problem in the research community and, notably, specific cost models to de- scribe multicores are missing. For this reason in this thesis, we define an abstract model for cache coherent architectures, which is able to capture the essential elements and the qualitative behaviors of multicore-based systems. Furthermore, we show how this abstract model combined with well known performance modelling techniques, such as analytical modelling (e.g., queueing models and stochastic process algebras) or simulations, provide an application- and architecture-dependent cost model to predict structured parallel applications performances.
Starting out from the behavior and performance predictability of structured parallelism schemes, in this thesis we address the issue of cache coherence in multicore architectures, following an algorithm-dependent approach, a particular kind of software cache coherence solution characterized by explicit cache management strategies, which are specific of the algorithm to be executed. Notably, we ensure parallel correctness by exploiting architecture-specific mechanisms and by defining proper data structures in order to “emulate” cache coherence solutions in an efficient way for each computation. Algorithm-dependent cache coherence can be efficiently implemented at the support level of structured parallelism paradigms, with absolute transparency with respect to the application programmer. Moreover, by using the cost model, in this thesis we study and compare different algorithm-dependent implementations, such as those based on automatic cache coherence with respect to an original, non-automatic and lock-free solution based on interprocessor communications. Notably, with this latter implementation, in some cases, we are able to reduce the number of memory accesses, cache transfers and synchronizations and increasing computation parallelism with respect to the use of automatic cache coherence.
Current architectures do not usually allow disabling automatic cache coherence. However, the emergence of many-core architectures somewhat changed the scenario, so that some architectures, such as the Tilera TilePro64, allow to control and disable the automatic cache coherence facilities. For this reason, in this thesis we finally apply our methodology to TilePro64 platform in order provide a further validation of the results obtained by our cost model.
File