

Digital archive of theses discussed at the University of Pisa


Thesis etd-09152016-145603

Thesis type
Doctoral research thesis (PhD)
Thesis title
Parallel Patterns for Adaptive Data Stream Processing
Academic discipline
Course of study
tutor Danelutto, Marco
tutor Vanneschi, Marco
  • adaptive
  • data
  • latency
  • model predictive control
  • multicore
  • parallel
  • patterns
  • power
  • processing
  • stream
Graduation session start date
In recent years our ability to produce information has been growing steadily, driven by ever-increasing computing power and communication rates and by the spread of hardware and software sensors. This data is often available in the form of continuous streams, and the ability to gather and analyze it to extract insights and detect patterns is a valuable opportunity for many business and scientific applications. Data Stream Processing (DaSP) is a recent and highly active research area dealing with the processing of this streaming data.

The development of DaSP applications poses several challenges, from efficient algorithms for the computation to programming and runtime systems to support their execution. In this thesis two main problems will be tackled:
* need for high performance: high throughput and low latency are critical requirements for DaSP problems. Applications need to take advantage of parallel hardware and distributed systems, such as multi/manycores or clusters of multicores, in an effective way;
* dynamicity: due to their long-running nature (24 hours a day, 7 days a week), DaSP applications are affected by highly variable arrival rates and changes in their workload characteristics. Adaptivity is a fundamental feature in this context: applications must be able to autonomously scale the resources they use to accommodate dynamic requirements and workloads while maintaining the desired Quality of Service (QoS) in a cost-effective manner.

Current approaches to the development of DaSP applications still lack an efficient exploitation of intra-operator parallelism, as well as adaptation strategies with well-known properties of stability, QoS assurance and cost awareness. These are the gaps that this research work tries to fill, resorting to well-known approaches such as Structured Parallel Programming and Control Theoretic models.
The dissertation runs along these two directions.

The first part deals with intra-operator parallelism. A DaSP application can be naturally expressed as a set of operators (i.e. intermediate computations) that cooperate to reach a common goal. If QoS requirements are not met by the current implementation, bottleneck operators must be internally parallelized.
We will study recurrent computations in window-based stateful operators and propose patterns for their parallel implementation.
Windowed operators are the most representative class of stateful data stream operators.
Here computations are applied to the most recently received data. Windows are dynamic data structures: they evolve over time in terms of content and, possibly, size.
Therefore, with respect to traditional patterns, the DaSP domain requires proper specializations and enhanced features concerning data distribution and management policies for different windowing methods.
A structured approach to the problem reduces the effort and complexity of parallel programming. In addition, it simplifies reasoning about the performance properties of a parallel solution (e.g. throughput and latency).
The proposed patterns exhibit different properties in terms of applicability and profitability that will be discussed and experimentally evaluated.
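
To make the notion of a window-based stateful operator concrete, the following minimal Python sketch builds count-based sliding windows over a stream and assigns whole windows to workers in round-robin order. It is only an illustration: the function names, the count-based window semantics, the mean as the per-window computation and the round-robin distribution are all assumptions made here, not the patterns proposed in the thesis.

```python
from collections import deque


def sliding_windows(stream, size, slide):
    """Yield (window_id, items) for count-based sliding windows.

    Hypothetical helper: a window holds the `size` most recent items
    and advances by `slide` items each time it fires.
    """
    if not 0 < slide <= size:
        raise ValueError("this sketch assumes 0 < slide <= size")
    buf = deque()
    wid = 0
    for item in stream:
        buf.append(item)
        if len(buf) == size:
            yield wid, list(buf)
            wid += 1
            # Evict the oldest `slide` items: the window content evolves.
            for _ in range(slide):
                buf.popleft()


def assign_round_robin(windows, n_workers):
    """Assign each window to a worker and compute its mean there.

    The workers are simulated sequentially; a real runtime would
    execute them in parallel.
    """
    results = [[] for _ in range(n_workers)]
    for wid, items in windows:
        results[wid % n_workers].append((wid, sum(items) / len(items)))
    return results


per_worker = assign_round_robin(sliding_windows(range(1, 9), size=4, slide=2), 2)
```

With the stream 1..8, a window of 4 items and a slide of 2, three windows fire. Consecutive windows overlap, so in this naive distribution each input item may have to reach more than one worker. This is one reason why data distribution policies matter for windowed operators.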

The second part of the thesis is devoted to the proposal and study of predictive strategies and reconfiguration mechanisms for autonomic DaSP operators.
Reconfiguration activities can be implemented transparently to the application programmer thanks to the exploitation of parallel paradigms with well-known structures. Furthermore, adaptation strategies may take advantage of the QoS predictability of the parallel solution in use.
Autonomous operators will be driven by means of a Model Predictive Control approach, with the intent of giving QoS assurances in terms of throughput or latency in a resource-aware manner.
An experimental section will show the effectiveness of the proposed approach in terms of reduced execution costs as well as the degree of stability of system reconfigurations. The experiments will target shared and distributed memory architectures.
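
The general idea of a Model Predictive Control strategy for an elastic operator can be sketched as follows. At each control step the controller takes a forecast of the arrival rate over a short horizon, evaluates candidate plans for the number of replicas, and applies only the first move of the cheapest plan. Everything in this Python sketch is an assumption made for illustration: the linear service model (`mu` items per second per replica), the cost weights `alpha`, `beta` and `gamma`, and the brute-force search. It is not the model or the optimizer used in the thesis.

```python
from itertools import product


def mpc_step(current, forecast, mu, max_replicas,
             alpha=1.0, beta=0.1, gamma=0.5):
    """Return the replica count to apply now.

    current      -- replicas in use at the moment
    forecast     -- predicted arrival rates, one per future step
    mu           -- assumed service rate of a single replica
    alpha        -- weight of unserved load (QoS violation)
    beta         -- weight of resource usage
    gamma        -- weight of reconfigurations (favours stability)
    """
    best_plan, best_cost = None, float("inf")
    # Enumerate every replica plan over the horizon (fine for tiny horizons).
    for plan in product(range(1, max_replicas + 1), repeat=len(forecast)):
        cost, prev = 0.0, current
        for n, lam in zip(plan, forecast):
            unserved = max(0.0, lam - n * mu)
            cost += alpha * unserved + beta * n + gamma * abs(n - prev)
            prev = n
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    # Receding horizon: apply only the first decision, then re-plan.
    return best_plan[0]


scale_up = mpc_step(current=1, forecast=[250, 250], mu=100, max_replicas=4)
hold = mpc_step(current=3, forecast=[250, 150, 250], mu=100, max_replicas=4)
```

In the first call the controller scales from 1 to 3 replicas because fewer would leave load unserved. In the second call it keeps 3 replicas through a short predicted dip, since the switching term makes releasing and re-acquiring a replica more expensive than holding it.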