ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-04052020-135026


Thesis type
Master's thesis
Author
CARDACI, ANDREA
URN
etd-04052020-135026
Title
Performance Analysis of Stream Processing Systems on Multi-cores
Department
COMPUTER SCIENCE
Degree programme
COMPUTER SCIENCE AND NETWORKING
Supervisors
supervisor Mencagli, Gabriele
Keywords
  • Apache Flink
  • Apache Storm
  • data stream processing
  • parallel computing
  • performance analysis
  • WindFlow
Graduation session start date
08/05/2020
Availability
Thesis not available for consultation
Abstract
Real-time requirements are an increasingly common constraint in large-scale applications that must process large volumes of data in a timely manner. This has encouraged the development of Stream Processing Systems (SPSs): general-purpose frameworks that let application developers focus mainly on the business logic of their applications, while the provided abstractions hide low-level implementation tasks such as resource scheduling and data exchange. Many state-of-the-art SPSs handle high-throughput input streams by adopting a scale-out approach, i.e., by dividing the workload among several nodes of a distributed system. To this end, they rely on the Java Virtual Machine (JVM), chosen for its portability and for the popularity of the Java language. This distributed design fails to exploit the full potential of modern multi-core processors, since the processing bandwidth it achieves is often far below the memory bandwidth limit of the machine. This work selects two well-established distributed frameworks (Apache Flink and Apache Storm) and compares their performance and programming model with those of WindFlow, a C++17 stream processing library that explicitly targets shared-memory systems. The benchmarks are based on two data streaming applications commonly used in prior work to evaluate the performance of SPSs. In the single-node multi-core scenario, our results show a substantial improvement in both throughput and latency for WindFlow compared with the state-of-the-art frameworks. The main contribution of this thesis is to demonstrate that the gain obtained is large enough to justify investing resources in developing SPSs that target shared-memory systems, in addition to the existing distributed solutions.