Thesis etd-03262026-161807

Thesis type

Master's thesis

URN

etd-03262026-161807

Title

Design and implementation of a DPDK-based data path for the DistWalk distributed workload emulator

Department

INFORMATION ENGINEERING

Degree programme

COMPUTER ENGINEERING

Advisors

advisor Lettieri, Giuseppe
supervisor Cucinotta, Tommaso

Keywords

Cloud Computing
Distributed systems
DPDK
Kernel Bypass
Network latency
Performance evaluation

Graduation session start date

15/04/2026

Availability

Not available

Release date

15/04/2029

Abstract (English)

As distributed systems grow in size and complexity across cloud and edge deployments, evaluating their end-to-end performance has become increasingly challenging. In this context, performance metrics such as latency and throughput depend on a variety of factors spanning computation, storage, and networking. While conventional benchmarking tools evaluate these factors individually, DistWalk is a distributed workload emulator capable of modeling realistic multi-tier request chains. This allows developers to assess the expected performance of a distributed application early in the design process, before the application itself is fully implemented. For performance-critical workloads, however, the POSIX APIs that DistWalk mostly relies on become a limiting factor, as all traffic is routed through the Linux kernel networking stack. At high packet rates, the overhead introduced by the kernel (system calls, per-packet buffer allocation, data copies, interrupt handling, and protocol processing) becomes the dominant bottleneck, preventing DistWalk from being used to evaluate deployment configurations intended for latency-critical applications.
This thesis presents the design and implementation of a data path for DistWalk based on the Data Plane Development Kit (DPDK), a kernel-bypass framework that operates directly at Layer 2 of the OSI model. Among the key optimizations that DPDK offers, the interrupt-driven model of the kernel is replaced with continuous busy-polling from a dedicated CPU core, significantly reducing context switches and eliminating interrupt overhead. Data is exchanged between the NIC and the application in a zero-copy fashion through pre-allocated, hugepage-backed memory pools, which minimize TLB misses and eliminate per-packet allocation. On each DistWalk node, both the DPDK and socket-based event loops execute in an interleaved fashion, allowing DPDK, TCP, and UDP clients to be served within the same worker thread. Outbound packets are batched into a per-connection transmit array and flushed at the end of each processing cycle to amortize the cost of the transmit burst call. The implementation also supports multi-threaded operation through Receive Side Scaling (RSS), which distributes incoming packets to different worker threads based on their source MAC address.
A comprehensive set of experiments was performed on two servers connected through a 10 Gigabit Ethernet link using Intel X710 network interface controllers. The systems were configured to maximize reproducibility by tuning hardware and software settings to reduce the randomness introduced by power management features and to minimize interference from operating system activity. DPDK achieved a median round-trip latency of 11.2 μs, approximately 2.7 times lower than that of UDP and 3 times lower than that of TCP in their best achievable configurations, obtained by varying CPU idle states and DistWalk parameters. Additional experiments evaluate the impact of different DPDK configurations, including SR-IOV Virtual Functions and virtual Ethernet pairs, NUMA core placement, and multi-queue scaling through RSS.
By integrating DPDK into DistWalk, developers who are considering the adoption of kernel-bypass techniques can evaluate their potential benefits early in the design process, before the application itself is built. The DPDK code path is fully compatible with the existing socket-based implementation, allowing both to coexist within the same binary and enabling cross-protocol forwarding scenarios where DPDK and TCP nodes operate in the same network topology. The experimental results show the benefits of DPDK in terms of round-trip latency and the impact that system-level configuration choices can have on the observed performance, providing useful guidance for designing low-latency deployments.
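The transmit batching and interleaved event loop described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, which is not available here and would be written in C against DPDK. The names `FakePort`, `Worker`, `poll_sockets`, and the batch limit `TX_BURST_MAX` are hypothetical, chosen only for illustration; `FakePort.tx_burst` merely stands in for a real transmit burst call such as DPDK's `rte_eth_tx_burst`.

```python
from collections import defaultdict

TX_BURST_MAX = 32  # assumed batch limit, not taken from the thesis


class FakePort:
    """Hypothetical stand-in for a NIC port; records every burst it sends."""

    def __init__(self):
        self.bursts = []

    def tx_burst(self, packets):
        # Models one transmit burst call: its fixed cost is paid once per call,
        # regardless of how many packets the burst carries.
        self.bursts.append(list(packets))
        return len(packets)


class Worker:
    """One worker thread serving both the polled path and the socket path."""

    def __init__(self, port):
        self.port = port
        self.tx_queues = defaultdict(list)  # connection id -> pending packets

    def enqueue(self, conn_id, packet):
        queue = self.tx_queues[conn_id]
        queue.append(packet)
        if len(queue) >= TX_BURST_MAX:  # array full: flush early
            self._flush_conn(conn_id)

    def _flush_conn(self, conn_id):
        queue = self.tx_queues[conn_id]
        sent = 0
        while sent < len(queue):  # retry if a burst is only partly accepted
            sent += self.port.tx_burst(queue[sent:sent + TX_BURST_MAX])
        queue.clear()

    def flush_all(self):
        for conn_id in list(self.tx_queues):
            if self.tx_queues[conn_id]:
                self._flush_conn(conn_id)

    def run_cycle(self, rx_packets, poll_sockets):
        # 1) Polled path: process one received burst of (conn_id, payload).
        for conn_id, payload in rx_packets:
            self.enqueue(conn_id, b"reply:" + payload)
        # 2) Socket path: one non-blocking pass over TCP/UDP clients.
        poll_sockets()
        # 3) End of cycle: flush whatever is still pending, one burst per
        #    connection instead of one call per packet.
        self.flush_all()


port = FakePort()
worker = Worker(port)
worker.run_cycle([(1, b"a"), (2, b"b"), (1, b"c")], poll_sockets=lambda: None)
print(len(port.bursts))  # three replies leave in two bursts
```

In this sketch the two replies for connection 1 share a single burst, so the per-call cost is paid twice rather than three times. The gain grows with the number of packets handled per cycle.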

Abstract (Italian)

File

File name	Size
The thesis is not available. Contact the author

ETD

Digital archive of the theses defended at the Università di Pisa