Thesis etd-03262026-161807
Thesis type
Master's thesis
Author
MULÈ, NICCOLÒ
URN
etd-03262026-161807
Title
Design and implementation of a DPDK-based data path for the DistWalk distributed workload emulator
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
COMPUTER ENGINEERING
Advisors
advisor Lettieri, Giuseppe
supervisor Cucinotta, Tommaso
Keywords
- Cloud Computing
- Distributed systems
- DPDK
- Kernel Bypass
- Network latency
- Performance evaluation
Graduation session start date
15/04/2026
Availability
Not available for consultation
Release date
15/04/2029
Abstract (English)
As distributed systems grow in size and complexity across cloud and edge deployments, evaluating their end-to-end performance has become increasingly challenging. In this context, performance metrics such as latency and throughput depend on a variety of factors spanning computation, storage, and networking. While conventional benchmarking tools evaluate these factors individually, DistWalk is a distributed workload emulator capable of modeling realistic multi-tier request chains. This allows developers to assess the expected performance of a distributed application early in the design process, before the application itself is fully implemented. For performance-critical workloads, however, the POSIX APIs that DistWalk mostly relies on become a limiting factor, as all traffic is routed through the Linux kernel networking stack. At high packet rates, the overhead introduced by the kernel (including system calls, per-packet buffer allocation, data copies, interrupt handling, and protocol processing) becomes the dominant bottleneck, preventing DistWalk from being used to evaluate deployment configurations intended for latency-critical applications.
This thesis presents the design and implementation of a data path for DistWalk based on the Data Plane Development Kit (DPDK), a kernel-bypass framework that operates directly at Layer 2 of the OSI model. Among the key optimizations that DPDK offers, the interrupt-driven model of the kernel is replaced with continuous busy-polling from a dedicated CPU core, significantly reducing context switches and eliminating interrupt overhead. Data is exchanged between the NIC and the application in a zero-copy fashion through pre-allocated, hugepage-backed memory pools, which minimize TLB misses and eliminate per-packet allocation. On each DistWalk node, both the DPDK and socket-based event loops execute in an interleaved fashion, allowing DPDK, TCP, and UDP clients to be served within the same worker thread. Outbound packets are batched into a per-connection transmit array and flushed at the end of each processing cycle to amortize the cost of the transmit burst call. The implementation also supports multi-threaded operation through Receive Side Scaling (RSS), which distributes incoming packets to different worker threads based on their source MAC address.
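The transmit batching described above can be illustrated with a minimal sketch. It is a Python model of the idea only, not the thesis's C/DPDK code: `tx_burst`, `TxBatcher`, and `processing_cycle` are hypothetical names introduced here, and the sketch assumes that every queued packet is accepted by the burst call and that the per-connection array is unbounded, neither of which is guaranteed on real hardware.

```python
from collections import defaultdict

# Log of simulated burst calls as (conn_id, batch_size) pairs.
BURST_CALLS = []


def tx_burst(conn_id, packets):
    """Hypothetical stand-in for a NIC transmit burst call.

    Assumption: all packets are accepted; a real burst call may send fewer.
    """
    BURST_CALLS.append((conn_id, len(packets)))
    return len(packets)


class TxBatcher:
    """Collects outbound packets per connection and flushes them together."""

    def __init__(self, send=tx_burst):
        self._send = send
        self._pending = defaultdict(list)  # conn_id -> queued packets

    def enqueue(self, conn_id, packet):
        # Queue the packet instead of transmitting it immediately.
        self._pending[conn_id].append(packet)

    def flush(self):
        # One burst call per connection with queued packets.
        sent = 0
        for conn_id, packets in self._pending.items():
            if packets:
                sent += self._send(conn_id, packets)
        self._pending.clear()
        return sent


def processing_cycle(batcher, received):
    """Handle one batch of received (conn_id, payload) pairs, then flush."""
    for conn_id, payload in received:
        batcher.enqueue(conn_id, b"reply:" + payload)
    return batcher.flush()
```

In this model, a cycle that produces three replies across two connections issues two burst calls rather than three, which is the amortization effect the abstract refers to.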
A comprehensive set of experiments was performed on two servers connected through a 10 Gigabit Ethernet link using Intel X710 network interface controllers. The systems were configured to maximize reproducibility by tuning hardware and software settings to reduce the randomness introduced by power management features and to minimize interference from operating system activity. DPDK achieved a median round-trip latency of 11.2 μs, approximately 2.7 times faster than UDP and 3 times faster than TCP in their best achievable configurations, obtained by varying CPU idle states and DistWalk parameters. Additional experiments evaluate the impact of different DPDK configurations, including SR-IOV Virtual Functions and virtual Ethernet pairs, NUMA core placement, and multi-queue scaling through RSS.
By integrating DPDK into DistWalk, developers who are considering the adoption of kernel-bypass techniques can evaluate their potential benefits early in the design process, before the application itself is built. The DPDK code path is fully compatible with the existing socket-based implementation, allowing both to coexist within the same binary and enabling cross-protocol forwarding scenarios where DPDK and TCP nodes operate in the same network topology. The experimental results show the benefits of DPDK in terms of round-trip latency and the impact that system-level configuration choices can have on the observed performance, providing useful guidance for designing low-latency deployments.
Abstract (Italian)
File
| File name | Size |
|---|---|
| The thesis is not available for consultation. | |