Intel DPDK Summit 2016 San Jose

The ACANETS team's work, "Understanding the Performance of DPDK as a Computer Architect," was selected as one of the 30-minute talks at the Intel DPDK Summit 2016 in San Jose. Check out the event, our presentation slides, and the video.

While high-performance hardware and virtual machines are deployed in data centers and the cloud, software switching and data transfer based on Open vSwitch (OVS) [1] struggle to keep up with the network line rate. The Data Plane Development Kit (DPDK) [2], a set of user-space libraries with optimized drivers for real and virtual NICs, aims to accelerate OVS packet forwarding to line rate. In our experiments, OVS-DPDK achieves up to an 8x throughput increase compared with vanilla OVS. To understand this difference, we use computer architecture performance profiling tools, such as Intel VTune Amplifier [3] and Linux perf [4], to investigate in detail which system architecture parameters OVS-DPDK affects to achieve the speedup. We present the experimental measurements and analyze their causes from a computer architect's point of view.

We run two experimental setups: 1) two VMs connected over OVS or OVS-DPDK on a single host; 2) a single VM connected to a physical host over OVS or OVS-DPDK. In the first setup, the profiling results show that the OVS-DPDK system benefits greatly from pipelined processing, with a Cycles Per Instruction (CPI) value close to the number of processing threads. Context switches decrease by roughly an order of magnitude on the OVS-DPDK system, since the DPDK libraries bypass OS kernel invocation. More interestingly, the cache and TLB behave exceptionally well on the OVS-DPDK system: we observe a 103x lower L1 cache miss rate and a 9x lower TLB miss rate with OVS-DPDK than with vanilla OVS. In the second setup, however, we observe roughly 22x more context switches on OVS-DPDK, which leads to a throughput drop of hundreds of times (to only 4.72 Mbps on a 10 Gbps link). We plan to further analyze the performance of OVS-DPDK in this scenario to provide insights for improvements.
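
As a rough illustration of how the derived metrics above relate to raw hardware counters, here is a minimal Python sketch. It is not the tooling used in the talk: the `Counters` container and the helper functions are hypothetical, and the sample counter values are placeholders rather than measured data. It assumes raw counts have already been collected, for example from Linux perf events such as `cycles`, `instructions`, `context-switches`, `L1-dcache-loads`, and `L1-dcache-load-misses`.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Raw event counts for one run (hypothetical container, not from the talk)."""
    cycles: int
    instructions: int
    context_switches: int
    l1d_loads: int
    l1d_load_misses: int
    dtlb_loads: int
    dtlb_load_misses: int


def cpi(c: Counters) -> float:
    """Cycles Per Instruction: total cycles divided by retired instructions."""
    return c.cycles / c.instructions


def miss_rate(misses: int, accesses: int) -> float:
    """Fraction of accesses that missed (applies to cache or TLB)."""
    return misses / accesses


def reduction(baseline: float, improved: float) -> float:
    """How many times smaller `improved` is than `baseline` (e.g. 9.0 means 9x lower)."""
    return baseline / improved


def link_utilization(throughput_bps: float, link_bps: float) -> float:
    """Achieved throughput as a fraction of the link capacity."""
    return throughput_bps / link_bps


if __name__ == "__main__":
    # Placeholder counts for illustration only; these are NOT measured results.
    vanilla = Counters(cycles=4_000, instructions=2_000, context_switches=500,
                       l1d_loads=10_000, l1d_load_misses=1_000,
                       dtlb_loads=10_000, dtlb_load_misses=90)
    dpdk = Counters(cycles=3_000, instructions=2_000, context_switches=50,
                    l1d_loads=10_000, l1d_load_misses=100,
                    dtlb_loads=10_000, dtlb_load_misses=10)

    print("CPI (vanilla, dpdk):", cpi(vanilla), cpi(dpdk))
    print("L1 miss-rate reduction:",
          reduction(miss_rate(vanilla.l1d_load_misses, vanilla.l1d_loads),
                    miss_rate(dpdk.l1d_load_misses, dpdk.l1d_loads)))
    print("Context-switch reduction:",
          reduction(vanilla.context_switches, dpdk.context_switches))

    # The one figure taken from the text: 4.72 Mbps achieved on a 10 Gbps link.
    print("Link utilization:", link_utilization(4.72e6, 10e9))
```

The last line uses the throughput reported for the second setup: 4.72 Mbps on a 10 Gbps link corresponds to about 0.05% of the link capacity.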