Last week we put the finishing touches on a new printing (and ebook) of “Software-Defined Networking: A Systems Approach”. The field continues to evolve at a rapid pace, and while we were adding chapters on network virtualization and access networks, another new development in the application of SDN to cloud data centers caught our attention: the rise of a new class of programmable processors known as “Infrastructure Processing Units (IPUs)”. That’s the topic of this week’s newsletter, and it goes on the to-do list for our next book update.
The recent announcements from Intel about Infrastructure Processing Units (IPUs) have prompted us to revisit the topic of how functionality is partitioned in a computing system. As we noted in our earlier post “The Accidental SmartNIC”, there is at least thirty years’ history of trying to decide how much one should offload from a general-purpose CPU to a more specialized NIC, and an equally long tussle between highly specialized offload engines and more general-purpose ones. The IPU is just the latest entry in a long series of general-purpose offload engines, and we’re now seeing quite a diverse set of options, not just from Intel but from others such as Nvidia and Pensando. These latter firms use the term DPU (Data Processing Unit), but the consensus seems to be that these devices tackle the same class of problems. A handy continuum has been provided by Serve The Home, which draws a line from fixed-function NICs through SmartNICs to DPUs and IPUs.
There are several interesting things going on here. The first is an emerging consensus that the general-purpose x86 (or ARM) server is no longer the best place to run the infrastructure functions of a cloud. By “infrastructure functions” we mean all the things it takes to run a multi-tenant cloud that are not actually guest workloads: the hypervisor, network virtualization, storage services, and so on. Whereas the server used to be the home of both guest workloads and infrastructure services, these functions are increasingly viewed as “overhead” that only takes cycles away from guests. One oft-cited paper is Facebook’s “Accelerometer” study, which measures overhead within Facebook’s data centers as high as 80%, although this may not be generalizable to cloud providers. More plausibly, Google reported in 2015:
“Datacenter tax” can comprise nearly 30% of cycles [...], which makes its constituents prime candidates for hardware specialization in future server systems-on-chips.
Amazon Web Services presumably saw the same issue of overheads cutting into the revenue-generating workloads, and started to use specialized hardware for infrastructure services when it acquired Annapurna Labs in 2015, laying the groundwork for its Nitro architecture. The impact was to move almost all infrastructure services out of the servers, leaving them free to run guest workloads and little else.
Once you decide to move a function out of the general-purpose CPU complex into some sort of offload engine, the question is how to retain the appropriate level of flexibility. These offloaded functions are not static, so putting them into fixed-function hardware would be a short-sighted move. This is why we have seen NICs move in recent years from fixed-function offloads such as TCP segmentation to the more flexible architecture of SmartNICs. So the goal is to build an offload engine that is better optimized for infrastructure services than a general-purpose CPU, yet still programmable enough to support innovation as those services evolve.
Intel’s IPU family contains several entrants that take different approaches to delivering that flexibility, including both FPGA- and ASIC-based versions. The Mount Evans ASIC is particularly interesting as it includes both ARM CPU cores and P4-programmable networking hardware developed by the Barefoot Networks team. This is a subject dear to our hearts here at Systems Approach, as the P4 toolchain is central to much of the technology that we wrote about in our SDN book.
Putting a P4-programmable switch in an IPU/DPU makes lots of sense, since the networking functions that are likely to be offloaded include those of a virtual switch. And one thing we learned at Nicira, and later in the NSX team at VMware, was that if you want to move the vswitch to an offload engine, that engine needs to be fully programmable. If a NIC is insufficiently general to implement the whole vswitch, you can only move a subset of the vswitch functionality to the offload engine. Even if you could move 90% of the functionality, the remaining 10% that stays on the CPU is likely to become the bottleneck. So a P4-programmable offload engine based on PISA (Protocol Independent Switch Architecture) provides the level of flexibility and programmability needed to offload the whole vswitch. Combine this with some other programmable hardware (such as ARM cores) and you can see how the entire set of infrastructure functions, including the hypervisor, storage virtualization, and so on, can be offloaded to the IPU.
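To make that concrete, here is a minimal P4_16 sketch of the kind of match-action table a vswitch offload might start from: an exact-match lookup on the destination MAC address that either forwards a packet to a local port or drops it. It is written against the open-source v1model reference architecture rather than any vendor’s actual pipeline, and names such as l2_fwd and forward are ours for illustration; a real vswitch offload would add tunnel encapsulation, ACLs, connection tracking, and much more.

```p4
/* Minimal P4_16 sketch of an L2 forwarding stage (v1model architecture).
 * Illustrative only: table and action names are hypothetical. */
#include <core.p4>
#include <v1model.p4>

header ethernet_t {
    bit<48> dstAddr;
    bit<48> srcAddr;
    bit<16> etherType;
}

struct headers_t  { ethernet_t ethernet; }
struct metadata_t { }

parser MyParser(packet_in pkt, out headers_t hdr,
                inout metadata_t meta, inout standard_metadata_t std) {
    state start {
        pkt.extract(hdr.ethernet);   // parse the Ethernet header
        transition accept;
    }
}

control MyIngress(inout headers_t hdr, inout metadata_t meta,
                  inout standard_metadata_t std) {
    action forward(bit<9> port) {
        std.egress_spec = port;      // deliver to a local (VM-facing) port
    }
    action drop() {
        mark_to_drop(std);
    }
    table l2_fwd {
        key     = { hdr.ethernet.dstAddr: exact; }  // tenant VM MAC
        actions = { forward; drop; }
        default_action = drop();
        size    = 4096;              // entries installed by the control plane
    }
    apply { l2_fwd.apply(); }
}

control MyEgress(inout headers_t hdr, inout metadata_t meta,
                 inout standard_metadata_t std) { apply { } }

control MyVerifyChecksum(inout headers_t hdr, inout metadata_t meta) { apply { } }
control MyComputeChecksum(inout headers_t hdr, inout metadata_t meta) { apply { } }

control MyDeparser(packet_out pkt, in headers_t hdr) {
    apply { pkt.emit(hdr.ethernet); }
}

V1Switch(MyParser(), MyVerifyChecksum(), MyIngress(), MyEgress(),
         MyComputeChecksum(), MyDeparser()) main;
```

Even this toy example makes the point: the parser, the tables, and the actions are defined by the program rather than baked into the silicon, which is what makes it plausible to move the whole vswitch pipeline, not just a fixed subset of it, onto the offload engine.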
One way to view the latest generation of DPUs/IPUs is that the efforts of the SDN movement to create more programmable switches have enabled innovation in a new space. SDN initially promised to drive control plane innovation by decoupling the switching hardware from the software that controlled it. Network virtualization was one of the first applications of SDN to take off, with the separation of control and data planes and highly flexible software switches enabling networks to be created entirely in software (on top of a hardware-based underlay). PISA and P4 led to a more flexible form of switching hardware and a new way to define the hardware-software interface (improving on the earlier efforts of OpenFlow). All of these threads (control plane innovation, network virtualization, and flexible, programmable switch hardware) are now being brought together in the creation of IPUs and DPUs.
We can also view the development of IPUs/DPUs as a continuation of the trend toward processors that are highly flexible yet specialized for certain tasks. GPUs, for example, are programmable enough to be used for everything from crypto-mining to machine learning to graphics processing, and TPUs cover a broad range of machine learning workloads, yet both are quite specialized compared to CPUs. (GPUs were even used for packet processing in an era before we had PISA and P4.) DPUs and IPUs now seem well established as a new category of highly programmable devices that are optimized for a specific set of tasks that need to be performed in a modern cloud data center. With that greater specialization comes greater efficiency, while flexibility remains high enough to support future innovation.
While we only push out a new print and ebook version of our books when there is a large enough increment of new content, you can always get the latest content from the book site or GitHub. We’re not sure if we have enough material for a book on quantum computing yet, but Bruce had another crack at explaining the topic in a seminar that’s now on YouTube. We’re up to a total of five books in the Systems Approach series now; details here.