The Accidental SmartNIC

How flexibility drives innovation

Mar 15, 2021

This week we’re looking at the rise of SmartNICs, which actually have a history going back more than thirty years. SmartNICs are all over the place these days, enabling the AWS “Nitro” System, providing the foundation for VMware’s Project Monterey, and shipping from a wide range of hardware vendors. There are few topics that better exemplify the power of systems thinking.

The moment when the current generation of SmartNICs really captured my attention was during a demo at VMworld 2019. At the time, ESXi was formally supported on x86 processors only, but there had been a skunkworks project to run ESXi on ARM for several years. Since most SmartNICs have an ARM processor, it was now possible to run ESXi on the SmartNIC itself. I do remember thinking “just because you can do something doesn’t mean you should” but it made for a fun demo.

This certainly wasn’t my first exposure to SmartNICs. As a member of the networking team at VMware, I was periodically visited by SmartNIC vendors who wanted to offer their hardware as a way to improve the performance of virtual switching. And AWS had been subtly incorporating them into their EC2 infrastructure since about 2014 (depending on exactly how a SmartNIC is defined). But as I looked more closely at SmartNIC architectures, I realized that I had actually been involved in an earlier incarnation of the technology in the 1990s–not that we called them SmartNICs then. Even the term NIC was not yet standard terminology. Below is a slightly prettified diagram from a paper I published in SIGCOMM in 1991.

SmartNIC Block Diagram — A SmartNIC functional block diagram from a bygone era (c. 1991)

If you compare this to the block diagram of a current generation SmartNIC (e.g., here), you will see some pretty remarkable similarities. Of course you need to connect to a host bus on one side; that’s likely to be PCIe today. (That choice was much less obvious in 1990.) And you need the necessary physical and link layer hardware to connect to your network of choice; today that’s invariably some flavor of Ethernet, whereas in 1990 it still seemed possible that ATM would take off as a local area network technology (it didn’t). In between the host and the network, there are one or more CPUs and some programmable hardware (an FPGA). It’s the programmability of the system, delivered by the CPU and FPGA, that makes it “Smart”.

To be clear, I definitely didn’t invent the SmartNIC. The earliest example that I can find was described by Kanakia and Cheriton in 1988. Other researchers around this time took a similar approach. There was a reason we gravitated towards designs that were relatively expensive but highly programmable: we didn’t yet know which functions belonged on the NIC. So we kept our options open. This gave us the ability to move functions between the host and the NIC, to experiment with new protocols, and to explore new ways of delivering data efficiently to applications. This was essentially my introduction to the systems approach to networking: building a system to experiment with various ways of partitioning functionality among components, and seeking an approach that would address end-to-end concerns such as reliability and performance. I was fortunate to be influenced in the design of my “SmartNIC” by David Clark, the “architect of the Internet” and co-author of the end-to-end argument, and this work also led to my collaboration with Larry Peterson.

The 1990s, in retrospect, was a time when a lot of questions about networking were still up for debate. As we tried to achieve the then-crazy goal of delivering a gigabit per second to a single application, there was a widespread concern that TCP/IP would not be up to the task. Perhaps we needed completely new transport protocols, or a new network layer (e.g., ATM). Perhaps transport protocols were so performance-intensive that they needed to be offloaded to the NIC. With so many open questions, it made sense to design NICs with maximum flexibility. Hence the inclusion of a pair of CPUs and some of the largest FPGAs available at the time.

By the 2000s, many of these networking questions were addressed by the overwhelming success of the Internet. TCP/IP (with Ethernet as the link layer) became the dominant networking protocol stack. There turned out to be no problem getting these protocols from the 1970s to operate at tens of gigabits per second. Moore’s law helped, as did the rise of switched Ethernet and advances in optical transmission. As the protocols stabilised, there wasn’t so much need for flexibility in the NIC, and hence fixed-function NICs became the norm.

Jump ahead another ten years, however, and fixed-function NICs became a liability as new approaches to networking emerged. By 2010 NICs frequently included some amount of “TCP offload”, echoing one of the concerns raised in the 1990s. These offloads left hosts free to transfer large chunks of data to or from the NIC while the NIC added the TCP headers to segments on transmit and parsed them on receipt. This was a performance win, unless you wanted anything other than a simple TCP/IP header on your packets, such as an extra encapsulation header to support network virtualization. The optimization of performance for the common case turned into a huge handicap for innovative approaches that couldn’t leverage that optimization. (My colleagues at Nicira found some creative solutions to this problem, ultimately leading to the GENEVE encapsulation standard).
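To make the offload trade-off concrete, here is a minimal sketch of the idea rather than any real NIC's behavior. The host hands over one large buffer and the "NIC" cuts it into segments, each tagged with a sequence number. The names `Segment`, `segment_on_nic`, `can_offload`, and the `SUPPORTED_STACK` header list are all hypothetical, and the model ignores real-world details such as checksums and sequence-number wraparound.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int        # sequence number of the first payload byte (no wraparound modeled)
    payload: bytes  # at most one MSS worth of data


def segment_on_nic(data: bytes, mss: int, first_seq: int = 0) -> list[Segment]:
    """Split one large host buffer into MSS-sized segments, as an offload would on transmit."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [
        Segment(seq=first_seq + offset, payload=data[offset:offset + mss])
        for offset in range(0, len(data), mss)
    ]


# Assumption: the only header layout this hypothetical fixed-function NIC can parse.
SUPPORTED_STACK = ("ethernet", "ipv4", "tcp")


def can_offload(header_stack: tuple[str, ...]) -> bool:
    """A fixed-function NIC offloads only the exact header layout it was built for."""
    return header_stack == SUPPORTED_STACK


if __name__ == "__main__":
    segments = segment_on_nic(b"x" * 3000, mss=1460, first_seq=1000)
    for s in segments:
        print(s.seq, len(s.payload))

    plain = ("ethernet", "ipv4", "tcp")
    # Hypothetical encapsulated packet: an outer header stack wraps the inner frame.
    encapsulated = ("ethernet", "ipv4", "udp", "encap", "ethernet", "ipv4", "tcp")
    print(can_offload(plain), can_offload(encapsulated))
```

In this toy model the encapsulated packet simply fails the layout check, so segmentation would fall back to host software. That is the handicap described above: the hardware is tuned for exactly one packet format.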

As networking became more dynamic with the rise of SDN and network virtualization (and the parallel rise of software-defined storage) it started to become clear that once again the functions of a NIC could not be neatly tied down and committed to fixed-function hardware. And so the pendulum swung back to where it had been in the 1990s, where the demand for flexibility warranted NIC designs that could be updated at software speeds–leading to what we might call the second era of SmartNICs. This time, it’s the need to efficiently support network virtualization, security features, and flexible approaches to storage that demands highly capable NICs. While all these functions can be supported on x86 servers, it’s increasingly cost-effective to move them onto a SmartNIC that is optimized for those tasks and still flexible enough to support rapid innovation in cloud services. This is why you see projects like AWS Nitro, Azure Accelerated Networking, and VMware’s Project Monterey all moving functions that you expect to see in a hypervisor to the new generation of SmartNICs.

Why did I title this post “The Accidental SmartNIC”? Because I wasn’t trying to make a SmartNIC; there was just so much uncertainty about the right way to partition our system that I needed a high degree of flexibility in my design. (It’s also a nod to the excellent film “The Accidental Tourist”.) Determining how best to distribute functionality across components is a core aspect of the systems approach. Today’s SmartNICs exemplify that approach by allowing complex functions to be moved from servers to NICs, meeting the goals of high performance, rapid innovation, and cost-effective use of resources. Building a platform that supports innovation is a common goal in systems research and we see that playing out today as SmartNICs take off in the cloud.

For more on the Systems Approach, see this article. And if you want a deep dive into how Software-Defined Networking has adopted this approach, you can take a look at our book online or via any number of bookstores.

Francis Turner

Feb 10, 2022

Madge Networks sold Token Ring smart NICs in the early 1990s. We had a significant chunk of the Novell and Microsoft LAN protocols on the card in 1992/93 and TCP/IP was put on a year or so later. The main driver for this was freeing up MS/DOS system memory but it probably also helped with performance, particularly with some of the weird laptop connectivity mechanisms that were used before PCMCIA became a thing

Jim Cownie

Nov 25, 2021

I would argue that Smart-NICs go back earlier than 1998. At Meiko we had a SPARC processor in our NIC (which was cache-coherent with the main CPU and performed safe RDMA from user-space with no need for a system call, or page locking) in the early 1990s (LLNL had a 256 node machine in 1994) See https://doi.org/10.1016/0167-8191(94)90025-6
