Is Disaggregation Working?

Aug 29, 2022

When we posted our article about modernizing the edge, in which we commented on the benefits of disaggregation, one wag on Twitter asked “Does this mean disaggregation has been successful somewhere?” This week we set out to answer that question.

Our most recent post focussed on the opportunity to modernize and unify edge routing by applying SDN principles to the problem. In discussing the benefits of such an approach, I observed that disaggregation of networking has already happened in the data center, and that it would be good to bring those benefits to the edge. But that one tweet got me wondering: do I need to get out of my bubble? After all, in the last 18 months I’ve spent more time writing about the disaggregation of networking–particularly in our SDN book–than any other topic. Our books are full of examples of disaggregated networking, including plenty of code that you can use to build and run your own disaggregated network or edge cloud. But does that mean that disaggregation has been a success?

Before I try to answer that question, let’s look at some definitions. Disaggregation refers to the separation of components, and I have seen it used in two related but different ways in networking. A networking device consists of a hardware platform and some set of software to control that hardware, and historically these have been bundled together by commercial network equipment vendors. Separating out the hardware from the software is one form of disaggregation. Secondly, the traditional way to implement networking equipment was to run the control plane and data plane in the same boxes. SDN is one way to separate out the control plane from the data plane (but by no means the only way). So that too might be viewed as a sort of disaggregation, although I’m not so keen on that definition for reasons I’ll go into below. Add the fact that, in many cases, the data plane is implemented in hardware, while the control plane is almost always implemented in software, and you can see why there might be a tendency to conflate these two usages of the term disaggregation.

In an early (2011) talk on SDN, Nick McKeown made the analogy between the disaggregation that happened decades earlier in the computer industry and his vision for how the networking world might also embrace disaggregation (see figures adapted from Nick’s presentation).

Figure: Disaggregation of the mainframe. A mainframe computer is replaced by microprocessors, operating systems, and apps, separated by open interfaces.

Just as mainframes were disaggregated with the rise of x86 servers and independent operating systems, networks might be disaggregated into switch hardware, control planes, and apps, all developed independently. Even though this is part of a talk about SDN, there is nothing here about moving control planes out of the boxes that do the forwarding, or about centralizing the control plane. These are important aspects of SDN–I would argue that SDN’s chief value has been the power of centralized control–but they are not a requirement for disaggregation. What is required, however, is a set of open interfaces. There has always been some sort of internal interface between control and data planes (e.g., a private IPC mechanism), and the key step of disaggregation is to open up that interface, independent of where the control plane actually runs.

Figure: Disaggregation of the router. A router is broken down into switching chips, network OS, and apps, separated by open interfaces.

What I find valuable about this is that the open interface between switching hardware and network OS opens the door for innovation both above and below the switch interface. The set of companies that can build both a hardware switch and the necessary network OS, including routing protocols, has historically been countable on the fingers of one hand. The open interface lets some companies with hardware expertise focus on the switching hardware while other companies with networking software expertise focus above the interface. One of the early entrants to take advantage of de facto standards for that interface was Cumulus, whose Linux-based network OS can be deployed onto switching hardware from about a dozen vendors spanning multiple families of switching ASICs. While it’s hard to know exactly how successful they have been (Cumulus is now a unit inside Nvidia), they had some enterprise success, e.g. at JPMC.
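To make the idea of an open switch interface concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `SwitchInterface`, `VendorASwitch`, `VendorBSwitch`, and `install_static_routes` are names invented for illustration, and the interface is far simpler than any real one. The point is only that control software written against the interface runs unchanged on either "hardware" implementation.

```python
from abc import ABC, abstractmethod
from ipaddress import ip_address, ip_network
from typing import Dict, Optional


class SwitchInterface(ABC):
    """Hypothetical open interface between a network OS and forwarding hardware."""

    @abstractmethod
    def add_route(self, prefix: str, port: int) -> None:
        """Program a route: traffic matching `prefix` leaves via `port`."""

    @abstractmethod
    def lookup(self, address: str) -> Optional[int]:
        """Return the egress port for `address`, or None if no route matches."""


class VendorASwitch(SwitchInterface):
    """Hypothetical hardware A: stores routes in a dict and scans for the longest match."""

    def __init__(self) -> None:
        self._routes = {}

    def add_route(self, prefix: str, port: int) -> None:
        self._routes[ip_network(prefix)] = port

    def lookup(self, address: str) -> Optional[int]:
        addr = ip_address(address)
        matches = [net for net in self._routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self._routes[best]


class VendorBSwitch(SwitchInterface):
    """Hypothetical hardware B: keeps a table sorted longest-prefix first; first hit wins."""

    def __init__(self) -> None:
        self._table = []

    def add_route(self, prefix: str, port: int) -> None:
        net = ip_network(prefix)
        # Replace any existing entry for the same prefix, then re-sort.
        self._table = [entry for entry in self._table if entry[0] != net]
        self._table.append((net, port))
        self._table.sort(key=lambda entry: entry[0].prefixlen, reverse=True)

    def lookup(self, address: str) -> Optional[int]:
        addr = ip_address(address)
        for net, port in self._table:
            if addr in net:
                return port
        return None


def install_static_routes(switch: SwitchInterface, routes: Dict[str, int]) -> None:
    """Stand-in for control-plane software: it sees only the open interface."""
    for prefix, port in routes.items():
        switch.add_route(prefix, port)
```

The same `install_static_routes` call works against either switch class, which is the property that lets hardware and software vendors develop independently on each side of the interface.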

One of the most well-known examples of an open interface between the switching hardware and network OS is the Switch Abstraction Interface (SAI), which originated at Microsoft and forms the open interface underpinning the SONiC network OS. Over 100 switch models are on the supported hardware list for SONiC, again spanning a wide range of switching silicon and vendors. As an open source project, it opens up the network OS to contributors well beyond the traditional routing vendors. (One interesting example of this is the availability of a P4 interface in SONiC via PINS.) Given the size of Microsoft’s global cloud footprint where SONiC runs, you could argue that SONiC alone proves the value of disaggregation. And of course the other hyperscale cloud operators have also made extensive use of disaggregation; see the latest Jupiter paper from Google for an example. The ability to customize their control planes in a way that was impossible in the old bundled-from-your-router-vendor model is a frequently cited benefit of disaggregation, including the ability to avoid bugs caused by features that are not needed in these cloud data centers. A late 2020 report on data center switching suggested that as the enterprise market dropped in the pandemic and more workloads moved to the cloud, disaggregated switches increased their market share to 11.7% (behind only Arista and Cisco).

It’s fair to ask whether disaggregation is having an impact outside the hyperscalers. It’s hard for me to tell how much enterprise adoption is taking place–I’d guess that traditional vendors continue to dominate. Familiarity with the operations of Cisco routers in particular is a hard barrier to overcome in enterprise. The area where there does seem to be enthusiasm for disaggregation is the telcos. AT&T has been one of the most vocal in this space, pushing for open interfaces for their network equipment via the Open Compute Project. Access networks in particular seem to be a sweet spot for the telcos to embrace both disaggregation and SDN.

So has disaggregation worked? The answer has to be “yes” if you look at hyperscaler clouds. It appears to be taking off in the telco space too. For enterprise, the results are mixed at best and I’d say we will have to wait and see. Interestingly, SDN has made great progress in enterprise, but mostly without disaggregation; proprietary solutions like SD-WAN and network virtualization have delivered value to enterprises without providing the open interfaces that disaggregation requires. Whether this changes will depend on whether enterprises see the same need for customization and innovation that the hyperscalers clearly demand. The rise of edge clouds–which we are predicting–could be one factor that drives enterprises in this direction.

Last week there was a flurry of coverage about the insecurity of in-app browsers thanks to some great work by Felix Krause, and Bruce provided commentary for an article in ABC News (the Australian one). On broader systems approach topics, Larry and Bruce sat down to talk with Robbie Mitchell at APNIC and the resulting podcast will appear in a few weeks, but meanwhile the APNIC podcasts are full of interesting topics of relevance to networking and systems people.

Systems Approach
