Range: Why SDN Platforms Should Be General-Purpose

We're currently working on an update to our SDN book that expands the coverage of Network Virtualization to a full chapter. That led to some spirited debate among the authors, which in turn inspired the following thoughts on general-purpose platforms versus solutions optimized for a more specific use case.

In the early days of SDN, when I was still trying to decide what I thought of the idea, I had a conversation with my colleague Mothy Roscoe at SOSP, who said something that resonated with me: “SDN is about turning networking into a distributed systems problem.” I still find that simple statement to be one of the best descriptions of SDN that I’ve heard. A salient example would be the replacement of distance-vector and link-state routing protocols with a logically centralized controller running a consensus algorithm (e.g., Raft).
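To make that contrast concrete, here is a minimal sketch of the centralized half of the story, under some loud assumptions: the controller already holds a global view of the topology, link costs are static, and every name here (`TOPOLOGY`, `shortest_path_next_hops`, `compute_forwarding_tables`) is hypothetical rather than taken from any real Network OS. It also leaves out the replication that a consensus algorithm such as Raft would provide; it only shows the route computation that each router would otherwise perform for itself.

```python
import heapq


def shortest_path_next_hops(topology, source):
    """Run Dijkstra from `source` over the global view.

    Returns {destination: next_hop}, i.e. the forwarding table that the
    controller would push to the switch named `source`.
    """
    dist = {source: 0}
    next_hop = {}
    visited = set()
    # Heap entries are (cost so far, node, first hop taken from source).
    heap = [(0, source, None)]
    while heap:
        cost, node, first = heapq.heappop(heap)
        if node in visited:
            continue  # stale entry; a cheaper path was already settled
        visited.add(node)
        if first is not None:
            next_hop[node] = first
        for neighbor, weight in topology.get(node, {}).items():
            new_cost = cost + weight
            if neighbor not in visited and new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                # Leaving the source, the neighbor itself is the first hop.
                hop = neighbor if first is None else first
                heapq.heappush(heap, (new_cost, neighbor, hop))
    return next_hop


def compute_forwarding_tables(topology):
    """Compute one next-hop table per switch from the single global view."""
    return {switch: shortest_path_next_hops(topology, switch) for switch in topology}


# Hypothetical three-switch topology with symmetric link costs.
TOPOLOGY = {
    "s1": {"s2": 1, "s3": 4},
    "s2": {"s1": 1, "s3": 1},
    "s3": {"s1": 4, "s2": 1},
}

TABLES = compute_forwarding_tables(TOPOLOGY)
```

In this toy topology the direct s1-s3 link costs more than the detour through s2, so the table computed for s1 sends s3-bound traffic via s2. The point of the sketch is only that the computation happens once, in one place, over shared state; keeping that state consistent across controller replicas is where the distributed systems problem shows up.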

My experience building out SDN since that conversation suggests the transformation is much more than algorithmic. It is, in fact, a march towards making the network as a whole a Programmable Platform. (This contrasts with how networks have historically been treated as plumbing assembled from a collection of individually configured devices.) The transformation ranges from big ideas like implementing the network control plane as a scalable cloud service, to small ideas such as using modern messaging frameworks like gRPC and protobufs instead of creating yet another Request/Reply protocol, paired with its own stylized message encoding.

This storyline has replayed itself recently as Bruce and I add a chapter on Network Virtualization to our SDN book. Bruce is the expert, having been part of the Nicira team that turned Network Virtualization into a successful use case for SDN, whereas I mostly obsess over how Network Virtualization relates to the rest of the SDN software stack. (That full stack includes a suite of Control Apps, a centralized Network OS, and a set of switches, each with a P4-programmable forwarding pipeline and a local Switch OS.)

You’ll have to read the new chapter for the details, but what I’ve concluded is that there is a straightforward mapping between Network Virtualization and the software stack discussed throughout the book, although there is a significant difference too: Network Virtualization systems are purpose-built to support virtual networks, whereas the full SDN software stack is intended to be general-purpose. In short, one is a platform and the other is a solution. Admittedly, the OS-person inside me wants SDN to be about creating the next-generation platform and not just a sequence of engineering choices resulting from a market-driven walk of the design space.

SDN is now a well-established set of architectural principles, and Network Virtualization adheres to those principles: there is a clear separation between control and data planes, a centralized controller is responsible for a distributed set of forwarding elements, and the forwarding plane is completely programmable. The differences between Network Virtualization and the other use cases described in our book can all be explained as implementation choices, with the dependency on software switches rather than hardware switches being pivotal.
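Those principles can be sketched in a few lines. What follows is a toy illustration under stated assumptions, not how any real controller or switch is built: `ForwardingElement`, `Controller`, `install_rule`, and `apply_policy` are all hypothetical names, and a "rule" is reduced to a destination-to-port mapping. The sketch only shows the division of labor, in which decisions are made centrally and forwarding elements do nothing but look up installed rules.

```python
from dataclasses import dataclass, field


@dataclass
class ForwardingElement:
    """Data plane: holds a match-action table and applies it, nothing more."""
    name: str
    table: dict = field(default_factory=dict)  # destination -> output port

    def install_rule(self, dst, out_port):
        # Called by the controller; the element never computes rules itself.
        self.table[dst] = out_port

    def forward(self, dst):
        # Pure lookup; unmatched traffic is dropped in this toy model.
        return self.table.get(dst, "drop")


@dataclass
class Controller:
    """Control plane: one logical decision point for many elements."""
    elements: dict = field(default_factory=dict)

    def register(self, element):
        self.elements[element.name] = element

    def apply_policy(self, policy):
        # policy maps element name -> {destination: output port}
        for name, rules in policy.items():
            for dst, out_port in rules.items():
                self.elements[name].install_rule(dst, out_port)


controller = Controller()
leaf = ForwardingElement("leaf1")
spine = ForwardingElement("spine1")
controller.register(leaf)
controller.register(spine)
controller.apply_policy({
    "leaf1": {"10.0.2.0/24": "uplink"},
    "spine1": {"10.0.2.0/24": "port2"},
})
```

Whether the forwarding element is a software switch or a hardware switch does not change this shape, which is one way to read the claim that the differences between use cases come down to implementation choices.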

But this brings me back to the question of whether, given that the software-based implementations underlying Network Virtualization evolved as a use-case-specific solution, there is a place for a general-purpose, use-case-agnostic SDN platform. In other words, is there value in a general-purpose Network OS, or does every use case require its own purpose-built controller? I stand by my claim that the former will ultimately prevail, although the industry has many examples of general-purpose platforms chasing (and trying to subsume) purpose-built solutions indefinitely.

There is at least one proof point for the claim that a general-purpose Network OS can support multiple use cases: ONOS. Over the last few years, ONOS has been used as the foundation for SD-Fabric (Leaf-Spine Switching Fabric), SD-PON (Passive Optical Networks), and SD-RAN (Radio Access Networks). The first two of these run in production networks and the third is well on its way. (It has also been used to implement Network Virtualization, although ONF no longer supports that use case.) While it’s true that no one deploys Control Apps on a single Network OS for more than one use case at the same time, this does not disqualify ONOS as a general-purpose Network OS. Today, customers want only one solution at a time, but that just leads me to conclude that ONOS is the Network OS version of an Exokernel, a general-purpose Library OS that is put to use in a variety of domain-specific ways. 

My SOSP colleagues would appreciate the Exokernel analogy, and it helps support my claim that a general-purpose Network OS is viable. But that doesn’t mean it will win in the marketplace, at least not anytime soon. The industry is full of tussles between general-purpose and purpose-built approaches, although in the long haul, common platforms usually emerge and the industry moves on to fight its competitive battles at some higher layer. It will be interesting to watch how this plays out as more SDN use cases make their way into the mainstream, but I’m confident in the value of generality (with credit to David Epstein, whose book “Range: Why Generalists Triumph in a Specialized World” inspired this week’s title).

Bruce got back to video production this week, with a new piece on Decentralized Finance (DeFi). This is part of a broader discussion about the potential to bring decentralization back to the Internet, which might even be a topic for another book. Our latest piece in The Register covers Service Mesh and its relationship to SDN. And we had occasion to thank the OVS team for their excellent documentation, which settled a debate about configuration versus control in SDN.