Can We Unify All These Edge Routers?
A central theme of the systems approach is to look for commonality across seemingly disparate systems. Doing so is the thought process behind the design of shared platforms and extensible frameworks. This week’s post applies that lens to edge routers, a market that has thrived on (arguably excessive) differentiation for 40+ years.
Edge routers have been an essential part of the Internet for decades, connecting access networks (enterprise LANs, mobile and broadband networks) to the global backbone. These devices often have cryptic names, such as MPLS VPN Provider Edge routers, S/P-Gateways in the case of mobile cellular networks, and Broadband Network Gateways (BNG) in the case of fiber networks, but they are, at their core, IP (L3) packet forwarders, sometimes augmented with features to support the business logic required by commercial access providers. But the world is changing, and the form and function of the edge router are changing with it.
To account for modern cloud technology, especially the rush to the edge, we expect it to be less common to think in terms of edge networks connecting to backbone networks. Instead, we will think in terms of local edge clouds connecting to global hyperscalers. Devices will request service from an edge cloud, which will sometimes forward requests to remote clouds (see, for example, Cloudflare Workers and Fly.io), continuing the trend of true end-to-end connections being the exception.
L3 connectivity is still there, of course, but it will increasingly be an implementation detail. And as this transition happens, the L3 data plane will be subsumed into the switching fabric of the edge cloud, with the associated control plane (whether IETF-specified, 3GPP-specified, BBF-specified, or proprietary) implemented by microservices running in the cloud (at the edge or centralized). That is, the edge router will increasingly be realized as a disaggregated collection of virtual functions rather than by a physical box, with control in the cloud and with the data plane running on specialized infrastructure for speed and scale. In this sense, we see the paradigm introduced by SDN, logically centralized control with distributed forwarding, making its way to the edge.
SD-WANs are a current example of applying an SDN architecture to the edge, and more recently, cloud-delivered SASE (Secure Access Service Edge) services blend layers of security into the solution. But the pattern is much the same—L3 packet forwarding in the data plane coupled with a rich cloud-based control plane—with significant (functional) overlap with cloud native implementations of access gateways. And with most of today’s SD-WAN offerings being vertically integrated and proprietary, we would argue that the benefits of SDN (such as the ability for network operators to customize the functionality) are only partly delivered in these solutions today.
Once you stop thinking in terms of “edge routers as special devices” and start to view “routing as yet another edge function”, it’s a small step to realize that today’s diverse set of edge routers are all fundamentally the same, and that it is possible to build a generalized (and disaggregated) edge routing capability that accommodates them all. This function can be centrally orchestrated and deployed, with functional elements running in multiple edges where case-specific packet processing needs to take place.
Easier said than done, of course, but it strikes me as a likely outcome, and worth a little forethought. The key insight is that all the scenarios outlined above have a similar structure, with L3 forwarding in the data plane augmented with support for:
Secure tunnels → requiring encapsulation/decapsulation
Differentiated Service → requiring Q-in-Q tagging and class-based queues
Billing & Accounting → requiring per-flow counters
Policy Enforcement → requiring access control rules
Observability → requiring in-band network telemetry
And a microservice-based control plane that implements:
Authentication → triggering changes to data plane tunnels
Subscriber Management → triggering updates to per-flow counters and queues
Mobility & Routing → triggering forwarding changes according to resource availability
Session & Policy Management → triggering changes to access control rules
Diagnostics & Anomaly Detection → triggering changes to in-band network telemetry
All of the data plane features can be realized in P4-programmable forwarding pipelines (more on that in a moment), where the “triggering” relationship in the list of control functions helps us understand how to craft a converged control/data-plane interface, something that P4Runtime (P4RT) supports.
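To make the “triggering” relationship concrete, here is a minimal Python sketch of the two lists above as a lookup table. It is a toy model under stated assumptions, not the P4Runtime API: the short labels (tunnels, qos, and so on), the Update record, and the updates_for function are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field

# Data plane capabilities from the first list, plus base L3 forwarding.
# The short labels are hypothetical; only the descriptions come from the text.
DATA_PLANE_FEATURES = {
    "forwarding": "L3 forwarding",
    "tunnels": "encapsulation/decapsulation",
    "qos": "Q-in-Q tagging and class-based queues",
    "accounting": "per-flow counters",
    "acl": "access control rules",
    "telemetry": "in-band network telemetry",
}

# The "triggering" relationship from the second list:
# control function -> data plane features it updates.
TRIGGERS = {
    "authentication": ["tunnels"],
    "subscriber_management": ["accounting", "qos"],
    "mobility_routing": ["forwarding"],
    "session_policy_management": ["acl"],
    "diagnostics_anomaly_detection": ["telemetry"],
}


@dataclass
class Update:
    """One pending change to a data plane feature (hypothetical record)."""
    source: str                 # control function that asked for the change
    target: str                 # data plane feature being changed
    params: dict = field(default_factory=dict)


def updates_for(control_function, params):
    """Expand one control plane event into the data plane updates it triggers."""
    return [
        Update(control_function, target, dict(params))
        for target in TRIGGERS[control_function]
    ]


if __name__ == "__main__":
    for update in updates_for("subscriber_management", {"subscriber": "s1"}):
        print(update.source, "->", DATA_PLANE_FEATURES[update.target])
```

The point of the table is that every control function, whatever standards body defined it, reduces to writes against the same small set of data plane features, which is what makes a single converged interface plausible.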
An example of the generalized data plane already exists, and we describe it in our SDN book. It’s the fabric.p4 program that implements the forwarding pipeline for ONF’s SD-Fabric, which (a) implements L3 forwarding for the leaf-spine switching fabric you would find in an edge cloud, and (b) can be extended to connect different access network technologies (5G’s UPF and a PON-based BNG) to the Internet. The current implementation is a bit crude (it uses #ifdef), but the idea is clear: it’s possible to build an L3 forwarding pipeline that can be extended with access-specific “plugins”.
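The plugin idea can be sketched in a few lines of Python. This is not how fabric.p4 is written (that is a P4 program using compile-time #ifdef); it is a toy model in which the Pipeline class, the register method, and the tunnel_decap plugin are hypothetical names, and packets are plain dictionaries.

```python
import ipaddress


class Pipeline:
    """Toy L3 forwarding pipeline that access-specific plugins can extend."""

    def __init__(self, routes):
        # routes: list of (prefix string, output port) pairs
        self.routes = [(ipaddress.ip_network(p), port) for p, port in routes]
        self.plugins = []

    def register(self, plugin):
        """Add a plugin; plugins run in registration order before forwarding."""
        self.plugins.append(plugin)

    def process(self, pkt):
        """Return the output port for pkt, or None if it is dropped."""
        for plugin in self.plugins:
            pkt = plugin(pkt)
            if pkt is None:
                return None  # a plugin dropped the packet
        dst = ipaddress.ip_address(pkt["dst"])
        matches = [(net, port) for net, port in self.routes if dst in net]
        if not matches:
            return None  # no route
        # Longest-prefix match: the most specific route wins.
        _, port = max(matches, key=lambda m: m[0].prefixlen)
        return port


def tunnel_decap(pkt):
    """Hypothetical access plugin: unwrap a tunneled packet, pass others through."""
    return pkt.get("inner", pkt)


if __name__ == "__main__":
    pipeline = Pipeline([("10.0.0.0/8", 1), ("10.1.0.0/16", 2)])
    pipeline.register(tunnel_decap)
    tunneled = {"dst": "10.9.9.9", "inner": {"dst": "10.1.2.3"}}
    print(pipeline.process(tunneled))
```

The base forwarding logic never changes; each access technology contributes only the stage it needs, which is the property one would want from a runtime replacement for #ifdef.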
Popping up a level, one can imagine iterating on fabric.p4 until you have an extensible edge cloud data plane suitable for all of the use cases outlined above. The P4RT-generated interface could then support multiple control plane tenants, for example, allowing a 3GPP-defined core and an SD-WAN controller to independently set queue parameters, define encapsulation/decapsulation labels, install forwarding rules, and so on.
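One way to picture multiple control plane tenants sharing a data plane is the following Python sketch. It is an illustration under assumptions, not the P4Runtime API: the SharedDataPlane class, the grant and write methods, and the table names are hypothetical. Each tenant is granted write access to specific tables, and its entries are namespaced so that tenants cannot overwrite each other.

```python
class SharedDataPlane:
    """Toy model of one data plane shared by several control plane tenants."""

    def __init__(self):
        self.tables = {}  # table name -> {(tenant, key): value}
        self.grants = {}  # tenant -> set of table names it may write

    def grant(self, tenant, tables):
        """Allow a tenant to write to the named tables."""
        self.grants.setdefault(tenant, set()).update(tables)

    def write(self, tenant, table, key, value):
        """Install an entry, rejecting writes outside the tenant's grant."""
        if table not in self.grants.get(tenant, set()):
            raise PermissionError(f"{tenant} may not write {table}")
        # Namespace entries by tenant so two tenants can reuse the same key.
        self.tables.setdefault(table, {})[(tenant, key)] = value


if __name__ == "__main__":
    dp = SharedDataPlane()
    # Hypothetical tenants and table names, for illustration only.
    dp.grant("mobile_core", {"queues", "encap", "forwarding"})
    dp.grant("sdwan", {"queues", "forwarding"})

    dp.write("mobile_core", "queues", "q0", {"rate_mbps": 100})
    dp.write("sdwan", "queues", "q0", {"rate_mbps": 20})
    print(len(dp.tables["queues"]))

    try:
        dp.write("sdwan", "encap", "t1", {"label": 42})
    except PermissionError as err:
        print(err)
```

A real design would also have to arbitrate shared physical resources such as queue capacity, which this sketch ignores.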
Converging on a shared data plane, but accepting that multiple control planes will co-exist, is a good starting point. But converging on the control plane is likely within reach as well, where we can expect a converged data plane to catalyze that process. In my mind, it’s primarily a matter of aligning incentives for the various domains. It’s already the case that the BBF is working towards a converged access network control plane that aligns with the 3GPP-defined mobile core, largely because Telcos have an incentive to make that happen. Another good example is Magma, which defines a unified control plane and a programmable data plane for both RAN-based and WiFi-based wireless networks. As enterprises start to roll out private 5G, the push to unify how their access networks are managed will only increase. The SD-WAN use case is more of a wild card. On the one hand, SD-WAN is surprisingly similar to SD-RAN in the functionality it needs from an edge router. On the other hand, SD-WAN offerings so far have resisted disaggregation. Of course, the same was true of Telco access networks until recently.
If we accept that unification of edge routing is possible, a reasonable next question is: is it desirable? I would argue that the value will come first from disaggregation, as we have already seen in other environments such as the cloud data center. Once the control plane is disaggregated from the data plane, innovation can happen more easily in both, and the operators of these devices gain the ability to customize the functionality rather than just accepting the bundle that comes from the router vendor. Second, there is an opportunity to take a more holistic view of the edge, which offers the chance to apply consistent network policies that are independent of the access technology. But this is a topic for another post.
This post started out as a speculative brainstorming exercise, but ONF is starting to float the thesis that the edge router is ripe for an SDN makeover. We’d love to hear what you think. And in other news, we published a new blog post on Magma last week.