Applying a Systems Lens to Mobile Networks
This week we continue our series of posts explaining the Systems Approach by way of examples. In this case, we’re returning to the mobile packet core and the difference between viewing it as a set of prescribed boxes versus developing a solution that allows the whole network to be managed as one system.
Mobile networks have been an area of interest for Larry and me for several years, as we have tried to apply the “Systems Approach” lens to the problem. Of late, we’ve been working on an updated book on 5G networks, while also writing blogs and other papers about the Magma mobile core. Most recently, a paper that I co-authored with several members of the Magma engineering team was accepted for publication at NSDI ’23 (pre-print here). With mobile networks being quite complex and full of jargon, I find it’s easy to get lost in the weeds. But there are systems principles at work if we look hard enough, which is the aim of this post.
We have known for a long time that rigid layering can be an impediment to systems thinking in networking–something that was pointed out by David Clark in his wonderful foreword to the first edition of our textbook. We’ve written previously about the value of rethinking protocol layers. In the standard descriptions of mobile networks, there is an analogous tendency to use a box-level view with precise specifications of all the interfaces between boxes, as illustrated in the figure. It’s a bit more free-form than layering, but it still constrains our thinking. If we think that the main job of a mobile network is to implement all those boxes and the interfaces between them–which is what the 3GPP specifications tell us–then it is hard to take a system-level view of the mobile network.
These figures, which are based on 3GPP specs (and adapted from our 5G book), omit a lot of detail, but you get the basic idea. The mobile core, whether LTE or 5G, is made up of a bunch of functions that talk to each other and to the RAN. Every line that runs between a pair of boxes in one of those figures has a set of defined interfaces–protocols that define how one box talks to another. Magma, however, does not conform to this box-level view. Consider the following figure (taken from the NSDI paper).
Clearly, this is a different set of boxes than the ones specified by 3GPP. Inside the box marked “RAN-specific protocols” are a bunch of protocol-terminating functions, which allow Magma to talk standard 3GPP protocols to the Radio Access Network. But all inter-module communication outside the RAN is based on gRPC. All the functions that are specified as boxes in either LTE or 5G (e.g., MME) are implemented in generic functions such as “access control and management”. One benefit of this approach is that it simplifies the support of LTE and 5G on a common platform, as it does not require the change in modularity suggested in the box-level view of 3GPP. It also provides the opportunity to include WiFi support (with telco-like capabilities such as per-user policies) in the same platform.
One especially high-impact consequence of the approach taken in Magma is that it allows a refactoring of the mobile core into centralized and distributed components. Note that there is a “central control and management” function at the top of the figure. Centralization of configuration is important in mobile networks so that network-wide configuration tasks can be performed in one place. For example, a new subscriber can be added to the network using the central management plane without the need to enter the subscriber details into multiple boxes. Everything else in this figure can be (and is) distributed out to the devices (known as Access Gateways) that sit next to the radio towers.
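To make the split concrete, here is a minimal sketch of the idea in Python: a subscriber is added once at a central point, and each access gateway copies the resulting state when it next syncs. The class and method names (Orchestrator, AccessGateway, add_subscriber, sync) and the version counter are assumptions made for illustration, not Magma's actual interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Orchestrator:
    """Hypothetical stand-in for the central control and management function."""
    subscribers: dict = field(default_factory=dict)
    version: int = 0

    def add_subscriber(self, imsi: str, policy: str) -> None:
        # A single write in one place; no per-box data entry.
        self.subscribers[imsi] = policy
        self.version += 1


@dataclass
class AccessGateway:
    """Hypothetical gateway holding a local copy of the subscriber state."""
    name: str
    subscribers: dict = field(default_factory=dict)
    version: int = 0

    def sync(self, orchestrator: Orchestrator) -> bool:
        # Copy the central state only if it has changed since the last sync.
        if orchestrator.version == self.version:
            return False
        self.subscribers = dict(orchestrator.subscribers)
        self.version = orchestrator.version
        return True


orchestrator = Orchestrator()
gateways = [AccessGateway("tower-a"), AccessGateway("tower-b")]
orchestrator.add_subscriber("001010000000001", "default")
for gateway in gateways:
    gateway.sync(orchestrator)
```

The operator touches only the orchestrator; how and when the state reaches each gateway is left to the system.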
Scaling out a Magma deployment is achieved by adding more access gateways as radio towers are added, which naturally brings more data plane capacity and the ability to authenticate and connect more subscribers as the RAN grows. But perhaps even more importantly, by keeping the logic to authenticate devices and establish sessions local to the RAN, Magma avoids the need to backhaul control plane traffic to a central location. Running 3GPP protocols over long backhaul links has proven problematic when the reliability of the backhaul links is less than perfect, e.g., when satellite is used for backhaul. This is because the 3GPP protocols are in some cases quite sensitive to loss and latency. Loss or latency can cause connections to be dropped, which in turn forces mobile devices to repeat the process of attaching to the core. In practice, not all devices handle this elegantly, sometimes ending up in a “stuck” state.
In the Magma approach, the operations required to authenticate and attach a mobile device to the core can typically be completed using information cached locally, without any traffic crossing the backhaul. Even when Magma does need to pass information over a backhaul link (e.g., to obtain new configuration state from the central orchestrator), it does so using gRPC, which is designed to operate reliably in the face of unreliable or variable-latency links. So the ability to refactor the design provides a tangible benefit in reliability when backhaul links are imperfect.
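A rough sketch of that behavior, under heavy simplification: the gateway below decides attach requests from a local cache and treats a failed refresh over the backhaul as a non-event. The names (CachingGateway, BackhaulDown, fetch_config) are hypothetical, and real authentication involves key material and signaling exchanges that are omitted here.

```python
class BackhaulDown(Exception):
    """Raised by the hypothetical fetch function when the backhaul is unreachable."""


class CachingGateway:
    def __init__(self, fetch_config):
        # fetch_config is an assumed callable that returns {imsi: policy}
        # from the central orchestrator, or raises BackhaulDown.
        self._fetch_config = fetch_config
        self.cache = {}

    def refresh(self) -> bool:
        # Try to update the cache; on failure keep serving the last known state.
        try:
            self.cache = dict(self._fetch_config())
            return True
        except BackhaulDown:
            return False

    def attach(self, imsi: str) -> bool:
        # The attach decision uses only local state; nothing crosses the backhaul.
        return imsi in self.cache


link = {"up": True}


def fetch_config():
    if not link["up"]:
        raise BackhaulDown()
    return {"001010000000001": "default"}


gateway = CachingGateway(fetch_config)
gateway.refresh()      # backhaul is up: cache is populated
link["up"] = False     # backhaul fails
gateway.refresh()      # refresh fails, cache is retained
attached = gateway.attach("001010000000001")
```

The trade-off in this sketch is that a gateway cut off from the orchestrator keeps working from possibly stale state until the link returns.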
In the absence of the refactoring, the solution to unreliable backhaul is to deploy the entire mobile core next to the RAN. This solves the problem of running signaling protocols over the backhaul, but now creates the problem of remote management of entire mobile cores; that is, we have lost the centralized management noted above. Furthermore, the number of systems to be managed is now increased (one set of mobile core devices per radio tower), and the footprint at each remote site is increased.
Magma can be viewed as another successful application of SDN principles. Not only does it employ a centralized control plane and distributed data plane, but it also distributes a chunk of control plane functionality out to the access gateways. This hierarchical approach has been used in other SDN systems (e.g., OVN).
The takeaway from all of this is not that Magma is the best possible way to build a mobile core (and indeed there are some downsides to the architecture discussed here). Rather, it illustrates how a systems view can open up opportunities for refactoring a system, in a way that focusing on box-level functionality does not. If you assume that the boundaries between components are fixed, and focus on optimizing within those boundaries–whether that is because of a 3GPP specification or because a protocol layering diagram told you where the layer boundaries were–then you miss out on opportunities to improve the entire system.
Following our earlier posts on QUIC, we noticed a paper on DNS over QUIC performance was presented at IMC last month, demonstrating the benefits of applying QUIC for the request-response paradigm of DNS: the ability to secure the transaction without paying for multiple RTTs of connection establishment. And later this week you can catch Bruce along with Motonori Shindo and Kentaro Ebisawa (translators of our SDN book into Japanese) talking about SDN in the Asia-Pacific region on the Networking Channel.