OnRamp: Incrementally Mastering Complexity
We have been on an open source kick recently, and that continues with today’s look at the challenges involved in helping users come up to speed on an open source project. The challenge is especially thorny when those users are exploring an unfamiliar topic and the software is to be deployed as a scalable cloud service.
Here’s a problem I’ve been struggling with for the last several months: How to make a complex system assembled from dozens of components easily consumable by a wide range of users. The system is the Aether edge cloud, which serves as the blueprint for our Private 5G book. The “dozens of components” include a long list of Cloud Native tools plus an open source implementation of 4G/5G Mobile Core and RAN. The “wide range of users” includes students who want hands-on experience with concepts they’re seeing for the first time, researchers who want to investigate narrow problems in the larger 5G/edge space, and organizations that want to deploy and operate Private 5G in everything from lab trials to commercial offerings. I’ll get to my definition of “easily consumable” in a moment; it is at the heart of why I find this an interesting problem in general.
You could make the case that this is a self-inflicted challenge—Aether is available in GitHub and anyone with the technical chops is free to do with it what they will—but it follows from the goal of realizing the know-how/educational value of open source software I talked about in my previous post. To provide a little background, a multi-site deployment of Aether has been running as a managed cloud service since 2020, in support of the Pronto Research Project. But that deployment depends on an ops team with significant insider knowledge about Aether’s engineering details. It has proven difficult for others to reproduce that know-how and bring up their own Aether deployments.
Offering “Aether-as-a-Service” comes with an ongoing obligation to provide operational support, but it is easier than releasing “Aether-as-Software”, complete with the machinery needed for others to deploy and operate Aether as their own service. This is well understood by anyone who has attempted to take that step, and familiar to me from my experience operating PlanetLab (but never packaging it in a way that made it easy for others to replicate). In the case of PlanetLab, the biggest value was in the network effect—having access to compute resources contributed by others all over the world—so operating a multi-tenant service made sense. In contrast, the biggest value for Aether is having full ownership of your deployment.
A minimal version of Aether is also available in a package called Aether-in-a-Box (AiaB). Originally designed to give developers a streamlined modify-build-test loop they could run on their laptops, AiaB serves as a good way to get started. But there is a considerable gap between what it provides and an operational 5G-enabled edge cloud deployed in a particular environment. Or as I’ve been known to say on family road-trips: You can’t get there from here.
As a “getting started” package, AiaB is straightforward to use. You set up a VM (on your laptop or in the cloud), clone the AiaB repo, and type make 5g-test. Doing so installs Kubernetes, brings up the Mobile Core, runs an emulated 5G RAN workload, and prints out the results. There are more intermediate Make targets users can explore, which helps from a learning perspective, but ultimately AiaB favors ease-of-use for canned configurations over ease-of-transition to more complex configurations that have been customized for a particular use case. That is the crux of the problem we try to address with Aether OnRamp: start with something as easy as AiaB, and then incrementally expose (and document) the information needed to take ownership of Aether in its full glory: a multi-site / hybrid cloud / managed service. OnRamp tries to do this in a way that supports more than one off-ramp, so that users who want to focus on a particular subsystem need not pay too steep an up-front price.
The general approach OnRamp takes is to draw crisp lines between different stages (e.g., development, integration, deployment, operations) and layers (infrastructure, services, applications, traffic sources), and then introduce them incrementally, coupled with documentation that calls out the relevant “decision points” and “configuration parameters”. We include a first version of OnRamp with our Private 5G book, with the caveat that there is still much work to be done. That version relies heavily on Makefiles, which means the key configuration parameters are exposed as ad hoc variables. This summer I have been working with Bilal Saleem and Muhammad Shahbaz at Purdue to transition OnRamp to use Ansible. This started as an effort to scale Aether from a single node to a multi-node cluster, but Ansible is proving to be a better way to manage the overall stepwise configuration strategy.
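To make the variable-centric approach concrete, here is a sketch of what a deployment-specific Ansible vars file might look like. The file name and every parameter name below are illustrative assumptions, not OnRamp’s actual layout:

```yaml
# Hypothetical vars file (e.g., group_vars/all.yml); the parameter names
# are invented for illustration. The idea is that each stage's playbook
# reads its "decision points" from one well-known, documented place.
cluster:
  nodes:                 # start with a single node, grow to a multi-node cluster
    - 10.76.28.113
core:
  data_iface: ens18      # interface the user plane attaches to
ran:
  simulated: true        # emulated RAN workload vs. physical radios
```

Compared with ad hoc Make variables, every playbook then pulls from the same declared set of parameters, which is also the natural place to hang documentation.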
Although conceptually straightforward, this general approach is not easy to execute in practice. I have two takeaways from the experience so far. One is the importance of using tools that are powerful enough to get the job done, but not so heavyweight as to obscure (abstract away) the know-how you’re trying to impart. This is important for two reasons: (1) so the novice can easily see what’s going on under the covers, and (2) so the expert can easily unwind engineering choices and apply different tooling. Ansible seems to be a good compromise in that it explicitly exposes the playbook, and provides a well-defined way to specify the relevant variables. There’s still a syntactic gap between an Ansible task and the corresponding kubectl call, but that gap is easy to document.
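As an illustration of that syntactic gap, here is what a hypothetical Ansible task might look like next to the kubectl command it corresponds to; the manifest name and namespace are made up for the example:

```yaml
# Roughly equivalent to: kubectl apply -f upf-config.yaml -n aether
- name: Apply the UPF configuration
  kubernetes.core.k8s:
    state: present
    namespace: aether
    src: upf-config.yaml
```

The mapping is mechanical once you see it, which is exactly the kind of thing the documentation can call out.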
The second difficulty is identifying the relevant variables, which are often buried in a sea of configuration parameters. This process is complicated by developer bias—my term for codifying config variables (for example, Helm Chart values files) that perfectly suit the developer’s needs, but obfuscate where a user trying to deploy the code needs to take ownership, so as to customize the parameters for their particular scenario. The only way I know to untangle this obfuscation is to try to document it, not just in a technical sense (i.e., most parameters are defined somewhere, if you know where to look), but in an intuitive way that will make sense to a non-expert. Helping the non-expert understand what they can safely ignore goes hand-in-hand with identifying what they need to know.
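A hypothetical Helm values fragment illustrates the problem; the parameter names are invented for the example, but the pattern is typical: the handful of values a deployer must own sit beside defaults that only ever mattered to the developer:

```yaml
# Hypothetical values.yaml fragment (names invented for illustration).
image:
  repository: registry.example.com/core   # safe to ignore: default works
  pullPolicy: IfNotPresent                # safe to ignore
config:
  plmn:
    mcc: "001"     # must customize: identifies YOUR mobile network
    mnc: "01"      # must customize
  dnn: internet    # safe to ignore unless you need multiple data networks
```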
Whether OnRamp hits the mark is a matter of judgment, where the proof of the pudding is in the eating. You can try our first attempt at helping users consume Aether today, and we expect to announce the availability of OnRamp-v2 by the end of the summer. Watch this site for an announcement. Finally, I would be remiss if I didn’t mention all the excellent hands-on tutorials created by other open source projects; the Kubernetes Tutorial is a great example. Like what we’re trying to do with Aether, the goal is to lower the barrier to consuming open source software. My personal spin on that objective is to use such hands-on experience as a way to teach the underlying principles and concepts, and to enable research that builds on those concepts.
With Bruce in Scotland presenting his lecture at Edinburgh University’s celebration of 60 years of Computer Science and Artificial Intelligence, it’s up to his apprentice to see to it that this week’s post makes it to your mailbox. Confidence level: 7-out-of-10.
For those with a vested interest in SIGCOMM, you may want to read (and comment on) the community’s effort to establish a consensus about how its flagship conference should evolve.