Read Section 8
Sometimes our posts are inspired by recent developments, such as the launch of a new family of IPUs, but sometimes we reflect back on events from much earlier in our careers that continue to shape our thinking today. This week’s newsletter is very much in the second category, going all the way back to our first experiences with operating systems, which taught us lessons about innovation that remain relevant today.
If (like me) you were a CS graduate student who cut your teeth on Berkeley Unix—complete with the first open source implementation of TCP/IP—you know Section 8 as the cryptic System Maintenance Commands section of the Unix User’s Manual. (Not to be confused with the Section 8 that was a recurring theme in Catch-22 and M*A*S*H.) It was obvious (to me) that this concluding section warranted a closer look because the introduction warned: “Information in this section is not of great interest to most users.” Judging by my taste in research problems over the years, reading Section 8 turned out to be a pretty good investment.
But before getting to Section 8, you first learned about the rest of Unix, where you discovered how empowering it is to be able to build new Internet applications. Anyone interested in how targeted investments in open source software, coupled with affordable hardware, can spur innovation should study the role of BSD (Berkeley Software Distribution) in the success of the Internet. It’s easy to assume the Internet as we know it today was inevitable, but at the time BSD Unix happened, it was not at all clear the incumbent Telcos could be disrupted. We’ve commented on the power of APIs many times (e.g., here), but the impact of the Socket API (Section 2) on enabling innovation on top of the Internet cannot be overstated. With that stable fixed-point in the architecture, a thousand flowers bloomed… and we have (thankfully) moved well beyond the telco vision of B-ISDN.
Section 8 was the second half of the story. In addition to describing how to shut down and boot a system, it defined the process for managing long-running daemon processes, the Unix equivalent of today’s microservices. If you had responsibility for configuring and managing system services on your department’s server, which came with superuser privilege, you needed to know not only how to program Unix but also the ins and outs of operating it. The lessons I learned as a grad student while responsible for sendmail(8) on a live multi-user system were immeasurable. Every mistake instantly sent the faculty into the hallway looking for the responsible idiot. (In my defense, this was at a time when email addresses contained percent (%) and bang (!) operators in addition to at (@), and their precedence was not well-defined.)
BSD also provided me with an early lesson in the power of having many eyeballs on the lookout for security vulnerabilities. Looking at the source code for Sendmail, for example, revealed a backdoor, whereby one could Telnet to port 25, type the magic “wizard” command, and fork a root shell. So I made my counterparts at Berkeley and other universities aware of that vulnerability by doing exactly that. Others probably did too, but it was a different time, and the lesson didn’t initially take hold. With debugging convenience and a naive sense of community trumping security, the backdoor remained open by default in Sendmail until the Morris Worm used it as one of its attack vectors a couple of years later.[1]
Gaining this sort of practical experience is obviously valuable if your plan is to become a system administrator, but it has long been my experience that an opportunity to manage systems that deliver services to actual users is a great source of systems research problems, as well as fertile ground for platform innovations. My PhD dissertation, born out of frustration with sendmail, turned out to be on naming and addressing; later, real-world experience running a CDN on PlanetLab generated a sequence of systems papers (as Vivek Pai and I reported in a 2007 CACM article); and most recently, our experience operating an edge cloud has led to an appreciation for the state management problem inherent in DevOps (as well as our latest book). And my experience is far from unique: many of the cloud tools we take for granted today—Kubernetes is a great example—started as someone’s response to an operational point-of-pain.
This all leads me to believe that an open operations platform (as documented in Section 8) is just as important as an open programming platform (as documented in Section 2) for democratizing innovation. Would BSD Unix have had the same impact in the 1980s and 90s if the University Computer Center had supported it rather than the CS department letting its grad students take ownership of the operations problem? We can ask a similar question today. The value of being able to create new cloud applications is abundantly clear, but is there also value in having open access to the tools used to manage and operate the cloud (rather than delegating the latter to the cloud providers)?
To me, the answer is clearly yes. It comes down to the virtuous cycle of solutions being enabled by platforms on the one hand, and platforms being reshaped with the experience of usage on the other. Stable platforms with well-defined APIs surely allow a thousand flowers to bloom, but eventually, disruptive refactoring of those platforms is what leads to the next round of innovation. Software-Defined Networking is a famous example of disruptive refactoring, but it only works if we have sufficiently sophisticated tooling to assemble all the components into a coherent—and manageable—system. Orchestration and Lifecycle Management have become the dominant operational issues because (a) many smaller parts have to be assembled, and (b) these individual parts are expected to change more frequently. They are essential parts of what we might call the Cloud OS.
Certainly not everyone who writes a program (whether it runs on a personal server or in the cloud) also needs to know how to keep that program running 24/7, but from the perspective of empowering more people to participate in the creation of new systems, the operations platform needs to be kept open and accessible to anyone who wants to invest the time in it. Fortunately, a plethora of open source components is available today that can be used to operate and lifecycle-manage a cloud. We’ve documented a roadmap for using them in Edge Cloud Operations: A Systems Approach (a sort of “Section 8” for the Cloud). We’re hoping there are still a few people who are just crazy enough to give it a try.
One highlight from recent weeks: following our recent posts on IPUs, Nick McKeown reached out to us and agreed to sit down for a chat about both IPUs (some of which are developed by his team at Intel) and how the development of SDN led us to where we are today. You can find that talk on our YouTube channel.
[1] For a more intriguing account of security in this unique time, you may want to read Cliff Stoll’s “The Cuckoo’s Egg”.