TCP: The “P” is for Platform
A Case Study in System Evolution
With the recent publication of a new specification of TCP (RFC 9293), we found ourselves reflecting on how TCP has evolved over the decades, and wondering what the future holds.
The original specification for the Transmission Control Protocol (TCP) was published as RFC 793 in 1981. TCP has proven to be resilient over the intervening forty years, but hardly static. There have been so many extensions and implementation notes that it’s hard to keep track of all of them, so in case you missed it, RFC 9293 was just published to address that problem. In a major milestone, RFC 793 is now officially Obsolete.
For those of us who have been around for most or all of those years, reading RFC 9293 is a walk down memory lane. From the silly window syndrome to slow start, fast retransmit, duplicate ACKs, window scaling, and much more, the history of TCP is a remarkable case study in system evolution. Proposing a clean-slate redesign is a popular pitch for researchers, and the idea of a fresh start has a certain appeal, but there is so much experience codified in TCP that any replacement has a very high bar to clear.
Taking a step back from the details and looking at the “system evolution” story, several things jump out at me. For starters, I would hate to have to implement TCP from scratch based solely on a reading of RFC 9293 (and the many RFCs it includes by reference). It’s an open question whether doing so is even possible, since for many years TCP has been defined by its reference implementation; the RFCs are more descriptive than prescriptive. That’s not a criticism. From the beginning, the IETF has favored protocol definitions based on implementations, and RFC 9293 is the latest step in that iterative process.
If the implementation drives the specification, then which implementation is authoritative? The answer has been the dominant open source implementation of the day. This was originally the Berkeley Software Distribution (BSD) implementation of Unix. BSD and its descendants continue to this day (notably as FreeBSD), but in the early 2000s BSD was overtaken by Linux as the de facto open source, Unix-based OS. (It is also the case that many of today’s commercial OSes are derived from either BSD or Linux.)
But the Linux version of TCP is more than a reference implementation. You could make the argument that the Linux kernel provides a platform for evolving TCP. While reading RFC 9293 I had a vague recollection of an RFC published during the heyday of TCP extensions entitled “TCP Extensions Considered Harmful”, so I Googled it, and it turns out to be RFC 1263. (It also turns out I was a co-author; I can only wonder what else I might have written and long since forgotten about.) The RFC describes general mechanisms for evolving TCP that would be more rational than TCP options (essentially by proposing what would today be called semantic versioning), but one takeaway that seems relevant today is a concluding statement:
Because of lack of any alternatives, TCP has become a de-facto platform for implementing other protocols. It provides a vague standard interface with the kernel, it runs on many machines, and has a well defined distribution path.
This gets us into a murky distinction—is it TCP that serves as a platform for evolving transport functionality or is it the Linux networking subsystem—but that’s a distinction without a difference. The two are effectively one and the same, with header options serving as one method for adding “transport plug-ins” to the kernel. (Here I’m using a simple definition of a platform as a tool or framework that lets us add new functionality over time.)
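Header options are TCP’s original extension hook: everything after the fixed header is a sequence of (kind, length, value) triples, except for the single-byte End-of-Option-List (kind 0) and No-Operation (kind 1) options. A minimal parser sketch in Python (the kind-to-name table below is an illustrative subset, not the full IANA registry):

```python
# Sketch: parse the options portion of a TCP header.
# Options are TLV-encoded: 1-byte kind, 1-byte total length, value bytes.
# Kinds 0 (EOL) and 1 (NOP) are single bytes with no length field.

KIND_NAMES = {  # illustrative subset of IANA-assigned option kinds
    0: "EOL", 1: "NOP", 2: "MSS", 3: "WScale",
    4: "SACK-Permitted", 5: "SACK", 8: "Timestamps",
}

def parse_tcp_options(data: bytes):
    options, i = [], 0
    while i < len(data):
        kind = data[i]
        if kind == 0:                 # End of Option List: stop parsing
            options.append(("EOL", b""))
            break
        if kind == 1:                 # NOP: one byte of padding, no length
            options.append(("NOP", b""))
            i += 1
            continue
        length = data[i + 1]          # total length, including kind and length bytes
        if length < 2 or i + length > len(data):
            raise ValueError("malformed TCP option")
        options.append((KIND_NAMES.get(kind, f"kind-{kind}"), data[i + 2:i + length]))
        i += length
    return options

# A SYN might carry: MSS=1460, a NOP for alignment, Window Scale=7
opts = parse_tcp_options(b"\x02\x04\x05\xb4\x01\x03\x03\x07")
print(opts)  # [('MSS', b'\x05\xb4'), ('NOP', b''), ('WScale', b'\x07')]
```

The 40-byte cap on the TCP header’s option space is one of the limitations RFC 1263 was reacting to: every new extension competes for the same few bytes in the SYN.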
Congestion control is another example of how Linux TCP serves as an extensible framework. All the algorithms described in our book are available (and can be optionally activated) in the Linux kernel, where, like TCP itself, the implementation is the authoritative definition of each of those algorithms. As a consequence, an API has emerged for congestion control, providing a well-defined way to continually adapt TCP. And with a nod to feature velocity, Linux now provides a convenient and safe way to dynamically inject new congestion control logic into the kernel by supporting this API in the extended Berkeley Packet Filter (eBPF). This simplifies the task of experimenting with new algorithms or tweaking existing algorithms, side-stepping the hurdle of waiting for the relevant Linux kernel to be deployed. It also makes it easy to customize the congestion control algorithm used on a per-flow basis, as well as explicitly exposing the device-level ingress/egress queues to the decision-making process. (This is how CoDel and ECN, for example, are supported in the Linux kernel.)
That’s the good news, but as a case study of how to most effectively evolve software, the results are mixed. For example, as APIs go, the Linux TCP congestion control API is not particularly intuitive and its only documentation is in the code. A second complication is that while this API makes it possible to substitute different algorithms into TCP, an ideal interface would also support reuse: making it possible for different transport protocols (e.g., SCTP, QUIC) to reuse existing algorithms rather than have to maintain a separate/parallel implementation. A third observation is that while Linux has done an excellent job of making the file system replaceable (and it can now be done in a safe and high-performance way), the approach does not extend to TCP, which has too many tentacles throughout the kernel. All of this, coupled with the limitations of TCP options called out in RFC 1263, might lead us to conclude that TCP evolved over the years in spite of itself. At the very least, we are left wondering about lost opportunities.
In the meantime, the cloud has grown up around TCP, with an emphasis on improving feature velocity. Protocol standards (above the physical level) become less relevant once you have the ability to dictate what code runs on both ends of a connection, which the cloud and modern apps are well-positioned to exploit. One has to wonder if TCP as we know it today will fade into the background, not because of a clean-slate replacement, but because it is overtaken by cloud software management practices. The adoption of QUIC would seem to be a good test of this hypothesis: it provides both value that TCP does not (a well-designed and efficient Request/Reply mechanism) and a modern approach to continuously integrating and deploying new features.
One plausible outcome is that the network as a whole becomes a programmable platform, improving feature velocity for everything from the transport protocols running on endpoints to the forwarding pipeline running in network switches. And the more complete and agile that platform becomes, the more likely it is that RFC-defined specifications will one day become obsolete. As we said in RFC 1263:
We hope to be able to design and distribute protocols in less time than it takes a standards committee to agree on an acceptable meeting time.
Perhaps we are getting closer to realizing that goal.
Maybe it’s a coincidence but Isovalent (creators of Cilium and proponents of eBPF) closed a funding round just weeks after we wrote about Cilium. And the idea of getting service meshes without sidecars got another boost with the announcement of Istio Ambient Mesh. In other news, our article on unifying edge routing appeared in The Register.