TCP: The “P” is for Platform

A Case Study in System Evolution

Sep 12, 2022

With the recent publication of a new specification of TCP (RFC 9293) we found ourselves reflecting on how TCP has evolved over the decades, and wondered what the future holds.

The original specification for the Transmission Control Protocol (TCP) was published as RFC 793 in 1981. TCP has proven to be resilient over the intervening forty years, but hardly static. There have been so many extensions and implementation notes that it’s hard to keep track of all of them, so in case you missed it, RFC 9293 was just published to address that problem. In a major milestone, RFC 793 is now officially Obsolete.

For those of us who have been around for most or all of those years, reading RFC 9293 is a walk down memory lane. From the silly window syndrome to slow start, fast retransmit, duplicate ACKs, window scaling, and much more, the history of TCP is a remarkable case study in system evolution. Proposing a clean-slate redesign is a popular pitch for researchers, and the idea of a fresh start has a certain appeal, but there is so much experience codified in TCP that any replacement has a very high bar to clear.

Taking a step back from the details and looking at the “system evolution” story, several things jump out at me. For starters, I would hate to have to implement TCP from scratch based solely on a reading of RFC 9293 (and the many RFCs it includes by reference). It’s an open question whether doing so is even possible, since for many years TCP has been defined by its reference implementation; the RFCs are more descriptive than prescriptive. That’s not a criticism. From the beginning, the IETF has favored protocol definitions based on implementations, and RFC 9293 is the latest product of that iterative process.

If the implementation drives the specification, then which implementation is authoritative? The answer has been the dominant open source implementation of the day. This was originally the Berkeley Software Distribution (BSD) implementation of Unix. BSD and its descendants continue to this day (notably as FreeBSD), but it was eventually overtaken by Linux, in the early 2000s, as the de facto open source, Unix-based OS. (It is also the case that many of today’s commercial OSes are derived from either BSD or Linux.)

But the Linux version of TCP is more than a reference implementation. You could make the argument that the Linux kernel provides a platform for evolving TCP. While reading RFC 9293 I had a vague recollection of an RFC published during the heyday of TCP extensions entitled “TCP Extensions Considered Harmful”, so I Googled it, and it turns out to be RFC 1263. (It also turns out I was a co-author; I can only wonder what else I might have written and long since forgotten about.) The RFC describes general mechanisms for evolving TCP that would be more rational than TCP options (essentially by proposing what would today be called semantic versioning), but one takeaway that seems relevant today is a concluding statement:

Because of lack of any alternatives, TCP has become a de-facto platform for implementing other protocols. It provides a vague standard interface with the kernel, it runs on many machines, and has a well defined distribution path.

This gets us into a murky distinction (is it TCP that serves as a platform for evolving transport functionality, or is it the Linux networking subsystem?), but that’s a distinction without a difference. The two are effectively one and the same, with header options serving as one method for adding “transport plug-ins” to the kernel. (Here I’m using a simple definition of a platform as a tool or framework that lets us add new functionality over time.)
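To make the “plug-in” framing a little more concrete, here is a minimal sketch in Python of how a receiver might walk the options area of a TCP header. Options are encoded as a kind byte, a length byte, and a value, which is what lets an endpoint step over kinds it does not recognize. The function name `parse_tcp_options` and the sample bytes are illustrative assumptions, not code from any kernel.

```python
from typing import List, Tuple

KIND_EOL = 0  # end of option list (single byte, no length field)
KIND_NOP = 1  # no-operation padding (single byte, no length field)


def parse_tcp_options(raw: bytes) -> List[Tuple[int, bytes]]:
    """Return (kind, value) pairs found in the options area of a TCP header.

    Hypothetical helper for illustration only. Every option other than
    EOL and NOP carries a length byte that covers kind + length + value.
    """
    options: List[Tuple[int, bytes]] = []
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == KIND_EOL:
            break
        if kind == KIND_NOP:
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("truncated option")
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            raise ValueError("bad option length")
        # Unknown kinds are still collected; a real stack would skip them.
        options.append((kind, raw[i + 2:i + length]))
        i += length
    return options


# Sample options area: MSS (kind 2) = 1460, NOP, window scale (kind 3) = 7,
# SACK permitted (kind 4), then EOL padding.
sample = bytes([2, 4, 0x05, 0xB4, 1, 3, 3, 7, 4, 2, 0, 0])
parsed = parse_tcp_options(sample)
```

Because the length byte is self-describing, a new option kind can be deployed without breaking peers that have never heard of it, which is roughly the extensibility property the paragraph above is pointing at.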

Congestion control is another example of how Linux TCP serves as an extensible framework. All the algorithms described in our book are available (and can be optionally activated) in the Linux kernel, where, like TCP itself, the implementation is the authoritative definition of each of those algorithms. As a consequence, an API has emerged for congestion control, providing a well-defined way to continually adapt TCP. And with a nod to feature velocity, Linux now provides a convenient and safe way to dynamically inject new congestion control logic into the kernel by supporting this API in the extended Berkeley Packet Filter (eBPF). This simplifies the task of experimenting with new algorithms or tweaking existing algorithms, side-stepping the hurdle of waiting for the relevant Linux kernel to be deployed. It also makes it easy to customize the congestion control algorithm used on a per-flow basis, as well as explicitly exposing the device-level ingress/egress queues to the decision-making process. (This is how CoDel and ECN, for example, are supported in the Linux kernel.)
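As a rough illustration of what a pluggable congestion control interface looks like, the following Python sketch models the idea in user space. It is not the Linux kernel API (which is a C structure of callbacks and differs in detail); the names `CongestionControl`, `FlowState`, `RenoLike`, `FixedWindow`, `REGISTRY`, and `Flow` are all hypothetical, and the Reno-style rules are simplified.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass
class FlowState:
    cwnd: float = 1.0       # congestion window, in segments
    ssthresh: float = 64.0  # slow-start threshold, in segments


class CongestionControl(Protocol):
    """Hypothetical plug-in interface: react to ACKs and to loss."""
    def on_ack(self, flow: FlowState) -> None: ...
    def on_loss(self, flow: FlowState) -> None: ...


class RenoLike:
    """Simplified Reno-style rules: slow start, then additive increase."""
    def on_ack(self, flow: FlowState) -> None:
        if flow.cwnd < flow.ssthresh:
            flow.cwnd += 1.0              # slow start: +1 segment per ACK
        else:
            flow.cwnd += 1.0 / flow.cwnd  # roughly +1 segment per RTT

    def on_loss(self, flow: FlowState) -> None:
        flow.ssthresh = max(flow.cwnd / 2.0, 2.0)  # multiplicative decrease
        flow.cwnd = flow.ssthresh


class FixedWindow:
    """Toy alternative that ignores congestion signals entirely."""
    def __init__(self, window: float) -> None:
        self.window = window

    def on_ack(self, flow: FlowState) -> None:
        flow.cwnd = self.window

    def on_loss(self, flow: FlowState) -> None:
        flow.cwnd = self.window


# Algorithms are registered by name and chosen per flow.
REGISTRY: Dict[str, CongestionControl] = {}


def register(name: str, algo: CongestionControl) -> None:
    REGISTRY[name] = algo


class Flow:
    def __init__(self, algo_name: str) -> None:
        self.state = FlowState()
        self.algo = REGISTRY[algo_name]

    def ack(self) -> None:
        self.algo.on_ack(self.state)

    def loss(self) -> None:
        self.algo.on_loss(self.state)


register("reno_like", RenoLike())
register("fixed10", FixedWindow(10.0))

reno, fixed = Flow("reno_like"), Flow("fixed10")
for _ in range(3):
    reno.ack()
    fixed.ack()
reno.loss()
```

Each flow picks its algorithm by name, which mirrors the per-flow customization described above. Because the algorithms only touch `FlowState`, the same objects could in principle be driven by a different transport protocol.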

That’s the good news, but as a case study of how to most effectively evolve software, the results are mixed. For example, as APIs go, the Linux TCP congestion control API is not particularly intuitive, and its only documentation is in the code. A second complication is that while this API makes it possible to substitute different algorithms into TCP, an ideal interface would also support reuse, making it possible for different transport protocols (e.g., SCTP, QUIC) to reuse existing algorithms rather than maintain separate, parallel implementations. A third observation is that while Linux has done an excellent job of making the file system replaceable (and it can now be done in a safe and high-performance way), the approach does not extend to TCP, which has too many tentacles throughout the kernel. All of this, coupled with the limitations of TCP options called out in RFC 1263, might lead us to conclude that TCP evolved over the years in spite of itself. At the very least, we are left wondering about lost opportunities.

In the meantime, the cloud has grown up around TCP, with an emphasis on improving feature velocity. Protocol standards (above the physical level) become less relevant once you have the ability to dictate what code runs on both ends of a connection, which the cloud and modern apps are well-positioned to exploit. One has to wonder if TCP as we know it today will fade into the background, not because of a clean-slate replacement, but because it is overtaken by cloud software management practices. The adoption of QUIC would seem to be a good test of this hypothesis: it provides both value that TCP does not (a well-designed and efficient Request/Reply mechanism) and a modern approach to continuously integrating and deploying new features.

One plausible outcome is that the network as a whole becomes a programmable platform, improving feature velocity for everything from the transport protocols running on endpoints to the forwarding pipeline running in network switches. And the more complete and agile that platform becomes, the more likely it is that RFC-defined specifications will one day become obsolete. As we said in RFC 1263:

We hope to be able to design and distribute protocols in less time than it takes a standards committee to agree on an acceptable meeting time.

Perhaps we are getting closer to realizing that goal.

Maybe it’s a coincidence, but Isovalent (creators of Cilium and proponents of eBPF) closed a funding round just weeks after we wrote about Cilium. And the idea of getting service meshes without sidecars got another boost with the announcement of Istio Ambient Mesh. In other news, our article on unifying edge routing appeared in The Register.

Tom Anschutz

I've spent some time looking at QUIC, and am impressed with the technology, and especially with what the architecture portends and what it may become. One could develop other protocols with similar characteristics, but with QUIC already in all the browsers, it seems that testing your hypothesis about superseding TCP is going to happen earlier with this protocol than with trying to develop and deploy another.

Faster starts will give it an immediate benefit for short transmissions vs. TCP - especially with increasingly encrypted transmissions. So there is an end-user experience benefit, and also a bandwidth access advantage when it's co-mingled with TCP traffic. My guess is that it might also yield measurable battery improvements in wireless devices because of shortened transmissions.

QUIC is intending to provide FEC as an option. Using FEC for flows within a DE network provides an equivalency of an AF service. If you put a TCP flow within an IP-IP tunnel with FEC applied, then that TCP flow pushes aside other TCP flows on the same path. It no longer shares bandwidth equitably. This is easier to put in practice in networks with an abundance of bandwidth compared to the application need. Think Optical Broadband, 5G mmWave, and 6G WiFi. Once again QUIC with FEC will have an advantage over TCP when they are co-mingled.

I agree that the CI/CD approach and user-space code will allow faster deployments. As a user-space application of UDP, it's not clear that any standardization need happen for a new variant, especially if that variant were negotiated at both ends for the same software supplier. It also solves the head-of-line blocking problem for multiplexed HTTP/2 connections.

Finally, IMO, the biggest advantage of QUIC is having an application-layer session identifier that exists across IP addresses. If the session identifier is linked to a single application, then architecturally this can become an application address, and that fixes an initial design problem with IP networking. An IP address indicates an interface on a host, and applications are "inferred" from well-known TCP or UDP ports. All sorts of problems come from this, like hosting multiples of the same app on the same server. (You could write another article entirely on what we've done to compensate for not having application addresses in IP networks, but that might be for a different time.) In fair disclosure, such a use would also require a new "DNS"-type function to discover these identifiers and make use of them in a larger context.

Using the QUIC session identifier as an application identifier can provide many interesting benefits, and the one that interests me the most is application-layer mobility (think about hand-off from WiFi to a cell network without losing the YouTube session). That is another win for end-user experience, and I think also a (long-term) stake in the heart of many cellular providers. If the whole network-layer mobility mechanism is made unnecessary, or even if it's just made an edge use case, then that speaks to a completely different future state for mobile networks. With this sort of session identifier, QUIC can easily assimilate MPTCP functionality and allow for interesting use of multiple access/network types simultaneously. That would move existing network-based mobility toward obsolescence. That, in turn, can provide benefits that are not possible with today's typical services and providers, and such benefits can lead to the sort of adoption that disrupts industries. (I actually think that mobile network providers will not go out of business, but that the sort of service they provide will change a lot in the future. But that would be a longer discussion, and I'm already typing too much. ;)

So I think you are absolutely correct that we can test your hypothesis, and also that there are a number of other interesting things going on with QUIC that are worth testing and examining. It seems that the industry is exploring how far you can push the end-to-end principle. In my case, I'm most interested in when the value-add of "mobility" is going to change hands.

Peter van Roosmalen

RFC 9232 = Network Telemetry Framework; you mean RFC 9293, I guess?

Systems Approach