It's TCP vs. RPC All Over Again
We’ve written previously about the fact that the Internet seems to be missing a standard protocol for the request/response paradigm, with repeated attempts to force-fit TCP leading to inevitable mismatches. Sometimes we feel that we are the only people who think this way, with our networking book being one of the few to suggest a third paradigm beyond datagrams (UDP) and byte streams (TCP). But a lively debate on the topic has reassured us that we are not alone, as we discuss in this week’s newsletter.
Even though I’ve probably experienced enough debates about TCP to last a lifetime, I read with considerable interest Ivan Pepelnjak’s recent blog post raising objections to John Ousterhout’s position paper, “It’s Time to Replace TCP in the Datacenter.” John’s rebuttal to Ivan’s post only piqued my interest further. I have never understood why the Internet has worked so persistently to adapt TCP in support of request/reply workloads instead of standardizing an RPC transport protocol to complement TCP. Their exchange gives me a reason to revisit that question (apropos of Groundhog Day, I suppose).
Before getting to that, I want to be clear that John does not need anyone’s help defending his work. His original position paper and follow-on response are clearly argued and backed by as much data as he could lay his hands on. That does not surprise me. I remember the first paper I saw his research group present. It was at the 10th SOSP, entitled “A Trace-Driven Analysis of the UNIX 4.2 BSD File System,” and it is worth mentioning because two years later he came back to the 11th SOSP with “Caching in the Sprite Network File System,” describing a system that addressed the problems the analysis revealed. At the time (the mid-to-late 1980s), Sprite was one of a handful of distributed operating systems built around fast RPC mechanisms. Dave Cheriton’s V kernel and Andy Tanenbaum’s Amoeba were two others. This work happened at the same time TCP was starting to get attention, in no small part because it was released as part of UNIX 4.2 BSD. To close the loop on this story, I found Sprite’s RPC mechanism to be compelling, and so adapted it to my own research on the x-Kernel. That line of research later formed the basis of the RPC section in our textbook. I was pleased to see Sprite RPC reincarnated in the Homa protocol.
It was important to me to put RPC on equal footing with TCP when we wrote the first edition of our book, in part because of the central role it played in distributed computing, and in part because you didn’t have to look far to find RPC-like behavior in the Internet: SMTP was a purpose-built RPC for email; SNMP was a purpose-built RPC for network management; DNS was a purpose-built RPC for name resolution; and then several years later, HTTP was introduced as a purpose-built RPC for web resources. That we have since turned HTTP into the Internet’s de facto RPC protocol (and, having realized that it is suboptimal, are now trying to optimize it by collapsing all the layers into the new QUIC protocol) is only a testament to how small a role technical rationale plays in what happens in industry. Or maybe it’s more about NIH (not invented here), inasmuch as the Internet and distributed systems communities were largely disjoint for many years. (I can count on one hand the people I saw at both SOSP and SIGCOMM during the years I actively attended both.)
Another explanation is that the Internet has unnecessarily coupled the transport protocol with the rest of the RPC framework. Conflating the two naturally follows from the purpose-built examples I just gave: SMTP is bundled with MIME; SNMP is bundled with MIB; and HTTP is bundled with HTML. But the idea of promoting a self-contained and general-purpose request/reply transport protocol as a peer of TCP goes back to 1988, and the (ultimately) thwarted attempt to standardize VMTP, which was based on experience with the V kernel.
But coming back to the specific question of RPC vs. TCP in the datacenter, I am still scratching my head about why the switch to an RPC transport hasn’t happened there. The datacenter is a unique and self-contained environment, which should make it easier to deploy a new transport than it is across the Internet at large. One explanation is that TCP is a chameleon protocol, or as I described it in another post, as much a platform as it is a protocol. You want individual messages instead of a byte-stream? TCP has an option for that. In contrast, RPC is natively message-oriented. You want multiple outstanding calls without head-of-line blocking? TCP can do that by opening multiple connections. In contrast, RPC protocols decouple “logical channels” from request/reply message pairs. You want congestion control? TCP can give you one version tuned for the wide-area and another version tuned for the datacenter. In contrast, perhaps the biggest contribution of Homa is to challenge the premise of TCP’s flow-centric approach to congestion control for datacenter workloads. You want a low-latency network stack? Well, that’s a challenge TCP has a 40-year history of trying to optimize away and, when that falls short, of looking to SmartNICs to solve. In contrast, RPC was designed from the start to optimize round-trip performance in low-latency networks. It’s difficult for me to imagine TCP ever doing better.
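To make the mismatch concrete, here is a minimal sketch of what an application typically layers on top of a byte stream to get request/reply semantics: a length prefix to recover message boundaries, and a request ID so that replies can be matched to calls even when they come back out of order. The header layout, the function names (`send_msg`, `recv_msg`, `demo`), and the payloads are all hypothetical, chosen only for illustration; this is not the wire format of Homa, Sprite RPC, or any other real protocol.

```python
import socket
import struct

# Hypothetical header: 4-byte payload length, 8-byte request ID, network byte order.
HEADER = struct.Struct("!IQ")


def send_msg(sock, req_id, payload):
    """Write one framed message onto the byte stream."""
    sock.sendall(HEADER.pack(len(payload), req_id) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; a single recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock):
    """Recover one message boundary from the stream; return (req_id, payload)."""
    length, req_id = HEADER.unpack(recv_exact(sock, HEADER.size))
    return req_id, recv_exact(sock, length)


def demo():
    # A connected pair of stream sockets stands in for a TCP connection.
    client, server = socket.socketpair()
    try:
        # Two calls are outstanding at once on the same stream.
        send_msg(client, 1, b"lookup example.com")
        send_msg(client, 2, b"lookup example.org")
        first = recv_msg(server)
        second = recv_msg(server)
        # The server answers in reverse order; the IDs let the client sort it out.
        send_msg(server, second[0], b"reply:" + second[1])
        send_msg(server, first[0], b"reply:" + first[1])
        replies = dict(recv_msg(client) for _ in range(2))
        return [first, second], replies
    finally:
        client.close()
        server.close()
```

Note what the sketch does not fix: both calls still share one ordered byte stream, so a lost segment belonging to the first message delays delivery of the second. Framing and request IDs recover message semantics, but escaping head-of-line blocking still means opening more connections, which is the kind of workaround a natively message-oriented transport is meant to avoid.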
Maybe it comes down to a matter of judgment. Do you prefer multiple specialized tools or a single general-purpose tool? Creating the latter is the holy grail of system design, but when you consider how dominant the request/reply message exchange is in cloud computing, I find the argument for a transport protocol optimized for that use case to be more compelling. Or said another way, maybe RPC is the general-purpose tool, and we’ve been stubbornly trying to adapt a niche tool for far too long, creating what my former student Sean O’Malley once called a Swiss army knife with a jacuzzi blade.
But my original question was to ask why that hasn’t happened. The only answer I can come up with is that judgment often reflects biases. If you’ve been taught that TCP covers every use case (with UDP providing an escape hatch for the rare exceptions that might arise), then it’s difficult to see a request/reply transport protocol as an equally viable alternative. The emergence of QUIC lends credibility to the latter perspective, with Homa representing a second design—one that’s been optimized for the datacenter (and not limited to HTTP’s five operations).
A few things that we’ve been reading this week relating to previous posts: Christian Huitema has played around with QUIC as a transport protocol for interplanetary communication, and it wasn’t all smooth sailing. Like everyone else, the New Yorker is jumping in to help us understand ChatGPT, and we liked this piece which makes the analogy to lossy compression. Since giving up Twitter and moving our social media presence to Mastodon, we’ve been liking it over there, and Cory Doctorow has thoughts about the growth in Mastodon’s user numbers. Follow us @SystemsAppr@discuss.systems.