TCP Congestion Control: Acceptable Unfairness
This week we’ve posted a reasonably complete draft of our congestion control book, which has reminded us once again what a long and rich history the field has. Let’s take a look at that history and ask whether congestion control is ever going to converge on a stable solution. Please feel encouraged to submit feedback on the book via GitHub.
It’s hard not to be amazed by the amount of active research on congestion control over the last 30-plus years. From theory to practice, and with more than its fair share of flame wars, the question of how to manage congestion in the network is a technical challenge that resists an optimal solution while offering countless options for incremental improvement. This seems like a good time to take stock of where we are, and ask ourselves what might happen next.
Congestion control is fundamentally an issue of resource allocation—trying to meet the competing demands that applications have for resources (in a network, these are primarily link bandwidth and router buffers), which ultimately reduces to deciding when to say no and to whom. The best framing of the problem I know traces back to a paper by Frank Kelly in 1997, when he characterized congestion control as “a distributed algorithm to share network resources among competing sources, where the goal is to choose source rate so as to maximize aggregate source utility subject to capacity constraints.”
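To make Kelly's framing concrete: in the network utility maximization view that grew out of his work, each source s sends at rate x_s and derives utility U_s(x_s), and the network's job is (roughly, and glossing over the details of Kelly's original formulation) to solve:

```latex
\begin{aligned}
\text{maximize} \quad & \sum_{s \in S} U_s(x_s) \\
\text{subject to} \quad & \sum_{s \,:\, l \in r(s)} x_s \;\le\; c_l \qquad \forall\, l \in L \\
& x_s \ge 0 \qquad \forall\, s \in S
\end{aligned}
```

Here r(s) is the set of links on source s's route and c_l is the capacity of link l. Different choices of utility function correspond to different notions of fairness (logarithmic utilities, for example, yield proportional fairness), which is one way to see why the objective function itself is contested.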
This is a hard problem in any computer system, but what makes it especially difficult in this case is the decentralized nature of the Internet. Not only is the Internet's approach to congestion control distributed across its millions of hosts and routers, but it is fair to think of those hosts and routers as cooperatively trying to achieve a globally optimal solution. From this perspective, there is a shared objective function, and all the elements are implementing a distributed algorithm to optimize that function. Compounding the challenge, and arguably the reason there are so many approaches to congestion control, is the lack of consensus on the right objective function.
Of course everyone wants high throughput, low latency, stability, and fairness, but it's how you trade those off against each other that makes this a tough question to answer. To make matters worse, the problem is over-constrained, meaning that each solution must choose which constraints to weight most heavily and which to give the least weight. Fairness has been a particularly thorny issue, not only when considering a given algorithm competing against itself, but especially when comparing algorithm A with algorithm B. If A shows improved throughput over B, but does so by being more aggressive, and hence stealing bandwidth from B's flows, then A's improvement is not fairly gained and may be discounted. In such an analysis, Jain's fairness index has historically served as the quantitative measure of fairness.
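For concreteness, Jain's index for n flows with throughputs x_1, ..., x_n is (Σ x_i)² / (n · Σ x_i²), which ranges from 1/n when one flow gets everything to 1 when every flow gets an equal share. A minimal Python sketch (the per-flow throughputs in the example are made-up numbers):

```python
def jain_fairness_index(throughputs):
    """Jain's fairness index for a list of per-flow throughputs.

    Returns a value between 1/n (one flow gets everything) and
    1.0 (all flows get an equal share).
    """
    n = len(throughputs)
    total = sum(throughputs)
    sum_of_squares = sum(x * x for x in throughputs)
    return (total * total) / (n * sum_of_squares)

# Example: two flows sharing a 10-unit link
print(jain_fairness_index([5, 5]))  # 1.0   (perfectly fair)
print(jain_fairness_index([9, 1]))  # ~0.61 (one flow dominates)
```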
Fairness arguments have been used for 30 years in favor of the incumbent algorithm (whatever it happens to be), making Ranysha Ware's recent proposal to measure harm instead of fairness a welcome breath of fresh air in this ongoing debate. Ware et al. advocate for a threshold based on harm, as measured by a reduction in throughput or an increase in latency. Intuitively, if the amount of harm caused by flows using a new mechanism B on flows using existing mechanism A is within a bound derived from how much harm A-managed flows cause other A-managed flows, we can consider B deployable alongside A without undue harm.
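The details are in Ware et al.'s paper, but the shape of the argument can be sketched in a few lines of Python. This is only an illustration of the intuition described above, not the paper's exact methodology, and the throughput measurements are hypothetical:

```python
def harm(solo, competing, more_is_better=True):
    """Fractional harm a competitor inflicts on a flow: compare the
    flow's performance when running without the competitor to its
    performance when the competitor is present.
    """
    if more_is_better:   # e.g., throughput
        return max(0.0, (solo - competing) / solo)
    else:                # e.g., latency
        return max(0.0, (competing - solo) / solo)

def deployable(harm_b_on_a, harm_a_on_a):
    """B is considered deployable alongside A if the harm B-managed
    flows cause A-managed flows is within the harm A-managed flows
    already cause each other."""
    return harm_b_on_a <= harm_a_on_a

# Hypothetical measurements (Mbps): an A flow alone, an A flow
# competing with another A flow, and an A flow competing with a B flow.
a_alone, a_vs_a, a_vs_b = 100.0, 55.0, 48.0
print(deployable(harm(a_alone, a_vs_b), harm(a_alone, a_vs_a)))  # False
```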
Unfortunately, replacing fairness with harm doesn't eliminate the over-constrained nature of the problem. But what might is the proliferation of purpose-built mechanisms targeted at specific usage domains. Based on my reading of much of the recent research, this is where congestion control algorithms seem to be headed. Data Center TCP (DCTCP) is a great example. It targets congestion within the data center, and so assumes link bandwidths of 10 Gbps or higher and RTTs typically measured in the tens of microseconds. It doesn't have to respond to as wide a range of operating conditions as the Internet as a whole. Even more importantly, because the data center is contained within a single administrative domain, it is possible to enable features like Explicit Congestion Notification (ECN) on every single switch, and optimize the algorithm to take advantage of that information.
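To make the contrast with general-purpose TCP concrete, here is a simplified sketch of DCTCP's window adjustment (in the spirit of RFC 8257, with slow start, loss recovery, and byte counting omitted). Switches mark packets with ECN once their queues exceed a threshold; the sender then scales its window cut by the fraction of marked packets rather than halving unconditionally:

```python
class DctcpWindow:
    """Sketch of DCTCP's congestion-window adjustment, heavily
    simplified: slow start, loss handling, and byte counting omitted.
    """
    def __init__(self, cwnd=10.0, g=1.0 / 16):
        self.cwnd = cwnd   # congestion window, in packets
        self.alpha = 1.0   # running estimate of the extent of congestion
        self.g = g         # EWMA gain (1/16 is the commonly cited default)

    def on_window_acked(self, acked, ecn_marked):
        """Called once per window of data: update alpha from the
        fraction of ECN-marked packets, then adjust cwnd."""
        frac = ecn_marked / acked if acked else 0.0
        self.alpha = (1 - self.g) * self.alpha + self.g * frac
        if ecn_marked:
            # Cut the window in proportion to how much congestion was
            # observed, rather than halving it as classic TCP would.
            self.cwnd = max(1.0, self.cwnd * (1 - self.alpha / 2))
        else:
            self.cwnd += 1.0   # additive increase, one packet per RTT

w = DctcpWindow()
w.on_window_acked(acked=10, ecn_marked=3)   # mild congestion: modest cut
print(round(w.cwnd, 2))
```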
You could argue that the data center is a special case, but I’m not so sure. Google’s Bottleneck Bandwidth and RTT (BBR) mechanism is another example worth considering. It is general-purpose in that it attempts to handle the wide range of scenarios that any Internet-worthy algorithm would have to respond to, but it’s not particularly fair when competing with non-BBR flows. But that doesn’t matter if it’s only deployed within Google’s backbone, interconnecting their data centers. In that case, BBR only has to be fair to its own flows. Is it possible to generalize from this example?
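As an aside on the mechanism itself: at its core BBR builds an explicit model of the path, an estimate of the bottleneck bandwidth and of the round-trip propagation time, and paces at a gain times their product rather than reacting to loss. The sketch below illustrates that model only; the real algorithm uses windowed max/min filters and cycles through probing states, all omitted here:

```python
class BbrModel:
    """Sketch of the path model at the heart of BBR: track the maximum
    observed delivery rate (bottleneck bandwidth) and the minimum
    observed RTT (round-trip propagation time), then derive pacing rate
    and congestion window from them. Windowed filters and the
    ProbeBW/ProbeRTT states are omitted.
    """
    def __init__(self):
        self.btl_bw = 0.0             # bytes/sec, running max
        self.rt_prop = float("inf")   # seconds, running min

    def on_ack(self, delivery_rate, rtt):
        self.btl_bw = max(self.btl_bw, delivery_rate)
        self.rt_prop = min(self.rt_prop, rtt)

    def pacing_rate(self, pacing_gain=1.0):
        return pacing_gain * self.btl_bw

    def cwnd(self, cwnd_gain=2.0):
        # gain times the bandwidth-delay product, in bytes
        return cwnd_gain * self.btl_bw * self.rt_prop

m = BbrModel()
m.on_ack(delivery_rate=1.25e9, rtt=0.010)   # 10 Gbps path, 10 ms RTT
print(m.pacing_rate(), m.cwnd())            # ~1.25e9 B/s, ~25 MB window
```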
Perhaps it is possible to generalize. In the early decades of the Internet, end-to-end meant between any two hosts on the Internet. But today, direct communication between arbitrary hosts is the exception, not the rule. Most TCP connections on today's Internet can be categorized as follows: edge-to-cloud (between an end-user device and the nearest cloud hosting center or CDN server); cloud-to-cloud (typically over a provider's backbone); or intracloud (between two servers within a data center). The latter two are usage domains that already run purpose-built congestion control algorithms, as covered above. And the first is well on its way to being reinvented: with the rollout of 5G (and all the spectrum-allocation algorithms 5G implies) for mobile devices, and with cloud-provider last-mile offerings for our streaming and gaming devices at home.
None of this means the resource allocation problem goes away, but it might become more tractable as deployments become more customized. Most importantly, we won't have to agree on one universal mechanism, except as a fallback for those rare occasions when true peer-to-peer transfers are required. This may very well accelerate the rate at which congestion control papers are published, since the "it's not fair" argument will no longer be relevant. Of course, if the Internet truly re-decentralizes in the coming years, congestion control might once again need to address global optimization problems, but perhaps that will happen without a single incumbent algorithm being favored by default. It seems congestion control researchers have a secure future.
The number one story catching our attention this week was the six hour Facebook outage, which provided a chance for everyone in networking to say “it’s always DNS” or “it’s always BGP” or else pull out their previously formed opinions and show that they had been right all along. We found ourselves in the last camp as our piece on Internet Centralization got plenty of airplay thanks to The Register, while deep down we suspect that our other piece on Network Verification probably deserved a closer look. Sky News stayed on brand by inventing the “Bridging Gap Protocol”. We’ll have more on the outage in the next newsletter.