The Challenge of East-West Traffic
While one half of the Systems Approach team has been out bagging Munros in Scotland, we’ve found some time to reflect on the changing approaches to datacenter security, particularly the focus on East-West traffic, which is the topic of this week’s newsletter.
One of the fun things about being an Australian living in the Northern hemisphere (which was my situation for over thirty years) is having repeated conversations about which way water rotates when it goes down the drain. OK, it becomes a bit less fun over time, but I was always surprised that few people actually tried the experiment of checking out a few different drains in their own hemisphere. It turns out that drain geometry and initial water movement in the sink, not the Coriolis effect, dominate the direction of rotation, so you will see both directions in either hemisphere. There is actually a nice YouTube video that shows this, and then impressively proceeds to show the effect of Coriolis force on water draining out of a pair of identical kiddie pools in the two hemispheres (thus removing the confounding factors in most sinks, toilets, etc.).
A similar amount of time goes into explaining that North is not actually “up”, it’s just shown that way on maps drawn by Northern hemisphere-based explorers and almost every other cartographer since then. My father, who travelled quite a bit to the Northern hemisphere in the 1970s, might have been one of the first to make custom maps that put South at the top, to point out to his overseas colleagues that their map-drawing conventions were just that–arbitrary conventions. Never mind the issues of projection which left me thinking Greenland was bigger than Australia until I learned about alternatives to Mercator.
All of this has been on my mind this week because (a) I returned to the Northern hemisphere for the first time since 2020 (see previous notes about my talk in Edinburgh); (b) I spent a lot of time with maps while out walking in the highlands of Scotland; and (c) I found myself needing to explain the difference between East-West and North-South traffic to some colleagues in the context of datacenter security. I’m not exactly sure of the origins of this naming convention, but the idea is that the ingress/egress point of a datacenter carries the “North-South” traffic, while the traffic that flows between servers within the datacenter is the “East-West” traffic.
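The distinction can be stated mechanically: a flow is East-West when both endpoints are inside the datacenter, and North-South when one endpoint is outside. A minimal sketch in Python, assuming a single (hypothetical) internal prefix for illustration:

```python
import ipaddress

# Hypothetical internal prefix; a real datacenter has many.
DC_PREFIX = ipaddress.ip_network("10.0.0.0/8")

def classify(src: str, dst: str) -> str:
    """East-West if both endpoints are inside the datacenter,
    North-South if either endpoint is outside."""
    src_in = ipaddress.ip_address(src) in DC_PREFIX
    dst_in = ipaddress.ip_address(dst) in DC_PREFIX
    return "east-west" if (src_in and dst_in) else "north-south"

classify("10.1.2.3", "10.4.5.6")     # server to server: east-west
classify("10.1.2.3", "203.0.113.9")  # egress to the Internet: north-south
```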
Why do we even make this distinction? One big reason is security. Historically, the simplest way to “secure” a datacenter was to put some set of appliances (firewalls, intrusion detection systems, etc.) at the ingress/egress point. This is the “perimeter” model of security, which became prominent for several reasons. First, the number of ingress points to a datacenter is small–maybe as low as one, certainly no more than a handful. So it is natural to place centralized security appliances next to those choke points so that all the traffic can be passed through them. Furthermore, the bandwidth involved at ingress is likely to be orders of magnitude lower than the total East-West bandwidth: traffic entering a datacenter is likely to be measured in gigabits per second, while East-West traffic can easily run into the terabits. Neither of these points means that perimeter security is a good model–just that it was for a long time the most practical approach.
It is worth taking a step back to ask why centralized appliances became the preferred way to apply security controls. One version of this story is that the original Internet architecture had no security, and that early efforts to add security followed the end-to-end argument, which makes a good case for putting security into end-systems. For example, encryption and authentication are security mechanisms that can be implemented in end-systems (provided you can find a way to manage key distribution, which has proven challenging). However, as David Clark (co-author of the end-to-end argument) pointed out in a 2001 paper with Marjory Blumenthal, a multitude of factors pushed the Internet towards the adoption of centralized appliances inserted into the path of traffic by the late 1990s, such as the rise of malware, the adoption of the Internet by unsophisticated users, and the unreliability of software implementations on end-systems (e.g., OS bugs). While many Internet purists lamented the decline of the end-to-end principle, Clark and Blumenthal adopted the position that we have to deal with the world we live in rather than some idealized parallel universe. Centralized firewalls became part of the landscape because they allowed IT administrators to gain some control over the security of their networks in a world of increasing threats, without depending on the impractical notion that every end-system would do the right thing.
By the time I came to be involved in datacenter networking around 2012, the idea of securing the “perimeter” of the datacenter–which essentially involved putting a number of appliances into the ingress/egress path–was well established. Unfortunately, it was also fast becoming clear that this approach was inadequate, as a lack of East-West security meant that a compromise of a single (perhaps non-critical) system inside the perimeter could provide the launching pad for a much more serious attack via lateral movement among systems. The poster child for this issue was the 2013 Target hack, in which the initial breach took place via a refrigeration contractor’s computer, allowing the attackers to gain a foothold inside the perimeter of Target’s network, from where they were able, over a series of weeks, to move laterally among systems until they obtained the credit card details of about 100 million customers. There was no reason for the contractor portal (the original entry point for the attack) to have any connectivity to the systems that had credit card data. Nevertheless, because both systems were “inside the perimeter,” there were limited security controls between them. Lack of control over East-West traffic was the key to this and many other attacks.
Securing East-West traffic in 2013 was a fundamentally hard problem, because there was a vast number of paths between systems carrying massive volumes of data, and the traditional way to secure them was to divide the network into a small number of zones with firewalls between them. Within a zone, traffic still flowed freely. It was either impractical or prohibitively expensive to place firewalls in such a way that all East-West traffic could be intercepted.
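The limitation of the zone model is easy to see in a sketch: only traffic that crosses a zone boundary passes through a firewall, so anything between two systems in the same zone is never inspected. The zone names and prefixes below are hypothetical:

```python
import ipaddress
from typing import Optional

# Hypothetical coarse-grained zones, with firewalls only between zones.
ZONES = {
    "web": ipaddress.ip_network("10.1.0.0/16"),
    "db":  ipaddress.ip_network("10.2.0.0/16"),
}

def zone_of(ip: str) -> Optional[str]:
    addr = ipaddress.ip_address(ip)
    for name, net in ZONES.items():
        if addr in net:
            return name
    return None

def inspected(src: str, dst: str) -> bool:
    """Only inter-zone traffic crosses a firewall; intra-zone
    traffic flows freely and is never inspected."""
    return zone_of(src) != zone_of(dst)

inspected("10.1.0.5", "10.2.0.7")  # web -> db: firewalled
inspected("10.1.0.5", "10.1.9.9")  # web -> web: uninspected
```

This is exactly the gap the Target attackers exploited: lateral movement between systems that happened to share a zone.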
I came to be interested in this issue because of the evolution of network virtualization that was taking place at about the same time as the Target breach. Our early network virtualization product at Nicira virtualized layer 2 (switching) and layer 3 (routing) and we had long held the view that we would work our way up the layers to virtualize all of networking. A simple firewall operates at layer 4 (looking at transport protocol port numbers) and so this was the logical next step.
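A layer-4 firewall makes its decision from the transport header alone, typically protocol and port numbers, evaluated against an ordered rule list with a default action. A minimal first-match sketch (the rule set shown is hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Rule:
    proto: str      # "tcp" or "udp"
    dst_port: int
    action: str     # "allow" or "deny"

# Hypothetical policy: permit HTTPS, deny everything else by default.
RULES = [Rule("tcp", 443, "allow")]

def evaluate(proto: str, dst_port: int, default: str = "deny") -> str:
    """Layer-4 filtering: first matching rule wins, else the default."""
    for r in RULES:
        if r.proto == proto and r.dst_port == dst_port:
            return r.action
    return default

evaluate("tcp", 443)  # matches the HTTPS rule
evaluate("tcp", 23)   # no match: default deny
```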
Network virtualization enables an SDN-style implementation of a firewall. By “SDN-style” I mean that the data plane is distributed while the control plane is logically centralized. In the image above, the distributed data plane runs in the virtual switch of each server, inspecting the traffic entering and leaving each virtual machine. (Similar approaches can be applied to containerized or bare-metal workloads.) This means it is now possible to apply firewall policies to every single packet that traverses the datacenter–even packets that only pass from one VM to another in the same server. Since virtual switches can process packets as fast as the server can send them, it became feasible to have terabits of firewall capacity allocated to East-West traffic. But because the architecture is based on SDN, there is a logically centralized control plane that simplifies management of the distributed data plane. From a control plane and management perspective, the firewall still looks like a centralized device, where an IT administrator (or an automated system calling an API) can set the firewall policies for the entire datacenter. But the data plane scales out with server capacity, and there is no need for heroic efforts to force traffic to flow through some centralized appliance.
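The essential structure, stripped of everything else, is one policy object pushed to many enforcement points. The sketch below is an illustration of the architecture, not any vendor’s API; all class and method names are hypothetical:

```python
class VSwitchDataPlane:
    """Per-server enforcement point: filters packets entering and
    leaving the local VMs using rules pushed by the controller."""
    def __init__(self) -> None:
        self.rules = []  # list of (proto, dst_port, action) tuples

    def install(self, rules) -> None:
        self.rules = list(rules)

    def permit(self, proto: str, dst_port: int) -> bool:
        for r_proto, r_port, action in self.rules:
            if (r_proto, r_port) == (proto, dst_port):
                return action == "allow"
        return False  # default deny


class Controller:
    """Logically centralized control plane: a single policy,
    replicated to data planes that scale out with server count."""
    def __init__(self) -> None:
        self.policy = []
        self.hosts = []

    def register(self, host: VSwitchDataPlane) -> None:
        self.hosts.append(host)
        host.install(self.policy)  # new server gets current policy

    def set_policy(self, policy) -> None:
        self.policy = list(policy)
        for h in self.hosts:       # push one update to every server
            h.install(self.policy)
```

An administrator (or automation) calls `set_policy` once; every virtual switch then enforces the same rules locally, including on VM-to-VM traffic that never leaves the server.

```python
ctrl = Controller()
h1, h2 = VSwitchDataPlane(), VSwitchDataPlane()
ctrl.register(h1)
ctrl.register(h2)
ctrl.set_policy([("tcp", 443, "allow")])
h1.permit("tcp", 443)  # enforced on server 1
h2.permit("tcp", 23)   # default deny on server 2
```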
There is more detail about this aspect of network virtualization in our SDN book. This is by no means the last word in East-West security; Aviatrix, for example, addresses East-West security for cloud workloads. And it’s important to do more than just inspect protocol ports, as Thomas Graf shows in a talk on Cilium. Overall, the creation of tools to efficiently provide security services to East-West traffic was one of the key enablers of zero-trust security, a topic we’ve covered previously. It’s also one of the main reasons that network virtualization achieved mainstream adoption in enterprise datacenters: it became obvious that relying only on perimeter security and a handful of firewall zones was insufficient for today’s security challenges. There is plenty more to be done here, with service meshes being another area of active work addressing (among other things) East-West security. But at least we no longer punt on the problem by relying solely on centralized appliances at the ingress to focus only on North-South traffic.
Our previous notes on the importance of APIs and observability caused us to take note of the recent acquisition of Akita, an API observability company. Our book on Private 5G is available at a discount if you buy the ebook via our website. You can also pick up Systems Approach coffee mugs. And don’t forget to follow us on Mastodon, the thinking person’s social network.