Thursday, May 30, 2013

Designing a Data Center Network

In many careers there are things that you find yourself doing over and over again as a matter of course, and those are the things in which you become an "expert" or "guru" (my wife hates that term; it was thrown around a lot at one of my previous employers). Then there are the rare cases, the things that tend not to need to be done from scratch very often; you could go your whole career without ever having to do some of them. So, what kinds of things do I find myself having to do a lot as a Linux Admin? Installing Linux itself? Sure. Installing and configuring/optimizing some package like Apache or MySQL? Absolutely. Linux is all about open source, right? I mean, I install a lot of stuff that I will likely never, ever touch again (looking at you, Storm and Kafka), but the general idea of finding, researching, installing, and maintaining an open source package or platform is pretty standard as a Linux Admin.

Now, what about as a Network Engineer? How many times do I find myself needing to set up a whole new network from scratch, or even simply needing to install a piece of hardware like a switch or router? Not very often at all. I would guess that, except in the case of consultants or folks working for startups, most NEs spend the bulk of their time making tweaks to firewall rules or routing tables, that kind of thing. The only times I've personally had to set up a new network were when I worked for Large Retail Grocer and when I was a consultant. With LRG I was still under the instructions of our corporate HQ, so while I had the ability to create VLANs as I pleased and choose IP ranges and such, there was no network design involved. It was what it was. As a consultant, the networks I installed were so small there were literally only two VLANs (1 and 2, the defaults on the ASA 5505s those companies used), and there was no routing, as the ASAs were configured with a static default route to the ISP. So, when I was tasked with designing the network for the next iteration of the software at my current job, I pretty much had no base of experience to start from.


At least two of the data center designs with which I have worked over the past 4 years are results of the "let's get something out there" plan. At Gravel, SaaS was never meant to be the model. We sold software that clients installed on their own servers, in their own environments. Legend has it that a couple of potential clients wanted to use our software but didn't want to host it themselves, so we scrambled to put something together that they could use. One switch, server, and firewall later, we were hosting clients, and the solution quickly grew larger and more cumbersome than anyone imagined. By the time I signed on, most of our business was "SaaS," or something like it, but we did not have an infrastructure that had been designed or planned for it, so we suffered a lot of performance problems. We did not have happy clients.

At River we have a similar situation. The data center was designed with more single points of failure than you can shake a stick at. Whole clusters of servers sit behind a single ProCurve, which in turn has a single fiber link to a core. It's also a flat /16 network; we have a few applications that can hit the network pretty hard (Cassandra replication, for example), and all of that traffic flows over the same broadcast domain.

When presented with an opportunity to start from scratch, I went with what I knew in theory to be the most important aspects of a network: redundancy and fast convergence. There should be no single points of failure, whether switches or links, and traffic should be segmented as much as possible according to need. I was all set with a design that had the standard three tiers: an access layer for the servers, an aggregation layer, and the core/edge layer. I'd done my research, pulled the examples from Cisco's website, and was ready to go with a proposal for the gear we needed.

The Senior Linux Admin said, "What about spine/leaf?" <cue the sound of a needle being pulled off a record>

It's actually rather hard to find vendor-agnostic explanations of this topology online. A Google search will yield Cisco and Dell documents that focus on their specific implementations. They give you a basic overview, but as soon as they get detailed they start to read like marketing handouts. There's also a lot of jargon out there that, if you've never thought very deeply about designing a network (or never had to do it), can be distracting. Surprisingly, the UK Register has a decent, short write-up of the concept.

Spine/leaf is essentially a flattened network architecture, from my understanding. The "leaves" are the switches that connect your servers to each other, and the "spines" are the switches that connect your leaves. You do away with the aggregation layer altogether because every leaf is connected to every spine, fully meshed and non-blocking (i.e., not using STP to shut down the redundant links). The idea is that spine/leaf addresses the changes in traffic patterns in today's distributed networks: instead of just traditional client-server traffic (north-south), you now also have a lot of east-west traffic, server to server.

It also decreases the oversubscription inherent in three-tiered architectures. That was the second term I had to become familiar with. Oversubscription is, in a nutshell, promising more bandwidth than you actually have available in the hope that not everyone will attempt to use that bandwidth at the same time. You can oversubscribe a switch out of the box if the backplane doesn't have enough bandwidth for the ports included. Even if the backplane is up to the task and non-blocking, if the uplinks don't equal the potential total bandwidth of the server-facing ports, you have oversubscription. Some amount of oversubscription is typical and expected in traditional network architectures, but what ratio is acceptable depends on the specifics of your applications.
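To make the oversubscription math concrete, here's a quick back-of-the-envelope sketch. The port counts and speeds are hypothetical examples I picked for illustration, not our actual gear:

# Back-of-the-envelope oversubscription math. The port counts and speeds
# below are hypothetical examples, not our actual hardware.

def oversubscription_ratio(server_ports, server_port_gbps, uplinks, uplink_gbps):
    """Ratio of server-facing bandwidth to uplink bandwidth on one switch."""
    downstream = server_ports * server_port_gbps
    upstream = uplinks * uplink_gbps
    return downstream / upstream

# A top-of-rack switch with 48 x 1G server ports and 2 x 10G uplinks:
print(oversubscription_ratio(48, 1, 2, 10))   # 2.4, i.e. 2.4:1 oversubscribed

# The same switch with 4 x 10G uplinks gets much closer to non-blocking:
print(oversubscription_ratio(48, 1, 4, 10))   # 1.2, i.e. 1.2:1

Whether 2.4:1 is fine or terrible comes right back to the point above: it depends entirely on how hard your applications actually push those server ports at the same time.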

We tapped a reseller/consulting firm that we've used in the past for their recommendation. We were mainly looking for a hardware rec; my manager at the time didn't think we should keep using HP. They recommended Extreme Networks and set up a meeting where we described the functionality of our new software (to the best of our knowledge), including traffic patterns, and they came up with a network design for us. It's not true spine/leaf in the strictest sense that you'll find referenced in the literature. This is kind of what true spine/leaf looks like:


Again, note that every leaf is connected to every spine. Our network looks more like this:


Never mind that it doesn't look as nice as Brad's :) . The important thing to note is that while we have 10Gig up to the spine, we are not fully meshed. The leaf layer consists of two stacks, and the spine is also a two-node stack. East-west traffic on the same VLAN flows over the stacking connections without needing to be routed or to travel up a layer; inter-VLAN traffic has to go up to the spine stack and get routed back down through one of its links, whether that lands on the master or the backup node in the spine stack. In this way the nodes of a high-replication cluster spread across racks (say, a Hadoop or Cassandra cluster) can talk to each other fairly easily without having to travel up a layer, as they would in a traditional three-tier environment. Unfortunately, if you spread your cluster across the two leaf stacks for failover, you reintroduce some percentage of that spine-bound traffic.
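If it helps, here's a toy model of where a given flow ends up in this layout. The stack names and VLAN numbers are made-up placeholders, not our actual assignments, and the cross-stack case assumes same-VLAN traffic between the two leaf stacks is simply bridged through the spine:

# Toy model of traffic paths in the two-leaf-stack / spine-stack layout
# described above. Stack names and VLAN numbers are hypothetical.

def path(src, dst):
    """Classify the path between two servers, each given as (leaf_stack, vlan)."""
    src_stack, src_vlan = src
    dst_stack, dst_vlan = dst
    if src_vlan != dst_vlan:
        # Inter-VLAN traffic always goes up to the spine stack to be routed.
        return "up to the spine (routed), back down to the destination leaf stack"
    if src_stack == dst_stack:
        # Same VLAN, same leaf stack: stays on the stacking connections.
        return "stays within the leaf stack (stacking connections)"
    # Same VLAN, different leaf stacks: presumably bridged across the spine uplinks.
    return "up to the spine (bridged), over to the other leaf stack"

# A Cassandra cluster kept on one VLAN and one stack never leaves the leaf layer...
print(path(("stack-A", 10), ("stack-A", 10)))
# ...but splitting it across the two stacks for failover reintroduces spine traffic.
print(path(("stack-A", 10), ("stack-B", 10)))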

We haven't yet put our architecture to the test by deploying code to it, but that's the next step. At that point we will undoubtedly have to tweak some of the design and weigh tradeoffs between complexity, speed, and fault tolerance. Certainly working through this has been an incredible learning experience. I'll be blogging about my overall experience with Extreme Networks soon.
