Thursday, May 30, 2013

Designing a Data Center Network

In many careers there are things that you find yourself doing over and over again as a matter of course, and those are the things in which you become an "expert" or "guru" (my wife hates that term; it was thrown around a lot at one of my previous employers). Then there are those rare cases, the things that tend to not need to be done from scratch very often; you could go your whole career without having to do some of these things. So, what kinds of things do I find myself having to do a lot as a Linux Admin? Installing Linux itself? Sure. Installing and configuring/optimizing some package like Apache or MySQL? Absolutely. Linux is all about open source, right? I mean, I install a lot of stuff that I will likely never, ever touch again (looking at you Storm and Kafka) but the general idea of finding, researching, installing, and maintaining an open source package or platform is pretty standard as a Linux Admin.

Now, what about as a Network Engineer? How many times do I find myself needing to set up a whole new network from scratch, or simply needing to install a piece of hardware like a switch or router? Not very often at all. I would guess that, except in the case of consultants or folks working for startups, most NEs spend the bulk of their time making tweaks to firewall rules or routing tables, that kind of thing. The only times I've personally had to set up a new network were when I worked for Large Retail Grocer and when I worked as a consultant. With LRG I was still under the instructions of our corporate HQ, so while I had the ability to create VLANs as I pleased and choose IP ranges and such, there was no network design involved. It was what it was. As a consultant, the networks I installed were so small there were literally only two VLANs (1 and 2, which are the defaults on the ASA 5505s these companies used), and there was no routing as the ASAs were configured with a static default route to the ISP. So, when I was tasked with designing the network for the next iteration of the software at my current job, I had pretty much no experience to draw on.

Sunday, May 5, 2013

Intermittent VPN Connectivity Issues

...and the art of troubleshooting

My brother-in-law, who is a Network Engineer for a large university and in charge of networking interns, commented once that troubleshooting was a lost art form. I couldn't agree more; I definitely struggle with troubleshooting at times. There are two main stumbling blocks that I tend to run into.

The first is getting distracted by red herrings. Have you ever looked at the logs on a machine when you're not actively troubleshooting? They're full of errors and warnings. If the server is working up to snuff, you never look at these logs too closely. Most of us have monitoring turned on anyway so we don't tend to look at the logs until something goes wrong. It then becomes a matter of trying to determine which, if any, of those errors and warnings are related to your problem. It's easy to see something and think that it could be the reason you're having issues, and you suddenly find yourself chasing down something that isn't related at all. Of course this is also a pretty strong argument for reviewing the logs every now and then simply as a matter of practice.
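One way to cut down on red herrings is to look only at the errors and warnings logged around the time the problem was reported. Here's a minimal sketch of that idea in Python. The log format (a timestamp, then a level, then the message) and the function name `entries_near` are assumptions I made up for illustration, so adjust the parsing to whatever your logs actually look like.

```python
from datetime import datetime, timedelta


def entries_near(lines, incident, window_minutes=5, levels=("ERROR", "WARN")):
    """Return log lines at a matching level within +/- window_minutes of incident.

    Assumes each line looks like '2013-05-05 14:02:11 ERROR message...'.
    That format is hypothetical; real logs will need different parsing.
    """
    window = timedelta(minutes=window_minutes)
    matches = []
    for line in lines:
        parts = line.split(" ", 3)
        if len(parts) < 3:
            continue  # too short to hold a date, a time, and a level
        try:
            stamp = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a parseable timestamp
        if parts[2] in levels and abs(stamp - incident) <= window:
            matches.append(line)
    return matches


if __name__ == "__main__":
    # Made-up sample lines, purely to show the filtering.
    sample = [
        "2013-05-05 09:00:00 ERROR something unrelated from this morning",
        "2013-05-05 14:01:30 WARN something close to the incident",
        "2013-05-05 14:03:00 INFO routine message",
    ]
    for entry in entries_near(sample, datetime(2013, 5, 5, 14, 2, 0)):
        print(entry)
```

This won't tell you which entry is the cause, but it shrinks the pile you have to read, and widening the window is a one-argument change if nothing turns up.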

The second and sometimes more difficult problem is getting information. I think it's common knowledge that to effectively troubleshoot an issue, one of the first and most important steps is to gather information about the problem. If the problem is one that you as the administrator have discovered yourself, the information-gathering is much more straightforward. When it's been reported to you by someone else, it can be challenging to get what you need to proceed in a smart way.

How hard that is depends a lot on the person doing the reporting. I have found that there are two categories of people you talk to when an issue arises. There's the person who knows nothing about the technology and doesn't care; they just want you to fix it. Getting information from this person is a little like pulling teeth because they are likely impatient and don't want to spend the time talking to you about it. Just get it done, man! For example, someone might report that the internet is down. They don't go into the detail of what happened to make them think this until you drag it out of them, and then you find that they actually can't send or receive email, or that they're using some intranet application, and it's nothing to do with the internet at all.

Then there's the second type of person, who is either technical or thinks they're technical. You'd think this would be a good person to troubleshoot for. The problem is that this type of person often comes to you with a preconceived idea of what the problem might be. That makes information-gathering challenging because they only want to talk about what they think the issue is, leaving no room for you to determine whether that is indeed the problem. When you start to ask standard questions they tend to get defensive, wondering why you don't simply treat the issue they've identified.

That being said, we recently had an issue with remote access VPN connectivity to our data center.