Tuesday, July 24, 2012

More Than I Can Chew

I'm leaving my current job at the end of this week, and the timing of the switch is unfortunate. A couple of projects were started or requested during the last two months, which is also when I found an opportunity that I was interested in pursuing, so the timelines on these projects got compressed in a hurry. The first was getting a test cloud implementation set up to use as a template or guideline going forward. The second was to set up high availability in our data center, i.e. add a second switch and ASA 5505 to the mix. The cloud project was completed, though it took a little longer than expected. The data center work was scheduled to be completed this past Friday, but I didn't quite get there.

The problem was that I had planned to do more than was actually feasible in the time frame I had. We host clients in the data center, and some of them have events that run well into the evening. That has made it difficult in the past to schedule downtime for any kind of maintenance, and I generally am not able to start until almost midnight. I'm not the nightbird I used to be since I have a kid now, so the late night tech work has gotten more difficult. When I worked for WFM and we did our PCI conversion for the region, the weeks were filled with nights where I got started at 10pm and didn't leave until the sun came up...after working a full day beforehand. Not so much any more.

I went to the data center at 10pm, along with the developer who acts as IT backup. The list of things to do:

  • replace a failed server
  • rack a second PDU
  • install second switch and ASA
  • configure switch
  • upgrade existing ASA (both RAM and the software)
  • copy config from new ASA to old one
  • set up HA
  • test

Even writing it out now, it looks like a lot more than could or should be attempted in one night, but I was running short on time and really wanted to finish what I'd started. I should also point out that the individual tasks were more involved than the list suggests. For example, the failed server to replace? I had to build it onsite because the replacement also involved moving RAM from the old server to the new one. The new server (i.e. a refurbed one that we had lying around) didn't have any RAM at all, so it couldn't be prepped ahead of time. Luckily, since I used Mondo Rescue, it was a quicker task than it might have been, but still. Also, I was not only attempting to add redundancy to our infrastructure but also to make it more secure by introducing VLANs and moving away from the flat network topology that we have in place. I wanted to put the web servers in a DMZ and keep the database servers inside, standard stuff. Lastly, the newer software for the ASA requires more RAM for the Security Plus license and has a whole new syntax for things like ACLs and NAT, so I had to put in the time to re-learn those things for configuring the new firewall.

Of course, nothing worked as intended. The Mondo Rescue server build had to be attempted 3 times before it worked. There weren't rack screws included with the PDU we ordered. And although Layer 2 switching is pretty simple in general and I didn't foresee an issue with installing a second switch and setting up VLANs and the necessary trunking, I couldn't get traffic to flow between switches. I added a second switch, configured it, connected it to the new ASA, and couldn't get the two to talk. My PC would talk to the switch, the ASA would talk to the outside, but I could not get from my PC to the outside via the switch and the ASA. By this time it was approaching 2 in the morning, and my brain was fried and not effectively troubleshooting at all. Big suck.

I would have stayed there all night trying to get things working, but the developer convinced me that I needed to simplify my tasks and move on. In other words, screw the VLANs, set everything up as it was, and regroup. In the end I wound up leaving the second ASA and switch installed, but not actually connected to the network. In short, the evening was a disaster and I only got 3 of the tasks completed. My error was definitely in attempting to do too much. In retrospect, it would have been smarter to leave the existing network topology alone and simply add the new hardware, leaving the security portion of it to my successor.

Lesson learned. 
