Wednesday, November 17, 2010

Imaging Linux

So, I've been raving about this Backup and Recovery book by Curtis Preston to anyone who will listen (and can still find some joy in revisiting old concepts). What I like the most about this book is that it offers a lot of common-sense, nuts-and-bolts advice about backups: the why, the how, the what, and most importantly it gives valuable tips on how to design your own backup strategy. As I'd mentioned in a previous post, there's a whole section on bare metal recoveries and the various options for Windows and Linux. For Windows it pretty much boils down to doing some kind of alt-boot imaging, either of the whole disk or partition by partition, but with Linux you can also create live images provided you're not using LVM, software RAID, or extended partitions. Luckily for me my environment doesn't use any of that (except for one server).

The steps, in a nutshell, are to use dd to back up your drive, and use a live CD like Knoppix to put the image back in place. My setup is pretty simple with only 3 partitions: /, /boot, and swap. I made a share on my SBS 2003 box, and used dd to back up the partitions and the MBR. I also made a copy of fstab for reference. Then I trashed my drive by using dd to zero out the MBR and the first 1GB of each partition.
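The backup half of those steps can be sketched roughly like this. It's a minimal Python sketch, not the exact commands from the book: the device name /dev/sda, the partition numbers, and the /mnt/backup mount point for the share are all assumptions for illustration. It only builds the dd command lines and prints them; nothing gets run.

```python
def dd_backup_commands(disk, partitions, dest_dir):
    """Build dd argument lists for imaging the MBR and each partition.

    Nothing is executed here; the caller decides whether to run them.
    The disk name, partition numbers and destination are assumptions.
    """
    commands = []
    # The MBR lives in the first 512-byte sector of the disk.
    commands.append(
        ["dd", f"if={disk}", f"of={dest_dir}/mbr.img", "bs=512", "count=1"]
    )
    for number in partitions:
        part = f"{disk}{number}"
        name = part.rsplit("/", 1)[-1]  # e.g. "sda1"
        # One image file per partition, read in 1 MiB blocks.
        commands.append(["dd", f"if={part}", f"of={dest_dir}/{name}.img", "bs=1M"])
    return commands


if __name__ == "__main__":
    # Hypothetical layout: two data partitions on /dev/sda.
    for cmd in dd_backup_commands("/dev/sda", [1, 2], "/mnt/backup"):
        print(" ".join(cmd))
```

The restore is the same idea with if= and of= swapped, run from the live CD.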

First I tried the restore by booting off of my Knoppix bootable thumb drive. I rebooted once I'd gotten everything back in place and...what's this? Huh. Interesting little error: ALERT! /dev/disk/by-uuid/ #### does not exist. Dropping to shell.

Ummm, okay. Hmmm. So I do some digging and get a lot of good information. The thing about plans not going right is you inevitably end up learning something from it, which is great. Those learned things are often the kind of thing you don't get from everyday use either. I can take that attitude, you see, because this is all test and not production. Phew. Score one for the test environment team.

Anywho, my Googling leads me down the path of /dev/disk/by-uuid and checking fstab, and in the end I think my error was that I had used a USB drive for the recovery. The OS recognized the USB device as /dev/sda, so the actual drives in the server showed up as /dev/sdb. That was my thinking at least, so the fix would be to simply change the UUIDs in fstab to whatever /dev/disk/by-uuid was seeing. Or so I thought.
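The check I was doing by eye (does every UUID in fstab actually show up under /dev/disk/by-uuid?) can be sketched in a few lines of Python. This is only an illustration: the function names are mine, the sample fstab and its UUIDs are made up, and on a real box you'd feed it the restored system's fstab plus a directory listing of /dev/disk/by-uuid.

```python
def fstab_uuids(fstab_text):
    """Return the UUIDs referenced by UUID=... entries in fstab text."""
    uuids = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        device = line.split()[0]  # first field is the device spec
        if device.startswith("UUID="):
            uuids.append(device[len("UUID="):])
    return uuids


def missing_uuids(fstab_text, present_uuids):
    """Return fstab UUIDs that are not in the set the system can see."""
    present = set(present_uuids)
    return [u for u in fstab_uuids(fstab_text) if u not in present]


if __name__ == "__main__":
    # Hypothetical fstab contents and UUIDs, for illustration only.
    sample_fstab = """\
# <file system> <mount point> <type> <options> <dump> <pass>
UUID=1111-aaaa /     ext4 errors=remount-ro 0 1
UUID=2222-bbbb /boot ext4 defaults          0 2
UUID=3333-cccc none  swap sw                0 0
"""
    seen = ["1111-aaaa", "3333-cccc"]  # e.g. os.listdir("/dev/disk/by-uuid")
    print(missing_uuids(sample_fstab, seen))
```

An empty list means fstab and the system agree on the UUIDs, so the problem is somewhere else.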

I made the change and rebooted. Same error. What happened? Turns out I got turned upside down and was comparing content in /dev/ instead of in /media/sdb1/dev/... This had to have happened because of using a USB drive, so let me go ahead and recreate this test scenario but this time I'm going to use a CD. The instructions I was following never mentioned a thumb drive anyway. I was being cute; clearly it was not the time or place to be cute.

I won't bore you with the long tale of my CD-burning woes, but let's just say that several discs later I came to find that Knoppix doesn't support SATA drives, so it would not successfully boot the CD. Next step: external CD drive. Holy cow I'm going through a lot for this lab. With the external drive I am finally able to boot into Knoppix, so I try my experiment again. Guess what? Same error. Weird upon weird. I checked everything: fstab, grub.cfg, /dev/disk/by-uuid. The "missing" device is very much there in all places. There is no problem here. I tried changing fstab to reference the actual device instead of the UUID. Still no joy.

At this point I am ready to declare the experiment a complete failure and move on to Plan B: virtualization. That's where we're looking into heading anyway, but virtualization is not fast and easy to implement if you're not willing to pay the bucks...and we're not. I wanted to get something in place fairly quickly because it makes me nervous to not have some kind of quick recovery solution in place for production servers, especially web servers providing content to clients. At this point though I may very well be putting in more time for this than it's worth. It's a tough call because I hate to admit defeat, especially for something that was supposed to be so simple, and every time I try it again I say to myself, "Alright, if it doesn't work this time I'm done." Sounds like a bad relationship. :)

I think if I can't get it off the ground by the end of the week I will officially call it quitsters on this little project.

Tuesday, November 16, 2010

Backup And Recovery

Let me start off by saying that I hate backup and recovery. It's always been one of my least favorite parts of IT. Let's face it: it's not sexy. Plus, it's hard to get decision-makers to take it seriously and put forth the big bucks to do it properly until something goes wrong and suddenly they can't get their decade-old email back or someone's really really really really important spreadsheet is hosed. Backups also require constant attention and tweaking (what good's a backup if it doesn't reflect the latest dataset?) and backup technology seems needlessly complicated. I almost lost my mind when I encountered my first tape library. Partition what now? LT-who? And let's not even start with the differential vs. incremental vs. daily vs. normal grandfather/father/son/holy ghost holy cow!

Did I mention that I don't like backups?

My new gig however was begging for a new backup strategy. To my surprise there was no autoloader, no tape at all. And no vaulting. Backups were being done via a number of scripts and scheduled tasks to a USB-attached hard drive. Everything was reliant on this one little Seagate. No duplication or anything. Eek. The girl who hates backups has to make a plan.

So, in keeping with my newfound resolve to not simply do what I've been doing just because it's what I've been doing, I went back to the drawing board for backups. I did know that I was not going to suggest the tape backup route. Tape is not reliable. It gets old, and I've seen more than a few tape-based backups with those annoyingly cryptic Symantec errors that don't make you feel too confident about your ability to recover data at all times. Can we really trust "Completed with exceptions"?

I went looking for options for vaulting. Truth be told my previous company had been making some large moves in that direction, and I was all for it. I'm evaluating Venyu and i365 to see who'll give me the most robust, reliable solution for my servers. The weird thing for me is that the majority of our servers are Linux. I know what needs to get backed up to restore Windows…or at least I feel comfortable saying I do. C:\. System State. Data. Voila. With Linux, things are a little more spread out. I can back up all of /, but then I'm backing up a lot of unnecessary stuff like /proc and /tmp and a number of directories within /var that don't need to come along for the ride. The benefit is that I make sure I don't miss random configs that are floating around the system that I didn't know about. Like say, that /usr/share/tomcat6/webapps directory. Yeah, that'd be good to have in a backup.
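The "back up all of / minus the junk" idea boils down to a path filter. Here's a minimal Python sketch of it, assuming an exclusion list of just /proc and /tmp (the two I named above; the /var subdirectories would have to be added per server). The function name is hypothetical and this isn't how Venyu or i365 handle selections, just the logic.

```python
import posixpath

# Assumed exclusion list; extend it per server as needed.
EXCLUDED = ("/proc", "/tmp")


def is_excluded(path, excluded=EXCLUDED):
    """Return True if path is an excluded directory or lives under one."""
    path = posixpath.normpath(path)
    return any(path == top or path.startswith(top + "/") for top in excluded)


if __name__ == "__main__":
    for candidate in ("/proc/cpuinfo", "/tmp", "/usr/share/tomcat6/webapps"):
        verdict = "skip" if is_excluded(candidate) else "back up"
        print(candidate, "->", verdict)
```

Matching on the directory plus a trailing slash keeps a look-alike path such as /procfs from being skipped by accident.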

So I'm feeling my way around that and I think, given the small real estate that those Linux files take up (backing up / sans some of those other directories still only yielded a 2GB backup), that I'll go the "better safe than sorry" route. Admins are so very cavalier when it comes to Linux boxes as well that it makes me doubt myself. I don't think you'll ever hear a Windows Admin say, "Eh, just back up a couple of directories from C, and the data. Everything else we can put back together pretty quickly." With Linux though no one seems to be very concerned about the thoroughness of the backup selection.

The other thing I'm testing out is imaging the servers. I've been testing live imaging of the Linux servers (a nice feature not available with Windows). I've been following the tips at http://www.backupcentral.com/wiki/index.php/Linux_&_Windows_Bare_Metal_Recovery and using dd to create images. The book definitely makes it seem easier than it is. Maybe I've just had a bad string of luck with this as well. I'll go into details about my trials and tribulations in a later post. Now I must get me to some network reading. You can never revisit networking skills too often imo.

Saturday, November 13, 2010

Being the Voice

I worked for a rather large consulting firm prior to my current position. I was one of a team of Engineers who went onsite to help clients, in addition to another group of remote support Engineers who helped folks out over the phone. Because we supported a broad range of businesses of varying sizes, and because of the nature of group consulting in this manner, standards were important to have and keep. We had preferred vendors and products that we used, which made it easier for us to be consistent in our ability to support our clients. You need a backup solution? Backup Exec (that is, before we started rolling towards vaulting and other cloud-based solutions). Need AV? AVG is the way to go. You get the picture. There was a recommended solution for most things. In addition to the aforementioned benefit of providing a consistent message and suite of solutions to clients, it also made things easier for a newcomer to the field, which I was in a lot of ways despite having been a Network Admin for the previous 3 years. Whole different ballgame.

One of the challenges in my new position is getting myself out of that mindset. It worked well for that situation; I'm all in favor of standardization. In my new position though there are no standards, and so I am in a role of making decisions on my own (mostly) about what solutions will best suit my new company. My instinct was to go with what I knew.

For example, I was tasked with looking into implementing SSL for some of our customer websites. My first instinct was to look at Thawte because that was one of the vendors my previous company typically used for certificates. It never occurred to me to look elsewhere or do any further research until one of the developers sent me a link for a company called StartCom. I looked into their offerings and they have 2-year SSL wildcard certs, Class 2, for $99! Thawte's price for something similar was apparently so high they couldn't even list it on their site; they required that you contact them to get info (and then didn't even bother to respond to the inquiry, thank you very much). My new company is very cost-conscious.

That was a good lesson, a wake-up call in some ways. It's very easy to go with what you know, but it's not always the best way. It's like buying the same brand of toilet paper for your family every time you go shopping. You buy Charmin because it's what your parents bought, so you were brought up knowing Charmin as the preferred toilet paper solution in your household, and never looked at the other toilet paper brands that came and went in the aisle until someone else told you, "Hey, did you know you could get 10% more toilet paper for a lower price? And it's just as good?"

So, I've learned to re-think everything. Just because a solution was good and even preferable in my previous environment, it's not sufficient to simply go with it. It seems easy enough to adopt this, but when a solution is hinging on you it's very tempting and reassuring to go with something you have experience with and that you know works, even if it isn't necessarily the right fit for your situation.

That being said, I won't be going with StartCom anyway. Their organization certs require you to sign over your firstborn! Seriously, I tried to make my way through their site, which is not an easy task. The instructions for getting a cert aren't exactly clear. To their credit I did get a very fast response to my email inquiry, from the CEO himself. I had asked him to verify that my understanding of the process was correct, and he said it was. In order to get an organizational cert you first have to get an individual cert, which requires you to send copies of a combination of documents that includes the front and back of your license, the cover of your passport and the first couple of pages, and a picture of yourself. Once you've done that and gotten an individual cert you can then apply for an organizational cert, which requires another round of documentation (though less). Call me paranoid, but I'm not keen on sending copies of that kind of documentation to just anyone, especially a CA. I've gotten SSL certs before and never had to send any personal information. It just doesn't seem worth it.