Thursday, November 1, 2012

Aerospike (formerly Citrusleaf)

I was asked to deploy 3 development servers to run Aerospike, formerly Citrusleaf. I was, as always, psyched to have an opportunity to learn something new. I got that and then some. Mysteries upon discoveries upon more mysteries. I could have simply sneaker-netted the install since we're only talking 3 servers here, but the idea was to get it as automated as possible so that when it comes time to roll this into production and we're spinning up 30+ servers, we're not standing in front of KVMs and crash carts all day.

So, where do I begin?

Let's start with the hardware. We use Kickstart and PXE boot to provision new servers. This was already set up by our senior admin so the only thing I needed to do to get the base image installed was edit an existing config to match the layout for the servers I was to provision, and add the new image to the boot menu in Cobbler. That was simple enough but the first install I did failed at the step where the drives were partitioned. The servers have 8 SSDs, and I only wanted to install using the first 2 (I'll explain why later). Our Kickstart config creates 2 RAID 1 pairs using a portion of each disk for the OS and on top of that we do LVM.

I did a fairly simple edit where I specified sda and sdb as the disks for the OS RAID, and left the remaining disks alone. The relevant part of the Kickstart file looks like this:

bootloader --location=mbr --driveorder=sda,sdb,sdc,sdd,sde,sdf,sdg,sdh --append=" rhgb crashkernel=auto quiet"
part raid.0011 --size=500 --asprimary --ondrive=sdc
part raid.0012 --size=1 --grow --ondrive=sdc
part raid.0021 --size=500 --asprimary --ondrive=sdd
part raid.0022 --size=1 --grow --ondrive=sdd

This is actually the edited file. What I hadn't realized was that treating sda and sdb as the OS disks is just the convention I'm used to, mainly because my installs have been manual, which let me select the disks I wanted, or because I've worked on systems with only two disks, where the question never came up. In a system with 8 disks, however, if you want the install to use specific disks rather than all of them, you have to be aware that the system may not initialize the disks in alphabetical order. The senior admin had seen this with another set of servers that also used SSDs, and he advised me to change the config as above, specifying sdc and sdd as the first two drives. It worked, though I am still not entirely sure why.
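The likely explanation is that the kernel hands out sdX names in the order the devices are detected, and that order is not guaranteed to match the physical slot order. One way to see which name landed on which port is to look at the persistent links that udev keeps under /dev/disk/by-path. The sketch below assumes that directory exists on the install image; list_disk_paths is a made-up helper name, not part of any tool.

```shell
#!/bin/sh
# Sketch: show which sdX name each physical port currently resolves to.
# list_disk_paths is a hypothetical helper; the default directory is an
# assumption about a udev-based system.
list_disk_paths() {
    dir="${1:-/dev/disk/by-path}"
    for link in "$dir"/*; do
        [ -e "$link" ] || continue
        # Skip partition links; only whole disks are of interest here.
        case "$link" in *-part[0-9]*) continue ;; esac
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink -f "$link")"
    done
}

# Usage: list_disk_paths
```

Running it on two boots, or on two servers of the same model, would show whether the same port always ends up with the same sdX name.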

Once the base OS was installed, the next hardware issue I ran into was also specific to using SSDs. Aerospike's documentation requires that the drives be zeroed out (they suggest using dd to do it) and overprovisioned if they aren't already. I was unfamiliar with SSDs prior to this so I did some research because I wanted to understand the underlying technology better than simply "knowing" that they were faster than traditional rotational disks, and I also wanted to understand why I needed to perform the steps that Aerospike specified.

The short story is that I had to zero out the drives because an SSD can only write to freshly erased pages. The overprovisioning is for improved performance and for prolonging the life of the drive. It sets aside a protected area of the drive that the OS can't see, and the SSD controller uses that space for some of its housekeeping tasks.
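For the zeroing step, a minimal sketch with dd might look like the following. The helper name zero_drive, the DRY_RUN switch, and the 1M block size are all assumptions made for illustration; by default the function only prints the command, because running it for real destroys everything on the target drive.

```shell
#!/bin/sh
# Sketch: zero a whole drive with dd. DESTRUCTIVE when DRY_RUN=0.
# zero_drive and DRY_RUN are hypothetical names; bs=1M is an assumed block size.
DRY_RUN="${DRY_RUN:-1}"

zero_drive() {
    dev="$1"
    if [ "$DRY_RUN" = "1" ]; then
        # Dry run: only show what would be executed.
        echo "dd if=/dev/zero of=$dev bs=1M"
    else
        # On a block device dd writes until the device is full and then
        # stops with "No space left on device"; that is the expected ending.
        dd if=/dev/zero of="$dev" bs=1M
    fi
}

# Usage (dry run): zero_drive /dev/sdX
```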

As I said, because the ultimate goal is to be able to roll this out easily to many servers, I needed to create a script that would perform these steps automatically. I'm no scripter. It's definitely a hole in my skill set. I understand the concept; I get that you can essentially put a bunch of shell commands in a text file and make them go, go, go. I get variables, I get what for and while loops and if statements do. It's when you have to put it all together that it all comes apart. It's the equivalent of knowing individual words in a language and knowing how to ask "Where is the bathroom?", but having more difficulty asking, say, "Do you think we'd have a better chance fitting the armoire through the doorway if we oriented it vertically instead of horizontally?" The script I needed to make was asking about the armoire, folks.

I came up with what I feel was a pretty impressive script for my first time out of the gate doing something of this magnitude. I got some valuable additions from the senior guy as well. He gave me a snippet that allowed me to separate out the OS drives from the remaining drives so that I didn't inadvertently erase them, a useful tool since the OS drives could have different labels on different servers. I was sorted then. Kickstart was working to install the base OS and any dependencies and I had a script that would provision the remaining drives.
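The senior admin's snippet isn't reproduced here, so what follows is only one possible way to do the same job, under the assumption that the OS lives on md RAID members listed in /proc/mdstat. Both function names are invented for this sketch.

```shell
#!/bin/sh
# Sketch: work out which disks are safe to wipe.
# md_member_disks and data_disks are hypothetical helper names.

# Read /proc/mdstat-style text on stdin and print the whole-disk names
# (sdc, sdd, ...) that back any md array, one per line.
md_member_disks() {
    grep -o 'sd[a-z][a-z]*[0-9]*\[' | sed 's/[0-9]*\[$//' | sort -u
}

# Print every disk in $1 (space-separated list) that is not in $2.
data_disks() {
    for d in $1; do
        case " $2 " in
            *" $d "*) ;;        # OS disk: leave it alone
            *) echo "$d" ;;     # everything else is a data disk
        esac
    done
}

# Usage:
#   os=$(md_member_disks < /proc/mdstat | tr '\n' ' ')
#   data_disks "sda sdb sdc sdd sde sdf sdg sdh" "$os"
```

Because the OS disks are discovered rather than hard-coded, the same script keeps working on a server where the RAID pair came up as sda and sdb instead of sdc and sdd.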

And thus began the next problem. The utility I used to change the size of the drives is called hdparm. It's a handy little tool that apparently has all kinds of useful flags for altering the behavior and performance of drives. The only ones I needed were the -N flag, which queries the drive for size info, and -Npxxxxx, which sets the size of the drive. I ran into some weird issues with hdparm on the system though. Querying was fine, but when I tried to set the drive size I got the following output:
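To make the -Npxxxxx part concrete, here is a rough sketch of how the sector count could be worked out and the command assembled. The 20 percent figure is an assumed example, not an Aerospike recommendation, and op_sectors and hpa_set_cmd are made-up helper names. The second function only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: compute a reduced sector count and build the hdparm command.
# op_sectors and hpa_set_cmd are hypothetical helpers.

# $1 = native max sectors, $2 = percent of the drive to hide from the OS.
op_sectors() {
    echo $(( $1 * (100 - $2) / 100 ))
}

# $1 = device, $2 = new visible sector count.
# Prints (does not run) the command; the "p" prefix makes the new size
# permanent instead of lasting only until the next power cycle.
hpa_set_cmd() {
    echo "hdparm -Np$2 $1"
}

# Usage:
#   hpa_set_cmd /dev/sdX "$(op_sectors 1000000 20)"
#   prints: hdparm -Np800000 /dev/sdX
```

Some hdparm builds may refuse to change the size without an additional confirmation flag, so the man page for the installed version is worth checking before wiring this into a script.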

max sectors   = 3907029168/14715056(18446744073321613488?), HPA setting seems invalid (buggy kernel device driver?)

Couple of things to note here. Obviously the message that the kernel driver seems buggy is not normal. The output itself is also not right. There should only be two numbers here: a numerator and a denominator. Instead there are three numbers. I don't even know what the second number is. I spent quite a bit of time trying to figure out why hdparm wasn't reporting properly. I wasn't comfortable going forward and making changes to drives when the information I was gathering wasn't clear or reliable. The output should have simply said that HPA was enabled or disabled. I also found that the commands weren't taking reliably. At one point I had a system that had 3 drives with the overprovisioned size info set, and the remainder were not, despite all of them having had the same commands issued via my script.
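Since the script has to decide whether a drive's report can be trusted, one option is to pull the first two numbers out of the "max sectors" line and refuse to continue when the visible size is larger than the native size, which is exactly what the odd output above shows. This is a sketch with invented function names, and it assumes the line format seen in the output quoted above.

```shell
#!/bin/sh
# Sketch: sanity-check "hdparm -N" output before changing anything.
# parse_max_sectors and hpa_looks_sane are hypothetical helpers.

# Read hdparm -N output on stdin; print "<visible> <native>".
parse_max_sectors() {
    sed -n 's/^[[:space:]]*max sectors[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)\/\([0-9][0-9]*\).*$/\1 \2/p'
}

# Succeed only if both numbers were found and visible <= native.
hpa_looks_sane() {
    set -- $(parse_max_sectors)
    [ "$#" -eq 2 ] && [ "$1" -le "$2" ]
}

# Usage:
#   hdparm -N /dev/sdX | hpa_looks_sane || echo "refusing to touch /dev/sdX"
```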

I found out that you can only issue one size-setting hdparm command to a drive per boot cycle. I had been issuing these commands left and right as I tried to get things worked out. Once I rebooted, everything looked right. Well, the output was still buggy as above, but at least the numerator was the proper number, assuming that the last number was the actual drive size. I still have to figure out what's going on with the reporting, but at least at this point hdparm is working as it should. Now on to the actual software install!
