Tuesday, March 22, 2011

Monitoring with Nagios and OMSA

I'm more experienced with Red Hat and its variants (CentOS) and had only heard of Ubuntu in the context of being a home/desktop system, so I was surprised when I joined my current employer and discovered that they use Ubuntu Server across the board. I'm pretty careful not to be one of those IT people who automatically assume that any solution that doesn't conform to prior experience is inferior. I mean, as much as I would have liked to swoop in and convert everyone to RH, it's not the best way to go about things. You observe, you take notes, and you make a decision based on real performance metrics and values, not personal preference alone. This is especially true when you're talking about a company that develops its own SaaS offering. The developers are used to a specific platform, right? So you'd better be sure it's worth it to shake that up for them.


So, let me get this out of the way. Ubuntu as a distro isn't bad, although its wacky way of installing things from the repos is a little challenging. Why Debian needs to rearrange apps like Apache and Tomcat and put them in /etc instead of /opt, or spread them out between /etc and /usr (as in the case of Tomcat), is beyond me. If you want the directory structure to conform to what everyone else in the *Nix universe is using, you have to install from source. This is what I wound up doing when I set up a Nagios server, because it gets pretty frustrating trying to follow well-established instructions when the paths don't match, and the ones you find in the Ubuntu documentation are just not comprehensive enough. I was using a great resource, Ramesh Natarajan's Nagios Core 3 ebook, but it quickly got a little complicated translating Ubuntu's directory structure to the much more sensible layout of the compiled version.

Anywho, Ubuntu is fine except for that, but it lacks the kind of support that a server distro meant for production environments should have. Case in point: these are Dell R300s. Dell has a pretty neat tool, OpenManage Server Administrator, that can be used to monitor the server. You get detailed information on the storage controllers, hard drives, firmware, etc. It's good stuff and I have used it in every other environment. In Windows it's easy to install, and I got it working on RH systems without too much fuss as well. Unfortunately, there isn't an official, Dell-supported OMSA release for Ubuntu/Debian. There is a repo available now at http://linux.dell.com/repo/community/deb, which was worked on by some engineers from both Dell and Canonical, but it isn't flawless and support is limited to a listserv.

I installed this software on my 10.04 x64 server, a test server that I'm working on to deploy to our colo. I started with version 6.4, and it installed without too much trouble. I had to add an entry under /etc/apt/sources.list.d/ for this unofficial repository and then run apt-get update. I installed the main components and started the dataeng service. I immediately lost my SSH session and was unable to get back in. I ultimately had to log in at the console (good thing the server was onsite) and restart networking services. This worked for a while, but I lost connectivity again after a bit. I uninstalled OMSA and everything went back to the way it was. Weird. I tried it again with fewer components, just the barebones, and had the same series of incidents. I could find nothing online about why this would be interfering with my network connection.

I then tried installing an older version, 6.3. Same thing happened, and I restarted networking again. So far it's been up and going after that restart for almost an hour I'd guess, so I started to experiment with some of the commands you can use with omreport. The ultimate goal is to use this with check_dell_openmanage. I wanted to test the commands locally and make sure OMSA worked properly before starting the next step of integrating it with Nagios. Good thing.


admin@Server1:~$ omreport chassis hwperformance
Error! No Hardware Peformance probes found on this system.
admin@Server1:~$ omreport chassis memory
Memory Information

Error : Memory object not found

So, the NIC is still functional right now, but I am unable to actually get any information off of the system. Coincidentally, I also found a posting that describes the problem I've been having with the network card: http://lists.us.dell.com/pipermail/linux-poweredge/2011-February/044224.html. That poster was also using 6.4 so maybe it's specific to that release.
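For repeating this kind of local sanity check before wiring anything into Nagios, here is a minimal Python sketch. It assumes omreport is on the PATH and that a failed query shows up as a line starting with "Error", as in the session above. The function names and the list of subcommands are my own illustration, not part of OMSA or check_dell_openmanage.

```python
import subprocess

# Subcommands taken from the session above; extend the list as needed.
SUBCOMMANDS = [
    ["chassis", "hwperformance"],
    ["chassis", "memory"],
]


def has_error(output):
    """Return True if any line of the output starts with 'Error'.

    Assumption: omreport reports failures the way the session above shows,
    e.g. 'Error! ...' or 'Error : ...'.
    """
    return any(line.strip().startswith("Error") for line in output.splitlines())


def run_omreport(args, binary="omreport"):
    """Run one omreport subcommand and return its combined output as text."""
    try:
        result = subprocess.run(
            [binary] + list(args),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except FileNotFoundError:
        # Treat a missing binary like any other failed query.
        return "Error : {} not found on this system".format(binary)
    return result.stdout


def check_all(subcommands=SUBCOMMANDS, runner=run_omreport):
    """Map each subcommand to True (no error line seen) or False."""
    return {" ".join(args): not has_error(runner(args)) for args in subcommands}


if __name__ == "__main__":
    for name, ok in check_all().items():
        print("omreport {}: {}".format(name, "ok" if ok else "ERROR"))
```

The runner is passed in as a parameter so the parsing can be tried against saved output without OMSA installed. On the server above, both subcommands would be flagged as errors.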

Now I'm off to try and solve this new mystery. There is definitely something to be said for using a distribution with popular support. I understand that the only way for a distro like Ubuntu to get that kind of mainstream support is for more people to adopt it, and Dell and other major OEMs won't provide that if it's not being used in production server environments, but it's definitely challenging to be on the back edge of that movement. 

Wednesday, March 9, 2011

Wireless Security and Handhelds

I just switched the wireless security in our office over from WEP, which is highly insecure and just not done in business environments anymore, to WPA. I would have chosen WPA2, but our wireless AP doesn't support it and has unfortunately been EOL'd by the manufacturer, so there are no firmware updates forthcoming to add this functionality. I could muck around with DD-WRT, but I prefer to steer clear of that kind of experimentation on production devices. No one would be happy if I bricked our lone wireless AP.

As with any change, you always know that something will break, and sure enough I had an employee come to me and report that her HTC Evo had stopped connecting to the wireless even though I'd updated her settings the day before. It was stuck in a loop of trying to get an IP address. I checked the AP and the MAC address wasn't showing up in the table at all. A little research yielded two little-known facts: first, Android phones have problems connecting to WPA using AES and function much better using TKIP; second, AES is not actually a required part of basic WPA, which mandates TKIP, and only becomes mandatory with WPA2. It may have something to do with the fact that AES generally needs hardware support, whereas TKIP was designed to run on older WEP-era hardware. Once I changed the encryption to TKIP, her phone connected with no problem. Of course, TKIP is less secure, but we have to make adjustments.

One to grow on. 

Tuesday, March 1, 2011

Virtualizing Linux

Virtualization offers some amazing benefits and is an intriguing option for any business looking to cut costs on hardware, introduce greater flexibility in their deployment scenarios, and make their systems redundant without having to invest in essentially two or more of everything. I began researching virtualization as an option for our environment, and settled on comparing XenServer from Citrix and VMware. Ubuntu is our Linux distro of choice.

Virtualization ain't cheap, not if you're trying to realize all of the benefits. Sometimes decision-makers don't get that, and are prone to throw around the idea without understanding the upfront investment necessary to successfully implement it. It became immediately clear that VMware, the industry standard, was also the most expensive. At a minimum you need vSphere Essentials Plus to get HA and vMotion, not to mention the need to invest in a fast storage system like a SAN. This can easily start you off in the area of $3500. On the other hand, you can start off for $1000 per server with XenServer. XenServer doesn't officially support Ubuntu 64-bit, but I figured I'd try it out anyway. Ultimately, though, this turned out to be a real barrier and I couldn't successfully create virtual Ubuntu servers at all. I wiped the server and went with ESXi 4 instead.

VMware does actually officially support Ubuntu, even the 64-bit version, which was nice, and I found it easy to get a basic machine up and going pretty quickly. What I wanted to try next was P2V. We have way too many new servers to go completely virtual, but the ability to create virtual copies of physical servers and spin those up in the event of a server failure, or even to add to our farm by using a base image, would be pretty valuable. I downloaded and used the Standalone Converter, which did not work at...all. I pretty much tore my hair out trying to figure out why the conversion was failing. I thought it was a network problem, then perhaps a permissions problem. In the end I figured out that it was a problem with GRUB. If I understood correctly, Ubuntu 10.04 uses GRUB2, while VMware attempts to install the older GRUB when doing the conversion, which resulted in the virtual machine being unbootable. I attempted to use the original Ubuntu ISO as an emergency boot disc and edit GRUB manually, but that failed miserably.
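Since the bootloader generation turned out to be the culprit, it can save some hair-pulling to check it before attempting a conversion. Here is a minimal Python sketch that guesses which GRUB a system uses from its config files. It relies on GRUB2 on Ubuntu keeping its generated config in /boot/grub/grub.cfg and the older GRUB using /boot/grub/menu.lst. The function name and return values are my own, and this is a heuristic, not something the Converter does.

```python
import os


def detect_grub(root="/"):
    """Guess the GRUB generation from config files under `root`.

    Heuristic only: GRUB2 writes boot/grub/grub.cfg, while the older GRUB
    reads boot/grub/menu.lst. A system upgraded in place may have both,
    so grub.cfg is checked first.
    """
    grub_dir = os.path.join(root, "boot", "grub")
    if os.path.isfile(os.path.join(grub_dir, "grub.cfg")):
        return "grub2"
    if os.path.isfile(os.path.join(grub_dir, "menu.lst")):
        return "grub-legacy"
    return "unknown"


if __name__ == "__main__":
    # Check the running system; pass a mount point instead to inspect
    # a restored or converted disk image.
    print(detect_grub())
```

The `root` parameter is there so the same check can be pointed at a mounted image, for example the disk of a converted VM that refuses to boot.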

I struggled with this for a bit and tried to get it working a number of ways. In all of my research and internet wanderings I came across a suggestion to create images of the physical servers using something like Clonezilla and use those as the basis for P2V. It isn't exactly the same, since the benefit of P2V is hot-cloning, but I found another imaging tool called Mondo Rescue which actually does the trick nicely. It allows you to create ISOs without having to take the server down, although the server is inaccessible while the image is being created. But still, it means no one has to go to the colo to physically reboot the server and create an image, which in turn means we'll be more likely to keep images recent.

Mondo Rescue in and of itself is a pretty awesome utility as well. I made images of my test boxes and restored them a few different times, and it was pretty easy to do, so I'm pleased about that.