Wednesday, November 27, 2013

The Long and Dirty Story of Me and Perforce (with a cameo from our friends Backup and VMware)

This will read somewhat as a comedy. Considering how much time I have spent on a task that was meant to be trivial (or at least should have been trivial)...well, I guess I am laughing too, but it's more one of those shaking-my-head-in-dismay laughs.

I was tasked with making an image of our Perforce server. Okay, I've done this before. I used Mondo Rescue in my previous job to do this very thing, so I'm not anticipating trouble. First things first. I log in to get the lay of the land because this box isn't/wasn't under the jurisdiction of Ops, and I've never been on it before. I check it out and find that it's running RHEL 4. Oh dear. Our other boxes are CentOS and at least version 5, so this machine has not seen love in a long time. Out of curiosity I start poking around, checking out the specs, wanting to know what kind of hardware we're dealing with. Dmidecode tells me it's a virtual machine. Not only that, but VMware. We're running XenServer now, but no worries. We still have a couple of vSphere Clients installed on our limited supply of Windows boxes. Looks like we have one VMware host in the data center, so let's hit it.

Except...the perforce machine isn't on that host. Oh dear again. That means we have an undocumented VMware server somewhere. To the Googles!

First try was looking to see if there's any way to ascertain a host's name/IP from the guest itself. No such luck. Apparently this functionality is locked down for security reasons. Next stop: this fellow wrote a handy little VMware scanner tool that pings your network for live hosts and then makes a call to VMware's API to test if the other end of the ping is indeed a VMware server. Brilliant. Found the little sucker easily with this tool. I plugged the IP in and went to log in...and found that none of the standard passwords worked. Montage of me trying every single password we have in our password file until I stumble across the one that works. Very efficient, I know.

Finally I'm able to connect to the host. At this point I figured I could simply clone the VM in question and call it a day. I could even use the cloned VM to work on converting it to XenServer. Here is where I run into one of the many limitations of ESXi vs the paid version of VMware. You can't clone a running VM in ESXi. You can't make a clone, period. You can export to OVF, but you have to power the guest down. When you're talking about a Perforce server in heavy use by your development team, not to mention a host that hasn't been powered down in who knows how long, people get nervous, with good reason. Suddenly the scope increases. Before you can power down the guest and export to OVF, you have to take a backup and verify that it works so that you can recover the data if things go south.

Did you notice that I said "take a backup"? One thing I've noticed over the years is that you will rarely come across a corporate Windows environment that doesn't have some kind of backup going on, be it Backup Exec or even the built-in backup utility that Windows servers have. Linux environments? For some reason backups take a backseat and more often than not are left to a handful of scripts scattered here and there that get written and cronned, never to be heard from again. Such was the case here. There was in fact a cronjob that was supposed to be backing up the Perforce checkpoint, journal, and versioned files, but that hadn't reliably run since December.

Now we have a new task: take a backup of the server, and fix the existing backups. A quick glance at the script and the error logs tells me what the issue is, and I quickly fix that. I let the backup run successfully at midnight, and the next night I attempt to grab a copy of the backup. It's 35GB, compressed. It takes a while to grab that across the LAN. Once I have it in hand, I proceed with the next bit, which is to set up a Perforce server and test the restored data. At first I tried to use Perforce's hosted trial. I figured it would be quick and easy to simply use some of my data. After signing up for it I found that they provide you with a dataset to work with, so that wasn't going to help me. I then downloaded the free 20-person license version of Perforce and installed it on my Mac. All good there (at least I got it to start up) until I attempted to untar the backup. It filled up my disk. I didn't have enough space.
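
For reference, a backup job of the kind described above might look roughly like the sketch below. This is not the script we actually had; the function name and paths are made up for illustration, and it assumes that p4d -jc (take a checkpoint and rotate the journal) is the right first step for your setup. The one thing it does that our old cronjob did not is check that the archive can be read back.

```shell
#!/bin/sh
# Sketch of a nightly Perforce backup. The function name and any paths
# passed to it are hypothetical, not taken from the real cron job.

# backup_p4 ROOT DEST_DIR
# Archives ROOT into DEST_DIR and verifies that the archive can be listed.
# DEST_DIR should live outside ROOT so the archive does not include itself.
backup_p4() {
    root=$1
    dest=$2
    archive="$dest/p4backup-$(date +%Y%m%d).tar.gz"

    # Take a checkpoint first if p4d is on the PATH; -jc writes a
    # checkpoint and rotates the journal.
    if command -v p4d >/dev/null 2>&1; then
        p4d -r "$root" -jc || return 1
    else
        echo "warning: p4d not found, archiving without a fresh checkpoint" >&2
    fi

    # Archive everything under the Perforce root.
    tar -czf "$archive" -C "$root" . || return 1

    # Fail loudly if the archive cannot be read back.
    tar -tzf "$archive" >/dev/null || return 1

    # Print the archive path so the caller can copy or log it.
    echo "$archive"
}
```

Saved in a script that ends by calling backup_p4 with your real root and destination, this could be run from cron at midnight, with a non-zero exit status being the thing to alert on.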

Next we jump to a fresh CentOS install with plenty of space. Now we're cooking. I grab a copy of the executable and load it on my server. So, quick note here. The documentation for Perforce leaves a bit to be desired. One thing is that there is, to my knowledge, no actual installation as I tend to think of it. For me an installation entails copying some man files in, maybe a README or some other manual, some executables and libraries, a directory structure that includes a config file, that kind of thing. Certainly for a server. Perforce's download gives you a single executable, p4d. That's it. The instructions say download the executable, chmod +x it, and away you go. They make reference to environment variables that can apparently be set in a file, but don't give you much instruction about the file. It's apparently something you create and put somewhere, and then there are a bunch of options you can either specify there or put on the command line for startup, such as where the root of Perforce will be. This is all cobbled together along with references to p4 admin, which is a command line tool for interacting with Perforce, but the documentation never comes out and says that that's what it is and that maybe you should download it since it's not included by default.

So, in case you ever have to throw together a Perforce server on the fly, here's what I know:

  • download both p4d and p4, which are two different tools
  • create directories manually where you want to write the journals and checkpoint files, and where you want the perforce root to be. The perforce root is where the db files and depots and everything else pertaining to the data you hold in perforce lives. 
  • create an unprivileged user and chown the directories above 
  • start perforce using p4d -dr /usr/local/p4root -J /var/log/p4journal -L /var/log/p4err -p 1666. What you're essentially doing here is telling perforce to run as a daemon (-d), where the root of the perforce server is (-r), where to put the journal (-J), where to log (-L) and what port clients should connect to (-p). 
  • stop perforce by using p4 admin stop
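
Pulled together, the steps above look something like the sketch below. The paths and port come from the example command; the variable names, function names, and the user name "perforce" are my own placeholders. I've written -d and -r as separate flags for readability.

```shell
#!/bin/sh
# Sketch of the setup steps above. Paths and port mirror the example
# command; the variable names and the "perforce" user are assumptions.
P4ROOT_DIR=${P4ROOT_DIR:-/usr/local/p4root}
P4JOURNAL_FILE=${P4JOURNAL_FILE:-/var/log/p4journal}
P4LOG_FILE=${P4LOG_FILE:-/var/log/p4err}
P4LISTEN_PORT=${P4LISTEN_PORT:-1666}
P4_USER=${P4_USER:-perforce}

# Print the p4d start command so it can be reviewed before running it.
p4d_start_cmd() {
    echo "p4d -d -r $P4ROOT_DIR -J $P4JOURNAL_FILE -L $P4LOG_FILE -p $P4LISTEN_PORT"
}

# Create the root directory plus the journal and log files, then hand
# them to the unprivileged user. This needs to be run as root.
prepare_p4_dirs() {
    mkdir -p "$P4ROOT_DIR" || return 1
    # Create the user only if it does not exist yet.
    id "$P4_USER" >/dev/null 2>&1 || useradd -r "$P4_USER" || return 1
    touch "$P4JOURNAL_FILE" "$P4LOG_FILE" || return 1
    chown -R "$P4_USER" "$P4ROOT_DIR" "$P4JOURNAL_FILE" "$P4LOG_FILE"
}
```

Run prepare_p4_dirs once as root, then run whatever p4d_start_cmd prints as the unprivileged user.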

After getting this stuff sorted (and figuring out how to use the tools) I was able to start up a second Perforce server. I got the backups, restored them to the server, and tried to start it up. Unfortunately, since the database I was restoring had a user file with 40-something users in it, the 20-person free license I'd downloaded was insufficient and I couldn't start the server up properly. I tried to find a way to delete users, but ran into a gang of problems based on open files. I fell prey to the Rabbit Hole Effect and wasted like a day working on a Python script to automatically query Perforce for users with open files and delete them, which ultimately did not work due to some missing credentials and permissions and other shenanigans related to the inner workings of Perforce. Finally I contacted Perforce, explained what I was trying to do, and requested a 45-user license trial. They sent me the appropriate file and I attempted to load up the server using that.

Two things went wrong here:

  1. They actually sent me a 20-person license again. Same problem as before here. 
  2. When I tried to start the server up, it somehow made the actual prod server go down. 

This is a big deal of course. I had no idea until developers started approaching me and saying, "Hey, do you know what's going on with Perforce?" Whoops. I was able to start it back up, but then I was afraid to keep trying with my replica because I didn't know how in the world what I'd done could have affected the prod server, but it clearly had. Despite the fact that we had no support contract with them anymore, Perforce was kind enough to work with me. It turns out that when you run the startup command (/usr/local/bin/p4d -L /home/perforce/logs/p4err & in my case), there is an environment variable that tells it which Perforce server to point to. If it's not set, it automatically tries to connect to a server called Perforce, which happens to be the name of our prod server. So, with no host flags set in the command, the p4d command reached out over the network to control our production Perforce server called Perforce. 
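
If I had to guard against this again, I'd wrap the client in something like the sketch below. It assumes the variable in question is P4PORT, which the p4 client reads and which, as far as I know, falls back to perforce:1666 when unset. The function names are made up. The idea is simply to refuse to run anything unless the target server has been named explicitly.

```shell
#!/bin/sh
# Sketch of a guard against the default-server surprise. Assumes the
# relevant variable is P4PORT; the function names are hypothetical.

# require_p4port: print P4PORT if it is explicitly set, fail otherwise.
require_p4port() {
    if [ -z "${P4PORT:-}" ]; then
        echo "P4PORT is not set; refusing to fall back to the default server" >&2
        return 1
    fi
    echo "$P4PORT"
}

# safe_p4: run p4 only against the explicitly chosen server.
safe_p4() {
    port=$(require_p4port) || return 1
    p4 -p "$port" "$@"
}
```

With this in place, something like P4PORT=localhost:1666 safe_p4 admin stop talks to the test server, and a bare safe_p4 admin stop with nothing set does nothing at all, which is what I would have wanted.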

Did I mention that this is not explicitly stated in the instructions as far as I could find? 

So that solved that mystery. I got a new license file for 45 people and with a better knowledge of the hidden environment variables, was able to get the second Perforce server up and running and successfully restored a backup. It only took a week! And I am now intimately familiar with Perforce. Which is great since the company is moving to Git. :) 
