Thursday, February 23, 2012

Missing MySQL Files

I'm back with yet another MySQL mystery. Seriously, I think MySQL has it out for me. I turned replication on today and refreshed 'SHOW SLAVE STATUS\G' every few seconds. Replication stopped twice. Once it was because I'd neglected to chown my entire /var/lib/mysql directory to the mysql user and group. The second time was because it couldn't find a table that a query was attempting to modify. I went into the directory and sure enough the .frm file (the table definition; metadata) was there, but the .MYD (data) and .MYI (index) files weren't. I worked with one of our developers to try and figure out what had happened and where they'd gone. I restored backups from a variety of days to see when the files had gone missing, and whether it was related to any other server work I'd done, such as migrating MySQL over to a different machine. At no point were the files there.
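For illustration, here's roughly what the table's directory looked like on the slave (database and table names here are stand-ins):

userprompt$ ls /var/lib/mysql/database01/mynewtable*
/var/lib/mysql/database01/mynewtable.frm

A healthy MyISAM table would have a .MYD and a .MYI file sitting right next to that .frm.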

I ran across a post that reminded me that the MyISAM and InnoDB engines store their files differently. We use MyISAM as the default, so in theory no table should have been using a different engine, but I was coming up blank otherwise, so I gave it a look.


mysql> show create table mynewtable\G
*************************** 1. row ***************************
       Table: mynewtable
Create Table: CREATE TABLE `mynewtable` (
  `Marker` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `ParentMarker` varchar(50) DEFAULT NULL,
  `Material` varchar(20) DEFAULT NULL,
  `Description` text,
  `Quantity` double DEFAULT NULL,
  `SortOrder` int(11) DEFAULT NULL,
  PRIMARY KEY (`Marker`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

The table is using InnoDB. That's why replication couldn't find the files the statement seemed to be looking for: with InnoDB's default settings the data and indexes live in the shared ibdata tablespace rather than in per-table files, so the .frm is the only per-table file on disk. How this lone database ended up with tables using a different storage engine than the others is unclear. One speculation is that because we originally migrated these database files from their internal server, InnoDB may have been the default in that environment.
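In hindsight, a quick query against information_schema would have flagged the stragglers up front. Something along these lines, where the schema name is just a placeholder for the database in question:

mysql> SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES
    -> WHERE TABLE_SCHEMA = 'database01' AND ENGINE <> 'MyISAM';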

The solution seemed to be to simply convert the engine from InnoDB to MyISAM with an ALTER TABLE statement. Unfortunately the slave server didn't think the table existed, so attempting that operation just yielded an error saying as much. What was really odd was that the setup on the master server was the same: the same ib_logfile files were in place and the .frm file was there on both machines, yet the master could read the table just fine while the slave could not. Pretty frustrating. The log files showed:

120222 15:21:14 [ERROR] Cannot find or open table database01/blahblahtable

It pointed me to http://dev.mysql.com/doc/refman/5.1/en/innodb-troubleshooting.html, which said the issue is an orphaned .frm file for which there is no corresponding table inside InnoDB, with the solution being to "simply delete the .frm file". But, ummm...I actually want that table. And in fact I can see it in SHOW TABLES. One poster had done the exact same steps as I had (stopped mysql, zipped the data directory up, moved everything over) but had deleted the InnoDB files, which I had not. This meant that although all of the required files were there, there was still some inconsistency between InnoDB's internal data dictionary and the .frm file.

I started to post a question to Stack Exchange to get some help on how to fix this mess. Sometimes the act of trying to explain a problem in detail (and a little time away from it) helps you see more options. As I was explaining my situation I realized that since I had the .frm file, maybe I could recreate the tables from it. I found this article from Percona, but it pretty much said you could use the .frm to recover the table definition and nothing else (i.e. you couldn't insert data into it), which I didn't really see the point of. I found another option that looked like it could work (and its very first paragraph described my situation perfectly, though it made me feel like a dolt), although it seemed a little more hands-on than I'd hoped and introduces room for error (what if I missed an .frm file?).

I came up with the idea to let our in-house application recreate the tables for me. We have a database check sequence that can be run from within the application to update/create tables as necessary, so I figured I could point my installation of the app at the slave server and run the database tools. This could in theory work, except replication has been off for a while now, and quite frankly I still want to convert the tables to MyISAM for consistency (and so that this doesn't happen again, since the root of the problem is that I didn't know we had InnoDB tables at all).
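If it comes to that, the conversion itself is a one-liner per table; something like this, using the table from the earlier example:

mysql> ALTER TABLE mynewtable ENGINE=MyISAM;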

One of the developers came up with another solution: dump the database to my local machine, which uses MyISAM as the default engine, and restore it locally. Since InnoDB isn't available on my Windows machine (due to the way I installed MySQL), the restore will fall back to MyISAM for the InnoDB tables. Then dump the newly converted database back up to the database server and restore it there. Finally, take another dump of the data directory, copy it over to the slave server, and start the replication process over again.
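Roughly, the round trip would look something like the following. Host names, credentials, and the database name are all placeholders, and the engine switch only works because my local server doesn't have InnoDB available and falls back to its default engine when it hits the ENGINE=InnoDB clause in the dump:

userprompt$ mysqldump -h dbserver -u root -p --databases database01 > database01.sql        # pull the database down from the master
userprompt$ mysql -u root -p < database01.sql                                               # restore locally; InnoDB tables come back as MyISAM
userprompt$ mysqldump -u root -p --databases database01 > database01_myisam.sql             # dump the converted copy
userprompt$ mysql -h dbserver -u root -p < database01_myisam.sql                            # push it back up to the master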

Sounds like a plan. I'll let you know how it goes. 

Wednesday, February 22, 2012

Unable To Start MySQL

There seems to be a campaign by the PTB to make MySQL the Bringer of My Grief. It's getting to the point where I'm starting to dread doing anything in MySQL because I just know it's going to lead down a rabbit hole. Case in point:

I set up MySQL replication in our data center last week, or rather, I tried to set up replication. The foundation was laid, but when I went to start the MySQL service on the slave server I was denied. It refused to start, displaying an error about being unable to connect to the MySQL socket file. Since it was 2am at this point, I decided to tackle it another day rather than risk some sleep-deprived fumbling that could quickly make things go from annoying to catastrophic.

I'm glad I made that move, since further investigation revealed this to not be a "one and done" resolution or a one-size-fits-all situation. Here are the symptoms:

userprompt$ sudo service mysql status
mysql respawn/post-start, (post-start) process 21012
userprompt$ mysql -u root -p
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
userprompt$ sudo service mysql stop
mysql stop/waiting
userprompt$ sudo service mysql start

Nothing but hang-time on that last one.

There are a number of reasons this could happen. It could be due to the bind-address directive in my.cnf; some report that it doesn't work in Ubuntu 10.04 if you use anything other than the loopback address (127.0.0.1). I saw it happen once because I actually had the wrong IP address in there (I was moving the server from one subnet to another). I read one person's account whose workaround was to recreate the debian-sys-maint account, although I didn't see a reason for doing this since the account was already there. Others have reported that the issue is due to mysqld not starting if eth0 hasn't come up, which could be a problem if you're not using eth0 (such as in the case of bonding your NICs).
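For what it's worth, the bind-address and NIC suspects are quick to rule out from the shell (config path shown is the Ubuntu default):

userprompt$ grep bind-address /etc/mysql/my.cnf
userprompt$ ifconfig eth0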

I checked out my logs. A good tenet is "Always go to the logs". Even if you think you know what the problem is, it takes two minutes to verify it (or disprove it!) by checking the logs. It's really easy to forget this step and quickly get swept away chasing clues. More times than I care to count, the solution was super simple and could have been found in minutes by checking the logs, but I wasted time working on assumptions. That being said, it's also easy to let the logs mislead you and get bogged down in some obscure and/or unrelated error message.
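For reference, the MySQL error log on this Ubuntu box lives in the usual place (some setups send it to syslog instead):

userprompt$ sudo tail -n 50 /var/log/mysql/error.log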

But I digress. The logs showed the following:
 2:04:57 [ERROR] /usr/sbin/mysqld: Can't find file: './mysql/host.frm' (errno: 13)
120216  2:04:57 [ERROR] Fatal error: Can't open and lock privilege tables: Can't find file: './mysql/host.frm' (errno: 13)
120216  2:05:27 [Note] Plugin 'FEDERATED' is disabled.
/usr/sbin/mysqld: Can't find file: './mysql/plugin.frm' (errno: 13)
120216  2:05:27 [ERROR] Can't open the mysql.plugin table. Please run mysql_upgrade to create it.
120216  2:05:27  InnoDB: Started; log sequence number 0 44263

That's an important clue: errno 13 is a permission error. I looked in /var/lib/mysql/mysql to verify the presence of those files, and they were indeed there. Since these databases were copied from the master via a mysqldump, I was lucky enough to have the original available for comparison. I noticed that the owner and group for the files in that directory were set to root on the slave server but to mysql:mysql on the master. I chowned the entire directory and tried to start MySQL again. Sure enough, it started.
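For the record, the fix amounted to a recursive chown of the data directory followed by another start attempt:

userprompt$ sudo chown -R mysql:mysql /var/lib/mysql
userprompt$ sudo service mysql start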

Needless to say, I'm psyched that the solution turned out to be so simple. Of course, once I turn on replication the real fun begins. Stay tuned.

Tuesday, February 21, 2012

Where Did My Files Go: Root, NAS, and Linux Heartache

Well, this is interesting.

I wanted to set up a share on the NSS4000 to use as a holding spot for the home folders I would be creating for my virtual FTP users (another post for another time). I was having difficulty adjusting the permissions on the share; I couldn't change the owner to the FTP user, which meant the whole setup wouldn't work. I found that the NSS4000 has the root_squash option hard-coded, and the only way to fix this (other than some hacky methods to enable SSH) was to do a firmware upgrade. The instructions for the upgrade were not encouraging either. Not only do you have to run the firmware upgrade procedure twice, but it's also common for the first run-through to hang and require a hard reset after waiting 15 minutes or so. If your unit is in a data center, you can see the difficulty with this method.

I bit the bullet, scheduled downtime, and went out to our data center in the cold, dark, wee hours of the night to do the upgrade. Sure enough I did have to pull the plug on the thing. Luckily I had other things to do in the meantime (again, another post) so I wasn't bored.

It wasn't too long after (i.e. the next day) that I started getting backup failure reports from our provider.