Wednesday, February 22, 2012

Unable To Start MySQL

There seems to be a campaign by the PTB to make MySQL the Bringer of My Grief. It's getting to the point where I'm starting to dread doing anything in MySQL because I just know it's going to lead down a rabbit hole. Case in point:

I set up MySQL replication in our data center last week, or rather, I tried to set up replication. The foundation was laid, but when I went to start the MySQL service on the slave server I was denied. It refused to start, displaying an error that it was unable to connect to the MySQL sock file. Since it was 2am at this point, I decided to tackle it another day rather than risk some sleep-deprived fumblings that could quickly make things go from annoying to catastrophic.

I'm glad I made that move since further investigation revealed this to not be a "one and done" resolution, or one-size-fits-all situation. Here are the symptoms:

userprompt$ sudo service mysql status

mysql respawn/post-start, (post-start) process 21012

userprompt$ mysql -u root -p

ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

userprompt$ sudo service mysql stop

mysql stop/waiting

userprompt$ sudo service mysql start

Nothing but hang-time on that last one.

There are a number of reasons this could happen. It could be due to the bind-address directive in my.cnf. Some report that it doesn't work in Ubuntu 10.04 if you use anything other than the loopback address (127.0.0.1). I saw it happen once because I actually had the wrong IP address in (moving the server from one subnet to another). I read one person's account whose workaround was to create the debian-sys-maint account again, although I didn't see a reason for doing this since the account was there already. Others have reported that the issue is due to mysqld not starting if eth0 hasn't come up, which could be a problem if you're not using eth0 (such as in the case of bonding your NICs).

I checked out my logs. A good tenet is "Always go to the logs". Even if you think you know what the problem is, it takes 2 minutes to verify it (or disprove it!) by checking the logs. It's really easy to forget this step and get quickly swept away in chasing clues. More times that I care to count the solution was super simple and could have been solved in minutes by checking the logs, but I wasted time working on assumptions. That being said, it's also easy to let the logs mislead you and get bogged down in some obscure and/or unrelated error message.

But I digress. The logs showed the following:
 2:04:57 [ERROR] /usr/sbin/mysqld: Can't find file: './mysql/host.frm' (errno: 13)

120216  2:04:57 [ERROR] Fatal error: Can't open and lock privilege tables: Can't find file: './mysql/host.frm' (errno: 13)

120216  2:05:27 [Note] Plugin 'FEDERATED' is disabled.

^G/usr/sbin/mysqld: Can't find file: './mysql/plugin.frm' (errno: 13)

120216  2:05:27 [ERROR] Can't open the mysql.plugin table. Please run mysql_upgrade to create it.

120216  2:05:27  InnoDB: Started; log sequence number 0 44263

That's an important clue. I looked in /var/lib/mysql/mysql to verify the presence of those files, and they were indeed there. Since these databases were copied from the master via a mysqldump, I was lucky enough to have the prototype available so I could make a comparison. I noticed that the owner and group for the files in that directory were root on my slave server, and set to mysql:mysql on the slave. I chowned the entire directory and tried to start mysql again. Sure enough, it started.

Needless to say I'm psyched that the solution turned out to be so simple. of course, once I turn on replication the real fun begins. Stay tuned.

2 comments:

  1. This is exactly what happened to me, however im not a linux expert. Could you clarify what you mean by
    "chowned the entire directory and tried to start mysql again"?
    Thank you very much

    ReplyDelete
  2. "chown" is short for change owner, so the command to do so recursively (i.e. throughout the entire directory) is "chown -R user:group dir" where user and group are the actual user and group on your system that you want to have ownership, and dir is the directory that contains the files you want to change. In this scenario I likely did "chown -R mysql:mysql /var/lib/mysql" to make the user and group for the /var/lib/mysql directory and files belong to the mysql group and user.

    By restart mysql I just restarted the mysql service. The syntax varies sometimes depending on your distro, but you can try "service mysql status" first to see if it's running, and then "service mysql start" or "service mysql restart". If that doesn't work you can always start it by typing "/etc/init.d/mysql start".

    HTH

    ReplyDelete