Tuesday, April 1, 2014

Zabbix Performance

Email and monitoring are two services that I believe can and should be outsourced if you don't have the people power to dedicate to them fully. Like so many technologies, they can be fairly simple to get up and running, but how often is the out-of-the-box install sufficient for your business? You can install Exchange fairly easily, and there are countless tutorials and articles out there to guide you through it, but what happens when it breaks or when some new functionality needs to be added? Monitoring systems love to advertise with catchy slogans like "Get your entire infrastructure monitored in less than 10 minutes". That's true... if all you're monitoring are basic Linux stats like disk space, CPU, memory, etc. Need to add JMX queries, or monitor a custom app? Now it's time to roll up your sleeves and get to work.

We were using LogicMonitor to monitor our entire infrastructure when I came onboard almost two years ago (my Old Boss had brought it in). Every device, Java app, file size, and custom query we needed to keep an eye on our production platform was handled by it. My New Old Boss (who started about 6 months ago after my Old Boss quit, and has since left as well) came in and wanted to chuck the system. He was a big advocate of "if you can do it in-house, do it". LogicMonitor wasn't perfect, and we had our share of problems with it, but in hindsight a large part of that pain was that we hadn't followed through on setup. We were using default values and triggers for a lot of things, and they created noise. We didn't tune the metrics we received from LogicMonitor's setup team to match our environment. Rather than invest the time to learn the system we had, we scrapped it and went with Zabbix.

I hate Zabbix. I hated it from the beginning. We didn't have much of a burn-in of the product; it got installed, we started "testing" it, and suddenly it was our production monitoring platform for all of the new equipment we were putting into production. This was part of a major rollout as we were introducing a new platform, so it all became one and the same. One of my biggest complaints about Zabbix was that it wasn't user-friendly. Adding a host and its metrics to Zabbix involved a pretty unnatural sequence of steps, as did adding users/groups and setting up alerts. For example, to set up a host with alerting or particular metrics you have to first add the host, then create items, then create triggers for the items, then create actions for the triggers. You can create host groups to aggregate hosts of like purpose, and you can create a template to group items, but you can't apply a template to a host group; you still have to apply it to each host individually. When I raised concerns about the complexity of its front-end (because really, a GUI should not be more difficult to use than, say, creating the same thing from the command line in Nagios), New Old Boss explained that Zabbix was "expert-friendly".
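To make that four-step chain concrete, here is a rough Python sketch of how it might map onto Zabbix's JSON-RPC API, which has `host.create`, `item.create`, `trigger.create`, and `action.create` methods. It only builds the request bodies and never sends them anywhere. Treat everything else as an assumption for illustration: the host name, IP, group IDs, item key, threshold, and the old-style trigger expression syntax are all made up, the authentication token is left out, and the exact required fields differ between Zabbix versions.

```python
import json


def rpc(method, params, request_id):
    """Wrap params in a JSON-RPC 2.0 envelope (auth token omitted in this sketch)."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def build_host_chain(hostname, ip, group_id, usergroup_id):
    """Return the four requests, in order: host, item, trigger, action.

    All IDs passed in are hypothetical placeholders; a real script would look
    them up first and would feed the hostid returned by step 1 into step 2.
    """
    key = "vfs.fs.size[/,pfree]"  # percentage of free space on /

    # Step 1: add the host, with one agent interface, to an existing host group.
    host = rpc("host.create", {
        "host": hostname,
        "groups": [{"groupid": group_id}],
        "interfaces": [{"type": 1, "main": 1, "useip": 1,
                        "ip": ip, "dns": "", "port": "10050"}],
    }, 1)

    # Step 2: create an item (a metric) on that host.
    item = rpc("item.create", {
        "name": "Free disk space on / (percent)",
        "key_": key,
        "hostid": "<hostid returned by step 1>",
        "type": 0,        # Zabbix agent
        "value_type": 0,  # numeric float
        "delay": 60,      # seconds between checks
    }, 2)

    # Step 3: create a trigger on the item (assumed old-style expression syntax).
    trigger = rpc("trigger.create", {
        "description": "Low disk space on " + hostname,
        "expression": "{%s:%s.last(0)}<10" % (hostname, key),
    }, 3)

    # Step 4: create an action that messages a user group when triggers fire.
    action = rpc("action.create", {
        "name": "Notify ops on disk alerts",
        "eventsource": 0,  # trigger events
        "operations": [{
            "operationtype": 0,  # send message
            "opmessage_grp": [{"usrgrpid": usergroup_id}],
        }],
    }, 4)

    return [host, item, trigger, action]


if __name__ == "__main__":
    for request in build_host_chain("web01", "192.0.2.10", "2", "7"):
        print(json.dumps(request, indent=2))
```

Even in script form it is four separate objects for one host with one alert, which is roughly the shape of the clicking the front-end asks for.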

Months later, New Old Boss has moved on and I've inherited Zabbix, which I pretty much let New Old Boss handle while he was here, and what a bag of trouble it has turned into. "Bag of trouble" is of course polite blog speak for what I really want to call it.