Monitoring is a core responsibility of any serious ops team. Making sure that everything that needs to be monitored actually is being checked can also be a big drain on resources. As with many things, a little automation can save a vast amount of time here.
Assuming you are already using Puppet, one ideal arrangement is to have the monitoring config built automatically by the server that needs monitoring. That config can then be shipped back to your monitoring server by Puppet, and your newly installed (or updated) machine will be added to monitoring automatically. Puppet has built-in resource types specifically for Nagios configs, but what about CheckMK? It needs a slightly different style of config, and without a dedicated resource type we are left with either writing some serious Ruby code or a bit of improvisation! I'm going to outline a technique I worked out after several days of head scratching.
What follows is based somewhat on Jeremy Thornhill's excellent post. Using Jeremy's code, I ran into issues getting the config file written out before the inventory and reload steps ran. Creating a reliable dependency between all three pieces proved an interesting problem.
One convenience of CheckMK is that it lets us place fragment files in a directory and have them consolidated when the final Nagios config is built. So, what we want from our Puppet-based monitoring configurator is one config fragment per monitored server, which will land in /etc/checkmk/conf.d/puppet.
Our fragment is going to look like this:
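A minimal sketch of such a fragment, named after the monitored host's FQDN (the hostname, IP address, and host tags here are purely illustrative):

```
# /etc/checkmk/conf.d/puppet/web01.matsharpe.com.mk
ipaddresses.update({'web01.matsharpe.com': '192.0.2.10'})
all_hosts += [ 'web01.matsharpe.com|http|brokendns' ]
```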
For those unfamiliar with CheckMK, this defines a server called 'web01.matsharpe.com' and specifies a couple of host tags that indicate what needs monitoring on the server. Since some of our environments do interesting things with DNS, I also specified the 'brokendns' tag by default. This causes the first line above to come into play and forcibly define the IP address for the host.
Following on from the above, let's imagine we have a simple Puppet class that installs Apache for us. We want our CheckMK server to apply a host tag that will cause Apache to be monitored.
Here is the class that we’ll apply to the web server:
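A sketch of that class, assuming the nagios module provides an addtag defined type whose resource title becomes the host tag (package and service names assume a RedHat-style system):

```puppet
class apache {
  package { 'httpd':
    ensure => installed,
  }

  service { 'httpd':
    ensure  => running,
    enable  => true,
    require => Package['httpd'],
  }

  # Request that this host carry the 'http' host tag in CheckMK.
  nagios::addtag { 'http': }
}
```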
This is pretty standard with the exception of the last line of the class. The 'http' is the host tag that will ultimately be written to the CheckMK config on the monitoring server. We can specify an 'addtag' line anywhere in our Puppet code base, as many times as we like, in order to monitor different aspects of the system. Note: there is no need for an 'include' line - the above is literally all you need. It is also important to note that you'll need to configure the rest of CheckMK to understand the host tags you are using and which checks should be executed for each tag.
Now, let's get onto the main monitoring class. In our code base, we have a module called 'nagios.' This includes 'client-side' config (i.e. the monitored server) and 'server-side' config (i.e. the monitoring server). Here is our main configurator class with comments in-line:
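A sketch of the client side, assuming exported resources are available (storeconfigs/PuppetDB). The addtag defined type is deliberately empty: declaring an instance of it simply records a host tag in the node's catalog, where the ERB template can discover it.

```puppet
# nagios/manifests/addtag.pp
# No-op defined type: its instances exist only to be found in the
# catalog by the config template.
define nagios::addtag () { }

# nagios/manifests/client.pp
class nagios::client {
  # Export one CheckMK config fragment per monitored host; it is
  # collected and realised on the monitoring server.
  @@file { "/etc/checkmk/conf.d/puppet/${::fqdn}.mk":
    ensure  => file,
    content => template('nagios/collection.mk.erb'),
    tag     => 'checkmk-puppet',
  }
}
```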
Config template (collection.mk.erb) - Essentially a block of Ruby code, this iterates through the resources in the client's catalog looking for instances of 'Addtag.' These are dropped into an array and, on the last line, converted to a list of host tags.
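A sketch of the template, assuming host tags are declared via a nagios::addtag defined type (so its instances appear in the catalog as Nagios::Addtag resources) and that the fqdn and ipaddress facts are available:

```erb
<%-
  # Gather the title of every Nagios::Addtag resource in this node's
  # catalog into an array of host tags.
  tags = []
  scope.catalog.resources.each do |r|
    tags << r.title.to_s if r.type.to_s == 'Nagios::Addtag'
  end
  # Add 'brokendns' unconditionally so the IP below is always used.
  tags << 'brokendns'
-%>
ipaddresses.update({'<%= @fqdn %>': '<%= @ipaddress %>'})
all_hosts += [ '<%= @fqdn %>|<%= tags.uniq.join('|') %>' ]
```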
Putting it all together! Finally, we need to call the config into existence on the monitoring server. Since we have a Puppet module called 'nagios,' this can be done with a server.pp file that includes something like the following.
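A sketch of the server side. Chaining the collected fragments into refreshonly execs is what gives the reliable ordering of config file, then inventory, then reload (the cmk paths are assumptions; adjust for your install):

```puppet
class nagios::server {
  # Realise every fragment exported by the monitored nodes.
  File <<| tag == 'checkmk-puppet' |>> {
    notify => Exec['checkmk-inventory'],
  }

  # Re-run the CheckMK inventory only when a fragment changes...
  exec { 'checkmk-inventory':
    command     => '/usr/bin/cmk -II',
    refreshonly => true,
    notify      => Exec['checkmk-reload'],
  }

  # ...then regenerate the Nagios config and reload the core.
  exec { 'checkmk-reload':
    command     => '/usr/bin/cmk -O',
    refreshonly => true,
  }
}
```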
And there you have it - fully automated CheckMK configs!