Building CheckMK configurations with Puppet

Monitoring is an important responsibility of any serious ops team. It can also be a big drain on resources to make sure that everything that needs monitoring is actually being checked. As with many things, a bit of automation can save a vast amount of time here.

Assuming you are already using Puppet, one potentially ideal situation is having your monitoring config built automatically by the server that needs monitoring. This config can then be shipped back to your monitoring server by Puppet, and your newly installed (or updated) machine will be added to monitoring automatically. Puppet has built-in resource types specifically for Nagios configs, but what about CheckMK? It needs a slightly different type of config and, without a native resource type, we are left with either writing some serious Ruby code or a bit of improvisation! I'm going to outline a technique I worked out after several days of head scratching.

What follows is based somewhat on Jeremy Thornhill's excellent post. Working from Jeremy's code, I ran into issues getting the config file into place before the inventory and reload steps ran. Creating a reliable dependency between all three pieces proved an interesting problem.

One convenience of CheckMK is that it allows us to place fragment files in a directory and have them consolidated when the final Nagios config is built. So, what we want to aim for with our Puppet-based monitoring configurator is one config fragment per monitored server, which will land in /etc/check_mk/conf.d/puppet.

Our fragment is going to look like this:

ipaddresses["web01.matsharpe.com"] = "1.2.3.4"

all_hosts += ["web01.matsharpe.com|brokendns|http|linux",]

For those unfamiliar with CheckMK, this defines a server called 'web01.matsharpe.com' and specifies a couple of host tags that indicate what needs monitoring on the server. Since some of our environments do interesting things with DNS, I also specified the 'brokendns' tag by default. This causes the first line above to come into play and forcibly define the IP address for the host.

Following on from the above, let's imagine we have a simple Puppet class that installs Apache for us. We want our CheckMK server to apply a host tag that will cause Apache to be monitored.

Here is the class that we’ll apply to the web server:

class apache2 {
  package { 'apache2': ensure => installed, }

  nagios::checkmk::addtag {'http': }
}

This is pretty standard with the exception of the last line of the class. 'http' is the host tag that will ultimately be written to the CheckMK config on the monitoring server. We can specify the 'addtag' line anywhere in our Puppet code base, as many times as we like, in order to monitor different aspects of the system. Note: there is no need for an 'include' line - the above is literally all you need. It is also important to note that you'll need to configure the rest of CheckMK to understand the host tags you are using and which checks need to be executed for each tag.
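
Because the define can be declared from any class, tags contributed by completely unrelated parts of the code base all end up in the same host fragment. As a minimal sketch (the class name here is hypothetical, not part of the original module), a base class applied to every Linux host could contribute the 'linux' tag that appears in the example fragment at the top of this post:

class base::linux {
  # Hypothetical base class: every Linux host gets the 'linux' host tag
  # added to its CheckMK config fragment.
  nagios::checkmk::addtag { 'linux': }
}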

Now, let's get onto the main monitoring classes. In our code base, we have a module called 'nagios'. This includes 'client-side' config (i.e. the monitored server) and 'server-side' config (i.e. the monitoring server). Here is our main configurator code with comments in-line:

#
# Main client side defined type. All we do here is establish a dependency chain from an instance
# of the 'addtag' type.
#
define nagios::checkmk::addtag {
  # Just define our dependency, nothing else
  include nagios::checkmk::build_exported_resource
  Nagios::Checkmk::Addtag["$title"] -> Class['nagios::checkmk::build_exported_resource']
}

#
# build_exported_resource - This is where the config is actually generated.
#
# We define the config file fragment (from a template) and an exec that will inventory
# the host. It is important to note that, although these resources are created on the client,
# they are actually realised on the monitoring server.
#
class nagios::checkmk::build_exported_resource {
  $mk_confdir = "/etc/check_mk/conf.d/puppet"
  $checkmk_no_resolve = true
  
  if $::fqdn {
    $mkhostname = $::fqdn
  } else {
    $mkhostname = $::hostname
  }

  if $ec2_public_ipv4 {
    $override_ip = $ec2_public_ipv4
  } else {
    $override_ip = $ipaddress
  }

  # Running 'puppet node clean --unexport'
  # on the puppet master will cause these resources to be cleanly
  # removed from the check_mk server.

  # the exported file resource; the template will create a valid snippet
  # of python code in a file named after the host
  @@file { "$mk_confdir/$mkhostname.mk":
    content => template("nagios/collection.mk.erb"),
    notify  => Exec["checkmk_inventory_$mkhostname"],
    tag     => "checkmk_conf",
  }

  @@exec { "checkmk_inventory_$mkhostname":
    command     => "/usr/bin/cmk -I $mkhostname",
    notify      => Exec["checkmk_refresh"],
    refreshonly => true,
    tag         => "checkmk_inventory",
    onlyif      => "test -f $mk_confdir/$mkhostname.mk",
  }

}

#
# Import_Resources. This is called only on the monitoring server and realises 
# the resources from the above sections. It appears to be important that this code
# is in the same file as the client side code.
# The overall result is that:
#   We should get an appropriately named file in /etc/check_mk/conf.d/puppet/
#   "/usr/bin/cmk -I hostname" should be called to inventory the host
#   "/usr/bin/cmk -O" should be executed to rebuild the config
#
# If we remove a host from puppet (by running "puppet node clean --unexport"), the
# host's monitoring config should be neatly removed.
#
class nagios::checkmk::import_resources {
  $mk_confdir = "/etc/check_mk/conf.d/puppet"

  exec { "checkmk_refresh":
    command     => "/usr/bin/cmk -O",
    refreshonly => true,
  }

  # Realise all the file fragments exported from the monitored hosts
  File <<| tag == "checkmk_conf" |>>
  # in addition, each one will have a corresponding exec resource, used to re-inventory changes
  Exec <<| tag == "checkmk_inventory" |>>

  file { "$mk_confdir":
    ensure  => directory,
    purge   => true,
    recurse => true,
    notify  => Exec["checkmk_refresh"],
  }
}

Config template (collection.mk.erb) - Essentially a block of embedded Ruby, this iterates through the resources in the client's catalogue looking for instances of 'Nagios::Checkmk::Addtag'. The tag names are collected into an array and, on the last line, converted to a pipe-separated list of host tags.

<% mktags = [] -%>
<% system "host #{mkhostname}" -%>
<% if (has_variable?("checkmk_no_resolve") || ( $? != 0 )) then -%>
ipaddresses["<%= mkhostname -%>"] = "<%= override_ip %>"
<% mktags << "brokendns" -%>
<% end -%>
<% scope.catalog.resource_keys.select { |r,name| r == "Nagios::Checkmk::Addtag"}.each do |r,name| -%>
<% mktags << name -%>
<% end -%>
all_hosts += ["<%= mkhostname -%>|<%= mktags.sort.join('|') %>",]
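
To make the flow concrete: for web01.matsharpe.com with the apache2 class applied (and assuming a base class contributing the 'linux' tag, as sketched earlier), the template renders a fragment like the one shown at the start of the post:

ipaddresses["web01.matsharpe.com"] = "1.2.3.4"

all_hosts += ["web01.matsharpe.com|brokendns|http|linux",]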

Putting it all together! Finally, we need to call the config into existence on the monitoring server. Since we already have a Puppet module called 'nagios', this can be done with a server.pp file that includes something like the following:

class nagios::server {
  include nagios::checkmk::import_resources
}
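
The class just needs applying to the monitoring server in your node definitions; a minimal sketch, with a hypothetical node name:

node 'monitor.matsharpe.com' {
  include nagios::server
}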

And there you have it - fully automated CheckMK configs!