DeutschEnglish

Submenu

 - - - By CrazyStat - - -

13. August 2014

Icinga: Monitor refused mails in postfix mailqueue

Filed under: Linux,Server Administration — Tags: , , , , , , , , , , — Christopher Kramer @ 12:09

In case your server gets listed on blacklists, mails will get refused by destination servers and stick in the deferred mail queue for some time until the sender finally gets a mailer daemon.

As it takes some time until the sender gets the mailer daemon and informs the server admin, it would be better if you could directly get notified by Icinga/Nagios when a mail is in the deferred queue because the destination server refused it.

Therefore I wrote a small shell script which I want to share with you here. I am assuming Debian Wheezy with Icinga and a postfix mailserver.

Create the shell script with the actual plugin in

/usr/lib/nagios/plugins/check_mailq_blacklist :
#!/bin/sh
# detects if mails in mail queue were refused by destination server (because of blacklist?)
# From https://blog.christosoft.de/2014/08/icinga-monitor-refused-mails-postfix-mailqueue/
# Version: 2017-03-07

if mailq | grep -qP "(refused to talk to me(?!(.*out of connection slots)))|(unsolicited mail originating from your IP)|(temporarily deferred due to user complaints)"
then
  mails=`mailq | grep -oP "(refused to talk to me(?!(.*out of connection slots)))|(unsolicited mail originating from your IP)|(temporarily deferred due to user complai$
  echo "$mails mail(s) were refused, check mailq!"
  if [ "$mails" -le 10 ] && [ "$mails" -gt 1 ]; then
    # 2-10 mails -> warning
    echo "\nWarning. | refused=$mails;2;11;0"
    return 1;
  fi
  if [ "$mails" -gt 10 ]; then
    # more than 10 mails -> critical
    echo "\nCriticial! | refused=$mails;2;11;0"
    return 2;
  fi
  return 1;
else
  echo "Ok, there seems to be no refused mail in the mailq | refused=0;2;11;0"
  exit 0;
fi

This will check for the texts “refused to talk to me” (not followed by “out of connection slots”) and “unsolicited mail originating from your IP” in the mailq output. These are the most common errors you get when the destination server has your server’s IP blacklisted.  In case at least one mail was refused, this causes a warning state in icinga. If more than 10 mails were refused, it causes a critical state.

Now you need to make this script executable:

chmod +x /usr/lib/nagios/plugins/check_mailq_blacklist

Now create the config file for the plugin in

/etc/nagios-plugins/config/mailq_blacklist.cfg :
# 'check_mailq_blacklist' command definition
define command{
        command_name    check_mailq_blacklist
        command_line    /usr/lib/nagios/plugins/check_mailq_blacklist
}

So now we have the command and need to define a service that uses it. Let’s say we use this locally for localhost. In

/etc/icinga/objects/localhost_icinga.cfg

add:

define service{
        use                             generic-service
        host_name                       localhost
        service_description             Mail Queue Refused Mail
        check_command                   check_mailq_blacklist
        }

This is it, just restart icinga and you are done:

service icinga restart

I hope this is of use to somebody.

Of course it is also useful to monitor in Icinga, if you are on some of the most used blacklists. A script to do this can be found here.

Recommendation

Try my Open Source PHP visitor analytics script CrazyStat.

7. February 2013

Icinga / Nagios: Notify a group of contacts about a group of hosts

In Nagios/Icinga, you can easily define which contacts or contact groups get notified for a certain service in the service definition:

 define service{
        host_name               linux-server
        service_description     check-disk-sda1
        check_command           check-disk!/dev/sda1
        max_check_attempts      5
        check_interval          5
        retry_interval          3
        check_period            24x7
        notification_interval   30
        notification_period     24x7
        notification_options    w,c,r
        contact_groups          linux-admins
        }

(Source of this example: Icinga documentation)

So only contacts of the contact group “linux-admins” would be informed about problems regarding this service.

You could also use the “contacts” directive to list individual contacts or list multiple contact groups.

But often, the responsibility of admins is not defined through services, but through hosts. Usually, there is a group of admins for linux servers and one for windows servers. Or a group for intranet servers and one for internet servers. As admins usually are annoyed if they get notifications about servers they are not responsible for, it is usually a good idea to only notify those admins that are responsible.

So you can also do this at the host-definition:

 define host{
        host_name                       bogus-router
        alias                           Bogus Router #1
        address                         192.168.1.254
        parents                         server-backbone
        check_command                   check-host-alive
        check_interval                  5
        retry_interval                  1
        max_check_attempts              5
        check_period                    24x7
        process_perf_data               0
        retain_nonstatus_information    0
        contact_groups                  router-admins
        notification_interval           30
        notification_period             24x7
        notification_options            d,u,r
        }

(Source of example: icinga documentation)

So only the contact_group “router_admins” would be notified for this host.

But one thing where the “contacts” and “contact_groups” directive is missing, is the hostgroups definition. It is not possible to directly assign a contact group  or list of contacts to a hostgroup or the other way round. So here is how it can be done with another type of definition.

Group your hosts

First, define a group of hosts for each group of admins. So for example, group all intranet servers in one and all internet servers in another group. You probably already did this.

define hostgroup{
        hostgroup_name          intranet-servers
        alias                   Intranet Servers
        members                 intra1, intra2, intra3
}
define hostgroup{
        hostgroup_name          internet-servers
        alias                   Internet Servers
        members                 inter1, inter2, inter3
}

See the icinga documentation for details. Note to use the shortnames in “members”.

You can also define things the other way round: When defining a host, say which hostgroup it belongs to:

define host{
        use                     generic-host
        host_name               intra1
        alias                   intra1.local
        address                 192.168.10.1
        hostgroups              intranet-servers
        }

See documentation for details.

Group your contacts

Next, group your contacts. So create a contact-group for each group of admins so we can later assign this contact group to the corresponding group of hosts.

Example:

define contactgroup{
        contactgroup_name       intranet-admins
        alias                   Intranet Administrators
        members                 alice, bob
        }
define contactgroup{
        contactgroup_name       internet-admins
        alias                   Internet Administrators
        members                 charley
        }

See documentation. Again, you can also define it the other way round (list the contact groups at the contact-definition).

Assign contact groups to host groups

Now comes the interesting part. To do this, we use a “Hostescalation definition“.

Example:

 define hostescalation{
        hostgroup_name          intranet-servers
        first_notification      1
        last_notification       0
        notification_interval   60
        contact_groups          intranet-admins
        }

 define hostescalation{
        hostgroup_name          internet-servers
        first_notification      1
        last_notification       0
        notification_interval   60
        contact_groups          internet-admins
        }

This will make sure internet-admins get informed about internet-servers and intranet-admins about intranet-servers. “last-notification 0” means that all notifications will get sent to this group of contacts. You can adjust the notification_interval (in minutes) if you want.

The cool thing here is that you can also define that if the problem still occurs after 5 notifications, the other team of admins gets notified:

define hostescalation{
        hostgroup_name          intranet-servers
        first_notification      1
        last_notification       3
        notification_interval   30
        contact_groups          intranet-admins
        }
define hostescalation{
        hostgroup_name          intranet-servers
        first_notification      4
        last_notification       0
        notification_interval   60
        contact_groups          internet-admins, intranet-admins
        }

This would notify “intranet-admins” 3 times (every 30 minutes) about problems with “intranet-servers”. If the problem is still not solved, “internet-admins” will get notified as well. So the internet-admins won’t get bothered with short problems that the intranet-admins can fix, but will still get informed if the problem is not solved for some time.

More information on hostescalation and serviceescalation in the documentation here, here and here.

I hope this helped somebody.

30. January 2013

Nagios / Icinga: Monitor (local) memory usage

Filed under: Linux,Server Administration — Tags: , , , , , , , , , , , , , — Christopher Kramer @ 17:55

Nagios and its fork icinga are great monitoring tools. They come with a bundle of plugins to monitor standard services such as HTTP, SMTP, POP3, load and stuff like that. And there are lots of 3rd party plugins available for almost everything else you can think of.

But one standard thing that is missing in the official nagios-plugins package is a plugin to check memory usage (of the local machine).

So here is how to install one. I assume a Debian system with Icinga running – you might want to adjust paths for other distros or nagios.

  1. Download the plugin here
    e.g. from the shell:

    wget https://exchange.icinga.com/exchange/check_memory/files/784/check_memory.pl
  2. Then move the file to the other plugins
    mv check_memory.pl /usr/lib/nagios/plugins/check_memory.pl
  3. Make it executable
    chmod +x /usr/lib/nagios/plugins/check_memory.pl
  4. Try to run it:
    perl /usr/lib/nagios/plugins/check_memory.pl -w 50% -c 25%
  5. This should give something like “CHECK_MEMORY OK – […] free […]”. If an error occurs, you probably need to install the perl module Nagios::Plugin. On Debian, the easiest way is:
    apt-get install libnagios-plugin-perl

    On other distros, you might use CPAN:

    perl -MCPAN -e 'install Nagios::Plugin'

    This will ask you lots of questions and install lots of dependencies (where you should say “yes”).

  6. Configure the check_memory command. To do this, create a file /etc/nagios-plugins/config/memory.cfg with this content:
    # 'check_memory' command definition
    define command{
            command_name    check_memory
            command_line    perl /usr/lib/nagios/plugins/check_memory.pl -w $ARG1$ -c $ARG2$
            }
  7. Now you can use the check_memory command to define a service. For example, add this to /etc/icinga/objects/localhost_icinga.cfg (assuming you define localhost-services there):
    define service{
            use                             generic-service
            host_name                       localhost
            service_description             Memory
            check_command                   check_memory!50%!25%
            }

    This will send you a warning when memory usage is 50% and critical when only 25% is free. You might want to adjust these values of course depending on what is normal on your system and how early you want to be notified.

  8. Check your configuration:
    /usr/local/icinga/bin/icinga -v /etc/icinga/icinga.cfg
  9. Restart Icinga / Nagios if the preflight-check was okay:
    /etc/init.d/icinga restart

This should be it.

I hope this helped somebody.

To monitor memory usage of a remote server, you’ll need SNMP for example. Maybe I’ll post another blog post on this soon.