- - - By CrazyStat - - -

26. February 2017

Postgrey: let nagios/icinga tell you what domains might need whitelisting

Filed under: Linux, Server Administration | Christopher Kramer @ 16:58

The problem: greylisting significantly delays mail from big mail providers with many servers

I still think greylisting with postgrey is very effective against spam. But now and then, mail from some senders gets delayed for hours. This is especially the case for big mail service providers like Amazon SES, Microsoft Exchange Online Protection or Mailchimp. These providers have lots of servers and IP addresses, and the second delivery attempt usually comes from a different server than the first. Thus, postgrey does not find a matching triplet and defers the second attempt as well. This continues until one of the mail servers retries and postgrey finds a matching triplet. This might take hours and thus cause significant delays.

Therefore, if you want to use postgrey without delaying lots of mail for several hours, you need to whitelist these big email services with many servers. As new services come up and others go, you need a way to find out which servers try to send mail to your server and frequently get deferred because no triplet is found.

The solution: Check your mail logs

To get an idea of what is happening and which servers might need whitelisting, I wrote a small command that analyzes your mail.log-files. Here it is:

grep -ohP 'reason=new, client_name=[^,.]+\.[^,.]+\.\K[^,.]+\.[^,]+' /var/log/mail.log* | sort | uniq -c | sort -nr | less

So what does this do?

It scans your /var/log/mail.log* files for lines containing reason=new, which is what postgrey logs when no triplet was found. It then takes the host name of the server that is trying to send the mail to you and strips its first two labels, as these might vary between the servers a mail service provider uses (host names with fewer than four labels are not matched at all). Finally, it counts the occurrences of each remaining domain and sorts them by frequency, most frequent first.
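To see what the pipeline does without touching real logs, here is a small sketch that feeds it a few made-up postgrey log lines (the host names are under example.com and example.net, and the log line layout is assumed for illustration). The two `[^,.]+\.` groups before `\K` drop the first two labels of the client name, so a1.out.mail.example.com is counted as mail.example.com. The function names `sample_log` and `count_greylisted` are hypothetical and only used here.

```bash
#!/bin/bash
# Made-up postgrey log lines for illustration only
sample_log() {
  cat <<'EOF'
Feb 26 10:00:01 mx postgrey[811]: action=greylist, reason=new, client_name=a1.out.mail.example.com, client_address=192.0.2.1, sender=a@example.com, recipient=me@example.org
Feb 26 10:05:02 mx postgrey[811]: action=greylist, reason=new, client_name=b7.out.mail.example.com, client_address=192.0.2.2, sender=a@example.com, recipient=me@example.org
Feb 26 10:07:13 mx postgrey[811]: action=greylist, reason=new, client_name=host-1.dyn.isp.example.net, client_address=198.51.100.9, sender=x@example.net, recipient=me@example.org
Feb 26 10:20:00 mx postgrey[811]: action=pass, reason=triplet found, client_name=a1.out.mail.example.com, client_address=192.0.2.1, sender=a@example.com, recipient=me@example.org
EOF
}

# The pipeline from above, reading stdin instead of /var/log/mail.log* and without "less"
count_greylisted() {
  grep -ohP 'reason=new, client_name=[^,.]+\.[^,.]+\.\K[^,.]+\.[^,]+' | sort | uniq -c | sort -nr
}

sample_log | count_greylisted
```

This should list mail.example.com with a count of 2 and isp.example.net with a count of 1; the line with reason=triplet found is not counted.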

The result of this command might look like this (stripped after some lines):


You should especially check the lines that have a lot of occurrences. They might be domains of big email service providers that need whitelisting for the reasons explained above. For example, the mail from and is mail sent by customers of these mail providers that usually is not spam. Moreover, the mail servers of these providers retry delivery many times, so greylisting would not block their spam anyway; it would only cause significant delays. So these domains should be whitelisted.

But some domains in this list might simply belong to ISPs whose dynamic IP addresses are actually used to send spam. For example, seems to be a big ISP from Thailand, and multiple dynamic IP addresses from this ISP tried to send spam, resulting in 17 occurrences where no triplet was found. This is mail that you want to be blocked or delayed.

Once you have decided which domains you want to whitelist, add them to your whitelist file (on Debian this is usually /etc/postgrey/whitelist_clients).

The entries should look like this:


Note: You should escape all dots with a backslash, as done in the example. Otherwise, a dot matches any character, so would also whitelist, which might be a domain that spammers register on purpose to get through servers that use inaccurate whitelists.
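The effect of an unescaped dot is easy to try out. This sketch uses `grep -E` as a stand-in for regular-expression matching of whitelist entries (an assumption for illustration, not postgrey's own code), with hypothetical host names and a hypothetical helper `matches`: mx1.mailxexample.com slips through the unescaped pattern but not through the escaped one.

```bash
#!/bin/bash
# matches REGEX HOSTNAME: succeeds if HOSTNAME matches the extended regex REGEX
matches() {
  printf '%s\n' "$2" | grep -qE -- "$1"
}

for host in mx1.mail.example.com mx1.mailxexample.com; do
  for re in 'mail\.example\.com$' 'mail.example.com$'; do
    if matches "$re" "$host"; then result="match"; else result="no match"; fi
    printf '%-22s %-22s %s\n' "$re" "$host" "$result"
  done
done
```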

Automation with nagios/icinga

You could run the above command from time to time to check whether new services need whitelisting. But it would be easier to get notified when a domain that you have not checked yet pops up in this list, right?

Therefore, I wrote a small nagios/icinga plugin that checks if any new domains appear on this list. It is a bit quick and dirty but does its job.

The main part is a bash script that you could place in /usr/lib/nagios/plugins/check_postgrey_whitelist :


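#!/bin/bash
# file to scan and list of checked domains (adjust to your system)
log=/var/log/mail.log.1
ignoreFile=/etc/nagios-plugins/postgrey_no_whitelist
OK=0; WARNING=1   # Nagios plugin exit codes

# sender domains greylisted at least 15 times, sorted for comm
hosts=$(grep -ohP 'reason=(new|early-retry[^,]*), client_name=[^,.]+\.[^,.]+\.\K[^,.]+(\.co|\.com)\.?[^,]+' "$log" | sort | uniq -c | awk '$1>=15{print $2}' | sort);
noWhitelist=$(sort "$ignoreFile");

# domains that are not in the ignore file, joined into one line
diff=$(comm -23 <(echo "$hosts") <(echo "$noWhitelist") | sed ':a;N;$!ba;s/\n/ /g');

if [ -z "$diff" ]; then
        echo "Okay, the top domains are whitelisted or ignored";
        exit $OK;
else
        echo "Whitelist or ignore these senders: $diff";
        exit $WARNING;
fi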

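This script ignores all domains listed in /etc/nagios-plugins/postgrey_no_whitelist (one per line), assuming you have checked them and decided not to whitelist them. It only reports domains that were greylisted at least 15 times in the log. So create this file as well and put in the domains that you checked and do not want to whitelist. Also make the script executable (chmod +x /usr/lib/nagios/plugins/check_postgrey_whitelist).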
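The core of the plugin is a set difference: `comm -23` prints the lines that appear only in the first input, and it only works correctly if both inputs are sorted, which is why the script sorts both lists. Here is that step as a separate sketch with a hypothetical function `new_domains` and made-up domain lists. It joins the lines with `paste -sd ' ' -`, which does the same job as the sed one-liner in the script.

```bash
#!/bin/bash
# new_domains FOUND IGNORED: print the entries of FOUND (one per line) that are
# not in IGNORED, joined by spaces. comm requires both inputs to be sorted.
new_domains() {
  comm -23 <(printf '%s\n' "$1" | sort) <(printf '%s\n' "$2" | sort) | paste -sd ' ' -
}

found=$'mail.example.com\nisp.example.net\nnews.example.org'
ignored=$'isp.example.net'
new_domains "$found" "$ignored"
```

With these lists it prints mail.example.com news.example.org, which is what the plugin would then report in its warning.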

Now we need to define the command for this plugin:

Create the configuration where your other plugins are defined, e.g. /etc/nagios-plugins/config/postgrey_whitelist.cfg

define command {
        command_name    check_postgrey_whitelist
        command_line    /usr/lib/nagios/plugins/check_postgrey_whitelist
        }

Now you need to create a service for your localhost that runs this command.

In your localhost service definition, e.g. /etc/icinga/objects/localhost_icinga.cfg , add this:

define service{
        use                             generic-service
        host_name                       localhost
        service_description             Postgrey Whitelist
        check_command                   check_postgrey_whitelist
        }

Finally, make sure that nagios/icinga has read access to your mail log file. You can adjust which file to check in the script above; I chose mail.log.1. So you might chown this file to the nagios user like this:

chown nagios:adm /var/log/mail.log.1

Of course, logrotate also needs to know that the log files it creates should get this owner; otherwise the next rotation undoes the change. On a Debian system, you can configure this in /etc/logrotate.d/rsyslog. Adjust it similar to this:

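        # first definition: the other log files as before, but without /var/log/mail.log
        {
                rotate 4
                # (remaining options unchanged)
                postrotate
                        invoke-rc.d rsyslog rotate > /dev/null
                endscript
        }

        /var/log/mail.log
        {
                rotate 4
                create 440 nagios adm
                # (remaining options the same as in the first definition)
                postrotate
                        invoke-rc.d rsyslog rotate > /dev/null
                endscript
        }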

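So I removed mail.log from the first logrotate definition and created a new one below it, which is exactly the same but contains the extra line create 440 nagios adm. This tells logrotate which owner, group and permissions to give the newly created log file, which becomes mail.log.1 at the next rotation. Since the plugin reads mail.log.1 as plain text, this also relies on that file not being compressed yet, which is what the delaycompress option ensures.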

You might need to adjust some stuff depending on your distro, but I hope this helps somebody to set this up.

Update 06.03.2017: Adjusted script so it does not consider second-level domains like or as single senders anymore.



9. December 2014

Icinga: Group all services in a servicegroup instead of using a wildcard

Filed under: Linux, Server Administration | Christopher Kramer @ 16:15

In some places you can use the * wildcard as a service description (this requires use_regexp_matching=0), but sometimes it does not seem to work and Icinga reports:

Error: Could not expand services specified

Therefore, I simply wanted to group all services in one servicegroup. That’s quite easy if you use a generic service template as a basis for all services. First, create a servicegroup “allservices”:

define servicegroup {
        servicegroup_name               allservices
        alias                           All Services
        }

Then edit your generic service template (see servicegroups line):

# generic service template definition
define service{
        name                            generic-service ; The 'name' of this service template
        active_checks_enabled           1       ; Active service checks are enabled
        passive_checks_enabled          1       ; Passive service checks are enabled/accepted
        parallelize_check               1       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1       ; We should obsess over this service (if necessary)
        check_freshness                 0       ; Default is to NOT check service 'freshness'
        notifications_enabled           1       ; Service notifications are enabled
        event_handler_enabled           1       ; Service event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        failure_prediction_enabled      1       ; Failure prediction is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        notification_interval           0               ; Only send notifications on status change by default.
        is_volatile                     0
        check_period                    24x7
        normal_check_interval           5
        retry_check_interval            1
        max_check_attempts              4
        notification_period             24x7
        notification_options            w,u,c,r
        contact_groups                  admins
        servicegroups                   allservices ; ADD THIS TO ADD ALL SERVICES INTO THE allservices GROUP
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }

This assumes all your services use this template like this (see use line):

define service {
        hostgroup_name                  ssh-servers
        service_description             SSH
        check_command                   check_ssh
        use                             generic-service  ; USE THE TEMPLATE ABOVE
        notification_interval           0 ; set > 0 if you want to be renotified
        }
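If you are not sure that every service really uses the template, a rough check can help. The following sketch (`services_without_template` is a hypothetical helper, not part of Icinga) scans config files with awk and prints the service_description of each `define service` block that has no direct `use generic-service` line. It assumes one directive per line and the closing brace on its own line, and it does not follow template chains, so services that inherit generic-service through another template are reported too.

```bash
#!/bin/bash
# services_without_template FILE...: print the service_description of every
# "define service" block without a direct "use generic-service" line.
services_without_template() {
  awk '
    /^[ \t]*define[ \t]+service[ \t]*[{]/ { inblock = 1; uses = 0; desc = ""; next }
    inblock && /^[ \t]*use[ \t]+generic-service([ \t;]|$)/ { uses = 1 }
    inblock && /^[ \t]*service_description[ \t]/ {
      desc = $0
      sub(/^[ \t]*service_description[ \t]+/, "", desc)  # drop the directive name
      sub(/[ \t]*;.*$/, "", desc)                        # drop a trailing comment
      sub(/[ \t]+$/, "", desc)                           # drop trailing whitespace
    }
    inblock && /^[ \t]*[}]/ { if (!uses && desc != "") print desc; inblock = 0 }
  ' "$@"
}
```

For example, services_without_template /etc/icinga/objects/*.cfg lists the services that would not end up in the allservices group.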


Now you can easily use this servicegroup for example in serviceescalations (see servicegroup_name line):

 define serviceescalation{
        hostgroup_name          intranet-servers
        servicegroup_name       allservices
        first_notification      1
        last_notification       0
        notification_interval   1440
        contact_groups          intranet-admins
        }

Hope this helps somebody. I guess it works the same way in Nagios.

13. August 2014

Icinga: Monitor refused mails in postfix mailqueue

Filed under: Linux, Server Administration | Christopher Kramer @ 12:09

If your server gets listed on blacklists, destination servers will refuse its mails, which then stay in the deferred mail queue for some time until the sender finally receives a bounce from the mailer daemon.

As it takes some time until the sender gets the bounce and informs the server admin, it would be better to get notified by Icinga/Nagios directly when a mail is in the deferred queue because the destination server refused it.

Therefore I wrote a small shell script which I want to share with you here. I am assuming Debian Wheezy with Icinga and a postfix mailserver.

Create the shell script with the actual plugin in

/usr/lib/nagios/plugins/check_mailq_blacklist :
#!/bin/bash
# detects if mails in mail queue were refused by destination server (because of blacklist?)
# From
# Version: 2017-03-07

# refusal texts ("out of connection slots" is excluded, it is not a blacklisting)
pattern='(refused to talk to me(?!(.*out of connection slots)))|(unsolicited mail originating from your IP)|(temporarily deferred due to user complaints)'

if mailq | grep -qP "$pattern"; then
  # count every refusal found in the queue
  mails=$(mailq | grep -oP "$pattern" | wc -l)
  echo "$mails mail(s) were refused, check mailq!"
  if [ "$mails" -le 10 ] && [ "$mails" -gt 1 ]; then
    # 2-10 mails -> warning
    echo -e "\nWarning. | refused=$mails;2;11;0"
    exit 1
  fi
  if [ "$mails" -gt 10 ]; then
    # more than 10 mails -> critical
    echo -e "\nCritical! | refused=$mails;2;11;0"
    exit 2
  fi
  # a single refused mail -> warning as well
  exit 1
else
  echo "Ok, there seems to be no refused mail in the mailq | refused=0;2;11;0"
  exit 0
fi

This checks the mailq output for the texts “refused to talk to me” (not followed by “out of connection slots”, which usually points to a busy server rather than a blacklisting), “unsolicited mail originating from your IP” and “temporarily deferred due to user complaints”. These are the most common errors you get when the destination server has your server’s IP blacklisted. If at least one mail was refused, this causes a warning state in icinga. If more than 10 mails were refused, it causes a critical state.
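To check what the pattern counts before relying on it, you can run it against a few sample lines. The lines below are made up to resemble the deferral reasons shown in mailq output (hosts and SMTP replies are invented), and `count_refused` is a hypothetical helper that uses the same grep call as the plugin. This needs GNU grep with -P support.

```bash
#!/bin/bash
# Same pattern as in check_mailq_blacklist
pattern='(refused to talk to me(?!(.*out of connection slots)))|(unsolicited mail originating from your IP)|(temporarily deferred due to user complaints)'

# count_refused: count refusals in mailq-style text read from stdin
count_refused() {
  grep -oP "$pattern" | wc -l
}

# Made-up reasons: a refusal, a connection-slot limit (ignored) and an
# "unsolicited mail" rejection
cat <<'EOF' | count_refused
(host mx.example.com[192.0.2.25] refused to talk to me: 554 5.7.1 Service unavailable)
(host mx.example.net[192.0.2.26] refused to talk to me: 421 4.7.0 out of connection slots)
(host mx.example.org[192.0.2.27] said: 550 5.7.1 unsolicited mail originating from your IP address (in reply to end of DATA command))
EOF
```

It prints 2: the second line is skipped by the negative lookahead.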

Now you need to make this script executable:

chmod +x /usr/lib/nagios/plugins/check_mailq_blacklist

Now create the config file for the plugin in

/etc/nagios-plugins/config/mailq_blacklist.cfg :
# 'check_mailq_blacklist' command definition
define command{
        command_name    check_mailq_blacklist
        command_line    /usr/lib/nagios/plugins/check_mailq_blacklist
        }

So now we have the command and need to define a service that uses it. Let’s say we use this locally for localhost. In



define service{
        use                             generic-service
        host_name                       localhost
        service_description             Mail Queue Refused Mail
        check_command                   check_mailq_blacklist
        }

This is it, just restart icinga and you are done:

service icinga restart

I hope this is of use to somebody.

Of course, it is also useful to monitor in Icinga whether your server is listed on any of the most widely used blacklists. A script to do this can be found here.

7. February 2013

Icinga / Nagios: Notify a group of contacts about a group of hosts

In Nagios/Icinga, you can easily define which contacts or contact groups get notified for a certain service in the service definition:

 define service{
        host_name               linux-server
        service_description     check-disk-sda1
        check_command           check-disk!/dev/sda1
        max_check_attempts      5
        check_interval          5
        retry_interval          3
        check_period            24x7
        notification_interval   30
        notification_period     24x7
        notification_options    w,c,r
        contact_groups          linux-admins
        }

(Source of this example: Icinga documentation)

So only contacts of the contact group “linux-admins” would be informed about problems regarding this service.

You could also use the “contacts” directive to list individual contacts or list multiple contact groups.

But often, the responsibility of admins is not defined through services, but through hosts. Usually, there is a group of admins for Linux servers and one for Windows servers, or a group for intranet servers and one for internet servers. As admins are usually annoyed by notifications about servers they are not responsible for, it is a good idea to notify only the admins who are responsible.

So you can also do this in the host definition:

 define host{
        host_name                       bogus-router
        alias                           Bogus Router #1
        parents                         server-backbone
        check_command                   check-host-alive
        check_interval                  5
        retry_interval                  1
        max_check_attempts              5
        check_period                    24x7
        process_perf_data               0
        retain_nonstatus_information    0
        contact_groups                  router-admins
        notification_interval           30
        notification_period             24x7
        notification_options            d,u,r
        }

(Source of example: icinga documentation)

So only the contact group “router-admins” would be notified for this host.

However, the hostgroup definition has no “contacts” or “contact_groups” directive: it is not possible to directly assign a contact group or a list of contacts to a hostgroup, or the other way round. Here is how it can be done with another type of definition instead.

Group your hosts

First, define a group of hosts for each group of admins. So for example, group all intranet servers in one and all internet servers in another group. You probably already did this.

define hostgroup{
        hostgroup_name          intranet-servers
        alias                   Intranet Servers
        members                 intra1, intra2, intra3
        }

define hostgroup{
        hostgroup_name          internet-servers
        alias                   Internet Servers
        members                 inter1, inter2, inter3
        }

See the icinga documentation for details. Note that “members” takes the short names (host_name) of the hosts.

You can also define things the other way round: When defining a host, say which hostgroup it belongs to:

define host{
        use                     generic-host
        host_name               intra1
        alias                   intra1.local
        hostgroups              intranet-servers
        }

See documentation for details.

Group your contacts

Next, group your contacts. So create a contact-group for each group of admins so we can later assign this contact group to the corresponding group of hosts.


define contactgroup{
        contactgroup_name       intranet-admins
        alias                   Intranet Administrators
        members                 alice, bob
        }

define contactgroup{
        contactgroup_name       internet-admins
        alias                   Internet Administrators
        members                 charley
        }

See documentation. Again, you can also define it the other way round (list the contact groups at the contact-definition).

Assign contact groups to host groups

Now comes the interesting part. To do this, we use a “Hostescalation definition“.


define hostescalation{
        hostgroup_name          intranet-servers
        first_notification      1
        last_notification       0
        notification_interval   60
        contact_groups          intranet-admins
        }

define hostescalation{
        hostgroup_name          internet-servers
        first_notification      1
        last_notification       0
        notification_interval   60
        contact_groups          internet-admins
        }

This will make sure internet-admins get informed about internet-servers and intranet-admins about intranet-servers. “last_notification 0” means that the escalation has no upper limit, so all notifications from the first one onward get sent to this group of contacts. You can adjust the notification_interval (in minutes) if you want.

The cool thing here is that you can also define that if the problem still occurs after 3 notifications, the other team of admins gets notified as well:

define hostescalation{
        hostgroup_name          intranet-servers
        first_notification      1
        last_notification       3
        notification_interval   30
        contact_groups          intranet-admins
        }

define hostescalation{
        hostgroup_name          intranet-servers
        first_notification      4
        last_notification       0
        notification_interval   60
        contact_groups          internet-admins, intranet-admins
        }

This would notify “intranet-admins” 3 times (every 30 minutes) about problems with “intranet-servers”. If the problem is still not solved, “internet-admins” will get notified as well. So the internet-admins won’t get bothered with short problems that the intranet-admins can fix, but will still get informed if the problem is not solved for some time.
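The rule behind first_notification and last_notification can be spelled out in a few lines: an escalation applies to notification number N if first_notification <= N and either last_notification is 0 or N <= last_notification. This sketch only mirrors the two intranet-servers escalations above with a hypothetical helper `groups_for`; it does not talk to Icinga.

```bash
#!/bin/bash
# One line per escalation: first_notification last_notification contact_groups
escalations='1 3 intranet-admins
4 0 internet-admins, intranet-admins'

# groups_for N: print the contact groups of every escalation that applies to notification N
groups_for() {
  local n=$1
  printf '%s\n' "$escalations" | while read -r first last groups; do
    if [ "$n" -ge "$first" ] && { [ "$last" -eq 0 ] || [ "$n" -le "$last" ]; }; then
      echo "notification $n -> $groups"
    fi
  done
}

for n in 1 3 4 10; do groups_for "$n"; done
```

Notifications 1 and 3 go to intranet-admins only; from notification 4 onward, both groups are listed.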

More information on host and service escalations can be found in the documentation here, here and here.

I hope this helped somebody.

30. January 2013

Nagios / Icinga: Monitor (local) memory usage

Filed under: Linux, Server Administration | Christopher Kramer @ 17:55

Nagios and its fork icinga are great monitoring tools. They come with a bundle of plugins to monitor standard services such as HTTP, SMTP, POP3, load and the like. And there are lots of 3rd party plugins available for almost everything else you can think of.

But one standard thing that is missing in the official nagios-plugins package is a plugin to check memory usage (of the local machine).

So here is how to install one. I assume a Debian system with Icinga running – you might want to adjust paths for other distros or nagios.

  1. Download the plugin here
    e.g. from the shell:

  2. Then move the file to the other plugins
    mv /usr/lib/nagios/plugins/
  3. Make it executable
    chmod +x /usr/lib/nagios/plugins/
  4. Try to run it:
    perl /usr/lib/nagios/plugins/ -w 50% -c 25%
  5. This should give something like “CHECK_MEMORY OK – […] free […]”. If an error occurs, you probably need to install the perl module Nagios::Plugin. On Debian, the easiest way is:
    apt-get install libnagios-plugin-perl

    On other distros, you might use CPAN:

    perl -MCPAN -e 'install Nagios::Plugin'

    This will ask you lots of questions and install lots of dependencies; answer “yes” to them.

  6. Configure the check_memory command. To do this, create a file /etc/nagios-plugins/config/memory.cfg with this content:
    # 'check_memory' command definition
    define command{
            command_name    check_memory
            command_line    perl /usr/lib/nagios/plugins/ -w $ARG1$ -c $ARG2$
            }
  7. Now you can use the check_memory command to define a service. For example, add this to /etc/icinga/objects/localhost_icinga.cfg (assuming you define localhost-services there):
    define service{
            use                             generic-service
            host_name                       localhost
            service_description             Memory
            check_command                   check_memory!50%!25%
            }

    This will send you a warning when only 50% of the memory is free and a critical alert when only 25% is free. You might want to adjust these values, of course, depending on what is normal on your system and how early you want to be notified.

  8. Check your configuration:
    /usr/local/icinga/bin/icinga -v /etc/icinga/icinga.cfg
  9. Restart Icinga / Nagios if the preflight-check was okay:
    /etc/init.d/icinga restart

This should be it.
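The thresholds in step 7 are percentages of free memory. As a rough illustration of what such a check evaluates, here is a sketch with two hypothetical helpers; it is not the plugin's own code. `free_percent` reads MemTotal and MemAvailable from /proc/meminfo (Linux only, and MemAvailable may be counted differently from how the plugin counts free memory), and `memory_state` maps a free percentage to OK, WARNING or CRITICAL with the Nagios return codes 0, 1 and 2.

```bash
#!/bin/bash
# free_percent [FILE]: MemAvailable as a percentage of MemTotal (/proc/meminfo format)
free_percent() {
  awk '/^MemTotal:/ {t = $2} /^MemAvailable:/ {a = $2} END {if (t > 0) print int(a * 100 / t)}' "${1:-/proc/meminfo}"
}

# memory_state PCT WARN CRIT: thresholds are percentages of free memory, like -w 50% -c 25%
memory_state() {
  if [ "$1" -le "$3" ]; then echo CRITICAL; return 2
  elif [ "$1" -le "$2" ]; then echo WARNING; return 1
  else echo OK; return 0
  fi
}
```

For example, memory_state "$(free_percent)" 50 25 prints the state of the local machine with the thresholds used above.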

I hope this helped somebody.

To monitor memory usage of a remote server, you’ll need SNMP for example. Maybe I’ll post another blog post on this soon.