The problem: greylisting significantly delays mail from big mail providers with many servers
I still think greylisting with postgrey is very effective against spam. But now and then, mail from some senders gets delayed for hours. This is especially the case for big mail service providers like Amazon SES, Microsoft Exchange Online Protection or Mailchimp. These providers have lots of servers and IP addresses, and the first attempt to deliver an email usually comes from a different server than the second. Thus, postgrey will not find a matching triplet (client IP, sender, recipient) and will temporarily reject the second attempt again. This continues until one of the mail servers has tried twice and postgrey finds a matching triplet. This might take hours and thus delay mail significantly.
Therefore, if you want to use postgrey without delaying lots of mail for several hours, you need to whitelist these big email services with many servers. As new services come up and others go, you need some way to find out which servers try to send mail to your server and frequently get denied because no triplet is found.
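To make the triplet problem more tangible, here is a minimal sketch of the matching logic in shell. It is a simplified model and not postgrey's actual implementation: it ignores the retry delay window and any subnet-based lookup, and the IP addresses, sender and recipient are made up for illustration.

```bash
#!/bin/bash
# Simplified greylisting model (assumption: no delay window, no subnet matching).
# A delivery passes only if the same client IP / sender / recipient was seen before.
seen=""

greylist_check() {
    key="$1|$2|$3"
    case "$seen" in
        *"<$key>"*) verdict="pass" ;;                       # triplet already known
        *)          seen="$seen<$key>"; verdict="defer" ;;  # new triplet: remember and defer
    esac
}

# A provider retrying from a different (made-up) server each time
# only gets through once one of the servers tries a second time.
results=""
for ip in 192.0.2.10 198.51.100.20 203.0.113.30 192.0.2.10; do
    greylist_check "$ip" "news@sender.example" "me@example.org"
    results="$results$ip:$verdict
"
done
printf '%s' "$results"
```

The first three attempts are deferred because each comes from a new address; only the fourth, which reuses the first address, passes.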
The solution: Check your mail logs
To get an idea of what is happening and which servers might need whitelisting, I wrote a small command that analyzes your mail.log-files. Here it is:
grep -ohP 'reason=new, client_name=[^,.]+\.[^,.]+\.\K[^,.]+\.[^,]+' /var/log/mail.log* | sort | uniq -c | sort -nr | less
So what does this do?
It scans your /var/log/mail.log* files for lines containing reason=new, which is what postgrey logs when no triplet was found. It then takes the hostname of the server that is trying to send the mail to you and strips the first two labels, as these vary between the individual servers a mail service provider uses. Finally, it counts the occurrences of each remaining domain and sorts the result by count in descending order.
The result of this command might look like this (truncated after a few lines):
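To see the pattern at work without touching real logs, the same pipeline can be fed a few sample lines. The log lines below are made up and shortened for illustration (real postgrey lines carry a timestamp and more fields), and grep -P requires GNU grep with PCRE support.

```bash
#!/bin/bash
# Made-up, shortened postgrey log lines (hostnames and addresses are hypothetical).
sample_log='postgrey[1]: action=greylist, reason=new, client_name=mail-am5eur02.outbound.protection.outlook.com, client_address=192.0.2.1
postgrey[1]: action=greylist, reason=new, client_name=a1-23.smtp-out.eu-west-1.amazonses.com, client_address=192.0.2.2
postgrey[1]: action=pass, reason=triplet found, client_name=mail1.mx.example.net, client_address=192.0.2.3
postgrey[1]: action=greylist, reason=new, client_name=mail-am5eur03.outbound.protection.outlook.com, client_address=192.0.2.4'

# \K discards everything matched so far, so only the part of the hostname
# after the first two labels is printed by -o.
summary=$(printf '%s\n' "$sample_log" \
    | grep -oP 'reason=new, client_name=[^,.]+\.[^,.]+\.\K[^,.]+\.[^,]+' \
    | sort | uniq -c | sort -nr)
echo "$summary"
```

With this sample, protection.outlook.com is listed with a count of 2 and eu-west-1.amazonses.com with a count of 1; the line with reason=triplet found is skipped.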
579 protection.outlook.com
503 eu-west-1.amazonses.com
113 facebook.com
54 mcsv.net
52 rsgsv.net
48 mcdlv.net
28 amazonses.com
17 asianet.co.th
Pay particular attention to the lines with many occurrences. They might be domains of big email service providers that need whitelisting for the reasons explained above. For example, mail from protection.outlook.com and amazonses.com is sent by customers of these providers and is usually not spam. More importantly, the mail servers of these providers retry delivery many times, so greylisting would not block spam from them anyway; it would only cause significant delays. So these domains should be whitelisted.
But a domain in this list might also belong to an ISP that uses it for dynamic IP addresses which are actually sending spam. For example, asianet.co.th seems to be a big ISP from Thailand, and multiple dynamic IP addresses from this ISP tried to send spam, resulting in 17 occurrences where no triplet was found. This is mail that you want to be blocked or delayed.
Once you have decided which domains to whitelist, add them to your whitelist file (on Debian this is usually /etc/postgrey/whitelist_clients).
The entries should look like this:
/.*\.protection\.outlook\.com$/
/.*\.amazonses\.com$/
/.*\.facebook\.com$/
/.*\.booking\.com$/
/.*\.mcsv\.net$/
/.*\.rsgsv\.net$/
/.*\.mcdlv\.net$/
/.*\.mandrillapp\.com$/
Note: You should escape all dots with a backslash, as done in the example. Otherwise, a dot matches any character, so protection.outlook.com would also whitelist protectionAoutlook.com, which might be a domain that spammers register on purpose to get through servers that use inaccurate whitelists.
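The effect of the missing backslashes can be checked quickly on the command line. This is only a sketch: it uses grep -P (GNU grep) as a stand-in for the Perl regular expressions in the whitelist file, and both hostnames are made up.

```bash
#!/bin/bash
# Returns success if HOSTNAME matches REGEX. Usage: matches REGEX HOSTNAME
matches() {
    printf '%s\n' "$2" | grep -qP -e "$1"
}

escaped='.*\.protection\.outlook\.com$'
unescaped='.*.protection.outlook.com$'
genuine='mail-x.outbound.protection.outlook.com'   # made-up but plausible hostname
lookalike='mail.protectionAoutlook.com'            # hypothetical spammer domain

for pattern in "$unescaped" "$escaped"; do
    for host in "$genuine" "$lookalike"; do
        if matches "$pattern" "$host"; then
            echo "$pattern matches $host"
        else
            echo "$pattern rejects $host"
        fi
    done
done
```

Both patterns accept the genuine host, but only the unescaped one also accepts the lookalike.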
Automation with nagios/icinga
You could run the above command from time to time to check if new services need whitelisting. But it would be easier if you got notified when a new domain that you have not checked yet pops up in this list, right?
Therefore, I wrote a small nagios/icinga plugin that checks if any new domains appear on this list. It is a bit quick and dirty but does its job.
The main part is a bash script that you could place in /usr/lib/nagios/plugins/check_postgrey_whitelist:
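#!/bin/bash
# Nagios/Icinga plugin: warn about frequently greylisted sender domains
# that are not yet listed in the ignore file.
log='/var/log/mail.log.1'
ignoreFile='/etc/nagios-plugins/postgrey_no_whitelist'

# Nagios exit codes
OK=0
WARNING=1
CRITICAL=2
UNKNOWN=3

# Without read access the result would wrongly look fine, so report UNKNOWN.
if [ ! -r "$log" ]; then
    echo "Cannot read $log"
    exit $UNKNOWN
fi

# Sender domains (as matched by the pattern) that were greylisted at least 15 times.
hosts=$(grep -ohP 'reason=(new|early-retry[^,]*), client_name=[^,.]+\.[^,.]+\.\K[^,.]+(\.co|\.com)\.?[^,]+' "$log" | sort | uniq -c | awk '$1>=15{print $2}' | sort)
# Domains that were reviewed and should not be whitelisted.
noWhitelist=$(sort "$ignoreFile")
# Domains in $hosts that are not in the ignore file, joined with spaces.
diff=$(comm -23 <(echo "$hosts") <(echo "$noWhitelist") | sed ':a;N;$!ba;s/\n/ /g')

if [ -z "$diff" ]; then
    echo "Okay, the top domains are whitelisted or ignored"
    exit $OK
else
    echo "Whitelist or ignore these senders: $diff"
    exit $WARNING
fi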
This script ignores all domains listed in /etc/nagios-plugins/postgrey_no_whitelist (one per line), assuming you have checked them and decided not to whitelist them. So create this file as well and add the domains that you have reviewed and do not want to whitelist:
asianet.co.th
example.com
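The filtering step of the script (count, apply a threshold, then subtract the ignore list with comm -23) can be tried in isolation. The following is a sketch with made-up data; the threshold is lowered from 15 to 2 so that the tiny sample produces output, and temporary files replace the process substitution used in the script.

```bash
#!/bin/bash
# Made-up list of stripped sender domains, one line per greylisted attempt.
attempts='protection.outlook.com
asianet.co.th
protection.outlook.com
asianet.co.th
newsender.example
newsender.example
rare.example'

# Hypothetical ignore list: domains already reviewed and not to be whitelisted.
ignored='asianet.co.th
example.com'

threshold=2   # the plugin uses 15; lowered here for the small sample

tmpdir=$(mktemp -d)
# Keep only domains seen at least $threshold times, sorted for comm.
printf '%s\n' "$attempts" | sort | uniq -c \
    | awk -v min="$threshold" '$1 >= min {print $2}' | sort > "$tmpdir/hosts"
printf '%s\n' "$ignored" | sort > "$tmpdir/ignored"

# comm -23 prints lines that appear only in the first file; both must be sorted.
pending=$(comm -23 "$tmpdir/hosts" "$tmpdir/ignored")
rm -rf "$tmpdir"
echo "$pending"
```

Here asianet.co.th is dropped because it is on the ignore list and rare.example because it stays below the threshold, leaving newsender.example and protection.outlook.com to be reviewed. On a real server, domains that are already whitelisted should disappear from the list on their own, since postgrey no longer greylists them.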
Now we need to define the command for this plugin:
Create the configuration where your other plugins are defined, e.g. /etc/nagios-plugins/config/postgrey_whitelist.cfg
define command {
    command_name check_postgrey_whitelist
    command_line /usr/lib/nagios/plugins/check_postgrey_whitelist
}
Now you need to create a service for your localhost that runs this command.
In your localhost service definition, e.g. /etc/icinga/objects/localhost_icinga.cfg , add this:
define service {
    use                 generic-service
    host_name           localhost
    service_description Postgrey Whitelist
    check_command       check_postgrey_whitelist
}
Finally, make sure that nagios/icinga has read access to your mail log file. You can adjust which file to check in the script above; I chose mail.log.1. So you might chown this file to the nagios user like this:
chown nagios:adm /var/log/mail.log.1
Of course, logrotate needs to know that newly created log files should have this owner. On a Debian system, you can configure this in /etc/logrotate.d/rsyslog. Adjust it similar to this:
/var/log/mail.info
/var/log/mail.warn
/var/log/mail.err
/var/log/daemon.log
/var/log/kern.log
/var/log/auth.log
/var/log/user.log
/var/log/lpr.log
/var/log/cron.log
/var/log/debug
/var/log/messages
{
rotate 4
weekly
missingok
notifempty
compress
delaycompress
sharedscripts
postrotate
invoke-rc.d rsyslog rotate > /dev/null
endscript
}
/var/log/mail.log
{
rotate 4
weekly
missingok
notifempty
compress
delaycompress
sharedscripts
create 440 nagios adm
postrotate
invoke-rc.d rsyslog rotate > /dev/null
endscript
}
So I removed mail.log from the first logrotate definition and created a second one below it, which is identical except for the extra line create 440 nagios adm that tells logrotate which permissions, owner and group to assign to the new log file.
You might need to adjust some stuff depending on your distro, but I hope this helps somebody to set this up.
Update 06.03.2017: Adjusted script so it does not consider second-level domains like co.uk or com.br as single senders anymore.