In Nagios/Icinga, you can easily define which contacts or contact groups get notified for a certain service in the service definition:
define service{
    host_name               linux-server
    service_description     check-disk-sda1
    check_command           check-disk!/dev/sda1
    max_check_attempts      5
    check_interval          5
    retry_interval          3
    check_period            24x7
    notification_interval   30
    notification_period     24x7
    notification_options    w,c,r
    contact_groups          linux-admins
}
(Source of this example: Icinga documentation)
So only contacts of the contact group “linux-admins” would be informed about problems regarding this service.
You could also use the “contacts” directive to list individual contacts or list multiple contact groups.
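As a minimal sketch of that variant, the service from the example above could name two individual contacts and two contact groups. The contacts alice and bob and the group db-admins are placeholders chosen for illustration; they would have to exist as contact and contactgroup definitions in your configuration.

```
define service{
    host_name               linux-server
    service_description     check-disk-sda1
    check_command           check-disk!/dev/sda1
    max_check_attempts      5
    check_interval          5
    retry_interval          3
    check_period            24x7
    notification_interval   30
    notification_period     24x7
    notification_options    w,c,r
    contacts                alice,bob               ; individual contacts (placeholder names)
    contact_groups          linux-admins,db-admins  ; db-admins is a hypothetical second group
}
```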
But often, responsibility is divided by host rather than by service. Typically there is one group of admins for Linux servers and another for Windows servers, or one for intranet servers and another for internet servers. Admins tend to get annoyed by notifications about servers they are not responsible for, so it is a good idea to notify only the admins who are responsible.
You can therefore also set the contacts in the host definition:
define host{
    host_name                       bogus-router
    alias                           Bogus Router #1
    address                         192.168.1.254
    parents                         server-backbone
    check_command                   check-host-alive
    check_interval                  5
    retry_interval                  1
    max_check_attempts              5
    check_period                    24x7
    process_perf_data               0
    retain_nonstatus_information    0
    contact_groups                  router-admins
    notification_interval           30
    notification_period             24x7
    notification_options            d,u,r
}
(Source of example: Icinga documentation)
So only the contact group “router-admins” would be notified for this host.
But one place where the “contacts” and “contact_groups” directives are missing is the hostgroup definition. It is not possible to directly assign a contact group or a list of contacts to a hostgroup, or the other way round. So here is how it can be done with another type of definition.
Group your hosts
First, define a group of hosts for each group of admins. For example, put all intranet servers in one group and all internet servers in another. You have probably already done this.
define hostgroup{
    hostgroup_name  intranet-servers
    alias           Intranet Servers
    members         intra1, intra2, intra3
}

define hostgroup{
    hostgroup_name  internet-servers
    alias           Internet Servers
    members         inter1, inter2, inter3
}
See the Icinga documentation for details. Note that “members” takes the short names of the hosts (their host_name values).
You can also define things the other way round: when defining a host, state which hostgroup it belongs to:
define host{
    use         generic-host
    host_name   intra1
    alias       intra1.local
    address     192.168.10.1
    hostgroups  intranet-servers
}
See documentation for details.
Group your contacts
Next, group your contacts. Create a contact group for each group of admins, so that it can later be assigned to the corresponding group of hosts.
Example:
define contactgroup{
    contactgroup_name   intranet-admins
    alias               Intranet Administrators
    members             alice, bob
}

define contactgroup{
    contactgroup_name   internet-admins
    alias               Internet Administrators
    members             charley
}
See the documentation. Again, you can also define it the other way round and list the contact groups in the contact definition.
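A sketch of that reverse direction, using the contactgroups directive in the contact definition. The generic-contact template and the e-mail address are assumptions for illustration; the template is expected to supply the notification periods, options, and commands that a contact requires.

```
define contact{
    use             generic-contact     ; assumed template providing the notification settings
    contact_name    alice
    alias           Alice
    email           alice@example.com   ; placeholder address
    contactgroups   intranet-admins     ; group membership declared on the contact itself
}
```

The contact group intranet-admins still has to be defined, but its members list no longer needs to mention alice.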
Assign contact groups to host groups
Now comes the interesting part. To assign contact groups to host groups, we use a “hostescalation” definition.
Example:
define hostescalation{
    hostgroup_name          intranet-servers
    first_notification      1
    last_notification       0
    notification_interval   60
    contact_groups          intranet-admins
}

define hostescalation{
    hostgroup_name          internet-servers
    first_notification      1
    last_notification       0
    notification_interval   60
    contact_groups          internet-admins
}
This makes sure internet-admins are informed about internet-servers and intranet-admins about intranet-servers. “last_notification 0” means the escalation has no upper limit, so all notifications from the first one onward are sent to this group of contacts. You can adjust the notification_interval (in minutes, with the default interval_length of 60 seconds) if you want.
The cool thing here is that you can also define that, if the problem still persists after 3 notifications, the other team of admins is notified as well:
define hostescalation{
    hostgroup_name          intranet-servers
    first_notification      1
    last_notification       3
    notification_interval   30
    contact_groups          intranet-admins
}

define hostescalation{
    hostgroup_name          intranet-servers
    first_notification      4
    last_notification       0
    notification_interval   60
    contact_groups          internet-admins, intranet-admins
}
This would notify “intranet-admins” 3 times (every 30 minutes) about problems with “intranet-servers”. If the problem is still not solved, the fourth and all later notifications go to “internet-admins” as well. So the internet-admins won’t be bothered with short problems that the intranet-admins can fix, but will still be informed if a problem remains unsolved for some time.
More information on hostescalation and serviceescalation definitions can be found in the documentation.
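For services, the same idea can be sketched with a serviceescalation definition, which additionally takes a service_description. The example below assumes that every host in intranet-servers has a service named check-disk-sda1 (the name is borrowed from the first example); adjust it to a service that actually exists on your hosts.

```
define serviceescalation{
    hostgroup_name          intranet-servers
    service_description     check-disk-sda1     ; assumed to exist on every host in the group
    first_notification      1
    last_notification       0                   ; 0 = no upper limit
    notification_interval   60
    contact_groups          intranet-admins
}
```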
I hope this helped somebody.