In Nagios/Icinga, you can easily define which contacts or contact groups get notified for a certain service in the service definition:
define service{
    host_name               linux-server
    service_description     check-disk-sda1
    check_command           check-disk!/dev/sda1
    max_check_attempts      5
    check_interval          5
    retry_interval          3
    check_period            24x7
    notification_interval   30
    notification_period     24x7
    notification_options    w,c,r
    contact_groups          linux-admins
}
(Source of this example: Icinga documentation)
So only contacts of the contact group “linux-admins” would be informed about problems regarding this service.
You could also use the “contacts” directive to list individual contacts or list multiple contact groups.
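As a rough sketch of that variant, the same service could name individual contacts and several contact groups at once. The contacts "alice" and "bob" and the group "db-admins" are only assumptions for illustration and would have to exist as contact and contactgroup objects in your configuration:

```
define service{
    host_name               linux-server
    service_description     check-disk-sda1
    check_command           check-disk!/dev/sda1
    max_check_attempts      5
    check_interval          5
    retry_interval          3
    check_period            24x7
    notification_interval   30
    notification_period     24x7
    notification_options    w,c,r
    contacts                alice,bob               ; individual contacts (assumed names)
    contact_groups          linux-admins,db-admins  ; "db-admins" is a hypothetical group
}
```

Both directives can be combined; everyone listed directly and every member of the listed groups gets notified.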
But often, the responsibility of admins is not defined through services, but through hosts. Typically, there is one group of admins for Linux servers and one for Windows servers, or one group for intranet servers and one for internet servers. As admins get annoyed by notifications about servers they are not responsible for, it is a good idea to notify only those admins who are responsible.
So you can also do this in the host definition:
define host{
    host_name                       bogus-router
    alias                           Bogus Router #1
    address                         192.168.1.254
    parents                         server-backbone
    check_command                   check-host-alive
    check_interval                  5
    retry_interval                  1
    max_check_attempts              5
    check_period                    24x7
    process_perf_data               0
    retain_nonstatus_information    0
    contact_groups                  router-admins
    notification_interval           30
    notification_period             24x7
    notification_options            d,u,r
}
(Source of example: icinga documentation)
So only the contact group “router-admins” would be notified for this host.
But the one place where the “contacts” and “contact_groups” directives are missing is the hostgroup definition. It is not possible to directly assign a contact group or a list of contacts to a hostgroup, or the other way round. So here is how it can be done with another type of definition.
Group your hosts
First, define a group of hosts for each group of admins. So for example, group all intranet servers in one and all internet servers in another group. You probably already did this.
define hostgroup{
    hostgroup_name  intranet-servers
    alias           Intranet Servers
    members         intra1, intra2, intra3
}
define hostgroup{
    hostgroup_name  internet-servers
    alias           Internet Servers
    members         inter1, inter2, inter3
}
See the Icinga documentation for details. Note that “members” expects the short names of the hosts (their host_name).
You can also define things the other way round: When defining a host, say which hostgroup it belongs to:
define host{
    use         generic-host
    host_name   intra1
    alias       intra1.local
    address     192.168.10.1
    hostgroups  intranet-servers
}
See documentation for details.
Group your contacts
Next, group your contacts. Create a contact group for each group of admins, so that we can later assign this contact group to the corresponding group of hosts.
Example:
define contactgroup{
    contactgroup_name   intranet-admins
    alias               Intranet Administrators
    members             alice, bob
}
define contactgroup{
    contactgroup_name   internet-admins
    alias               Internet Administrators
    members             charley
}
See documentation. Again, you can also define it the other way round (list the contact groups at the contact-definition).
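A minimal sketch of that reverse direction, assuming a "generic-contact" template like the one in the sample configuration supplies the notification periods, options and commands (the email address is made up):

```
define contact{
    use             generic-contact     ; assumed template with the notification settings
    contact_name    alice
    alias           Alice
    email           alice@example.com   ; placeholder address
    contactgroups   intranet-admins     ; group membership set on the contact
}
```

With this, "alice" would not need to be listed in the “members” directive of the contact group.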
Assign contact groups to host groups
Now comes the interesting part. To do this, we use a host escalation definition (“hostescalation”).
Example:
define hostescalation{
    hostgroup_name          intranet-servers
    first_notification      1
    last_notification       0
    notification_interval   60
    contact_groups          intranet-admins
}
define hostescalation{
    hostgroup_name          internet-servers
    first_notification      1
    last_notification       0
    notification_interval   60
    contact_groups          internet-admins
}
This makes sure internet-admins get informed about internet-servers and intranet-admins about intranet-servers. “last_notification 0” means that the escalation never ends, so all notifications will be sent to this group of contacts. You can adjust the notification_interval (in minutes) if you want.
The cool thing here is that you can also define that if the problem still occurs after 3 notifications, the other team of admins gets notified as well:
define hostescalation{
    hostgroup_name          intranet-servers
    first_notification      1
    last_notification       3
    notification_interval   30
    contact_groups          intranet-admins
}
define hostescalation{
    hostgroup_name          intranet-servers
    first_notification      4
    last_notification       0
    notification_interval   60
    contact_groups          internet-admins, intranet-admins
}
This would notify “intranet-admins” 3 times (every 30 minutes) about problems with “intranet-servers”. If the problem is still not solved, “internet-admins” will get notified as well. So the internet-admins won’t get bothered with short problems that the intranet-admins can fix, but will still get informed if the problem is not solved for some time.
More information on host escalations and service escalations can be found in the Icinga documentation.
I hope this helped somebody.
This is handy, thanks for the tip.
Comment by user001 — 25. March 2013 @ 17:10
Hi,
Good idea on hostescalations. I am looking for something similar, but for service escalations. I can basically carry your idea over to service escalations, but is there any way to specify the method of contact? We check for the same services in the prod and non-prod environments. For prod, we want to get notified by SMS, but for non-prod by email. Other than creating two service escalation configs (one for prod and one for non-prod), is there an easy way to specify the notification method, i.e. if prod, send SMS; if non-prod, send email?
Comment by Sidd — 12. February 2014 @ 00:08
@ Sidd: This works.
Service escalations are possible more or less the same way:
http://docs.icinga.org/latest/en/objectdefinitions.html#objectdefinitions-serviceescalation
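As a rough sketch, a service escalation mirroring the host escalation from the article could look like this. It reuses the hostgroup and contact group names from above and the "check-disk-sda1" service purely as an example, and assumes that this service is actually defined for the hosts in the hostgroup:

```
define serviceescalation{
    hostgroup_name          intranet-servers
    service_description     check-disk-sda1     ; example service, must exist on these hosts
    first_notification      1
    last_notification       0                   ; 0 = the escalation never ends
    notification_interval   60
    contact_groups          intranet-admins
}
```

The only real difference to a hostescalation is the additional “service_description” directive.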
This is my first idea on how to solve your problem:
You define every contact twice: Once with the sms number for sms notification and once with the email for mail notifications. Then group all sms-contacts in one contactgroup and all mail-contacts in another contactgroup. And then assign the sms-contactgroup to the prod-servers and the mail-contactgroup to the non-prod servers.
Have not tried this, but it should work this way.
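Untested as well, but roughly what I have in mind. The "generic-contact" template, the addresses and numbers, and the commands "notify-host-by-sms" and "notify-service-by-sms" are all assumptions; the SMS commands in particular are hypothetical and would have to be defined as command objects for your SMS gateway:

```
define contact{
    use                             generic-contact         ; assumed template, notifies by email
    contact_name                    alice-mail
    alias                           Alice (email)
    email                           alice@example.com       ; placeholder address
}
define contact{
    use                             generic-contact
    contact_name                    alice-sms
    alias                           Alice (SMS)
    pager                           +491700000000           ; placeholder number
    host_notification_commands      notify-host-by-sms      ; hypothetical command
    service_notification_commands   notify-service-by-sms   ; hypothetical command
}
define contactgroup{
    contactgroup_name   admins-mail
    alias               Admins (email)
    members             alice-mail
}
define contactgroup{
    contactgroup_name   admins-sms
    alias               Admins (SMS)
    members             alice-sms
}
```

Then "admins-sms" goes into the “contact_groups” of the escalation for the prod servers and "admins-mail" into the one for the non-prod servers.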
Comment by Christopher Kramer — 12. February 2014 @ 21:39