Hi guys!<br><br>I am now officially baffled by how Nagios handles service escalations and notifications. I'm using Nagios 3.2.3 on SLES 10 SP3 and my current setup is this:<br><br>service_escalation.cfg:<br><br>define serviceescalation {<br>
       service_description     http_80<br>       host_name               apache02<br>       first_notification      1<br>       last_notification       5<br>       notification_interval   60<br>       escalation_period       Office_Hours<br>
       contact_groups          unix-sms, dba-email, dev-email<br>}<br><br>define serviceescalation {<br>       service_description     http_80<br>       host_name               apache02<br>       first_notification      6<br>
       last_notification       8<br>       notification_interval   90<br>       escalation_period       Office_Hours<br>       contact_groups          unix-sms, dba-email, dev-email, unix-supervisor, dev-supervisor<br>}<br>
<br>define serviceescalation {<br>       service_description     http_80<br>       host_name              apache02<br>       first_notification      1<br>       last_notification       0<br>       notification_interval   60<br>
       escalation_period       24x7<br>       contact_groups          unix-admins-email<br>}<br><br>The users defined in the service_escalation.cfg have their contacts.cfg configured like this:<br><br>define contact{<br>        contact_name                            unix-sms<br>
        alias                                   Team UNIX<br>        host_notification_period                Early_Morning<br>        service_notification_period            Early_Morning<br>        host_notification_options               u,d,r<br>
        service_notification_options            w,c,u,r<br>        host_notification_commands              host-notify-by-epager<br>        service_notification_commands           notify-by-epager<br>        email                                   <a href="mailto:unix@email.org">unix@email.org</a><br>
}<br><br>define contact{<br>
        contact_name                            unix-supervisor<br>
        alias                                   Team UNIX Supervisor<br>
        host_notification_period                Early_Morning<br>
        service_notification_period             Early_Morning<br>
        host_notification_options               u,d,r<br>
        service_notification_options            w,c,u,r<br>
        host_notification_commands              host-notify-by-epager<br>
        service_notification_commands           notify-by-epager<br>
        email                                   <a href="mailto:unixsupervisor@email.org">unixsupervisor@email.org</a><br>
}<br><br>timeperiod.cfg looks like this:<br><br>define timeperiod{<br>        timeperiod_name         Office_Hours<br>        alias                   Office_Hours<br>        sunday                  09:00-20:00<br>        monday                  09:00-20:00<br>
        tuesday                 09:00-20:00<br>        wednesday               09:00-20:00<br>        thursday                09:00-20:00<br>        friday                  09:00-20:00<br>        saturday                09:00-20:00<br>
}<br><br>define timeperiod{<br>        timeperiod_name         Early_Morning<br>        alias                   Early_Morning<br>        sunday                  07:00-22:10<br>        monday                  07:00-22:10<br>
        tuesday                 07:00-22:10<br>        wednesday               07:00-22:10<br>        thursday                07:00-22:10<br>        friday                  07:00-22:10<br>        saturday                07:00-22:10<br>
}<br><br>With these configurations in place, the http_80 service goes down at 10pm every night (scheduled downtime). I expected that notifications from 10pm onwards would go *only* to unix-admins-email because of the service_escalation.cfg file. And they happily did, at least for the critical notifications.<br>
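<br>To spell out that expectation, here is a rough Python sketch of the matching rule as I understand it from the docs: an escalation applies when the notification number falls inside its first/last range (with last_notification 0 meaning no upper limit) and the current time is inside its escalation_period. This is only my mental model, not Nagios's actual code. The function names are made up for illustration, and I am assuming that 24x7 covers the whole day, since that timeperiod is not shown here.<br>

```python
from datetime import time

# Time periods as (start, end); every weekday uses the same range in my config.
# "24x7" is not shown in this post, so the all-day range is an assumption.
PERIODS = {
    "Office_Hours": (time(9, 0), time(20, 0)),
    "24x7": (time(0, 0), time(23, 59, 59)),
}

# The three serviceescalation definitions for http_80 on apache02.
ESCALATIONS = [
    {"first": 1, "last": 5, "period": "Office_Hours",
     "groups": ["unix-sms", "dba-email", "dev-email"]},
    {"first": 6, "last": 8, "period": "Office_Hours",
     "groups": ["unix-sms", "dba-email", "dev-email",
                "unix-supervisor", "dev-supervisor"]},
    {"first": 1, "last": 0, "period": "24x7",
     "groups": ["unix-admins-email"]},
]


def in_period(name, now):
    """Return True if the time of day `now` falls inside the named period."""
    start, end = PERIODS[name]
    return start <= now <= end


def matching_groups(notification_number, now):
    """Contact groups I would expect to be notified (hypothetical helper)."""
    groups = []
    for esc in ESCALATIONS:
        # Escalation has not started yet.
        if notification_number < esc["first"]:
            continue
        # last_notification 0 means the escalation never expires.
        if esc["last"] != 0 and notification_number > esc["last"]:
            continue
        # Escalation is only valid inside its escalation_period.
        if not in_period(esc["period"], now):
            continue
        for group in esc["groups"]:
            if group not in groups:
                groups.append(group)
    return groups
```

By this reading, matching_groups(1, time(22, 0)) gives only unix-admins-email, which matches what I saw for the critical notifications.<br>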
<br>Now the fun part comes in. The recovery notification was sent to the unix-sms, dba-email, dev-email, unix-supervisor and dev-supervisor groups at 7:03am, when the service returned to OK status. That is weird, because the critical notifications from 10pm to 6am (next day) were sent only to the unix-admins-email group.<br>
<br>Plus, I read in the Nagios docs that it will not send recovery notifications to those who did not receive the critical/warning/unknown notifications in the first place.<br><br>So my questions are:<br>Why did Nagios send the recovery alert to the supervisors, who did not know that the service was down in the first place because they did not receive the critical alert?<br>
Did Nagios take their defined timeperiods into consideration when it sent the recovery alert?<br><br>TIA!<br>