Passive service on Windows 2003 box notifies intermittently

C. Bensend benny at bennyvision.com
Thu Nov 19 15:50:04 CET 2009


Hey folks,

   I'm working on ironing out Windows event log alerting for our
eleventy billion Windows hosts, and they're slowly but surely
driving me insane.

   I am using Steve Shipway's Nagios EventLog Agent, as I need the
end users to be able to add/edit/remove their own alerts as they
see fit.  However, *I* am having a helluva time getting this all
working together.  Sorry for the length of this email; I've
included a metric buttload of data.

   I have the following service definition on the Nagios host (from
objects.cache):


define service {
        host_name       winhost
        service_description     System EventLog
        check_period    24x7 passive checks
        check_command   check_passive_service!0!No critical system events
        contact_groups  testing-admins
        notification_period     24x7 passive checks
        initial_state   o
        check_interval  5.000000
        retry_interval  2.000000
        max_check_attempts      1
        is_volatile     0
        parallelize_check       1
        active_checks_enabled   0
        passive_checks_enabled  1
        obsess_over_service     1
        event_handler_enabled   1
        low_flap_threshold      0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  1
        flap_detection_options  o,w,u,c
        freshness_threshold     14400
        check_freshness 1
        notification_options    u,w,c,r
        notifications_enabled   1
        notification_interval   360.000000
        first_notification_delay        0.000000
        stalking_options        n
        process_perf_data       1
        failure_prediction_enabled      1
        retain_status_information       1
        retain_nonstatus_information    1
        }


The check_passive_service command is defined as such:


define command {
        command_name    check_passive_service
        command_line    $USER1$/check_dummy $ARG1$ "$ARG2$"
        }
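
For readers unfamiliar with check_dummy, here is a minimal Python sketch of
what the command above is assumed to do with $ARG1$ and $ARG2$.  It is an
illustration, not the plugin's actual source; the function name and the exact
wording of the error message are assumptions.  It matters here because, with
check_freshness enabled, Nagios runs the check_command once no passive result
has arrived within freshness_threshold seconds, which for this definition
would report state 0 with the text "No critical system events".

```python
# Illustrative stand-in for the check_dummy plugin (not its real source).
STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def check_dummy(state, text=""):
    """Return (exit_code, plugin_output) for a state code and optional text."""
    if state not in STATE_NAMES:
        # Assumed wording; unsupported states are reported as UNKNOWN (3).
        return 3, f"UNKNOWN: Status {state} is not a supported error state"
    output = STATE_NAMES[state]
    if text:
        output += f": {text}"
    return state, output


# Equivalent of: check_dummy 0 "No critical system events"
exit_code, output = check_dummy(0, "No critical system events")
print(exit_code, output)
```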


The "24x7 passive checks" timeperiod is defined as such:


define timeperiod {
        timeperiod_name 24x7 passive checks
        alias   24x7 passive checks - single alert notifies
        sunday  00:00-24:00
        monday  00:00-24:00
        tuesday 00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday  00:00-24:00
        saturday        00:00-24:00
        }


The testing-admins contact group is defined as such:


define contactgroup {
        contactgroup_name       testing-admins
        alias   Bensend testing group
        members cbensend
        }


On the Windows side, I have an EventLog Agent alert set up like so:

   Name:  User Initiated System Reboot
   Event Log to Check:  System
   Which Events to Alert:  Information, Warning, Error
   Match String:  has initiated the restart of computer HOSTNAME
   Service Name:  System EventLog
   Service Status:  (2) Critical


   The Agent and NSCA are communicating fine; I get a notification
each time I restart the agent.  However, when an event matches the
regexp string above, the System EventLog service does not reliably
notify.  After resetting all passive services so they are in an OK
state, here are the log entries from the Nagios side when I reboot the
Windows machine, with my explanations and comments (please pardon the
crappy line wrapping):


Nov 19 08:20:05 hostname nagios: EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;winhost;EventLog Agent;1;HEARTBEAT  [WARN
#1]: Service starting

-- OK, that's the Nagios EventLog Agent starting.

Nov 19 08:20:06 hostname nagios: EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;winhost;System EventLog;2;System [info]
[USER32 #1074]: The process Explorer.EXE has initiated the restart of
computer WINHOST on behalf of user DOMAIN\me for the following reason:
Application: Maintenance (Planned)  Reason Code: 0x84040001  Shutdown
Type: restart  Commen

-- That is the passive event coming in from NSCA, so the Agent is
   working and communicating with NSCA just fine

Nov 19 08:20:06 hostname nagios: EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;winhost;System EventLog;2;System [info]
[USER32 #1074]: The process svchost.exe has initiated the restart of
computer WINHOST on behalf of user NT AUTHORITY\SYSTEM for the following
reason: No title for this reason could be found  Reason Code: 0x80070020
Shutdown Type: restart

-- Ditto here

Nov 19 08:20:12 hostname nagios: PASSIVE SERVICE CHECK: winhost;EventLog
Agent;1;HEARTBEAT  [WARN #1]: Service starting

-- I believe that's Nagios picking up the passive service check data
   from the named pipe
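
For reference, the EXTERNAL COMMAND entries above follow the external command
format "[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output".
The Python sketch below builds such a line and writes it to the command file.
The function names are hypothetical, and the command file path is whatever
command_file is set to in nagios.cfg; it is passed in rather than guessed.

```python
import time


def passive_result_command(host, service, return_code, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    # A newline would end the command early, so flatten the output text.
    output = output.replace("\n", " ")
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}"
    )


def submit(command_line, command_file):
    """Write one command to the Nagios external command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(command_line + "\n")


# Example: a CRITICAL (2) result for the service defined earlier.
print(passive_result_command("winhost", "System EventLog", 2, "test event"))
```

Submitting a hand-built result like this is one way to test the Nagios side
separately from the agent and NSCA.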

Nov 19 08:20:12 hostname nagios: SERVICE ALERT: winhost;EventLog
Agent;WARNING;HARD;1;HEARTBEAT  [WARN #1]: Service starting

-- Nagios generates a service alert for the agent

Nov 19 08:20:12 hostname nagios: SERVICE NOTIFICATION: me;winhost;EventLog
Agent;WARNING;notify-service-by-email;HEARTBEAT  [WARN #1]: Service
starting

-- Yay, Nagios notifies me via email because the Nagios EventLog
   Agent has started up

Nov 19 08:20:12 hostname nagios: PASSIVE SERVICE CHECK: winhost;System
EventLog;2;System [info] [USER32 #1074]: The process svchost.exe has
initiated the restart of computer WINHOST on behalf of user NT
AUTHORITY\SYSTEM for the following reason: No title for this reason could
be found  Reason Code: 0x80070020  Shutdown Type: restart

-- Nagios picking up the passive service data from the named pipe?

Nov 19 08:20:12 hostname nagios: SERVICE ALERT: winhost;System
EventLog;CRITICAL;HARD;1;System [info] [USER32 #1074]: The process
svchost.exe has initiated the restart of computer WINHOST on behalf of
user NT AUTHORITY\SYSTEM for the following reason: No title for this
reason could be found  Reason Code: 0x80070020  Shutdown Type: restart

-- OK, Nagios generates a service alert here.  Yay.  But ...

Nov 19 08:20:12 hostname nagios: PASSIVE SERVICE CHECK: winhost;System
EventLog;2;System [info] [USER32 #1074]: The process Explorer.EXE has
initiated the restart of computer WINHOST on behalf of user DOMAIN\me for
the following reason: Application: Maintenance (Planned)  Reason Code:
0x84040001  Shutdown Type: restart  Commen


   That's it.  No notification.  No nothing else, and I didn't skip
any log entries other than one of the NSClient++ services getting a
connection refused while the host was rebooting.
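
To spot these cases without reading the log by eye, a small Python sketch
along the following lines could list every SERVICE ALERT that has no
corresponding SERVICE NOTIFICATION.  It assumes each log entry has been
unwrapped onto a single line, and it matches on host, service and state only
(not on time), so it is a rough filter rather than an exact one.  The function
name is hypothetical, and the last sample line is abbreviated.

```python
import re

# Field layouts as they appear in the log entries quoted in this message:
#   SERVICE ALERT: host;service;state;state_type;attempt;output
#   SERVICE NOTIFICATION: contact;host;service;state;command;output
ALERT_RE = re.compile(r"nagios: SERVICE ALERT: ([^;]+);([^;]+);([^;]+);")
NOTIFY_RE = re.compile(
    r"nagios: SERVICE NOTIFICATION: ([^;]+);([^;]+);([^;]+);([^;]+);"
)


def unnotified_alerts(log_lines):
    """Return (host, service, state) for alerts with no notification logged."""
    alerts = []
    notified = set()
    for line in log_lines:
        match = ALERT_RE.search(line)
        if match:
            alerts.append(match.groups())
            continue
        match = NOTIFY_RE.search(line)
        if match:
            _contact, host, service, state = match.groups()
            notified.add((host, service, state))
    return [alert for alert in alerts if alert not in notified]


sample = [
    "Nov 19 08:20:12 hostname nagios: SERVICE ALERT: winhost;EventLog "
    "Agent;WARNING;HARD;1;HEARTBEAT  [WARN #1]: Service starting",
    "Nov 19 08:20:12 hostname nagios: SERVICE NOTIFICATION: me;winhost;"
    "EventLog Agent;WARNING;notify-service-by-email;HEARTBEAT  [WARN #1]: "
    "Service starting",
    "Nov 19 08:20:12 hostname nagios: SERVICE ALERT: winhost;System "
    "EventLog;CRITICAL;HARD;1;System [info] [USER32 #1074]: (abbreviated)",
]
print(unnotified_alerts(sample))
```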

   And what makes this worse is that it's not consistent - I get the
entries from NSCA every time, but I only get the notifications SOME
of the time.  Here is one that *did* work:


Nov 19 08:39:24 hostname nagios: EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;winhost;EventLog Agent;1;HEARTBEAT  [WARN
#1]: Service starting

-- OK, again, the passive EventLog Agent service starts

Nov 19 08:39:24 hostname nagios: EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;winhost;System EventLog;2;System [info]
[USER32 #1074]: The process Explorer.EXE has initiated the restart of
computer WINHOST on behalf of user DOMAIN\Me for the following reason:
Application: Maintenance (Planned)  Reason Code: 0x84040001  Shutdown
Type: restart  Commen

-- The agent kicks in, and sends the desired alert to NSCA

Nov 19 08:39:25 hostname nagios: EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;winhost;System EventLog;2;System [info]
[USER32 #1074]: The process svchost.exe has initiated the restart of
computer WINHOST on behalf of user NT AUTHORITY\SYSTEM for the following
reason: No title for this reason could be found  Reason Code: 0x80070020
Shutdown Type: restart

-- Nagios notices

Nov 19 08:39:32 hostname nagios: PASSIVE SERVICE CHECK: winhost;System
EventLog;2;System [info] [USER32 #1074]: The process Explorer.EXE has
initiated the restart of computer WINHOST on behalf of user DOMAIN\Me for
the following reason: Application: Maintenance (Planned)  Reason Code:
0x84040001  Shutdown Type: restart  Commen

-- Ditto

Nov 19 08:39:32 hostname nagios: SERVICE ALERT: winhost;System
EventLog;CRITICAL;HARD;1;System [info] [USER32 #1074]: The process
Explorer.EXE has initiated the restart of computer WINHOST on behalf of
user DOMAIN\Me for the following reason: Application: Maintenance
(Planned)  Reason Code: 0x84040001  Shutdown Type: restart  Commen

-- Nagios generates a service alert

Nov 19 08:39:32 hostname nagios: SERVICE NOTIFICATION:
cbensend;winhost;System EventLog;CRITICAL;notify-service-by-email;System
[info] [USER32 #1074]: The process Explorer.EXE has initiated the restart
of computer WINHOST on behalf of user DOMAIN\Me for the following reason:
Application: Maintenance (Planned)  Reason Code: 0x84040001  Shutdown
Type: restart  Commen

-- And this time, it generates a service *NOTIFICATION*.  Why this time
   and not the last?

Nov 19 08:39:32 hostname nagios: PASSIVE SERVICE CHECK: winhost;EventLog
Agent;1;HEARTBEAT  [WARN #1]: Service starting
Nov 19 08:39:32 hostname nagios: SERVICE ALERT: winhost;EventLog
Agent;WARNING;HARD;1;HEARTBEAT  [WARN #1]: Service starting
Nov 19 08:39:32 hostname nagios: SERVICE NOTIFICATION:
cbensend;winhost;EventLog Agent;WARNING;notify-service-by-email;HEARTBEAT
[WARN #1]: Service starting
Nov 19 08:39:32 hostname nagios: PASSIVE SERVICE CHECK: winhost;System
EventLog;2;System [info] [USER32 #1074]: The process svchost.exe has
initiated the restart of computer WINHOST on behalf of user NT
AUTHORITY\SYSTEM for the following reason: No title for this reason could
be found  Reason Code: 0x80070020  Shutdown Type: restart


   This is the first time I've done anything with passive service
checks; am I just not understanding something silly?  Or .. ?  This
is Nagios 3.2.0 running on RHEL 5.4 (built from source), BTW.

Thanks folks,

Benny


-- 
"It's not all about getting up and putting four slices of kickass
in a two slice toaster."         -- ark86, on Fazed.net





