Failing event_handlers and ocsp/ochp_command silently fail & not logged

Brian A. Seklecki lavalamp at spiritual-machines.org
Wed Aug 27 16:55:02 CEST 2008


From back in January -- there has been other discussion on similar
issues, but no discussion / traction on this bug (and of course, no
medium to report it, track it, document it, submit bugs/patches, etc.)

The problem is that the exact same forking/exec'ing code is used
for:

 - Service/Host Checks
 - Event Handlers
 - Notify Commands
 - OCHP/OCSP Handler
 - Performance Data Handlers

Result codes are explicitly registered with the API.

126 and 127 are also checked for explicitly and warned/logged (but only
in recent versions).  Of course 0,1,2,3 are evaluated as Service/Host
check API values.

  /* check for possibly missing scripts/binaries/etc */
  if(result==126 || result==127){

The problem is that 0,1,2,3 and anything != 126/127 can have different
connotations for forks that are not host/service checks, but the
function called, my_system(), has no way of distinguishing its caller
in order to change its logging behavior, which it should.
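
To make the ambiguity concrete, here is a minimal bash sketch of what "distinguishing the caller" could look like. It is not Nagios code: describe_status and its "check"/"handler" labels are hypothetical names made up for illustration. The only facts assumed are the plugin return codes 0-3 (OK, WARNING, CRITICAL, UNKNOWN) and the shell's 126/127 conventions.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Nagios: interpret an exit status
# differently depending on what kind of command produced it.
# Usage: describe_status <check|handler> <exit-status>
describe_status() {
    local kind="$1" rc="$2"

    # 126/127 come from the shell itself and mean the same thing everywhere.
    case "$rc" in
        126) echo "command found but not executable"; return ;;
        127) echo "command not found"; return ;;
    esac

    if [ "$kind" = "check" ]; then
        # For check plugins, 0-3 are state values, not errors.
        case "$rc" in
            0) echo "OK" ;;
            1) echo "WARNING" ;;
            2) echo "CRITICAL" ;;
            3) echo "UNKNOWN" ;;
            *) echo "out-of-range plugin return code $rc" ;;
        esac
    elif [ "$rc" -eq 0 ]; then
        echo "success"
    else
        # For handlers/notifications, any non-zero status is a failure
        # that deserves a log entry.
        echo "handler failed with status $rc"
    fi
}
```

With this, `describe_status check 2` reports a CRITICAL state, while `describe_status handler 2` reports a failed handler, which is the distinction my_system() currently cannot make.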

The problem is further complicated by rampant use of pipes and other
exotic Bourne-style expressions in command_line variables
within Nagios (one book in particular set this in motion), which,
depending on how compliant a Bourne shell is, can behave differently on
various systems.  Examples are below; by no means are they meant to be
definitive, as how bash(1) forks may behave entirely differently than
exec().

Embedded Perl could also further complicate things (but of course).

Solution 1:
  - Teach my_system() to behave differently for non-healthcheck forks

Solution 2:
  - Call a shell script wrapper for OCSP/OCHP/Perf/Event/Notify

Solution 3 (added begrudgingly):
  - Tell send_nsca and other builtins to use error codes > 3;  4->125
    and 129->231 are available, but this doesn't fix the problems
    with pipes outlined below.
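
A rough sketch of what the Solution 2 wrapper might look like, assuming bash is available. Everything named here (run_logged, LOG_FILE, the log line format) is hypothetical and not a Nagios convention; a real deployment would probably log via syslog or the Nagios log instead.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for OCSP/OCHP/perfdata/event/notify commands.
# Runs a command line and records any non-zero exit status, so that
# failures are no longer silent.

# Assumed log location; override by exporting LOG_FILE.
LOG_FILE="${LOG_FILE:-/tmp/nagios_handler_wrapper.log}"

# Usage: run_logged <label> <command line...>
run_logged() {
    local label="$1"; shift
    local status=0

    # "-o pipefail" makes a pipeline fail if any stage fails,
    # not only the last one (bash-specific).
    bash -o pipefail -c "$*" || status=$?

    if [ "$status" -ne 0 ]; then
        printf '%s %s: command exited with status %d: %s\n' \
            "$(date '+%s')" "$label" "$status" "$*" >> "$LOG_FILE"
    fi
    return "$status"
}

# When executed directly with arguments, act as the wrapper itself.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
    run_logged "$@"
fi
```

The command definition would then call the wrapper with a label and the original command line as a single quoted argument, instead of embedding the pipe directly in command_line.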


~BAS


$ echo test | /doesntexist
-bash: /doesntexist: No such file or directory
$ echo $?
127


$ /doesntexist | echo foo    
foo
-bash: /doesntexist: No such file or directory
$ echo $?
0


$ echo > test.sh
$ chmod -x test.sh
$ /home/seklecki/test.sh
-bash: /home/seklecki/test.sh: Permission denied
$ echo $?
126


$ echo test | ./test.sh
-bash: ./test.sh: Permission denied
$ echo $?
126

$ ./test.sh | echo foo
-bash: ./test.sh: Permission denied
foo
$ echo $?
0

$ echo fuck shit ass | /usr/local/sbin/send_nsca -H cock.gobbling.asshat
Could not open config file 'send_nsca.cfg' for reading.
Error: Config file 'send_nsca.cfg' contained errors...
$ echo $?
2
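
The pipeline transcripts above show the shell reporting only the last stage's status. For what it's worth, bash (not every Bourne shell) offers two ways to see the earlier failure: the PIPESTATUS array and the pipefail option. A small sketch, reusing the deliberately missing /doesntexist from the examples; the variable names are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: three ways bash can report the status of "failing | succeeding".

# 1. Default: $? is the status of the last stage only.
/doesntexist 2>/dev/null | echo foo >/dev/null
default_rc=$?

# 2. PIPESTATUS (bash-specific array): one status per pipeline stage.
/doesntexist 2>/dev/null | echo foo >/dev/null
stage_rcs="${PIPESTATUS[*]}"

# 3. pipefail: the pipeline returns the last non-zero stage status.
#    Run in a subshell so the option does not leak into the caller.
pipefail_rc=0
( set -o pipefail; /doesntexist 2>/dev/null | echo foo >/dev/null ) || pipefail_rc=$?

echo "default=$default_rc stages=$stage_rcs pipefail=$pipefail_rc"
# prints: default=0 stages=127 0 pipefail=127
```

None of this helps if the command line is run by a shell without these features, which is part of why the behavior differs across systems.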




On Wed, 2008-01-02 at 15:06 -0500, Brian A. Seklecki wrote:
> What happens if ocsp/ochp commands return non-zero status?
> 
> # send_nsca -H doesnt.fucking.exist -c foo/etc/nagios/send_nsca.cfg
> Invalid host name 'doesnt.fucking.exist'
> Error: Could not connect to host doesnt.fucking.exist on port 5667
> # echo $?
> 2
> 
> When this happens, it's a very serious problem.  Nothing is logged. This
> results in a silent failure.
> 
> Obviously, send_nsca should transmit to a hostname in hosts(5) and/or to
> an IP address that is highly available, removing any dependency on DNS.
> 
> But even with that in mind, this exec()/fork() model behavior is
> pragmatically incorrect.
> 
> The code should be checking result code for return values != 0, and
> printing a critical error to the logs.
> 
> Moreover, even with debug_level=99999999999999999999999999
> 
> No warning / error / notice occurs:
> 
> 
> [1199302511.076169] [001.0] [pid=75615] handle_host_state()
> [1199302511.076189] [001.0] [pid=75615]
> obsessive_compulsive_host_check_processor()
> [1199302511.076229] [001.0] [pid=75615] get_raw_command_line()
> [1199302511.076261] [2320.2] [pid=75615] Raw Command Input: /bin/echo
> $HOSTNAME$//$HOSTSTATEID$//'$HOSTOUTPUT$' | /usr/local/sbin/send_nsca -H
> fbsd01.cfi.biz -c /usr/local/etc/nagios/send_nsca.cfg -d "//"
> [1199302511.076284] [2320.2] [pid=75615] Expanded Command
> Output: /bin/echo $HOSTNAME$//$HOSTSTATEID$//'$HOSTOUTPUT$'
> | /usr/local/sbin/send_nsca -H fbsd01.cfi.biz
> -c /usr/local/etc/nagios/send_nsca.cfg -d "//"
> [1199302511.076289] [016.2] [pid=75615] Raw obsessive compulsive host
> processor command line: /bin/echo $HOSTNAME$//$HOSTSTATEID
> $//'$HOSTOUTPUT$' | /usr/local/sbin/send_nsca -H fbsd01.cfi.biz
> -c /usr/local/etc/nagios/send_nsca.cfg -d "//"
> [1199302511.076664] [001.0] [pid=75615] process_macros()
> [1199302511.076683] [2048.1] [pid=75615] **** BEGIN MACRO PROCESSING
> ***********
> [1199302511.076700] [2048.1] [pid=75615] Processing: '/bin/echo
> $HOSTNAME$//$HOSTSTATEID$//'$HOSTOUTPUT$' | /usr/local/sbin/send_nsca -H
> fbsd01.cfi.biz -c /usr/local/etc/nagios/send_nsca.cfg -d "//"'
> [1199302511.076720] [2048.2] [pid=75615]   Processing part: '/bin/echo '
> [1199302511.076739] [2048.2] [pid=75615]   Not currently in macro.
> Running output (10): '/bin/echo '
> [1199302511.076758] [2048.2] [pid=75615]   Processing part: 'HOSTNAME'
> [1199302511.076780] [2048.2] [pid=75615]   Uncleaned macro.  Running
> output (16): '/bin/echo fbsd01'
> [1199302511.077138] [2048.2] [pid=75615]   Just finished macro.  Running
> output (16): '/bin/echo fbsd01'
> [1199302511.077157] [2048.2] [pid=75615]   Processing part: '//'
> [1199302511.077176] [2048.2] [pid=75615]   Not currently in macro.
> Running output (18): '/bin/echo fbsd01//'
> [1199302511.077194] [2048.2] [pid=75615]   Processing part:
> 'HOSTSTATEID'
> [1199302511.077216] [2048.2] [pid=75615]   Uncleaned macro.  Running
> output (19): '/bin/echo fbsd01//0'
> [1199302511.077235] [2048.2] [pid=75615]   Just finished macro.  Running
> output (19): '/bin/echo fbsd01//0'
> [1199302511.077254] [2048.2] [pid=75615]   Processing part: '//''
> [1199302511.077273] [2048.2] [pid=75615]   Not currently in macro.
> Running output (22): '/bin/echo fbsd01//0//''
> [1199302511.077291] [2048.2] [pid=75615]   Processing part: 'HOSTOUTPUT'
> [1199302511.079235] [2048.2] [pid=75615]   Uncleaned macro.  Running
> output (63): '/bin/echo fbsd01//0//'PING OK - Packet loss = 0%, RTA =
> 0.97 ms'
> [1199302511.079337] [2048.2] [pid=75615]   Just finished macro.  Running
> output (63): '/bin/echo fbsd01//0//'PING OK - Packet loss = 0%, RTA =
> 0.97 ms'
> [1199302511.079805] [2048.2] [pid=75615]   Processing part: ''
> | /usr/local/sbin/send_nsca -H fbsd01.cfi.biz
> -c /usr/local/etc/nagios/send_nsca.cfg -d "//"'
> [1199302511.080427] [2048.2] [pid=75615]   Not currently in macro.
> Running output (157): '/bin/echo fbsd01//0//'PING OK - Packet loss = 0%,
> RTA = 0.97 ms' | /usr/local/sbin/send_nsca -H fbsd01.cfi.biz
> -c /usr/local/etc/nagios/send_nsca.cfg -d "//"'
> [1199302511.081348] [2048.1] [pid=75615]   Done.  Final output:
> '/bin/echo fbsd01//0//'PING OK - Packet loss = 0%, RTA = 0.97 ms'
> | /usr/local/sbin/send_nsca -H fbsd01.cfi.biz
> -c /usr/local/etc/nagios/send_nsca.cfg -d "//"'
> [1199302511.081823] [2048.1] [pid=75615] **** END MACRO PROCESSING
> *************
> [1199302511.082308] [016.2] [pid=75615] Processed obsessive compulsive
> host processor command line: /bin/echo fbsd01//0//'PING OK - Packet loss
> = 0%, RTA = 0.97 ms' | /usr/local/sbin/send_nsca -H fbsd01.cfi.biz
> -c /usr/local/etc/nagios/send_nsca.cfg -d "//"
> [1199302511.083217] [001.0] [pid=75615] my_system()
> [1199302511.084280] [256.1] [pid=75615] Running command '/bin/echo
> fbsd01//0//'PING OK - Packet loss = 0%, RTA = 0.97 ms'
> | /usr/local/sbin/send_nsca -H fbsd01.cfi.biz
> -c /usr/local/etc/nagios/send_nsca.cfg -d "//"'...
> [1199302511.091760] [001.0] [pid=80369] process_macros()
> [1199302511.092248] [001.0] [pid=80369] process_macros()
> [1199302511.093349] [001.0] [pid=80369] process_macros()
> [1199302511.094769] [001.0] [pid=80369] process_macros()
> [1199302511.095734] [001.0] [pid=80369] process_macros()
> [1199302511.096702] [001.0] [pid=80369] process_macros()


