Distributed Monitoring Central Server no status changes

Paul Landauer pllandauer at comcast.net
Wed Feb 25 22:26:09 CET 2009


> 
> On Feb 25, 2009, at 12:54 PM, Paul Landauer wrote:
> 
> > On Wed, 2009-02-25 at 12:06 -0600, Marc Powell wrote:
> 
> > I'm using 2 servers following the documentation at
> > http://nagios.sourceforge.net/docs/3_0/distributed.html
> 
> Thanks.
> 
> >> - example host and service definitions from both servers (complete
> >> definitions please)
> > Definitions are the same on both servers.
> > Example host definition:
> > define host{
> > 	use	generic-host
> > 	host_name	surf
> > 	alias	Surf Control
> > 	address	ip_address_of_surf_is_here
> > 	max_check_attempts	5
> > 	check_command	check-host-alive
> > 	check_interval	5
> > 	retry_interval	1
> > 	check_period	24x7
> > 	contact_groups	admins
> > 	notification_interval	30
> > 	notification_period	24x7
> > 	notification_options	d,u,r
> > 	}
> >
> > Example Service Definitions (surf is a member of  
> > sunrise_windows_servers):
> > define service{
> > 	use			generic-service
> > 	hostgroup_name		sunrise_windows_servers
> > 	service_description	NSClient++ Version
> > 	check_command		check_nt!CLIENTVERSION
> > 	}
> 
> For future reference, these are not 'complete' since you use
> templates. There's a lot of important information within those
> templates that's needed when troubleshooting as well. I expect that
> the definitions are indeed different between the servers when you take
> the templates into account; otherwise your central server would be
> doing active checks of the services in addition to receiving the
> passive checks, overwriting their results. (I don't think this is the
> problem.)
> 
> >> - related nagios.log information from both servers
> > I included excerpts that I thought applied.  If you'd like the whole
> > log, let me know.
> > Nagios.log for Distributed server:
> > [1235575724] SERVICE ALERT: surf;Explorer;CRITICAL;HARD;3;Explorer.exe: not running
> > [1235575724] SERVICE NOTIFICATION: nagiosadmin;surf;Explorer;CRITICAL;notify-service-by-email;Explorer.exe: not running
> >
> > Nagios.log for Central Server:
> > [1235575777] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;surf;Explorer;0;Explorer.exe: not running
> > [1235575778] PASSIVE SERVICE CHECK: surf;Explorer;0;Explorer.exe: not running
> 
> This is interesting and useful. As you can see, on your distributed
> server the state is CRITICAL (numeric code 2; the 3 in that log line
> is the check attempt number), but by the time NSCA dumps it into the
> command pipe on the central server it has been translated to 0 (OK)
> by something in the process. This could be because nagios isn't
> passing the correct status code to your submission script, your
> submission script isn't interpreting it or passing it to send_nsca
> correctly, or nsca on the receiving side isn't reading it correctly.
> 
> >> - the contents of your check result submission script if it's not
> >> exactly like the documented one.
> > printfcmd="/usr/bin/printf"
> >
> > NscaBin="/usr/bin/send_nsca"
> > NscaCfg="/etc/nagios/send_nsca.cfg"
> > NagiosHost="I_have_the_ip_address_of_my_central_server_here"
> >
> > # Fire the data off to the NSCA daemon using the send_nsca script
> > $printfcmd "%s\t%s\t%s\t%s\n" "$1" "$2" "$3" "$4" | $NscaBin -H $NagiosHost -p 5721 -c $NscaCfg
> 
> To say whether this is correct or not I'd have to see your OCSP
> command definition. If you're using the $SERVICESTATE$ macro, then
> this is broken. send_nsca expects a numeric return code (0-3), but
> $SERVICESTATE$ provides the state name as text (OK, WARNING,
> CRITICAL, UNKNOWN). Normally the submission script has to translate
> that to the proper numeric code first, but you can also use the
> $SERVICESTATEID$ macro instead to get the numeric code directly. My
> bet is on this being the problem.
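
For anyone who would rather keep $SERVICESTATE$ in the OCSP command, the translation described above could be done in the submission script along these lines. This is only a rough sketch: the function names state_to_code and build_service_line are made up for illustration, and the last line just prints the tab-separated record instead of piping it to send_nsca. The numeric codes are the standard Nagios service states.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios or NSCA): map a service state
# name to the numeric return code send_nsca expects
# (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN).
state_to_code() {
    case "$1" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        UNKNOWN)  echo 3 ;;
        [0-3])    echo "$1" ;;  # already numeric ($SERVICESTATEID$): pass through
        *)        echo 3 ;;     # anything unrecognised is reported as UNKNOWN
    esac
}

# Build the tab-separated record for a service check:
# host, service description, return code, plugin output.
build_service_line() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$(state_to_code "$3")" "$4"
}

# Example with the values from the log excerpt; in a real submission
# script this output would be piped to send_nsca.
build_service_line surf Explorer CRITICAL "Explorer.exe: not running"
```

With this in place the third field is always numeric, whichever of the two macros the OCSP command passes in.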
> 
> >> Running nagios and/or NSCA in debug mode on the central server might
> >> provide additional information.
> > Let me know if you still want this to be done.
> 
> Running NSCA in debug to see if it's receiving the 0 status code from  
> the distributed machine would further narrow down the source of the  
> problem.
> 
> --
> Marc


Marc,

You are correct sir!  I changed $SERVICESTATE$ to $SERVICESTATEID$ on
the distributed server and the central server is updating properly.  I
imagine that I'll need to use $HOSTSTATEID$ instead of $HOSTSTATE$ as
well.
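
For the host side, a similar sketch may be useful. It assumes a separate host submission script; the names host_state_to_code and build_host_line are hypothetical, and "example output" stands in for real plugin output. Two things differ from the service case: host states use the codes 0=UP, 1=DOWN, 2=UNREACHABLE (which is what $HOSTSTATEID$ yields), and the record for a host check has only three tab-separated fields, with no service description.

```shell
#!/bin/sh
# Hypothetical helper: map a host state name to its numeric code
# (0=UP, 1=DOWN, 2=UNREACHABLE). Fails on anything else.
host_state_to_code() {
    case "$1" in
        UP)          echo 0 ;;
        DOWN)        echo 1 ;;
        UNREACHABLE) echo 2 ;;
        [0-2])       echo "$1" ;;  # already numeric ($HOSTSTATEID$)
        *)           return 1 ;;   # refuse to guess an unknown state
    esac
}

# Build the three-field record for a host check:
# host, return code, plugin output (no service description).
build_host_line() {
    code=$(host_state_to_code "$2") || return 1
    printf '%s\t%s\t%s\n' "$1" "$code" "$3"
}

# Example; in a real script this output would be piped to send_nsca.
build_host_line surf DOWN "example output"
```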

paul


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users