Distributed nagios problem - service definition not found!

TIM MOORE MOORET10 at ODJFS.STATE.OH.US
Thu Oct 7 13:51:38 CEST 2004


Jan,
 
Here are the lines from the services.cfg of the distributed server:
 
define service {
host_name                      localhost
service_description            cpu
check_command                  check_local_load!3!5
use                            generic-service
max_check_attempts             3
normal_check_interval          3
retry_check_interval           1
check_period                   24x7
notifications_enabled          0
notification_interval          0
notification_period            24x7
notification_options           w,u,c,r
contact_groups                 admins
}
 
define service {
hostgroup_name                 ACDMZ_Switches,ACDMZ_Firewalls
service_description            Check Host Alive
check_command                  check-host-alive
max_check_attempts             3
normal_check_interval          5
retry_check_interval           1
check_period                   24x7
notification_interval          0
notification_period            24x7
notification_options           w,u,c,r
notifications_enabled          1
contact_groups                 noc
}
 
My check_command is check-host-alive, not ping.  Oddly, the results from the localhost cpu check seem to be accepted when they are sent.  I still don't know what to look for on the central server, though.  Should I see new hosts being added, or does it only alarm when a check fails?  Do I also have to add the hosts to the central server?  The hosts in ACDMZ_Switches are defined only on the distributed server.
 
I'm just curious how we get notified of problems from the distributed server.  I have a couple of devices that I cannot reach via ping (check-host-alive), and they still never show as down on the central server GUI.
 
Thanks for the help.
 
--------------------------------------
Tim Moore
DNS/Linux/Cisco Admin
ODJFS

>>> "Jan Scholten" <Jan.Scholten at iconz.net> 10/6/2004 4:51:05 PM >>>

Can you supply the relevant part of services.cfg?

It seems you have a misconfiguration. Are you sure the service is
"Check Host Alive" and not PING (the default)?
I don't know whether Nagios likes a service name containing a blank, so
try it without one!
The service name in the returned result ("Check Host Alive" in your case)
must be the same as the service_description in the services.cfg for that host.
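
As a rough illustration of the matching rule described above, here is a minimal Python sketch that compares the host name and service description carried by a PROCESS_SERVICE_CHECK_RESULT command against the services defined in a config file. The helper names defined_services and parse_result, and the CENTRAL_CFG sample, are hypothetical and not part of Nagios. The sample assumes the central server defines only the localhost cpu service, which is a guess at the setup in this thread. The sketch also ignores hostgroup_name expansion and templates, so treat it as a starting point only.

```python
import re


def defined_services(cfg_text):
    """Collect (host_name, service_description) pairs from 'define service' blocks."""
    pairs = set()
    for body in re.findall(r"define\s+service\s*\{(.*?)\}", cfg_text, re.S):
        fields = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop inline ';' comments
            if not line:
                continue
            parts = line.split(None, 1)  # directive name, then its value
            if len(parts) == 2:
                fields[parts[0]] = parts[1].strip()
        desc = fields.get("service_description")
        # host_name may hold a comma-separated list; hostgroup_name is NOT handled
        for host in fields.get("host_name", "").split(","):
            if host.strip() and desc:
                pairs.add((host.strip(), desc))
    return pairs


def parse_result(command):
    """Split a PROCESS_SERVICE_CHECK_RESULT external command into its fields."""
    name, host, desc, code, output = command.split(";", 4)
    if name != "PROCESS_SERVICE_CHECK_RESULT":
        raise ValueError("not a passive service check result")
    return host, desc, int(code), output


# Hypothetical central-server config (an assumption, not taken from the thread)
CENTRAL_CFG = """
define service {
host_name                      localhost
service_description            cpu
}
"""

known = defined_services(CENTRAL_CFG)
for cmd in (
    "PROCESS_SERVICE_CHECK_RESULT;localhost;cpu;0;OK - load average: 0.00, 0.00, 0.00",
    "PROCESS_SERVICE_CHECK_RESULT;acdmz-inside-sw2;Check Host Alive;0;"
    "PING OK - Packet loss = 0%, RTA = 0.83 ms",
):
    host, desc, code, _ = parse_result(cmd)
    status = "defined" if (host, desc) in known else "NOT defined on central server"
    print(f"{host} / {desc}: {status}")
```

With this sample config, the acdmz-inside-sw2 result is reported as not defined. That is the kind of mismatch that would be consistent with the "service could not be found" warning, although the real cause on a given server has to be confirmed against its actual configuration.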


Jan

> I just recently set up distributed Nagios.  I followed the directions
> very closely.  I first had a problem running the nsca daemon through
> xinetd.  It just wouldn't listen for incoming connections on port 5667.
> I also added the line to /etc/services.  Here is my config:
> service nsca
> {
>         flags           = REUSE
>         socket_type     = stream
>         wait            = no
>         user            = nagios
>         group           = nagios
>         server          = /usr/local/nagios/bin/nsca
>         server_args     = -c /usr/local/nagios/etc/nsca.cfg
>         log_on_failure  += USERID
>         disable         = no
>         only_from       = 10.12.225.50
> }
>
> If I run it from the command line in daemon mode, it works fine.
> My main problem is that when passive checks are sent to the central
> server, I keep getting this error:
> Oct  6 15:02:28 noc-mon nsca[31620]: Connection from 10.12.225.50 port 38784
> Oct  6 15:02:28 noc-mon nsca[31620]: Host address checks out ok
> Oct  6 15:02:28 noc-mon nsca[31620]: Handling the connection...
> Oct  6 15:02:29 noc-mon nsca[31620]: SERVICE CHECK -> Host Name: 'localhost', Service Description: 'cpu', Return Code: '0', Output: 'OK - load average: 0.00, 0.00, 0.00'
> Oct  6 15:02:29 noc-mon nsca[31620]: End of connection...
> Oct  6 15:02:30 noc-mon nagios: EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;localhost;cpu;0;OK - load average: 0.00, 0.00, 0.00
> Oct  6 15:02:39 noc-mon nsca[31817]: Connection from 10.12.225.50 port 39040
> Oct  6 15:02:39 noc-mon nsca[31817]: Host address checks out ok
> Oct  6 15:02:39 noc-mon nsca[31817]: Handling the connection...
> Oct  6 15:02:40 noc-mon nsca[31817]: SERVICE CHECK -> Host Name: 'acdmz-inside-sw2', Service Description: 'Check Host Alive', Return Code: '0', Output: 'PING OK - Packet loss = 0%, RTA = 0.83 ms'
> Oct  6 15:02:40 noc-mon nsca[31817]: End of connection...
> Oct  6 15:02:40 noc-mon nagios: EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;acdmz-inside-sw2;Check Host Alive;0;PING OK - Packet loss = 0%, RTA = 0.83 ms
> Oct  6 15:02:44 noc-mon nagios: Warning:  Message queue contained results for service 'Check Host Alive' on host 'acdmz-inside-sw2'.  The service could not be found!
>
> The localhost check appears to work, but the simple check-host-alive
> service check does not.  I know that the service definition is on
> both servers.  They are both running v1.2.  Also, should I see something
> on my central server's web GUI showing these hosts as down?  My host count
> has not been affected at all by the hosts added to the distributed
> server.  Am I missing something?  Is there something wrong with the
> default check-host-alive service check?
> Thanks for any help,
> --------------------------------------
> Tim Moore
> DNS/Linux/Cisco Admin
> ODJFS
>



-- 
Jan Scholten
Research and Development Intern
Iconz.co.nz

