Uptime error

Andy Shellam (Mailing Lists) andy.shellam-lists at mailnetwork.co.uk
Wed Feb 28 11:01:45 CET 2007
Again, as Patrick and I have said, your host's check_command only gets 
run when a service is deemed to have a problem.

The uptime output in Nagios differs from the console because Nagios 
hasn't run the uptime command for the host for over a day.
If you're not retaining status information, then when you restart 
Nagios it re-runs all its checks, which is why the output gets updated.  
After that the host check is only run when a service fails.

What I still don't understand is how your uptime command ensures the 
router is up.  If the router is not up, then Nagios won't be running (as 
you're running it on the same host), so the check seems quite pointless.  
If the LAN LINK service checks that the LAN interface is up and 
connected, that makes sense, but then a check_ping to 127.0.0.1 as your 
host check_command would give the same result as the uptime.  You could 
then have a separate "Uptime" service that uses your check_uptime command.

That way you could be confident that the status detail in Nagios is 
reasonably up-to-date.
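
A minimal sketch of that layout, for illustration only. It assumes a
check_ping command is already defined that takes warning and critical
thresholds as $ARG1$ and $ARG2$ (as in the stock sample configuration);
the threshold values and the 5-minute interval are placeholders, not
values taken from your setup:

```
# Host check: a plain ping, only run on demand when a service has problems.
# Assumes check_ping is defined with -w $ARG1$ -c $ARG2$ (illustrative thresholds).
define host{
    host_name               sujith
    alias                   Enpaq Router
    address                 127.0.0.1
    max_check_attempts      1
    check_command           check_ping!100.0,20%!500.0,60%
    check_period            24x7
    contact_groups          admin
    notification_interval   0
    notification_period     24x7
    notification_options    d,u,r,f
}

# Uptime as a regularly scheduled service, so its output stays current.
# Reuses the existing check_uptime command; the interval is an assumption.
define service{
    host_name               sujith
    service_description     Uptime
    check_command           check_uptime
    max_check_attempts      1
    normal_check_interval   5
    retry_check_interval    1
    check_period            24x7
    notification_interval   0
    notification_period     24x7
    notification_options    w,u,c,r,f
    contact_groups          admin
}
```

Because the service is scheduled every normal_check_interval, its status
information is refreshed regardless of whether any other service fails.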

Andy.

sujith h wrote:
> OK, let me explain the scenario in detail.
> Nagios is installed on my router.
> I have to monitor a few services: the LAN status of the
> net1 interface, the web server, and DNS.
> For this I have a host.cfg and a services.cfg file,
> which I have configured this way:
> host.cfg:
> define hostgroup{
>     hostgroup_name          Enpaq
>     alias                   Enpaq
>     members                 sujith
> }
>
> define host{
>     host_name               sujith
>     alias                   Enpaq Router
>     address                 127.0.0.1
>     max_check_attempts      1
>     check_command           check_uptime
>     check_period            24x7
>     contact_groups          admin
>     notification_interval   0
>     notification_period     24x7
>     notification_options    d,u,r,f
> }
>
> And services.cfg
> define service{
>     host_name               sujith
>     service_description     LAN LINK
>     check_command           check_link!net1
>     max_check_attempts      1
>     normal_check_interval   1
>     retry_check_interval    1
>     check_period            24x7
>     notification_interval   2
>     notification_period     24x7
>     notification_options    w,u,c,r,f
>     contact_groups          admin
> }
>
> define service{
>     host_name               sujith
>     service_description     WEB SERVER
>     check_command           check_procs!1:!1:!apache2!SNSlSsRl
>     max_check_attempts      1
>     normal_check_interval   1
>     retry_check_interval    1
>     check_period            24x7
>     notification_interval   2
>     notification_period     24x7
>     notification_options    w,u,c,r,f
>     contact_groups          admin
> }
>
>
> define service{
>     host_name               sujith
>     service_description     DNS
>     check_command           check_dig!sujith.elina.in
>     max_check_attempts      1
>     normal_check_interval   1
>     retry_check_interval    1
>     check_period            24x7
>     notification_interval   2
>     notification_period     24x7
>     notification_options    w,u,c,r,f
>     contact_groups          admin
> }
>
> That covers the two files. Here is the output as well.
> When I run the uptime command on my router I see this:
> sujith@sujith:~$ uptime
>  15:15:50 up 4 days,  4:54,  1 user,  load average: 0.76, 0.70, 0.59
>
> And when I look at the Host Detail section of the Nagios page, the Status 
> Information shows:
>
> 20:17:46 up 3 days, 9:56, 1 user, load average: 0.89, 0.86, 0.71 
>
> The output I gave you is from the running Nagios page.
> If I restart Nagios, everything is OK for a while.
> What I want to know is why the output comes out like this.
> Also, when I run ls -lt on the /var/nagios/status.dat file, it shows
> that the file is being updated every minute. If you still have any
> doubt about what I am trying to say, please mail me. I repeat that
> when I run
> /usr/local/nagios/bin/nagios -v /etc/nagios/etc/nagios.cfg
> it doesn't give me any error. But this misbehaviour is worrying
> me...
>
>
> Sujith
>
> Bangalore
>
> On 2/28/07, *Andy Shellam (Mailing Lists)* 
> <andy.shellam-lists at mailnetwork.co.uk> wrote:
>
>     I think you need to explain what you're trying to do overall, as
>     this isn't making any sense to me!
>
>     You cannot possibly determine which server is responding using
>     its uptime output.
>     A more sensible option surely would be its hostname?
>
>     #!/bin/bash
>     hostname
>     exit 0
>
>     However, you're going to run into a lot of trouble trying to use
>     Nagios to monitor a server using its dynamic IP address.
>     I think to start with you need to look at a solution such as
>     Dynamic DNS (e.g. No-IP or DynDNS for commercial services, or Bind
>     with DNSSEC/TSIG if you use Bind within your organisation).
>     Your machines would then change their DNS address when they detect
>     their IP address has changed, and you would define the dynamic
>     DNS hostname as the host_name directive to Nagios.
>     The only thing you'd have to watch is that if your DNS service
>     fails, Nagios will fail on everything.
>
>     Re your last e-mail, please give example outputs of what you see
>     on the command-line and what Nagios says.
>
>     Andy.
>
>     sujith h wrote:
>>     Actually, the problem with ping is that we are using dynamic IP
>>     addresses, so when the IP address changes it's difficult to know
>>     who is responding to the ping.  So we moved to ssh and uptime...
>>
>>     Sujith
>>
>>     Bangalore.
>>
>>     On 2/28/07, *sujith h* <sujith.linux at gmail.com> wrote:
>>
>>         Hi Andy,
>>
>>         Actually my problem is that it works for a day or more.
>>         But after that I get these sorts of
>>         results. So that's what worries me...
>>
>>         Sujith
>>         Bangalore
>>
>>
>>         On 2/28/07, *Andy Shellam (Mailing Lists)*
>>         <andy.shellam-lists at mailnetwork.co.uk> wrote:
>>
>>             As Patrick said, this plugin (as a host check) will only
>>             get run if a service goes into a non-OK state, so it
>>             shows old information until a service fails.
>>
>>             I have exactly the same plugin (written myself with a
>>             couple of differences to yours) which runs as a service
>>             called "Uptime" on every host - this way it is always run
>>             every 5 minutes and shows up-to-date info.
>>
>>             The host check is a simple check_ping.
>>
>>             HTH
>>
>>             Andy.
>>
>>             sujith h wrote:
>>>
>>>             No, I am running this plugin for the host check only.
>>>
>>>             Sujith
>>>             On 2/28/07, *Morris, Patrick* <patrick.morris at hp.com> wrote:
>>>
>>>                 > when I click the Host Detail I can see that
>>>                 > in the status information section we have a
>>>                 > different output, such as
>>>                 > 20:17:46 up 3 days, 9:56, 1 user, load average:
>>>                 > 0.89, 0.86, 0.71
>>>
>>>                 > Here you can see that I am not getting the output
>>>                 > synchronized.
>>>
>>>                 > But the problem I found is with uptime only in
>>>                 > the Host Detail. The Service Details
>>>                 > are running fine as of now.
>>>
>>>                 Are you running this command as both a host check
>>>                 and a service check?
>>>                 If so, that's why you're seeing different output.
>>>                 The host check will
>>>                 only run if a service on it goes to a non-OK state,
>>>                 and probably hasn't
>>>                 run for a few days, when the host had only been up
>>>                 for two minutes.
>>>
>>>                 It looks like you've defined the same plugin for two
>>>                 checks. If that's
>>>                 the case, they are never going to match.
>>>
>


-- 
Andy Shellam
NetServe Support Team

the Mail Network
"an alternative in a standardised world"

p: +44 (0) 121 288 0832/0839
m: +44 (0) 7818 000834

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null