Uptime error

sujith h sujith.linux at gmail.com
Wed Feb 28 11:22:58 CET 2007


I think I forgot to explain a crucial point: Nagios is running on my
router, not on my machine. If I told you it was my machine, then I really
apologise for wasting your precious time. Is there any way that I can
trigger check_uptime (a plugin written by me) regularly for as long as
the router is up? If so, please do tell me. If the router is down and
Nagios doesn't run, that is OK for me, since that is a different issue.
But from your reply I understand that check_uptime is called once when
Nagios is started, and after that only when one of the services fails.


Sujith

Bangalore.
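
For reference, the arrangement Andy suggests in the quoted reply below (a
simple ping as the host check, with check_uptime moved to a regularly
scheduled service) might look roughly like this. It is only a sketch:
check-host-alive is assumed to be a check_ping wrapper as in the Nagios
sample configs, the UPTIME service name and the 5-minute interval are
illustrative, and the remaining values are copied from the host.cfg and
services.cfg quoted below.

```cfg
# Sketch only. Assumes check-host-alive (a check_ping wrapper) and the
# poster's own check_uptime command are already defined as commands.

define host{
    host_name               sujith
    alias                   Enpaq Router
    address                 127.0.0.1
    max_check_attempts      1
    check_command           check-host-alive   ; host check: run on demand only
    check_period            24x7
    contact_groups          admin
    notification_interval   0
    notification_period     24x7
    notification_options    d,u,r,f
}

define service{
    host_name               sujith
    service_description     UPTIME             ; illustrative name
    check_command           check_uptime       ; service check: run on a schedule
    max_check_attempts      1
    normal_check_interval   5                  ; 5 minutes with the default interval_length
    retry_check_interval    1
    check_period            24x7
    notification_interval   0
    notification_period     24x7
    notification_options    w,u,c,r,f
    contact_groups          admin
}
```

With a layout like this the host check would still run only on demand,
but the UPTIME service would be scheduled every normal_check_interval, so
its status information should stay reasonably current.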

On 2/28/07, Andy Shellam (Mailing Lists) <
andy.shellam-lists at mailnetwork.co.uk> wrote:
>
>  Again, as Patrick and I have said, your host's check_command is only
> getting run when a service is deemed to have problems.
>
> You're getting the difference in the uptime output in Nagios and the
> console because Nagios hasn't run the uptime command for the host for over a
> day.
> If you're not retaining status information, then when you restart Nagios,
> it re-runs all its checks, which is why it then gets updated.  After that it
> is only run when a service fails.
>
> What I still don't understand is how your uptime command ensures the
> router is up.  If the router is not up, then Nagios won't be running (as
> you're running it on the same host), so it seems quite pointless really.  If
> the LAN LINK service checks that the LAN interface is up and connected, that
> makes sense, but a check_ping to 127.0.0.1 as your host check_command would
> give the same result as the uptime, and you could then have an "Uptime"
> service with your check_uptime command.
>
> That way you could be confident that the status detail in Nagios is
> reasonably up-to-date.
>
> Andy.
>
> sujith h wrote:
>
> OK, let me explain the scenario in detail.
> Nagios is installed on my router.
> I have to monitor a few services: the LAN status of the
> net1 interface, the web server, and DNS.
> For this I have a host.cfg and a services.cfg file,
> which I have configured this way:
> host.cfg:
> define hostgroup{
>     hostgroup_name          Enpaq
>     alias                   Enpaq
>     members                 sujith
> }
>
> define host{
>     host_name               sujith
>     alias                   Enpaq Router
>     address                 127.0.0.1
>     max_check_attempts      1
>     check_command           check_uptime
>     check_period            24x7
>     contact_groups          admin
>     notification_interval   0
>     notification_period     24x7
>     notification_options    d,u,r,f
> }
>
> And services.cfg
> define service{
>     host_name               sujith
>     service_description     LAN LINK
>     check_command           check_link!net1
>     max_check_attempts      1
>     normal_check_interval   1
>     retry_check_interval    1
>     check_period            24x7
>     notification_interval   2
>     notification_period     24x7
>     notification_options    w,u,c,r,f
>     contact_groups          admin
> }
>
> define service{
>     host_name               sujith
>     service_description     WEB SERVER
>     check_command           check_procs!1:!1:!apache2!SNSlSsRl
>     max_check_attempts      1
>     normal_check_interval   1
>     retry_check_interval    1
>     check_period            24x7
>     notification_interval   2
>     notification_period     24x7
>     notification_options    w,u,c,r,f
>     contact_groups          admin
> }
>
>
> define service{
>     host_name               sujith
>     service_description     DNS
>     check_command           check_dig!sujith.elina.in
>     max_check_attempts      1
>     normal_check_interval   1
>     retry_check_interval    1
>     check_period            24x7
>     notification_interval   2
>     notification_period     24x7
>     notification_options    w,u,c,r,f
>     contact_groups          admin
> }
>
> That is all about the two files. Let me give the output as well.
> When I type the command uptime on my router I see this:
> sujith at sujith:~$ uptime
>  15:15:50 up 4 days,  4:54,  1 user,  load average: 0.76, 0.70, 0.59
>
> And when I look at the Host Detail section of the Nagios page, the Status
> Information shows me:
>
> 20:17:46 up 3 days, 9:56, 1 user, load average: 0.89, 0.86, 0.71
>
> The output I gave you is from the running Nagios page.
> If I restart Nagios then everything is OK for a while.
> What I want to know is why the output comes out like this.
> Also, when I run ls -lt on the /var/nagios/status.dat file,
> it shows that the file is being updated every minute.
> If you still have any doubt about what I am trying to
> say, please do mail me. I repeat that when I run
> /usr/local/nagios/bin/nagios -v /etc/nagios/etc/nagios.cfg
> it doesn't give me any error. But this misbehaviour is worrying
> me.
>
>
> Sujith
>
> Bangalore
>
> On 2/28/07, Andy Shellam (Mailing Lists) <
> andy.shellam-lists at mailnetwork.co.uk> wrote:
>
> > I think you need to explain what you're trying to do overall, as this
> > isn't making any sense to me!
> >
> > You cannot possibly determine which server is responding using its
> > uptime output.
> > Surely a more sensible option would be its hostname?
> >
> > #!/bin/bash
> > hostname
> > exit 0
> >
> > However you're going to run into a lot of trouble trying to use Nagios
> > to monitor a server using its dynamic IP address.
> > I think to start with you need to be looking at a solution such as
> > Dynamic DNS (e.g. No-IP, DynDNS for commercial services, or Bind with
> > DNSSEC/TSIG if you use Bind within your organisation.)
> > Your machines would then change their DNS address when they detect that
> > their IP address has changed, and you would define the dynamic DNS hostname
> > as the host_name directive to Nagios.
> > The only thing you'd have to watch is that if your DNS service fails,
> > Nagios will fail on everything.
> >
> > Re your last e-mail, please give example outputs of what you see on the
> > command-line and what Nagios says.
> >
> > Andy.
> >
> > sujith h wrote:
> >
> > Actually, the problem with ping is that we are using dynamic IP
> > addresses, so when the IP address changes it's difficult to know who is
> > responding to the ping. So we moved to ssh and uptime.
> >
> > Sujith
> >
> > Bangalore.
> >
> >  On 2/28/07, sujith h < sujith.linux at gmail.com> wrote:
> >
> > > Hi Andy,
> > >
> > > Actually, my problem is that it works for a day or more, but after
> > > that I get these sorts of results. That is what worries me.
> > >
> > > Sujith
> > > Bangalore
> > >
> > >
> > >  On 2/28/07, Andy Shellam (Mailing Lists) <andy.shellam-lists at mailnetwork.co.uk >
> > > wrote:
> > >
> > > >  As Patrick said, this plugin (as a host check) will only get run if
> > > > a service goes into a non-OK state, so it shows old information until
> > > > a service fails.
> > > >
> > > > I have exactly the same plugin (written myself with a couple of
> > > > differences to yours) which runs as a service called "Uptime" on every host
> > > > - this way it is always run every 5 minutes and shows up-to-date info.
> > > >
> > > > The host check is a simple check_ping.
> > > >
> > > > HTH
> > > >
> > > > Andy.
> > > >
> > > > sujith h wrote:
> > > >
> > > >
> > > > No, I am running this plugin as a host check only.
> > > >
> > > > Sujith
> > > >  On 2/28/07, Morris, Patrick < patrick.morris at hp.com > wrote:
> > > > >
> > > > > > when I click the Host Detail I can see that
> > > > > > in the status information section we have a different output,
> > > > > > such as:
> > > > > > 20:17:46 up 3 days, 9:56, 1 user, load average: 0.89, 0.86, 0.71
> > > > >
> > > > > > Here you can see that I am not getting synchronized output.
> > > > >
> > > > > > But I have found the problem with uptime only in
> > > > > > the Host Detail. The Service Details
> > > > > > are running fine as of now.
> > > > >
> > > > > Are you running this command as both a host check and a service
> > > > > check? If so, that's why you're seeing different output. The host
> > > > > check will only run if a service on it goes to a non-OK state, and
> > > > > probably hasn't run for a few days, when the host had only been up
> > > > > for two minutes.
> > > > >
> > > > > It looks like you've defined the same plugin for two checks. If
> > > > > that's the case, they are never going to match.
> > > > >
> > > > >
> > > > --
> > > > Andy Shellam
> > > > NetServe Support Team
> > > >
> > > > the Mail Network
> > > > "an alternative in a standardised world"
> > > >
> > > > p: +44 (0) 121 288 0832/0839
> > > > m: +44 (0) 7818 000834
> > > >
> > > >
> > >
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null

