Bug in Performance Data

Lawrence Findley larryfindley at yahoo.com
Fri Aug 6 00:45:17 CEST 2010


Thank you, Ethan, for your response.
The CGI reads status.dat. 

Here are some lines from one of the blocks:
servicestatus {
        host_name=wtf3a
        service_description=check_ses
        modified_attributes=0
        check_command=check_ddnfaults_ses
        check_period=24x7
        notification_period=24x7
        check_interval=5.000000
        retry_interval=1.000000
        event_handler=
        has_been_checked=1
        should_be_scheduled=1
        check_execution_time=46912587.078
        check_latency=0.399
        check_type=0
        current_state=0
        last_hard_state=0
        last_event_id=87848
        current_event_id=87862
        current_problem_id=0
        last_problem_id=43437
        current_attempt=1
        max_attempts=3
        state_type=1
        last_state_change=1280725323
        last_hard_state_change=1280082023
        last_time_ok=1281047224
        last_time_warning=0
        last_time_unknown=1280725253
        last_time_critical=1280081723
        plugin_output=CHECK_DDN_ENCLOSURE OK - No errors were found.
***

Please notice that check_execution_time is more than a year, expressed in seconds.
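For scale, the value can be converted to days and years with a few lines of Python. This is only an illustrative sketch: the helper name `seconds_to_days_years` is made up here, and the 365.25-day year is an approximation.

```python
SECONDS_PER_DAY = 86400
DAYS_PER_YEAR = 365.25  # approximation, good enough for a sanity check


def seconds_to_days_years(seconds):
    """Return (days, years) for a duration given in seconds."""
    days = seconds / SECONDS_PER_DAY
    return days, days / DAYS_PER_YEAR


# check_execution_time taken from the status.dat excerpt
days, years = seconds_to_days_years(46912587.078)
print(f"{days:.1f} days, {years:.2f} years")
```

That works out to roughly 543 days, or about 1.5 years.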

I don't see anything time-change related in the logs. After I filter out 
host/service alerts/notifications, only auto-saves and start/stop 
information remain, as follows: 


[1281028702] Auto-save of retention data completed successfully.
[1281028706] Caught SIGTERM, shutting down...
[1281028706] Successfully shutdown... (PID=6509)
[1281028706] Event broker module '/usr/local/nagios/modules/dnxServer.so' 
deinitialized successfully.
[1281028727] Nagios 3.2.1 starting... (PID=28820)
[1281028727] Local time is Thu Aug 05 10:18:47 PDT 2010
[1281028727] LOG VERSION: 2.0
[1281028727] Event broker module '/usr/local/nagios/modules/dnxServer.so' 
initialized successfully.
[1281028728] Finished daemonizing... (New PID=28821)
[1281028763] EXTERNAL COMMAND: 
SCHEDULE_FORCED_SVC_CHECK;nagios06;check_app_java_cluster;1281028760
[1281029029] Auto-save of retention data completed successfully.
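
One way to double-check the log for clock jumps is to compare consecutive entry timestamps. The sketch below assumes the `[epoch] message` line format shown in the log excerpt; the function name `find_time_jumps` and the 600-second threshold are arbitrary choices for illustration, not anything Nagios provides.

```python
import re

# A log entry starts with an epoch timestamp in square brackets.
LOG_LINE = re.compile(r"^\[(\d+)\]\s")


def find_time_jumps(lines, max_gap=600):
    """Yield (prev_ts, ts, line) where time goes backwards or skips more than max_gap seconds."""
    prev = None
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # wrapped continuation lines carry no timestamp
        ts = int(m.group(1))
        if prev is not None and (ts < prev or ts - prev > max_gap):
            yield prev, ts, line.rstrip("\n")
        prev = ts


sample = [
    "[1281028702] Auto-save of retention data completed successfully.",
    "[1281028706] Caught SIGTERM, shutting down...",
    "[1281028727] Nagios 3.2.1 starting... (PID=28820)",
    "[1281029029] Auto-save of retention data completed successfully.",
]
jumps = list(find_time_jumps(sample))
print(jumps)
```

On these sample lines nothing is flagged, since the largest gap is about five minutes. On a real system the function could be fed an open log file instead of the sample list; the log file location depends on the installation.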


Here is the detail from 
https://nagios06.internal.shutterfly.com/nagios/cgi-bin/extinfo.cgi?type=4

Metric                  Min.       Max.              Average
Check Execution Time:   0.00 sec   46912714.32 sec   26955114.505 sec
Check Latency:          0.00 sec   3.40 sec          0.277 sec
Percent State Change:   0.00%      37.43%            0.33%


If I stop Nagios and remove retention.dat and status.dat and restart fresh, 
Nagios looks normal for about 2 minutes and then reports the 1.5 year execution 
time. 
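
To catch which check first picks up the bad value after a fresh start, status.dat could be scanned for implausible check_execution_time entries. A rough sketch, assuming the `servicestatus { key=value ... }` layout from the excerpt earlier in this message; `parse_status_blocks`, `suspicious_services`, and the 3600-second limit are hypothetical names and values chosen for this example.

```python
def parse_status_blocks(text, block_type="servicestatus"):
    """Return a list of dicts, one per `block_type { ... }` block."""
    blocks, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            # Only start collecting when the wanted block type opens.
            if line == block_type + " {":
                current = {}
        elif line == "}":
            blocks.append(current)
            current = None
        elif "=" in line:
            # Split on the first "=" only; values such as plugin_output may contain more.
            key, _, value = line.partition("=")
            current[key] = value
    return blocks


def suspicious_services(blocks, limit=3600.0):
    """Return (host, service, seconds) for checks reporting more than `limit` seconds."""
    hits = []
    for b in blocks:
        secs = float(b.get("check_execution_time", 0))
        if secs > limit:
            hits.append((b.get("host_name"), b.get("service_description"), secs))
    return hits


sample = """servicestatus {
        host_name=wtf3a
        service_description=check_ses
        check_execution_time=46912587.078
        check_latency=0.399
        }
"""
hits = suspicious_services(parse_status_blocks(sample))
print(hits)
```

Running this in a loop right after a restart, against the real status.dat, should show which service is the first to report the huge value.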


Any ideas on how to investigate and fix this bug? 

Thank you!

-Larry Findley

Sr. Systems Engineer 
Shutterfly

lfindley at shutterfly.com 


________________________________
From: Ethan Galstad <egalstad at nagios.org>
To: Nagios Developers List <nagios-devel at lists.sourceforge.net>
Sent: Wed, August 4, 2010 6:22:10 PM
Subject: Re: [Nagios-devel] Bug in Performance Data

Are there any messages in the Nagios log file that relate to detected 
time changes?

The (stated) execution time for these checks is approx 542 days, which 
is strange.  Most time issues would show just a few hours offset, not 
almost 2 years time.

What times are reflected in the status.dat file?  Are you sure your CGI 
script is reading/processing the correct values from that file?


- Ethan Galstad



Lawrence Findley wrote:
> Hello Folks,
> This info is from a cgi script that we use to show execution times. 
> These are obviously incorrect. None of the checks actually take more 
> than a few seconds to complete.
> Any ideas? Thank you.
> -Larry Findley
>  
>  
> 
>  
> Wed Aug 4 17:00:01 PDT 2010
> 
> 
>  Top 10 Service Check Execution Times
> 
> HOST            SERVICE                 TIME
> im477           check_rdf_content       46912687.968
> vividpics104    check_lab_min_procs     46912684.576
> vividpics158b   check-win-mem           46912683.695
> vividpics110e   check-win-cpu           46912683.695
> grf133          check_all_local_disk    46912683.695
> vividpics147c   check-win-mem           46912683.695
> vividpics162e   check-win-disk          46912683.695
> vividpics156d   check-win-disk          46912683.576
> vividpics144e   check-win-disk          46912683.576
> vividpics165b   check-win-mem           46912683.576
> 
> 
> ------------------------------------------------------------------------
> *From:* Lawrence Findley <larryfindley at yahoo.com>
> *To:* Nagios Developers List <nagios-devel at lists.sourceforge.net>
> *Sent:* Wed, August 4, 2010 4:35:20 PM
> *Subject:* Re: [Nagios-devel] Bug in Performance Data
> 
> Yes, Benny,
> we run ntp to keep everything correct.
> We use 4 satellites with DNX and I also verified that the time is 
> correct on all of the satellites too.
>  
> Wed Aug  4 16:26:13 PDT 2010
> nagios at nagios06 /usr/local/nagios/etc/objects $
> The Nagios server is not a VM.
>  
> Thank you for taking a look at this!
> -Larry Findley
>  
> Shutterfly
>  
> 
>  
> 
> ------------------------------------------------------------------------
> *From:* C. Bensend <benny at bennyvision.com>
> *To:* nagios-devel at lists.sourceforge.net
> *Sent:* Wed, August 4, 2010 12:15:27 PM
> *Subject:* Re: [Nagios-devel] Bug in Performance Data
> 
> 
> Is the time synced properly on your Nagios host?
> 
> Is this a VM?
> 
> Benny
> 
> 
>  > I found a bug where the performance stats do not reflect execution times
>  > accurately.
>  >
>  > version 3.2.1 shows execution times to be millions of seconds.
>  > Any ideas?
>  > Thank you.
>  > -Larry Findley
>  >
>  >  Monitoring Performance
>  > Service Check Execution Time: 0.00 / 46912714.32 / 30610220.908 sec
>  > Service Check Latency: 0.00 / 3.40 / 0.242 sec
>  > Host Check Execution Time: 0.01 / 8.02 / 1.077 sec
>  > Host Check Latency: 0.00 / 1.13 / 0.403 sec
>  > # Active Host / Service Checks: 2159 / 18342
>  > # Passive Host / Service Checks: 0 / 0
>  >
>  >
>  > ________________________________
>  > From: Lawrence Findley <larryfindley at yahoo.com>
>  > To: Nagios Developers List <nagios-devel at lists.sourceforge.net>
>  > Sent: Tue, August 3, 2010 12:42:12 PM
>  > Subject: [Nagios-devel] 1.5 year execution time?
>  >
>  >
>  > Here is the text:
>  >
>  >
>  >  Monitoring Performance
>  > Service Check Execution Time: 0.00 / 46912714.32 / 30610220.908 sec
>  > Service Check Latency: 0.00 / 3.40 / 0.242 sec
>  > Host Check Execution Time: 0.01 / 8.02 / 1.077 sec
>  > Host Check Latency: 0.00 / 1.13 / 0.403 sec
>  > # Active Host / Service Checks: 2159 / 18342
>  > # Passive Host / Service Checks: 0 / 0
>  > I have tried removing status.dat and retention files. Within a few
>  > minutes, it
>  > goes back to these numbers.
>  >
>  > Anyone with an idea?
>  > Thank you.
>  > -Larry Findley
>  >
>  >
>  >
>  > ------------------------------------------------------------------------------
>  > The Palm PDK Hot Apps Program offers developers who use the
>  > Plug-In Development Kit to bring their C/C++ apps to Palm for a share
>  > of $1 Million in cash or HP Products. Visit us here for more details:
>  > http://p.sf.net/sfu/dev2dev-palm
>  > _______________________________________________
>  > Nagios-devel mailing list
>  > Nagios-devel at lists.sourceforge.net
>  > https://lists.sourceforge.net/lists/listinfo/nagios-devel
>  >
> 
> 
> -- 
> "Something's going on in this house - last night, I saw a face!"
> "Did it have a nose?"
> "Yes!"
> "That sounds like a face all right."
>                                      -- Scary Movie 4
> 
