question about slow startup and retained data

Frost, Mark {PBC} mark.frost1 at pepsico.com
Sun Oct 17 06:10:27 CEST 2010


After adding a fair number of hosts and services based on templates (all with a number of dependent services), we're now seeing Nagios take a long time to start up. We're using Nagios 3.2.1. Startup takes roughly 4 minutes, during which Nagios uses 100% of one CPU core, eventually two cores, and then settles down. I assumed it was time to investigate the fast-startup options and, at the very least, deal with the dependency checking.

Note that the host in question is the "central" node in a distributed setup, so virtually everything it receives is a passive check result.

When I ran Nagios with '-s', the first block (Object Config Processing Times) completed very quickly, but it then stalled on the second block (Retention Data Times), which ran for a long time, as shown in the output below. To my surprise, everything after that went fairly quickly. So my problem apparently lies with the retained data.
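To make the comparison between blocks explicit, here is a minimal sketch that pulls each block's TOTAL line out of `nagios -s` output and reports which block dominates. It assumes the layout shown in the output below (a header line, a row of dashes under it, then a "TOTAL:" line); the function names are hypothetical helpers, not part of Nagios, and the sample text is an abridged copy of the output in this post.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches the first number on a "TOTAL:" line, e.g. "TOTAL:   194.016362 sec".
TOTAL_RE = re.compile(r"^\s*TOTAL:\s+([0-9]+(?:\.[0-9]+)?)\s+sec")

# Abridged copy of the nagios -s timing blocks from this post.
SAMPLE = """OBJECT CONFIG PROCESSING TIMES      (* = Potential for precache savings with -u option)
----------------------------------
Read:                 0.042305 sec
TOTAL:                0.478046 sec  * = 0.350505 sec (73.32%) estimated savings

RETENTION DATA TIMES
----------------------------------
Read and Process:     194.016362 sec
TOTAL:                194.016362 sec

CONFIG VERIFICATION TIMES          (* = Potential for speedup with -x option)
----------------------------------
TOTAL:                0.881816 sec  * = 0.822133 sec (93.2%) estimated savings

EVENT SCHEDULING TIMES
-------------------------------------
TOTAL:                   0.017595 sec
"""


def phase_totals(output: str) -> Dict[str, float]:
    """Map each timing block's header to its TOTAL in seconds."""
    lines: List[str] = output.splitlines()
    totals: Dict[str, float] = {}
    section: Optional[str] = None
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line.strip() and nxt.strip().startswith("---"):
            # A header is any non-blank line underlined by dashes;
            # drop the "(* = ...)" legend that follows some headers.
            section = line.split("(")[0].strip()
            continue
        m = TOTAL_RE.match(line)
        if m and section is not None:
            totals[section] = float(m.group(1))
    return totals


def dominant_phase(totals: Dict[str, float]) -> Tuple[str, float]:
    """Return the slowest block and its share of the summed block totals."""
    name = max(totals, key=totals.get)
    return name, totals[name] / sum(totals.values())


if __name__ == "__main__":
    name, share = dominant_phase(phase_totals(SAMPLE))
    print(f"{name}: {share:.1%} of timed startup work")
```

Run against the figures in this post, it attributes about 99.3% of the timed startup work to the retention block, which is why the config-side options (-u, -x) can only shave off about a second.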

My relevant nagios.cfg entries are as follows:

retain_state_information=1
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=1
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0

So we definitely want to make use of historical data. I see from the comment in the config file that using retained state may come at the cost of increased startup time. None of the speedup options I've found claim to help with startup time for retained state. Am I stuck? Do I either have to live with the 194-second processing time (which will grow as we add more hosts and services over time) or do without retained data?
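As a rough back-of-the-envelope check on how that 194 seconds might grow, the sketch below divides the retention time by the host and service counts reported further down (870 hosts, 7569 services). It assumes the cost scales linearly with the number of objects, which is an unverified assumption and the optimistic case; if the real cost is superlinear, the projection understates it. The doubled object counts in the usage line are purely hypothetical.

```python
def per_object_seconds(total_seconds: float, hosts: int, services: int) -> float:
    """Average retention read/process time per host or service object."""
    return total_seconds / (hosts + services)


def projected_seconds(rate: float, hosts: int, services: int) -> float:
    """Linear projection (assumed model) of retention time for a larger config."""
    return rate * (hosts + services)


if __name__ == "__main__":
    # Figures from the nagios -s output in this post.
    rate = per_object_seconds(194.016362, 870, 7569)
    print(f"{rate * 1000:.1f} ms per object")
    # Hypothetical: twice as many hosts and services.
    print(f"{projected_seconds(rate, 1740, 15138):.0f} sec if the config doubles")
```

That works out to roughly 23 ms per object, or about 388 seconds if the configuration doubled, under the linear assumption.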

nagios -s output:

Object Config Source: Config files (uncached)

OBJECT CONFIG PROCESSING TIMES      (* = Potential for precache savings with -u option)
----------------------------------
Read:                 0.042305 sec
Resolve:              0.008222 sec  *
Recomb Contactgroups: 0.002196 sec  *
Recomb Hostgroups:    0.006549 sec  *
Dup Services:         0.026616 sec  *
Recomb Servicegroups: 0.289033 sec  *
Duplicate:            0.012538 sec  *
Inherit:              0.005349 sec  *
Recomb Contacts:      0.000000 sec  *
Sort:                 0.000001 sec  *
Register:             0.076975 sec
Free:                 0.008261 sec
                      ============
TOTAL:                0.478046 sec  * = 0.350505 sec (73.32%) estimated savings


RETENTION DATA TIMES
----------------------------------
Read and Process:     194.016362 sec
                      ============
TOTAL:                194.016362 sec


Timing information on configuration verification is listed below.

CONFIG VERIFICATION TIMES          (* = Potential for speedup with -x option)
----------------------------------
Object Relationships: 0.054524 sec
Circular Paths:       0.822133 sec  *
Misc:                 0.005159 sec
                      ============
TOTAL:                0.881816 sec  * = 0.822133 sec (93.2%) estimated savings


EVENT SCHEDULING TIMES
-------------------------------------
Get service info:        0.014405 sec
Get host info info:      0.001732 sec
Get service params:      0.000013 sec
Schedule service times:  0.000801 sec
Schedule service events: 0.000461 sec
Get host params:         0.000001 sec
Schedule host times:     0.000144 sec
Schedule host events:    0.000038 sec
                         ============
TOTAL:                   0.017595 sec


Projected scheduling information for host and service checks
is listed below.  This information assumes that you are going
to start running Nagios with your current config files.

HOST SCHEDULING INFORMATION
---------------------------
Total hosts:                     870
Total scheduled hosts:           19
Host inter-check delay method:   SMART
Average host check interval:     300.00 sec
Host inter-check delay:          15.79 sec
Max host check spread:           30 min
First scheduled check:           Sat Oct 16 23:26:24 2010
Last scheduled check:            Sat Oct 16 23:28:46 2010


SERVICE SCHEDULING INFORMATION
-------------------------------
Total services:                     7569
Total scheduled services:           34
Service inter-check delay method:   SMART
Average service check interval:     292.94 sec
Inter-check delay:                  8.62 sec
Interleave factor method:           SMART
Average services per host:          8.70
Service interleave factor:          1
Max service check spread:           30 min
First scheduled check:              Sat Oct 16 23:31:16 2010
Last scheduled check:               Sat Oct 16 23:33:26 2010


CHECK PROCESSING INFORMATION
----------------------------
Check result reaper interval:       2 sec
Max concurrent service checks:      Unlimited


PERFORMANCE SUGGESTIONS
-----------------------
I have no suggestions - things look okay.



Thanks

Mark
