serious performance issue

shadih rahman shadhin71 at gmail.com
Thu Apr 9 15:55:08 CEST 2009


Now my Nagios is not running any checks at all.  I get a lot of "looks like
it was orphaned" messages, and then Nagios just sits there.  Can someone help
me with this?  Below are some entries from nagios.debug and nagios.log,
along with my nagiostats output and nagios.cfg.  Thanks in advance.




nagios.debug:

[1239284464.560241] [016.2] [pid=15690] Found another host check event for this host @ Thu Apr  9 08:59:56 2009
[1239284464.560248] [016.2] [pid=15690] New host check event occurs after the existing event, so we'll ignore it.
[1239284464.560253] [016.2] [pid=15690] Keeping original host check event (ignoring the new one).
[1239284464.560261] [016.1] [pid=15690] ** Async check result for host 'iab323pc20.atg.columbia.edu' handled: new state=0
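Each debug entry appears to follow a fixed layout: a Unix timestamp with microseconds, a bracketed pair that looks like the debug level and verbosity, the process ID, then the message. A minimal Python sketch, assuming that layout (inferred from the entries above, not taken from Nagios documentation), that splits an entry into fields and converts the timestamp; for example 1239284464 is 2009-04-09 13:41:04 UTC. The function name parse_debug_line is hypothetical.

```python
import re
from datetime import datetime, timezone

# Layout inferred from the entries above (an assumption, not a documented format):
# [epoch.microseconds] [level.verbosity] [pid=N] message
DEBUG_LINE = re.compile(
    r"^\[(?P<ts>\d+\.\d+)\] \[(?P<level>\d+)\.(?P<verbosity>\d+)\] "
    r"\[pid=(?P<pid>\d+)\] (?P<msg>.*)$"
)


def parse_debug_line(line):
    """Split one nagios.debug entry into fields, or return None if it doesn't match."""
    m = DEBUG_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "time": datetime.fromtimestamp(float(m.group("ts")), tz=timezone.utc),
        "level": int(m.group("level")),        # "016" -> 16
        "verbosity": int(m.group("verbosity")),
        "pid": int(m.group("pid")),
        "message": m.group("msg"),
    }


if __name__ == "__main__":
    entry = parse_debug_line(
        "[1239284464.560261] [016.1] [pid=15690] ** Async check result for host "
        "'iab323pc20.atg.columbia.edu' handled: new state=0"
    )
    print(entry["time"].isoformat(timespec="seconds"), entry["level"], entry["message"])
```

Lines that were wrapped by a mail client (as these were originally) need to be rejoined before parsing.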



nagios.log:


[1239254607] Warning: The check of host 'et251pc70.atg.columbia.edu' looks like it was orphaned (results never came back).  I'm scheduling an immediate check of the host...
[1239254607] Warning: The check of host 'et251pc71.atg.columbia.edu' looks like it was orphaned (results never came back).  I'm scheduling an immediate check of the host...
[1239254607] Warning: The check of host 'et251pc72.atg.columbia.edu' looks like it was orphaned (results never came back).  I'm scheduling an immediate check of the host...
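Each orphan warning starts with an epoch timestamp in brackets (1239254607 is 2009-04-09 05:23:27 UTC). To gauge how widespread the problem is, a small Python sketch can count the warnings per host or service and report the first and last times seen. The regular expression is written against the host lines above; that service warnings read "The check of service '...' on host '...' looks like it was orphaned" is an assumption. The names count_orphans and utc are hypothetical helpers.

```python
import re
import sys
from collections import Counter
from datetime import datetime, timezone

# Written against the host warnings above; the service variant is ASSUMED to read
# "The check of service '<desc>' on host '<host>' looks like it was orphaned ...".
ORPHAN = re.compile(
    r"^\[(?P<ts>\d+)\] Warning: The check of (?P<kind>host|service) "
    r"'(?P<name>[^']+)'.*looks like it was orphaned"
)


def count_orphans(lines):
    """Count orphan warnings per (kind, name); also return first/last epoch seen."""
    counts = Counter()
    first = last = None
    for line in lines:
        m = ORPHAN.match(line)
        if m is None:
            continue
        ts = int(m.group("ts"))
        counts[(m.group("kind"), m.group("name"))] += 1
        first = ts if first is None else min(first, ts)
        last = ts if last is None else max(last, ts)
    return counts, first, last


def utc(ts):
    """Render a nagios.log epoch timestamp as ISO-8601 UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            counts, first, last = count_orphans(fh)
        print(f"{sum(counts.values())} orphan warnings for {len(counts)} objects")
        if first is not None:
            print(f"first {utc(first)}, last {utc(last)}")
        for (kind, name), n in counts.most_common(10):
            print(f"{n:6d}  {kind:<7}  {name}")
    else:
        print("usage: python3 orphans.py /var/log/nagios/nagios.log")
```

The path in the usage line is the log_file value from the nagios.cfg in this message.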


nagiostats:

Nagios Stats 3.0.6
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 12-01-2008
License: GPL

CURRENT STATUS DATA
------------------------------------------------------
Status File:                            /var/log/nagios/status.dat
Status File Age:                        0d 0h 0m 4s
Status File Version:                    3.0.6

Program Running Time:                   0d 15h 37m 5s
Nagios PID:                             15690
Used/High/Total Command Buffers:        0 / 1 / 4096

Total Services:                         2783
Services Checked:                       2783
Services Scheduled:                     2782
Services Actively Checked:              2783
Services Passively Checked:             0
Total Service State Change:             0.000 / 38.820 / 0.328 %
Active Service Latency:                 244.062 / 37353.761 / 22185.948 sec
Active Service Execution Time:          0.010 / 15.072 / 0.293 sec
Active Service State Change:            0.000 / 38.820 / 0.328 %
Active Services Last 1/5/15/60 min:     0 / 0 / 0 / 0
Passive Service Latency:                0.000 / 0.000 / 0.000 sec
Passive Service State Change:           0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min:    0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit:              2571 / 14 / 143 / 55
Services Flapping:                      19
Services In Downtime:                   0

Total Hosts:                            3037
Hosts Checked:                          3005
Hosts Scheduled:                        3030
Hosts Actively Checked:                 3037
Host Passively Checked:                 0
Total Host State Change:                0.000 / 57.170 / 0.448 %
Active Host Latency:                    0.000 / 36712.008 / 19785.947 sec
Active Host Execution Time:             0.000 / 30.011 / 1.589 sec
Active Host State Change:               0.000 / 57.170 / 0.448 %
Active Hosts Last 1/5/15/60 min:        0 / 0 / 0 / 299
Passive Host Latency:                   0.000 / 0.000 / 0.000 sec
Passive Host State Change:              0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min:       0 / 0 / 0 / 0
Hosts Up/Down/Unreach:                  2854 / 183 / 0
Hosts Flapping:                         16
Hosts In Downtime:                      0

Active Host Checks Last 1/5/15 min:     0 / 0 / 0
   Scheduled:                           0 / 0 / 0
   On-demand:                           0 / 0 / 0
   Parallel:                            0 / 0 / 0
   Serial:                              0 / 0 / 0
   Cached:                              0 / 0 / 0
Passive Host Checks Last 1/5/15 min:    0 / 0 / 0
Active Service Checks Last 1/5/15 min:  0 / 0 / 0
   Scheduled:                           0 / 0 / 0
   On-demand:                           0 / 0 / 0
   Cached:                              0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 0 / 0 / 0

External Commands Last 1/5/15 min:      0 / 0 / 0
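In this output each latency line is min / max / average in seconds, where latency is how long after its scheduled time a check actually starts. Converting the averages shows the scale: 22185.948 s / 3600 is roughly 6.2 hours for active service checks and 19785.947 s / 3600 is roughly 5.5 hours for active host checks, while the "Last 1/5/15 min" counters show no checks completing at all. A small Python sketch that extracts those triples from saved nagiostats output and flags averages above a threshold; the 60-second default and the name latency_summary are illustrative assumptions, not Nagios recommendations.

```python
import re
import sys

# nagiostats prints several metrics as "min / max / average", e.g.
# "Active Service Latency:                 244.062 / 37353.761 / 22185.948 sec"
TRIPLE = re.compile(
    r"^(?P<name>[^:]+):\s+(?P<min>[\d.]+) / (?P<max>[\d.]+) / (?P<avg>[\d.]+) sec$"
)


def latency_summary(text, threshold_sec=60.0):
    """Return ({metric: (min, max, avg)}, [metrics whose avg > threshold_sec]).

    threshold_sec is an arbitrary cutoff chosen for illustration.
    """
    stats, flagged = {}, []
    for line in text.splitlines():
        m = TRIPLE.match(line.strip())
        if m is None or "Latency" not in m.group("name"):
            continue  # skip execution-time lines and anything that isn't a triple
        lo, hi, avg = (float(m.group(k)) for k in ("min", "max", "avg"))
        stats[m.group("name")] = (lo, hi, avg)
        if avg > threshold_sec:
            flagged.append(m.group("name"))
    return stats, flagged


if __name__ == "__main__":
    # Example: nagiostats > stats.txt; python3 latency.py stats.txt
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8") as fh:
            stats, flagged = latency_summary(fh.read())
        for name in flagged:
            lo, hi, avg = stats[name]
            print(f"{name}: avg {avg / 3600:.1f} h, max {hi / 3600:.1f} h")
    else:
        print("usage: python3 latency.py saved_nagiostats_output.txt")
```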



nagios.cfg:

log_file=/var/log/nagios/nagios.log
cfg_file=/etc/nagios/commands.cfg
cfg_file=/etc/nagios/contacts.cfg
cfg_file=/etc/nagios/timeperiods.cfg
cfg_file=/etc/nagios/templates.cfg
cfg_dir=/etc/nagios/hosts
cfg_dir=/etc/nagios/services
object_cache_file=/var/log/nagios/objects.cache
precached_object_file=/var/log/nagios/objects.precache
resource_file=/etc/nagios/resource.cfg
status_file=/var/log/nagios/status.dat
status_update_interval=60
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=-1
command_file=/var/log/nagios/rw/nagios.cmd
external_command_buffer_slots=4096
lock_file=/var/log/nagios/nagios.lock
temp_file=/var/log/nagios/nagios.tmp
temp_path=/tmp
event_broker_options=8
broker_module=/usr/lib64/nagios/ndomod.o config_file=/etc/nagios/ndomod.cfg
log_rotation_method=m
log_archive_path=/var/log/nagios/archives
use_syslog=1
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_checks=1
service_inter_check_delay_method=s
max_service_check_spread=30
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=30
max_concurrent_checks=0
check_result_reaper_frequency=10
max_check_result_reaper_time=20
check_result_path=/var/log/nagios/spool/checkresults
max_check_result_file_age=3600
cached_host_check_horizon=15
cached_service_check_horizon=15
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
soft_state_dependencies=0
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
sleep_time=0.25
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=60
ocsp_timeout=5
perfdata_timeout=5
retain_state_information=1
state_retention_file=var/log/nagios/retention.dat
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=1
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
interval_length=60
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_notifications=1
enable_event_handlers=1
process_performance_data=0
translate_passive_host_checks=0
passive_host_checks_are_soft=0
check_for_orphaned_services=1
check_for_orphaned_hosts=1
check_service_freshness=1
service_freshness_check_interval=60
check_host_freshness=0
host_freshness_check_interval=60
additional_freshness_latency=15
enable_flap_detection=1
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=us
enable_embedded_perl=0
use_embedded_perl_implicitly=0
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
illegal_macro_output_chars=`~$&|'"<>
use_regexp_matching=0
use_true_regexp_matching=0
admin_email=sr2690 at columbia.edu
daemon_dumps_core=0
use_large_installation_tweaks=1
enable_environment_macros=1
debug_level=-1
debug_verbosity=2
debug_file=/var/log/nagios/nagios.debug
max_debug_file_size=1000000
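nagios.cfg is a flat key=value file in which some keys (cfg_file, cfg_dir) repeat. One entry worth double-checking above is state_retention_file=var/log/nagios/retention.dat, which, unlike every other path in the file, has no leading slash. A minimal Python sketch that parses the file into lists of values per key and reports path-valued directives that are not absolute; PATH_KEYS is a hand-picked, hypothetical list drawn from the directives in this config, not something Nagios defines.

```python
import posixpath
import sys

# HYPOTHETICAL list: directives from the nagios.cfg above whose values are paths.
PATH_KEYS = {
    "log_file", "cfg_file", "cfg_dir", "object_cache_file",
    "precached_object_file", "resource_file", "status_file", "command_file",
    "lock_file", "temp_file", "temp_path", "log_archive_path",
    "check_result_path", "state_retention_file", "debug_file",
}


def parse_main_cfg(text):
    """Parse key=value lines into {key: [values...]}; cfg_file/cfg_dir may repeat."""
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only; values such as illegal_object_name_chars contain '='.
        key, value = line.split("=", 1)
        directives.setdefault(key.strip(), []).append(value.strip())
    return directives


def relative_paths(directives):
    """Return (key, value) pairs for path-valued directives that are not absolute."""
    return [
        (key, value)
        for key in sorted(directives.keys() & PATH_KEYS)
        for value in directives[key]
        if not posixpath.isabs(value)  # Nagios paths are POSIX paths
    ]


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8") as fh:
            for key, value in relative_paths(parse_main_cfg(fh.read())):
                print(f"not an absolute path: {key}={value}")
    else:
        print("usage: python3 cfgpaths.py /path/to/nagios.cfg")
```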


On Wed, Apr 8, 2009 at 1:56 AM, fancyrabbit <fancyrabbit at gmail.com> wrote:

> I ran into almost the same issue.
> After setting enable_embedded_perl=0, the load average went up but
> latencies dropped.
>
> On Wed, Apr 8, 2009 at 11:54 AM, shadih rahman <shadhin71 at gmail.com> wrote:
>
>> I am seeing a ton of orphaned-check error messages for both services and
>> hosts.  I am running Nagios on a quad-core 2.2 GHz machine with 4 GB of
>> memory.  I will paste my configuration file below.  The machine sends NDO
>> data to a local database on a 170 GB hard drive.  Nagios is obsessing over
>> both hosts and services and sending the data to a machine with an identical
>> configuration; I am doing failover using NSCA.  Please advise.
>>
>> nagios.cfg:
>>
>> log_file=/var/log/nagios/nagios.log
>> cfg_file=/etc/nagios/commands.cfg
>> cfg_file=/etc/nagios/contacts.cfg
>> cfg_file=/etc/nagios/timeperiods.cfg
>> cfg_file=/etc/nagios/templates.cfg
>> cfg_dir=/etc/nagios/hosts
>> cfg_dir=/etc/nagios/services
>> object_cache_file=/var/log/nagios/objects.cache
>> precached_object_file=/var/log/nagios/objects.precache
>> resource_file=/etc/nagios/resource.cfg
>> status_file=/var/log/nagios/status.dat
>> status_update_interval=60
>> nagios_user=nagios
>> nagios_group=nagios
>> check_external_commands=1
>> command_check_interval=-1
>> command_file=/var/log/nagios/rw/nagios.cmd
>> external_command_buffer_slots=8192
>> lock_file=/var/log/nagios/nagios.lock
>> temp_file=/var/log/nagios/nagios.tmp
>> temp_path=/tmp
>> event_broker_options=8
>> broker_module=/usr/lib64/nagios/ndomod.o config_file=/etc/nagios/ndomod.cfg
>> log_rotation_method=m
>> log_archive_path=/var/log/nagios/archives
>> use_syslog=1
>> log_notifications=1
>> log_service_retries=1
>> log_host_retries=1
>> log_event_handlers=1
>> log_initial_states=0
>> log_external_commands=1
>> log_passive_checks=1
>> service_inter_check_delay_method=n
>> max_service_check_spread=30
>> service_interleave_factor=s
>> host_inter_check_delay_method=s
>> max_host_check_spread=30
>> max_concurrent_checks=0
>> check_result_reaper_frequency=2
>> max_check_result_reaper_time=10
>> check_result_path=/var/log/nagios/spool/checkresults
>> max_check_result_file_age=3600
>> cached_host_check_horizon=15
>> cached_service_check_horizon=15
>> enable_predictive_host_dependency_checks=1
>> enable_predictive_service_dependency_checks=1
>> soft_state_dependencies=1
>> auto_reschedule_checks=1
>> auto_rescheduling_interval=30
>> auto_rescheduling_window=180
>> sleep_time=0.25
>> service_check_timeout=30
>> host_check_timeout=20
>>
>> event_handler_timeout=30
>> notification_timeout=60
>> ocsp_timeout=5
>> perfdata_timeout=5
>> retain_state_information=1
>> state_retention_file=var/log/nagios/retention.dat
>> retention_update_interval=60
>> use_retained_program_state=1
>> use_retained_scheduling_info=1
>> retained_host_attribute_mask=0
>> retained_service_attribute_mask=0
>> retained_process_host_attribute_mask=0
>> retained_process_service_attribute_mask=0
>> retained_contact_host_attribute_mask=0
>> retained_contact_service_attribute_mask=0
>> interval_length=60
>> use_aggressive_host_checking=0
>> execute_service_checks=1
>> accept_passive_service_checks=1
>> execute_host_checks=1
>> accept_passive_host_checks=1
>> enable_notifications=1
>> enable_event_handlers=1
>> process_performance_data=0
>> obsess_over_services=1
>> ocsp_command=send_service_check
>> ochp_command=send_host_check
>> obsess_over_hosts=1
>> translate_passive_host_checks=0
>> passive_host_checks_are_soft=0
>> check_for_orphaned_services=1
>> check_for_orphaned_hosts=1
>> check_service_freshness=1
>> service_freshness_check_interval=60
>> check_host_freshness=0
>> host_freshness_check_interval=60
>> additional_freshness_latency=15
>> enable_flap_detection=1
>> low_service_flap_threshold=5.0
>> high_service_flap_threshold=20.0
>> low_host_flap_threshold=5.0
>> high_host_flap_threshold=20.0
>> date_format=us
>> enable_embedded_perl=1
>> use_embedded_perl_implicitly=1
>> illegal_object_name_chars=`~!$%^&*|'"<>?,()=
>> illegal_macro_output_chars=`~$&|'"<>
>> use_regexp_matching=0
>> use_true_regexp_matching=0
>> admin_email=sr2690 at columbia.edu
>> daemon_dumps_core=0
>> use_large_installation_tweaks=1
>> enable_environment_macros=1
>> debug_level=-1
>> debug_verbosity=2
>> debug_file=/var/log/nagios/nagios.debug
>> max_debug_file_size=1000000
>>
>> My nagiostats output:
>>
>> [sr2690>nagiostats
>>
>> Nagios Stats 3.0.6
>> Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
>> Last Modified: 12-01-2008
>> License: GPL
>>
>> CURRENT STATUS DATA
>> ------------------------------------------------------
>> Status File:                            /var/log/nagios/status.dat
>> Status File Age:                        0d 0h 0m 19s
>> Status File Version:                    3.0.6
>>
>> Program Running Time:                   0d 2h 5m 28s
>> Nagios PID:                             12139
>> Used/High/Total Command Buffers:        0 / 0 / 8192
>>
>> Total Services:                         2783
>> Services Checked:                       2783
>> Services Scheduled:                     2782
>> Services Actively Checked:              2783
>> Services Passively Checked:             0
>> Total Service State Change:             0.000 / 52.830 / 0.263 %
>> Active Service Latency:                 1.304 / 12092.843 / 1469.130 sec
>> Active Service Execution Time:          0.011 / 15.103 / 0.468 sec
>> Active Service State Change:            0.000 / 52.830 / 0.263 %
>> Active Services Last 1/5/15/60 min:     0 / 0 / 0 / 129
>> Passive Service Latency:                0.000 / 0.000 / 0.000 sec
>> Passive Service State Change:           0.000 / 0.000 / 0.000 %
>> Passive Services Last 1/5/15/60 min:    0 / 0 / 0 / 0
>> Services Ok/Warn/Unk/Crit:              2560 / 13 / 186 / 24
>> Services Flapping:                      17
>> Services In Downtime:                   0
>>
>> Total Hosts:                            3037
>> Hosts Checked:                          3005
>> Hosts Scheduled:                        3029
>> Hosts Actively Checked:                 3037
>> Host Passively Checked:                 0
>> Total Host State Change:                0.000 / 53.620 / 0.227 %
>> Active Host Latency:                    0.000 / 12080.792 / 3770.409 sec
>> Active Host Execution Time:             0.000 / 104.093 / 2.500 sec
>> Active Host State Change:               0.000 / 53.620 / 0.227 %
>> Active Hosts Last 1/5/15/60 min:        0 / 0 / 0 / 256
>> Passive Host Latency:                   0.000 / 0.000 / 0.000 sec
>> Passive Host State Change:              0.000 / 0.000 / 0.000 %
>> Passive Hosts Last 1/5/15/60 min:       0 / 0 / 0 / 0
>> Hosts Up/Down/Unreach:                  2849 / 188 / 0
>> Hosts Flapping:                         10
>> Hosts In Downtime:                      0
>>
>> Active Host Checks Last 1/5/15 min:     0 / 0 / 1
>>    Scheduled:                           0 / 0 / 0
>>    On-demand:                           0 / 0 / 1
>>    Parallel:                            0 / 0 / 0
>>    Serial:                              0 / 0 / 0
>>    Cached:                              0 / 0 / 1
>> Passive Host Checks Last 1/5/15 min:    0 / 0 / 0
>> Active Service Checks Last 1/5/15 min:  0 / 0 / 0
>>    Scheduled:                           0 / 0 / 0
>>    On-demand:                           0 / 0 / 0
>>    Cached:                              0 / 0 / 0
>> Passive Service Checks Last 1/5/15 min: 0 / 0 / 0
>>
>> External Commands Last 1/5/15 min:      0 / 0 / 0
>>
>> --
>> Cordially,
>> Shadhin Rahman
>>
>>
>> ------------------------------------------------------------------------------
>> This SF.net email is sponsored by:
>> High Quality Requirements in a Collaborative Environment.
>> Download a free trial of Rational Requirements Composer Now!
>> http://p.sf.net/sfu/www-ibm-com
>> _______________________________________________
>> Nagios-users mailing list
>> Nagios-users at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>> ::: Please include Nagios version, plugin version (-v) and OS when
>> reporting any issue.
>> ::: Messages without supporting info will risk being sent to /dev/null
>>
>
>
>
>
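The nagios.cfg quoted from the April 8 message and the one posted on April 9 differ in several directives, for example external_command_buffer_slots (8192 vs 4096), check_result_reaper_frequency (2 vs 10), enable_embedded_perl (1 vs 0), and obsess_over_hosts/obsess_over_services, which appear only in the April 8 copy. A short Python sketch, assuming each config has been saved to its own file, that lists every directive whose values differ; it strips leading ">" quote markers so the quoted copy can be pasted as-is. parse_main_cfg and diff_cfg are hypothetical helper names.

```python
import sys


def parse_main_cfg(text):
    """Parse nagios.cfg-style key=value lines into {key: [values...]}.

    Leading '>' quote markers (as in a quoted e-mail) are stripped first.
    """
    directives = {}
    for raw in text.splitlines():
        line = raw.lstrip("> \t").strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        directives.setdefault(key.strip(), []).append(value.strip())
    return directives


def diff_cfg(old_text, new_text):
    """Return {key: (old_values, new_values)} for directives that differ.

    None means the directive is absent from that file.
    """
    old, new = parse_main_cfg(old_text), parse_main_cfg(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


if __name__ == "__main__":
    if len(sys.argv) == 3:
        with open(sys.argv[1], encoding="utf-8") as a, open(sys.argv[2], encoding="utf-8") as b:
            for key, (before, after) in diff_cfg(a.read(), b.read()).items():
                print(f"{key}: {before} -> {after}")
    else:
        print("usage: python3 cfgdiff.py OLD_nagios.cfg NEW_nagios.cfg")
```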



-- 
Cordially,
Shadhin Rahman