FW: Postgres And Nagios

Marc Powell mpowell at ena.com
Mon Nov 18 05:52:28 CET 2002


Sorry... Let's try this in plain text this time...


Hey all,

I've been struggling off and on with using Postgres as a back-end for
Nagios and have been unable to rectify some obnoxious performance
problems that shouldn't be happening. The machine in question is a
quad-processor Compaq ProLiant with 2.5GB of RAM and a RAID 5 array with
about 120GB of storage. The filesystem is ext3 with the data=journal
mount option.
The machine is mostly unused except for Nagios accepting passive service
checks for 2000 services on 1800 hosts. I've tried to research
optimizing Postgres, but most of what I've found says that Postgres
really doesn't need optimizing. What I'm seeing is terrible
performance on inserts from Nagios. The following strace snippet of
Nagios starting up illustrates the point quite nicely. As you can see,
the wait between inserts is approximately 5-6 seconds. Multiply that by
3800 inserts and you get more than five hours in total, which really
isn't acceptable. Is anyone else seeing this, or is it just me? Does
anyone have any pointers? I'm using Nagios 1.0b3; Postgres 7.1 is
running on the same host as Nagios, and I vacuum the database nightly.

Thanks in advance,

Marc


time([1037590753])                      = 1037590753
rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
send(5, "QINSERT INTO servicestatus
(host_name,service_description,service_status,last_update,current_attemp
t,max_attempts,state_type,last_check,next_check,should_be_scheduled,chec
k_type,checks_enabled,accept_passive_checks,event_handler_enabled,last_s
tate_change,problem_acknowledged,last_hard_state,time_ok,time_warning,ti
me_unknown,time_critical,last_notification,current_notification,notifica
tions_enabled,latency,execution_time,flap_detection_enabled,is_flapping,
percent_state_change,scheduled_downtime_depth,failure_prediction_enabled
,process_performance_data,obsess_over_service,plugin_output) VALUES
(\'nateng\\-fa6\\-0\\-ecr1\\-washington\\-tn\',\'PING\',\'PENDING\',date
time(abstime(1037590753)),\'0\',\'5\',\'SOFT\',datetime(abstime(0)),date
time(abstime(0)),\'0\',\'ACTIVE\',\'1\',\'1\',\'1\',datetime(abstime(0))
,\'0\',\'OK\',\'0\',\'0\',\'0\',\'0\',datetime(abstime(0)),\'0\',\'1\',\
'0\',\'0\',\'1\',\'0\',\'0.0\',\'0\',\'1\',\'1\',\'1\',\'Service check
is not scheduled for execution...\')\0", 932, 0) = 932
rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0
select(6, [5], [], [5], NULL)           = 1 (in [5])
recv(5, "Pblank\0CINSERT 843458113 1\0Z", 16384, 0) = 28
time([1037590758])                      = 1037590758
time([1037590758])                      = 1037590758
rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
send(5, "QINSERT INTO servicestatus
(host_name,service_description,service_status,last_update,current_attemp
t,max_attempts,state_type,last_check,next_check,should_be_scheduled,chec
k_type,checks_enabled,accept_passive_checks,event_handler_enabled,last_s
tate_change,problem_acknowledged,last_hard_state,time_ok,time_warning,ti
me_unknown,time_critical,last_notification,current_notification,notifica
tions_enabled,latency,execution_time,flap_detection_enabled,is_flapping,
percent_state_change,scheduled_downtime_depth,failure_prediction_enabled
,process_performance_data,obsess_over_service,plugin_output) VALUES
(\'nateng\\-filt1\\-davidson\\-tn\',\'FILTERING\',\'PENDING\',datetime(a
bstime(1037590758)),\'0\',\'5\',\'SOFT\',datetime(abstime(0)),datetime(a
bstime(0)),\'0\',\'ACTIVE\',\'1\',\'1\',\'1\',datetime(abstime(0)),\'0\'
,\'OK\',\'0\',\'0\',\'0\',\'0\',datetime(abstime(0)),\'0\',\'1\',\'0\',\
'0\',\'1\',\'0\',\'0.0\',\'0\',\'1\',\'1\',\'1\',\'Service check is not
scheduled for execution...\')\0", 928, 0) = 928
rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0
select(6, [5], [], [5], NULL)           = 1 (in [5])
recv(5, "Pblank\0CINSERT 843458115 1\0Z", 16384, 0) = 28
time([1037590764])                      = 1037590764
time([1037590764])                      = 1037590764
rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
send(5, "QINSERT INTO servicestatus
(host_name,service_description,service_status,last_update,current_attemp
t,max_attempts,state_type,last_check,next_check,should_be_scheduled,chec
k_type,checks_enabled,accept_passive_checks,event_handler_enabled,last_s
tate_change,problem_acknowledged,last_hard_state,time_ok,time_warning,ti
me_unknown,time_critical,last_notification,current_notification,notifica
tions_enabled,latency,execution_time,flap_detection_enabled,is_flapping,
percent_state_change,scheduled_downtime_depth,failure_prediction_enabled
,process_performance_data,obsess_over_service,plugin_output) VALUES
(\'nateng\\-filt1\\-davidson\\-tn\',\'OVER\\-FILTERING\',\'PENDING\',dat
etime(abstime(1037590764)),\'0\',\'5\',\'SOFT\',datetime(abstime(0)),dat
etime(abstime(0)),\'0\',\'ACTIVE\',\'1\',\'1\',\'1\',datetime(abstime(0)
),\'0\',\'OK\',\'0\',\'0\',\'0\',\'0\',datetime(abstime(0)),\'0\',\'1\',
\'0\',\'0\',\'1\',\'0\',\'0.0\',\'0\',\'1\',\'1\',\'1\',\'Service check
is not scheduled for execution...\')\0", 934, 0) = 934
rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0
select(6, [5], [], [5], NULL)           = 1 (in [5])
recv(5, "Pblank\0CINSERT 843458117 1\0Z", 16384, 0) = 28
time([1037590769])                      = 1037590769
time([1037590769])                      = 1037590769
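
As a rough check of the numbers, the gaps between the time() calls in
the trace above can be worked out with a short script. This is only a
sketch in Python and not part of Nagios or Postgres: the names TRACE,
TIME_CALL and insert_gaps are made up for illustration, and the 3800
figure is the insert count quoted in the message. It assumes each
distinct time() value marks the start of one insert.

```python
import re

# time() lines copied from the strace excerpt above, in order.
TRACE = """\
time([1037590753])                      = 1037590753
time([1037590758])                      = 1037590758
time([1037590758])                      = 1037590758
time([1037590764])                      = 1037590764
time([1037590764])                      = 1037590764
time([1037590769])                      = 1037590769
time([1037590769])                      = 1037590769
"""

# Matches an strace line for time() and captures the epoch value.
TIME_CALL = re.compile(r"^time\(\[(\d+)\]\)\s*=\s*\d+$")


def insert_gaps(trace):
    """Return the seconds elapsed between successive distinct time() values."""
    stamps = []
    for line in trace.splitlines():
        match = TIME_CALL.match(line.strip())
        if match:
            value = int(match.group(1))
            # Nagios calls time() twice between inserts; keep one copy.
            if not stamps or stamps[-1] != value:
                stamps.append(value)
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


gaps = insert_gaps(TRACE)
mean_gap = sum(gaps) / len(gaps)

TOTAL_INSERTS = 3800  # insert count quoted in the message
estimated_hours = mean_gap * TOTAL_INSERTS / 3600

print("gaps in seconds:", gaps)
print("mean gap: %.2f s" % mean_gap)
print("estimated total: %.1f hours" % estimated_hours)
```

With only three intervals in the excerpt, the extrapolation is a
ballpark figure at best, and time() has one-second resolution.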

