[Nagios-users] external commands and segfault -- again

bobi at netshel.net bobi at netshel.net
Mon Jan 8 20:24:51 CET 2007


Hi David,

Would you mind printing the contents of the timed_event structure under
consideration?

You'd have to move up to the stack frame of event_execution_loop() and type
"print *temp_event".

The output I'm looking for should look something like this:

(gdb) up
#5  0x00000000004220ab in event_execution_loop () at events.c:964
(gdb) print *temp_event
$3 = {event_type = 9, run_time = 1167483600, recurring = 0,
  event_interval = 0, compensate_for_time_change = 0, timing_func = 0x0,
  event_data = 0x2a9623f5f0, event_args = 0x0, next = 0x2a96235010}

I'd also be curious to see the contents of the associated
scheduled_downtime structure:

(gdb)  print *(scheduled_downtime *)(temp_event->event_data)
$4 = {type = -1776069696, host_name = 0x2a96200688 "x\006 \226*",
  service_description = 0x4800204800000000 <Address 0x4800204800000000 out of bounds>,
  entry_time = 0, start_time = 3684075530346299393, end_time = 0,
  fixed = 11760848, triggered_by = 0, duration = 182907410432,
  downtime_id = 2641, author = 0x2a962353c0 "owntime",
  comment = 0x2a96200688 "x\006 \226*", comment_id = 8011462211995332461,
  is_in_effect = 1830905714, start_flex_downtime = 1869443695,
  incremented_pending_downtime = 1919889006, is_restart = 1768303975,
  next = 0x2d6f666e692d646e}
(gdb)
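
Several of the fields in the example scheduled_downtime output above are
far outside any sensible range. One quick way to check whether such
values are really overwritten string data is to view their raw bytes as
ASCII. The following is only a minimal sketch: the helper name as_ascii
is hypothetical, and it assumes a little-endian machine such as x86-64
with 4-byte ints and 8-byte pointers.

```python
def as_ascii(value, width):
    """Show an integer as `width` little-endian bytes, '.' for non-printables.

    Hypothetical helper, not part of Nagios or gdb. Assumes little-endian
    byte order (true for x86-64).
    """
    # Mask so that negative values (as gdb prints signed ints) also work.
    raw = (value & ((1 << (8 * width)) - 1)).to_bytes(width, "little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Values copied from the example scheduled_downtime output above.
print(as_ascii(0x2d6f666e692d646e, 8))  # next (pointer): prints nd-info-
print(as_ascii(1768303975, 4))          # is_restart (int): prints g-fi
```

Fields that decode to readable text fragments like these suggest the
structure's memory was overwritten with a string, not that the fields
were ever set to those values.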


This will help me determine if we have the same segfault condition and, if
we do, may yield some helpful clues in debugging the cause.

Thanks,
Bob

>> > 2. I've had this problem with Nagios 2.4, 2.5 and 2.6.  So,
>> "upgrading"
>> > hasn't gotten rid of it.
>> >
>> > 3. We are currently running Nagios 2.6 on a 64-bit Linux platform:
>> > SLES-9 x86-64, Kernel 2.6.5-7.267-smp
>> >
>>
>> This is the culprit, I guess. As this isn't a widespread
>> problem, I wouldn't be surprised if it's related to 64-bit
>> archs (kernel-2.6.5 is fairly ancient too, but that shouldn't
>> matter as this is the only app you're seeing it in).
>>
>> I'm guessing this actually is an SMP-system and that SuSE
>> doesn't install SMP kernels on all systems, correct? If so,
>> this could also be a source of problem for you. Nagios
>> doesn't follow the pthread guidelines very closely and does
>> some pretty inappropriate things post-fork() for being a
>> threaded application. This could be one of those problems
>> that doesn't happen on single-cpu systems because the only
>> cpu doesn't have anything to compete with when racing for the memory.
>>
>
> I've seen this problem on every platform I've used, including z/os,
> 32-bit 64-bit, sles, RedHat...
>
> I'll admit that the problems with the older platforms and Nagios
> versions (v1.2 and up) may have been different though they all appeared
> to fail the same to me. My first guess was SLES threading so I ported to
> RH and still no happiness.
>


