Bug report: downtimes beyond 2038 cause event queue errors

Ton Voon ton.voon at opsview.com
Thu Apr 4 18:32:36 CEST 2013


Hi!

We've come across a problem when upgrading from Nagios 3 to Nagios 4, and we can't work out where the fix should go. It occurs when an event is scheduled in the future beyond 2038.

Recreation steps:
  * Set a downtime on a service to end the next day
  * Stop Nagios
  * Edit retention.dat so that end_date=4514791088 (some other values seem to work)
  * Start Nagios

When Nagios starts, it will not run any scheduled events in the events queue.

This fails on CentOS 5 64-bit, though it appears to work on Debian Squeeze 32-bit, so it may be a 64-bit-only issue.
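
To put the end_date value above in context, here is a minimal C sketch. It is not Nagios code, and the helper names fits_in_int32 and report_epoch are made up for illustration. It checks whether an epoch value still fits in a signed 32-bit integer. 4514791088 does not, which is what places it past the January 2038 rollover.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper, not part of Nagios: returns 1 if the epoch value
 * can be stored in a signed 32-bit integer, 0 otherwise. */
int fits_in_int32(long long epoch_seconds)
{
	return epoch_seconds >= INT32_MIN && epoch_seconds <= INT32_MAX;
}

/* Print the check for one value together with the local time_t width. */
void report_epoch(long long epoch_seconds)
{
	printf("%lld fits in 32 bits: %s (sizeof(time_t) here: %zu)\n",
	       epoch_seconds,
	       fits_in_int32(epoch_seconds) ? "yes" : "no",
	       sizeof(time_t));
}
```

Calling report_epoch(4514791088LL) reports that the value does not fit, whatever the platform. The sizeof(time_t) it prints is typically 8 on 64-bit Linux builds and has traditionally been 4 on 32-bit ones, which may be relevant to the difference between the two systems.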

We think the problem occurs when the event is scheduled via squeue_add(). We've managed to get test-squeue to fail by changing the time value to be later than 2038 with the following:

Index: test-squeue.c
===================================================================
--- test-squeue.c	(revision 2716)
+++ test-squeue.c	(working copy)
@@ -116,7 +116,7 @@
 	sq_test_random(sq);
 	t(squeue_size(sq) == 0, "Size should be 0 after first sq_test_random");
 
-	t((a.evt = squeue_add(sq, time(NULL) + 9, &a)) != NULL);
+	t((a.evt = squeue_add(sq, time(NULL)*2, &a)) != NULL);
 	t(squeue_size(sq) == 1);
 	t((b.evt = squeue_add(sq, time(NULL) + 3, &b)) != NULL);
 	t(squeue_size(sq) == 2);

This gives the test result of:

### squeue tests
  FAIL max <= *d @test-squeue.c:86
  FAIL x == &b @test-squeue.c:133
  FAIL x->id == b.id @test-squeue.c:134
  FAIL x == &c @test-squeue.c:141
about to fail pretty fucking hard...
ea: 0xbfe065e0; &b: 0xbfe065d8; &c: 0xbfe065d0; ed: 0xbfe065c8; x: 0xbfde9b80
  FAIL x == &b @test-squeue.c:152
  FAIL x->id == b.id @test-squeue.c:153
  FAIL x == &b @test-squeue.c:160
  FAIL x->id == b.id @test-squeue.c:161
  FAIL x == &c @test-squeue.c:166
  FAIL x->id == c.id @test-squeue.c:167
Test results: 390637 passed, 10 failed

Changing the factor from 2 to 1.1 makes the tests pass:

### squeue tests
Test results: 390647 passed, 0 failed
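
The arithmetic behind the two factors is easy to check. The sketch below is not part of test-squeue.c: it uses 1365093156 (roughly the epoch time at which this mail was sent) as an assumed stand-in for time(NULL), and the helper names are invented for illustration. Doubling lands past the signed 32-bit limit of 2147483647 (January 2038), while a factor of 1.1 stays below it.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed stand-in for time(NULL): about 2013-04-04 16:32:36 UTC. */
#define SAMPLE_NOW 1365093156LL

/* Hypothetical helper: scale an epoch value the way the modified test does. */
long long scale_epoch(long long now, double factor)
{
	return (long long)((double)now * factor);
}

/* Hypothetical helper: 1 if the value lies beyond the signed 32-bit range. */
int past_2038(long long epoch_seconds)
{
	return epoch_seconds > INT32_MAX;
}

/* Print the scaled value and which side of the 2038 limit it falls on. */
void show_factor(double factor)
{
	long long when = scale_epoch(SAMPLE_NOW, factor);

	printf("factor %.1f -> %lld (%s 2038 limit)\n", factor, when,
	       past_2038(when) ? "past" : "within");
}
```

Calling show_factor(2.0) and show_factor(1.1) shows that only the first crosses the limit, which matches the pattern of failing and passing runs above. It says nothing about where in the event queue code the limit is actually hit.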

This worked in Nagios 3, so we're guessing that this limitation came in with the change to use the squeue library for events.

Any thoughts?

Ton

