Bug report: downtimes beyond 2038 cause event queue errors

Ton Voon ton.voon at opsview.com
Thu Apr 4 18:32:36 CEST 2013


We've come across a problem when upgrading from Nagios 3 to Nagios 4, and we can't work out where the fix should go. It occurs when an event is scheduled for a time beyond 2038.

Recreation steps:
  * Set a downtime on a service to end next day
  * Stop Nagios
  * Edit retention.dat so that end_date=4514791088 (some other values seem to work)
  * Start Nagios

When Nagios starts, it will not run any of the scheduled events in the event queue.

This fails on CentOS 5 64-bit, though it appears to work on Debian Squeeze 32-bit, so it may be a 64-bit-only issue.

We think the problem arises when the event is scheduled via squeue_add(). We've managed to get test-squeue to fail by changing the time value to a date beyond 2038 with the following patch:

Index: test-squeue.c
--- test-squeue.c	(revision 2716)
+++ test-squeue.c	(working copy)
@@ -116,7 +116,7 @@
 	t(squeue_size(sq) == 0, "Size should be 0 after first sq_test_random");
-	t((a.evt = squeue_add(sq, time(NULL) + 9, &a)) != NULL);
+	t((a.evt = squeue_add(sq, time(NULL)*2, &a)) != NULL);
 	t(squeue_size(sq) == 1);
 	t((b.evt = squeue_add(sq, time(NULL) + 3, &b)) != NULL);
 	t(squeue_size(sq) == 2);

This gives the test result of:

### squeue tests
  FAIL max <= *d @test-squeue.c:86
  FAIL x == &b @test-squeue.c:133
  FAIL x->id == b.id @test-squeue.c:134
  FAIL x == &c @test-squeue.c:141
about to fail pretty fucking hard...
ea: 0xbfe065e0; &b: 0xbfe065d8; &c: 0xbfe065d0; ed: 0xbfe065c8; x: 0xbfde9b80
  FAIL x == &b @test-squeue.c:152
  FAIL x->id == b.id @test-squeue.c:153
  FAIL x == &b @test-squeue.c:160
  FAIL x->id == b.id @test-squeue.c:161
  FAIL x == &c @test-squeue.c:166
  FAIL x->id == c.id @test-squeue.c:167
Test results: 390637 passed, 10 failed

Changing to a factor of 1.1 instead of 2 passes:

### squeue tests
Test results: 390647 passed, 0 failed

This worked in Nagios 3, so we're guessing that the change to use the squeue library for events is probably where this limitation has come in.

Any thoughts?

