Regarding Trends status after Network Outage

BOLLENGIER Eric ebollengier at sigma.fr
Thu Dec 2 11:04:59 CET 2004

Previous message: Antwort: Nag 2.0 ignoring notification_interval? [Virus scanned]
Next message: memtracker
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi,

I see the same bug (Nagios 1.2), apparently in a race condition after a
host reboot. The sequence of events is:

ssh down -> reboot -> host up -> ssh down -> ssh up

[1099042385] SERVICE ALERT: test;ssh;CRITICAL;SOFT;1;Connection refused
[1099042445] SERVICE ALERT: test;ssh;CRITICAL;SOFT;2;Socket timeout
[1099042525] SERVICE ALERT: test;ssh;CRITICAL;HARD;3;Socket timeout
[1099042715] HOST ALERT: test;DOWN;SOFT;1;CRITICAL
[1099042725] HOST ALERT: test;DOWN;SOFT;2;CRITICAL
[1099042735] HOST ALERT: test;DOWN;SOFT;3;CRITICAL
[1099042745] HOST ALERT: test;DOWN;SOFT;4;CRITICAL
[1099042755] HOST ALERT: test;DOWN;HARD;5;CRITICAL
[1099042755] SERVICE ALERT: test;ping;CRITICAL;HARD;1;CRITICAL
[1099042935] HOST ALERT: test;UP;HARD;1;PING OK
[1099042935] SERVICE ALERT: test;ping;OK;HARD;1;PING OK
[1099042945] SERVICE ALERT: test;ssh;CRITICAL;SOFT;1;Socket timeout
[1099043005] SERVICE ALERT: test;ssh;OK;SOFT;2;TCP OK

====> BUG: ssh was in a CRITICAL HARD state, but its recovery (OK) is logged as SOFT!

[1099043265] SERVICE ALERT: test;ssh;CRITICAL;SOFT;1;Socket timeout
[1099043335] SERVICE ALERT: test;ssh;CRITICAL;SOFT;2;Socket timeout
[1099043395] SERVICE ALERT: test;ssh;CRITICAL;HARD;3;Socket timeout
[1099043475] HOST ALERT: test;DOWN;SOFT;1;CRITICAL
[1099043485] HOST ALERT: test;DOWN;SOFT;2;CRITICAL
[1099043495] HOST ALERT: test;DOWN;SOFT;3;CRITICAL
[1099043505] HOST ALERT: test;DOWN;SOFT;4;CRITICAL
[1099043515] HOST ALERT: test;DOWN;HARD;5;CRITICAL
[1099043565] SERVICE ALERT: test;ping;CRITICAL;HARD;1;CRITICAL
[1099043715] HOST ALERT: test;UP;HARD;1;PING OK
[1099043715] SERVICE ALERT: test;ping;OK;HARD;1;PING OK
[1099043745] SERVICE ALERT: test;ssh;CRITICAL;SOFT;1;Socket timeout
[1099043815] SERVICE ALERT: test;ssh;CRITICAL;SOFT;2;Socket timeout
[1099043865] SERVICE ALERT: test;ssh;OK;HARD;3;TCP OK

=====> Here it is OK, because ssh comes back up after 2 failed checks (the OK is logged as HARD on attempt 3)

If you want to look for this bug in your own Nagios log files, you can use
my simple Perl script (see attachment).

PS: to use it:

for i in nagios-*2004*
do
	./mayday_bug_trends.pl "$i"
done
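
Since the attachment was scrubbed from the archive, here is a minimal Python sketch of the same kind of check. It is not the attached mayday_bug_trends.pl, only an illustration under assumptions: the log line layout is taken from the excerpts above, and the names SERVICE_ALERT and find_soft_recoveries are hypothetical. It flags every service alert logged as OK;SOFT while the last HARD state logged for that service was not OK.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Layout assumed from the excerpts above:
# [epoch] SERVICE ALERT: host;service;STATE;SOFT|HARD;attempt;plugin output
SERVICE_ALERT = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);([^;]+);(SOFT|HARD);(\d+);(.*)$"
)


def find_soft_recoveries(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (timestamp, host, service) for each OK;SOFT alert that
    follows a non-OK HARD state of the same service."""
    last_hard: Dict[Tuple[str, str], str] = {}  # (host, service) -> last HARD state
    suspects: List[Tuple[int, str, str]] = []
    for line in lines:
        match = SERVICE_ALERT.match(line.strip())
        if match is None:
            continue  # host alerts and other log entries are ignored
        stamp, host, service, state, state_type, _attempt, _output = match.groups()
        key = (host, service)
        if state_type == "HARD":
            last_hard[key] = state
        elif state == "OK" and last_hard.get(key, "OK") != "OK":
            # Recovery logged as SOFT although the last HARD state was a problem
            suspects.append((int(stamp), host, service))
    return suspects


# Three ssh lines from the first excerpt above
sample = """\
[1099042525] SERVICE ALERT: test;ssh;CRITICAL;HARD;3;Socket timeout
[1099042945] SERVICE ALERT: test;ssh;CRITICAL;SOFT;1;Socket timeout
[1099043005] SERVICE ALERT: test;ssh;OK;SOFT;2;TCP OK
""".splitlines()

print(find_soft_recoveries(sample))
```

To scan a real log file, pass an open file handle instead of the sample list; the function only needs an iterable of lines.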

Regards

On Thursday, 2 December 2004 at 10:05 +0530, Nilesh wrote:
> Dear All,
> 
> I have noticed a strange behaviour of Trends in nagios.
> I'm using nagios-1.2
> 
> Whenever there is a network outage, Nagios updates the status
> information for it immediately.
> After network connectivity recovers, all host and service checks
> run again and update the availability information
> for hosts and services. But many times Trends keeps on
> showing either "HOST UNREACHABLE" status for hosts or
> "CRITICAL" status for services.
>
> In such cases, when I reboot the Nagios server it recovers,
> but that is not a solution.
>
> So how can I resolve this problem?
> What I want is that, as soon as a host and/or service check succeeds
> after a network outage, Trends gets updated immediately.
> 
> Waiting For Reply
> With regards
> 
> Linux Admin
> 

-- 
Eric BOLLENGIER, System Administrator - Ext. 1325
SIGMA Informatique http://www.sigma.fr
3 rue Newton, BP 4127, 44241 La Chapelle sur Erdre Cedex
tel : 02.40.37.14.00
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mayday_bug_trends.pl
Type: application/x-perl
Size: 1172 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20041202/56bb5f0f/attachment.bin>


More information about the Developers mailing list