bischeck suddenly stops working

Anders Håål anders.haal at ingby.com
Wed Aug 9 08:16:11 CEST 2017


Francesco - any progress on the issue?


On 07/26/2017 05:52 PM, Anders Håål wrote:
>
> Thanks for the feedback.
>
> When bischeck "stops working", it would be interesting to know whether 
> anything is logged after it "stops", and also what is logged when you 
> restart it. I suggest you do a stop first and check what is logged 
> before starting it again.
>
> I would suggest that you change the log level in logback.xml for all 
> packages
>
>   <root level="INFO">
>     <appender-ref ref="bischeck"/>
>   </root>
>
> To avoid duplicates you should also add additivity="false" to the 
> other loggers. The complete logback.xml further down is based on the 
> standard file. I have not tested it myself, so try it in your test 
> environment first and, if it looks good, deploy it in production 
> adapted to your specific customization of paths, etc.
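>
> As a minimal, untested sketch of why additivity matters here: a logger 
> that has its own appender-ref also passes its events up to root by 
> default, so when both point at the same appender every line is written 
> twice. Only the appender name bischeck, the log path and the com.ingby 
> logger are taken from the standard file; the plain FileAppender and 
> the shortened pattern are simplifications for illustration.
>
> ```xml
> <configuration>
>   <!-- Simplified appender for illustration; the full file further down uses a RollingFileAppender -->
>   <appender name="bischeck" class="ch.qos.logback.core.FileAppender">
>     <file>/var/tmp/bischeck.log</file>
>     <encoder>
>       <pattern>%d ; %p ; %c ; %m%n</pattern>
>     </encoder>
>   </appender>
>
>   <!-- additivity="false": events from com.ingby.* stop here and are not passed on to root, so each one is written once -->
>   <logger name="com.ingby" level="INFO" additivity="false">
>     <appender-ref ref="bischeck"/>
>   </logger>
>
>   <!-- Loggers without an entry of their own inherit WARN from root and use its appender -->
>   <root level="WARN">
>     <appender-ref ref="bischeck"/>
>   </root>
> </configuration>
> ```
>
> Without additivity="false" on the com.ingby logger, each of its events 
> would reach the bischeck appender once via the logger and once via root.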
>
>
> logback.xml:
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <configuration>
>   <jmxConfigurator />
>   <appender name="bischeck" class="ch.qos.logback.core.rolling.RollingFileAppender">
>     <!-- See also http://logback.qos.ch/manual/appenders.html#RollingFileAppender -->
>     <File>/var/tmp/bischeck.log</File>
>     <encoder>
>       <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS,Europe/Stockholm} ; %p ; %t ; %c ; %m%ex%n</pattern>
>     </encoder>
>
>     <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
>       <maxIndex>3</maxIndex>
>       <FileNamePattern>/var/tmp/bischeck.log.%i</FileNamePattern>
>     </rollingPolicy>
>
>     <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
>       <MaxFileSize>1000KB</MaxFileSize>
>     </triggeringPolicy>
>
>   </appender>
>
>   <logger name="com.ingby" level="INFO" additivity="false">
>     <appender-ref ref="bischeck"/>
>   </logger>
>
>
>   <logger name="com.ingby.socbox.bischeck.configuration.CachePurgeJob" level="DEBUG" additivity="false">
>     <appender-ref ref="bischeck"/>
>   </logger>
>
>   <logger name="com.ingby.socbox.bischeck.cache.provider.redis" level="DEBUG" additivity="false">
>     <appender-ref ref="bischeck"/>
>   </logger>
>
>
>   <logger name="org.quartz" level="INFO" additivity="false">
>     <appender-ref ref="bischeck"/>
>   </logger>
>
>   <root level="WARN">
>     <appender-ref ref="bischeck"/>
>   </root>
>
> </configuration>
>
>
> The root section ensures that everything logged at WARN or ERROR by 
> any other Java package goes to the bischeck appender.
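>
> If WARN from root turns out to be too quiet for one particular 
> library, the same pattern can raise just that package. This is a sketch 
> only: org.example.client is a placeholder package name, not something 
> bischeck is known to use.
>
> ```xml
> <!-- Placeholder package: replace org.example.client with the package you want to trace -->
> <logger name="org.example.client" level="DEBUG" additivity="false">
>   <appender-ref ref="bischeck"/>
> </logger>
> ```
>
> It would go inside <configuration> next to the other logger elements, 
> while root stays at WARN for everything else.
>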
> Regards
> Anders
>
> On 07/25/2017 09:55 AM, Francesco Giuseppe Toffoli wrote:
>>
>> Hi Anders,
>> thanks for your reply. Here are my answers to your various questions:
>>
>> (1) the java version is:
>>
>> openjdk version "1.8.0_91"
>> OpenJDK Runtime Environment (build 1.8.0_91-b14)
>> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>>
>> and has not been updated recently. In our test environment (where 
>> the problem does not occur) the version is nearly the same (1.8.0_121).
>> The OS (CentOS release 6.6) has not been updated.
>>
>> (2) Redis has not been updated recently (redis 2.8.23). At the 
>> moment we have roughly 13,000 keys in use.
>>
>> (3) We usually add checks, maybe weekly. The issue started to occur 
>> some months ago, but everything can be OK for 2 or 3 weeks and then 
>> we have several crashes in a week. I'm not inclined to blame the new 
>> checks, also because the test server is aligned with the production 
>> one.
>>
>>
>> (5) Yes, the restart is done via '/etc/init.d/bischeckd restart' and 
>> it solves the issue. Physical memory on the server is always OK, so I 
>> don't think the JVM is running out of memory.
>>
>> In the Bischeck logs I didn't notice any errors. However, at the next 
>> crash I'll try to have a deeper look at them.
>> Are there any other logs I could look at?
>>
>> Thanks,
>> Francesco
>>
>>
>>
>>
>>
>> Il 24/07/2017 21:57, Anders Håål ha scritto:
>>>
>>> Hi Giuseppe,
>>>
>>> It sounds strange that it just stopped working after a long time of 
>>> stability unless something has changed:
>>>
>>> - Has anything changed on the server you run bischeck on - OS, JDK 
>>> version, ...?
>>>
>>> - Was the redis version updated? Any change in its configuration?
>>>
>>> - Have you added any new bischeck checks or changed something in the 
>>> configuration?
>>>
>>> - Anything else you can think of that may have changed?
>>>
>>> When you say restarting, is it the normal /etc/init.d/bischeckd 
>>> restart that fixes the problem? The reason I ask is that the script 
>>> just does a kill with the TERM signal. If the JVM were in an 
>>> out-of-memory situation that might not be enough, but I guess you 
>>> would have seen that in the log. Are you sure you do not have any 
>>> ERROR or WARN entries in the log?
>>>
>>> /Anders
>>>
>>>
>>>
>>> On 07/24/2017 02:14 PM, Francesco Giuseppe Toffoli wrote:
>>>>
>>>> Hi,
>>>> we are experiencing a critical problem with Bischeck. For a couple 
>>>> of months it has sometimes suddenly stopped working: the daemon 
>>>> /etc/init.d/bischeckd is running but no check results are sent to 
>>>> Nagios. Restarting the bischeck daemon fixes the issue.
>>>> Unfortunately we can't find any clue about the root cause in the 
>>>> bischeck logs, not even with DEBUG logging level enabled. The Redis 
>>>> database seems to be working properly and no increase in memory/CPU 
>>>> usage is reported on the server hosting bischeck while the issue 
>>>> occurs.
>>>>
>>>> Do you have any suggestions on how to investigate this in more depth?
>>>>
>>>> Regards,
>>>> Francesco
>>>>
>>>> -- 
>>>>
>>>> Francesco Giuseppe Toffoli
>>>> Monitoring Engineer
>>>>
>>>> GSE Department
>>>>
>>>> Tel: +39 01127387488
>>>>
>>>> Mobile: +39 349.800.60.35
>>>> Email: ftoffoli at skylogic.it
>>>>
>>>> Skylogic S. p. A.
>>>> Strada Pianezza, 289
>>>> 10151 Torino, Italy
>>>>
>>>>
>>>>
>>>> This message contains confidential information and is intended only 
>>>> for the individual named. If you are not the named addressee you 
>>>> should not disseminate, distribute or copy this e-mail. Please 
>>>> notify the sender immediately by e-mail if you have received this 
>>>> e-mail by mistake and delete this e-mail from your system. E-mail 
>>>> transmission cannot be guaranteed to be secure or error-free as 
>>>> information could be intercepted, corrupted, lost, destroyed, 
>>>> arrive late or incomplete, or contain viruses. The sender therefore 
>>>> does not accept liability for any errors or omissions in the 
>>>> contents of this message, which arise as a result of e-mail 
>>>> transmission. If verification is required please request a 
>>>> hard-copy version. Please note that any views or opinions presented 
>>>> in this email are solely those of the author and do not necessarily 
>>>> represent those of the Company.
>>>> No employee or agent is authorized to conclude any binding 
>>>> agreement on behalf of this Company nor, through this latter, any 
>>>> of the Eutelsat Communication group with another party by email 
>>>> without express written confirmation by a duly authorized officer 
>>>> of the Company. The list of duly authorized officers and the scope 
>>>> of their powers is published on the Trade Register according to the 
>>>> national law of each affiliate.
>>>
>>> -- 
>>>
>>>
>>> Ingby<http://www.ingby.com>
>>>
>>> bischeck - dynamic and adaptive monitoring for Nagios<http://www.bischeck.org>
>>>
>>> anders.haal at ingby.com<mailto:anders.haal at ingby.com>
>>>
>>> Mjukvara genom ingenjörsmässig kreativitet och kompetens
>>>
>>> Ingenjörsbyn
>>> Box 531
>>> 101 30 Stockholm
>>> Sweden
>>> www.ingby.com  <http://www.ingby.com/>
>>> Mobil: +46 70 575 35 46
>>> Tele: +46 75 75 75 090
>>> Fax:  +46 75 75 75 091
>>
>

