mid_server.down events are generated even though the MID Server is not actually down


Description

MID Server down events (mid_server.down) are generated even though the MID Servers did not go down during that time.

Looking at the ECC Queue at the time these events are generated, the Heartbeat output probe receives no response (input) two consecutive times.

As a result, the mid_server.down event is generated on the assumption that the MID Server is down.

A mid_server.up event is generated once a response to the Heartbeat output probe is received again.
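
The behavior described above can be modeled with a short Python sketch. This is only an illustration of the observed pattern, not the platform's actual implementation: the function name heartbeat_events and its parameters are hypothetical, and the threshold of two missed heartbeats is taken from the observation in this article.

```python
from typing import Iterable, List

# Assumption taken from the description: two consecutive unanswered
# Heartbeat probes lead to a mid_server.down event.
MISSED_THRESHOLD = 2


def heartbeat_events(responses: Iterable[bool],
                     threshold: int = MISSED_THRESHOLD) -> List[str]:
    """Return the events a sequence of heartbeat results would produce.

    Each item in `responses` is True if the Heartbeat output probe got an
    input (response) and False if it did not. Hypothetical helper.
    """
    events: List[str] = []
    missed = 0      # consecutive unanswered heartbeats so far
    down = False    # whether a down event is currently outstanding
    for answered in responses:
        if answered:
            missed = 0
            if down:
                # A response arrived after a down event was raised.
                events.append("mid_server.up")
                down = False
        else:
            missed += 1
            if missed >= threshold and not down:
                events.append("mid_server.down")
                down = True
    return events


# One answered probe, two missed probes, then a response again.
sample_events = heartbeat_events([True, False, False, True])
```

A slow MID Server that answers late on two probes in a row therefore looks the same as one that is really down, which is why the events appear even though the MID Server process kept running.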



Release or Environment

Orlando and later releases

Cause

A JVM lock appears to be getting stuck on the MID Server. This can slow processing and, in turn, delay the MID Server's responses to requests, including the Heartbeat probe.

The MID Server agent log below shows the stuck lock (OCSPCheckedCertificateCache) being freed repeatedly:


12/28/20 11:18:27 (168) glide.lock.cleaner WARNING *** WARNING *** Freed stuck JVM lock: OCSPCheckedCertificateCache
12/28/20 11:18:27 (168) ECCQueueMonitor.40 DEBUG: HTTPClient.registerOtherProtocols() starting on Thread Thread[ECCQueueMonitor.40,5,main].
12/28/20 11:18:32 (185) glide.lock.cleaner WARNING *** WARNING *** Freed stuck JVM lock: OCSPCheckedCertificateCache
12/28/20 11:18:42 (986) RefreshMonitor.65 DEBUG: HTTPClient.registerOtherProtocols() starting on Thread Thread[RefreshMonitor.65,5,main].
12/28/20 11:18:45 (970) LogStatusMonitor.60 stats threads: 156, memory max: 910.0mb, allocated: 600.0mb, used: 151.0mb, standard.queued: 0 probes, standard.processing: 0 probes, expedited.queued: 0 probes, expedited.processing: 0 probes, interactive.queued: 32 probes, interactive.processing: 10 probes
12/28/20 11:18:48 (981) ECCQueueMonitor.40 DEBUG: HTTPClient.registerOtherProtocols() starting on Thread Thread[ECCQueueMonitor.40,5,main].
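
To check whether a MID Server shows the same symptom, the agent log can be scanned for these warnings. The following Python sketch assumes the log lines have the same layout as the excerpt above; the regular expression and the function name find_stuck_locks are illustrative only, and the sample lines are copied from this article.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines such as:
# 12/28/20 11:18:27 (168) glide.lock.cleaner WARNING *** WARNING *** Freed stuck JVM lock: OCSPCheckedCertificateCache
# The layout is assumed from the excerpt above.
STUCK_LOCK = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \(\d+\) "
    r"glide\.lock\.cleaner WARNING .*Freed stuck JVM lock: (\S+)"
)


def find_stuck_locks(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, lock name) for every stuck-lock warning found."""
    hits: List[Tuple[str, str]] = []
    for line in lines:
        match = STUCK_LOCK.match(line.strip())
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits


# Sample lines taken from the log excerpt in this article.
sample_log = [
    "12/28/20 11:18:27 (168) glide.lock.cleaner WARNING *** WARNING *** Freed stuck JVM lock: OCSPCheckedCertificateCache",
    "12/28/20 11:18:27 (168) ECCQueueMonitor.40 DEBUG: HTTPClient.registerOtherProtocols() starting on Thread Thread[ECCQueueMonitor.40,5,main].",
    "12/28/20 11:18:32 (185) glide.lock.cleaner WARNING *** WARNING *** Freed stuck JVM lock: OCSPCheckedCertificateCache",
]
stuck_locks = find_stuck_locks(sample_log)
```

To scan a real log, pass the open file object to find_stuck_locks instead of the sample list. Repeated hits for OCSPCheckedCertificateCache around the time of the mid_server.down events suggest the cause described here.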

Resolution

To rule out OCSP issues, try the following steps, which disable OCSP checking:


a) Navigate to MID Server -> Properties. Find "com.glide.communications.httpclient.verify_revoked_certificate" and set the value to false.
b) Navigate to MID Server -> Properties. Find "mid.security.validation.endpoints" and clear the value so that it is empty.
c) Restart the MID Server.