ECCSender can fail to insert inputs into the ECC Queue for many reasons, blocking all inputs from then on, leaving MID Server effectively Down


Description

The MID Server's ECCSender thread can become blocked, leaving the MID Server effectively Down: new jobs are still picked up and run, but none of the results are returned to the instance.
The MID Server agent log will show status 500 for the ECCSender SOAP request to the ecc_queue. The instance app node logs will confirm this, and usually give more details of the exact cause.

This can have many root causes, such as:
- PRB1502013 ECCSender can fail to insert inputs into the ECC Queue due to Invalid byte 2 of 4-byte UTF-8 sequence
- PRB1487514 'Discovery - Update device started count' BR throws unhandled exception on empty ECC queue payload
- A problem with the SOAP table API of the instance
- Before insert Business Rules on ecc_queue
- After insert Business Rules on ecc_queue. The record is inserted, but the insert still returns status 500, so the MID Server assumes it was not inserted
- Unicode in the payload
- Large payloads (related to glide.soap.max_inbound_content_length)
- and others...

The MID Server should not allow individual payloads to block other payloads queued behind it.

NOTE: If a problem ticket exists for a specific root cause, link that problem to cases instead. Cases with currently unknown causes, custom sensors, or other causes outside the control of ServiceNow can be linked to this problem.

Steps to Reproduce

Various causes.

The MID Server agent log will contain entries similar to the following. The key is "Error with code: 500", or similar, on the request from the ECCSender thread to the instance. PRB1502013 will look like this:
07/13/21 09:34:13 (995) ECCSender.1 WARNING *** WARNING *** Method failed: (https://<instance>.service-now.com/ecc_queue.do?SOAP&displayvalue=all&redirectSupported=true)HTTP/1.1 500 Internal Server Error with code: 500
07/13/21 09:34:13 (995) ECCSender.1 WARNING *** WARNING *** Unable to parse SOAP document
07/13/21 09:34:13 (995) ECCSender.1 WARNING *** WARNING *** RemoteGlideRecord failed to send data to https://<instance>.service-now.com/ with (Unable to parse SOAP document)
07/13/21 09:34:13 (995) ECCSender.1 WARNING *** WARNING *** MIDRemoteGlideRecord.insert failed, retrying in 22 seconds

Size related:
09/09/20 16:42:28 (271) ECCSender.1 WARNING *** WARNING *** Method failed: (https://<instance>.service-now.com/ecc_queue.do?SOAP&displayvalue=all&redirectSupported=true)HTTP/1.1 500 Internal Server Error with code: 500
09/09/20 16:42:28 (271) ECCSender.1 WARNING *** WARNING *** Request body exceeded max allowed content length
09/09/20 16:42:28 (271) ECCSender.1 WARNING *** WARNING *** RemoteGlideRecord failed to send data to https://<instance>.service-now.com/ with (Request body exceeded max allowed content length)

Or a specific Business Rule may be mentioned. This specific symptom is PRB1487514:
03/31/21 13:04:35 (078) StartupSequencer WARNING *** WARNING *** Method failed: (https://<instance>.service-now.com/ecc_queue.do?SOAP&displayvalue=all&redirectSupported=true)HTTP/1.1 500 Internal Server Error with code: 500
03/31/21 13:04:35 (096) StartupSequencer WARNING *** WARNING *** Error executing business rule 'Discovery - Update device started count'
03/31/21 13:04:35 (097) StartupSequencer WARNING *** WARNING *** RemoteGlideRecord failed to send data to https://<instance>.service-now.com/ with (Error executing business rule 'Discovery - Update device started count')
03/31/21 13:04:35 (097) StartupSequencer WARNING *** WARNING *** MIDRemoteGlideRecord.insert failed, retrying in 22 seconds

2020-10-28 14:44:06 (824) ECCSender.1 WARNING *** WARNING *** Method failed: (https://<instance>.service-now.com/ecc_queue.do?SOAP&displayvalue=all&redirectSupported=true)HTTP/1.1 500 Internal Server Error with code: 500
2020-10-28 14:44:06 (825) ECCSender.1 WARNING *** WARNING *** Error executing business rule 'Process LDAP Listener on MID changes'
2020-10-28 14:44:06 (825) ECCSender.1 WARNING *** WARNING *** RemoteGlideRecord failed to send data to https://<instance>.service-now.com/ with (Error executing business rule 'Process LDAP Listener on MID changes')

06/12/21 16:02:52 (173) ECCSender.1 WARNING *** WARNING *** Method failed: (https://<instance>.service-now.com/ecc_queue.do?SOAP&displayvalue=all&redirectSupported=true)HTTP/1.1 500 Internal Server Error with code: 500
06/12/21 16:02:52 (282) ECCSender.1 WARNING *** WARNING *** Error executing business rule 'User Engagement: Operate attachments'
06/12/21 16:02:52 (282) ECCSender.1 WARNING *** WARNING *** RemoteGlideRecord failed to send data to https://<instance>.service-now.com/ with (Error executing business rule 'User Engagement: Operate attachments')
06/12/21 16:02:52 (282) ECCSender.1 SEVERE *** ERROR *** Error executing business rule 'User Engagement: Operate attachments'

04/22/19 12:01:10 (992) ECCSender.1 WARNING *** WARNING *** Method failed: (https://<instance>.service-now.com/ecc_queue.do?SOAP&displayvalue=all&redirectSupported=true)HTTP/1.1 500 Internal Server Error with code: 500
04/22/19 12:01:10 (994) ECCSender.1 WARNING *** WARNING *** com.glide.processors.soap.SOAPProcessingException: Insert Aborted : Operation against file 'ecc_queue' was aborted by Business Rule 'Restrict hthd_user to HTHD input only^ca97a55cdbc5ff407364fd431d961976'. Business Rule Stack:Restrict hthd_user to HTHD input only
04/22/19 12:01:10 (994) ECCSender.1 WARNING *** WARNING *** RemoteGlideRecord failed to send data to https://<instance>.service-now.com/ with (com.glide.processors.soap.SOAPProcessingException: Insert Aborted : Operation against file 'ecc_queue' was aborted by Business Rule 'Restrict hthd_user to HTHD input only^ca97a55cdbc5ff407364fd431d961976'. Business Rule Stack:Restrict hthd_user to HTHD input only)

10/05/21 08:58:09 (310) ECCQueueMonitor.1 WARNING *** WARNING *** Method failed: (https://<instance>.service-now.com/ecc_queue.do?SOAP&displayvalue=all&redirectSupported=true)HTTP/1.1 500 Internal Server Error with code: 500
10/05/21 08:58:09 (451) ECCQueueMonitor.1 WARNING *** WARNING *** Error executing business rule 'Add Limited Resources'
10/05/21 08:58:09 (451) ECCQueueMonitor.1 WARNING *** WARNING *** RemoteGlideRecord failed to send data to https://<instance>.service-now.com/ with (Error executing business rule 'Add Limited Resources')

Or an Abort action:
03/16/18 23:45:06 (367) ECCSender.1 WARNING *** WARNING *** Method failed: (https://<instance>.service-now.com/ecc_queue.do?SOAP&displayvalue=all&redirectSupported=true)HTTP/1.1 500 Internal Server Error with code: 500
03/16/18 23:45:06 (367) ECCSender.1 WARNING *** WARNING *** com.glide.processors.soap.SOAPProcessingException: Insert Aborted : Error during insert of ecc_queue (JavascriptProbe)
03/16/18 23:45:06 (367) ECCSender.1 WARNING *** WARNING *** RemoteGlideRecord failed to send data to https://<instance>.service-now.com/ with (com.glide.processors.soap.SOAPProcessingException: Insert Aborted : Error during insert of ecc_queue (JavascriptProbe))
03/16/18 23:45:06 (367) ECCSender.1 Attempt to send ecc_queue.0e1c7f26dbf4570820f268d35b9619fe.xml failed: file remains enqueued for later sending
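
The log patterns above can be grouped by cause with a small script. The following is a minimal Python sketch, not a ServiceNow tool: it assumes the agent log layout shown in the samples, and the function names and category labels (payload-parse, payload-size, business-rule, insert-aborted) are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# Matches the "RemoteGlideRecord failed to send data" lines shown in the samples
# and captures the thread name plus the reason given in parentheses.
FAILED_RE = re.compile(
    r"\) (?P<thread>\S+) WARNING \*\*\* WARNING \*\*\* "
    r"RemoteGlideRecord failed to send data to \S+ with \((?P<reason>.*)\)\s*$"
)
# Business Rule name, stopping at the closing quote or the '^sys_id' suffix.
BR_RE = re.compile(r"[Bb]usiness [Rr]ule '([^'^]+)")


def classify(reason):
    """Map a failure reason to a coarse category (labels are illustrative)."""
    if "Unable to parse SOAP document" in reason:
        return "payload-parse"
    if "exceeded max allowed content length" in reason:
        return "payload-size"
    match = BR_RE.search(reason)
    if match:
        return "business-rule: " + match.group(1)
    if "Insert Aborted" in reason:
        return "insert-aborted"
    return "other"


def summarize(log_lines):
    """Count failed sends per (thread, category) over an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_RE.search(line)
        if match:
            counts[(match.group("thread"), classify(match.group("reason")))] += 1
    return counts
```

summarize() accepts any iterable of lines, for example an open agent log file. A category that keeps repeating for the ECCSender thread suggests which of the root causes listed in the Description to check first.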

The instance app node log for the API_INT transaction will explain in more detail exactly why the insert failed.

For example, a Unicode-related issue:
2021-09-07 07:58:10 (550) API_INT-thread-3 02E83D4C1BA2F090426A437EAD4BCB9B txid=892971801be2 *** Start #3823406 /ecc_queue.do, user: admin
2021-09-07 07:58:10 (552) API_INT-thread-3 02E83D4C1BA2F090426A437EAD4BCB9B txid=892971801be2 WARNING *** WARNING *** org.xml.sax.SAXParseException; lineNumber: 12; columnNumber: 2; An invalid XML character (Unicode: 0x16) was found in the element content of the document.
2021-09-07 07:58:10 (560) API_INT-thread-3 02E83D4C1BA2F090426A437EAD4BCB9B txid=892971801be2 WARNING *** WARNING *** SOAP Fault: <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><SOAP-ENV:Header/><SOAP-ENV:Body><SOAP-ENV:Fault><faultcode>SOAP-ENV:Server</faultcode><faultstring>Unable to parse SOAP document</faultstring><detail>Error completing SOAP request</detail></SOAP-ENV:Fault></SOAP-ENV:Body></SOAP-ENV:Envelope>
2021-09-07 07:58:10 (563) API_INT-thread-3 02E83D4C1BA2F090426A437EAD4BCB9B txid=892971801be2 tx_pattern_hash=-59168894 *** End #3823406 /ecc_queue.do, user: admin, total time: 0:00:00.026, processing time: 0:00:00.026, SQL time: 0:00:00.003 (count: 5), source: 103.227.42.8 , type:soap, method:insert, api_name:SOAP APIs, resource:ecc_queue.do, user_id:6816f79cc0a8016401c5a33be04be441, response_status:500
2021-09-07 07:58:10 (565) API_INT-thread-3 02E83D4C1BA2F090426A437EAD4BCB9B txid=892971801be2 tx_pattern_hash=-59168894 Transaction #3823406 /ecc_queue.do - Entered Transaction.exit()
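
When the instance log reports an invalid XML character, as in the 0x16 example above, a queued payload can be checked for such characters before it is sent again. The following is a minimal Python sketch that applies the XML 1.0 "Char" rule. The function name is hypothetical, and the line and column numbers it reports are counted within the text passed in, so they may not match the SAX parser's numbers for the full SOAP request.

```python
import re

# Any character outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (line, column, codepoint) for each character XML 1.0 forbids."""
    hits = []
    # Split on "\n" only: str.splitlines() also splits on some control
    # characters, which would hide exactly the characters being searched for.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in INVALID_XML_CHARS.finditer(line):
            hits.append((line_no, match.start() + 1, hex(ord(match.group()))))
    return hits
```

An empty result means the text contains no characters that XML 1.0 forbids. It does not rule out other causes, such as the invalid UTF-8 byte sequences described in PRB1502013.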

Workaround

This problem has been fixed. If you are able to upgrade, review the Fixed In or Intended Fix Version fields to determine whether any versions have a planned or permanent fix.

NOTE: The fix only addresses the ECCSender "blocking" issue. It does not solve any of the root causes that prevent a particular payload from being sent back to the instance. Those payloads will still fail to be sent, breaking the jobs they belong to. Payloads that cannot be handled will be moved to the agent\work\monitors\ECCSender\output_error folder, and a MID Server Issues record will be created.

General workaround until you can upgrade to a fixed version:

The oldest records in the ECCSender output folders are likely to be the 'bad' ones with the content that is causing the problem. The MID Server gets stuck on those records due to this problem, and doesn't then process the later 'good' records. You may need to move some of the oldest xml files out of the output folders to allow the rest to be sent back to the instance.

Once the ECCSender thread has worked through the backlog, and left the output folders empty, the issue is temporarily solved.
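
The general workaround above can be sketched in Python as follows. This is a minimal illustration, not a supported tool: the function name and the quarantine folder are hypothetical, the output folder path must be supplied for your own installation, and "oldest" is assumed to mean the earliest file modification time.

```python
import shutil
from pathlib import Path


def quarantine_oldest(output_dir, quarantine_dir, count=1, dry_run=True):
    """Move the `count` oldest .xml files from output_dir to quarantine_dir.

    With dry_run=True (the default) nothing is moved; the names of the files
    that would be moved are returned so they can be reviewed first.
    """
    output_dir = Path(output_dir)
    quarantine_dir = Path(quarantine_dir)
    # Oldest first, based on file modification time.
    xml_files = sorted(output_dir.glob("*.xml"), key=lambda p: p.stat().st_mtime)
    chosen = xml_files[:count]
    if not dry_run:
        quarantine_dir.mkdir(parents=True, exist_ok=True)
        for path in chosen:
            shutil.move(str(path), str(quarantine_dir / path.name))
    return [path.name for path in chosen]
```

Calling it first with dry_run=True lists the candidates without touching them. Moving the files to a separate folder instead of deleting them keeps the payloads available for root cause analysis.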

Workarounds for specific root causes can be seen in:
- PRB1502013 ECCSender can fail to insert inputs into the ECC Queue due to Invalid byte 2 of 4-byte UTF-8 sequence
- PRB1487514 'Discovery - Update device started count' BR throws unhandled exception on empty ECC queue payload


Related Problem: PRB1521761