HTTP 500 Response and 'Unable to parse SOAP document' Causes MID Server Issues from Past Failed SOAP Parsing


Description

If at some point in the past the response XML from an outbound SOAP call (from the instance via MID server) failed to be processed or parsed by the instance, the MID server may keep trying to resend that response XML to the instance over and over again. This can cause issues on the MID server, including slower or delayed processing of current ECC queue entries. 

If you're affected by this issue you'll see something similar to this in the MID Server Agent logs, usually repeated over and over:

ECCSender.1 Attempt to send ecc_queue.ab515820db871c1064c4de4dd39619e0.80037.xml failed: file remains enqueued for later sending
ECCSender.1 Sending ecc_queue.a5ae18e8db4f1050e2630d1ed3961940.79998.xml
ECCSender.1 DEBUG: HTTPClient.registerOtherProtocols() starting on Thread Thread[ECCSender.1,5,main].
ECCSender.1 WARNING *** WARNING *** Method failed: (https://<instance-name>.service-now.com/ecc_queue.do?SOAP&displayvalue=all&redirectSupported=true)
HTTP/1.1 500 Server Error with code: 500
ECCSender.1 WARNING *** WARNING *** Unable to parse SOAP document
ECCSender.1 WARNING *** WARNING *** RemoteGlideRecord failed to send data to https://<instance-name>.service-now/ with (Unable to parse SOAP document)
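
To see which queued files are being retried, the agent log can be summarised with a short script. The following is a minimal sketch, not a supported tool: the function name count_failed_sends is hypothetical, and the pattern is taken only from the excerpt above, so the exact wording may differ between MID Server versions.

```python
import re
from collections import Counter

# Pattern based on the "Attempt to send ... failed" line in the excerpt above.
# Assumption: other MID Server versions may word this line differently.
FAILED_SEND = re.compile(r"Attempt to send (ecc_queue\.\S+\.xml) failed")


def count_failed_sends(log_lines):
    """Return a Counter mapping each queued file name to its failed send attempts."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_SEND.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing the lines of an agent log file to count_failed_sends and printing counts.most_common(10) would list the files that are retried most often, which are the likely candidates for the blockage.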

The instance app node (localhost) log will also show a similar but more detailed error for the same request, for example:

API_INT-thread-2 <session> txid=45945ae21bd9 *** Start #130180 /ecc_queue.do, user: mid_server_user
API_INT-thread-2 <session> txid=45945ae21bd9 WARNING *** WARNING *** org.xml.sax.SAXParseException; lineNumber: 331; columnNumber: 942; Invalid byte 2 of 4-byte UTF-8 sequence.
API_INT-thread-2 <session> txid=45945ae21bd9 WARNING *** WARNING *** SOAP Fault: <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><SOAP-ENV:Header/><SOAP-ENV:Body><SOAP-ENV:Fault><faultcode>SOAP-ENV:Server</faultcode><faultstring>Unable to parse SOAP document</faultstring><detail>Error completing SOAP request</detail></SOAP-ENV:Fault></SOAP-ENV:Body></SOAP-ENV:Envelope>
API_INT-thread-2 <session> txid=45945ae21bd9 tx_pattern_hash=-59168894 *** End #130180 /ecc_queue.do, user: mid_server_user, total time: 0:00:00.147, processing time: 0:00:00.147, SQL time: 0:00:00.000 (count: 6), source: xxxx , type:soap, method:insert, api_name:SOAP APIs, resource:ecc_queue.do, user_id:8c241a9edb5d2f4014b38e68689619aa, response_status:500

Cause

The most likely cause of this issue is that integration probes run in the recent past (e.g. Outbound REST/SOAP, LDAP/JDBC Imports) have 'bad' data in the ecc_queue input field values. This could be unicode character encoding, XML format, unescaped XML entities, or any other problem that means the SOAP Table API code cannot parse the data and so can't insert the record in the ecc_queue table.

Even if the SOAP Table API can insert the ecc_queue record, the transaction could still fail with error 500 if a Business Rule running for that insert, usually a 'sensor' business rule, crashes, throws an exception, causes looping/recursion that aborts the transaction, or even deliberately aborts the transaction. These could be before or after business rules (async rules won't break the insert).

The only way to know the cause for sure is to check the instance app node logs for the API_INT semaphore transaction that handled the request. This will say if a particular problem with the XML is detected, or if perhaps a business rule is interfering with the insert. Only that will allow a specific known problem to be linked to the support case, or a solution found if the cause is custom code.

For SAXParseException, the app node logs may give you the line and column of the data that is causing the problem. e.g. "SAXParseException; lineNumber: 331; columnNumber: 942;". Opening the XML file in a web browser may show extended unicode characters, such as emojis, in the data.
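
A queued file can also be checked locally for well-formedness before deciding whether it is the 'bad' one. The following is a minimal sketch using Python's standard XML parser; the function name find_xml_error is hypothetical. It is an assumption that a file rejected by the instance also fails here: the instance uses a different (Java) parser, so the messages and column numbers may not match the app node log exactly, and a file that fails because of a business rule will parse cleanly.

```python
import xml.etree.ElementTree as ET


def find_xml_error(path):
    """Return None if the file parses, else (line, column, message) for the first error."""
    try:
        ET.parse(path)
    except ET.ParseError as err:
        # err.position is (line, column) as reported by Python's expat parser;
        # the column may be offset relative to the Java SAX parser's value.
        line, column = err.position
        return line, column, str(err)
    return None
```

Run against a copy of the oldest file, a non-None result points to roughly where the invalid byte sequence or unescaped entity sits.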

If a business rule or the payload size is the cause, this is usually stated in the MID Server agent log, including the name of the business rule. e.g.
ECCSender.1 WARNING *** WARNING *** Request body exceeded max allowed content length
ECCSender.1 WARNING *** WARNING *** Error executing business rule 'Process LDAP Listener on MID changes'

Resolution

On the MID Server host look in the directory <MID server installation path>/agent/work/monitors/ECCSender/output_1 for files named something similar to this: ecc_queue.66ae58eedba0e45064c4de4dd39619f6.1.xml

The file format is exactly the same as if you had exported a record from the ecc_queue table as XML. This is the format that the instance SOAP Table API takes in.

The files in the ECCSender folders are processed in order, and that's especially true of the output_s (sequential) folder used for JDBCProbe and LDAPProbe, so the problem will be with the contents of the oldest file(s).

You will need to move the 'bad' files out of the ECCSender folder. This will cause data loss, because the ecc_queue input record in each moved file will never be inserted into the instance ecc_queue, so it is important to be sure you don't move more than you have to.

There are several output folders in /agent/work/monitors/ECCSender/, one for each ecc_queue priority (0-2) and one for sequential jobs. The blockage could be in any of these, but from experience output_s is the likely one.

Look at the modified date/time of the files, and find the oldest file, or files if there were several within the same second. Move these to another directory (be sure the backup directory is outside of the <MID server installation path>). You don't want to touch new files (last few minutes up to the last hour) as they may be in use by currently active ECC queue processing.
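
Finding the oldest files can be scripted. The following is a minimal sketch that only lists candidates and moves nothing; the function name oldest_queued_files and its parameters are hypothetical, and the one-hour cut-off simply mirrors the guidance above about not touching recent files.

```python
import time
from pathlib import Path


def oldest_queued_files(output_dir, min_age_seconds=3600, now=None):
    """Return (mtime, path) pairs for ecc_queue.*.xml files, oldest first.

    Files modified within the last min_age_seconds are skipped, because they
    may belong to currently active ECC queue processing.
    """
    now = time.time() if now is None else now
    candidates = []
    for path in Path(output_dir).glob("ecc_queue.*.xml"):
        mtime = path.stat().st_mtime
        if now - mtime >= min_age_seconds:
            candidates.append((mtime, path))
    # Sort on modification time only, oldest first.
    return sorted(candidates, key=lambda item: item[0])
```

Calling it with the path of one output folder (for example the output_s folder under agent/work/monitors/ECCSender) returns the files to inspect first. Review the list by hand before moving anything, since every moved file is a lost ecc_queue input.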

You can then review the files you moved (the older ones) and see if they contain any of the issues described under Cause, such as unicode character encoding problems, invalid XML format, or unescaped XML entities.

The fact that these files were there means they couldn't be processed/parsed by the instance. You would then need to work on the relevant integration on the instance, for example changing the calls it makes to the external SOAP API to avoid character encoding issues, searching the knowledge base for known problems with the specific business rule, or fixing custom business rules.

To gauge the impact of losing these ecc_queue inputs, you will need to use the field values in the files to identify which feature each record is for:
KB0727132 How to link an ECC Queue record back to a specific Feature or Job

Additional Information

Note: Before the San Diego release, MID Server queue processing in general can be blocked by this. The fix in San Diego avoids the blocking by automatically moving the bad XML files to agent\work\monitors\ECCSender\output_error, but it does not identify or solve any of the possible root causes, so this KB remains very relevant.
PRB1521761 / ECCSender can fail to insert inputs into the ECC Queue for many reasons, blocking all inputs from then on, leaving MID Server effectively Down