MID Server ECCSender thread gives "Invalid byte 2 of 4-byte UTF-8 sequence" error, blocking sending of valid ecc_queue inputs to instance

Description

This is most likely to be seen in LDAP user imports or Import Set JDBC Data Sources, where large volumes of data are involved and are likely to include extended Unicode characters such as emojis. That is when the Apache Xerces XML parser bug that causes this becomes relevant:
https://issues.apache.org/jira/browse/XERCESJ-1668

To be this problem, the instance must be Vancouver or later, and this error must be seen in the ECCSender thread of a MID Server's agent log:

Invalid byte 2 of 4-byte UTF-8 sequence

Other ECCSender XML parser errors, or instance-side errors, are not this problem.

For LDAP probes, emojis have been seen in OU/CN group names and in user details data. For REST integrations syncing incidents, emojis have been seen in the comments of cases. They could potentially appear anywhere, in any feature's probe result data.

Steps to Reproduce

1/ Install a MID Server and validate it. (Windows 10 and Xanadu Patch 4 were used for this test.)
2/ Configure these settings, to prove they don't make any difference:
   - Set the MID Server parameter glide.util.xml.transformer.handle.utf16_surrogate_pairs=false
   - Edit wrapper-override.conf to add wrapper.java.additional.501=-Dfile.encoding=UTF-8
3/ Paste the example XML files into the \agent\work\monitors\ECCSender\output_2 folder.
   Note: The examples include customer data from an LDAP probe, so they are not shared. It is proving surprisingly difficult to engineer an ecc_queue input payload from scratch that triggers this issue on demand.

Actual behaviour:
In Vancouver/Washington/Xanadu, the ECCSender.1 thread will retry infinitely, causing a performance issue for the MID Server in the process. In Yokohama, because of the fix for PRB1792071, the file will be moved into output_error instead, preventing the looping and performance issues. In both cases this still breaks the job, because not all inputs will make it back to the instance.
Inspect the error in the MID Server agent log. Note: The actual character and the lineNumber/columnNumber position are likely to be different each time.

2024-12-30T13:10:14.484+0100 ERROR (ECCSender.1) [SimpleSaxParser$SaxHandler:110] Invalid byte 2 of 4-byte UTF-8 sequence.
org.xml.sax.SAXParseException; systemId: file:/C:/.../agent/work/monitors/ECCSender/output_s/ecc_queue.192983473400000001.41925.xml; lineNumber: 1; columnNumber: 145417; Invalid byte 2 of 4-byte UTF-8 sequence.
    at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
    at java.xml/javax.xml.parsers.SAXParser.parse(SAXParser.java:330)
    at com.service_now.monitor.SimpleSaxParser.parseAsMap(SimpleSaxParser.java:44)
    at com.service_now.monitor.ECCSenderQueueFile.parseDataUsingFile(ECCSenderQueueFile.java:163)
    at com.service_now.monitor.ECCSenderCache.loadData(ECCSenderCache.java:456)
    at com.service_now.monitor.ECCSenderCache.processFile(ECCSenderCache.java:385)
    at com.service_now.monitor.ECCSenderCache.sendFile(ECCSenderCache.java:347)
    at com.service_now.monitor.ECCSenderCache.sendFiles(ECCSenderCache.java:296)
    at com.service_now.monitor.ECCSender.run(ECCSender.java:122)
    at com.snc.midserver.monitor.internal.MonitorRunner$MonitorTask.execute(MonitorRunner.java:275)
    at com.snc.midserver.monitor.internal.AMonitorTask.run(AMonitorTask.java:29)
    at java.base/java.util.TimerThread.mainLoop(Timer.java:566)
    at java.base/java.util.TimerThread.run(Timer.java:516)
Caused by: org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
    at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
    at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
    ... 20 more
2024-12-30T13:10:14.485+0100 WARN (ECCSender.1) [ECCSenderQueueFile:167] Failed to parse XML using SAX parser: Invalid byte 2 of 4-byte UTF-8 sequence.; reverting to using DOM parser
2024-12-30T13:10:14.486+0100 WARN (ECCSender.1) [XMLUtil:532] org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 145417; Invalid byte 2 of 4-byte UTF-8 sequence.
2024-12-30T13:10:14.486+0100 ERROR (ECCSender.1) [ECCSenderCache:394] Failure sending file: ecc_queue.has4byteunicode.18.xml
java.io.IOException: Failed to parse XML using DOM parser, null returned - file: C:\...\agent\work\monitors\ECCSender\output_s\ecc_queue.192983473400000001.41925.xml
    at com.service_now.monitor.ECCSenderQueueFile.parseDataUsingFile(ECCSenderQueueFile.java:182)
    at com.service_now.monitor.ECCSenderCache.loadData(ECCSenderCache.java:456)
    at com.service_now.monitor.ECCSenderCache.processFile(ECCSenderCache.java:385)
    at com.service_now.monitor.ECCSenderCache.sendFile(ECCSenderCache.java:347)
    at com.service_now.monitor.ECCSenderCache.sendFiles(ECCSenderCache.java:296)
    at com.service_now.monitor.ECCSender.run(ECCSender.java:122)
    at com.snc.midserver.monitor.internal.MonitorRunner$MonitorTask.execute(MonitorRunner.java:275)
    at com.snc.midserver.monitor.internal.AMonitorTask.run(AMonitorTask.java:29)
    at java.base/java.util.TimerThread.mainLoop(Timer.java:566)
    at java.base/java.util.TimerThread.run(Timer.java:516)

To show that the character isn't fundamentally a problem for the ServiceNow platform in general:

4/ Manually edit the file, replacing the problem character with its numeric character reference, e.g. 🔎 becomes &#x1F50E; (or &#128270; in decimal).
5/ Save, and the ECCSender will send that file to the instance without a problem. In the process the character reference becomes a normal UTF-8 sequence again, and the emoji is displayed in the payload.txt in a text editor.

Expected behaviour:
Given that the UTF-8 data is valid, and that the SAX parser has been known to have this limitation for a while without a solution, ECCSender should avoid the limitation by using a different parser, by encoding 4-byte UTF-8 sequences as character references before the data is put through the Xerces XML parser, or by whatever other solution works.

Workaround

This problem is currently under review and targeted to be fixed in a future release. Subscribe to this Known Error article to receive notifications when more information becomes available.

There is no workaround to prevent this happening, except to work out which characters in the source system's data are causing it, and update that live data to replace or delete the character/emoji.

In the example above, the position of the character in the file is given as "columnNumber: 145417". To understand what that character is, and where in the source data it comes from, open the XML file in a text editor and go to that position. Notepad++ shows the current position of the cursor at the bottom right of the screen. The data and the field labels/JSON/XML tags surrounding that character should allow you to work out which table/row/record/user etc. in the source system includes the character.
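
If hunting for the position by hand is impractical, the same search can be scripted. The following is a minimal sketch, not a ServiceNow tool: it assumes Python 3 is available wherever a copy of the queued XML file has been placed, and that the file is valid UTF-8. The function names find_four_byte_chars and scan_file are hypothetical. Because the column number reported by the parser may not line up exactly with the offset a text editor shows, the sketch lists every character above U+FFFF (the ones UTF-8 encodes as 4 bytes) together with some surrounding text, rather than jumping to a single offset.

```python
"""Sketch: list characters that need 4 bytes in UTF-8 in a copy of a queued XML file."""
import sys
from pathlib import Path


def find_four_byte_chars(text, context=40):
    """Yield (offset, code point label, surrounding text) for each
    character above U+FFFF, i.e. each one UTF-8 encodes as 4 bytes."""
    for index, char in enumerate(text):
        if ord(char) > 0xFFFF:
            start = max(0, index - context)
            snippet = text[start:index + context + 1].replace("\n", " ")
            yield index, f"U+{ord(char):04X}", snippet


def scan_file(path):
    """Read one file as UTF-8 (an assumption) and return all matches."""
    text = Path(path).read_text(encoding="utf-8")
    return list(find_four_byte_chars(text))


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_four_byte.py <file.xml> [...]
    for path in sys.argv[1:]:
        for index, label, snippet in scan_file(path):
            print(f"{path}: character offset {index}: {label}")
            # ascii() escapes non-ASCII characters so that the output is
            # safe to print on consoles with a legacy code page.
            print(f"    context: {ascii(snippet)}")
```

The context printed for each match plays the same role as the surrounding field labels and tags in the text editor: it is what identifies the record in the source system that needs updating.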
The MID Server's agent log will confirm which specific records the ECCSender thread is having trouble with. The oldest records in the /agent/work/monitors/ECCSender/output_* folders are usually the problem ones.

In Xanadu and earlier, these records will remain in the output_0/output_1/output_2/output_s folders and continue to be retried infinitely. To stop the retries, the XML files need moving out of the output folders. That can be done remotely using a Command probe. For more details see:
PRB1792071/KB1695722 Mid server ECCsender fails to process XML files and continuously reties, if the file is truncated, or contains invalid/control/unprintable characters.

In Yokohama and later, the failed files will be moved to the output_error folder automatically.

Related Problem: PRB1840424