MID Server can fail to start with an Out-of-Memory error if there are a large number of XML files in the ECC Sender folder


A MID Server can fail to start up, with an Out-of-Memory error shown in the logs, if there are a large number of XML files in the ECC Sender folder.

Each XML file is a queued result (response) that needs to be sent back to the instance, where it becomes an input record in the ecc_queue table. All probes create an XML file for their result, but jobs such as JDBCProbe can produce a huge number of these small input files for each output job: the results are generally split into 200 rows per file (XML payload), so a large import can produce a very large number of files. If a MID Server cannot send the inputs back to the instance as quickly as they are generated, a backlog builds up. This can be caused by a slow connection to the instance (compared to the connection to the target server), or by a loss of connection to the instance: in that situation the MID Server continues running the jobs it has already picked up, so the backlog keeps growing.
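
To put rough numbers on this, the Python sketch below applies the 200-rows-per-file figure to a large result set and models the backlog as files generated minus files sent over time. It is a simplified constant-rate model, not how the MID Server actually schedules sending, and the row count and rates in the example are hypothetical.

```python
import math

# "Generally" 200 rows per XML payload, per the description above; treat as approximate.
ROWS_PER_FILE = 200


def estimated_input_files(total_rows: int, rows_per_file: int = ROWS_PER_FILE) -> int:
    """Approximate number of ECC Sender XML files produced for one result set."""
    return math.ceil(total_rows / rows_per_file)


def backlog_after(seconds: float, generated_per_s: float, sent_per_s: float,
                  start_backlog: int = 0) -> int:
    """Files still queued after `seconds` under a simple constant-rate model.

    sent_per_s is 0 while the connection to the instance is lost.
    The result never drops below zero.
    """
    net_growth = (generated_per_s - sent_per_s) * seconds
    return max(0, int(start_backlog + net_growth))


# Hypothetical figures: a 10 million row JDBC import, and a MID Server that keeps
# producing 20 files per second for an hour with no connection to the instance.
print(estimated_input_files(10_000_000))  # 50000
print(backlog_after(3600, 20, 0))         # 72000
```

Even modest generation rates add up quickly once sending stops completely.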

When a MID Server starts up, the ECCSender thread starts, and the first thing it does is list the files in the .\agent\work\monitors\ECCSender folders so that it can send those earlier results to the instance. With a very large number of files, that listing alone can consume a huge amount of CPU time and memory.
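
Before starting the MID Server, it can help to know how many files are waiting. The Python sketch below counts the files in each subfolder of the ECCSender folder using os.scandir, which iterates directory entries one at a time instead of building the whole listing in memory. The function name count_queued_files is hypothetical, and the layout (one subfolder per priority, such as output_2) is assumed from the folder names used in this article.

```python
import os
from typing import Dict


def count_queued_files(eccsender_dir: str) -> Dict[str, int]:
    """Count regular files in each immediate subfolder of the ECCSender folder.

    os.scandir yields directory entries lazily, so only a running count is kept,
    rather than one object per queued file.
    """
    counts: Dict[str, int] = {}
    with os.scandir(eccsender_dir) as top:
        for sub in top:
            if not sub.is_dir():
                continue
            n = 0
            with os.scandir(sub.path) as entries:
                for entry in entries:
                    if entry.is_file():
                        n += 1
            counts[sub.name] = n
    return counts
```

For example, call count_queued_files(r"<install folder>\agent\work\monitors\ECCSender") and look for folders with an unusually large count.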

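The MID Server agent log will contain something like the following. The "Caused by" section shows the OutOfMemoryError being thrown while ECCSenderCache builds its list of queued files (ECCSenderCache.getQueueFiles): each file is checked with java.io.File.isFile, and the resulting security check canonicalizes every file path (java.io.WinNTFileSystem.canonicalizeWithPrefix):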

10/03/18 07:13:02 (378) StartupSequencer SEVERE *** ERROR *** java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.glide.util.ClassUtil.newInstance(ClassUtil.java:170)
at com.service_now.mid.services.Monitors.createMonitor(Monitors.java:225)
at com.service_now.mid.services.Monitors.loadInternalMonitor(Monitors.java:155)
at com.service_now.mid.services.Monitors.loadInternalMonitors(Monitors.java:124)
at com.service_now.mid.services.Monitors.start(Monitors.java:58)
at com.service_now.mid.services.Monitors.onMIDServerEvent(Monitors.java:343)
at com.service_now.mid.services.Events.internalFire(Events.java:102)
at com.service_now.mid.services.Events.fire(Events.java:34)
at com.service_now.mid.services.StartupSequencer.startServices(StartupSequencer.java:201)
at com.service_now.mid.services.StartupSequencer.testsSucceeded(StartupSequencer.java:103)
at com.service_now.mid.services.StartupSequencer.access$100(StartupSequencer.java:53)
at com.service_now.mid.services.StartupSequencer$Starter.run(StartupSequencer.java:305)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.io.WinNTFileSystem.canonicalizeWithPrefix(WinNTFileSystem.java:451)
at java.io.WinNTFileSystem.canonicalize(WinNTFileSystem.java:422)
at java.io.File.getCanonicalPath(File.java:618)
at java.io.FilePermission$1.run(FilePermission.java:215)
at java.io.FilePermission$1.run(FilePermission.java:203)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.FilePermission.init(FilePermission.java:203)
at java.io.FilePermission.<init>(FilePermission.java:277)
at java.lang.SecurityManager.checkRead(SecurityManager.java:888)
at com.service_now.mid.security.MIDSecurityManager.checkRead(MIDSecurityManager.java:31)
at java.io.File.isFile(File.java:877)
at com.service_now.monitor.ECCSenderQueueFile.init(ECCSenderQueueFile.java:49)
at com.service_now.monitor.ECCSenderQueueFile.<init>(ECCSenderQueueFile.java:37)
at com.service_now.monitor.ECCSenderCache.getQueueFiles(ECCSenderCache.java:589)
at com.service_now.monitor.ECCSenderCache.<init>(ECCSenderCache.java:100)
at com.service_now.monitor.ECCSender.<init>(ECCSender.java:84)
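
To check whether a failed startup matches this problem, the log can be scanned for an OutOfMemoryError followed closely by the ECCSenderCache.getQueueFiles frame. The sketch below is a hypothetical helper, not a ServiceNow tool; the 50-line window is an arbitrary assumption based on the trace above, where that frame appears 14 lines after the error.

```python
from typing import Iterable

OOM_MARKER = "java.lang.OutOfMemoryError"
ECCSENDER_FRAME = "com.service_now.monitor.ECCSenderCache.getQueueFiles"


def has_eccsender_startup_oom(log_lines: Iterable[str], window: int = 50) -> bool:
    """True if an OutOfMemoryError is followed, within `window` lines,
    by the ECCSenderCache.getQueueFiles stack frame."""
    remaining = 0
    for line in log_lines:
        if OOM_MARKER in line:
            remaining = window  # start (or restart) looking after each OOM line
            continue
        if remaining > 0:
            if ECCSENDER_FRAME in line:
                return True
            remaining -= 1
    return False
```

For example, open the agent log (under the MID Server's agent\logs folder; confirm the file name on your installation) with open(path, encoding="utf-8", errors="replace") and pass the file object to has_eccsender_startup_oom.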

Steps to Reproduce

This issue is not easy to reproduce.

Workaround

The files preventing the MID Server from starting are in the MID Server's install folder. With the MID Server stopped:

  1. Rename the folder agent\work\monitors\ECCSender\output_2 to agent\work\monitors\ECCSender\output_2_OLD.
  2. Create a new, empty agent\work\monitors\ECCSender\output_2 folder to replace it.
  3. Start the MID Server.
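
Steps 1 and 2 above can be scripted as in the sketch below. It assumes the MID Server is stopped and that the script has permission to modify the install folder; quarantine_output_folder is a hypothetical name. Renaming the folder, rather than copying or deleting files, avoids touching each file individually, and the helper refuses to overwrite an existing _OLD folder from an earlier attempt.

```python
import os


def quarantine_output_folder(eccsender_dir: str, folder: str = "output_2",
                             suffix: str = "_OLD") -> str:
    """Rename ECCSender/<folder> to <folder><suffix> and recreate an empty <folder>.

    Run only while the MID Server is stopped. Returns the renamed folder's path.
    """
    src = os.path.join(eccsender_dir, folder)
    dst = src + suffix
    if os.path.exists(dst):
        # Do not clobber a folder kept from an earlier attempt.
        raise FileExistsError(dst)
    os.rename(src, dst)  # step 1: on the same volume, a rename does not copy the files
    os.mkdir(src)        # step 2: empty replacement folder
    return dst
```

For example, quarantine_output_folder(r"<install folder>\agent\work\monitors\ECCSender") handles output_2; pass a different folder name for any other ECCSender folder that needs the same treatment.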

The payloads in the renamed folder will not be sent back to the instance, so the jobs that generated those results may need to be run again.

The number in the folder name corresponds to the priority of the job. You may find files in the other ECCSender folders as well.

If the files contain vital data, you can search the XML file contents to identify the files you need, and then move those files selectively back into the original folder.
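
The selective search and move can be sketched as follows. The search string (for example, text you expect to appear only in the payloads you need) is an assumption about what identifies vital records, and the limit parameter is a hypothetical safeguard that moves back only a bounded number of files at a time, since moving everything back would recreate the original problem. Matching is done on raw bytes, which assumes the search text appears in the XML exactly as UTF-8 (or plain ASCII).

```python
import os
import shutil
from typing import List


def move_back_matching(old_dir: str, live_dir: str, needle: str,
                       limit: int = 1000) -> List[str]:
    """Move up to `limit` files whose raw content contains `needle` from old_dir to live_dir.

    Matches are collected first and moved after the directory scan has finished,
    so the folder is not modified while it is being iterated.
    """
    needle_bytes = needle.encode("utf-8")
    matches: List[str] = []
    with os.scandir(old_dir) as entries:
        for entry in entries:
            if len(matches) >= limit:
                break
            if not entry.is_file():
                continue
            with open(entry.path, "rb") as fh:
                if needle_bytes in fh.read():
                    matches.append(entry.name)
    for name in matches:
        shutil.move(os.path.join(old_dir, name), os.path.join(live_dir, name))
    return matches
```

For example, move_back_matching(old, live, "some identifying text", limit=500) moves at most 500 matching files from output_2_OLD back to output_2 and returns their names.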

Note: The system property "glide.mid.max.sender.queue.size" is not a workaround for this, because there could be millions of very small files whose total size is still below that limit. The property only prevents the ECC Sender folder from growing too large in total data size; it does not limit the number of files. Payloads of JDBC data with small rows are very small, so a huge number of files can accumulate before the size limit is reached.
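
The arithmetic behind this note is simple: a size cap bounds the number of files only at the cap divided by the average file size, so small files allow a very large count. The sketch uses a hypothetical 1 GiB cap and hypothetical average file sizes; these are not the default value or unit of glide.mid.max.sender.queue.size.

```python
def files_before_size_limit(limit_bytes: int, avg_file_bytes: int) -> int:
    """Number of files of a given average size that fit under a total-size cap."""
    if avg_file_bytes <= 0:
        raise ValueError("avg_file_bytes must be positive")
    return limit_bytes // avg_file_bytes


# Hypothetical 1 GiB cap and hypothetical average payload sizes.
for avg in (2048, 512):
    print(avg, files_before_size_limit(1024 ** 3, avg))
```

With those figures, 2 KiB files allow 524,288 files and 512-byte files allow 2,097,152 files under the same cap.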

Related Problem: PRB1316111