Aggregation Learner: RCA for learned patterns causes high memory usage for patterns with a large number of CIs (more than 300)


Description

The performance of the RCA (root cause analysis) calculation for learned patterns needs to be improved. For patterns with a large number of CIs, this calculation causes the instance to run out of memory.

When the issue occurs, capture a stack trace and a heap dump.
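Where direct JVM access is available, both artifacts can also be captured programmatically. The following is a minimal sketch using the standard `HotSpotDiagnosticMXBean.dumpHeap` and `Thread.getAllStackTraces` APIs; the class `DiagnosticsCapture` and its method names are hypothetical helpers, not part of the product.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.Map;

// Hypothetical helper for capturing diagnostics from inside the affected JVM.
final class DiagnosticsCapture {

    // Writes a heap dump to 'path' (use a .hprof file name).
    // live == true dumps only objects that are still reachable.
    // Throws IOException if the file already exists or cannot be written.
    static void dumpHeap(String path, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, live);
    }

    // Returns the current stack of the first live thread with the given name
    // (for example "glide.scheduler.worker.3"), or null if no such thread exists.
    static StackTraceElement[] stackOf(String threadName) {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            if (e.getKey().getName().equals(threadName)) {
                return e.getValue();
            }
        }
        return null;
    }
}
```

The same data can be collected externally with the JDK's `jstack` and `jcmd` tools; the programmatic route is mainly useful in a test harness that reproduces the issue.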

The stack trace looks like the following:

glide.scheduler.worker.3
at java.util.HashMap.putVal(ILjava/lang/Object;Ljava/lang/Object;ZZ)Ljava/lang/Object; (HashMap.java:631)
at java.util.HashMap.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (HashMap.java:612)
at java.util.HashSet.add(Ljava/lang/Object;)Z (HashSet.java:220)
at com.snc.sw.util.AnalyticsGraphUtil.splitDependencyMapByDfs(Ljava/util/List;Ljava/util/Map;)Ljava/util/Map; (AnalyticsGraphUtil.java:58)
at com.snc.sw.util.AnalyticsGraphUtil.splitDependencyMapByDfs(Ljava/lang/String;Ljava/util/Map;)Ljava/util/Map; (AnalyticsGraphUtil.java:36)
at com.snc.sa.dal.CmdbDependenciesProvider.splitDependencyToSubGraphForCiCount(Ljava/util/Map;Ljava/util/Set;)Ljava/util/Map; (CmdbDependenciesProvider.java:436)
at com.snc.sa.dal.CmdbDependenciesProvider.getRelatedCisSubgraphsByCMDBWalkerDependency(Lcom/snc/sa/analytics/query/CMDBWalkerMetaData;)Ljava/util/Map; (CmdbDependenciesProvider.java:311)
at com.snc.sa.dal.CmdbDependenciesProvider.getRelatedAndDependencyMapByCMDBWalker(Lcom/snc/sa/analytics/query/CMDBWalkerMetaData;)Ljava/util/Map; (CmdbDependenciesProvider.java:169)
at com.snc.sa.dal.CmdbForAnalytics.getGraphForRootCauseIdentification(Ljava/util/Set;IZZZ)Ljava/util/Map; (CmdbForAnalytics.java:525)
at com.snc.sa.dal.CmdbForAnalytics.getGraphForRootCauseIdentification(Ljava/util/Set;IZ)Ljava/util/Map; (CmdbForAnalytics.java:504)
at com.snc.sa.dal.CmdbForAnalytics.getRootCauseCis(Ljava/util/Set;IZ)Ljava/util/Set; (CmdbForAnalytics.java:440)
at com.snc.sa.dal.CmdbForAnalytics.getRootCauseCis(Ljava/util/Set;)Ljava/util/Set; (CmdbForAnalytics.java:422)
at com.snc.sa.analytics.learner.AggregationLearner.insertData(Lcom/snc/sa/analytics/learner/AlertGroupList;Lcom/snc/sa/analytics/common/model/AggPatternModel$TypeEnum;ZLcom/snc/sa/analytics/learner/FeatureMap;)V (AggregationLearner.java:851)
at com.snc.sa.analytics.learner.AggregationLearner.processObservations(Lcom/snc/sa/analytics/learner/AlertObservationList;Ljava/lang/String;Ljava/lang/String;Lcom/snc/sa/analytics/common/model/AggPatternModel$TypeEnum;)V (AggregationLearner.java:437)
at com.snc.sa.analytics.learner.AggregationLearner.executeByCmdbProperty(Ljava/lang/String;Ljava/lang/String;)V (AggregationLearner.java:310)
at com.snc.sa.analytics.learner.AggregationLearner.executeForDomain()V (AggregationLearner.java:266)
at com.snc.sa.analytics.learner.AggregationLearner.execute()V (AggregationLearner.java:216)
at com.snc.sa.analytics.processor.ServiceAnalyticsProcessor.jsFunction_aggLearn()V (ServiceAnalyticsProcessor.java:230)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Method.java:498)
at org.mozilla.javascript.MemberBox.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (MemberBox.java:138)
at org.mozilla.javascript.FunctionObject.doInvoke(Lorg/mozilla/javascript/Context;Lorg/mozilla/javascript/MemberBox;Ljava/lang/Object;[Ljava/lang/Object;Lorg/mozilla/javascript/Scriptable;Lorg/mozilla/javascript/Scriptable;)Ljava/lang/Object; (FunctionObject.java:670)
at org.mozilla.javascript.FunctionObject.call(Lorg/mozilla/javascript/Context;Lorg/mozilla/javascript/Scriptable;Lorg/mozilla/javascript/Scriptable;[Ljava/lang/Object;Z)Ljava/lang/Object; (FunctionObject.java:614)
at org.mozilla.javascript.ScriptRuntime.doCall(Lorg/mozilla/javascript/Callable;Lorg/mozilla/javascript/Context;Lorg/mozilla/javascript/Scriptable;Lorg/mozilla/javascript/Scriptable;[Ljava/lang/Object;)Ljava/lang/Object; (ScriptRuntime.java:2609)
at org.mozilla.javascript.Interpreter.interpretLoop(Lorg/mozilla/javascript/Context;Lorg/mozilla/javascript/Interpreter$CallFrame;Ljava/lang/Object;)Ljava/lang/Object; (Interpreter.java:1501)
at org.mozilla.javascript.Interpreter.interpret(Lorg/mozilla/javascript/InterpretedFunction;Lorg/mozilla/javascript/Context;Lorg/mozilla/javascript/Scriptable;Lorg/mozilla/javascript/Scriptable;[Ljava/lang/Object;)Ljava/lang/Object; (Interpreter.java:829)
at org.mozilla.javascript.InterpretedFunction.lambda$call$0(Lorg/mozilla/javascript/Context;Lorg/mozilla/javascript/Scriptable;Lorg/mozilla/javascript/Scriptable;[Ljava/lang/Object;)Ljava/lang/Object; (InterpretedFunction.java:152)
at org.mozilla.javascript.InterpretedFunction$$Lambda$142.get()Ljava/lang/Object; (Unknown Source)
at org.mozilla.javascript.Context$ScriptCaller.call(Ljava/util/function/Supplier;Ljava/lang/String;)Ljava/lang/Object; (Context.java:2941)
at org.mozilla.javascript.InterpretedFunction.call(Lorg/mozilla/javascript/Context;Lorg/mozilla/javascript/Scriptable;Lorg/mozilla/javascript/Scriptable;[Ljava/lang/Object;)Ljava/lang/Object; (InterpretedFunction.java:151)
at org.mozilla.javascript.ContextFactory.doTopCall(Lorg/mozilla/javascript/Callable;Lorg/mozilla/javascript/Context;Lorg/mozilla/javascript/Scriptable;Lorg/mozilla/javascript/Scriptable;[Ljava/lang/Object;)Ljava/lang/Object; (ContextFactory.java:563)
at org.mozilla.javascript.ScriptRuntime.doTopCall(Lorg/mozilla/javascript/Callable;Lorg/mozilla/javascript/Context;Lorg/mozilla/javascript/Scriptable;Lorg/mozilla/javascript/Scriptable;[Ljava/lang/Object;Z)Ljava/lang/Object; (ScriptRuntime.java:3459)
at org.mozilla.javascript.InterpretedFunction.exec(Lorg/mozilla/javascript/Context;Lorg/mozilla/javascript/Scriptable;)Ljava/lang/Object; (InterpretedFunction.java:164)
at com.glide.script.ScriptEvaluator.execute(Ljava/lang/String;Lorg/mozilla/javascript/Scriptable;Ljava/lang/String;Lorg/mozilla/javascript/Context;)Ljava/lang/Object; (ScriptEvaluator.java:279)
at com.glide.script.ScriptEvaluator.evaluateString(Ljava/lang/String;Lorg/mozilla/javascript/Scriptable;Ljava/lang/String;Z)Ljava/lang/Object; (ScriptEvaluator.java:118)
at com.glide.script.ScriptEvaluator.evaluateString(Ljava/lang/String;Ljava/lang/String;Z)Ljava/lang/Object; (ScriptEvaluator.java:82)
at com.glide.script.ScriptEvaluator.evaluateString(Ljava/lang/String;Z)Ljava/lang/Object; (ScriptEvaluator.java:73)
at com.glide.script.Evaluator.evaluateString(Ljava/lang/String;)Ljava/lang/Object; (Evaluator.java:103)
at com.snc.automation.ScriptJob.executeInSingleDomain(Ljava/lang/String;)V (ScriptJob.java:57)
at com.snc.automation.ScriptJob.execute()I (ScriptJob.java:41)
at com.glide.schedule.JobExecutor.lambda$executeJob$0(Lcom/glide/job/GlideJob;)Ljava/lang/Void; (JobExecutor.java:115)
at com.glide.schedule.JobExecutor$$Lambda$449.call()Ljava/lang/Object; (Unknown Source)
at com.glide.schedule.JobExecutor.executeJob(Lcom/glide/job/GlideJob;)V (JobExecutor.java:118)
at com.glide.schedule.JobExecutor.execute()V (JobExecutor.java:102)
at com.glide.schedule_v2.SchedulerWorkerThread.executeJob(Lcom/glide/schedule/JobDescriptor;Lcom/glide/sys/BGTransaction;)V (SchedulerWorkerThread.java:300)
at com.glide.schedule_v2.SchedulerWorkerThread.lambda$process$0(Lcom/glide/sys/BGTransaction;)V (SchedulerWorkerThread.java:188)
at com.glide.schedule_v2.SchedulerWorkerThread$$Lambda$437.run()V (Unknown Source)
at com.glide.worker.TransactionalWorkerThread.executeInTransaction(Ljava/lang/Runnable;Lcom/glide/sys/Transaction;)V (TransactionalWorkerThread.java:35)
at com.glide.schedule_v2.SchedulerWorkerThread.process()V (SchedulerWorkerThread.java:188)
at com.glide.schedule_v2.SchedulerWorkerThread.run()V (SchedulerWorkerThread.java:102)
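The top frames show the allocation happening in `HashSet.add` inside `AnalyticsGraphUtil.splitDependencyMapByDfs`, called from `CmdbDependenciesProvider.splitDependencyToSubGraphForCiCount`. The source of `AnalyticsGraphUtil` is not part of this article, so the following is only a hypothetical sketch of DFS-based splitting, not the product code. It assumes the dependency map is an adjacency map from a CI identifier to the CIs it depends on, and groups CIs into connected subgraphs using one shared visited set and an explicit stack. `DependencySplitSketch` and `splitByDfs` are illustrative names.

```java
import java.util.*;

// Hypothetical illustration of splitting a CI dependency map into connected
// subgraphs with depth-first search; not the AnalyticsGraphUtil implementation.
final class DependencySplitSketch {

    // Returns one set of CI ids per weakly connected subgraph.
    static List<Set<String>> splitByDfs(Map<String, Set<String>> dependencies) {
        // Build an undirected view so a CI and its dependencies land in the same subgraph.
        Map<String, Set<String>> undirected = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : dependencies.entrySet()) {
            Set<String> out = undirected.computeIfAbsent(e.getKey(), k -> new HashSet<>());
            for (String dep : e.getValue()) {
                out.add(dep);
                undirected.computeIfAbsent(dep, k -> new HashSet<>()).add(e.getKey());
            }
        }

        // Shared across all starting CIs: each CI is inserted exactly once.
        Set<String> visited = new HashSet<>();
        List<Set<String>> subgraphs = new ArrayList<>();
        for (String root : undirected.keySet()) {
            if (!visited.add(root)) {
                continue; // already assigned to an earlier subgraph
            }
            Set<String> component = new HashSet<>();
            Deque<String> stack = new ArrayDeque<>(); // explicit stack: no recursion depth limit
            stack.push(root);
            while (!stack.isEmpty()) {
                String ci = stack.pop();
                component.add(ci);
                for (String next : undirected.get(ci)) {
                    if (visited.add(next)) {
                        stack.push(next);
                    }
                }
            }
            subgraphs.add(component);
        }
        return subgraphs;
    }
}
```

Keeping a single visited set bounds the traversal bookkeeping to one entry per CI. A variant that allocates a fresh set for every starting CI would instead grow with the number of CIs multiplied by the subgraph size.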

Steps to Reproduce

1. Verify that the sa_analytics.agg.learner_rca_detection property is set to true.
2. Run the getRootCauseCis method of ServiceAnalyticsProcessor on a large number of CIs (more than 300):

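var sap = new SNC.ServiceAnalyticsProcessor();
// 'cis' must already hold the CIs to analyze (more than 300); its exact type is not specified in this article
sap.getRootCauseCis(cis);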

3. Use the performance monitor to observe the memory peak immediately after the call to this method.
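When reproducing in a local JVM rather than on an instance, a comparable peak can be read from the standard `java.lang.management` API by resetting the heap pools' peak usage before the call and reading it afterwards. The following is a minimal sketch; `HeapPeakProbe` is a hypothetical name, and the workload in `main` is a placeholder allocation, not the getRootCauseCis call.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.HashSet;
import java.util.Set;

// Hypothetical probe that reports heap peak usage around a single workload.
final class HeapPeakProbe {

    // Runs the workload and returns the sum of per-pool heap peaks in bytes.
    // Pools can peak at different moments, so the sum approximates rather than
    // equals the true combined peak.
    static long measurePeakHeapBytes(Runnable workload) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.isValid()) {
                pool.resetPeakUsage(); // peak restarts from the current usage
            }
        }
        workload.run();
        long peak = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.isValid()) {
                MemoryUsage usage = pool.getPeakUsage(); // null if the pool became invalid
                if (usage != null) {
                    peak += usage.getUsed();
                }
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for a large RCA run.
        long peak = measurePeakHeapBytes(() -> {
            Set<String> cis = new HashSet<>();
            for (int i = 0; i < 300_000; i++) {
                cis.add("ci-" + i);
            }
        });
        System.out.println("peak heap bytes: " + peak);
    }
}
```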

Workaround

Set the sa_analytics.agg.learner_rca_detection property to false on the instance to avoid the issue.


Related Problem: PRB1456651