Improve Event Management event processing by decoupling Alert and Incident Management activities

Issue

A brief overview of how Event Management event processing (em_event) works:

- A scheduled job batches events to process
- A script copies the event data into memory
- The script iterates over the in-memory event data for processing
-- Create/update Alert records
-- Trigger Alert processing (such as business rules (BRs) and creating/updating incidents)
-- Trigger Incident processing (such as BRs, workflows, etc.)
-- Update the in-memory values for the event records to be sent back to the DB
- Update the DB with the new em_event values (e.g., Alert Number values)

Because Alert and Incident processing are triggered from within this loop, long-running processes or activities kicked off by Alert Management or Incident Management can delay Event Processing, increasing the backlog of em_events sitting in a Ready state.

While a number of circumstances can cause issues or delays in Event Processing, this article focuses on decoupling long-running Alert/Incident activities from Event Processing to keep it as performant as possible.

Release

All

Cause

You can verify whether your Event Processing delay fits the criteria for this article by checking the following:

- In the "Active Transactions (All Nodes)" module, there are "Event Management - process events" jobs running for longer than several minutes
- In stats.do for the node assigned to that job, the node "Free Percentage" memory is not close to 0%
- In threads.do for the node assigned to that job (and the worker running the process events job), refreshing every 30s, the Java stack is changing (indicating the job isn't hung)
- You have business rules, workflows, or flows related to Alerts or Incidents that could take a significant amount of time to process

Resolution

The changes needed to decouple the Alert/Incident Management activities are relatively simple:

- Flows/Workflows: Implement a "Timer" or "Wait for Condition"
-- This removes the Flow/Workflow from the transaction and puts it into the Scheduler queue to be picked up and executed separately from Event Processing
- Business Rules: Set to "Async" instead of "Before" or "After", or have the Business Rule create a Scheduled Job to execute the desired code
-- Either option achieves the same behavior as described for Flows/Workflows. Which one to prefer comes down to simplicity vs. immediacy. If you need the behavior in the BR executed ASAP, have the BR create a Scheduled Job with the desired code and trigger it to execute immediately. If the behavior does not need to be immediate and/or you want a simpler solution, setting the BR to "Async" will have the system process the job when it has resources. This typically won't be a significant delay unless the instance is under significant load.

The trickier part is identifying which activities are taking a significant amount of time to process:

- Review the workflows/flows running at the time of Event Processing and determine whether they are triggered by Event/Alert/Incident Management processes
- (Advanced) Review the Java stacks from threads.do for clues as to what is slowing down the processing (e.g., look for WorkflowEngine classes, sys_ids for business rules or script includes, GlideRecord and iterate functions, etc.)
- Review custom business rules on em_alert and incident that trigger additional scripts or flows/workflows
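
To illustrate why decoupling helps, the processing loop described in the Issue section can be modeled as a toy simulation. This is only a sketch in Python, not platform code: the Event class, the process_batch function, the alert number format, and the two cost constants are all hypothetical names and values chosen for illustration, and no real timings are implied. It compares how long the event job stays busy when a slow Alert/Incident activity runs inside the loop versus when it is handed to a separate queue.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Event:
    """Stand-in for an em_event record (hypothetical, simplified)."""
    number: str
    state: str = "Ready"
    alert: Optional[str] = None


# Hypothetical costs in milliseconds, chosen only for illustration.
EVENT_COST_MS = 50       # create/update the Alert record for one event
ACTIVITY_COST_MS = 2000  # a slow BR/workflow on the Alert or Incident


def process_batch(events: List[Event],
                  scheduler_queue: Optional[Deque[str]] = None) -> int:
    """Process a batch and return the simulated time the event job was busy.

    With no scheduler_queue, the slow activity runs inside the event job
    (coupled). With a queue, the activity is only enqueued for a separate
    worker, so the event job does not pay its cost (decoupled).
    """
    busy_ms = 0
    for index, event in enumerate(events, start=1):
        event.alert = f"Alert{index:07d}"
        busy_ms += EVENT_COST_MS
        if scheduler_queue is None:
            busy_ms += ACTIVITY_COST_MS          # runs in the same transaction
        else:
            scheduler_queue.append(event.alert)  # picked up separately later
        event.state = "Processed"
    return busy_ms


coupled_ms = process_batch([Event(f"EVT{i}") for i in range(100)])

queue: Deque[str] = deque()
decoupled_ms = process_batch([Event(f"EVT{i}") for i in range(100)], queue)

print(f"coupled:   {coupled_ms} ms")
print(f"decoupled: {decoupled_ms} ms, {len(queue)} activities queued")
```

In this model the slow activities still have to run eventually; the difference is that their cost no longer counts against the event job, which is the effect the "Timer"/"Wait for Condition" and "Async" changes aim for.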