Event Management Events Stuck in 'Ready' state - Support and Troubleshooting

Issue

This article provides troubleshooting guidelines for event entries in the [em_event] table that are stuck in the 'Ready' state for an extended period of time.

Release

All releases.

Cause

Several causes can lead to events being stuck in the 'Ready' state. Events are processed by the "Event Management - process events" scheduled job.

Cause 1: Event processing is backed up.

By default, Multi Node Event Processing is not enabled, so only one scheduled job processes events. If events are created faster than the system can process them, a backlog builds up and new events stay in the 'Ready' state for longer periods of time.

Cause 2: Custom Business Rules and Script Includes add extra overhead, delaying event processing

As a best practice, we do not recommend having custom Business Rules on the em_event and em_alert tables. If custom Business Rules have to be added, make sure they are performant. Under high event load, even a minimal overhead per event can add up to a major delay, even if Multi Node Event Processing and multithreading are enabled.

Cause 3: Events are not getting picked up due to an issue with the Schedule Manager

This usually happens when Multi Node Event Processing is enabled along with multiple scheduled jobs. Each event coming into ServiceNow is assigned a specific bucket between 0 and 99. When Multi Node Event Processing is enabled, the bucket range is divided evenly among the scheduled jobs on each node. Ex: if the number of scheduled jobs processing events is 4, then each job is responsible for processing the events in a specific range: [0 - 24], [25 - 49], [50 - 74], [75 - 99]. The stuck events likely belong to buckets in one of these ranges, which indicates that the scheduled job assigned to the affected range is not operational. The suspected cause is the Schedule Manager going out of sync when cluster nodes leave or join the cluster.
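The bucket split described above can be sketched in a few lines of Python. This is only an illustration: the function names `bucket_ranges` and `owning_job` are hypothetical, and the way leftover buckets are handled when 100 is not divisible by the job count is an assumption, since the article only documents the 4-job case.

```python
def bucket_ranges(num_jobs: int, num_buckets: int = 100) -> list[tuple[int, int]]:
    """Split buckets 0..num_buckets-1 into contiguous inclusive ranges, one per job."""
    if not 1 <= num_jobs <= num_buckets:
        raise ValueError("num_jobs must be between 1 and num_buckets")
    base, extra = divmod(num_buckets, num_jobs)
    ranges = []
    start = 0
    for job in range(num_jobs):
        # Assumption: any leftover buckets go to the first jobs, one each.
        size = base + (1 if job < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def owning_job(bucket: int, num_jobs: int) -> int:
    """Return the zero-based index of the job whose range contains `bucket`."""
    for index, (low, high) in enumerate(bucket_ranges(num_jobs)):
        if low <= bucket <= high:
            return index
    raise ValueError(f"bucket {bucket} is outside the valid range")


print(bucket_ranges(4))  # [(0, 24), (25, 49), (50, 74), (75, 99)]
```

If all stuck events map to the same index from `owning_job`, that points to a single non-operational scheduled job rather than a general backlog.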

Cause 4: Events are not queried due to table sharding.

The em_event table is sharded. By design, Event Management only queries events in the current shard and the one before it. If, for some reason, events were not picked up for more than 2 days (Causes 1 - 3, jobs suspended, etc.), i.e. those events are now in the N-2 to N-7 shards, they won't be picked up again.

Resolution

Cause 1: Event processing is backed up

Enable Multi Node Event Processing and increase the number of scheduled jobs processing events. A load test has to be carried out to compare the event consumption rate against the event creation rate.
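As a rough aid for interpreting load test results, the comparison of creation rate and consumption rate can be written as simple arithmetic. The sketch below is not part of Event Management. The function names are hypothetical, the rates are whatever the load test measured, and `jobs_needed` assumes throughput scales linearly with the number of jobs, which may not hold in practice.

```python
import math


def backlog_after(creation_rate: float, consumption_rate: float,
                  seconds: float, initial_backlog: float = 0.0) -> float:
    """Estimate the number of events still in 'Ready' after `seconds`.

    Rates are in events per second and assumed constant over the period.
    """
    growth = (creation_rate - consumption_rate) * seconds
    return max(0.0, initial_backlog + growth)


def jobs_needed(creation_rate: float, per_job_rate: float) -> int:
    """Smallest job count whose combined rate covers the creation rate.

    Assumption: each job sustains `per_job_rate` independently of the others.
    """
    if per_job_rate <= 0:
        raise ValueError("per_job_rate must be positive")
    return math.ceil(creation_rate / per_job_rate)


# Example: 120 events/s created, 100 events/s consumed, over one hour.
print(backlog_after(120, 100, 3600))  # 72000.0
print(jobs_needed(120, 50))           # 3
```

A backlog that keeps growing over the test window means the consumption rate is still below the creation rate, and more processing capacity is needed.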

Cause 2: Custom Business Rules and Script Includes add extra overhead, delaying event processing

Add timing logic to the custom Business Rules and Script Includes to see how much delay is added for each event. Factor in the event rate and the event processing cycle of the scheduled job, and optimize the customization as much as possible.
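The idea behind the timing logic can be illustrated with a short Python sketch. On an instance, the actual measurement would live in the server-side script of the Business Rule or Script Include. Here `rule` is a hypothetical stand-in for the custom logic, and both function names are assumptions made for illustration.

```python
import time
from typing import Callable


def measure_overhead(rule: Callable[[dict], None], sample_events: list[dict]) -> float:
    """Return the mean time, in seconds, that `rule` spends on one event."""
    if not sample_events:
        raise ValueError("sample_events must not be empty")
    start = time.perf_counter()
    for event in sample_events:
        rule(event)
    return (time.perf_counter() - start) / len(sample_events)


def added_delay_per_cycle(overhead_per_event_s: float, events_per_cycle: int) -> float:
    """Total delay the customization adds to one processing cycle, in seconds."""
    return overhead_per_event_s * events_per_cycle


def slow_rule(event: dict) -> None:
    """Hypothetical custom rule that costs roughly 1 ms per event."""
    time.sleep(0.001)


overhead = measure_overhead(slow_rule, [{"source": "demo"}] * 20)
print(f"per event: {overhead * 1000:.2f} ms")
print(f"per cycle of 1000 events: {added_delay_per_cycle(overhead, 1000):.2f} s")
```

Even a few milliseconds per event becomes several seconds per cycle at high event rates, which is why the per-event figure should always be multiplied by the expected load.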

Cause 3: Events are not getting picked up due to an issue with the Schedule Manager

This issue occurs very infrequently and is not reproducible in internal testing. The current workaround is to refresh the scheduled jobs by doing the following:

1. Go to Event Management > Settings > Properties.
2. Change 'Number of scheduled jobs processing events' to a different value (ex: from 2 to 1).
3. Save.
4. Change 'Number of scheduled jobs processing events' back to the previous value.
5. Save.

Cause 4: Events are not queried due to table sharding.

If the stuck events need to be processed:

- Contact Customer Support through HI to have the events moved to the current shards.
- Configure the system property named "evt_mgmt.events_processing_all_shards" with value = "true".
- Wait a few processing cycles for the Event Management event processing jobs to catch up with the old backlog, if there are many events.
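The effect of the shard window can be modeled as follows. This is a conceptual sketch only, not the platform's implementation: `is_shard_queried` is a hypothetical name, shard offsets are counted back from the current shard (0 = N, 1 = N-1, and so on), and the behavior with the property enabled is an assumption based on the property name.

```python
def is_shard_queried(shard_offset: int, process_all_shards: bool = False) -> bool:
    """Return True if events in shard N - shard_offset would be queried.

    By default only the current shard (offset 0) and the one before it
    (offset 1) are queried. `process_all_shards` models setting
    evt_mgmt.events_processing_all_shards to true (assumed behavior).
    """
    if shard_offset < 0:
        raise ValueError("shard_offset must be zero or positive")
    return process_all_shards or shard_offset <= 1


# Default behavior: shards N-2 through N-7 are skipped.
skipped = [offset for offset in range(8) if not is_shard_queried(offset)]
print(skipped)  # [2, 3, 4, 5, 6, 7]
```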

Related Links

Components installed with Incident Management - Major Incident Management