The default/recommended MID Server CPU alert (MID Server issue) threshold is unrealistic for a MID Server running Discovery


A MID Server set up with the default values for MID Server resource threshold alerts (95% for 30 minutes) is almost guaranteed to raise a CPU usage alert whenever a Discovery schedule runs. This false alert implies there is a problem with the MID Server when there is none. These MID Server issues can also lead to Event Management self-health monitoring alerts.
The defaults are described as follows:
"Number of 10 minute units in the interval for sampling CPU usage data. The default interval is 30 minutes (3 x 10 min.)
Default: 3
Usage percentage of the total CPU resources that initiates a threshold breach alert.
Default: 95"

Because Discovery deliberately optimizes Shazzam and other probes/patterns for maximum throughput, 100% CPU use is to be expected almost immediately, and it will stay at that level for most of the Discovery schedule. A 10k node schedule run on a single MID Server hosted on an AWS instance of type m5.xlarge with 8 CPUs and 16 GB memory can be expected to max out the CPU for long periods. That test configuration is a higher specification, with double the CPU and memory, than the currently documented recommended minimum for a Discovery MID Server.
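
The interaction between the default values and a sustained Discovery load can be illustrated with a minimal Python sketch. It is not the MID Server's actual implementation: the evaluation rule (alert when the mean of the last 3 ten-minute samples is at or above 95%), the function name breach_points, and the sample trace are all assumptions made for illustration.

```python
from collections import deque

# Defaults quoted in this article; the evaluation rule itself is an assumption.
INTERVAL_UNITS = 3        # number of 10-minute samples (30 minutes)
THRESHOLD_PERCENT = 95    # CPU usage percentage that triggers the alert


def breach_points(samples, interval_units=INTERVAL_UNITS, threshold=THRESHOLD_PERCENT):
    """Return the indexes of 10-minute samples at which an alert would fire.

    Hypothetical rule: alert when the mean of the last `interval_units`
    samples is at or above `threshold`.
    """
    window = deque(maxlen=interval_units)
    breaches = []
    for index, cpu_percent in enumerate(samples):
        window.append(cpu_percent)
        # Only evaluate once a full interval of samples has been collected.
        if len(window) == interval_units and sum(window) / interval_units >= threshold:
            breaches.append(index)
    return breaches


# Hypothetical 2-hour trace: idle, then a Discovery schedule saturating the CPU.
discovery_trace = [12, 15, 100, 100, 100, 100, 100, 100, 98, 99, 40, 10]
print(breach_points(discovery_trace))
```

Under these assumptions, the threshold is breached 30 minutes after the schedule starts saturating the CPU, and it stays breached at every later sample until the load drops. A schedule that runs for several hours would therefore keep the MID Server in an alerting state for nearly its whole duration.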

Steps to Reproduce

  1. Set up a MID Server on a Paris or Quebec instance
  2. Configure the default properties for MID Server CPU monitoring
  3. Run the in-house Discovery test suites, or a Discovery schedule that would be expected to run for several hours


Related Problem: PRB1458352