How to Resolve Max Failure Threshold Errors for CMDB Health Dashboard - Support and Troubleshooting

When working with the CMDB Health Dashboard, you may encounter "max_failures" errors that prevent proper scoring and analysis. These errors typically appear when one or more jobs fail because the number of configuration items (CIs) that failed a metric has reached the configured failure threshold, which defaults to 50,000 failures. This article describes how to resolve these errors for completeness, correctness, and compliance metrics.
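As a rough illustration of the threshold described above, the following Python sketch flags metrics whose failure count has reached the limit. The metric names, the counts, and the `metrics_at_threshold` helper are hypothetical; only the default of 50,000 comes from this article, and the sketch does not call any ServiceNow API.

```python
DEFAULT_MAX_FAILURES = 50_000  # default threshold stated in this article


def metrics_at_threshold(failure_counts, max_failures=DEFAULT_MAX_FAILURES):
    """Return the metric names whose failure count has reached the threshold."""
    return sorted(
        name for name, count in failure_counts.items() if count >= max_failures
    )


# Hypothetical failure counts per metric, for illustration only.
example_counts = {
    "required": 50_000,
    "recommended": 1_200,
    "orphan": 50_000,
    "stale": 310,
}
print(metrics_at_threshold(example_counts))
```

Metrics returned by a check like this are the ones to investigate first using the steps in the following sections.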

Identifying the Root Cause

Before implementing any solution, it's crucial to understand what's causing the failures across completeness, correctness, and compliance metrics. The approach is consistent regardless of the metric type:

1. Click on the metric card: Click on the completeness, correctness, or compliance card to display the failure scorecards at the bottom.
2. Click on the failing scorecard: Click on the scorecard for the given class to access the list view of failures.
3. Examine failure descriptions: Use the info icon to view the description and understand exactly what's causing each failure, whether it's a missing field, duplicate CI, orphan condition, stale data, or failing audit.
4. Identify patterns: Look for common patterns in the failures to determine the most effective resolution approach.

Adding a discovery source column (when applicable) helps identify which data source is bringing in the problematic information, providing valuable context for resolution.
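The pattern-finding step above can be sketched offline once the failure list has been exported. The following Python example is a minimal sketch, assuming each exported row carries a failure description and a discovery source; the field names, the sample rows, and the `top_failure_patterns` helper are hypothetical and are not part of the product.

```python
from collections import Counter


def top_failure_patterns(failures, limit=3):
    """Count failures per (description, discovery source) pair, most common first."""
    counts = Counter(
        (row["description"], row.get("discovery_source") or "unknown")
        for row in failures
    )
    return counts.most_common(limit)


# Hypothetical rows copied from the failure list view, for illustration only.
rows = [
    {"ci": "srv-001", "description": "Missing field: serial_number", "discovery_source": "Source A"},
    {"ci": "srv-002", "description": "Missing field: serial_number", "discovery_source": "Source A"},
    {"ci": "srv-003", "description": "Missing field: serial_number", "discovery_source": "Source B"},
    {"ci": "srv-004", "description": "Stale CI", "discovery_source": ""},
]

for (description, source), count in top_failure_patterns(rows):
    print(f"{count:>3}  {description}  [{source}]")
```

Grouping by both description and source shows whether one data source accounts for most of the failures, which points toward a data source fix rather than a configuration change.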

Completeness

When examining completeness issues, you may see max failures for both required attributes and recommended attributes. The bottom scorecards show the classes with the most issues for required and recommended attributes.

Method 1: Remove Unnecessary Attributes from Health Configuration

When to use: When an attribute doesn't make business sense to be required or recommended, especially if it's missing across many CIs.

Steps:

1. Navigate to CI Class Manager.
2. Go to Hierarchy and select the affected class.
3. Click on the Health tab.
4. For recommended attributes: In the completeness section, locate the problematic attribute and remove it if it doesn't make sense to have it as recommended.
5. For required attributes: If the attribute is missing in a lot of CIs, check if it's truly required. If it is not, navigate to the Attributes section, sort by mandatory fields, and edit the attribute to set mandatory to false.
6. Save and run the completeness job again.
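To judge whether an attribute is "missing in a lot of CIs" before changing the configuration, the missing rate can be estimated from an exported list of CIs. The following Python sketch assumes each CI is represented as a dictionary of attribute values; the attribute names, the sample data, and the `missing_rate_by_attribute` helper are hypothetical.

```python
def missing_rate_by_attribute(cis, attributes):
    """Return the fraction of CIs in which each attribute is empty or absent.

    Note: any falsy value (empty string, None, 0) is counted as missing.
    """
    total = len(cis)
    if total == 0:
        return {attr: 0.0 for attr in attributes}
    return {
        attr: sum(1 for ci in cis if not ci.get(attr)) / total
        for attr in attributes
    }


# Hypothetical exported CIs, for illustration only.
sample_cis = [
    {"name": "srv-001", "serial_number": "A1", "owned_by": ""},
    {"name": "srv-002", "serial_number": "A2", "owned_by": ""},
    {"name": "srv-003", "serial_number": "", "owned_by": "jdoe"},
    {"name": "srv-004", "serial_number": "A4"},
]

rates = missing_rate_by_attribute(sample_cis, ["serial_number", "owned_by"])
for attr, rate in rates.items():
    print(f"{attr}: {rate:.0%} missing")
```

An attribute with a very high missing rate is a candidate for review: either it should not be required or recommended, or the data source is not populating it.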


Method 2: Fix Data Source Configuration

When to use: When the attribute should be present but is missing due to data source configuration issues.

Steps:

1. Identify the discovery source bringing in the data.
2. Navigate to the data source/connector configuration.
3. Check the mapping configuration for the specific attribute.
4. Ensure the attribute is getting mapped correctly from the source system.
5. Modify the mapping configuration if needed to properly capture and populate the data.

This method addresses the root cause by ensuring that the data source properly captures and populates the required information.

Method 3: Use Inclusion Rules to Exclude Classes

When to use: When certain classes are expensive to process or don't contain valid data for health assessment.

Steps:

1. Create inclusion rules to exclude the problematic class from health computation.
2. Configure the rule to skip evaluation of classes that consistently fail.
3. Apply the rule to your health assessment configuration.

Method 4: Increase Failure Threshold (Last Resort)

When to use: Only as a last resort when other methods are not feasible.

Steps:

1. Navigate to Health Preferences from the left navigation.
2. Click on Health Metrics.
3. Select the appropriate metric type based on your issue.
4. Locate the failure threshold setting.
5. Increase the threshold number.
6. Save the configuration.

Warning: This method should be used sparingly as it's computationally expensive to store and process large numbers of failures. It can significantly impact system performance.

Correctness

Correctness issues in the Health Dashboard can manifest as duplicates, orphans, or stale CIs. When you see max failures for correctness metrics, use the bottom scorecards to identify the specific type of correctness issue and the classes most affected.

Duplicate CIs

When you see max failures for duplicates, examine the top 10 classes displayed in the scorecards.

Method 1: Use Dedupe Template Remediation

When to use: When you have legitimate duplicate CIs that need to be resolved.

Steps:

1. Look for dedupe tasks attached to the duplicate CIs.
2. Use de-duplication template remediation to resolve these duplicates.

Method 2: Use Inclusion Rules (Global Level Only)

When to use: When certain classes consistently have duplicate issues that are difficult to address.

Steps:

1. Apply inclusion rules at the global level to exclude classes that cannot be easily resolved.
2. Configure the rule to skip duplicate evaluation for problematic classes.

Method 3: Increase Failure Threshold (Last Resort)

When to use: Only as a last resort when other methods are not feasible.

Steps:

1. Navigate to Health Preferences from the left navigation.
2. Click on Health Metrics.
3. Select the appropriate metric type based on your issue.
4. Locate the failure threshold setting.
5. Increase the threshold number.
6. Save the configuration.

Orphan CIs

For orphan CI max failures, examine the scorecard to identify which class has the highest number of orphan issues. The description of each health result failure explains exactly why the CI is failing the orphan check.

Method 1: Adjust Orphan Conditions

When to use: When the orphan rule conditions need to be refined to reduce false positives.

Steps:

1. Navigate to CI Class Manager from the left navigation.
2. Select the class with the most orphan failures from the scorecard.
3. Click on the Health tab.
4. In the Health section, click on Correctness.
5. Review the orphan rule and assess the defined conditions.
6. Determine whether each condition makes sense, and modify it where needed to reduce failures.

Method 2: Use Inclusion Rules

When to use: When certain classes cannot be easily resolved or orphan checking isn't meaningful.

Steps:

1. Apply inclusion rules globally to exclude classes that cannot be easily resolved.
2. Configure rules for classes where orphan checking isn't meaningful.

Method 3: Increase Failure Threshold (Last Resort)

When to use: Only as a last resort when other methods are not feasible.

Steps:

1. Navigate to Health Preferences from the left navigation.
2. Click on Health Metrics.
3. Select the appropriate metric type based on your issue.
4. Locate the failure threshold setting.
5. Increase the threshold number.
6. Save the configuration.

Stale CIs

When dealing with staleness max failures, click on the stale CIs scorecard to identify the classes with the most stale CIs, then examine the staleness threshold settings for those classes.

Method 1: Adjust Staleness Threshold

When to use: When the staleness threshold is too restrictive for your environment.

Steps:

1. Navigate to CI Class Manager.
2. Select the affected class.
3. Go to Health → Correctness → Staleness Rule.
4. Adjust the effective duration to reduce the number of threshold violations.
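Before changing the effective duration, it can help to estimate how many CIs would still count as stale under each candidate value. The following Python sketch assumes staleness is judged by comparing a last-updated timestamp against the effective duration in days; that comparison, the sample timestamps, and the `count_stale` helper are assumptions for illustration and may not match how the platform evaluates staleness.

```python
from datetime import datetime, timedelta


def count_stale(last_updated, now, effective_days):
    """Count CIs whose last update is older than the effective duration."""
    cutoff = now - timedelta(days=effective_days)
    return sum(1 for ts in last_updated if ts < cutoff)


# Hypothetical last-updated timestamps for one class, for illustration only.
now = datetime(2024, 6, 1)
last_updated = [now - timedelta(days=d) for d in (5, 45, 70, 100, 200)]

for days in (30, 60, 90):
    print(f"effective duration {days} days: {count_stale(last_updated, now, days)} stale CIs")
```

Comparing a few candidate durations this way shows whether a modest increase removes most violations or whether the class has a data source problem that a longer duration would only hide.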

Method 2: Address Data Source Issues

When to use: When data sources are not running frequently enough or have connectivity issues.

Steps:

1. Look at the source bringing in the data from the health dashboard results.
2. Check when the source was last run and why the CI is not getting discovered.
3. Rerun the connector to refresh the data and reduce staleness failures.

Method 3: Use Inclusion Rules

When to use: When certain classes don't make sense to monitor for staleness or are intentionally static.

Steps:

1. Create inclusion rules to exclude the problematic class from staleness evaluation.
2. Configure the rule to skip staleness checking for classes that are intentionally static.
3. Apply the rule to your health assessment configuration.

Method 4: Increase Failure Threshold (Last Resort)

When to use: Only as a last resort when other methods are not feasible.

Steps:

1. Navigate to Health Preferences from the left navigation.
2. Click on Health Metrics.
3. Select the appropriate metric type based on your issue.
4. Locate the failure threshold setting.
5. Increase the threshold number.
6. Save the configuration.

Compliance

Compliance max failure thresholds differ from completeness and correctness because compliance data comes from audits, both desired state audits and scripted audits. Use the bottom scorecards to identify the classes with the most compliance failures and examine the specific audits causing issues.

In the description of each failed health result CI, you'll see which specific audit is causing the failure along with the audit name. This information is crucial for determining the resolution approach.

Method 1: Adjust Audit Conditions (Primary Method)

When to use: When audit conditions are too restrictive or need to be updated for current business requirements.

Steps:

1. From the left navigation, click on Audits.
2. Find the specific audit that's causing the failures (identified from the CI description).
3. Navigate to the audit template.
4. Review the certification conditions defined in the template.
5. Adjust the conditions to reduce the number of failures while maintaining compliance requirements.

This is the primary method for resolving compliance failures because it addresses the root cause by modifying the audit criteria that determine compliance status.

Method 2: Modify Audit Filter Conditions

When to use: When you have too many CIs causing issues and need to reduce the scope of evaluation.

Steps:

1. Navigate to the audit template that's causing the failures.
2. Open the filter condition within the template.
3. Review the number of records it's currently matching.
4. Add more specific conditions that exclude problematic CIs, reducing the number of CIs evaluated for the audit.
5. Save the configuration.

This method reduces overall failures by limiting the scope of CIs that are evaluated during the audit process, focusing on the most relevant items for compliance assessment.
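The effect of narrowing a filter can be illustrated with a small Python sketch that compares how many CIs match a broad filter and a more specific one. It models a filter as a set of exact field/value conditions, which is a simplification of what the platform's condition builder supports; the field names, sample CIs, and `matching_cis` helper are hypothetical.

```python
def matching_cis(cis, conditions):
    """Return the CIs that satisfy every (field, expected value) condition."""
    return [
        ci for ci in cis
        if all(ci.get(field) == value for field, value in conditions.items())
    ]


# Hypothetical CIs, for illustration only.
cis = [
    {"name": "srv-001", "class": "server", "environment": "production"},
    {"name": "srv-002", "class": "server", "environment": "production"},
    {"name": "srv-003", "class": "server", "environment": "test"},
    {"name": "srv-004", "class": "server", "environment": "development"},
]

broad = {"class": "server"}
narrow = {"class": "server", "environment": "production"}

print("broad filter matches:", len(matching_cis(cis, broad)))
print("narrow filter matches:", len(matching_cis(cis, narrow)))
```

Each added condition can only keep or shrink the matched set, so the record count reviewed in step 3 gives an upper bound on how many failures the audit can produce.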

Method 3: Increase Failure Threshold

When to use: As a temporary measure while working on audit condition adjustments.

Steps:

1. Navigate to Health Preferences from the left navigation.
2. Click on Health Metrics.
3. Select the appropriate metric type based on your issue.
4. Locate the failure threshold setting.
5. Increase the threshold number.
6. Save the configuration.

Warning: Changes to audit conditions should be carefully considered to maintain compliance requirements.

Best Practices and Recommendations

- Start with root cause analysis: Always investigate the specific cause of failures before implementing solutions.
- Prioritize sustainable fixes: Address data source issues and configuration problems rather than just increasing thresholds.
- Use threshold increases sparingly: Increasing thresholds should be a temporary measure while working on more permanent solutions.
- Consider performance impact: Be aware that inclusion rules and high failure thresholds can affect system performance.

Conclusion

Max failure threshold errors can significantly impact health assessment accuracy, but they're manageable with the right approach. Focus on sustainable fixes over threshold increases, and choose methods that align with your business requirements. Only CMDB admins can access and modify failure threshold settings.