How to Resolve Max Failure Threshold Errors for CMDB Health Dashboard

When working with the Health Dashboard, you may encounter "max_failures" errors that prevent proper scoring and analysis. These errors typically appear when one or more health jobs fail because the number of configuration items (CIs) that failed a metric has reached the configured failure threshold, which is 50,000 failures by default. This article describes how to resolve these errors for completeness, correctness, and compliance metrics.

Identifying the Root Cause

Before implementing any solution, determine what is causing the failures for the completeness, correctness, or compliance metric in question.
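The Python sketch below is a simplified model of the threshold behavior described in the introduction. It is not ServiceNow's implementation. It assumes that a metric run records failing CIs until the count reaches the threshold and then reports a max_failures status. `run_metric`, `MetricRunResult`, and the sample CIs are hypothetical; only the 50,000 default comes from this article.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

# Default threshold cited in this article; the real value is set in Health Preferences.
DEFAULT_MAX_FAILURES = 50_000


@dataclass
class MetricRunResult:
    """Hypothetical summary of a single health metric run."""
    recorded_failures: List[str] = field(default_factory=list)
    status: str = "complete"


def run_metric(
    ci_ids: Iterable[str],
    check: Callable[[str], bool],
    max_failures: int = DEFAULT_MAX_FAILURES,
) -> MetricRunResult:
    """Evaluate `check` for each CI and stop once max_failures is reached.

    Illustrative assumption: when the failure count reaches the threshold,
    the run stops recording failures and reports "max_failures", so the
    metric cannot be scored reliably.
    """
    result = MetricRunResult()
    for ci_id in ci_ids:
        if not check(ci_id):
            result.recorded_failures.append(ci_id)
            if len(result.recorded_failures) >= max_failures:
                result.status = "max_failures"
                break
    return result


# Usage: ten hypothetical CIs where every odd-numbered CI fails, with a
# deliberately low threshold of 3 so the cap is reached quickly.
cis = [f"ci{i}" for i in range(10)]
res = run_metric(cis, lambda ci: int(ci[2:]) % 2 == 0, max_failures=3)
print(res.status, res.recorded_failures)  # max_failures ['ci1', 'ci3', 'ci5']
```

With the default threshold, the same ten CIs would finish with a "complete" status. That is why the methods below focus on reducing the number of failures rather than on the cap itself.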
The approach is the same regardless of the metric type:

1. Click the metric card: Click the Completeness, Correctness, or Compliance card to display the failure scorecards at the bottom of the dashboard.
2. Click the failing scorecard: Click the scorecard for the affected class to open the list view of failures.
3. Examine the failure descriptions: Use the info icon to view each failure's description and understand exactly what is causing it: a missing field, a duplicate CI, an orphan condition, stale data, or a failing audit.
4. Identify patterns: Look for common patterns in the failures to determine the most effective resolution approach.

Adding a discovery source column to the list view (when applicable) shows which data source is bringing in the problematic information, which gives useful context for the resolution.

Completeness

When examining completeness issues, you may see max failures for both required and recommended attributes. The scorecards at the bottom show the classes with the most required and recommended attribute failures.

Method 1: Remove Unnecessary Attributes from Health Configuration

When to use: When an attribute doesn't make business sense as required or recommended, especially if it is missing on many CIs.

Steps:
1. Navigate to CI Class Manager.
2. Go to Hierarchy and select the affected class.
3. Click the Health tab.
4. For recommended attributes: In the Completeness section, locate the problematic attribute and remove it if it doesn't make sense as a recommended attribute.
5. For required attributes: If the attribute is missing on many CIs, check whether it is truly required. If it is not, navigate to the Attributes section, sort by the Mandatory field, and edit the attribute to set Mandatory to false.
6. Save, and run the completeness job again.

Method 2: Fix Data Source Configuration

When to use: When the attribute should be present but is missing because of a data source configuration issue.
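Before working through the steps below, it can help to quantify the patterns in the failure list. One discovery source may account for most of the missing values, which points to a mapping issue. Alternatively, the attribute may be missing almost everywhere, which points back to Method 1. This minimal Python sketch assumes the completeness failures have been exported as records containing a CI name, the missing attribute, and the discovery source. These field names, the helper functions, and the sample values are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical export of completeness failures for one class. The field
# names and source values are assumptions, not the dashboard's format.
failures: List[Dict[str, str]] = [
    {"ci": "srv1", "attribute": "serial_number", "discovery_source": "SourceA"},
    {"ci": "srv2", "attribute": "serial_number", "discovery_source": "SourceA"},
    {"ci": "srv3", "attribute": "serial_number", "discovery_source": "SourceB"},
    {"ci": "srv3", "attribute": "os_version", "discovery_source": "SourceB"},
]


def failures_by_attribute_and_source(records: List[Dict[str, str]]) -> Counter:
    """Count failures per (missing attribute, discovery source) pair."""
    return Counter((r["attribute"], r["discovery_source"]) for r in records)


def missing_ratio(records: List[Dict[str, str]], attribute: str, total_cis: int) -> float:
    """Fraction of the class's CIs that are missing `attribute`."""
    missing = {r["ci"] for r in records if r["attribute"] == attribute}
    return len(missing) / total_cis if total_cis else 0.0


counts = failures_by_attribute_and_source(failures)
print(counts.most_common(1))  # [(('serial_number', 'SourceA'), 2)]
print(missing_ratio(failures, "serial_number", total_cis=4))  # 0.75
```

A single (attribute, source) pair that dominates the counts suggests checking that source's mapping. A high missing ratio spread evenly across sources suggests the attribute may not belong in the health configuration at all.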
Steps:
1. Identify the discovery source bringing in the data.
2. Navigate to the data source or connector configuration.
3. Check the mapping configuration for the specific attribute.
4. Make sure the attribute is mapped correctly from the source system.
5. Modify the mapping configuration if needed so that the data is captured and populated correctly.

This method addresses the root cause by ensuring that the data source captures and populates the required information.

Method 3: Use Inclusion Rules to Exclude Classes

When to use: When certain classes are expensive to process or don't contain valid data for health assessment.

Steps:
1. Create inclusion rules to exclude the problematic class from health computation.
2. Configure the rule to skip evaluation of classes that consistently fail.
3. Apply the rule to your health assessment configuration.

Method 4: Increase Failure Threshold (Last Resort)

When to use: Only as a last resort, when the other methods are not feasible.

Steps:
1. Navigate to Health Preferences from the left navigation.
2. Click Health Metrics.
3. Select the metric type that corresponds to your issue.
4. Locate the failure threshold setting.
5. Increase the threshold.
6. Save the configuration.

Warning: Use this method sparingly. Storing and processing large numbers of failures is computationally expensive and can significantly degrade system performance.

Correctness

Correctness issues in the Health Dashboard can appear as duplicate, orphan, or stale CIs. When you see max failures for a correctness metric, use the bottom scorecards to identify the specific type of correctness issue and the classes most affected.

Duplicate CIs

When you see max failures for duplicates, examine the top 10 classes displayed in the scorecards.

Method 1: Use Dedupe Template Remediation

When to use: When you have legitimate duplicate CIs that need to be resolved.
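To illustrate what a legitimate duplicate looks like, the Python sketch below groups CIs by a normalized identifying value and reports any value shared by more than one CI. It is a simplification under stated assumptions. The single identifying attribute (serial number), the record shape, and `find_duplicates` are hypothetical. The platform's own duplicate detection is driven by the class's identification rules rather than by code like this.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicates(cis: List[Dict[str, str]], key: str = "serial_number") -> Dict[str, List[str]]:
    """Group CIs by a normalized identifying value and keep groups of 2+."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ci in cis:
        # Normalize case and whitespace so "ABC123" and "abc123 " match.
        value = (ci.get(key) or "").strip().lower()
        if value:  # CIs without an identifying value can't be matched
            groups[value].append(ci["name"])
    return {value: names for value, names in groups.items() if len(names) > 1}


# Hypothetical CIs; only the first two share a serial number.
cis = [
    {"name": "web-01", "serial_number": "ABC123"},
    {"name": "web-01.example.com", "serial_number": "abc123 "},
    {"name": "db-01", "serial_number": "XYZ789"},
    {"name": "tmp-01", "serial_number": ""},
]
print(find_duplicates(cis))  # {'abc123': ['web-01', 'web-01.example.com']}
```

Normalizing before grouping is a design choice: it catches formatting variants of the same value. A real identification rule may instead combine several attributes.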
Steps:
1. Look for de-duplication tasks attached to the duplicate CIs.
2. Use de-duplication template remediation to resolve the duplicates.

Method 2: Use Inclusion Rules (Global Level Only)

When to use: When certain classes consistently have duplicate issues that are difficult to address.

Steps:
1. Apply inclusion rules at the global level to exclude classes whose duplicates cannot be easily resolved.
2. Configure the rule to skip duplicate evaluation for the problematic classes.

Method 3: Increase Failure Threshold (Last Resort)

When to use: Only as a last resort, when the other methods are not feasible.

Steps:
1. Navigate to Health Preferences from the left navigation.
2. Click Health Metrics.
3. Select the metric type that corresponds to your issue.
4. Locate the failure threshold setting.
5. Increase the threshold.
6. Save the configuration.

Orphan CIs

For orphan CI max failures, use the scorecard to identify the class with the most orphan failures. The description of each health result failure explains exactly why the CI fails the orphan check.

Method 1: Adjust Orphan Conditions

When to use: When the orphan rule conditions need to be refined to reduce false positives.

Steps:
1. Navigate to CI Class Manager from the left navigation.
2. Select the class with the most orphan failures from the scorecard.
3. Click the Health tab.
4. In the Health section, click Correctness.
5. Review the orphan rule and its defined conditions.
6. Decide whether the conditions make sense for the class, and modify them if needed to reduce failures.

Method 2: Use Inclusion Rules

When to use: When certain classes cannot be easily resolved or orphan checking isn't meaningful for them.

Steps:
1. Apply inclusion rules globally to exclude classes that cannot be easily resolved.
2. Configure rules for classes where orphan checking isn't meaningful.

Method 3: Increase Failure Threshold (Last Resort)

When to use: Only as a last resort, when the other methods are not feasible.
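Before raising the threshold, you can confirm that the last resort is actually needed. To do so, total the failures that would remain after the other methods are applied. This sketch assumes you have per-class orphan failure counts from the scorecards and a set of classes you intend to exclude with inclusion rules. The class names, counts, and helper functions are hypothetical.

```python
from typing import Dict, Set

DEFAULT_MAX_FAILURES = 50_000  # default threshold cited in this article


def remaining_failures(failures_by_class: Dict[str, int], excluded: Set[str]) -> int:
    """Total failures left after excluding classes (e.g., via inclusion rules)."""
    return sum(count for cls, count in failures_by_class.items() if cls not in excluded)


def threshold_increase_needed(
    failures_by_class: Dict[str, int],
    excluded: Set[str],
    max_failures: int = DEFAULT_MAX_FAILURES,
) -> bool:
    """True only if the remaining failures would still reach the threshold."""
    return remaining_failures(failures_by_class, excluded) >= max_failures


# Hypothetical orphan failure counts per class, as read from the scorecards.
orphan_failures = {"class_a": 12_000, "class_b": 41_000, "class_c": 3_000}

print(threshold_increase_needed(orphan_failures, set()))        # True (56,000 failures)
print(threshold_increase_needed(orphan_failures, {"class_b"}))  # False (15,000 failures)
```

The same check applies after refining orphan conditions under Method 1: feed in the new per-class counts instead of an exclusion set.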
Steps:
1. Navigate to Health Preferences from the left navigation.
2. Click Health Metrics.
3. Select the metric type that corresponds to your issue.
4. Locate the failure threshold setting.
5. Increase the threshold.
6. Save the configuration.

Stale CIs

For staleness max failures, click the stale CIs scorecard to identify the class with many stale CIs, and review that class's staleness threshold settings.

Method 1: Adjust Staleness Threshold

When to use: When the staleness threshold is too restrictive for your environment.

Steps:
1. Navigate to CI Class Manager.
2. Select the affected class.
3. Go to Health → Correctness → Staleness Rule.
4. Adjust the effective duration to reduce the number of threshold violations. A longer effective duration allows more time between updates before a CI is flagged as stale.

Method 2: Address Data Source Issues

When to use: When data sources don't run often enough or have connectivity issues.

Steps:
1. From the Health Dashboard results, identify the source bringing in the data.
2. Check when the source last ran and why the CI is not being discovered.
3. Rerun the connector to refresh the data and reduce staleness failures.

Method 3: Use Inclusion Rules

When to use: When certain classes don't need staleness monitoring or are intentionally static.

Steps:
1. Create inclusion rules to exclude the problematic class from staleness evaluation.
2. Configure the rule to skip staleness checking for classes that are intentionally static.
3. Apply the rule to your health assessment configuration.

Method 4: Increase Failure Threshold (Last Resort)

When to use: Only as a last resort, when the other methods are not feasible.

Steps:
1. Navigate to Health Preferences from the left navigation.
2. Click Health Metrics.
3. Select the metric type that corresponds to your issue.
4. Locate the failure threshold setting.
5. Increase the threshold.
6. Save the configuration.

Compliance

Resolving compliance max failures differs from completeness and correctness because compliance data comes from audits, including desired state and scripted audits.
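The Python sketch below gives a simplified picture of how an audit produces compliance failures. It evaluates hypothetical certification conditions against each CI and records one failure per non-compliant CI. Each failure is tagged with the audit name and the conditions the CI missed, much as each failure description names the audit. The audit name, attributes, and predicates are assumptions for illustration. Real audits are defined in audit templates, not in code like this.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Condition = Callable[[Any], bool]


def run_desired_state_audit(
    audit_name: str, cis: List[Dict[str, Any]], conditions: Dict[str, Condition]
) -> List[Dict[str, Any]]:
    """Return one failure record per CI that does not meet every condition."""
    failures = []
    for ci in cis:
        failed = [attr for attr, ok in conditions.items() if not ok(ci.get(attr))]
        if failed:
            failures.append({"ci": ci["name"], "audit": audit_name, "failed": failed})
    return failures


# Hypothetical certification conditions: each attribute maps to a predicate.
conditions: Dict[str, Condition] = {
    "os_version": lambda v: v in {"2019", "2022"},
    "antivirus_installed": lambda v: v is True,
}
cis = [
    {"name": "srv1", "os_version": "2022", "antivirus_installed": True},
    {"name": "srv2", "os_version": "2012", "antivirus_installed": True},
    {"name": "srv3", "os_version": "2016", "antivirus_installed": False},
]

failures = run_desired_state_audit("Hypothetical OS baseline", cis, conditions)
print(Counter(f["audit"] for f in failures))  # Counter({'Hypothetical OS baseline': 2})
print([f["failed"] for f in failures])  # [['os_version'], ['os_version', 'antivirus_installed']]
```

Each failure records both the audit and the condition that failed. Counting them therefore points directly to the certification conditions worth revisiting.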
Use the bottom scorecards to identify the classes with the most compliance failures, and examine the specific audits causing them. The description of each failed health result names the audit that caused the failure. This information determines which resolution approach to use.

Method 1: Adjust Audit Conditions (Primary Method)

When to use: When audit conditions are too restrictive or need to be updated for current business requirements.

Steps:
1. From the left navigation, click Audits.
2. Find the audit that is causing the failures (identified from the failure description).
3. Navigate to the audit template.
4. Review the certification conditions defined in the template.
5. Adjust the conditions to reduce the number of failures while still meeting compliance requirements.

This is the primary method for resolving compliance failures because it addresses the root cause: the audit criteria that determine compliance status.

Method 2: Modify Audit Filter Conditions

When to use: When too many CIs are failing and you need to reduce the scope of evaluation.

Steps:
1. Navigate to the audit template that is causing the failures.
2. Open the filter condition in the template.
3. Review the number of records it currently matches.
4. Add more specific conditions that exclude problematic CIs, reducing the number of CIs the audit evaluates.
5. Save the configuration.

This method reduces overall failures by limiting the audit to the CIs most relevant to compliance assessment.

Method 3: Increase Failure Threshold

When to use: As a temporary measure while you adjust audit conditions.
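This threshold increase is meant to be temporary. One way to decide when to revert it is to re-count the failures under the revised audit and compare them with the default threshold. This sketch models the audit filter and the certification condition as Python predicates. The `status` and `patched` fields, the retired-CI exclusion, and the population sizes are all hypothetical.

```python
from typing import Callable, Dict, List

DEFAULT_MAX_FAILURES = 50_000  # default threshold cited in this article
CI = Dict[str, object]


def audit_failures(cis: List[CI], in_scope: Callable[[CI], bool], passes: Callable[[CI], bool]) -> int:
    """Count CIs that match the audit filter but fail certification."""
    return sum(1 for ci in cis if in_scope(ci) and not passes(ci))


def can_revert_threshold(
    cis: List[CI],
    in_scope: Callable[[CI], bool],
    passes: Callable[[CI], bool],
    max_failures: int = DEFAULT_MAX_FAILURES,
) -> bool:
    """True once failures under the revised audit stay below the default threshold."""
    return audit_failures(cis, in_scope, passes) < max_failures


def all_cis(ci: CI) -> bool:  # original, broad filter
    return True


def not_retired(ci: CI) -> bool:  # revised filter (hypothetical condition)
    return ci["status"] != "retired"


def is_patched(ci: CI) -> bool:  # hypothetical certification condition
    return bool(ci["patched"])


# Hypothetical population: 80,000 CIs, the first 60,000 retired, one in four patched.
cis = [
    {"status": "retired" if i < 60_000 else "operational", "patched": i % 4 == 0}
    for i in range(80_000)
]
print(audit_failures(cis, all_cis, is_patched))            # 60000
print(audit_failures(cis, not_retired, is_patched))        # 15000
print(can_revert_threshold(cis, not_retired, is_patched))  # True
```

Narrowing the filter this way is only appropriate if the excluded CIs genuinely fall outside your compliance requirements.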
Steps:
1. Navigate to Health Preferences from the left navigation.
2. Click Health Metrics.
3. Select the metric type that corresponds to your issue.
4. Locate the failure threshold setting.
5. Increase the threshold.
6. Save the configuration.

Warning: Consider changes to audit conditions carefully so that compliance requirements are still met.

Best Practices and Recommendations

- Start with root cause analysis: Always investigate the specific cause of failures before implementing a solution.
- Prioritize sustainable fixes: Address data source and configuration problems rather than just increasing thresholds.
- Use threshold increases sparingly: Treat a threshold increase as a temporary measure while you work on a permanent solution.
- Consider performance impact: Inclusion rules and high failure thresholds can both affect system performance.

Conclusion

Max failure threshold errors can significantly reduce health assessment accuracy, but they are manageable with the right approach. Focus on sustainable fixes over threshold increases, and choose methods that align with your business requirements. Only CMDB admins can access and modify failure threshold settings.