MID Servers should not remain in Down status

Product Success Playbook
A step-by-step guide to analyzing and remediating an ITOM Health & Visibility issue

Table of Contents
Summary
Goal of this Playbook
Audience
Problem Overview
Executive Summary
How this playbook can help you achieve business goals
How this playbook is structured
Problem Analysis
Upstream Causes
Downstream Consequences
Impact on Your Business
Engagement Questions
Remediation Plays
Summary
Play 1: Review your MID Servers
Play 2: Analyze your MID Server records
Play 3: Fix Play - Remove & Reinstall
Data Governance

Summary

Goal of this Playbook
The goal of this playbook is to help you identify MID Servers that are Down (offline) and remediate their issues so that they return to the Up (online) state.

Author: John (Johnny) Walker
Date: 11/01/2021
Addresses HSD #: HSD0009876
Applicable ServiceNow Releases: Any
Time Required: Approximately 1 to 2 hours (depending on your environment)

Audience
- ServiceNow Administrator or Discovery Administrator
- Windows / Linux Administrator supporting your MID Server installations

Problem Overview
MID Servers are a necessary part of your ITOM applications on the ServiceNow platform.
They perform work within your network and report back to your ServiceNow instance. When a MID Server is in the Down (offline) state, it cannot perform the activities assigned to it.

Executive Summary

How this playbook can help you achieve business goals
MID Servers are the gateway to your enterprise network and provide visibility into your IT infrastructure. Working MID Servers are essential for the applications they support. If MID Servers are Down, you risk losing visibility into all or part of your network infrastructure, which can result in delays, outages, and disruptions to business processes. This playbook helps you identify the MID Servers in the Down status and return them to the fully operational (Up) status.

How this playbook is structured
This playbook guides you through several plays:
- An analysis play designed to help you confirm our findings
- Remediation play(s) designed to help you resolve or remediate the findings
- Governance guidance to help you keep your system healthy in the future

Problem Analysis

Upstream Causes
- Windows patching can lead to Java or OS networking stack issues on the host machine.
- Intrusion protection and antivirus solutions can interfere with MID Server performance, processing, integrity, and stability.
- Firewall, routing, and proxy changes within your network can interrupt access between the MID Server and your ServiceNow instance.
- During an upgrade, a MID Server could fail to update or fail to restart completely.
- Decommissioned MID Server (ecc_agent) records could remain in your instance after the MID Server has been replaced or retired.
- The MID Server user could have been deactivated, had its password rotated, or lost the mid_server role.
- During development and testing of new features or patterns, the Java application could hang or crash.
- An administrator might have stopped the Java process for maintenance or testing, and it simply needs to be restarted.

Downstream Consequences
Data consequences
- Your CMDB records could become stale for lack of Discovery coverage.
- Your ITOM Health integrations might not receive critical infrastructure events.
- IntegrationHub data ingestion might not take place in a timely fashion.
Operational consequences
- Discovery schedules could be canceled for lack of resources to complete them on time.
- IntegrationHub business processes could fail to run when expected.
- Discovery and platform administrators might learn to scroll past or ignore Down MID Servers, leading them to overlook a critical Down record.
- Outages can take longer to resolve than they need to because ITOM data is processed more slowly.
App consequences
- SecOps applications can fail to identify risks in a timely manner.
- Discovery schedules can take longer than they need to, pushing them outside the off-peak windows they are supposed to run within.
- ITOM Visibility (Service Mapping) could fail if no MID Server is available for the relevant IP ranges.
- ITOM Health integrations relying on data collectors could fail entirely.
- IntegrationHub activities and flows could fail entirely if no backup MID Server is available.
- Web services integrations that use a MID Server could be affected by Down MID Servers.

Impact on Your Business
Down (offline) MID Servers could affect your business outcomes, depending on which ServiceNow applications they support. Some examples follow.
Increased MTTR: ITOM Health depends on MID Servers for event data ingestion.
CMDB Trust: ITOM Visibility depends on MID Servers to run Discovery schedules.
Process Automation: Data sources and import sets could rely on your MID Servers, and IntegrationHub / ETL transforms could also rely on MID Servers.

Engagement Questions
Consider the answers to these questions:
- Who manages your MID Servers?
- Is there a plan in place to handle issues with MID Servers?
- Is the purpose of each of your MID Servers well understood?

Remediation Plays

Summary
The table below lists and summarizes each of the remediation plays in this playbook. Details are included later.
Play name | What this play is about | Required tasks
Review your data | How to view Down MID Servers | Navigate to view MID Servers
Analyze Play | |
Fix Plays | Determine that a MID Server is no longer needed | Remove orphaned MID Servers
Data Governance | Ensure that an investigation begins when MID Servers are Down for any length of time | Create a Flow to open an Incident for Down MID Servers

Play 1 - Review your MID Servers

What this Play is about
The list of MID Servers is readily available within your instance. This play reminds you where to find it.

Required tasks
- Navigate to MID Servers > Servers.
- Fill in the Condition Builder as follows and click Run:
  Field | Operator | Value
  Status | is | Down
  Stopped | relative | before 15 Minutes ago
- (Optional) You can also navigate to MID Servers > Dashboard to view this information and more.
- Review the records listed. Note the Last refreshed and Stopped fields; they indicate roughly when your instance last received communication from each MID Server.
- Continue to Play 2 to work through these records.

Play 2 - Analyze your MID Server records

What this Play is about
This play offers common-sense advice for reviewing the records you found in Play 1.
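To review the Play 1 records outside the list view (for example, from a monitoring script), the same filter can be sent to the ServiceNow REST Table API. The Python sketch below is illustrative rather than an official tool. It assumes that the Table API is enabled for an account that can read ecc_agent, that the relevant columns are named `status`, `stopped`, and `last_refreshed`, and that the encoded query `status=Down^stoppedRELATIVELT@minute@ago@15` is equivalent to the Play 1 filter. Confirm the query on your instance by right-clicking the list's filter breadcrumb and choosing Copy query. All function names, MID Server names, and credentials below are hypothetical.

```python
"""Minimal sketch: find MID Servers that have been Down for more than 15 minutes.

Assumptions (verify on your own instance before relying on this):
- the REST Table API is enabled and the account can read the ecc_agent table;
- ecc_agent exposes the columns status, stopped, and last_refreshed;
- DOWN_QUERY below matches the Play 1 filter (copy it from the list breadcrumb to be sure).
"""
import base64
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# Assumed encoded form of the Play 1 filter:
# Status is Down AND Stopped is (relative) before 15 minutes ago.
DOWN_QUERY = "status=Down^stoppedRELATIVELT@minute@ago@15"


def build_url(instance: str, query: str = DOWN_QUERY) -> str:
    """Return a Table API URL that lists ecc_agent records matching the query."""
    params = urllib.parse.urlencode({
        "sysparm_query": query,
        "sysparm_fields": "name,status,stopped,last_refreshed",
        "sysparm_display_value": "false",  # raw values; date-times expected in UTC
    })
    return f"https://{instance}.service-now.com/api/now/table/ecc_agent?{params}"


def fetch_down_mid_servers(instance: str, user: str, password: str) -> list:
    """GET the records with basic authentication and return the 'result' list."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        build_url(instance),
        headers={"Accept": "application/json", "Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["result"]


def minutes_since(timestamp: str, now: datetime) -> float:
    """Minutes elapsed between a 'YYYY-MM-DD HH:MM:SS' UTC timestamp and now."""
    then = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return (now - then).total_seconds() / 60


def down_longer_than(records: list, now: datetime, threshold_min: float = 15) -> list:
    """Names of Down records whose Last refreshed value is older than threshold_min."""
    return [
        r["name"]
        for r in records
        if r.get("status") == "Down"
        and r.get("last_refreshed")
        and minutes_since(r["last_refreshed"], now) > threshold_min
    ]


if __name__ == "__main__":
    # Offline example with hypothetical records shaped like Table API results.
    now = datetime(2021, 11, 1, 12, 0, tzinfo=timezone.utc)
    sample = [
        {"name": "mid-east-01", "status": "Down", "last_refreshed": "2021-11-01 11:30:00"},
        {"name": "mid-west-01", "status": "Down", "last_refreshed": "2021-11-01 11:55:00"},
        {"name": "mid-core-01", "status": "Up", "last_refreshed": "2021-11-01 11:59:00"},
    ]
    print(down_longer_than(sample, now))  # ['mid-east-01']
    # Against a real instance (hypothetical name and credentials):
    # records = fetch_down_mid_servers("yourinstance", "api.user", "secret")
```

`down_longer_than` repeats the age check on the client side using Last refreshed, so you can try a different threshold without changing the query. In real use, keep credentials out of the script.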
Notes
- When reviewing the records in your MID Server (ecc_agent) table, you might not immediately recognize every record.
- A MID Server record is created when the MID Server is installed on a host machine and configured with a login credential that has been granted the mid_server role.
- A MID Server's status changes to Down while it restarts, whether because of an upgrade, scheduled maintenance on the host machine, or a manual restart for troubleshooting.
- A MID Server's status can also change to Down when communication between the MID Server (Java) process and your ServiceNow instance is lost for more than about 5 minutes.
- In some circumstances the Java process can appear to be running even though communication with your instance has stopped. This happens from time to time and can have many causes.
- If the MID Server has crashed (the process is no longer running) or can no longer communicate with your instance, you must restart it from the host machine.

Required tasks
- If you know that a particular MID Server (ecc_agent) record is no longer valid, select it in the list and delete it. A Down record for a MID Server that no longer exists serves no purpose; it will never return to Up.
- If you are uncertain about a MID Server, connect to the host machine using Remote Desktop or SSH and start or restart the MID Server process there. You might need help from the support team that has access to your MID Server hosts; this process differs per customer and is up to you to navigate.
- If restarting the MID Server process succeeds, continue by starting or restarting any other MID Servers in the Down status. Then return to the list view from Play 1 to confirm that all MID Servers show as Up.
- If you encounter difficulties restarting the MID Server process, review the collection of KB articles on the MID Server Landing page. Many common MID Server issues are already well documented there. If none of the articles covers your situation, search the Support knowledge base for the specific errors you find in the logs.
- If you still cannot start the MID Server process, you have two main options:

Option | Pros | Cons | Requirements
Open a support case for assistance | We are always happy to help. | Could take a day or so to complete several rounds of troubleshooting, and might require a reinstall anyway. | A login to the Support (formerly HI) site
Reinstall the MID Server | Eliminates many issues. Might be required as part of a support case anyway. | Requires a small amount of system engineering, not much more than installing the MID Server required in the first place. | Administrative access to the MID Server host, such as Remote Desktop or SSH, and the willingness to do it!

Since opening a support case is fairly self-explanatory, continue to the next section to learn how to perform an in-place replacement of a MID Server on Windows.

Play 3 - Fix Play - Remove & Reinstall MID Server

What this Play is about
How to replace the MID Server with a freshly downloaded version of the software when the MID Server process refuses to start. These steps are also described in the ServiceNow product documentation.

Required tasks
- (Optional) To uninstall a manually installed Windows MID Server, navigate to the directory where the agent is located. From the agent/bin directory, run the file UninstallMID-NT.bat. We recommend running it from a Command Prompt window so you can view the output.
- (Optional) To remove a Windows MID Server installed using the guided installation, navigate to Control Panel > Programs > Programs and Features > Uninstall a program and remove the program matching the MID Server's name.
- Once the MID Server is uninstalled, you can perform either a manual reinstallation or a guided installation by following the steps in the ServiceNow documentation.
- If you have removed and reinstalled the MID Server on your host machine but the new MID Server cannot connect to your ServiceNow instance, refer to the section titled MID Server Down & Connectivity on the MID Server Landing page. Its articles cover most common situations.
- If your efforts to reinstall the MID Server and connect it to your instance are unsuccessful, open a Support case for assistance.
- Once you have installed a new MID Server using one of the methods above, it should be safe to delete the old MID Server (ecc_agent) record that still shows the Down status. Before you delete it, note which MID Server clusters the old MID Server belonged to, and configure the new MID Server to take its place in them. Instructions for MID Server clusters are in the ServiceNow documentation.

Data Governance

What this Play is about
The best way to keep your MID Servers available to the applications and processes they support is to make sure that whenever a MID Server goes Down, the people who can investigate and remediate the situation are told about it.
A MID Server Down notification is active in your instance at baseline. Review which users and groups this notification is configured to send to.
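Because a MID Server also reports Down while it restarts for an upgrade or host maintenance (see the Play 2 notes), an alert is usually most useful when it fires only after the Down status has persisted for a grace period. The Python sketch below illustrates that decision; it is not Flow Designer configuration and not a ServiceNow API. The `StatusEvent` type, the `should_open_incident` function, and the 15-minute default are hypothetical names and values chosen for illustration; adjust the grace period to fit your patching and upgrade windows.

```python
"""Sketch of a grace-period check for Down MID Servers (illustrative only).

Hypothetical model: each MID Server's status history is a list of StatusEvent
values; an incident is warranted only if the latest status is Down and has
lasted at least the grace period.
"""
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class StatusEvent:
    at: datetime  # when the status change was recorded
    status: str   # e.g. "Up" or "Down"


def should_open_incident(events: List[StatusEvent], now: datetime,
                         grace: timedelta = timedelta(minutes=15)) -> bool:
    """Return True if the MID Server is still Down after the grace period.

    A recovery (an Up event) inside the grace period, such as a restart for an
    upgrade or host patching, means no incident is opened.
    """
    if not events:
        return False
    latest = max(events, key=lambda event: event.at)
    return latest.status == "Down" and now - latest.at >= grace


if __name__ == "__main__":
    t0 = datetime(2021, 11, 1, 9, 0)
    patched = [StatusEvent(t0, "Down"), StatusEvent(t0 + timedelta(minutes=6), "Up")]
    crashed = [StatusEvent(t0, "Down")]
    now = t0 + timedelta(minutes=20)
    print(should_open_incident(patched, now))  # False
    print(should_open_incident(crashed, now))  # True
```

The design is a wait-then-recheck: a recovery within the grace period (the Up event in `patched`) suppresses the incident, while a MID Server that stays Down past the grace period (`crashed`) produces one.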
Required tasks
- Navigate to System Notification > Email > Notifications.
- Search for the MID Server Down notification and open it.
- Review the "Who will receive" tab to ensure that at least one user or group is configured to receive this notification. Also confirm that the notification is Active.

You may decide that a Down MID Server warrants a more substantial record that demands action. In that case, you can put a Flow in place to generate an Incident when a MID Server goes Down and stays Down for more than 15 minutes. Because a MID Server can be taken down during an upgrade or while its host machine is patched, it is usually a good idea to allow a short grace period before opening an incident for a Down MID Server.

The following steps are completely optional, based on the specific needs of your organization.
- Navigate to Process Automation > Flow Designer.
- Click "New" and select "Flow" from the list presented.
- Configure the Flow properties fields.
- You may be presented with a welcome message offering you a tour. You can take the tour or not; these steps assume you select Skip tour.
- Select "Add a trigger", complete the form, and then select "Done".
- Add an Action of type "Wait For Condition". To wait for 15 minutes (or longer if your needs differ), select "Enable timeout" and set 15 minutes in the Duration field. Select "Done".
- To avoid opening unnecessary incident records, check the MID Server status again after the timeout: click "Add an action", select "Flow Logic", choose "If", and complete the step.
- Click the plus sign next to "then" below this condition to add the Create Task action. Complete the Create Task form and click "Done".
- Save and then activate this Flow.
- Test the Flow by stopping a MID Server using the UI Action on the MID Server form. Be sure that the MID Server you stop isn't currently being used.

Congratulations
You have completed this Product Success Playbook.