Duplicate Agent ID Detection and Remediation Update Set

Problem
The agent ID is supposed to uniquely identify each agent, but in some environments multiple agents end up using the same agent ID. In this scenario, the Instance holds only one agent record for that agent ID. All agents using the ID are stacked on that single record and continually overwrite each other. As a result, those agents cannot be used reliably, and the ACC Admin cannot see exactly how many agents are connected.

Goal
The goal of the update set is to automatically regenerate the agent IDs of agents that share the same agent ID.

A Note on Duplicate Agent ID Detection and ACC-4004 Error Logging
ACC-4004 errors can only be logged when a MID server reports an agent to the Instance. If multiple agents use the same agent ID and are all connected to the same MID server, the MID server reports only one of them. Duplicate agent ID detection is therefore neither instantaneous nor guaranteed. The more agents share an agent ID, the longer it takes for every one of them to be reported to the Instance and have an ACC-4004 error logged.
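The small JavaScript model below makes the overwrite and detection behavior concrete. It rests on a few stated assumptions. The Instance is reduced to a map that holds one record per agent ID. An ACC-4004-style error is logged only when a report's agent name differs from the stored name, which is the condition described under Detection below. The function names and record fields are hypothetical, and this is not the Instance's actual keepalive processing code.

```javascript
// Illustrative model only: the Instance keeps at most one record per agent ID,
// so agents that share an ID overwrite each other's name and IP.
function createInstance() {
  return { records: new Map(), errors: [] };
}

// Process one agent report as it might arrive in a keepalive payload.
function processReport(instance, report) {
  const existing = instance.records.get(report.agentId);
  if (existing && existing.name !== report.name) {
    // Hypothetical error shape: agent ID plus the new agent name and IP.
    instance.errors.push({
      code: "ACC-4004",
      agentId: report.agentId,
      name: report.name,
      ip: report.ip,
    });
  }
  // Upsert: the single record now describes whichever agent reported last.
  instance.records.set(report.agentId, { name: report.name, ip: report.ip });
}

// Two agents share ID "abc", but the MID server only ever reports agent X.
const inst = createInstance();
const x = { agentId: "abc", name: "host-x", ip: "10.0.0.1" };
const y = { agentId: "abc", name: "host-y", ip: "10.0.0.2" };
for (let i = 0; i < 3; i++) processReport(inst, x);
console.log(inst.errors.length); // 0: agent Y is invisible to the Instance

// Once a keepalive happens to carry agent Y, the name mismatch is detected.
processReport(inst, y);
console.log(inst.errors.length, inst.records.size); // 1 1
```

In this model, agent Y becomes visible only when a keepalive happens to carry its report. This is why detection is neither instantaneous nor guaranteed.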
As a result, the ACC-4004 Active error count does not give a complete picture of every duplicate agent ID scenario.

Expected Behavior and Timeline
After the update set is applied, the number of active ACC-4004 errors should eventually trend toward zero. The update set gradually causes agents affected by the duplicate agent ID issue to regenerate their agent IDs. The process can take a while, depending on how many agents share the same agent ID.

The total number of active ACC-4004 errors may not decline immediately, but it should decrease over time. The count may not drop right away because, as some agents regenerate their agent IDs, the other agents still sharing the original agent ID become more likely to be reported and have an ACC-4004 error logged.

Restrictions
Do not apply this update set to an Instance if any Linux agents have been configured, after installation, not to restart on termination. Any agent configured this way will be shut down, and user intervention will be required to start it again.

The update set removes a restriction in the "Restart Agent" code that prevents restarting agents when the host OS is unknown. The restriction exists so that ACC Admins do not restart Linux agents that do not use systemd, since those agents are not automatically restarted after the agent process is terminated. Therefore, do not apply the update set in an environment where such agents are present.

Product Solution
Duplicate Agent ID Auto Remediation is available as a productized solution in the ACC-F Scoped App version 5.0.0 and above. The productized solution also requires the agents to be on version 5.0.0 or above.
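The version requirement for the productized solution can be written as a simple gate. This sketch is hypothetical. The function names are made up for illustration, and it assumes plain "major.minor.patch" version strings collected from the ACC-F Scoped App and from every agent. Pre-release tags are not handled.

```javascript
// Minimum version required for both the ACC-F Scoped App and the agents.
const REQUIRED = [5, 0, 0];

function parseVersion(v) {
  return v.split(".").map((part) => Number.parseInt(part, 10) || 0);
}

// True when version a >= version b, compared component by component.
function atLeast(a, b) {
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y;
  }
  return true;
}

// Hypothetical helper: can the productized solution take over from the update set?
function canBackOutUpdateSet(scopedAppVersion, agentVersions) {
  return (
    atLeast(parseVersion(scopedAppVersion), REQUIRED) &&
    agentVersions.every((v) => atLeast(parseVersion(v), REQUIRED))
  );
}

console.log(canBackOutUpdateSet("5.0.0", ["5.0.0", "5.1.2"])); // true
console.log(canBackOutUpdateSet("5.0.0", ["4.2.1", "5.1.2"])); // false
```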
Once both the ACC-F Scoped App and the agents have been upgraded to version 5.0.0 or above, this Duplicate Agent ID Auto Remediation Update Set can be backed out.

Update Set Files
The KB has two attachments:
- acc-f-delete-agent-id.tar.gz: the signed ACC Plugin required to perform the automatic duplicate agent ID remediation.
- sys_remote_update_set_73664ebd7cc76e10f8773b3e1412e514.xml: the update set XML to import into the Instance.

Instructions
Read the information below thoroughly before completing any of the instructions for enabling Duplicate Agent ID Auto Remediation.

Import the Update Set
1. Download the sys_remote_update_set_73664ebd7cc76e10f8773b3e1412e514.xml file.
2. In the Navigator, type "Retrieved Update Sets" and click on that module.
3. On that screen, use the "Import Update Set from XML" Related Link to upload the update set XML file.
4. Preview the update set, then commit it.

Attach the signed tar.gz to the ACC Plugin
After applying the update set, attach the signed tar.gz to the ACC Plugin record:
1. Download the attached acc-f-delete-agent-id.tar.gz file.
2. Navigate to "ACC Plugins" (sn_agent_asset).
3. Open the "acc-f-delete-agent-id" ACC Plugin, which should have its OS field set to "All".
4. Attach the signed tar.gz "acc-f-delete-agent-id.tar.gz" to the record.
5. Ensure the record has active=true, and save the record.

Testing
To test the update set, you can run the auto remediation against only a few targeted agent records.
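Testing and full deployment differ only in whether an array of sys_ids is passed to remediateDuplicateAgentIDs. The sketch below models that targeting rule in plain JavaScript. It assumes that a passed array narrows the agents with active Duplicate Agent ID errors to the listed sys_ids, and that omitting the array targets all of them. selectTargets and the record shape are hypothetical, and this is not the ACCTempRegenerateAgentID source.

```javascript
// Hypothetical model of how an optional sys_id list narrows the targets.
function selectTargets(agentsWithActiveErrors, sysIds) {
  // No array passed (full deployment): target every agent with an active error.
  if (!Array.isArray(sysIds)) return agentsWithActiveErrors;
  // Array passed (testing): only agents whose sys_id is listed are targeted.
  const wanted = new Set(sysIds);
  return agentsWithActiveErrors.filter((agent) => wanted.has(agent.sys_id));
}

const flagged = [{ sys_id: "a1" }, { sys_id: "b2" }, { sys_id: "c3" }];
console.log(selectTargets(flagged, ["bogus_agent_sys_id"]).length); // 0
console.log(selectTargets(flagged, ["a1", "c3"]).map((a) => a.sys_id)); // [ 'a1', 'c3' ]
console.log(selectTargets(flagged).length); // 3
```

Under this assumption, the default placeholder 'bogus_agent_sys_id' matches no agent. Activating the job with the default script would therefore target nothing.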
1. Modify the script in the Scheduled Job "ACC Auto Remediate Duplicate Agent ID Update Set". The default script looks like this:
       new ACCTempRegenerateAgentID().remediateDuplicateAgentIDs(['bogus_agent_sys_id']);
2. Populate the array with the sys_ids of the agents you want to test agent ID regeneration on:
       new ACCTempRegenerateAgentID().remediateDuplicateAgentIDs(['agentSysID1', 'agentSysID2']);
3. Save the record.
4. Activate the job.

Full Deployment
To start running the auto regeneration on all agents that have the Duplicate Agent ID issue:
1. Modify the script in the Scheduled Job "ACC Auto Remediate Duplicate Agent ID Update Set".
2. Remove the array passed into the remediateDuplicateAgentIDs function, like so:
       new ACCTempRegenerateAgentID().remediateDuplicateAgentIDs();
3. Save the record.
4. Activate the job if it is not already active.

How it Works

Detection
Duplicate Agent ID Errors are logged during keepalive payload processing on the Instance. If the agent ID being updated arrives with a different name from the one on the existing agent (sn_agent_ci_extended_info) record, an error is logged with the agent ID, the new agent name, and the new agent IP address.

Error detection is not 100% guaranteed. If two agents X and Y use the same agent ID, the Instance may never detect that they share it. Detection relies on one keepalive payload containing the information for agent X and a different keepalive payload, at a different time, containing the information for agent Y. However, every keepalive may happen to contain the information for agent X, in which case the Instance never learns that agent Y exists.

Fixing the Duplicate Agent ID Issue
The Duplicate Agent ID errors should be resolved when the agent identified by the agent ID and agent name no longer shares its agent ID with another agent.
To fix the issue, all of the agents sharing the same agent ID, or all but one of them, must regenerate their agent IDs. There are a few ways to achieve this:
- A server admin manually goes to the problematic agent hosts, deletes the agent_now_id files, and restarts the agents.
- A server admin uses deployment tools, or otherwise mass-deploys a script, to do the same as above.
- The update set provided here attempts to regenerate the agent IDs of agents for which the Instance has logged Duplicate Agent ID Errors.
- Agents upgraded to 5.0.0, together with the ACC-F Store Application, support this automated detection and remediation as a more complete feature.

The auto remediation attempts to delete the agent_now_id file at these default paths:
- Windows: 'C:\ProgramData\ServiceNow\agent-client-collector\cache\agent_now_id'
- macOS: '/Library/Application Support/ServiceNow/agent-client-collector/agent_now_id'
- Linux: '/var/cache/servicenow/agent-client-collector/agent_now_id'

If your agent installation has been configured to store the agent_now_id file in a different location, the script cannot delete the file and the regeneration fails. When the script does not delete the file, the agent is not instructed to restart.

Error Resolution
This section briefly explains the conditions under which a Duplicate Agent ID Error is resolved. These errors should be resolved when the agent with the duplicate agent ID no longer uses that agent ID (i.e., its ID has been regenerated, manually or automatically). There is no easy way to tell when an agent has stopped using the duplicate agent ID, so the resolution logic monitors the active error to see whether it continues to be logged. If the Instance has not detected the error within the last day, and the agent has been up within the last day, the error can be resolved.
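The resolution conditions can be summarized as a predicate. This sketch covers both the case above and the not-found case described next. The field names sys_updated_on and last_refreshed come from this article. shouldResolve and the argument shapes are hypothetical, and "within the last day" is assumed to mean the 24 hours before now.

```javascript
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// error: { sys_updated_on: Date }
// agent: { last_refreshed: Date }, or null when no agent matches the
//        error's agent ID and agent name
function shouldResolve(error, agent, now = new Date()) {
  const cutoff = now.getTime() - ONE_DAY_MS;
  // The error must not have been updated (re-logged) within the last day.
  if (error.sys_updated_on.getTime() >= cutoff) return false;
  // Agent can no longer be found, e.g. because it regenerated its ID.
  if (agent === null) return true;
  // Agent still found: resolve only if it refreshed within the last day.
  return agent.last_refreshed.getTime() >= cutoff;
}

const now = new Date("2024-01-10T12:00:00Z");
const daysAgo = (d) => new Date(now.getTime() - d * ONE_DAY_MS);
console.log(shouldResolve({ sys_updated_on: daysAgo(2) }, null, now)); // true
console.log(shouldResolve({ sys_updated_on: daysAgo(2) }, { last_refreshed: daysAgo(0.1) }, now)); // true
console.log(shouldResolve({ sys_updated_on: daysAgo(2) }, { last_refreshed: daysAgo(3) }, now)); // false
console.log(shouldResolve({ sys_updated_on: daysAgo(0.5) }, null, now)); // false
```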
Another case is when the Instance has not detected the error within the last day and the agent can no longer be found; the error can then be resolved as well. This is because once an agent regenerates its ID, it can no longer be found by the same agent ID and agent name.

A Duplicate Agent ID Error for agent ID X with agent name Y is resolved under the following conditions:
- the error's "sys_updated_on" has not been updated to a time within the last day, AND
- the agent with ID X and agent name Y cannot be found, OR the agent with ID X and agent name Y has had its "last_refreshed" timestamp updated within the last day.

This resolution logic can produce false positives, where error records are resolved even though the duplicate agent ID issue is not truly fixed. In these cases, the error is expected to be logged again once it is detected again.

Files
The update set includes Script Includes, Scheduled Jobs, Checks, and Check Types that automatically regenerate agent IDs for agents with associated Duplicate Agent ID errors.

Scheduled Job (sys_autoscript) - "ACC Auto Remediate Duplicate Agent ID Update Set"
This Scheduled Job finds all agents that have an active Duplicate Agent ID error and triggers the "Regenerate Agent ID Update Set" Check on them. For a set of agents sharing the same agent ID, the Instance only ever holds a single agent record representing those agents, since the agent ID is supposed to uniquely identify each agent. Therefore, each run of the job can regenerate at most one agent in each set of duplicate agents. Remediating all agents is an iterative process that happens over time. The job is deactivated by default; once activated, it runs every 5 minutes by default.

Check (sn_agent_check_def) - "Regenerate Agent ID Update Set"
This Check deletes the "agent_now_id" file, which stores the agent ID of the agent.
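The sketch below mirrors the delete step and its effect on the restart. The real plugin script is delete_agent_id.rb, and this JavaScript stand-in only reproduces the documented behavior. It assumes that a successful delete leads the Instance to trigger a restart and that a failed delete (for example, a relocated agent_now_id file) does not. deleteAgentIdFile and the keying of the default paths by Node's process.platform values are hypothetical.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Default agent_now_id locations from this article, keyed by Node's
// process.platform values (the keying is an assumption of this sketch).
const DEFAULT_AGENT_ID_PATHS = {
  win32: "C:\\ProgramData\\ServiceNow\\agent-client-collector\\cache\\agent_now_id",
  darwin: "/Library/Application Support/ServiceNow/agent-client-collector/agent_now_id",
  linux: "/var/cache/servicenow/agent-client-collector/agent_now_id",
};

// Hypothetical stand-in for the delete step. `restart` indicates whether the
// Instance-side restart should follow; it is true only after a real delete.
function deleteAgentIdFile(filePath) {
  try {
    fs.unlinkSync(filePath);
    return { deleted: true, restart: true };
  } catch (err) {
    // Missing file (e.g. a relocated install) or no permission: no restart.
    return { deleted: false, restart: false, reason: err.code };
  }
}

// Default path that would apply on this machine (undefined on other platforms).
console.log(DEFAULT_AGENT_ID_PATHS[process.platform]);

// Demo against a temporary file rather than a real agent installation.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "acc-demo-"));
const idFile = path.join(dir, "agent_now_id");
fs.writeFileSync(idFile, "abc");
console.log(deleteAgentIdFile(idFile)); // { deleted: true, restart: true }
console.log(deleteAgentIdFile(idFile).reason); // ENOENT
fs.rmdirSync(dir);
```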
The Check result is processed on the Instance by the "Regenerate Agent ID Update Set" Check Type.

Check Type (sn_agent_check_type) - "Regenerate Agent ID Update Set"
This Check Type processes the result of the "Regenerate Agent ID Update Set" Check. It triggers the "Restart Agent" Check against the same agent to restart it. When the agent restarts, it finds no "agent_now_id" file, because the file was deleted. The agent then generates a new agent ID and saves it to a new "agent_now_id" file.

ACC Plugin (sn_agent_asset) - acc-f-delete-agent-id
This is a signed ACC Plugin for all OSes. It contains the delete_agent_id.rb file, which is executed by the "Regenerate Agent ID Update Set" Check. The script deletes the agent_now_id file on the agent at these default paths:
- Windows: 'C:\ProgramData\ServiceNow\agent-client-collector\cache\agent_now_id'
- macOS: '/Library/Application Support/ServiceNow/agent-client-collector/agent_now_id'
- Linux: '/var/cache/servicenow/agent-client-collector/agent_now_id'

If your agent installation has been configured to store the agent_now_id file in a different location, the script cannot delete the file and the regeneration fails. When the script does not delete the file, the agent is not instructed to restart.

Script Include (sys_script_include) - "ACCTempRegenerateAgentID"
This new Script Include contains functions that query for agents with Duplicate Agent ID errors and trigger the "Regenerate Agent ID Update Set" Check against them. It is called by the "ACC Auto Remediate Duplicate Agent ID Update Set" Scheduled Job.

Script Include (sys_script_include) - "DuplicateAgentErrors"
This existing Script Include contains the logic for resolving Duplicate Agent ID errors. This update set modifies it to enhance the Duplicate Agent ID Error resolution logic. The same enhancements are included in PRB1902306 for the ACC-F Scoped App version 5.0.0 release.
They are included in this update set because the Scheduled Job relies on the Error Resolution logic to know when to stop running the "Regenerate Agent ID Update Set" Check against a set of agents.

Script Include (sys_script_include) - "AgentRestartUtil"
This existing Script Include was modified to remove the query on the host CI that runs before the "Restart Agent" Check is executed. The query was meant to prevent restarts of agents on Linux without systemd. Skipping the query is required because, in our lab testing, the host CI reference was usually cleared from agent records where multiple agents use the same agent ID.

Skipping the OS check adds risk for Linux agents that are configured not to restart upon termination. These agents may receive the "Restart Agent" Check, shut down, and not start again until a user starts them manually.

Pitfalls
1. Duplicate Agent ID Detection is not guaranteed. Two agents with the same agent ID could be connected to the same MID server, and the MID server could always happen to report the first agent in its keepalive payloads, so the Instance never learns that a second agent with the same agent ID is connected. Likewise, if two agents sharing an agent ID also have the same hostname, the Instance cannot tell that there are two of them, since detection is based on the hostname changing for the same agent ID.
2. Agents could keep regenerating the same agent ID. Deterministic agent ID generation depends on the hostname, the serial number, and the MAC address. If these factors are the same across multiple agents, then no matter how many times the agents regenerate their agent IDs, the IDs will always be the same and will always collide. In this case the auto remediation can never succeed for this set of agents, and manual intervention is required.
3. Similarly, if an agent regenerates the same ID and always happens to be connected when the job runs, the other agents never get the chance to regenerate their agent IDs. For example, suppose agent ID "abc" is the deterministically generated agent ID of a "main" agent, but other agents are also using agent ID "abc" due to cloning issues. If the job always runs while the "main" agent is connected, the "main" agent keeps regenerating the same agent ID "abc", since generation is deterministic. The clones never get a chance to regenerate their agent IDs, even though they would likely get a different agent ID by regenerating.
4. The Scheduled Job keeps targeting the agent record for up to a day after all agent IDs in a set of duplicate agents have been regenerated. This causes the agent with that ID to keep restarting (every 5 minutes) until the error can be resolved.
5. The auto remediation may leave stale agent records. For example, suppose 5 agents share the same agent ID and, over time, the job instructs all of them to regenerate their agent IDs, and each successfully generates a new, different agent ID. There will eventually be 6 agent records: 5 non-stale and 1 stale. This is not a critical issue; stale agent records are cleaned up after 30 days without keepalives.
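Pitfall 2 can be illustrated with a stand-in derivation. This article does not describe the actual agent ID algorithm. The sketch below hashes the three inputs named above (hostname, serial number, MAC address) with SHA-256. This is an assumption made only to show that identical inputs always produce identical IDs, however often they are regenerated.

```javascript
const crypto = require("crypto");

// Illustration only: NOT the agent's real algorithm. Any deterministic function
// of (hostname, serial, mac) shows the same collision property.
function deterministicAgentId({ hostname, serial, mac }) {
  return crypto
    .createHash("sha256")
    .update(`${hostname}|${serial}|${mac}`)
    .digest("hex")
    .slice(0, 32);
}

const original = { hostname: "web01", serial: "VM-1234", mac: "00:50:56:aa:bb:cc" };
const clone = { ...original }; // a clone that kept all three values
const distinct = { ...original, mac: "00:50:56:aa:bb:cd" }; // differs in one input

// Identical inputs collide every time; a single differing input avoids it.
console.log(deterministicAgentId(original) === deterministicAgentId(clone)); // true
console.log(deterministicAgentId(original) === deterministicAgentId(distinct)); // false
```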