MID Servers and Clones


Details

This KB article gives a general overview of how Instance Clones affect MID Servers, and the main points you need to keep in mind.

Are MID Servers Cloned?

No. When a production instance is cloned over a sub-production instance, the production instance's MID Servers are not duplicated or copied, and the sub-production instance will not end up with the same MID Servers as production.

MID Server installations will always point to the single instance URL set in the config.xml file of that installation, regardless of whether that instance is now a copy of another instance.
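
As a quick way to confirm which instance an installation points to, the following minimal sketch reads the parameters from config.xml text. It assumes the file stores settings as parameter elements with name and value attributes, and that the instance URL is in the parameter named "url". The sample content, host name, MID Server name and the function name read_mid_parameters are made up for illustration, so check them against your own file.

```python
import xml.etree.ElementTree as ET

# Illustrative config.xml content (assumption); real files hold many more parameters.
SAMPLE_CONFIG = """<parameters>
  <parameter name="url" value="https://example-dev.service-now.com/"/>
  <parameter name="name" value="MID_DEV_01"/>
</parameters>
"""

def read_mid_parameters(xml_text):
    """Return a dict of parameter name -> value from config.xml text."""
    root = ET.fromstring(xml_text)
    return {
        p.get("name"): p.get("value")
        for p in root.iter("parameter")
        if p.get("name") is not None
    }

params = read_mid_parameters(SAMPLE_CONFIG)
print(params["url"], params["name"])
```

Running this against each installation folder on a host shows which instance each MID Server will keep connecting to after a clone.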

You will need at least one MID Server installation for each instance. You can install multiple MID Servers on the same host server to accomplish this, so long as they have different MID Server names.
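
The two rules above (at least one installation per instance, and unique names on a shared host) can be sketched as a simple check. This is only an illustration: the function name check_installations and the (name, instance URL) pair format are hypothetical, and the values would come from each installation's config.xml.

```python
from collections import Counter

def check_installations(installs, instances):
    """installs: list of (mid_name, instance_url) pairs for one host.
    instances: list of instance URLs that each need a MID Server.
    Returns (duplicate_names, instances_without_mid)."""
    name_counts = Counter(name for name, _ in installs)
    duplicates = sorted(n for n, c in name_counts.items() if c > 1)
    covered = {url for _, url in installs}
    uncovered = sorted(set(instances) - covered)
    return duplicates, uncovered

# Hypothetical host with two installs that share a name, and no install for "test".
result = check_installations(
    [("MID_A", "prod"), ("MID_A", "dev")],
    ["prod", "dev", "test"],
)
print(result)
```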

In order to be able to effectively test MID Server features in sub-production instances, it is recommended to mirror the production MID Server configurations and install the MID Servers within the same network environments.

The MID Server's login credential

After a clone has overwritten a target instance with a new configuration, data and even a different version, existing MID Servers for that target instance will still be trying to connect to it using the same username and password that was originally configured for the MID Server.

The user table in the instance will now be a copy of the source instance's user table, and so the MID Server may be trying to log in with a user that now doesn't exist in the instance or may have a different password. That behavior is controlled by the "Preserve users and related tables" checkbox in the options when requesting a clone.

To avoid this possibility, the recommendation is to use the same username and password for MID Servers on all instances. Different MID Servers for different functions can have different passwords, but the MID Servers for a particular function (e.g. Discovery of a particular region/datacenter) would use the same username/password.

For more information, see: Active MID Server post-cloning credential issues

Known issue:

The Clone History log in the source instance, and the comments in the Change Request record in the ServiceNow Support portal, may show the following text. It indicates a problem that means the MID Server login user has probably not been preserved:

Preserve sys_user related table feature is turned off due to PRB1391196. If you need to preserve the users and related tables,
you can follow the following KB article: https://hi.service-now.com/kb_view.do?sysparm_article=KB0817569

This will probably cause the MID Server to be Down after the clone, with the following error shown along the top of the MID Server form. The user mentioned may not actually exist:

Error Message: Logged in user 'XXX' is missing the following roles: mid_server. Add the missing roles to this user. More Info

To resolve this, create a new user with the missing user's user_id, and set the password to the one that was used when the MID Servers were installed. Don't forget to also add the mid_server role.
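
The repair steps above can be summarised as a small decision helper. This is a sketch only: the function name mid_user_actions and its inputs are hypothetical, and you would need to look up whether the user exists and which roles it has in the instance yourself. It does not check the password, which cannot be read back from the instance.

```python
def mid_user_actions(user_exists, roles):
    """Return the list of repair steps for the MID Server login user.
    user_exists: True if a user with the MID Server's user_id is in the instance.
    roles: set of role names currently granted to that user."""
    actions = []
    if not user_exists:
        actions.append("create user with the original user_id")
        actions.append("set the password used when the MID Servers were installed")
    if not user_exists or "mid_server" not in roles:
        actions.append("add the mid_server role")
    return actions

print(mid_user_actions(False, set()))
print(mid_user_actions(True, {"mid_server"}))
```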

The MID Server's Mutual Authentication Certificate

For Quebec and later instances, the MID Server can optionally authenticate with the instance using Mutual Authentication (mTLS) instead of a username+password. The certificate imported into the MID Server install needs to match the certificate imported to the instance table "User Client Certificates" [sys_user_certificate] for the MID Server to authenticate with the instance.

Those records and their attachments need to be preserved and excluded in clones if different users and certificates are being used between the clone source and target.

The Instance Credentials table

Prior to the New York release, clones would replace the Credentials table [discovery_credentials] with the records from the source of the clone. Since New York, this table is both Excluded and Preserved in the clone, so that the sub-production instance keeps the Credential records it used to have before the clone.

This table is used by Discovery/Service Mapping probes from the MID Server, but also by some other recent integration features that may not even use the MID Server.

It is possible to configure this behavior yourself, if you do not want to use the default settings. For more information, see:
Exclude a table from cloning
Data preservation on cloning target instances

Note: These excludes/preservers, including the out-of-the-box ones, only take effect if the "Exclude tables specified in Exclusion List" and "Exclude audit and log data" options are checked when requesting the clone.

There is a problem with Data Preserver and Exclude code in the clone engine, where it doesn't automatically preserve all child tables of extended tables. This breaks the OOTB Exclude setting in New York for discovery_credentials, and also custom settings, leaving corrupt credential records on the target. For more information and a workaround, see:
KB0717208/PRB1305469 Excluding table-per-class (TPC) extended tables from a clone can cause orphaned Discovery Credentials with the 'Record not found' error when trying to open them
That was fixed, then broken again by PRB1403259. A more recent problem PRB1391898 causes the same corruption, related to the preservers rather than excludes.

If the source and target instance have different plugins installed, then you can also get corrupt credential records. For example:

To avoid breaking those records, you will need to install the same plugin(s) on production before the clone, or delete those credential records before the clone. If you have broken records because of this cause, installing the plugin afterwards won't help, because you will still be missing the sys_id from the child table, and the record will remain corrupt. It will probably need deleting using a Table Cleanup [sys_auto_flush] job, as that deletes at a lower level than lists and forms. That method is described in:
KB0723549 How to repair Discovery Credentials not accessible after clone
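
To illustrate what a corrupt record looks like in the table-per-class case, here is a minimal sketch that finds parent rows whose matching child-table row is missing. It assumes you have already exported the sys_id and sys_class_name values from discovery_credentials, and the sys_ids from each child table. The function name find_orphaned_credentials, the data shapes, the child table names used in the example and the short sys_ids are all for illustration only.

```python
def find_orphaned_credentials(parent_rows, child_ids_by_table):
    """parent_rows: list of (sys_id, sys_class_name) from discovery_credentials.
    child_ids_by_table: dict of child table name -> set of sys_ids in that table.
    A row is orphaned when its class is a child table and that table
    has no row with the same sys_id."""
    orphans = []
    for sys_id, class_name in parent_rows:
        if class_name == "discovery_credentials":
            continue  # base-class row, no child table row expected
        if sys_id not in child_ids_by_table.get(class_name, set()):
            orphans.append((sys_id, class_name))
    return orphans

# Made-up example data.
parents = [("a1", "ssh_credentials"), ("b2", "windows_credentials"), ("c3", "ssh_credentials")]
children = {"ssh_credentials": {"a1"}, "windows_credentials": {"b2"}}
orphans = find_orphaned_credentials(parents, children)
print(orphans)
```

A child table that is absent from the second argument, for example because the plugin that creates it is not installed on the target, makes every parent row of that class show up as orphaned.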

As of the end of 2021, most TPC extended table related clone engine problems are solved, except this one:

The MID Server's own records

A MID Server installation can only be used by one instance. Production MID Servers will remain pointing to production, so any records for those MID Servers don't need to be copied, and are excluded from the clone. MID Servers for the sub-production instance will still be pointing to that instance, so records for those MID Servers need preserving.

This includes related records such as Properties, Parameters, Clusters, Capabilities, IP Ranges, Applications, Issues and records for the MID Server Dashboard.

Warning: Prior to Madrid, there was a known problem which means this isn't done properly. Upgraded instances may also not get the fixed exclude/preserve lists, and may need fixing manually. For more information, see:
PRB1287729 / KB0688954 - The MID Server Exclude/Preserve Clone settings cause MID Server issues on the target

MID Server Properties is a difficult table, because some properties apply to all MID Servers, but others apply to specific MID Servers. A clone will delete and overwrite those records. Any specific MID Server with non-default MID Server Properties may need manually re-configuring after a clone (which is PRB1388744).

There are some ecc_agent... tables which should never be excluded in clones, because they contain Code, which is also version specific. Excluding these can cause missing or mismatched code after a clone, and the MID Server won't work as expected.

Note: These excludes/preservers, including the out-of-the-box ones, only take effect if the "Exclude tables specified in Exclusion List" and "Exclude audit and log data" options are checked when requesting the clone.

Records and settings that reference MID Servers

In settings of all features that use MID Servers, there are likely to be references to specific MID server Sys IDs. e.g.

A lot of those settings will have been copied over with the clone, but the MID Servers won't exist on the target instance, and so many of those features will now not work or do something unexpected and need re-configuring to use the target instance's MID Servers.
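
As an illustration of how to spot these broken references after a clone, the following sketch compares MID Server references in copied settings against the MID Servers that actually exist on the target. The function name find_broken_mid_references, the dictionary keys and the short sys_ids are hypothetical. You would fill the inputs from exports of the relevant tables.

```python
def find_broken_mid_references(records, target_mid_sys_ids):
    """records: list of dicts with 'table', 'sys_id' and 'mid_server' keys,
    where 'mid_server' holds the referenced MID Server sys_id (or '' if unset).
    target_mid_sys_ids: set of MID Server sys_ids present on the target instance.
    Returns the records whose MID Server reference no longer resolves."""
    return [
        r for r in records
        if r["mid_server"] and r["mid_server"] not in target_mid_sys_ids
    ]

# Made-up example: one schedule still points at a production-only MID Server.
records = [
    {"table": "discovery_schedule", "sys_id": "s1", "mid_server": "prod_mid"},
    {"table": "discovery_schedule", "sys_id": "s2", "mid_server": "dev_mid"},
    {"table": "discovery_schedule", "sys_id": "s3", "mid_server": ""},
]
broken = find_broken_mid_references(records, {"dev_mid"})
print([r["sys_id"] for r in broken])
```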

Post-Clone Cleanup Scripts can be used to fix or prevent a lot of these settings after a clone. e.g.
KB0789119 A Post-Clone script to Deactivate and Cancel Discovery Schedules

It is possible to configure MID Servers to have the same sys_id as the production MID Servers. If the installations are on the same host server, the names need to be different, but the sys_ids can be the same. This process and the pros and cons are explained in more detail in:
KB0719301 How to manually Clone a MID Server so that you don't have to reconfigure all integrations features to use a different MID Server on the sub-prod instance after an Instance Clone

Warning: Prior to New York, Discovery Schedules set to specific production MID Servers may unexpectedly run after clones:
PRB1311068 / KB0788922 After a Clone, a Discovery Schedule for a Specific MID Server, which does not exist in the target instance and so is now a broken reference, will still run in a random MID Server

Attachments

Often all record attachments, or just large attachments, are excluded in clones. These are not all data, and can be part of out-of-box code too. These should be copied across in clones, but often are not and the symptoms of this may not be obvious.

Clones that change the Version of the instance

After a clone, existing sub-prod MID Servers will suddenly find themselves communicating with a 'different' instance compared to the previous incarnation of that instance. Each MID Server will still try to connect, and will then try to upgrade/downgrade to match the version the instance is now on.

Upgrades should work, and plenty of regression testing is done every release to make sure they do.

However it is possible to have a situation when the instance ends up on an earlier version (e.g. a sub-prod instance is used for an upgrade test, and then cloned over again), and then the MID Server may have problems. A MID Server running a future version compared to the instance isn't going to be able to communicate with the instance properly using API and encryption/validation features that are only present in the future instance version.
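
To decide in advance whether a clone will put an existing MID Server into an upgrade or a downgrade situation, a rough sketch like the following can help. It compares release family names only and ignores patch and hotfix levels, which is a simplifying assumption. The function name version_change is hypothetical, and the family list only covers the releases mentioned in this article.

```python
# Release families mentioned in this article, oldest first.
FAMILIES = ["Madrid", "New York", "Orlando", "Paris", "Quebec", "Rome"]

def version_change(mid_family, instance_family_after_clone):
    """Classify what the MID Server will attempt after the clone."""
    before = FAMILIES.index(mid_family)
    after = FAMILIES.index(instance_family_after_clone)
    if after > before:
        return "upgrade"
    if after < before:
        return "downgrade"
    return "same family"

print(version_change("Rome", "Quebec"))
print(version_change("Orlando", "Paris"))
```

A "downgrade" result is the case where you should be prepared for manual repair of the MID Server.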

A downgrade from a version that uses Signed ZIP files to one that doesn't will require a MID Server restart before the downgrade will work:
MID Server fails to upgrade from a Signed ZIP file version to a Non-Signed version, because instanceinfo is cached. e.g. MP10 HF1b to NP8, or NP9 to OP2

Quebec uses a new MID Server Unified Keystore, which is replaced as part of an upgrade to Quebec, but is not put back to the old method when downgrading.

In a Downgrade situation, especially between major versions, you are going to need to be prepared to manually re-install the MID Server. For more details of that repair process see:
KB0713557 How to manually Upgrade and/or Restore a MID server after a failed Upgrade

A rekey may be all that is required, if you see "unable to decrypt" messages in the agent log or issues records. This seems to work after Rome to Quebec downgrades.

Agent Client Collector

Each Agent Client Collector installation communicates with a MID Web Server Extension running on a specific MID Server, and therefore a specific instance name. The ACC Websocket Endpoint and MID Web Server Contexts need to still be there after a clone, with the same port and credential settings, or the Agent Client Collectors will not be able to communicate with the MID Server and instance any more.

By version 2.1.41 (cGTM/202009) these exclude/preserve settings were included (PRB1408738).

Also see Attachments section above for Agent Client Collector Plugin [sn_agent_asset] attachments issues.

Discovery and Service Mapping Patterns

Discovery Patterns, including Oracle and Amazon EC2, depend on some synched files in the Uploaded File [sa_uploaded_file] table.

If some attachments (sys_attachment/sys_attachment_doc) are excluded and not preserved, some Discovery Patterns will no longer work. That can be resolved by exporting the sa_uploaded_file records from the clone source as XML, which will automatically include the attachments in the export, and importing them into the clone target to repair the missing attachments.

For searchability, the attachment filenames and sa_uploaded_file sys_ids are:

getEC2Detailsv3.ps1 fef7f75e1bfcd0107e02fc078b4bcb16
Oracle_PDBS.sql ee37ed860f02001003bea6f6bc767e66
Oracle_CDB.sql 95c66d860f02001003bea6f6bc767e4a
options_packs_usage_statistics.sql 06f603f2db0500104d2f9eb5db9619e7
Oracle_instance_size.sql bc515611dbdff34003a05561ca961992
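
The list above can be turned into a quick check of which files need repairing on the clone target. This is a minimal sketch: the function name missing_pattern_files is hypothetical, and it assumes you have already collected the set of sa_uploaded_file sys_ids that still have their attachment on the target.

```python
# Filenames and sa_uploaded_file sys_ids taken from the list above.
PATTERN_FILES = {
    "getEC2Detailsv3.ps1": "fef7f75e1bfcd0107e02fc078b4bcb16",
    "Oracle_PDBS.sql": "ee37ed860f02001003bea6f6bc767e66",
    "Oracle_CDB.sql": "95c66d860f02001003bea6f6bc767e4a",
    "options_packs_usage_statistics.sql": "06f603f2db0500104d2f9eb5db9619e7",
    "Oracle_instance_size.sql": "bc515611dbdff34003a05561ca961992",
}

def missing_pattern_files(attached_sys_ids):
    """Return the filenames whose sa_uploaded_file sys_id is not in the
    set of records that still have an attachment on the target."""
    return sorted(
        name for name, sys_id in PATTERN_FILES.items()
        if sys_id not in attached_sys_ids
    )

# Example: only the EC2 script and Oracle_PDBS.sql survived the clone.
present = {"fef7f75e1bfcd0107e02fc078b4bcb16", "ee37ed860f02001003bea6f6bc767e66"}
missing = missing_pattern_files(present)
print(missing)
```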

More info on file sync:
KB0852276 : How MID Server File Synchronisation works, to help when Troubleshooting

Health Log Analytics

MID Server Extensions for the streaming of logs via the MID Server to the Occultus instance will be affected by Clones, and the following KB is the main source of information on this:
KB0867530 System Clone - Health Log Analytics | HLA Quebec

Full Clones

All the settings related to Clone Exclude and Preserve settings are ignored if the "Exclude tables specified in Exclusion List" and "Exclude audit and log data" options are un-checked when requesting the clone. We call doing that a Full Clone, and you can expect to have to set up all MID Server related things again after the clone.

A problem ticket does exist for this behaviour, even though from the point of view of the full clone this is 'working as expected', because MID Servers will still be broken. Support Cases caused by this should still be linked to it even though no change in behaviour is currently planned:
PRB1251951 / KB0677995 Clone fails to exclude tables like ecc_agent when unchecking the boxes "Exclude audit and log data" and "Exclude large attachment data"