Troubleshooting guide for Service Mapping ML Connection Suggestions



Troubleshooting for ML connection suggestions

Overview and architecture

Connection Suggestions prerequisites

Problem: Missing AFP due to a stuck job / failed trainer / trainer availability

Solution: Check the clustering solution

Problem: Missing connection suggestion

Solution: Validate existence of relevant traffic info

Solution: Check sa_ml_process_to_process is populated and ready for ML training

Solution: Check there are no missing Source/Target AFP in sa_ml_process_to_process

Problem: Confidence level field in suggestion records is missing or showing “N/A”

Solution: Validate that the traffic connection was collected

Solution: Check there are no missing Source/Target AFP in sa_ml_process_to_process

Problem: Missing Confidence (or missing AFP) because the Predictive Intelligence solution is stuck in “waiting for training” or its progress percentage does not move

Solution: Cancel the solution's pending jobs

Solution: Full cleanup if canceling the job didn't resolve the training issue

Recovery procedure for SM ML based feature after trainer availability failure

Post-validation a few hours after running the sysauto_script jobs


 

Overview and architecture

The Connection Suggestions feature introduces a new approach to traffic-based service mapping, using a new ML algorithm and a new UI. The algorithm classifies each connection into one of several confidence levels of relation to an application service, based on the connection's source and target AFPs (Application Fingerprints).

The feature is enabled using system properties:

glide.app.p2p.conn.map.enabled=true

process.clustering.appfingerprint.enabled=true

(OOTB, these system properties are not present; when a property is absent, its default value is true.)
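A minimal sketch of how the two properties above could be evaluated, assuming that a property which is not present means the OOTB default of true applies. The property getter is passed in so the logic can run outside an instance; on an instance it could wrap gs.getProperty(name, null), but check how your release reports a missing property before relying on that. The function and variable names are hypothetical, and console.log is used only for the local example (use gs.log in a background script).

```javascript
// Hypothetical helper: evaluates the Connection Suggestions feature properties.
// Assumption: an absent property (null/undefined) means the OOTB default of "true".
var FEATURE_PROPERTIES = [
  'glide.app.p2p.conn.map.enabled',
  'process.clustering.appfingerprint.enabled'
];

function isFlagEnabled(value) {
  if (value === null || value === undefined) {
    return true; // property not present: default applies
  }
  return String(value).trim() === 'true';
}

// getProperty(name) must return the property value, or null when it does not exist.
function featureFlagStatus(getProperty) {
  var status = {};
  FEATURE_PROPERTIES.forEach(function (name) {
    status[name] = isFlagEnabled(getProperty(name));
  });
  return status;
}

// Local example with an in-memory stand-in for the instance properties.
var sample = { 'glide.app.p2p.conn.map.enabled': 'false' };
var result = featureFlagStatus(function (name) {
  return Object.prototype.hasOwnProperty.call(sample, name) ? sample[name] : null;
});
console.log(JSON.stringify(result));
// prints {"glide.app.p2p.conn.map.enabled":false,"process.clustering.appfingerprint.enabled":true}
```

The injected getter keeps the "absent means true" rule in one place, so the same function can be checked locally and then reused on an instance.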

 

Once the feature is enabled, traffic-based connections are not added to the maps automatically. Instead, the user sees the list of connection suggestions with their classifications and decides which connections to add to the map.

The Connection Suggestions ML model depends on a trained Application Fingerprint model.

Both models rely on successful horizontal discovery.

Connection suggestions also rely on top-down discovery.

Connection Suggestions prerequisites

1.     Validate that the Predictive Intelligence (PI) plugin is installed.

2.     Validate that the App Fingerprints table cmdb_process_groups is populated. If it is not, follow the guidelines in App Fingerprints prerequisites (step 5 of the recovery procedure below) before proceeding.

3.     Check that the system property "sa_ml.connection_suggestions.active" is set to "true", or does not exist.
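The checklist above can be expressed as a small evaluator that returns the prerequisites that are not met. This is a sketch under assumptions: the input field names (piPluginActive, processGroupCount, connectionSuggestionsActive) are hypothetical, and the values are expected to be collected separately, for example by counting cmdb_process_groups records on the instance. A null or undefined connectionSuggestionsActive stands for a property that does not exist.

```javascript
// Hypothetical evaluator for the Connection Suggestions prerequisites.
// input.piPluginActive              - boolean, collected separately
// input.processGroupCount           - number of cmdb_process_groups records
// input.connectionSuggestionsActive - value of sa_ml.connection_suggestions.active,
//                                     or null/undefined if the property does not exist
function checkPrerequisites(input) {
  var failures = [];

  if (!input.piPluginActive) {
    failures.push('Predictive Intelligence (PI) plugin is not installed');
  }

  if (!(input.processGroupCount > 0)) {
    failures.push('cmdb_process_groups is empty: complete the App Fingerprints prerequisites first');
  }

  var active = input.connectionSuggestionsActive;
  var propertyMissing = active === null || active === undefined;
  if (!propertyMissing && String(active) !== 'true') {
    failures.push('sa_ml.connection_suggestions.active is "' + active + '" instead of "true"');
  }

  return failures;
}

// Local example: plugin installed, property absent, but no process groups yet.
var report = checkPrerequisites({
  piPluginActive: true,
  processGroupCount: 0,
  connectionSuggestionsActive: null
});
console.log(report.length === 0 ? 'All prerequisites met' : report.join('\n'));
```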

 

 

 

Problem: Missing AFP due to a stuck job / failed trainer / trainer availability

Solution: Check the clustering solution

  1. Follow the ML clustering solution troubleshooting and check that the solution has completed (see step 6.4 of the recovery procedure below).


Problem: Missing connection suggestion

Solution: Validate existence of relevant traffic info

  1. Review the cmdb_tcp table and filter to an IP and port, so that you can see the connections to that IP on the same port.
  2. To establish a connection, both sides of the connection to the IP on the same port must be present, and each connection side must have a process ID. Rows that exist for both sides but lack a process ID are not enough to establish the connection.
  3. If other traffic-related issues come up, use the following traffic troubleshooting guide to make sure the relevant traffic info is collected:
    https://support.servicenow.com/kb?id=kb_article_view&sysparm_article=KB0832798
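To make the rule above concrete, here is a sketch that checks one IP and port against a set of TCP rows: both connection sides must be present, and each side needs a process ID. The row shape (ip, port, side, processId) is a simplified, hypothetical model and not the actual cmdb_tcp column names; on an instance the rows would first have to be read from cmdb_tcp and mapped to it.

```javascript
// Simplified, hypothetical row model (not the real cmdb_tcp columns):
//   { ip: '10.0.0.5', port: 443, side: 'listening' | 'connecting', processId: '...' | null }
function diagnoseEndpoint(rows, ip, port) {
  var sides = { listening: [], connecting: [] };
  rows.forEach(function (row) {
    var knownSide = row.side === 'listening' || row.side === 'connecting';
    if (knownSide && row.ip === ip && row.port === port) {
      sides[row.side].push(row);
    }
  });

  var problems = [];
  ['listening', 'connecting'].forEach(function (side) {
    var sideRows = sides[side];
    if (sideRows.length === 0) {
      problems.push('missing ' + side + ' side for ' + ip + ':' + port);
      return;
    }
    // Assumption: each row on a side should carry a process ID.
    var withoutPid = sideRows.filter(function (row) { return !row.processId; }).length;
    if (withoutPid > 0) {
      problems.push(side + ' side has ' + withoutPid + ' row(s) without a process ID');
    }
  });
  return problems; // empty array: both sides present, all with process IDs
}

// Local example: both sides exist for 10.0.0.5:443, but neither has a process ID.
var rows = [
  { ip: '10.0.0.5', port: 443, side: 'listening', processId: null },
  { ip: '10.0.0.5', port: 443, side: 'connecting', processId: null },
  { ip: '10.0.0.9', port: 80, side: 'listening', processId: 'pid-1' }
];
console.log(diagnoseEndpoint(rows, '10.0.0.5', 443));
```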

 

 

Solution: Check sa_ml_process_to_process is populated and ready for ML training

  1. The sa_ml_process_to_process table stores connections between two processes, built from the TCP data collected during discovery.
  2. It depends on the cmdb_tcp and cmdb_process_groups tables.
  3. Review the sa_ml_process_to_process table for existing records.
  4. The “Is Complete” column indicates whether the row is ready to be sent to the trainer. False means that one of the AFPs is still missing.
  5. Check that all records have “Is Complete” = true.
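The checks in steps 3 to 5 can be summarized over an export of the table with a sketch like the following. It assumes each row is a plain object with is_complete plus source and target AFP values under the hypothetical names source_afp and target_afp; the real column names may differ. Because a boolean value may arrive as true, "true" or "1" depending on how it was read, the sketch normalizes it.

```javascript
// Hypothetical readiness summary for sa_ml_process_to_process rows.
function isTrue(value) {
  return value === true || value === 'true' || value === '1';
}

function summarizeReadiness(rows) {
  var summary = { total: rows.length, complete: 0, missingSource: 0, missingTarget: 0 };
  rows.forEach(function (row) {
    if (isTrue(row.is_complete)) {
      summary.complete++;
      return;
    }
    // An incomplete row means at least one AFP is still missing.
    if (!row.source_afp) summary.missingSource++;
    if (!row.target_afp) summary.missingTarget++;
  });
  // Ready only when there are rows and every row is complete.
  summary.ready = summary.total > 0 && summary.complete === summary.total;
  return summary;
}

var summary = summarizeReadiness([
  { is_complete: 'true', source_afp: 'afp1', target_afp: 'afp2' },
  { is_complete: 'false', source_afp: 'afp1', target_afp: '' },
  { is_complete: '0', source_afp: null, target_afp: 'afp3' }
]);
console.log(JSON.stringify(summary));
// prints {"total":3,"complete":1,"missingSource":1,"missingTarget":1,"ready":false}
```

Counting missing source and target AFPs separately points directly at the next check in this guide: looking up whether the affected process IDs have a related AFP.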

 

Solution: Check there are no missing Source/Target AFP in sa_ml_process_to_process

  1. If AFPs are missing in the Source/Target AFP columns, check whether the process ID has a related AFP by running the following background script:

        // Replace the placeholder with the process ID from the affected Source/Target row
        var processId = '<process ID>';
        var afpProviderJs = new AFPProviderJS();
        var afpSolutionVersion = afpProviderJs.getAFPSolutionVersion();
        var sourceGroupID = afpProviderJs.getAFPIDForProcessVersion(processId, afpSolutionVersion);
        gs.log('sourceGroupID=' + sourceGroupID);

  2. If the script logs sourceGroupID=null, the process has no related AFP and the solution needs to be restarted by following this procedure: Recovery procedure for SM ML based feature after trainer availability failure
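When several rows are affected, the same lookup can be run over a list of process IDs. The sketch below returns the process IDs without an AFP. The lookup function is injected; on an instance it could wrap the AFPProviderJS calls from the background script above, as in the commented wiring. The helper name is hypothetical, and treating null, undefined and an empty string as "no AFP" is an assumption.

```javascript
// Hypothetical batch version of the AFP lookup above.
// lookupAfp(processId) returns the AFP ID, or null/undefined/'' when none exists.
function findProcessesWithoutAfp(processIds, lookupAfp) {
  var missing = [];
  processIds.forEach(function (processId) {
    var afpId = lookupAfp(processId);
    if (afpId === null || afpId === undefined || afpId === '') {
      missing.push(processId);
    }
  });
  return missing;
}

// Possible wiring in a global-scope background script (not verified on every release):
// var afpProviderJs = new AFPProviderJS();
// var afpSolutionVersion = afpProviderJs.getAFPSolutionVersion();
// var missing = findProcessesWithoutAfp(['<process ID 1>', '<process ID 2>'], function (id) {
//   return afpProviderJs.getAFPIDForProcessVersion(id, afpSolutionVersion);
// });
// gs.log('Processes without AFP: ' + missing.join(', '));

// Local example with a stand-in lookup table.
var knownAfps = { p1: 'afp_10', p2: null };
var missing = findProcessesWithoutAfp(['p1', 'p2', 'p3'], function (id) {
  return knownAfps[id];
});
console.log(missing.join(', ')); // prints: p2, p3
```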

 

 

Problem: Confidence level field in suggestion records is missing or showing “N/A”

Solution: Validate that the traffic connection was collected

  1. Review the cmdb_tcp table and filter to the relevant IPs/hosts that are known to be part of the relevant service.
  2. If traffic info is missing, review the following troubleshooting guide for further steps to validate that traffic is collected:
    https://support.servicenow.com/kb?id=kb_article_view&sysparm_article=KB0832798

Solution: Check there are no missing Source/Target AFP in sa_ml_process_to_process

  1. If AFPs are missing in the Source/Target AFP columns, check whether the process ID has a related AFP by running the following background script:

        // Replace the placeholder with the process ID from the affected Source/Target row
        var processId = '<process ID>';
        var afpProviderJs = new AFPProviderJS();
        var afpSolutionVersion = afpProviderJs.getAFPSolutionVersion();
        var sourceGroupID = afpProviderJs.getAFPIDForProcessVersion(processId, afpSolutionVersion);
        gs.log('sourceGroupID=' + sourceGroupID);

  2. If the script logs sourceGroupID=null, the process has no related AFP and the solution needs to be restarted by following this procedure: Recovery procedure for SM ML based feature after trainer availability failure

Problem: Missing Confidence (or missing AFP) because the Predictive Intelligence solution is stuck in “waiting for training” or its progress percentage does not move

Solution: Cancel the solution's pending jobs

1.     Check the ml_connection_analysis_result table for the solution state and progress.

2.     For the *_application_suggestion solution:

2.1.  Navigate to System Definition->Scripts - Background.

2.2.  Run the following script in the global scope:

// Replace with your clustering solution name, e.g. ml_x_scb_global_global_application_suggestion_1
var mlSolution = sn_ml.ClusteringSolutionStore.get('ml_x_scb_global_global_application_suggestion_1');
var mlSolutionVersion = mlSolution.getActiveVersion();
mlSolutionVersion.cancelUpdateJob(); // cancel the pending update job

3.     For the *_proc_conn solution:

3.1.  Navigate to System Definition->Scripts - Background.

3.2.  Run the following script in the global scope:

// Replace with your data analysis solution name, e.g. ml_x_snc_global_global_proc_conn_1
var mlSolution = sn_ml.DataAnalysisStore.get('ml_x_snc_global_global_proc_conn_1');
var mlSolutionVersion = mlSolution.getActiveVersion();
mlSolutionVersion.cancelUpdateJob(); // cancel the pending update job
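Steps 2 and 3 differ only in which store is used: *_application_suggestion solutions are read through sn_ml.ClusteringSolutionStore and *_proc_conn solutions through sn_ml.DataAnalysisStore. The sketch below picks the store by name suffix and cancels the update job for several solutions in one run. The helper names are hypothetical, it assumes get() returns a falsy value when no solution matches (not verified), and the store objects are passed in so the selection logic can be tested with stand-ins.

```javascript
// Hypothetical helper: cancel pending update jobs for several ML solutions.
// stores.clustering   -> on an instance, sn_ml.ClusteringSolutionStore
// stores.dataAnalysis -> on an instance, sn_ml.DataAnalysisStore
function storeKindFor(solutionName) {
  if (/_application_suggestion(_\d+)?$/.test(solutionName)) return 'clustering';
  if (/_proc_conn(_\d+)?$/.test(solutionName)) return 'dataAnalysis';
  return null;
}

function cancelPendingJobs(solutionNames, stores) {
  return solutionNames.map(function (name) {
    var kind = storeKindFor(name);
    if (!kind) return name + ': skipped (unknown solution type)';
    var solution = stores[kind].get(name);
    if (!solution) return name + ': skipped (solution not found)';
    solution.getActiveVersion().cancelUpdateJob();
    return name + ': cancel requested';
  });
}

// Possible use in a global-scope background script:
// var results = cancelPendingJobs(
//   ['ml_x_scb_global_global_application_suggestion_1', 'ml_x_snc_global_global_proc_conn_1'],
//   { clustering: sn_ml.ClusteringSolutionStore, dataAnalysis: sn_ml.DataAnalysisStore });
// gs.log(results.join('\n'));
```

Reporting a status per name, rather than stopping at the first problem, makes it easy to see which solutions still need the full cleanup.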

 

 

Solution: Full cleanup if canceling the job didn't resolve the training issue

1.     Perform the procedure: Recovery procedure for SM ML based feature after trainer availability failure

Recovery procedure for SM ML based feature after trainer availability failure

1.     Delete the App Fingerprints solution

    1. Enter “ml_capability_definition_base.list” in the main navigation menu
    2. Filter the solution name by “*global_application_suggestion”
    3. Select all matching records and choose “Delete” from the actions menu
    4. Delete the following entries from sa_hash:

2.     Delete the Connection Suggestions solution

    1. Enter “ml_capability_definition_base.list” in the main navigation menu
    2. Filter the solution name by “*global_proc_conn”
    3. Select all matching records and choose “Delete” from the actions menu
    4. Delete the following entries from sa_hash:

4.     Delete the App Fingerprints process groups and connection suggestions tables

4.1.  Navigate to “System Definition->Tables” in the main navigation menu

4.2.  Open the cmdb_process_groups table

4.3.  Delete all records

4.4.  Do the same for sa_ml_process_to_process

4.5.  Do the same for sa_ml_connection_suggestions

4.6.  Do the same for ml_connection_analysis_result

4.7.  Do the same for P2P_TCP_table_date_index
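Steps 4.2 to 4.7 repeat the same action for several tables. The sketch below lists the table names from those steps and loops over them, with a dry-run mode that only reports what would be deleted. The deleter is injected; on an instance it could be built on GlideRecord.deleteMultiple(), which with no query conditions removes every record in the table, so verify each table name on your instance before running it for real. The helper names are hypothetical.

```javascript
// Hypothetical helper for the table cleanup in step 4.
// Table names are copied from steps 4.2-4.7; confirm them on your instance first.
var CLEANUP_TABLES = [
  'cmdb_process_groups',
  'sa_ml_process_to_process',
  'sa_ml_connection_suggestions',
  'ml_connection_analysis_result',
  'P2P_TCP_table_date_index'
];

// deleteAllRecords(table) performs the actual deletion; with dryRun = true it is never called.
function cleanupTables(tables, deleteAllRecords, dryRun) {
  return tables.map(function (table) {
    if (dryRun) {
      return 'would delete all records from ' + table;
    }
    deleteAllRecords(table);
    return 'deleted all records from ' + table;
  });
}

// Dry run: only prints the plan (use gs.log instead of console.log on an instance).
console.log(cleanupTables(CLEANUP_TABLES, null, true).join('\n'));

// Possible wiring on an instance (destructive, removes every record):
// cleanupTables(CLEANUP_TABLES, function (table) {
//   var gr = new GlideRecord(table);
//   gr.deleteMultiple();
// }, false);
```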

 

 

 

 

 

5.     App Fingerprints prerequisites

5.1.  Validate that the PI plugin is installed.

5.2.  Check that the system property "process.clustering.appfingerprint.enabled" is set to "true", or does not exist.

6.     Run the App Fingerprints scheduled job

6.1.  Enter “sysauto_script.list” in the main navigation menu

6.2.  Filter the name by “Applications suggestion - ITOM Autodiscovery”

6.3.  Click Execute Now

6.4.  Check that the solution has completed: go to ml_solution.list and filter the solution name by “*global_application_suggestion”

6.5.  Check that the cmdb_process_groups table has records

7.     Run the Connection Suggestions scheduled job

7.1.  Enter “sysauto_script.list” in the main navigation menu

7.2.  Filter the name by “Service Mapping - Traffic Process to Process”

7.3.  Click Execute Now

Post-validation a few hours after running the sysauto_script jobs

1.     Check that the data analysis solution exists in ml_solution and is in the complete state.

2.     Check that the solution connected to the trainer and finalized successfully: a recent results record should exist in the ml_connection_analysis_result table.

3.     Check that the sa_ml_process_to_process table has records. The minimum number of rows for showing results is 10K, so the table should contain at least 10K rows with is_complete = true.

4.     Check that the ml_data_analysis_sol_update table has records.

5.     The solution that is created can be found in the ml_solution table; its name contains proc_conn.

6.     After the initial solution is created, its updates are shown in the ml_data_analysis_sol_update table.
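The post-validation checks can be combined into one evaluator that lists what still fails. This is a sketch: the input names (procConnSolutionComplete, recentAnalysisResults, completeProcessRows, solutionUpdates) are hypothetical, the values are assumed to be collected separately (for example by counting records in each table), and the 10K threshold is read as "at least 10,000 rows with is_complete = true".

```javascript
// Hypothetical post-validation evaluator; inputs are collected separately.
var MIN_COMPLETE_ROWS = 10000; // "Minimum rows for showing results are 10K"

function postValidation(state) {
  var checks = [
    { name: 'proc_conn solution exists in ml_solution and is complete',
      ok: state.procConnSolutionComplete === true },
    { name: 'recent record in ml_connection_analysis_result',
      ok: state.recentAnalysisResults > 0 },
    { name: 'at least ' + MIN_COMPLETE_ROWS + ' sa_ml_process_to_process rows with is_complete = true',
      ok: state.completeProcessRows >= MIN_COMPLETE_ROWS },
    { name: 'ml_data_analysis_sol_update has records',
      ok: state.solutionUpdates > 0 }
  ];
  var failed = checks
    .filter(function (c) { return !c.ok; })
    .map(function (c) { return c.name; });
  return { passed: failed.length === 0, failed: failed };
}

// Local example: everything is in place except the row threshold.
var outcome = postValidation({
  procConnSolutionComplete: true,
  recentAnalysisResults: 1,
  completeProcessRows: 8500,
  solutionUpdates: 3
});
console.log(outcome.passed ? 'Post-validation passed' : 'Failed: ' + outcome.failed.join('; '));
```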