Multi-threaded Shazzam probes can return duplicate results for the same IP, causing duplicate Classify probes and overcounting of Completed in the Discovery Status


Description

Shazzam probes can return duplicate results for the same IP in the input, for example for the SNMP scanner. This causes the Classify probe to be launched twice for that IP. The Discovery Status Started value is incremented only once, but because duplicate probes are launched, the Completed count is incremented twice. As a result, the schedule ends prematurely, with the symptoms that follow from that.
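The counting problem can be illustrated with a toy model. This is only a sketch and not platform code: it assumes the status is treated as complete as soon as Completed reaches Started, and the function name and IP addresses are hypothetical.

```python
def probes_outstanding_at_completion(classify_ips):
    """Return how many Classify probes are still running when the
    (modelled) Discovery Status is first considered complete.

    classify_ips lists the IPs in the order their Classify probes
    finish, with duplicates included.
    """
    # Assumption: Started is incremented once per unique IP.
    started = len(set(classify_ips))
    completed = 0
    for finished, _ip in enumerate(classify_ips, start=1):
        # Assumption: Completed is incremented once per finished probe.
        completed += 1
        if completed >= started:
            return len(classify_ips) - finished
    return 0


# One IP was reported twice by Shazzam, so four probes run for three IPs.
with_duplicate = ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.3"]
without_duplicate = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

print(probes_outstanding_at_completion(with_duplicate))
print(probes_outstanding_at_completion(without_duplicate))
```

In this model, the duplicate makes Completed catch up with Started while one probe is still outstanding, which matches the symptom of inputs arriving after the status is marked complete.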

Orlando added a Shazzam performance optimization that runs the probe multi-threaded:
https://docs.servicenow.com/bundle/orlando-release-notes/page/release-notes/it-operations-management/discovery-rn.html

Steps to Reproduce

- Run a large discovery schedule, perhaps a /15, including SNMP devices.
- Leave the MID Server parameter mid.shazzam.threads at the default (5 threads).
- You may see ecc_queue inputs continue to come in from MID Servers after the Discovery Status has already been set as complete and Discovery Complete has been logged.

The Shazzam input will show two results for the same port for the same IP. The scanner/port is listed only once in the Shazzam output, which shows that the problem occurs while the Shazzam probe executes in the MID Server. Shazzam logging in the MID Server will not show anything unusual.

<result active="true" alive="true" ip_address="10.200.128.100"><scanner name="SNMP" port="161" portprobe="snmp" protocol="udp" result="open" service="snmp"><snmp_version>3</snmp_version><scanner_log><![CDATA[2020-12-14 03:48:50.102 phase 0
2020-12-14 03:48:50.102 sent [MDsCAQMwEQIEYJ5gOAIDAP//BAEEAgEDBBAwDgQAAgEAAgEABAAEAAQAMBEEAAQAoQsCAQACAQACAQAwAA==]
2020-12-14 03:48:51.386 phase 0
2020-12-14 03:48:51.386 received [MGcCAQMwEAIEYJ5gOAICBdwEAQACAQMEHzAdBAyAAAAJAwAIT6mDGsACAQICBAJEttMEAAQABAAwLwQMgAAACQMACE+pgxrABACoHQIBAAIBAAIBADASMBAGCisGAQYDDwEBBABBAins]
]]></scanner_log></scanner><scanner name="SNMP" port="161" portprobe="snmp" protocol="udp" result="open" service="snmp"><snmp_version>3</snmp_version><scanner_log><![CDATA[2020-12-14 03:48:49.102 phase 0
2020-12-14 03:48:49.102 sent [MDsCAQMwEQIEYJ5gOAIDAP//BAEEAgEDBBAwDgQAAgEAAgEABAAEAAQAMBEEAAQAoQsCAQACAQACAQAwAA==]
2020-12-14 03:48:50.102 phase 0
2020-12-14 03:48:50.102 received [MGcCAQMwEAIEYJ5gOAICBdwEAQACAQMEHzAdBAyAAAAJAwAIT6mDGsACAQICBAJEttIEAAQABAAwLwQMgAAACQMACE+pgxrABACoHQIBAAIBAAIBADASMBAGCisGAQYDDwEBBABBAinr]
]]></scanner_log></scanner>...
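A payload like the one above can be checked for repeated scanner entries with a short script. This is a minimal sketch, assuming the payload has been saved as well-formed XML; the `<results>` wrapper element, the function name, and the second IP address in the sample are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_duplicate_scanners(payload_xml):
    """Map each ip_address to the (name, port, protocol) scanner
    entries that appear more than once under its <result> element."""
    root = ET.fromstring(payload_xml)
    duplicates = {}
    # iter() also matches the root, so any wrapper element works.
    for result in root.iter("result"):
        counts = Counter(
            (s.get("name"), s.get("port"), s.get("protocol"))
            for s in result.findall("scanner")
        )
        repeated = {key: n for key, n in counts.items() if n > 1}
        if repeated:
            duplicates[result.get("ip_address")] = repeated
    return duplicates


# Reduced sample; the <results> wrapper is an assumption.
SAMPLE = """<results>
  <result active="true" alive="true" ip_address="10.200.128.100">
    <scanner name="SNMP" port="161" protocol="udp" result="open"/>
    <scanner name="SNMP" port="161" protocol="udp" result="open"/>
  </result>
  <result active="true" alive="true" ip_address="10.200.128.101">
    <scanner name="SNMP" port="161" protocol="udp" result="open"/>
  </result>
</results>"""

print(find_duplicate_scanners(SAMPLE))
```

Only the first IP is reported, because its SNMP scanner on UDP port 161 appears twice.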

The first Classifier output is triggered by that Shazzam input. 

The second Classifier output is triggered when the classify sensor processes the first Classifier's input record. This happens because the classifiers parameter value in the payload points to the SNMP port probe: when the sensor runs and sees that the probe failed, it checks the classifiers parameter value in the payload and launches the next classification probe based on priority.

The second Classify output doesn't have a shazzamSensorID parameter, which can help spot these. Another way to spot them is an ECC queue list filtered for Classify outputs and grouped by IP.
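The grouping by IP can also be done offline on exported ECC queue rows. The sketch below is illustrative only: it assumes each row is a dictionary with `queue`, `name`, and `source` keys, that `source` holds the target IP, and that Classify probes can be recognized by "Classify" in the name. Adjust these assumptions to match the actual export.

```python
from collections import Counter


def ips_with_multiple_classify_outputs(records):
    """Count Classify output rows per IP and keep IPs seen more than once."""
    counts = Counter(
        r["source"]
        for r in records
        # Assumed row shape: queue is "output" and the probe name contains "Classify".
        if r.get("queue") == "output" and "classify" in r.get("name", "").lower()
    )
    return {ip: n for ip, n in counts.items() if n > 1}


# Hypothetical exported rows.
rows = [
    {"queue": "output", "name": "SNMP - Classify", "source": "10.200.128.100"},
    {"queue": "output", "name": "SNMP - Classify", "source": "10.200.128.100"},
    {"queue": "output", "name": "SNMP - Classify", "source": "10.200.128.101"},
    {"queue": "input", "name": "SNMP - Classify", "source": "10.200.128.100"},
]

print(ips_with_multiple_classify_outputs(rows))
```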

Workaround

This problem is currently under review. You can contact ServiceNow Technical Support or subscribe to this Known Error article by clicking the Subscribe button at the top right of this form to be notified when more information becomes available.

It is possible to work around the issue by setting the MID Server parameter mid.shazzam.threads=1. This still runs the new Shazzam code, but with only one thread. A Shazzam probe uses a queue-based mechanism instead of chunking to manage and distribute its workload. This removes the setup and teardown costs of chunking, and the queuing mechanism lends itself to multi-threading and increased throughput.

Documentation for MID Server parameters (Orlando).


Related Problem: PRB1472496