MID Server SNMP troubleshooting

Description

MID Servers can run SNMP queries against target CIs. MID Server SNMP is used by multiple applications, including Discovery, Service Mapping, and Orchestration. This article covers some of the tools that can be used to troubleshoot MID Server SNMP issues.

Troubleshooting Tools

Very often, the issue with an application that uses SNMP (Discovery, Orchestration, etc.) is that the SNMP data is returned incompletely or not at all. If the data is returned, the investigation needs to focus on a different area of the application, such as a script include or business rule. A good starting point for investigating an issue in an application that depends on SNMP is therefore to check whether the data is collected successfully.

Two of the main reasons an SNMP query may not collect the desired data are:

- Invalid SNMP credential
- SNMP query timeout

Some of the tools that can be used to confirm whether the data is returned are:

- MID Server logs
- SNMP walk tools
- Wireshark

Credential Test

To test an SNMP credential:

1. Navigate to "Discovery > Credentials".
2. Select the SNMP credential used for the SNMP discovery.
3. Click the UI link "Test Credential".
4. Fill in "Target" and "MID Server".
5. Click "OK".

Note: The Shazzam probe and the SNMP credential test query iso.org.dod.internet.mgmt.mib-2.system.sysDescr (1.3.6.1.2.1.1.1). SNMP probes, however, request OID mgmt.mib-2.system.sysObjectID (1.3.6.1.2.1.1.2) before running the actual SNMP request, to determine whether the credential is valid. A target device must answer both 1.3.6.1.2.1.1.1 and 1.3.6.1.2.1.1.2 to be discovered successfully. Therefore, when troubleshooting SNMP discovery issues, test both OIDs.

Examples

Review MID Server logs

To get more detailed information in the MID Server logs for SNMP queries, add the parameter mid.log.level = debug (see "Add a MID Server parameter"). Then reproduce the issue and review the MID Server log files.
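Debug logs can be long, so it may help to summarize the failed SNMP requests with a short script. The following is a minimal sketch, not a ServiceNow utility: it assumes the debug lines follow the "... attempt of getTable failed on target: ..., OIDs: [...], error: ..." layout seen in the example logs in this article, and the function name summarize_failures is hypothetical.

```python
import re
from collections import Counter

# Matches debug lines such as:
#   "... First attempt of getTable failed on target: /161, OIDs: [1.3..., 1.3...], error: Request timed out."
# The layout is an assumption based on the example logs in this article.
FAILED_ATTEMPT = re.compile(
    r"(?P<attempt>First|Second|Third) attempt of getTable failed on target: "
    r"(?P<target>[^,]*), OIDs: \[(?P<oids>[^\]]*)\], error: (?P<error>.+?)\s*$"
)


def summarize_failures(log_lines):
    """Count failed getTable attempts, keyed by (first OID in the request, error text)."""
    failures = Counter()
    for line in log_lines:
        match = FAILED_ATTEMPT.search(line)
        if match:
            first_oid = match.group("oids").split(",")[0].strip()
            failures[(first_oid, match.group("error"))] += 1
    return failures


if __name__ == "__main__":
    # Sample lines copied from the failed-query example log in this article.
    sample = [
        "08/30/18 07:29:04 (075) Worker-Interactive:SNMP DEBUG: Using GETBULK",
        "08/30/18 07:29:04 (122) Worker-Interactive:SNMP DEBUG: First attempt of getTable failed on target: /161, "
        "OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], error: Request timed out.",
        "08/30/18 07:29:04 (169) Worker-Interactive:SNMP DEBUG: Second attempt of getTable failed on target: /161, "
        "OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], error: Request timed out.",
    ]
    for (oid, error), count in summarize_failures(sample).items():
        print(f"{count} failed attempt(s) for request starting at {oid}: {error}")
```

To use it against a real log, pass the open log file (an iterable of lines) to summarize_failures. Repeated "Request timed out." entries for the same OIDs point toward the timeout cause rather than the credential cause.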
Review the following two documents on how to collect the MID Server files:

- Monitor the MID Server
- Manage ECC Queue content for a MID Server

For detailed SNMP logs:

1. Go to the "agent\conf" folder.
2. Open the wrapper-override file.
3. Add the following line to the additional Java parameters:
   wrapper.java.additional.201=-Dsnmp4j.LogFactory=com.service_now.mid.extension.trap.Snmp4j2DiscoLogFactory
4. Restart the MID Server.
5. Reproduce the issue.

Two example logs are shown next. The first is from a successful query where all the OIDs for an SNMP - Classify probe were returned. The second is from a partially successful query where only a fraction of the OIDs were returned. The first Classify probe was run with the default timeout of 1500 ms. The second probe had the timeout set to 10 ms to simulate a timeout.

Example log showing successful SNMP query:

08/29/18 11:32:52 (911) Worker-Interactive:SNMP Worker starting: SNMP source:
08/29/18 11:32:52 (926) Worker-Interactive:SNMP DEBUG: Timeout: 1500, Retries: 2
08/29/18 11:32:53 (004) Worker-Interactive:SNMP DEBUG: Using GETBULK
08/29/18 11:32:53 (004) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], max rows: 10
08/29/18 11:32:53 (051) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.4.1.9.9.46.1.3.1.1.3], max rows: 10
08/29/18 11:32:53 (051) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.2.2.1.1, 1.3.6.1.2.1.2.2.1.2, 1.3.6.1.2.1.2.2.1.3, 1.3.6.1.2.1.2.2.1.6, 1.3.6.1.2.1.2.2.1.7, 1.3.6.1.2.1.2.2.1.8], max rows: 10
08/29/18 11:32:53 (114) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.47.1.1.1.1.11, 1.3.6.1.2.1.47.1.1.1.1.13, 1.3.6.1.2.1.47.1.1.1.1.2, 1.3.6.1.2.1.47.1.1.1.1.12, 1.3.6.1.2.1.47.1.1.1.1.4], max rows: 10
08/29/18 11:32:53 (161) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.4.22.1.1, 1.3.6.1.2.1.4.22.1.2, 1.3.6.1.2.1.4.22.1.3], max rows: 10
08/29/18 11:32:53 (161) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.25.3.2.1.2, 1.3.6.1.2.1.25.3.2.1.3], max rows: 10
08/29/18 11:32:53 (161) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.43.5.1.1.17], max rows: 10
08/29/18 11:32:53 (176) Worker-Interactive:SNMP DEBUG: Event: GenericScalarMetricEvent
08/29/18 11:32:53 (176) Worker-Interactive:SNMP DEBUG: Event: CheckSessionCanceledEvent, correlator: , sysID: 405c1f5cdb54a7008597d8c75e961967, canceled: false
08/29/18 11:32:53 (176) Worker-Interactive:SNMP Enqueuing: C:\ServiceNow\emprcoeljak\agent\work\monitors\ECCSender\output_0\ecc_queue.405c1f5cdb54a7008597d8c75e961967.xml
08/29/18 11:32:53 (176) Worker-Interactive:SNMP DEBUG: Event: GenericCounterMetricEvent
08/29/18 11:32:53 (192) Worker-Interactive:SNMP DEBUG: ** enqueued C:\ServiceNow\emprcoeljak\agent\work\monitors\ECCSender\output_0\ecc_queue.405c1f5cdb54a7008597d8c75e961967.xml
08/29/18 11:32:53 (192) Worker-Interactive:SNMP DEBUG: Event: MessageProcessedEvent, sysID: 405c1f5cdb54a7008597d8c75e961967
08/29/18 11:32:53 (192) Worker-Interactive:SNMP DEBUG: Event: SendMessageEvent, message: SNMP SNMP - Classify: 61 OIDs
08/29/18 11:32:53 (192) Worker-Interactive:SNMP Worker completed: SNMP source: time: 0:00:00.250

Example log showing failed SNMP query:

08/30/18 07:29:03 (997) Worker-Interactive:SNMP DEBUG: Timeout: 10, Retries: 2
08/30/18 07:29:03 (997) Worker-Interactive:SNMP DEBUG: Snmp4jSessionFactory: connection created for key SnmpSessionPoolKey[target: &port:161&fixed_cred:&tag:]
08/30/18 07:29:04 (075) Worker-Interactive:SNMP DEBUG: Using GETBULK
08/30/18 07:29:04 (075) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.4.22.1.1, 1.3.6.1.2.1.4.22.1.2, 1.3.6.1.2.1.4.22.1.3], max rows: 10
08/30/18 07:29:04 (075) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], max rows: 10
08/30/18 07:29:04 (122) Worker-Interactive:SNMP DEBUG: First attempt of getTable failed on target: /161, OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], error: Request timed out.
08/30/18 07:29:04 (122) Worker-Interactive:SNMP DEBUG: Second attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], max rows: 5
08/30/18 07:29:04 (169) Worker-Interactive:SNMP DEBUG: Second attempt of getTable failed on target: /161, OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], error: Request timed out.
08/30/18 07:29:04 (169) Worker-Interactive:SNMP DEBUG: Third attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], max rows: 5, forcing GETNEXT pdu type
08/30/18 07:29:04 (215) Worker-Interactive:SNMP DEBUG: Event: GenericScalarMetricEvent
08/30/18 07:29:04 (215) Worker-Interactive:SNMP DEBUG: Event: CheckSessionCanceledEvent, correlator: , sysID: 561ea3acdbdca7008597d8c75e96191a, canceled: false
08/30/18 07:29:04 (215) Worker-Interactive:SNMP Enqueuing: C:\ServiceNow\emprcoeljak\agent\work\monitors\ECCSender\output_0\ecc_queue.561ea3acdbdca7008597d8c75e96191a.xml
08/30/18 07:29:04 (215) Worker-Interactive:SNMP DEBUG: Event: GenericCounterMetricEvent
08/30/18 07:29:04 (231) Worker-Interactive:SNMP DEBUG: ** enqueued C:\ServiceNow\emprcoeljak\agent\work\monitors\ECCSender\output_0\ecc_queue.561ea3acdbdca7008597d8c75e96191a.xml
08/30/18 07:29:04 (231) Worker-Interactive:SNMP DEBUG: Event: MessageProcessedEvent, sysID: 561ea3acdbdca7008597d8c75e96191a
08/30/18 07:29:04 (231) Worker-Interactive:SNMP DEBUG: Event: SendMessageEvent, message: SNMP SNMP - Classify: 12 OIDs
08/30/18 07:29:04 (231) Worker-Interactive:SNMP Worker completed: SNMP source: time: 0:00:00.218

In the above example, some requests time out because of the low timeout configured, and only 12 OIDs are returned instead of 61.

SNMP Walk tool

Using an SNMP tool, we can confirm whether the results are returned as expected. Failure or partial success in retrieving OIDs with a third-party tool suggests the issue is not in the MID Server SNMP implementation, while consistent success with a third-party tool suggests the MID Server logs need to be reviewed for potential issues.

In the following example, a query for OID 1.3.6.1.2.1.1.1 is executed from the MID Server host. This OID is the sysDescr and returns a description of the device. Note that the commands may change depending on the SNMP tool used.

The following example uses SnmpWalk.exe. The credential was set to "publi", which is an incorrect community string for this device. The correct community string for this example is "public".

C:\SNMPWalk>.\SnmpWalk.exe -r:10.127.212.181 -c:"publi" -os:.1.3.6.1.2.1.1 -op:.1.3.6.1.2.1.1.1.0
%Failed to get value of SNMP variable. Timedout.

As seen above, there is no credential failure error. Instead of returning an error, the query eventually times out. In the following example the community string was corrected to "public".

C:\SNMPWalk>.\SnmpWalk.exe -r:10.127.212.181 -c:"public" -os:.1.3.6.1.2.1.1 -op:.1.3.6.1.2.1.1.1.0
OID=.1.3.6.1.2.1.1.1.0, Type=OctetString, Value=Linux Linux-Tomcat 3.10.0-327.el7.x86_64 31 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64

As seen above, once the community string was corrected, the sysDescr was returned (only part of it is shown) instead of the query timing out.

Note: It is important to run the test from the same host where the MID Server is installed and with the same configuration for the credential.

Network Traffic Monitoring Tool (Wireshark example)

A network traffic monitoring tool helps determine where the issue occurs. For example, we can confirm whether the packets are sent and whether a response is ever returned.
Setup:

1. Download and install Wireshark from https://www.wireshark.org/download.html.
2. Once installed, double-click the application icon to start the application.
3. Select the interface that will be used to collect traffic. In the following image, Ethernet is selected.

In the following example we review the traffic for an SNMP query of table mgmt.mib-2.printmib.prtMarkerColorant.prtMarkerColorantTable prtMarkerColorantValue. We can see from the ecc_queue record what was returned:

Use the display filter "udp && ip.addr == <target_ip>" to show only the SNMP traffic to the target device (in the screenshot, the IP was replaced with the loopback IP after the packets were collected).

The following screenshot shows data returned by the device in detail for one of the OIDs.

Note: Wireshark has both capture filters and display filters. Per https://wiki.wireshark.org/CaptureFilters, "Capture filters (like tcp port 80) are not to be confused with display filters (like tcp.port == 80). The former are much more limited and are used to reduce the size of a raw packet capture. The latter are used to hide some packets from the packet list. Capture filters are set before starting a packet capture and cannot be modified during the capture. Display filters on the other hand do not have this limitation and you can change them on the fly". For system performance, it can help to set up a capture filter before doing a large packet capture.

Decrypt Wireshark SNMPv3

SNMPv3 traffic is encrypted and thus needs to be decrypted for review. Note that the following steps only decrypt the packets in memory.

1. Open the captured packets using the Wireshark application.
2. Go to Edit > Preferences > Protocols.
3. Select SNMP from the protocol list.
4. Click "Edit" next to the "Users Table".
5. Click the Add button and enter the following details:
   - Engine ID. The Engine ID can be collected from the encrypted Wireshark capture, as this value is not encrypted. To do so, open the SNMP packet header and look for the Engine ID string.
   - SNMPv3 username.
   - The authentication model (MD5 | SHA1).
   - The password for the authentication model.
   - The privacy protocol (DES | AES | AES192 | AES256).
   - The privacy password.

The packet content should now be decrypted.

Solutions

Confirm Correct Credentials

Incorrect credentials are more often than not the root cause. SNMP v1/v2 is simpler to configure as it only uses the community string. For SNMP v3, confirm that the Username, Authentication Protocol, Authentication Key, Privacy Protocol, and Privacy Key all match what is configured on the target device. A third-party SNMP walk tool can also be used to confirm the credential is correct.

Increase SNMP Timeout

At times the device may not be able to reply within the configured timeout, or there could be a network issue. In most cases, increasing the timeout increases the chances of retrieving the OIDs. The SNMP timeout can be configured per MID Server or directly on a probe. See the following documents for the available parameters for probes and MID Servers:

- SNMP Probes
- MID Server SNMP Configuration Parameters

Response Out of Range

A "Start OID" can be configured for some devices. The start OID is returned when a walk is done against such a device. Because the OID returned is not within the range requested, and is not a value of interest for the walk, the value is not used. No results are returned to the probe in such a configuration, even when the credentials are correct. This is expected behavior: probes request specific values to classify and update devices, and anything outside the requested ranges is disregarded. Logs like the example below can be seen on the MID Server when the MID Server log level is set to debug:

DefaultUDPTransportMapping_0.0.0.0/0 DEBUG: Response out of range. Received: iso.org.dod.internet.private.enterprises.f5.bigipTrafficMgmt.bigipSystem.sysGlobals.sysGlobalAttrs.sysGlobalAttr.sysAttrArpMaxEntries.0 (1.3.6.1.4.1.3375.2.1.1.1.1.1.0); Range is: 1.3.6.1.2.1.1.2 - 1.3.6.1.2.1.1.3. Request ID: 1853998886

In the example above, the request is for OIDs 1.3.6.1.2.1.1.2 - 1.3.6.1.2.1.1.3; however, the device returned 1.3.6.1.4.1.3375.2.1.1.1.1.1.0.

Note: OID 1.3.6.1.4.1.3375.2.1.1.1.1.1.0 is just an example and would likely be a different value in your case.

To resolve this issue, review the device configuration and adjust it so that the device returns the OIDs requested. Specific instructions cannot be provided, as they differ from device to device.

Context

Some probes need to use context to collect information when discovering Cisco devices. For example, the probes triggered by or after the SNMP - Switch probe need to pass context information in order to collect information for each VLAN. Without the context, only the default VLAN information is returned. Related articles:

- SNMP - Switch probe authentication error and Layer 2 data not collected
- SNMPv3 fails to gather information on layer 2 tables for the non-default Vlans (Cisco Switches Only)

Additional Information

- Discovery: Deep Dive - SNMP classification capabilities and properties
- Why Discovery may not return a Serial Number for a SNMP Device
- SNMP Probes
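As an illustration of the "Response out of range" case described under Solutions, the comparison can be sketched as a lexicographic check on OID components. This is a minimal sketch and not the MID Server's actual implementation: the function names are hypothetical, and treating the range as lower bound inclusive and upper bound exclusive is an assumption.

```python
def parse_oid(oid):
    """Convert a dotted OID string such as '1.3.6.1.2.1.1.2' into a tuple of ints."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def in_requested_range(received, lower, upper):
    """Return True when lower <= received < upper, comparing OIDs component by component.

    The inclusive/exclusive bounds are an assumption made for this sketch.
    """
    return parse_oid(lower) <= parse_oid(received) < parse_oid(upper)


if __name__ == "__main__":
    lower, upper = "1.3.6.1.2.1.1.2", "1.3.6.1.2.1.1.3"
    # A sysObjectID instance falls inside the requested range.
    print(in_requested_range("1.3.6.1.2.1.1.2.0", lower, upper))
    # The OID from the log example above falls outside it and would be disregarded.
    print(in_requested_range("1.3.6.1.4.1.3375.2.1.1.1.1.1.0", lower, upper))
```

Comparing the components as integers rather than as strings matters: a plain string comparison would sort "1.3.6.1.10" before "1.3.6.1.2", which is not the ordering SNMP uses.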