Administrating and Troubleshooting CNO for Visibility (Cloud-Native-Operations)

Troubleshooting Deployment Issues

After deployment, check whether the informer pod is running. Replace INSTANCE_NAME and NAMESPACE in the commands below and run them:

    export POD_NAME=$(kubectl get pods --namespace NAMESPACE -l "app=k8s_informer-INSTANCE_NAME" -o jsonpath="{.items[0].metadata.name}")
    kubectl get pod "$POD_NAME" --namespace NAMESPACE

If the pod is not running, either Kubernetes failed to pull the informer image or the secret holding the instance credentials was not found. If the image pull failed, the pod is in the ErrImagePull or ImagePullBackOff state. If the secret does not exist, the pod stays in the ContainerCreating state. Run kubectl get events -n NAMESPACE to find the exact reason.

The informer expects a secret named k8s-informer-cred-INSTANCE_NAME in the relevant namespace. If this secret does not exist, the pod stays in the ContainerCreating state and you will see an event similar to this:

    51s Warning FailedMount pod/k8s-informer-cnotal5-67bd77784d-bzb7k MountVolume.SetUp failed for volume "credentials" : secret "k8s-informer-cred-cnotal5"

If the informer pod is running, run the following commands to see the informer logs. Remember to replace INSTANCE_NAME and NAMESPACE.

    export POD_NAME=$(kubectl get pods --namespace NAMESPACE -l "app=k8s_informer-INSTANCE_NAME" -o jsonpath="{.items[0].metadata.name}")
    kubectl logs "$POD_NAME" --namespace NAMESPACE

Failure to Access Kubernetes Resources

If the deployment uses a custom ClusterRole instead of the one provided with the Helm chart, that ClusterRole must grant list, get, and watch access to all relevant resources.
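One way to confirm that a custom ClusterRole grants these rights is to impersonate the informer's service account with kubectl auth can-i. This is a minimal sketch under stated assumptions: the service account is taken to be servicenow-INSTANCE_NAME in NAMESPACE (the naming seen in the "forbidden" log message in this section), the function name check_informer_rbac and the KUBECTL override are hypothetical helpers, and your own kubectl user must be allowed to impersonate service accounts.

```shell
#!/usr/bin/env bash
# Sketch: verify that the informer's service account can list, get and
# watch the given resources across all namespaces.
# Assumption: the service account is "servicenow-INSTANCE_NAME" in
# NAMESPACE, as in the example "forbidden" log message.
# KUBECTL can be overridden (for example with a stub) for testing.
KUBECTL="${KUBECTL:-kubectl}"

# Usage: check_informer_rbac NAMESPACE INSTANCE_NAME RESOURCE...
# Prints one "missing: VERB RESOURCE" line per denied permission and
# returns 1 if any permission is missing, 0 otherwise.
check_informer_rbac() {
  local namespace="$1" instance="$2"
  shift 2
  local sa="system:serviceaccount:${namespace}:servicenow-${instance}"
  local missing=0 resource verb
  for resource in "$@"; do
    for verb in list get watch; do
      # "kubectl auth can-i" exits 0 when the action is allowed.
      if ! "$KUBECTL" auth can-i "$verb" "$resource" \
          --all-namespaces --as="$sa" >/dev/null 2>&1; then
        echo "missing: $verb $resource"
        missing=1
      fi
    done
  done
  return "$missing"
}
```

For example, check_informer_rbac NAMESPACE INSTANCE_NAME pods services endpoints prints every verb/resource pair the service account is denied. The resource list here is only an illustration; pass the resources your informer actually needs.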
When the ClusterRole does not grant sufficient rights, the logs contain messages like:

    W1126 07:25:17.368897 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169: failed to list *v1.Endpoints: endpoints is forbidden: User "system:serviceaccount:k8s-informer-cnotal5:servicenow-cnotal5" cannot list resource "endpoints" in API group "" at the cluster scope

In that case, fix the ClusterRole. There is no need to restart the informer pod.

Failure on DNS Resolution of ServiceNow Instance Name

By default, the informer is installed with dnsPolicy set to Default. For more information on the dnsPolicy parameter, see the Kubernetes documentation on Pod DNS policy. If DNS resolution fails, you will see messages like this:

    2023/11/24 13:41:20 Failed to send ecc message with payload size: 0 Post https://cnotal5.service-now.com/api/now/table/ecc_queue: dial tcp: lookup cnotal5.service-now.com on 10.60.41.190:53: read udp 10.225.11.24:58409->10.60.41.190:53: i/o timeout

We recommend redeploying with dnsPolicy=ClusterFirst. If you install with Helm, add the command-line argument --set dnsPolicy=ClusterFirst. If you use k8s_informer.yaml, change the dnsPolicy value in the file.

Incorrect Secret Keys

The informer expects a secret named k8s-informer-cred-INSTANCE_NAME in the relevant namespace, and this secret must contain the keys ".user" and ".password". If the secret exists but one of the expected keys is missing, you will see messages like this in the logs:

    2023/11/26 07:55:58 Failed to send to SN instance Missing credentials. Will not try to get ECC messages

If you see this message, recreate the secret as described in the documentation and delete the informer pod. Kubernetes will restart it.

Invalid Credentials

If the credentials provided are invalid, the log messages look like this:

    2023/11/26 08:02:20 Failed to send to SN instance 401 Unauthorized. Invalid credentials

If you see this message, verify that you have the correct user and password, delete and recreate the secret with the correct values, and then delete the informer pod. Kubernetes will restart the pod.

Credentials with Insufficient Role

If the credentials are valid but the user does not have the discovery_admin role, the log messages look like this:

    2023/11/26 08:09:39 Failed to send to SN instance 403 Forbidden. User has insufficient roles

If you see this message, add the discovery_admin role to the user the informer uses on the ServiceNow instance. There is no need to restart the pod.

Turning off the TLS Certificate Validation

In some cases, because of network policies, the informer may fail to connect to the ServiceNow instance and report an x509 error in the logs. For example:

    2023/09/26 15:29:16 Failed to get ecc output messages Get https://myinstance.service-now.com/api/now/table/ecc_queue?sysparm_query=sys_created_onRELATIVEGT%40minute%40ago%4030%5Equeue=output%5Eagent=k8s_informer_2e105171-98c4-4780-9c10-675e12d8bb20%5Etopic=k8s_informer%5Estate=ready&sysparm_fields=sys_id,payload&sysparm_limit=1: x509: certificate signed by unknown authority

If the network issue cannot be resolved, we advise turning off TLS certificate validation:

With the Helm chart: add the command-line argument --set skipTLSCertificateValidation=true to your helm install command.
With k8s_informer.yaml: place the value “true” in the line under SKIP_TLS_CERT_VALIDATION.

Post Deployment Administration and Troubleshooting

Grabbing Logs

To grab the logs from the “CNO for Visibility” pod running in a given cluster, navigate to “CNO for Visibility/Home” and click the row of the cluster. Then click “Grab Informer Logs” in the “Related Links” section. Wait up to two minutes, then reload the page. The system grabs the last 1 MB of the most recent log and adds it as an attachment to the current record.
Each time you grab logs again, the system replaces the attachment with the newer log.

Running On-Demand Full Discovery

To start an on-demand full discovery, navigate to “CNO for Visibility/Home” and click the row associated with the cluster you want to discover. Then click “Full Discovery” in the “Related Links” section. The “Full Discovery Status” field changes to “In Progress”, and then to “Completed” once the discovery finishes. How long a full discovery takes depends on the cluster size and the load on the instance.

Pause/Resume the Informer

To pause or resume the informer, navigate to “CNO for Visibility/Home”, click the row associated with the relevant cluster, and then click Pause or Resume in the “Related Links” section. When you click Pause, the status field changes to “Pausing” and then, within one minute, to “Paused”. When you click Resume, the status changes to “Resuming” and then, within one minute, to “Up”.

Restarting the Informer

To restart the informer pod, navigate to “CNO for Visibility/Home”, click the row associated with the relevant cluster, and then click “Restart Informer” in the “Related Links” section. The informer process exits and Kubernetes restarts it.
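If the instance cannot reach the informer, a rolling restart from the command line also restarts it, by replacing its pod. This is a minimal sketch under stated assumptions: the Deployment is taken to be named k8s-informer-INSTANCE_NAME (inferred from the example pod name k8s-informer-cnotal5-67bd77784d-bzb7k, so verify it with kubectl get deployments --namespace NAMESPACE), and restart_informer, scan_informer_log and the KUBECTL override are hypothetical helpers. scan_informer_log maps the error messages described in this document to the section that covers them, which helps when checking the logs after a restart.

```shell
#!/usr/bin/env bash
# Sketch only: a command-line alternative to "Restart Informer".
# Assumption: the informer runs as a Deployment named
# k8s-informer-INSTANCE_NAME; verify the name in your cluster.
# KUBECTL can be overridden (for example with a stub) for testing.
KUBECTL="${KUBECTL:-kubectl}"

# Usage: restart_informer NAMESPACE INSTANCE_NAME [TIMEOUT]
# Triggers a rolling restart and waits until the rollout completes.
restart_informer() {
  local namespace="$1" instance="$2" timeout="${3:-120s}"
  local deploy="deployment/k8s-informer-${instance}"
  "$KUBECTL" rollout restart "$deploy" --namespace "$namespace" &&
    "$KUBECTL" rollout status "$deploy" --namespace "$namespace" --timeout="$timeout"
}

# Reads informer log text on stdin and prints, for each line that
# matches a known error message, the name of the troubleshooting
# section that describes it.
scan_informer_log() {
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      *"is forbidden"*) echo "Failure to Access Kubernetes Resources" ;;
      *"lookup "*"i/o timeout"*) echo "Failure on DNS Resolution of ServiceNow Instance Name" ;;
      *"Missing credentials"*) echo "Incorrect Secret Keys" ;;
      *"401 Unauthorized"*) echo "Invalid Credentials" ;;
      *"403 Forbidden"*) echo "Credentials with Insufficient Role" ;;
      *"x509:"*) echo "Turning off the TLS Certificate Validation" ;;
    esac
  done
}
```

After the informer has run for a while, a command such as kubectl logs deployment/k8s-informer-INSTANCE_NAME --namespace NAMESPACE | scan_informer_log | sort -u lists the sections worth reading; no output means none of the documented messages appeared.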