How to install SRM with Cloud Observability

Summary

What is Service Reliability Management (SRM)?
SRM helps organizations perform incident response using team-based site reliability engineering (SRE) practices to improve the reliability, availability, and health of digital applications and technical services. This application will replace Site Reliability Operations and Site Reliability Metrics. It helps teams set up Application Performance Monitoring (APM) integrations for alerting and provides visibility into SLO and error budget performance. SRM is currently in controlled Go-To-Market. SRM requires ITOM Operator Pro for the on-call and team response features, and AIOps Enterprise for the SLO, SLI, and error budget management capabilities.

What is Cloud Observability?
ServiceNow Cloud Observability brings together critical telemetry data (logs, metrics, and traces) to help teams improve security, workflows, collaboration, customer and employee experiences, and ROI. A solution that can scale and integrate into the modern software development lifecycle results in better performance, more innovation, and improved visibility across the entire user journey. Cloud Observability simplifies cloud complexity and helps teams realize the value of self-healing operations with an end-to-end cloud management solution that provides insight into the performance of the hybrid estate.

Why Cloud Observability with SRM?
With ServiceNow, organizations can tame the complexity of cloud-native environments by adopting technologies suited to the ephemeral nature of cloud-native workloads. With ServiceNow Cloud Observability, you gain end-to-end visibility and control across your revenue-generating apps and the services they run on, and you can understand how they affect the end user.
This breaks down the organizational silos that delay transformation and empowers Operations and Developer teams. ServiceNow and Cloud Observability are well positioned to help customers drive organization-wide technology transformation as they embrace cloud-native technologies to grow and scale their businesses.

Service Reliability Management and Cloud Observability provide end-to-end tracking and alerting for business-critical services. Cloud Observability sends service-specific metric alerts to SRM through a connector, and SRM uses these alerts to decrease the error budget of the defined SLOs. For customers who have adopted OpenTelemetry and use the Service Graph Connector for OpenTelemetry (SGC for OTel) to send application and cloud-native data to the CMDB, using Cloud Observability with SRM provides an end-to-end incident response tracking solution. With OpenTelemetry-based instrumentation, organizations can populate the Cloud Observability service list with instrumented and inferred services, which can then be integrated with SRM.

Prerequisites:
- Data available in Cloud Observability
- Service Reliability Management entitlement
- Service Graph Connector for OpenTelemetry

How to set up SLOs?
1. From the Service Operations workspace, click the ‘Services’ tab and select ‘add service’ to create or import the service to be monitored. Select or define the service you want to configure.
2. Select the team that will monitor this service. A new team can be created from the admin portal.
3. Click ‘Reliability Metrics’ and set the service level objective (SLO) against which the error budget will degrade.
4. At the Service Level Indicator (SLI) step, define the conditions under which alerts should degrade the SLO. Only alerts matching these conditions are considered, from their initial event time to their last event time. Create an indicator metric that SRM should capture when the APM tool sends it.
   Whenever an alert arrives and its metric name matches, the SLO starts to degrade.
5. Set the error budget window for the service. An error budget is the amount of error a service can accumulate over a given period before the user experience is negatively affected.

How to set up alerts in Cloud Observability (formerly Lightstep)?
1. In SRM, select the ‘Lightstep’ option in the integration window and copy the integration URL shown on the screen.
2. Go to the Cloud Observability portal home screen and click the ‘Alerts’ tab.
3. Go to ‘Notification destinations’ and click ‘Create a destination.’
4. From the dropdown, choose ‘Webhook’ and keep the default payload. Add a header that carries the user ID and password, and paste the URL copied from the SRM integration tab with its placeholders removed. The header value must be base64 encoded; setup instructions are in the following section.
5. Save the webhook and ‘Activate’ the integration from SRM.
6. Now that the destination is ready, click ‘Create an alert’. Search for the metric or service you want to track and provide an aggregation window.
7. Under ‘Notification rules’, select the webhook destination created earlier.
8. Test the alert from the notification destination tab and confirm that the alert appears in the SRM portal.

How to set up the basic auth header for the webhook destination in Cloud Observability?
On the command line, base64 encode the username and password pair, separated by a “:”. In the example below, “admin” is the username and “Passw0rd” is the password. Use printf rather than echo, because echo appends a trailing newline that would otherwise be encoded into the credentials:

$ printf '%s' 'admin:Passw0rd' | base64
YWRtaW46UGFzc3cwcmQ=

Use the returned value when setting up the Cloud Observability webhook destination. For standard HTTP Basic authentication, the header name is Authorization and its value is Basic followed by the encoded string (here, Basic YWRtaW46UGFzc3cwcmQ=).

Customer facing recommendations (FAQs or Best Practices)
1. How are existing services populated in SRM for monitoring?
   They are populated through the SGC for OpenTelemetry.
   The SGC must be installed and configured with your organization's project in Cloud Observability.
2. How do I install the SGC for OpenTelemetry on my ServiceNow instance?
   The SGC for OpenTelemetry is available on the ServiceNow Store. See Configure Service Graph Connector for OpenTelemetry.
3. Can SRM work without the SGC for OpenTelemetry?
   Yes. Without it, services must be created manually in SRM. Alerts still flow to SRM through the notification destination created for alerts in Cloud Observability.
4. Which entitlements are needed to install SRM?
   - ITOM Pro apps, for alert ingestion and monitoring
   - ITOM AIOps Enterprise, for SLO and SLI management
5. Who can view the services in SRM populated by the SGC for OpenTelemetry?
   Services are visible to you if you manage the service or a group, or if you are a member of a group that manages the service.
6. What is a good first Service Level Objective to create?
   Service-level RED metrics are a good place to start. RED stands for:
   - Rate (the number of requests per second): helps you manage request volume
   - Errors (the number of those requests that fail): helps you manage the error budget
   - Duration (the time those requests take): helps you manage service latency
7. How are multiple events from the same alert packaged?
   Events for the same alert are packaged into a single alert, and the time the alert stays open counts against the SLO. To reduce noise, the events within an alert are not shown as separate line items, although engineers can still manually close individual events or the alert.
8. What happens when a new event arrives for a closed alert?
   The alert is shown again with the reopen status.
9. What can be used as Service Level Indicators (SLIs)?
   SLIs are essentially filters that select which alerts count against the SLO. Fields such as Service State, System ID, Assigned To, or a metric name such as UpTime can be used as SLIs.
10. What are the setup steps after logging in to SRM?
    Setup is guided and uses two roles: SRM admin and SRM user. Configure the SRM admin first, because adding users, teams, and services depends on the admin.
11. How do I create an approver team for SRM?
    Add an existing team by entering its sys_id in the team governance tab of SRM.
12. How do I approve a team addition request?
    Owners of a service can approve or deny team addition requests. As the service owner, log in, open the team addition request, and approve it from the related records.
13. How do I add a user to the group that manages a service?
    Edit the team members on the ServiceNow instance to add more users.
14. Is it necessary to set up an SRM team approval request form?
    Not necessarily. If you do not want an approval process for access, select SRM Team- (No approval required) in the team setup governance tab of the SRM admin setup. You must be logged in with an admin role.
15. How do I get access to Cloud Observability and Service Reliability Management?
    Service Reliability Management is generally available starting August 2024.
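Step 4 of the SLO setup says that only alerts matching the SLI conditions count, from their initial event time to their last event time, and FAQ 7 adds that the time an alert stays open is tracked against the SLO. The shell sketch below models that rule with awk over a hypothetical CSV export with the columns alert_id,metric_name,first_event_epoch,last_event_epoch. The column layout, the function names, and the 2,592-second budget (0.1% of a 30-day window) are assumptions for illustration; this is not an SRM API and not SRM's internal calculation.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: SRM computes error-budget burn internally.
# Hypothetical input, one alert per line:
#   alert_id,metric_name,first_event_epoch_s,last_event_epoch_s

# Sum the open time (last event minus first event) of every alert whose
# metric name equals the SLI condition passed as $1. Prints 0 if none match.
sli_burn_seconds() {
  awk -F, -v metric="$1" '$2 == metric { total += $4 - $3 } END { print total + 0 }'
}

# Subtract the burned seconds from an error budget (both values in seconds).
remaining_budget_seconds() {
  local budget="$1" burned="$2"
  echo $(( budget - burned ))
}

# Hypothetical sample data: two UpTime alerts and one unrelated Latency alert.
alerts='A1,UpTime,1000,1600
A2,Latency,1000,5000
A3,UpTime,2000,2300'

burned=$(printf '%s\n' "$alerts" | sli_burn_seconds UpTime)
echo "burned=${burned}s remaining=$(remaining_budget_seconds 2592 "$burned")s"
```

Running the script prints burned=900s remaining=1692s: only the two UpTime alerts (600 s and 300 s) match the SLI condition, so the Latency alert does not consume any budget.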
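As a worked example for step 5 of the SLO setup (the error budget window), the derivation below uses the common SRE convention of a time-based error budget B: the share of the window W that the SLO allows to be bad. It assumes SRM's target and window behave this way; the 99.9% target and 30-day window are illustrative values, not product defaults.

```latex
\begin{aligned}
B &= (1 - \mathrm{SLO}) \times W \\
W &= 30\ \text{days} = 30 \times 24 \times 60\ \text{min} = 43{,}200\ \text{min} \\
B &= (1 - 0.999) \times 43{,}200\ \text{min} = 43.2\ \text{min} = 2{,}592\ \text{s}
\end{aligned}
```

Under this convention, a single alert that matches the SLI and stays open for 10 minutes would consume about 23% of the budget (10 / 43.2 ≈ 0.23).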
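The basic auth header step for the Cloud Observability webhook destination can be wrapped in a small shell function. This is a sketch that assumes the SRM integration endpoint expects standard HTTP Basic authentication (RFC 7617), so the Authorization header name and the Basic prefix are assumptions about this integration; basic_auth_header is a hypothetical helper name and the credentials are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an HTTP Basic auth header for a webhook destination.
# Assumes the receiving endpoint uses standard Basic authentication (RFC 7617).
basic_auth_header() {
  local user="$1" pass="$2" encoded
  # printf '%s:%s' joins user and password with ":" and, unlike echo,
  # adds no trailing newline, so no newline ends up in the encoded value.
  # tr -d '\n' removes the line wrapping GNU base64 applies to long output.
  encoded=$(printf '%s:%s' "$user" "$pass" | base64 | tr -d '\n')
  printf 'Authorization: Basic %s\n' "$encoded"
}

# Placeholder credentials only; never store real passwords in scripts.
basic_auth_header admin Passw0rd
```

For the placeholder credentials, the function prints Authorization: Basic YWRtaW46UGFzc3cwcmQ=. Passing the username and password as separate arguments to printf (rather than embedding them in the format string) also keeps characters such as % in a password from being misinterpreted.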
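For the RED metrics suggested in FAQ 6, one common way to express them as SLIs is shown below: request rate, error ratio with availability as its complement, and a latency SLI defined as the share of requests whose duration d is at or below a threshold T. The request counts, the 10-minute window, and the threshold T are illustrative assumptions, and the way Cloud Observability computes these series may differ.

```latex
\begin{aligned}
\text{Rate} &= \frac{N_{\text{req}}}{\Delta t}, \qquad
\text{Error ratio} = \frac{N_{\text{err}}}{N_{\text{req}}}, \qquad
\text{Availability SLI} = 1 - \frac{N_{\text{err}}}{N_{\text{req}}} \\
\text{Latency SLI} &= \frac{N_{\text{req}}(d \le T)}{N_{\text{req}}} \\
\text{Example: } & N_{\text{req}} = 120{,}000,\quad \Delta t = 600\ \text{s},\quad N_{\text{err}} = 60 \\
\Rightarrow\ \text{Rate} &= 200\ \text{req/s},\quad \text{Error ratio} = 0.0005,\quad \text{Availability SLI} = 99.95\%
\end{aligned}
```

An availability SLI of 99.95% would sit above a 99.9% SLO, so in this example the service would still be within its error budget for the window.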