Inbound Web Services — Rate Limits and Throttling

Contents
1. Issue
2. About HTTP 429 — What This Status Code Means
3. Quick Triage
4. Background — The Three Limit Types You Need to Know
5. Adding Nodes — What Changes and What Doesn’t
6. Diagnosing the 429 — What to Capture
7. Per-Table Quotas — The Hidden Limit
8. Common Misconceptions
9. Resolution / Workarounds
10. When to Open a Support Case
11. Verification

1. Issue

This article addresses three customer-reported symptoms that share the same family of underlying causes:

- Inbound REST calls return HTTP 429 (Too Many Requests).
- Inbound calls succeed under light load but fail under heavier or bursty load, sometimes even after adding nodes to the instance.
- Rate-limit responses appear even though the integration appears to be well below documented limits.

If your situation matches any of the above, this article will help you identify which of three distinct limit types you are hitting and what to do about each one.

2. 
About HTTP 429 — What This Status Code Means

HTTP 429 is a standard response code defined by the IETF in RFC 6585. It is not specific to ServiceNow. Understanding what 429 means in general, and how clients are expected to behave when they receive one, will help you read the rest of this article in context.

Definition

429 Too Many Requests indicates that the server is refusing to process a request because the client has sent too many requests in too short a window. The server is healthy, the request itself is well-formed, and the client is authenticated, but the server has decided to throttle the caller, typically to protect itself from overload or to enforce a fair-share policy across consumers.

429 is part of the 4xx Client Error family of status codes. The classification matters: a 4xx code tells the client that the responsibility for the situation lies on its side (rate of requests), not on the server’s side. The remedy is for the client to change its behavior, usually by slowing down.

How 429 differs from neighboring status codes

- 200 OK: the request succeeded.
- 400 Bad Request: the request itself is malformed (bad JSON, missing field). Fix the request body.
- 401 Unauthorized: credentials are missing or invalid. Fix the authentication.
- 403 Forbidden: credentials are valid, but the user does not have permission for this resource. Fix the user / ACL.
- 429 Too Many Requests: credentials are valid, the request is well-formed, and the user has permission, but the rate of calls is too high. Slow down.
- 500 Internal Server Error: something went wrong on the server. Not the client’s fault.
- 503 Service Unavailable: the server is temporarily unable to handle the request (maintenance, capacity exhaustion). Usually transient.

Important: A common misdiagnosis is treating a 429 as a 5xx (“the platform is broken”). It is not. A 429 is the platform telling you it is intentionally rejecting your call to protect itself. 
The right response is to back off, not to retry immediately and not to escalate as a platform outage.

Standard response headers on 429

RFC 6585 allows a 429 response to include a Retry-After header; the header itself is defined in the core HTTP specification (RFC 9110). Many APIs also include vendor-specific headers prefixed with X-RateLimit-* to provide more context:

- Retry-After: number of seconds (or an HTTP date) the client should wait before retrying. This is the most important header on a 429 response. Honor it.
- X-RateLimit-Limit: the maximum number of requests allowed in the current window.
- X-RateLimit-Remaining: the number of requests the client has left in the current window.
- X-RateLimit-Reset: when the current window resets (commonly a Unix timestamp).
- X-RateLimit-Rule: vendor-specific; in ServiceNow, identifies which rate limit rule matched the request.

The presence or absence of these headers is itself diagnostic (see Section 6). A 429 with X-RateLimit-* headers is a configured rate limit. A 429 without them is more likely capacity saturation.

What a well-behaved client does on 429

- Read Retry-After and wait that many seconds before retrying. Do not retry immediately: that will produce another 429 and may compound the throttling.
- If Retry-After is absent, use exponential backoff with jitter. Start at a few seconds, double on each retry, and add randomness to avoid synchronized retry storms across multiple clients.
- Set a maximum retry count. Three to five retries is reasonable for most integration patterns. Beyond that, log the failure and surface it to a human.
- Track 429 rates over time. A steady stream of 429s indicates a systemic mismatch between client volume and server capacity; the fix is architectural, not retry-tuning.
- Never treat 429 as an error to swallow silently. It is the server asking you to change behavior. 
Ignoring it shifts the problem from “occasional 429s” to “blocked or degraded service.”

Where 429 sits in the ServiceNow context

In ServiceNow, a 429 can be issued at any of three distinct layers: per-instance rate limit rules, per-node semaphore saturation, or per-table transaction quotas. The HTTP status code is the same in all three cases, but the response headers differ, and so does the right remedy. This is the central insight of this article and is covered in Sections 4 through 7 below.

3. Quick Triage

If you only have 30 seconds, use this triage block:

- Only during a burst window? Likely a client-side burst pattern. Jump to Section 6.
- Steady after a scaling event? Adding nodes does not help all limit types. Jump to Section 5.
- Only on one specific table? Per-table transaction quota. Jump to Section 7.
- Response includes X-RateLimit-* headers? A configured rate limit rule matched. Jump to Section 4, Layer 1.

4. Background — The Three Limit Types You Need to Know

ServiceNow enforces inbound traffic limits at three distinct layers. Confusing one for another is the most common reason customers misdiagnose 429s.

Layer 1 — Per-Instance Rate Limit Rules

- Rules are defined in the sys_rate_limit_rules table.
- They count requests per hour, scoped to a specific user, a role, or all users.
- They apply across the entire instance, not per node.
- Adding nodes will NOT change these limits. The count is shared globally.
- On rejection, you receive HTTP 429 with these response headers: X-RateLimit-Rule, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After.
- Check recent violations in Rate Limit Violations (sys_rate_limit_violations).

Note: Each node maintains its own counter and syncs to the database every 30 seconds. A rate limit rule change may take up to 30 seconds to take effect across all nodes.
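As a minimal sketch of how a client might act on the headers listed above, the Python below guesses the limit type from whether any X-RateLimit-* header is present and derives a wait time from Retry-After, falling back to exponential backoff with jitter when the header is missing. The function names, the 2-second base, and the 60-second cap are illustrative assumptions, not ServiceNow defaults.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def classify_429(headers):
    """Guess the limit type behind a 429 from its response headers."""
    names = [name.lower() for name in headers]
    if any(name.startswith("x-ratelimit-") for name in names):
        return "configured rate limit rule"
    return "capacity saturation or table quota"


def retry_delay(headers, attempt, base=2.0, cap=60.0, now=None):
    """Seconds to wait before retry number `attempt` (0-based)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("retry-after", "").strip()
    if value.isdigit():
        # Retry-After given as a number of seconds.
        return float(value)
    if value:
        # Retry-After may also be an HTTP date.
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
    # No usable Retry-After: double the ceiling per attempt, then wait
    # half of it plus a random share of the other half (jitter).
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling / 2 + random.uniform(0, ceiling / 2)
```

A caller would typically sleep for `retry_delay(response_headers, attempt)` seconds, stop after a small fixed number of attempts, and log the result of `classify_429` so that recurring 429s can be traced to the right layer.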
Layer 2 — Per-Node Semaphore Pools (API_INT)

- Every application node maintains a pool of concurrent transaction slots called API_INT semaphores.
- When all semaphores on a node are in use, additional requests are queued.
- When the queue is also full, the node returns HTTP 429. No Retry-After header is included, because this is capacity saturation rather than a configured rate limit.
- Adding nodes DOES help. Each new node adds a full semaphore pool, so total instance capacity scales linearly with node count.
- You can view live semaphore state at https://<instance>.service-now.com/stats.do. Look for the API_INT section.

What to look for in stats.do

- Available semaphores: free slots right now.
- In use semaphores: active transactions and their elapsed time.
- Queue depth: requests currently waiting for a free slot.
- Queue depth limit: maximum queue size before 429s are returned.
- Max queue depth: high-water mark since the last node restart.
- 429 rejections: cumulative 429s issued by this node.

Approximate maximum concurrency: A rough ceiling for sustained inbound concurrency per node is available_semaphores × (1 + queue_depth_limit / available_semaphores), which simplifies to available_semaphores + queue_depth_limit: every slot busy plus a full queue. Customer Community reports cite values around semaphores + 150 as a working estimate for the default platform configuration. Use stats.do for your actual numbers.

Layer 3 — Per-Table Transaction Quotas

- Transaction quota rules cap how long an individual transaction may run, and how many can run concurrently, for a specific table.
- The default REST/JSON catch-all quota applies a 5-minute (300-second) ceiling. Queries that exceed it are cancelled.
- Long-running Table API queries on large tables are the most common trigger.
- Adding nodes will NOT help. The quota is scoped to the table, not to per-node compute.
- On a hit, the transaction is cancelled and the client receives an error, often a 429 or a 5xx depending on where in the lifecycle the cancellation occurred.

5. 
Adding Nodes — What Changes and What Doesn’t

This is the single most expensive misconception.

Worked example: A customer running on 2 nodes adds 2 more nodes to relieve 429s. After the addition:

- The per-instance rate limit count remains the same (no change).
- Total semaphore capacity across the instance doubles, because each of the 4 nodes has its own pool (relief).
- If the failing pattern was bursty writes to a single table where a per-table quota is binding, the 429s continue.

The customer logs a case believing the upgrade failed. The actual fix is to address the per-table quota, not to add more nodes.

6. Diagnosing the 429 — What to Capture

Before any other troubleshooting, capture the following from a failing call. The first three items determine which layer is rejecting you.

- Full response headers, especially X-RateLimit-Rule, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After. Presence or absence is diagnostic.
- Time of failure to the second, with timezone (UTC strongly preferred).
- Endpoint and target table (e.g., POST /api/now/table/incident).
- Client IP and the integration user that made the call.
- Client-side request volume for the 10-minute window around the failure (so we can tell whether the pattern was actually bursty).
- A snapshot of https://<instance>.service-now.com/stats.do captured as close to the failure time as you can manage.

7. Per-Table Quotas — The Hidden Limit

Per-table transaction quotas surprise more customers than any other limit type. They deserve their own section.

What they are

- A per-table cap on transaction duration and / or concurrent transactions.
- Default REST/JSON catch-all quota: 300 seconds (5 minutes) per transaction.
- Configured under System Definition → Quota Rules.

Why they exist

- Protect the table from being locked or saturated by a single misbehaving query.
- Prevent a single slow consumer from starving other consumers of the same table.
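Because the 300-second ceiling applies to each transaction, one way to keep a Table API read short is to request one bounded, filtered page at a time. The sketch below only builds the page URLs; it assumes the /api/now/table/<table> path shown in Section 6 and the sysparm_query, sysparm_limit, and sysparm_offset parameters, while the function name, instance name, and page count are hypothetical.

```python
from urllib.parse import urlencode


def page_urls(instance, table, query, page_size=1000, max_pages=5):
    """Yield Table API URLs that read `table` one page at a time."""
    base = f"https://{instance}.service-now.com/api/now/table/{table}"
    for page in range(max_pages):
        params = {
            "sysparm_query": query,              # filter rows server-side
            "sysparm_limit": page_size,          # rows per page
            "sysparm_offset": page * page_size,  # where this page starts
        }
        yield f"{base}?{urlencode(params)}"


# Hypothetical usage: the first two pages of active incidents.
for url in page_urls("example", "incident", "active=true", max_pages=2):
    print(url)
```

In a real client the loop would normally stop as soon as a page returns fewer rows than `page_size`, rather than relying on a fixed `max_pages`.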
How to identify you are hitting one

- The failure occurs only on one specific table.
- The failure correlates with large or unfiltered Table API queries (often GETs against large tables with no sysparm_query, or POSTs with very large payloads).
- The response includes no X-RateLimit-* headers, because this is not a configured rate limit rule.
- Your instance logs contain transaction cancellation entries referencing the quota.

How to mitigate

- Paginate. Use sysparm_limit and sysparm_offset to break large reads into smaller calls.
- Filter early. Use sysparm_query to limit returned rows server-side. Querying a 5M-row table with no filter is the most common cause.
- Batch writes. Coalesce many single-record creates into a single batched call where the API supports it.
- Move heavy reads to off-peak windows, when the quota has more room and other workloads are quiet.
- Request a quota review via HI ticket if your business case genuinely requires longer transactions. Bring data from Section 6.

Note: This is the limit type where adding nodes provides zero relief. If you have already scaled horizontally and still hit 429s on a single table, this is almost certainly your cause.

8. Common Misconceptions

“We added nodes — shouldn’t the limits scale automatically?”
Only Layer 2 (per-node semaphores) scales with node count. Layers 1 and 3 do not. See Section 5.

“We’re well under the documented per-hour limit — why are we throttled?”
You are almost certainly hitting Layer 2 (semaphore saturation) or Layer 3 (per-table quota). Documented rate limits cover only Layer 1.

“Can we get the per-instance rate limits raised?”
Yes, but bring the Section 6 data. ServiceNow Support reviews these requests with the load characteristics in hand.

“Does the integration user matter for rate-limit accounting?”
Yes for Layer 1 rate limit rules (most rules are scoped to a user or role). No for Layers 2 and 3 (which are about platform capacity and table protection, respectively).
“Will this affect outbound calls from my instance?”
No. Outbound REST calls have their own separate semaphore counter and rate-limit configuration. They are tracked independently from inbound traffic.

9. Resolution / Workarounds

Ordered by effort, lowest first:

Lowest effort — Honor Retry-After

- Implement client-side retry that respects the Retry-After header value (in seconds).
- Use exponential backoff with jitter for cases where Retry-After is not present.

Medium effort — Reshape the workload

- Batch writes; coalesce updates on the client side before sending.
- Paginate reads with sysparm_limit. A common starting point is 1000 records per page.
- Shift bursty workloads to off-peak windows to reduce contention with other consumers.
- Use dedicated integration users for high-volume integrations so the rate-limit accounting is isolated.

Higher effort — Open a support case with data in hand

- Attach the captured Section 6 data (headers, time, endpoint, user, stats.do snapshot).
- Clearly state which Layer (1, 2, or 3) your diagnosis points to.
- State whether you are requesting a rule change, a semaphore review, or a quota review. These have different ownership paths.

Highest effort — Architectural

- Move high-volume writes to an async / queued ingestion pattern (e.g., Import Sets with scheduled processing).
- Pre-aggregate or stage data outside the instance before pushing it in via REST.
- Consider Event Streaming integrations if your data shape supports it.

10. When to Open a Support Case

Open a case only if all of the following apply:

- You have captured the Section 6 data.
- The pattern is steady and reproducible (not a single one-time burst).
- You have identified which Layer (1, 2, or 3) is rejecting calls, using the Quick Triage in Section 3 and the data captured in Section 6.
- The pattern emerged after a platform upgrade with no client-side change, OR response headers contradict what this article describes (potential platform issue).

11. 
Verification

You will know the issue is resolved when:

- The 429 rate falls to your expected baseline (state explicitly what “expected” means; zero is rarely realistic for a high-volume integration).
- Retry-After values, when present, are short and consistent.
- Observed throughput matches the documented limit for your instance class and configuration.
- stats.do shows the queue clearing between bursts (queue_depth returns to near zero between spikes).

12. Related Articles

- Inbound Web Services — Troubleshooting Guide (hub)
- Inbound Web Services — Timeouts and Slow Responses
- Inbound Web Services — No Response and Connection Failures
- Inbound Web Services — Authentication and Authorization Failures
- KB0547836 — Web service export sizing and throttling