Rate Limits & Quotas
Two independent budgets govern every call: a per-route token bucket (QPS) and a plan-level credit quota (monthly spend). Both surface as HTTP 429 with explicit Retry-After guidance.
Plan budgets
| Plan | Req / minute | Concurrent jobs | Monthly credits |
|---|---|---|---|
| Free | 60 | 1 | 200 |
| Developer | 600 | 4 | 5 000 |
| Growth | 1 800 | 16 | 50 000 |
| Enterprise | custom | custom | pooled |
Per-route ceilings
Even on unlimited-plan tenants, each route carries a safety ceiling to prevent queue-bomb or polling-loop abuse. Keep long-running jobs asynchronous — don't poll in a tight loop.
| Route | Budget | Notes |
|---|---|---|
| POST /v1/client/jobs | 1 request / sec / tenant | Job submission — controlled to prevent queue-bomb DoS. |
| POST /v1/client/datasets/upload | 10 req / min / tenant | Covers multipart + profiling; larger datasets bill per MB. |
| POST /v1/client/seals | 60 req / min / tenant | Mint a new sealed contract seal. |
| GET /v1/client/jobs/:id | 60 req / min / key | Polling — use webhooks where possible to avoid burning this budget. |
| POST /v1/client/agent/projects | 10 req / min / tenant | ADS project creation. Plan + approve + stream reuse the same budget. |
| GET /v1/client/evidence/:id | 30 req / min / key | Bundle download. Cache locally; bundles are immutable. |
| POST /v1/client/webhooks | 10 req / min / tenant | Webhook registration / rotation. |
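
A client can mirror a route's budget locally so that requests are held back before the server ever has to reject them. The following is a minimal sketch of a client-side token bucket, not the server's implementation: the `TokenBucket` class, its method names, and the burst `capacity` are assumptions made for illustration, since the table only states the sustained rate.

```python
import time


class TokenBucket:
    """Client-side token bucket: refills `rate` tokens per second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start full (assumption)
        self.clock = clock
        self.updated = clock()

    def _refill(self):
        # Credit tokens for the time elapsed since the last update.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self):
        """Take one token if available; return True on success."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def wait_time(self):
        """Seconds until one token is available (0.0 if one is ready now)."""
        self._refill()
        return max(0.0, (1.0 - self.tokens) / self.rate)


# 60 req / min, as in the GET /v1/client/jobs/:id row: one token per second.
# A capacity of 60 is a guess at the allowed burst, not a documented value.
polling_bucket = TokenBucket(rate=60 / 60, capacity=60)
```

Before each call, check `try_acquire()`; if it returns False, sleep for `wait_time()` seconds and try again. The injectable `clock` exists only to make the class easy to test.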
Response headers
Every response includes rate-limit telemetry so a well-behaved client can pace itself without ever hitting 429.
HTTP/1.1 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 57
X-RateLimit-Reset: 1700003600
X-RateLimit-Route: GET /v1/client/jobs/:id
Content-Type: application/json
- Limit: max allowed per window
- Remaining: tokens left in the current window
- Reset: Unix timestamp when the bucket refills
- Route: which budget was charged
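One way to use this telemetry is to spread the remaining tokens evenly over the time left until the reset. The sketch below assumes the headers arrive as a plain dict keyed exactly as shown above; `pacing_delay` is a hypothetical helper, and even spacing is one reasonable policy rather than anything the API prescribes.

```python
import time


def pacing_delay(headers, now=None):
    """Seconds to wait before the next request on the same route.

    Spreads the remaining tokens evenly over the time left until the
    bucket refills; waits for the refill itself when no tokens remain.
    """
    now = time.time() if now is None else now
    remaining = int(headers["X-RateLimit-Remaining"])
    reset_at = int(headers["X-RateLimit-Reset"])
    seconds_left = max(0.0, reset_at - now)
    if remaining <= 0:
        return seconds_left
    return seconds_left / remaining
```

A client would call `time.sleep(pacing_delay(response_headers))` between requests. Real HTTP libraries usually expose case-insensitive header mappings, which work here as long as they support lookup by these names.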
Handling 429
When a budget trips, the response carries a Retry-After header (delta-seconds or HTTP-date) and an error_code of RATE_LIMITED. The SDK already respects the header; a hand-written client must honour it.
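For a hand-written client, both Retry-After forms need handling. The following is a minimal sketch using only the Python standard library; `retry_after_seconds` is a hypothetical helper name, and treating an HTTP-date with no timezone as UTC is an assumption.

```python
import time
from datetime import timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After header value into a wait in seconds.

    Accepts delta-seconds ("120") or an HTTP-date
    ("Wed, 21 Oct 2015 07:28:00 GMT"). Returns None if unparseable.
    """
    now = time.time() if now is None else now
    value = value.strip()
    try:
        return max(0.0, float(int(value)))  # delta-seconds form
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return max(0.0, when.timestamp() - now)
```

When the function returns None, a caller can fall back to its own exponential backoff instead of retrying immediately.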
# The SDK uses Retry-After when the server sets it, falling back to
# exponential backoff with a 1s floor. Nothing to configure.
from radmah_sdk import RadMahClient, RadMahAIError  # error class assumed to be exported by the SDK

client = RadMahClient(api_key="sl_live_…", max_retries=5)

for jid in jobs:
    try:
        status = client.jobs.get(jid)  # polls safely under backpressure
    except RadMahAIError as exc:
        if exc.status_code == 429:
            # Every automatic retry failed, so the budget is persistently
            # exhausted: move to a webhook subscription instead of polling.
            schedule_switch_to_webhooks()
        else:
            raise  # unrelated errors should not be swallowed
Patterns that burn budget
- Tight polling on GET /jobs/:id. Prefer webhooks; if you must poll, back off exponentially while the job remains in a non-terminal state.
- Parallel submit loops that ignore concurrent_jobs. Use client.batch_create_jobs(), which respects the ceiling server-side.
- Synchronous ADS streaming across dozens of keys. Open one streaming connection per project; don't re-establish on every turn.
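Where polling is unavoidable, the exponential backoff mentioned in the first item could look like the sketch below. Everything named here is illustrative: `fetch_status` stands for any callable that returns the job's current state as a string, and the terminal state names are hypothetical, since this section does not list them.

```python
import random
import time

# Hypothetical state names; substitute the values the API actually returns.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def backoff_delays(base=1.0, factor=2.0, cap=60.0):
    """Yield base, base*factor, base*factor**2, ... capped at `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def poll_until_terminal(fetch_status, max_attempts=20,
                        sleep=time.sleep, jitter=random.random):
    """Poll `fetch_status` with growing waits until a terminal state appears."""
    delays = backoff_delays()
    for _ in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        # Scale each wait to 50-100% of its nominal value so that many
        # clients polling at once do not synchronise their requests.
        sleep(next(delays) * (0.5 + 0.5 * jitter()))
    raise TimeoutError("job did not reach a terminal state")
```

With the defaults, the nominal waits are 1, 2, 4, ... seconds up to a 60-second cap, so a long job costs a handful of requests rather than one per second.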
Need a higher ceiling? Enterprise plans negotiate custom per-route budgets with pooled credits across a tenant's keys — contact sales with your steady-state QPS + burst peak.