Error Handling & Retries
Every error the platform returns carries a stable shape, a typed error code, and explicit retry semantics. This page is the authoritative catalogue.
#Error envelope
Every non-2xx response returns JSON of the shape below — never HTML, even for gateway errors. The correlation_id field is the single value support staff need to trace the request in our logs.
{
  "error_code": "VALIDATION_FAILED",
  "message": "body fails schema check",
  "detail": {
    "field_errors": [
      { "field": "rows", "issue": "must be >= 100" }
    ]
  },
  "correlation_id": "req_01JG5XKH2C3DKAS7"
}
#HTTP status catalogue
| Status | error_code | Retry? | When you see it |
|---|---|---|---|
| 400 | VALIDATION_FAILED | no | Body fails schema. The response includes a per-field error array; fix the field and resubmit. Never retry automatically. |
| 401 | UNAUTHENTICATED | no | API key missing, expired, or revoked. SDK auto-refreshes once if a refresh_token was supplied; otherwise raise a prompt-for-new-key alert. |
| 403 | FORBIDDEN | no | Token authenticated but lacks the required scope. Check the key's scope set in the dashboard or mint a new key. |
| 404 | NOT_FOUND | no | Resource not found within the calling tenant. Confirm the id; cross-tenant access is refused even if the id exists elsewhere. |
| 409 | IDEMPOTENT_REPLAY | no | Same Idempotency-Key replayed with a different body. Drop the stale payload — the server already processed the first form. |
| 422 | SEMANTIC_REJECT | no | Body parsed but business rule rejected it (e.g. quality gate failed on SCADA run). Inspect detail.block_reasons. |
| 429 | RATE_LIMITED | yes | Request exceeded per-tenant or per-route rate budget. Honour Retry-After header; exponential backoff floor applies. |
| 500 | INTERNAL | yes | Transient server error. Safe to retry with exponential backoff; SDK does this automatically up to max_retries. |
| 502 | BAD_GATEWAY | yes | Load balancer lost the upstream connection. Retry with backoff; persistent 502s warrant opening a ticket. |
| 503 | UNAVAILABLE | yes | Planned maintenance or circuit-breaker open. Respect Retry-After header; SDK auto-honours it. |
| 504 | GATEWAY_TIMEOUT | yes | Request exceeded the proxy's timeout budget. Retry with increased client timeout; long synthesis jobs should use the async submit + poll pattern. |
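For callers who handle raw HTTP responses, the envelope and the Retry? column above can be sketched in a few lines of Python. This is a minimal illustration, not SDK code: `ErrorEnvelope`, `parse_error_envelope`, and `is_retryable` are hypothetical names, and the sketch assumes `detail` may be absent on some responses.

```python
import json
from dataclasses import dataclass, field
from typing import Any

# Statuses marked "yes" in the Retry? column, transcribed by hand.
RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})


@dataclass(frozen=True)
class ErrorEnvelope:
    """Hypothetical container mirroring the documented envelope fields."""
    error_code: str
    message: str
    correlation_id: str
    detail: dict[str, Any] = field(default_factory=dict)


def parse_error_envelope(body: str) -> ErrorEnvelope:
    """Parse the JSON body of a non-2xx response into an ErrorEnvelope."""
    raw = json.loads(body)
    return ErrorEnvelope(
        error_code=raw["error_code"],
        message=raw["message"],
        correlation_id=raw["correlation_id"],
        # Assumption: treat a missing or null detail as an empty dict.
        detail=raw.get("detail") or {},
    )


def is_retryable(status: int) -> bool:
    """True when the catalogue marks the status as safe to retry."""
    return status in RETRYABLE_STATUSES
```

Logging `correlation_id` alongside `error_code` on every failure keeps the value at hand when a support ticket is needed.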
#Retry strategy
The SDK retries on {429, 500, 502, 503, 504} with exponential backoff and honours the server's Retry-After header on 429 / 503. Callers who roll their own HTTP client should mirror this behaviour.
from radmah_sdk import RadMahClient
# max_retries defaults to 3. The SDK handles the full retry loop:
# - Exponential backoff floor: 1 · 2^(attempt-1) seconds
# - Honours Retry-After on 429 / 503
# - Auto-refreshes access token once on 401
# - Attaches a stable Idempotency-Key to every POST/PUT/PATCH
client = RadMahClient(api_key="sl_live_…", max_retries=3)
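For callers who roll their own HTTP client, the loop described above might look like the following sketch. It is not the SDK's implementation. `backoff_delay` and `call_with_retries` are hypothetical names, `send` stands in for whatever function performs one HTTP attempt, and the sketch assumes the wait is the larger of the backoff floor and Retry-After, with Retry-After given in whole seconds.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})

# (status code, response headers, response body)
Response = Tuple[int, Mapping[str, str], str]


def backoff_delay(attempt: int, retry_after: Optional[float] = None) -> float:
    """Seconds to wait before retry number `attempt` (1-based)."""
    floor = 1.0 * 2 ** (attempt - 1)  # 1 s, 2 s, 4 s, ...
    if retry_after is None:
        return floor
    # Assumption: the backoff floor is a minimum, so take the larger value.
    return max(floor, retry_after)


def call_with_retries(
    send: Callable[[], Response],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` until the status is not retryable or retries run out."""
    attempt = 0
    while True:
        status, headers, body = send()
        if status not in RETRYABLE_STATUSES or attempt >= max_retries:
            return status, headers, body
        attempt += 1
        retry_after = None
        if status in (429, 503):
            # Header lookup is case-sensitive here; normalise keys if needed.
            value = headers.get("Retry-After", "")
            # Only the delta-seconds form is handled; HTTP-date values are ignored.
            if value.isdigit():
                retry_after = float(value)
        sleep(backoff_delay(attempt, retry_after))
```

When `send` issues a POST, PUT, or PATCH, it should reuse the same Idempotency-Key on every attempt so that a retry after a lost response does not create a duplicate.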
#Idempotency
Every POST / PUT / PATCH should carry an Idempotency-Key header. The server stores the response for 24 hours keyed by this value; a replayed key with the same body returns the original response, a replayed key with a different body returns 409 IDEMPOTENT_REPLAY.
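The replay rule can be illustrated with a toy in-memory model. This is a sketch for reasoning about client behaviour, not the platform's implementation: `IdempotencyStore` and `handle` are hypothetical names, the 409 body is abbreviated, and treating a key as reusable once the 24 hours have passed is an assumption.

```python
import hashlib
from typing import Callable, Dict, Tuple

TTL_SECONDS = 24 * 60 * 60  # responses are kept for 24 hours

Response = Tuple[int, str]  # (status code, body)


class IdempotencyStore:
    """Toy model of the replay rule, keyed by Idempotency-Key."""

    def __init__(self) -> None:
        # key -> (body fingerprint, stored response, time stored)
        self._entries: Dict[str, Tuple[str, Response, float]] = {}

    def handle(
        self,
        key: str,
        body: bytes,
        now: float,
        process: Callable[[bytes], Response],
    ) -> Response:
        fingerprint = hashlib.sha256(body).hexdigest()
        entry = self._entries.get(key)
        if entry is not None and now - entry[2] < TTL_SECONDS:
            stored_fingerprint, stored_response, _ = entry
            if stored_fingerprint == fingerprint:
                # Same key, same body: return the original response.
                return stored_response
            # Same key, different body: refuse (abbreviated envelope).
            return 409, '{"error_code": "IDEMPOTENT_REPLAY"}'
        # First use of the key, or the stored entry has expired (assumption).
        response = process(body)
        self._entries[key] = (fingerprint, response, now)
        return response
```

The practical consequence for clients: generate one key per logical operation, reuse it across retries of that operation, and never reuse it for a modified payload.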
The SDK attaches an Idempotency-Key automatically on every POST / PUT / PATCH. Caller-supplied values passed via headers= take precedence.
#Typed error taxonomy (SDK 1.2.0+)
SDK 1.2.0 introduced typed subclasses of RadMahAIError so enterprise callers can pattern-match on error shape without string-matching error_code. Every subclass inherits from RadMahAIError, so existing except RadMahAIError blocks catch all five new subclasses automatically.
| Subclass | Raised on | Meaning |
|---|---|---|
| AuthError | 401 · 403 | Credentials missing, expired, revoked, or lack scope. Never retry in an outer loop — prompt the operator for fresh credentials. |
| ValidationError | 400 · 422 | Request body fails server-side validation. detail.field_errors carries a per-field breakdown when the server can identify the fault. |
| QuotaError | 429 | Rate limit or tenant quota exceeded. The SDK already honoured Retry-After internally; this subclass surfaces only after max_retries is also exhausted. Switch to webhook-driven flows or a longer back-off strategy. |
| ServerError | 500 · 502 · 503 · 504 | Server-side fault; surfaces only after automatic retries are exhausted. Inspect detail.correlation_id and open a support ticket if it persists. |
| NetworkError | transport | Connect timeout, DNS failure, or TLS handshake error — the request never reached a RadMah AI server. Check base URL, firewall, and DNS resolution. |
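The table can be read as a status-to-class mapping with the base class as the fallback. The sketch below uses stand-in classes with a `Demo` prefix to show that shape without importing the SDK. All names here are hypothetical, and the real classes may carry extra attributes.

```python
class DemoBaseError(Exception):
    """Stand-in for the SDK's base error class."""


class DemoAuthError(DemoBaseError):
    """401 / 403."""


class DemoValidationError(DemoBaseError):
    """400 / 422."""


class DemoQuotaError(DemoBaseError):
    """429, after retries are exhausted."""


class DemoServerError(DemoBaseError):
    """500 / 502 / 503 / 504, after retries are exhausted."""


class DemoNetworkError(DemoBaseError):
    """Transport failure; there is no HTTP status to map from."""


STATUS_TO_ERROR = {
    400: DemoValidationError,
    422: DemoValidationError,
    401: DemoAuthError,
    403: DemoAuthError,
    429: DemoQuotaError,
    500: DemoServerError,
    502: DemoServerError,
    503: DemoServerError,
    504: DemoServerError,
}


def error_for_status(status: int) -> type:
    """Pick the typed subclass for a status, or the base class if none is listed."""
    return STATUS_TO_ERROR.get(status, DemoBaseError)
```

Statuses the table does not list, such as 404 and 409, fall through to the base class in this sketch, which is why a final catch on the base class remains useful.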
#Catching errors in code
from radmah_sdk import (
    RadMahClient,
    RadMahAIError,
    AuthError,
    ValidationError,
    QuotaError,
    ServerError,
    NetworkError,
    BudgetExceededError,
)

# alert, log_field_error, prompt_reauth, schedule_retry_in, alert_oncall and
# open_ticket are application-defined hooks, not part of the SDK.
client = RadMahClient(api_key="sl_live_…")

try:
    job = client.submit_job_with_budget(
        kind="synthesize", engine="mock", rows=10_000, max_credits=50.0,
        seal_id="seal_abc",
    )
except BudgetExceededError as exc:
    alert(f"budget refused: quote {exc.quoted:.0f} c > {exc.max_credits:.0f} c")
except ValidationError as exc:
    # detail.field_errors names the exact fields at fault; each entry
    # carries "field" and "issue", as in the error envelope.
    for field in exc.detail.get("field_errors", []):
        log_field_error(field["field"], field["issue"])
except AuthError:
    # Never retry; prompt the operator for fresh credentials.
    prompt_reauth()
except QuotaError:
    # The SDK already retried while honouring Retry-After; escalate
    # to a longer backoff window (here, one hour).
    schedule_retry_in(3600)
except NetworkError as exc:
    # Request never reached the server; check DNS / firewall.
    alert_oncall("network path to RadMah AI API is broken", exc)
except ServerError as exc:
    # SDK already retried; surface with correlation_id for support.
    open_ticket(exc.detail.get("correlation_id"))
except RadMahAIError:
    # Fallback catch: any status not matched above.
    raise