Error Handling & Retries

Every error the platform returns carries a stable shape, a typed error code, and explicit retry semantics. This page is the authoritative catalogue.

#Error envelope

Every non-2xx response returns JSON of the shape below — never HTML, even for gateway errors. The correlation_id field is the single value support staff need to trace the request in our logs.

{
  "error_code": "VALIDATION_FAILED",
  "message": "body fails schema check",
  "detail": {
    "field_errors": [
      { "field": "rows", "issue": "must be >= 100" }
    ]
  },
  "correlation_id": "req_01JG5XKH2C3DKAS7"
}
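
For callers not using the SDK, the envelope can be read into a small typed container. The following is a minimal sketch, not the SDK's own model: `ErrorEnvelope` and `parse_error_envelope` are hypothetical names, and the fallback to an empty `detail` is an assumption about errors that carry no extra detail.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class ErrorEnvelope:
    """Hypothetical container for the four documented envelope fields."""
    error_code: str
    message: str
    correlation_id: str
    detail: dict[str, Any] = field(default_factory=dict)


def parse_error_envelope(body: str) -> ErrorEnvelope:
    """Parse the JSON body of a non-2xx response into an ErrorEnvelope."""
    data = json.loads(body)
    return ErrorEnvelope(
        error_code=data["error_code"],
        message=data["message"],
        correlation_id=data["correlation_id"],
        # Assumption: detail may be absent or null, so default to {}.
        detail=data.get("detail") or {},
    )


sample = """
{
  "error_code": "VALIDATION_FAILED",
  "message": "body fails schema check",
  "detail": {"field_errors": [{"field": "rows", "issue": "must be >= 100"}]},
  "correlation_id": "req_01JG5XKH2C3DKAS7"
}
"""
envelope = parse_error_envelope(sample)
for item in envelope.detail.get("field_errors", []):
    # Log the correlation_id alongside each fault so support can trace it.
    print(f"{item['field']}: {item['issue']} (ref {envelope.correlation_id})")
```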

#HTTP status catalogue

Status	error_code	Retry?	When you see it
400	VALIDATION_FAILED	no	Body fails schema. The response includes a per-field error array; fix the field and resubmit. Never retry automatically.
401	UNAUTHENTICATED	no	API key missing, expired, or revoked. SDK auto-refreshes once if a refresh_token was supplied; otherwise raise a prompt-for-new-key alert.
403	FORBIDDEN	no	Token authenticated but lacks the required scope. Check the key's scope set in the dashboard or mint a new key.
404	NOT_FOUND	no	Resource not found within the calling tenant. Confirm the id; cross-tenant access is refused even if the id exists elsewhere.
409	IDEMPOTENT_REPLAY	no	Same Idempotency-Key replayed with a different body. Drop the stale payload — the server already processed the first form.
422	SEMANTIC_REJECT	no	Body parsed but business rule rejected it (e.g. quality gate failed on SCADA run). Inspect detail.block_reasons.
429	RATE_LIMITED	yes	Request exceeded per-tenant or per-route rate budget. Honour Retry-After header; exponential backoff floor applies.
500	INTERNAL	yes	Transient server error. Safe to retry with exponential backoff; SDK does this automatically up to max_retries.
502	BAD_GATEWAY	yes	Load balancer lost the upstream connection. Retry with backoff; persistent 502s warrant opening a ticket.
503	UNAVAILABLE	yes	Planned maintenance or circuit-breaker open. Respect Retry-After header; SDK auto-honours it.
504	GATEWAY_TIMEOUT	yes	Request exceeded the proxy's timeout budget. Retry with increased client timeout; long synthesis jobs should use the async submit + poll pattern.
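
A hand-rolled client may find it convenient to carry the catalogue as a lookup table. The sketch below transcribes the rows above; `STATUS_CATALOGUE` and `is_retryable` are hypothetical names, and treating unlisted statuses as non-retryable is an assumption rather than documented behaviour.

```python
# status -> (error_code, retryable), transcribed from the catalogue above.
STATUS_CATALOGUE = {
    400: ("VALIDATION_FAILED", False),
    401: ("UNAUTHENTICATED", False),
    403: ("FORBIDDEN", False),
    404: ("NOT_FOUND", False),
    409: ("IDEMPOTENT_REPLAY", False),
    422: ("SEMANTIC_REJECT", False),
    429: ("RATE_LIMITED", True),
    500: ("INTERNAL", True),
    502: ("BAD_GATEWAY", True),
    503: ("UNAVAILABLE", True),
    504: ("GATEWAY_TIMEOUT", True),
}


def is_retryable(status: int) -> bool:
    """Return True only for statuses the catalogue marks as retryable."""
    entry = STATUS_CATALOGUE.get(status)
    # Assumption: a status missing from the catalogue is not retried.
    return entry is not None and entry[1]
```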

#Retry strategy

The SDK retries on {429, 500, 502, 503, 504} with exponential backoff and honours the server's Retry-After header on 429 / 503. Callers who roll their own HTTP client should mirror this behaviour.

from radmah_sdk import RadMahClient

# max_retries defaults to 3. The SDK handles the full retry loop:
#   - Exponential backoff floor: 1 · 2^(attempt-1) seconds
#   - Honours Retry-After on 429 / 503
#   - Auto-refreshes access token once on 401
#   - Attaches a stable Idempotency-Key to every POST/PUT/PATCH
client = RadMahClient(api_key="sl_live_…", max_retries=3)
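
For callers rolling their own client, the delay calculation might look like the sketch below. It is an illustration under stated assumptions, not the SDK's implementation: `backoff_floor` and `retry_delay` are hypothetical names, Retry-After is assumed to arrive as a number of seconds, and taking the larger of the backoff floor and Retry-After is one reading of "floor".

```python
from typing import Optional

RETRYABLE = frozenset({429, 500, 502, 503, 504})


def backoff_floor(attempt: int) -> float:
    """Documented floor of 1 * 2^(attempt-1) seconds; attempt starts at 1."""
    return 1.0 * 2 ** (attempt - 1)


def retry_delay(
    status: int, attempt: int, retry_after: Optional[str] = None
) -> Optional[float]:
    """Seconds to wait before the next attempt, or None if not retryable."""
    if status not in RETRYABLE:
        return None
    delay = backoff_floor(attempt)
    if status in (429, 503) and retry_after is not None:
        try:
            # Assumption: never wait less than the backoff floor.
            delay = max(delay, float(retry_after))
        except ValueError:
            # An HTTP-date Retry-After is not handled here; keep the floor.
            pass
    return delay
```

A caller would sleep for the returned delay, stop once its own attempt cap is reached, and give up immediately when the function returns None.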

#Idempotency

Every POST / PUT / PATCH should carry an Idempotency-Key header. The server stores the response for 24 hours keyed by this value. A replayed key with the same body returns the original response, whereas a replayed key with a different body returns 409 IDEMPOTENT_REPLAY.

The SDK generates a UUID4 Idempotency-Key automatically on every POST / PUT / PATCH. Caller-supplied values passed via headers= take precedence.
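
Outside the SDK, the same rule can be approximated in a few lines. This is a sketch only: `with_idempotency_key` is a hypothetical helper, not an SDK function.

```python
import uuid
from typing import Mapping, Optional

MUTATING_METHODS = frozenset({"POST", "PUT", "PATCH"})


def with_idempotency_key(
    method: str, headers: Optional[Mapping[str, str]] = None
) -> dict[str, str]:
    """Return headers with an Idempotency-Key, keeping a caller-supplied one."""
    merged = dict(headers or {})
    if method.upper() not in MUTATING_METHODS:
        return merged
    # Header names are case-insensitive, so compare in lower case.
    if not any(name.lower() == "idempotency-key" for name in merged):
        merged["Idempotency-Key"] = str(uuid.uuid4())
    return merged


# Build the headers once per logical request and reuse them on every retry,
# so the server sees the same key and can return the stored response.
headers = with_idempotency_key("POST")
```

Generating a fresh key on each retry would defeat the purpose, since the server would treat every attempt as a new request.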

#Typed error taxonomy (SDK 1.2.0+)

SDK 1.2.0 introduced typed subclasses of RadMahAIError so enterprise callers can pattern-match on error shape without string-matching error_code. Every subclass inherits from RadMahAIError, so existing except RadMahAIError blocks catch all five new subclasses automatically.

Subclass	Raised on	Meaning
AuthError	401 · 403	Credentials missing, expired, revoked, or lack scope. Never retry in an outer loop — prompt the operator for fresh credentials.
ValidationError	400 · 422	Request body fails server-side validation. `detail.field_errors` carries a per-field breakdown when the server can identify the fault.
QuotaError	429	Rate limit or tenant quota exceeded. The SDK already honoured `Retry-After` internally; this subclass surfaces only after `max_retries` is also exhausted. Switch to webhook-driven flows or a longer back-off strategy.
ServerError	500 · 502 · 503 · 504	Server-side fault; surfaces only after automatic retries are exhausted. Inspect `detail.correlation_id` and open a support ticket if it persists.
NetworkError	transport	Connect timeout, DNS failure, or TLS handshake error — the request never reached a RadMah AI server. Check base URL, firewall, and DNS resolution.
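
The table can be read as a status-to-class mapping. The sketch below mirrors it with local stand-in classes, so it runs without the SDK. `ApiError` stands in for the SDK's base error class, `error_for_status` is a hypothetical helper, and falling back to the base class for statuses the table does not list (such as 404 or 409) is an assumption.

```python
from typing import Optional


class ApiError(Exception):
    """Local stand-in for the SDK's base error class."""

    def __init__(self, message: str, status: Optional[int] = None):
        super().__init__(message)
        self.status = status


class AuthError(ApiError):
    """401 / 403."""


class ValidationError(ApiError):
    """400 / 422."""


class QuotaError(ApiError):
    """429, after retries are exhausted."""


class ServerError(ApiError):
    """500 / 502 / 503 / 504, after retries are exhausted."""


class NetworkError(ApiError):
    """Transport failure; raised without an HTTP status."""


_STATUS_TO_ERROR = {
    401: AuthError, 403: AuthError,
    400: ValidationError, 422: ValidationError,
    429: QuotaError,
    500: ServerError, 502: ServerError, 503: ServerError, 504: ServerError,
}


def error_for_status(status: int, message: str) -> ApiError:
    """Pick the subclass for a status; unlisted statuses use the base class."""
    cls = _STATUS_TO_ERROR.get(status, ApiError)
    return cls(message, status)


try:
    raise error_for_status(403, "missing scope")
except ApiError as exc:
    # A handler written against the base class still catches the subclass.
    print(type(exc).__name__, exc.status)  # prints: AuthError 403
```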

#Catching errors in code

from radmah_sdk import (
    RadMahClient,
    RadMahAIError,
    AuthError,
    ValidationError,
    QuotaError,
    ServerError,
    NetworkError,
    BudgetExceededError,
)

# alert, log_field_error, prompt_reauth, schedule_retry_in, alert_oncall and
# open_ticket are application-defined hooks, not part of the SDK.

client = RadMahClient(api_key="sl_live_…")
try:
    job = client.submit_job_with_budget(
        kind="synthesize", engine="mock", rows=10_000, max_credits=50.0,
        seal_id="seal_abc",
    )
except BudgetExceededError as exc:
    alert(f"budget refused: quote {exc.quoted:.0f} c > {exc.max_credits:.0f} c")
except ValidationError as exc:
    # detail.field_errors names the exact fields at fault; each entry
    # carries "field" and "issue", as in the error envelope.
    for field in exc.detail.get("field_errors", []):
        log_field_error(field["field"], field["issue"])
except AuthError:
    # Never retry; prompt the operator for fresh credentials
    prompt_reauth()
except QuotaError:
    # The SDK already retried while honouring Retry-After; escalate
    # to a longer backoff window.
    schedule_retry_in(3600)
except NetworkError as exc:
    # Request never reached the server; check DNS / firewall
    alert_oncall("network path to RadMah AI API is broken", exc)
except ServerError as exc:
    # SDK already retried; surface with correlation_id for support
    open_ticket(exc.detail.get("correlation_id"))
except RadMahAIError:
    # Fallback catch: any SDK error not matched above
    raise