AI Gateway · Live in production

Your engineers already use AI.
Govern it.

One gateway in front of OpenAI, Anthropic, Google Gemini, and AWS Bedrock. Fail-closed budgets so a single workload can't drain the account, PHI redaction at the boundary, and a cryptographically signed audit log for every call.

Bring your own provider keys; the invoice stays with your provider. Every call streams token by token over the same hybrid post-quantum TLS edge that fronts the rest of your traffic. Live today at app.edge.scrutari.ai.

One API: OpenAI · Anthropic · Gemini · Bedrock
Fail-closed: Per-Tenant Token Budgets
Signed + Merkle: Tamper-Evident Audit
PQ-TLS: Post-Quantum Edge
  1. Inbound
    API request
  2. PHI Scan
    Audit / Redact / Block
  3. Token Quota
    Real-time monthly cap
  4. Primary Provider
    OpenAI · gpt-4o-mini
  5. Response
    200 OK

LLM providers cap your seats. We cap your workloads.

A 100-person engineering team has 100 seats and 12 internal AI products. Your CFO budgets the 12, not the 100. Your CISO audits the data crossing each one, not the seat that signed the request. Per-seat caps from OpenAI, Anthropic, or Bedrock can't see the workload. We can.

Insights Detections panel showing mixed BLOCKED and REDACTED audit cards across SSN, MRN, credit card, NPI, and US phone categories. Each card pairs the matched category, the action taken, and the originating request ID for compliance investigation.
CISO compliance

PHI never leaves your perimeter via an LLM call.

A vendor BAA doesn't prevent a developer from pasting a patient record into a prompt. Boundary redaction does. Eight HIPAA-mapped detectors (SSN, US phone, email, credit card, MRN, NPI, ICD-10, date of birth) scan every inbound AI request body before it is forwarded upstream. Per-category modes let your compliance team decide: audit, redact in place with a category placeholder, or refuse with HTTP 422 and the matched categories in a response header.

  • Eight detectors compiled into a RegexSet quick-check; the hot-path scan adds under 1 ms to AI requests.
  • Three enforcement modes per category, set by Owner or Admin from the dashboard. Every change writes an audit row.
  • Live Detections panel surfaces every match the gateway saw, joined back to the originating request_id for investigation.
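
As a rough illustration of the three per-category modes, here is a minimal Python sketch. It is not the gateway's implementation: the two patterns are simplified stand-ins for the eight detectors, and the `scan` function, the `ScanResult` shape, the placeholder format, and the "audit" default for unconfigured categories are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately simplified patterns. Real detectors would be stricter.
DETECTORS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


@dataclass
class ScanResult:
    status: int          # 200 = forward upstream, 422 = refused
    body: str            # request body, redacted where a category is in "redact" mode
    matched: list        # every category that matched, for the audit trail
    blocked: list        # categories whose "block" mode caused the refusal


def scan(body: str, modes: dict) -> ScanResult:
    """Apply each category's mode: "audit", "redact", or "block"."""
    matched, blocked = [], []
    for category, pattern in DETECTORS.items():
        if not pattern.search(body):
            continue
        matched.append(category)
        mode = modes.get(category, "audit")  # assumed default for this sketch
        if mode == "block":
            blocked.append(category)
        elif mode == "redact":
            body = pattern.sub(f"[{category.upper()}]", body)
    if blocked:
        # Refuse the request; nothing is forwarded upstream.
        return ScanResult(422, "", matched, blocked)
    return ScanResult(200, body, matched, blocked)
```

With both categories set to "redact", `scan("Patient jane@example.com, SSN 123-45-6789", ...)` would forward `Patient [EMAIL], SSN [SSN]`; switching `ssn` to "block" would refuse the same body with 422.
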
Monthly token quota editor showing 19,008,016 / 21,000,000 tokens used (91% of cap), Resets in 2 days, with Save and Clear cap controls. The operator surface for setting and adjusting the per-tenant monthly cap.
IT control

Cap the workload, not the seat.

OpenAI, Anthropic, Bedrock: every major LLM provider caps users. We cap the workloads your users power. Set a monthly token quota per tenant; the gateway enforces it in real time across every provider, with a 5-minute reconcile loop that keeps the Redis counter and the Postgres rollup in sync. Set the cap to 0 to freeze AI traffic during a compliance investigation; clear the cap to resume.

  • Real-time 429 enforcement with monthly_quota_exceeded + Retry-After headers, not a silent end-of-month invoice surprise.
  • Cap-freeze (limit = 0) refuses every AI request until an Owner clears it. Single-toggle incident response.
  • Distributed enforcement via Redis with Postgres-rollup fallback. Gateway degrades gracefully instead of failing open.
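
The cap decision itself can be sketched in a few lines of Python. This is an illustration under assumptions, not the gateway's code: the `check_quota` function, the `Decision` shape, and the placement of `monthly_quota_exceeded` in an `error` field are hypothetical, and the Redis counter and Postgres rollup are replaced by a plain `used` integer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Decision:
    status: int                       # 200 = allow, 429 = refuse
    headers: dict = field(default_factory=dict)
    error: Optional[str] = None


def check_quota(used: int, requested: int, cap: Optional[int],
                seconds_to_reset: int) -> Decision:
    """Decide one request against a tenant's monthly token cap.

    cap=None models a cleared cap; cap=0 models the incident freeze.
    """
    if cap is None:
        return Decision(200)
    if cap == 0 or used + requested > cap:
        return Decision(
            429,
            headers={"Retry-After": str(seconds_to_reset)},
            error="monthly_quota_exceeded",
        )
    return Decision(200)
```

Treating `cap == 0` as its own branch means even a zero-token request is refused during a freeze, which matches the single-toggle behavior described above.
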
Token spend dashboard showing 19,008,016 tokens consumed over 30 days across OpenAI and Anthropic, $81.83 estimated cost, top model claude-sonnet-4-6, with a stacked-bar timeseries chart and per-provider legend.
CFO visibility

Every AI dollar, every tenant, every provider, in one place.

Your CFO doesn't want three provider invoices and a spreadsheet. They want one number per tenant, per route, per day. We watch every AI request that crosses the boundary and roll token spend up across OpenAI, Anthropic, Google Gemini, and AWS Bedrock in one 30-day window. The Insights dashboard surfaces total tokens, estimated cost, top model, and top route at a glance, with per-route drill-down.

  • Token spend tiles for every AI-aware route, refreshed every 5 minutes from rolled-up daily aggregates.
  • Per-provider stacked bar so a CFO sees the OpenAI vs. Anthropic split without opening two billing portals.
  • 30-day timeseries with sparse-data handling so a single busy day reads correctly against an otherwise quiet window.
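
To make the roll-up concrete, here is a small Python sketch of daily aggregation and a per-provider split. The event fields (`tenant`, `day`, `provider`, `input_tokens`, `output_tokens`) and both function names are assumptions for illustration; cost estimation is left out because it depends on provider pricing.

```python
from collections import defaultdict


def roll_up(events):
    """Sum tokens per (tenant, day, provider) from raw request events."""
    totals = defaultdict(int)
    for e in events:
        key = (e["tenant"], e["day"], e["provider"])
        totals[key] += e["input_tokens"] + e["output_tokens"]
    return dict(totals)


def provider_split(totals, tenant):
    """Collapse the daily aggregates into one total per provider for a tenant."""
    split = defaultdict(int)
    for (t, _day, provider), tokens in totals.items():
        if t == tenant:
            split[provider] += tokens
    return dict(split)
```

Days with no traffic simply have no key in `totals`, so a chart layer would need to fill them with zeros to draw a quiet window correctly.
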
Unified inference API

One API. Five integrations. Your keys.

Point your client at one OpenAI-compatible, Anthropic-native gateway endpoint and reach five upstream integrations without rewriting it. Every token streams back as the model generates it, over the same hybrid X25519MLKEM768 edge that fronts the rest of your traffic.

Five provider integrations

OpenAI, Anthropic, Google Gemini on both AI Studio and Vertex AI, and AWS Bedrock through the Converse API. One request shape in, a consistent response out. Move a workload between providers without touching client code.

Token-by-token streaming

Real incremental Server-Sent Events, not a buffered answer replayed in a single frame. The first token reaches your user the moment the model emits it, streamed over the hybrid PQ-TLS edge.
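
On the client side, incremental delivery means each event can be handled as soon as it arrives. The Python sketch below is a generic, simplified Server-Sent Events reader, not a gateway SDK: `iter_sse_data` is a hypothetical helper, it only handles `data:` fields and `\n` line endings, and the payload format inside each event is not specified here.

```python
def iter_sse_data(chunks):
    """Yield each event's data payload as soon as its blank-line terminator arrives.

    `chunks` is any iterable of decoded text fragments, split at arbitrary
    points, as they might come off a streaming HTTP response.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n\n" in buffer:
            raw_event, buffer = buffer.split("\n\n", 1)
            data_lines = [
                line[5:].removeprefix(" ")   # drop "data:" and one optional space
                for line in raw_event.split("\n")
                if line.startswith("data:")
            ]
            if data_lines:
                yield "\n".join(data_lines)
```

Because the generator yields inside the loop, a caller can render the first payload before later chunks have been received.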

Self-service BYOK

Add your own provider key once from the dashboard. It stays encrypted in a key-management boundary and authenticates upstream on your behalf, so the model invoice stays with your provider. A missing key fails closed, never silently falls back.

Governed on every call

Every request is quota-checked and writes an append-only audit row before it leaves the boundary. On Growth and Enterprise that row is Ed25519-signed and anchored into a Merkle root, making any tampering or deletion detectable. A dropped mid-stream connection still settles its quota and records the call.
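
Why a Merkle root makes tampering detectable can be shown with a short Python sketch. It is illustrative only: the gateway's actual hash function, row encoding, and tree layout are not stated on this page, so SHA-256 and the `0x00`/`0x01` leaf and node prefixes (a convention borrowed from RFC 6962) are assumptions, and the Ed25519 signature over the root is omitted because it needs a third-party library.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(rows):
    """Compute a Merkle root over an ordered list of audit rows (bytes)."""
    if not rows:
        return _h(b"")
    # Distinct prefixes keep a leaf hash from being mistaken for an inner node.
    level = [_h(b"\x00" + row) for row in rows]
    while len(level) > 1:
        nxt = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])   # odd node is promoted unchanged
        level = nxt
    return level[0]
```

Changing, removing, or reordering any row produces a different root, so a verifier holding a previously signed root can tell the log no longer matches it.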

What if you go down?

The question every CISO asks. Here is the honest answer.

We would rather tell you exactly how the gateway behaves under failure than quote an uptime number we have not earned yet.

Fail-closed by default

A gateway that cannot reach its quota store, credential store, or PHI scanner refuses the call rather than letting an ungoverned request through. For a compliance boundary, blocking is the safe failure mode, not leaking. Fail-open is an explicit, per-control opt-in, never the default.
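
The fail-closed rule can be expressed as a small Python sketch. The names here (`govern`, `DependencyUnavailable`, the `fail_open` set, and the `("refuse", name)` return shape) are hypothetical and chosen for the example; the point is only that an unreachable dependency is treated as a denial unless that specific control has been opted in to fail-open.

```python
class DependencyUnavailable(Exception):
    """Raised by a check when its backing store cannot be reached."""


def govern(request, checks, fail_open=frozenset()):
    """Run every control; refuse on a denial or on an unreachable dependency.

    `checks` maps a control name to a callable returning True (allow) or
    False (deny). `fail_open` names the controls explicitly opted in to
    fail-open; it is empty by default.
    """
    for name, check in checks.items():
        try:
            allowed = check(request)
        except DependencyUnavailable:
            allowed = name in fail_open   # closed unless explicitly opted in
        if not allowed:
            return ("refuse", name)
    return ("forward", None)
```

An empty default for `fail_open` keeps the safe behavior as the one you get without configuring anything.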

Hermetic Rust data plane

The data plane is a single, self-contained Rust process: no garbage collector, so no surprise pauses under load, and memory-safe by construction. The control plane is physically decoupled, so an overloaded admin API never touches live inference traffic.

Provider fallback (roadmap, not shipped)

Today a hard upstream outage surfaces as a clean, typed error rather than a silent hang. Automatic failover across your own provider keys, so that an OpenAI rate limit reroutes to your Anthropic key, is the next reliability milestone we are building. It is a plan, not a shipped capability.

Put a governed boundary in front of your AI.

Start free on the Starter plan and route your first call in minutes, or talk to engineering about a design-partner pilot. Either way you get fail-closed budgets, boundary redaction, and a signed audit trail from day one.