Your engineers already use AI.
Govern it.
One gateway in front of OpenAI, Anthropic, Google Gemini, and AWS Bedrock. Fail-closed budgets so a single workload can't drain the account, PHI redaction at the boundary, and a cryptographically signed audit log for every call.
Bring your own provider keys; the invoice stays with your provider. Every call streams token-by-token over the same hybrid post-quantum TLS edge. Live today at app.edge.scrutari.ai.
- InboundAPI request
- PHI ScanAudit / Redact / Block
- Token QuotaReal-time monthly cap
- Primary ProviderOpenAI · gpt-4o-mini
- Response200 OK
LLM providers cap your seats. We cap your workloads.
A 100-person engineering team has 100 seats and 12 internal AI products. Your CFO budgets the 12, not the 100. Your CISO audits the data crossing each one, not the seat that signed the request. Per-seat caps from OpenAI, Anthropic, or Bedrock can't see the workload. We can.

PHI never leaves your perimeter via an LLM call.
A vendor BAA doesn't prevent a developer from pasting a patient record into a prompt. Boundary redaction does. Eight HIPAA-mapped detectors (SSN, US phone, email, credit card, MRN, NPI, ICD-10, date of birth) scan every inbound AI request body before it forwards upstream. Per-category modes let your compliance team decide: audit, redact in-place with a category placeholder, or refuse with HTTP 422 and the matched categories in a response header.
- Eight detectors compiled into a RegexSet quick-check; hot-path scan adds under 1ms to AI requests.
- Three enforcement modes per category, set by Owner or Admin from the dashboard. Every change writes an audit row.
- Live Detections panel surfaces every match the gateway saw, joined back to the originating request_id for investigation.

Cap the workload, not the seat.
OpenAI, Anthropic, Bedrock: every major LLM provider caps users. We cap the workloads your users power. Set a monthly token quota per tenant; the gateway enforces it in real time across every provider, with a 5-minute reconcile loop that keeps the Redis counter and the Postgres rollup in sync. Type 0 to incident-freeze AI traffic during a compliance investigation; clear the cap to resume.
- Real-time 429 enforcement with monthly_quota_exceeded + Retry-After headers, not a silent end-of-month invoice surprise.
- Cap-freeze (limit = 0) refuses every AI request until an Owner clears it. Single-toggle incident response.
- Distributed enforcement via Redis with Postgres-rollup fallback. Gateway degrades gracefully instead of failing open.

Every AI dollar, every tenant, every provider, in one place.
Your CFO doesn't want three provider invoices and a spreadsheet. They want one number per tenant, per route, per day. We watch every AI request that crosses the boundary and roll token spend up across OpenAI, Anthropic, Bedrock, Azure OpenAI, and Google in one 30-day window. The Insights dashboard surfaces total tokens, estimated cost, top model, and top route at a glance, with per-route drill-down.
- Token spend tiles for every AI-aware route, refreshed every 5 minutes from rolled-up daily aggregates.
- Per-provider stacked bar so a CFO sees the OpenAI vs. Anthropic split without opening two billing portals.
- 30-day timeseries with sparse-data handling so a single busy day reads correctly against an otherwise quiet window.
One API. Five providers. Your keys.
Point one OpenAI-compatible, Anthropic-native endpoint at the gateway and reach five upstream integrations without rewriting a client. Every token streams back as the model generates it, over the same hybrid X25519MLKEM768 edge that fronts the rest of your traffic.
Five provider integrations
OpenAI, Anthropic, Google Gemini on both AI Studio and Vertex AI, and AWS Bedrock through the Converse API. One request shape in, a consistent response out. Move a workload between providers without touching client code.
Token-by-token streaming
Real incremental Server-Sent Events, not a buffered answer replayed in a single frame. The first token reaches your user the moment the model emits it, streamed over the hybrid PQ-TLS edge.
Self-service BYOK
Add your own provider key once from the dashboard. It stays encrypted in a key-management boundary and authenticates upstream on your behalf, so the model invoice stays with your provider. A missing key fails closed, never silently falls back.
Governed on every call
Every request is quota-checked and writes an append-only audit row before it leaves the boundary. On Growth and Enterprise that row is Ed25519-signed and anchored into a Merkle root, making any tampering or deletion detectable. A dropped mid-stream connection still settles its quota and records the call.
The question every CISO asks. Here is the honest answer.
We would rather tell you exactly how the gateway behaves under failure than quote an uptime number we have not earned yet.
Fail-closed by default
A gateway that cannot reach its quota store, credential store, or PHI scanner refuses the call rather than letting an ungoverned request through. For a compliance boundary, blocking is the safe failure mode, not leaking. Fail-open is an explicit, per-control opt-in, never the default.
Hermetic Rust data plane
The data plane is a single, self-contained Rust process: no garbage collector, so no surprise pauses under load, and memory-safe by construction. The control plane is physically decoupled, so an overloaded admin API never touches live inference traffic.
Provider fallback (roadmap, not shipped)
Today a hard upstream outage surfaces as a clean, typed error rather than a silent hang. Automatic fail-over across your own provider keys, so an OpenAI rate limit reroutes to your Anthropic key, is the next reliability milestone we are building, stated as a plan, not a shipped capability.
Put a governed boundary in front of your AI.
Start free on the Starter plan and route your first call in minutes, or talk to engineering about a design-partner pilot. Either way you get fail-closed budgets, boundary redaction, and a signed audit trail from day one.