AI Gateway · Live in production

Your engineers already use AI.
Govern it.

One gateway in front of OpenAI, Anthropic, Google Gemini, and AWS Bedrock. Fail-closed budgets so a single workload can't drain the account, PHI redaction at the boundary, and a cryptographically signed audit log for every call.

Bring your own provider keys; the invoice stays with your provider. Every call streams token-by-token over a hybrid post-quantum TLS edge. Live today at app.edge.scrutari.ai.

Get Starter · $49/mo

Talk to engineering

One API

OpenAI · Anthropic · Gemini · Bedrock

Fail-closed

Per-Tenant Token Budgets

Signed + Merkle

Tamper-Evident Audit

PQ-TLS

Post-Quantum Edge

MCP Security Gateway

Govern the tools your AI can call

Connect Claude, IDEs, and agents to your internal MCP servers through the same post-quantum edge. Every tool call passes a per-server allow-list, boundary PHI and PII redaction, and a signed, Merkle-anchored audit record. OAuth on-behalf-of means each upstream server only ever sees the verified tenant, never another customer's data.

Allow-listed

Per-Server Tool Control

On-Behalf-Of

Tenant-Isolated Upstreams

Signed + Merkle

Every Tool Call Audited

PQ-TLS

Post-Quantum Edge

AI Gateway · Live in production

LLM providers cap your seats. We cap your workloads.

A 100-person engineering team has 100 seats and 12 internal AI products. Your CFO budgets the 12, not the 100. Your CISO audits the data crossing each one, not the seat that signed the request. Per-seat caps from OpenAI, Anthropic, or Bedrock can't see the workload. We can.

CISO compliance

PHI never leaves your perimeter via an LLM call.

A vendor BAA doesn't prevent a developer from pasting a patient record into a prompt. Boundary redaction does. Eight HIPAA-mapped detectors (SSN, US phone, email, credit card, MRN, NPI, ICD-10, date of birth) scan every inbound AI request body before it is forwarded upstream. Per-category modes let your compliance team decide what happens on a match: audit only, redact in place with a category placeholder, or refuse with HTTP 422 and the matched categories in a response header.

Eight detectors compiled into a RegexSet quick-check; the hot-path scan adds under 1 ms to each AI request.
Three enforcement modes per category, set by Owner or Admin from the dashboard. Every change writes an audit row.
Live Detections panel surfaces every match the gateway saw, joined back to the originating request_id for investigation.
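
To make the three enforcement modes concrete, here is a minimal Python sketch of a boundary scan. It is an illustration under assumptions, not the gateway's implementation: the two regex patterns are deliberately simplified, and the names `Mode`, `DETECTORS`, and `scan` are hypothetical. Only the audit / redact / refuse behavior and the HTTP 422 status come from the description above.

```python
import re
from enum import Enum


class Mode(Enum):
    AUDIT = "audit"    # record the match, forward the body unchanged
    REDACT = "redact"  # replace the match with a category placeholder
    REFUSE = "refuse"  # reject the request with HTTP 422


# Simplified, illustrative patterns for two of the eight categories.
# Production detectors would need stricter validation than this.
DETECTORS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def scan(body: str, modes: dict):
    """Return (status, body_to_forward, matched_categories).

    Categories without a configured mode default to AUDIT (an assumption).
    """
    matched = [name for name, rx in DETECTORS.items() if rx.search(body)]

    # Any matched category in REFUSE mode blocks the whole request.
    if any(modes.get(c, Mode.AUDIT) is Mode.REFUSE for c in matched):
        return 422, None, matched

    # Otherwise redact the categories configured for in-place redaction.
    for c in matched:
        if modes.get(c, Mode.AUDIT) is Mode.REDACT:
            body = DETECTORS[c].sub(f"[{c.upper()}]", body)
    return 200, body, matched
```

With `{"ssn": Mode.REDACT}`, a body containing an SSN and an email address is forwarded with the SSN replaced by `[SSN]`, while the email match is only reported. Switching `ssn` to `Mode.REFUSE` returns 422 and nothing is forwarded.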

IT control

Cap the workload, not the seat.

OpenAI, Anthropic, Bedrock: every major LLM provider caps users. We cap the workloads your users power. Set a monthly token quota per tenant; the gateway enforces it in real time across every provider, with a 5-minute reconcile loop that keeps the Redis counter and the Postgres rollup in sync. Type 0 to incident-freeze AI traffic during a compliance investigation; clear the cap to resume.

Real-time 429 enforcement with monthly_quota_exceeded + Retry-After headers, not a silent end-of-month invoice surprise.
Cap-freeze (limit = 0) refuses every AI request until an Owner clears it. Single-toggle incident response.
Distributed enforcement via Redis with Postgres-rollup fallback. Gateway degrades gracefully instead of failing open.
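
The cap logic can be sketched in a few lines of Python. This is a simplified single-process illustration: the real counter lives in Redis with a Postgres rollup, which is not modeled here. The function name `check_quota`, its parameters, and the placement of `monthly_quota_exceeded` in a JSON-style body are assumptions for the example.

```python
from typing import Optional


def check_quota(limit: Optional[int], used: int, requested: int,
                seconds_to_reset: int):
    """Return (status, headers, body) for one request against a monthly cap.

    limit=None models a cleared cap; limit=0 models the cap-freeze case.
    """
    if limit is None:
        # Cap cleared: no quota is enforced for this tenant.
        return 200, {}, None

    if limit == 0 or used + requested > limit:
        # Refuse in real time instead of surfacing the overage on an invoice.
        return (
            429,
            {"Retry-After": str(seconds_to_reset)},
            {"error": "monthly_quota_exceeded"},
        )
    return 200, {}, None
```

A tenant with a cap of 1,000 tokens and 900 used can still send a 100-token request; a 101-token request is refused with 429. Setting the cap to 0 refuses everything until it is cleared.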

CFO visibility

Every AI dollar, every tenant, every provider, in one place.

Your CFO doesn't want three provider invoices and a spreadsheet. They want one number per tenant, per route, per day. We watch every AI request that crosses the boundary and roll token spend up across OpenAI, Anthropic, Bedrock, Azure OpenAI, and Google in one 30-day window. The Insights dashboard surfaces total tokens, estimated cost, top model, and top route at a glance, with per-route drill-down.

Token spend tiles for every AI-aware route, refreshed every 5 minutes from rolled-up daily aggregates.
Per-provider stacked bar so a CFO sees the OpenAI vs. Anthropic split without opening two billing portals.
30-day timeseries with sparse-data handling so a single busy day reads correctly against an otherwise quiet window.

Talk to an engineer about a 30-day pilot

Unified inference API

One API. Five providers. Your keys.

Point your client at one OpenAI-compatible, Anthropic-native endpoint on the gateway and reach five upstream integrations without rewriting it. Every token streams back as the model generates it, over the same hybrid X25519MLKEM768 edge that fronts the rest of your traffic.

Five provider integrations

OpenAI, Anthropic, Google Gemini on both AI Studio and Vertex AI, and AWS Bedrock through the Converse API. One request shape in, a consistent response out. Move a workload between providers without touching client code.

Token-by-token streaming

Real incremental Server-Sent Events, not a buffered answer replayed in a single frame. The first token reaches your user the moment the model emits it, streamed over the hybrid PQ-TLS edge.
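
On the client side, incremental delivery means each Server-Sent Events frame can be handled as soon as it is complete. The Python sketch below is a minimal illustration of that idea, assuming LF line endings and `data:` fields only; the function name `iter_sse_data` is hypothetical and the payload format of any particular provider is not modeled.

```python
def iter_sse_data(chunks):
    """Yield the data payload of each SSE frame as soon as it is complete.

    `chunks` is any iterable of text fragments, split at arbitrary points,
    as they might arrive from a network socket.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # A blank line terminates one SSE frame.
        while "\n\n" in buffer:
            frame, buffer = buffer.split("\n\n", 1)
            data_lines = []
            for line in frame.split("\n"):
                if line.startswith("data:"):
                    value = line[5:]
                    # The SSE format allows one optional space after the colon.
                    if value.startswith(" "):
                        value = value[1:]
                    data_lines.append(value)
            if data_lines:
                yield "\n".join(data_lines)
```

Because the function is a generator, a caller can render each payload the moment its frame closes rather than waiting for the whole response.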

Self-service BYOK

Add your own provider key once from the dashboard. It stays encrypted in a key-management boundary and authenticates upstream on your behalf, so the model invoice stays with your provider. A missing key fails closed, never silently falls back.

Governed on every call

Every request is quota-checked and writes an append-only audit row before it leaves the boundary. On Growth and Enterprise that row is Ed25519-signed and anchored into a Merkle root, making any tampering or deletion detectable. A dropped mid-stream connection still settles its quota and records the call.
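
To show why a Merkle root makes tampering or deletion detectable, here is a minimal Python sketch that folds a list of audit rows into a single SHA-256 root. It is an illustration only: the hash function, the leaf and node prefixes, and the handling of an odd node are assumptions, and the Ed25519 signature over each row is omitted because the Python standard library has no Ed25519 implementation.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(rows: list) -> bytes:
    """Fold serialized audit rows (bytes) into one 32-byte root."""
    if not rows:
        return _h(b"")
    # Distinct prefixes keep leaf hashes and interior hashes from colliding.
    level = [_h(b"\x00" + row) for row in rows]
    while len(level) > 1:
        nxt = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            # An unpaired node is carried up to the next level unchanged.
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

Recomputing the root over the stored rows and comparing it with the anchored value reveals any edited or missing row, since either change produces a different root.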

Transport

Your inference rides the same hybrid post-quantum edge, now over HTTP/3 (QUIC). X25519MLKEM768 on the public client hop, terminated on our own infrastructure behind a stock layer-4 load balancer. No CDN in the path.

See the transport story

Response caching

Stop paying twice for the same answer.

Turn caching on per model route. When an identical request comes back, the gateway serves the stored answer and bills you nothing for the upstream call. Because you bring your own provider keys, it is your invoice that shrinks, not ours.

Exact-match, not fuzzy

A hit is keyed by a SHA-256 over the normalized request: tenant, model, system prompt, messages, token cap, and temperature. Identical inputs return the identical answer; change any of them and you get a fresh call. No similarity guessing, so a cache hit is always an answer that exact request already produced, never a near-miss from a different prompt.
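
An exact-match key of this kind can be sketched in Python as follows. The field list comes from the description above; the normalization itself (canonical JSON with sorted keys) and the function name `cache_key` are assumptions for illustration, not the gateway's actual encoding.

```python
import hashlib
import json


def cache_key(tenant: str, model: str, system: str, messages: list,
              max_tokens: int, temperature: float) -> str:
    """Derive a hex SHA-256 key from the fields that define a request."""
    normalized = {
        "tenant": tenant,
        "model": model,
        "system": system,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    # Sorted keys and fixed separators give one byte string per logical
    # request, regardless of the key order the client happened to send.
    canonical = json.dumps(
        normalized, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two requests that differ only in temperature, or only in tenant, hash to different keys, so one tenant's cached answer is never served to another and a changed parameter always triggers a fresh call.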

Zero upstream cost on a hit

A cached response never reaches the provider, so it is recorded with zero token usage and counts nothing against your spend or your monthly budget. It works for both JSON and streaming calls: a streamed hit replays the stored answer over SSE, and the audit log records it as a distinct cache-hit event.

Opt-in per route, PHI-aware

Caching is off by default and enabled per model route, so a tool-using, real-time, or PHI-sensitive workload is never cached unless you choose to. The cache is held in memory at the gateway and scoped per tenant, and a bypass header forces a fresh call whenever you need one.

What if you go down?

The question every CISO asks. Here is the honest answer.

We would rather tell you exactly how the gateway behaves under failure than quote an uptime number we have not earned yet.

Fail-closed by default

A gateway that cannot reach its quota store, credential store, or PHI scanner refuses the call rather than letting an ungoverned request through. For a compliance boundary, blocking is the safe failure mode, not leaking. Fail-open is an explicit, per-control opt-in, never the default.
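
The fail-closed rule can be expressed as a short admission check. The Python sketch below is a minimal illustration, assuming each control is a callable that either returns a verdict or raises when its backing store is unreachable; `DependencyUnavailable`, `admit`, and the control names are hypothetical.

```python
class DependencyUnavailable(Exception):
    """Raised by a control whose backing store cannot be reached."""


def admit(checks: dict, fail_open: frozenset = frozenset()):
    """Return (allowed, blocking_control) for one request.

    `checks` maps a control name to a zero-argument callable returning
    True (pass) or False (deny). `fail_open` lists the controls that were
    explicitly opted in to pass when their dependency is down.
    """
    for name, check in checks.items():
        try:
            if not check():
                return False, name
        except DependencyUnavailable:
            # Default: an unreachable control blocks the request.
            if name not in fail_open:
                return False, name
    return True, None
```

With an empty `fail_open` set, a quota store outage blocks the request; the call goes through during that outage only if `"quota"` was explicitly added to `fail_open`.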

Hermetic Rust data plane

The data plane is a single, self-contained Rust process: no garbage collector, so no surprise pauses under load, and memory-safe by construction. The control plane is physically decoupled, so an overloaded admin API never touches live inference traffic.

Automatic provider failover

If an upstream rate-limits (429) or has a transient outage (503), the gateway reroutes to the next provider in the chain you configured, so an OpenAI rate limit can fail over to your Anthropic key without your users seeing it. Only retryable failures fail over; a content-filter block stays terminal by design, and you are billed only for the provider that actually served the request. For streaming, failover happens on the connection handshake, before the first token reaches the client.
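
The failover rule for non-streaming calls can be sketched as a loop over the configured chain. This Python example is an illustration under assumptions: `route` and the `send` callback are hypothetical names, and only the 429 and 503 statuses come from the description above. Any other status, including a content-filter block, is treated as terminal.

```python
RETRYABLE = {429, 503}  # rate limit and transient outage


def route(chain: list, send):
    """Try each provider in order until one returns a non-retryable status.

    `send(provider)` returns (status, body). Returns
    (provider, status, body, attempts), where `attempts` records every
    (provider, status) pair tried, in order.
    """
    if not chain:
        raise ValueError("provider chain is empty")
    attempts = []
    for provider in chain:
        status, body = send(provider)
        attempts.append((provider, status))
        if status not in RETRYABLE:
            # Success or a terminal failure: stop here, no further failover.
            return provider, status, body, attempts
    # Every provider returned a retryable failure; surface the last one.
    return provider, status, body, attempts
```

A chain of `["openai", "anthropic"]` where the first provider answers 429 ends with the second provider serving the request, and `attempts` shows both hops for the audit trail.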

Put a governed boundary in front of your AI.

Get started on the Starter plan for $49 a month and route your first call in minutes, or talk to engineering about a design-partner pilot. Either way you get fail-closed budgets, boundary redaction, and a signed audit trail from day one.