Five built-in protections. Four curated presets. Per-key binding, strict opt-in, every verdict logged. Enforced uniformly across completions, images, code, documents, embed widgets, and the playground, and configurable in seconds.
FIVE BUILT-IN POLICIES
Each policy ships out of the box. Strict opt-in: nothing runs against your traffic until you bind it to an API key from the dashboard or via the SDK.
Catches the canonical NIST PII shapes (email, phone, SSN, credit-card, driver's license) and replaces them with [REDACTED:<kind>] tokens before any model ever sees the prompt.
Pattern-matches known prompt-injection payloads and adversarial-instruction shapes. Denies the request before it lands on any model. Returns a structured error you can route to a human.
Flags profanity in either direction without denying the request. The verdict lands in the executions log so you can investigate context without blocking legitimate users.
Caps input or output tokens at a configurable ceiling (default 4K or 8K). Prevents runaway prompts and runaway responses from blowing your budget or the upstream context window.
When the model emits malformed JSON, runs a one-shot repair through Haiku before the response leaves the gateway. Fenced or unfenced, your agentic workflows get clean structured output.
FOUR CURATED PRESETS
Pick a starting point, bind it to a key, customize from there. Presets create a policy row in your account but never auto-bind. You stay in control of which traffic gets governed.
Redact email / phone / SSN / credit-card / DL before the model sees them.
Sensible default for compliance-sensitive teams. The PII redactor catches the canonical shapes and replaces them with [REDACTED:<kind>] tokens before any model call.
Reject prompt injection on input; repair malformed JSON on output.
For agentic / structured-output workflows. Denies known prompt-injection patterns at input and runs one Haiku repair attempt when the model emits malformed JSON.
Cap input + output length so a runaway prompt never blows the budget.
Truncates both input and output to a defensible 16K-character ceiling (≈16K input / 16K output at the default ratio) and flags everything else for review.
PII redaction + prompt-injection deny + JSON repair + length cap + profanity flag.
The canonical full-stack policy. Layers every built-in policy in a sane order: PII first, then prompt-injection deny, then JSON repair + length truncation on output, with profanity flagging across both phases.
WHERE THEY RUN
SECURITY LAYER, ALWAYS ON
Six platform-level guards that ship enabled by default for every account on every plan.
Block or allow by ISO 3166-1 country code. Keep regulated widgets in their lane without writing a single line of code.
Invisible bot challenge on public embeds. Stops scripted abuse before it hits your inbox.
Rapid-fire, content repetition, and prompt-length guards. Drains and spam are blocked automatically, per key.
Strips EXIF / IPTC / XMP / ICC / DICOM metadata before any image touches an AI provider. PHI-aware.
Every outbound URL is checked for private IPs (RFC 1918/6598), loopback, link-local, cloud metadata endpoints, and embedded credentials.
Per-key rate-limit. Idempotency keys deduplicate concurrent POSTs in a 24-hour replay window. Safe retries by construction.
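One way to implement an outbound-URL check of the kind listed above (private, loopback, link-local, cloud metadata, embedded credentials), as an added example that is not the product's own code. The function names, reason strings and metadata address list are assumptions made for illustration.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Categories named in the text above, mapped to their standard CIDR ranges.
BLOCKED = [(reason, ipaddress.ip_network(cidr)) for reason, cidr in [
    ("private", "10.0.0.0/8"), ("private", "172.16.0.0/12"),
    ("private", "192.168.0.0/16"), ("shared-cgnat", "100.64.0.0/10"),
    ("loopback", "127.0.0.0/8"), ("loopback", "::1/128"),
    ("link-local", "169.254.0.0/16"), ("link-local", "fe80::/10"),
]]
# Assumption: well-known metadata targets; extend for your own cloud.
METADATA_IPS = {ipaddress.ip_address(a) for a in ("169.254.169.254", "fd00:ec2::254")}
METADATA_HOSTS = {"metadata.google.internal"}

def system_resolver(host):
    """Default resolver: every address the OS could connect to for host."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}

def classify_ip(text):
    """Return the blocking reason for one address, or None if it is public."""
    ip = ipaddress.ip_address(text.split("%")[0])  # drop any IPv6 zone id
    if ip.version == 6 and ip.ipv4_mapped:  # ::ffff:10.0.0.1 is really 10.0.0.1
        ip = ip.ipv4_mapped
    if ip in METADATA_IPS:  # checked first: more specific than link-local
        return "cloud-metadata"
    return next((reason for reason, net in BLOCKED if ip in net), None)

def check_outbound_url(url, resolve=system_resolver):
    """Return (allowed, reason). Fails closed on anything it cannot parse."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False, "unparseable"
    if parts.scheme not in ("http", "https") or not host:
        return False, "bad-scheme-or-host"
    if parts.username is not None or parts.password is not None:
        return False, "embedded-credentials"
    if host.rstrip(".") in METADATA_HOSTS:
        return False, "cloud-metadata"
    try:
        addrs = resolve(host)  # judge resolved addresses, not the name
    except OSError:
        return False, "unresolvable"
    for addr in addrs:  # one internal record is enough to deny
        reason = classify_ip(addr)
        if reason:
            return False, reason
    return (True, "ok") if addrs else (False, "unresolvable")
```

The fetch that follows should connect to the address that was validated rather than resolving the name again, since a DNS answer that changes between check and fetch would otherwise pass unchecked.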
EVERY VERDICT LOGGED
Every redact, deny, flag, truncate, and repair lands in your account's guardrail-executions stream. Searchable, exportable, hash-anchored.
Bind a preset to your key in 30 seconds. Customize from there. Every verdict logged. No surprises.