Best practices
Rollout
1. Start in `monitor`. Deploy each control in monitor mode, watch the would-have-blocked stream via `GET /audit` and the events, and measure your false-positive rate before enforcing.
2. Enable the database stores. The audit is the product: point `audit.store=database` (and `firewall_log`/`output_stats` as needed) so you keep the forensic record.
3. Set audit hygiene. Default `redact` keeps PII out; choose `hash` if you only need correlation, `raw` only when you control the data.
4. Wire one event listener to your SIEM/Slack so blocks and settings changes are visible in real time.
5. Flip to `enforce` once the monitor data looks clean.
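As a rough illustration of the "measure your false-positive rate" step, the sketch below computes the share of would-have-blocked events that a human reviewer judged benign. It is not the package's API: the `MonitorEvent` shape and its field names are hypothetical, and it assumes you have exported monitor-mode audit rows and labeled them by hand.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MonitorEvent:
    # Hypothetical record shape; the real audit rows may look different.
    control: str           # which control produced the verdict
    would_block: bool      # True if enforce mode would have blocked
    reviewed_benign: bool  # True if a reviewer judged the request harmless


def benign_share(events):
    """Share of would-have-blocked events that were actually benign."""
    flagged = [e for e in events if e.would_block]
    if not flagged:
        return 0.0
    return sum(e.reviewed_benign for e in flagged) / len(flagged)


def benign_share_by_control(events):
    """Same measure, grouped per control, to decide what to enforce first."""
    grouped = defaultdict(list)
    for e in events:
        grouped[e.control].append(e)
    return {control: benign_share(rows) for control, rows in grouped.items()}
```

A per-control breakdown is useful because one noisy control should not delay enforcing the others.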
Pattern authoring
- Patterns match the casefolded, normalized form, so write them lowercase or with `/i`.
- Keep patterns anchored and bounded (`\b…\b`, `.{0,40}`) to avoid catastrophic backtracking; `pattern_safety` bounds the backtrack limit and fails closed, but tight patterns are cheaper.
- Bump `pattern_safety.ruleset_version` when you change the rules. It is stamped on every verdict and audit row, so you can correlate detections with the ruleset that produced them.
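The first two points above can be illustrated with a small, language-agnostic sketch. It uses Python's `re` module rather than the package's own matcher, and it assumes that "casefolded, normalized" means roughly NFKC normalization followed by case folding; the actual normalization pipeline may differ. The pattern itself is a made-up example.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    # Assumption: approximate the normalized form with NFKC + casefold.
    return unicodedata.normalize("NFKC", text).casefold()


# Anchored with \b and bounded with .{0,40}: the gap between the two words
# can never exceed 40 characters, which limits how far the engine backtracks.
BOUNDED = re.compile(r"\bignore\b.{0,40}\binstructions\b")

# A pattern written with uppercase letters can never match casefolded input.
UPPERCASE = re.compile(r"\bIgnore\b")


def matches(text: str) -> bool:
    """Match the bounded pattern against the normalized form of the text."""
    return BOUNDED.search(normalize(text)) is not None
```

Because matching happens after normalization, mixed-case and full-width variants of the same phrase are caught by the single lowercase pattern, while an uppercase pattern silently never fires.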
Tools
- Wrap every tool with `guard()`. It is a no-op when disabled, so there’s no cost to always-on wrapping.
- For destructive tools, also `routeForApproval()` and restrict `hitl.allowed_tool_classes` to those FQCNs.
- Turn on `tool_authorization.enabled` only after defining the Gate ability, because it fails closed.
Operations
- Schedule `ai-guardrails:purge` with a fixed `--actor` (e.g. `scheduler`) for GDPR retention; keep it `--dry-run` in staging.
- Protect the HTTP API with real auth middleware: it exposes audit data and lets an operator change security settings. An empty middleware stack throws at boot by design.
- Mirror the CI mutation gate locally with Docker if you change the security logic.
Anti-patterns
- Don’t trust the model’s owner arguments; that’s the whole point of Control A.
- Don’t forward the full `InjectionAttempt`/event payload to a third-party webhook, because it carries the raw prompt.
- Don’t run firewalled tools before authentication; a null principal with an owner key fails closed.
- Don’t treat `monitor` as protection for Control D, because destructive calls execute in monitor.
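For the webhook anti-pattern, one possible mitigation is to forward an allowlisted subset of the event instead of the full payload. The sketch below is an assumption-heavy illustration, not the package's API: the event is modeled as a plain dict, and every field name (`event`, `control`, `ruleset_version`, `occurred_at`, `prompt`) is hypothetical.

```python
import hashlib

# Hypothetical field names; only these are copied to the outbound payload.
SAFE_FIELDS = ("event", "control", "ruleset_version", "occurred_at")


def to_webhook_payload(event: dict) -> dict:
    """Build a third-party-safe payload that never contains the raw prompt."""
    payload = {key: event[key] for key in SAFE_FIELDS if key in event}
    prompt = event.get("prompt")
    if prompt is not None:
        # A digest allows correlation with the audit record without
        # disclosing the prompt text itself.
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        payload["prompt_sha256"] = digest
    return payload
```

An allowlist is used rather than deleting known-sensitive keys, so a field added to the event later is withheld by default.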