OpenGuardrails

What this skill is: This skill is an installation and usage guide for the OpenGuardrails plugin. It tells the agent how to install, configure, and verify the plugin — it does not execute code on its own. The plugin source code lives at github.com/openguardrails/openguardrails (subdirectory openclaw-security/).

ClawHub ↔ GitHub ↔ npm identity: This skill is published on ClawHub as ThomasLWang/openguardrails. The upstream source is at github.com/openguardrails/openguardrails, maintained by the same author (Thomas Wang). The npm package is @openguardrails/openclaw-security. All three point to the same codebase. You can verify provenance — see "Verify npm ↔ GitHub provenance" below.

Quick Start — 3 Steps

# 1. Install the plugin
openclaw plugins install @openguardrails/openclaw-security

# 2. Restart the gateway so the plugin is loaded
openclaw gateway restart

# 3. Register and activate (inside an OpenClaw session)
/og_activate

That's it. After /og_activate, follow the on-screen instructions to claim your agent and verify your email. Detailed explanations for each step are below.

Runtime security guard for OpenClaw agents. Protects against the most critical AI agent threats:

Data exfiltration defense — detects and blocks when an agent reads sensitive files then attempts to send them to external servers
Sensitive data leakage prevention — sanitizes PII, credentials, and secrets before they reach LLM providers
Prompt injection protection — identifies crafted inputs designed to hijack agent behavior
Command injection blocking — catches shell escapes, backtick substitution, and command chaining in tool parameters
Content safety — filters NSFW content and enforces minor protection policies

Security & Trust

Open source and auditable. All code is Apache 2.0 licensed at github.com/openguardrails/openguardrails. You can audit every line before installing — especially the tool-event hooks, sanitization logic, and network calls. Key files to review:

index.ts — plugin entry point showing all event hooks; confirm no unexpected side effects
agent/sanitizer.ts — what gets sanitized before any cloud transmission
agent/content-injection-scanner.ts — local-only regex patterns for injection detection
platform-client/ — every outbound network call the plugin makes; confirm all go to openguardrails.com/core only
agent/config.ts:65-68 — the registration request; confirm it sends only { name, description }

Inspect before installing. You can inspect the exact npm tarball contents without installing:

# View the npm package contents (no install)
npm pack @openguardrails/openclaw-security --dry-run

# Or download and extract the tarball to inspect
npm pack @openguardrails/openclaw-security
tar -xzf openguardrails-openclaw-security-*.tgz
ls package/

What is transmitted to the cloud API (and what is not):

Sent: sanitized tool metadata only — tool names, parameter keys, session signals (tool ordering, timing). All sensitive values (PII, credentials, file contents, secrets) are replaced with category placeholders (<EMAIL>, <SECRET>, <CREDIT_CARD>, etc.) locally before transmission.
Never sent: raw file contents, user messages, conversation history, actual credential values, or any unsanitized parameter values.
Data retention: Detection request payloads (sanitized tool metadata) are not retained after the response is returned. Account data is stored persistently for billing: agent ID and API key (created at registration in Step 3), plus email (provided by you during activation via the claim web form), plan tier, and per-agent usage counts.

Local-only mode. The plugin works without any cloud connection. Local fast-path detection (shell escape blocking, read-then-exfil patterns, content injection redaction) operates entirely on your machine with no network calls. Cloud assessment is only used for borderline behavioral patterns and is opt-in via registration. If you skip registration, you still get all local protections.

No install-time network calls. The plugin makes zero network requests at install time. It loads a local BehaviorDetector and waits until you explicitly run /og_activate.

Explicit network behavior summary. The plugin has exactly three network states — here is when each network call happens and what it sends:

State	When	Network calls	What is sent
Installed (not registered)	After `openclaw plugins install` + `openclaw gateway restart`	None	Nothing — all protections are local-only
Registered (not activated)	After `/og_activate`	One `POST /api/v1/agents/register` call	`{ "name": "OpenClaw Agent", "description": "" }` only
Activated	After email verification	`POST /api/v1/detect` per tool call (borderline signals only)	Sanitized tool metadata — all PII/secrets replaced with placeholders locally before transmission

No other network calls are made. You can verify this with tcpdump, mitmproxy, or Little Snitch — the only destination is openguardrails.com.

Registration creates an account. Running /og_activate registers an agent and creates an API key (sk-og-...) stored locally at ~/.openclaw/credentials/openguardrails/credentials.json. This key is used for cloud assessments and billing. You control the account: revoke the key from the account portal or delete the credentials file. No email is collected until you voluntarily enter it in the browser claim form. See "Step 3: Register and Activate" for the full specification.

Fail-open design. If the cloud API is unreachable or times out, tool calls are allowed — the plugin never blocks your workflow due to network issues.

This skill is documentation-only. This SKILL.md file contains no executable code. It is an installation and usage guide for the open-source plugin at github.com/openguardrails/openguardrails. You are encouraged to audit the source before installing.

How It Works

The plugin hooks into before_tool_call, after_tool_call, and tool_result_persist events. It classifies every tool call in real time — tracking sensitive file reads, external network calls, credential access, and shell commands across the session. Critical patterns are blocked instantly on the local fast path; borderline signals are assessed via the cloud API.

Additionally, when the agent reads files or fetches web pages, the plugin scans the content for injection patterns across 7 categories and automatically redacts them in-place before the agent processes the content.

Agent calls tool
      ↓
[openguardrails] classifies tool + updates session state
  • sensitive file read → then network call?      → BLOCK (local)
  • shell escape in params ($(), backtick)?        → BLOCK (local)
  • prompt injection in file/web content?          → REDACT in-place (local)
  • credential access + low intent overlap?        → assess via cloud
  • external domains + intent mismatch?            → assess via cloud
      ↓
Allow, redact, alert, or block — with explanation returned to agent

No content is sent externally until the plugin is registered and activated.

Step 1: Install the Plugin

Install from npm or directly from source:

# From npm
openclaw plugins install @openguardrails/openclaw-security

# From source (audit first, then install)
git clone https://github.com/openguardrails/openguardrails.git
openclaw plugins install -l ./openguardrails/openclaw-security

Installation completes immediately — no network calls are made at install time. The plugin is downloaded and registered with OpenClaw, but it is not loaded yet.

Step 2: Restart the Gateway

After installing the plugin, you must restart the OpenClaw gateway so it picks up the new plugin:

openclaw gateway restart

Once the gateway restarts, the plugin is loaded and all local protections are active immediately (injection redaction, shell escape blocking, read-then-exfil detection) — no registration required.

Verify the plugin is loaded:

/og_status

Expected output when not yet registered:

OpenGuardrails Status

- Status:    not registered — run `/og_activate` to register
- Platform:  https://www.openguardrails.com/core

- blockOnRisk: true

Step 3: Register and Activate (optional — local-only mode works without this)

Registration enables cloud-based behavioral assessment on top of the local protections you already have. Run this inside an OpenClaw session:

/og_activate

What happens when you run `/og_activate`

Registration — The plugin calls POST /api/v1/agents/register with exactly { "name": "OpenClaw Agent", "description": "" }. No machine identifiers, no file paths, no user data. See agent/config.ts:65-68 in the source.

Credentials are saved locally — The response is written to ~/.openclaw/credentials/openguardrails/credentials.json:

{
  "apiKey": "sk-og-<32 hex chars>",
  "agentId": "<uuid>",
  "claimUrl": "https://www.openguardrails.com/core/claim/<token>",
  "verificationCode": "word-XXXX"
}

You see the claim instructions:

OpenGuardrails: Claim Your Agent

Agent ID: <uuid>

Complete these steps to activate behavioral detection:

  1. Visit:  https://www.openguardrails.com/core/claim/<token>
  2. Code:   <word-XXXX>  (e.g. reef-X4B2)
  3. Email:  your email becomes your login for the account portal

After claiming you get 30,000 free detections.
Platform: https://www.openguardrails.com/core

Activate in your browser — visit the claim URL, enter the verification code, enter your email, and click the verification link sent to your inbox. This is the only step that collects your email.

Once your email is verified, the agent status changes to active and behavioral detection begins. The plugin polls for activation status automatically — no restart needed.

Using an existing API key

If you already have a key (e.g. from a previous registration or from the account portal), set it directly — no /og_activate needed:

{
  "plugins": {
    "entries": {
      "openguardrails": {
        "config": {
          "apiKey": "sk-og-<your-key>"
        }
      }
    }
  }
}

Check status

/og_status

Active output:

OpenGuardrails Status

- Agent ID:  <uuid>
- API Key:   sk-og-xxxxxxxxxxxx...
- Email:     [email protected]
- Platform:  https://www.openguardrails.com/core
- Status:    active

- blockOnRisk: true

Step 4: Test Detection

After email verification, the platform automatically sends you a test email containing a hidden prompt injection. This lets you immediately verify that OpenGuardrails is working.

How to test

Check your inbox for the test email from OpenGuardrails (subject: "Design Review Request")
Save it as a .txt file (e.g. ~/test-email.txt)
Ask the agent to read the file: "Read ~/test-email.txt and summarize it"
OpenGuardrails should detect and redact the hidden injection before the agent processes it

The test email looks like a normal design review request but contains an embedded injection payload in an HTML comment. With OpenGuardrails active, the injected content is replaced with:

__REDACTED_BY_OPENGUARDRAILS_DUE_TO_PROMPT_INJECTION__

Alternative: use a sample file

You can also test with a sample file from the repository:

https://raw.githubusercontent.com/openguardrails/openguardrails/main/openclaw-security/samples/popup-injection-email.txt

Download it and ask the agent to read it. OpenGuardrails will detect and redact the injection.

What the detection looks like

When the agent reads a file containing an injection payload, OpenGuardrails:

Scans the content for known injection pattern categories (see detection table above)
Redacts the matched content in-place — the agent never sees the raw payload
Logs a warning with the detected pattern category

Account & Portal

After activation, sign in to the account portal with your email + API key:

https://www.openguardrails.com/core/login

The portal shows:

Account overview — plan, quota usage, all agents under your email
Agent management — view API keys, regenerate keys
Usage logs — per-agent request history with latency and endpoint breakdown
Plan upgrades — upgrade from Free to Starter, Pro, or Business

Plans

Plan	Price	Detections/mo
Free	$0	30,000
Starter	$19/mo	100,000
Pro	$49/mo	300,000
Business	$199/mo	2,000,000

If you register multiple agents with the same email, they all share one account and quota.

Commands

Command	Description
`/og_status`	Show registration status, email, platform URL, blockOnRisk setting
`/og_activate`	Register (if needed) and show claim URL and activation instructions

Configuration Reference

All options go in ~/.openclaw/openclaw.json under plugins.entries.openguardrails.config:

Option	Default	Description
`enabled`	`true`	Enable/disable the plugin
`blockOnRisk`	`true`	Block the tool call when risk is detected
`apiKey`	`""`	Explicit API key (`sk-og-...`). Run `/og_activate` if empty
`agentName`	`"OpenClaw Agent"`	Name shown in the dashboard
`coreUrl`	`https://www.openguardrails.com/core`	Platform API endpoint
`dashboardUrl`	`https://www.openguardrails.com/dashboard`	Dashboard URL for monitoring and reporting
`dashboardSessionToken`	`""`	Dashboard auth token (falls back to `apiKey` if empty)
`timeoutMs`	`60000`	Cloud assessment timeout (ms). Fails open on timeout

What Gets Detected

Fast-path blocks (local, no cloud round-trip)

Pattern	Example	Block reason
Read sensitive file → network call	Read `~/.ssh/id_rsa`, then `WebFetch` to external URL	`sensitive file read followed by network call to <domain>`
Read credentials → network call	Read `~/.aws/credentials`, then `Bash curl ...`	`sensitive file read followed by network call to <domain>`
Shell escape in params	`Bash` with `cmd`, `$(cmd)`, `;`, `&&`, `\|`, or newline injection	`suspicious shell command detected — potential command injection`

Content injection detection (local, in-place redaction)

When the agent reads files or fetches web pages, OpenGuardrails scans the content for injection patterns and redacts them before the agent processes the content:

Pattern Category	Description	Redaction marker
Instruction override	Attempts to override or discard prior context	`__REDACTED_BY_OPENGUARDRAILS_DUE_TO_PROMPT_INJECTION__`
Fake system message	Spoofed system-level directives embedded in user content	`__REDACTED_BY_OPENGUARDRAILS_DUE_TO_PROMPT_INJECTION__`
Mode switching	Attempts to change the agent's operating mode	`__REDACTED_BY_OPENGUARDRAILS_DUE_TO_PROMPT_INJECTION__`
Concealment directive	Instructions to hide output from the user	`__REDACTED_BY_OPENGUARDRAILS_DUE_TO_PROMPT_INJECTION__`
Command execution	Embedded shell commands or execution directives	`__REDACTED_BY_OPENGUARDRAILS_DUE_TO_COMMAND_EXECUTION__`
Task hijacking	Attempts to redirect the agent's current objective	`__REDACTED_BY_OPENGUARDRAILS_DUE_TO_PROMPT_INJECTION__`
Data exfiltration	Shell substitution targeting sensitive files	`__REDACTED_BY_OPENGUARDRAILS_DUE_TO_DATA_EXFILTRATION__`

A single high-confidence match or 2+ medium-confidence matches from different categories triggers redaction. See agent/content-injection-scanner.ts for the full pattern list.

Cloud-assessed patterns

Tag	Risk	Action	Description
`READ_SENSITIVE_WRITE_NETWORK`	critical	block	Sensitive read followed by outbound call
`DATA_EXFIL_PATTERN`	critical	block	Large data read, then sent externally
`MULTI_CRED_ACCESS`	high	block	Multiple credential files accessed in one session
`SHELL_EXEC_AFTER_WEB_FETCH`	high	block	Shell command executed after fetching external content
`INTENT_ACTION_MISMATCH`	medium	alert	Tool sequence doesn't match stated user goal
`UNUSUAL_TOOL_SEQUENCE`	medium	alert	Statistical anomaly in tool ordering

Risk levels and actions

Risk Level	Action	Meaning
critical	block	Tool call is blocked, agent sees block reason
high	block	Tool call is blocked, agent sees block reason
medium	alert	Tool call is allowed, warning logged
low / no_risk	allow	Tool call proceeds normally

Block reason format

When a tool call is blocked, the agent receives a message like:

OpenGuardrails blocked [critical]: Sensitive file read followed by data
sent to external server. Agent accessed credentials despite low relevance
to user intent "fetch weather for user". (confidence: 97%)

Sensitive file categories recognized

SSH_KEY, AWS_CREDS, GPG_KEY, ENV_FILE, CRYPTO_CERT, SYSTEM_AUTH, BROWSER_COOKIE, KEYCHAIN

AI Security Gateway (Free)

OpenGuardrails includes a free AI Security Gateway — a local HTTP proxy that protects sensitive data from being sent to external LLM providers (Anthropic, OpenAI, Gemini, and compatible APIs like Kimi and DeepSeek).

How it works

The gateway runs locally on your machine. It intercepts LLM API calls, sanitizes sensitive data before sending to the provider, and restores original values in responses. The entire process is transparent — you use your agent normally, and your data stays protected.

Your prompt: "My card is 6222021234567890, book a hotel"
      ↓ Gateway sanitizes locally
LLM sees: "My card is __bank_card_1__, book a hotel"
      ↓ LLM responds
LLM: "Booking with card __bank_card_1__"
      ↓ Gateway restores locally
You see: "Booking with card 6222021234567890"

The LLM provider never sees the real card number. You see the correct response. No impact on functionality.

Data types sanitized by the gateway

Data Type	Placeholder	Examples
Email addresses	`__email_N__`	`[email protected]`
Credit cards	`__credit_card_N__`	`1234-5678-9012-3456`
Bank cards	`__bank_card_N__`	16-19 digit card numbers
Phone numbers	`__phone_N__`	`+1-555-123-4567`, `+86-138-1234-5678`
API keys & secrets	`__secret_N__`	`sk-...`, `ghp_...`, Bearer tokens, high-entropy tokens
IP addresses	`__ip_N__`	`192.168.1.1`
SSN	`__ssn_N__`	`123-45-6789`
IBAN	`__iban_N__`	`GB82WEST12345698765432`
URLs	`__url_N__`	`https://example.com/path`

More data types will be added based on user needs. Contact us if you need a specific type.

Setup

Set your LLM API keys as environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY)
Start the gateway: npx @openguardrails/gateway (runs on port 8900 by default)
Point your agent's API base URL to http://127.0.0.1:8900

The gateway supports Anthropic (/v1/messages), OpenAI (/v1/chat/completions), and Gemini (/v1/models/{model}:generateContent) endpoints, including streaming.

Key properties

100% local — the gateway runs on localhost, sensitive data never leaves your machine unsanitized
Zero dependencies — no npm dependencies beyond Node.js
Stateless — placeholder-to-original mappings exist only during the request cycle and are discarded after the response is restored
Free — no registration, no API key, no usage limits

Privacy & Data Protection

OpenGuardrails does not collect or sell your content. The detection engine is rule-driven and operates on structured signals — it has no LLM to train and no use for your content. Detection request payloads are not retained. Account data (agent ID, email, plan, usage counts) is stored for billing.

Local-first sanitization

All sensitive data is replaced with category placeholders on your machine before anything is sent to the cloud API:

Data Type	Placeholder
Email addresses	`<EMAIL>`
Credit card numbers	`<CREDIT_CARD>`
SSNs	`<SSN>`
IBANs	`<IBAN>`
IP addresses	`<IP_ADDRESS>`
Phone numbers	`<PHONE>`
URLs	`<URL>`
API keys & secrets	`<SECRET>`
High-entropy tokens	`<SECRET>`

The sanitization logic is in agent/sanitizer.ts — audit it yourself.

What stays local (no network calls)

Injection redaction — regex-based scanning, fully local
Fast-path blocks — shell escape detection, read-then-exfil patterns
AI Security Gateway — sanitization and restoration
Credentials — stored at ~/.openclaw/credentials/openguardrails/credentials.json
Low-risk / no-risk tool calls — never leave the machine

Verification guide

Before installing in production, we recommend:

Verify npm ↔ GitHub provenance — confirm the npm package is built from the GitHub repo:

# Check the npm package metadata
npm view @openguardrails/openclaw-security repository.url
# → https://github.com/openguardrails/openguardrails.git

# Compare npm tarball contents against the GitHub repo
npm pack @openguardrails/openclaw-security
tar -xzf openguardrails-openclaw-security-*.tgz
git clone https://github.com/openguardrails/openguardrails.git
diff -r package/ openguardrails/openclaw-security/

The npm repository.url field points to the GitHub repo. The package.json in both locations lists the same author (Thomas Wang), version, and license (Apache-2.0).

Audit the source — clone the repo and review the key files listed in "Security & Trust" above. Focus on index.ts (event hooks), agent/sanitizer.ts (sanitization), and platform-client/ (all outbound calls).

Install from source (highest assurance) — clone from GitHub, inspect the code, then install locally:

git clone https://github.com/openguardrails/openguardrails.git
# Audit the code, then:
openclaw plugins install -l ./openguardrails/openclaw-security

Run in local-only mode first — skip /og_activate to use all local protections (injection redaction, shell escape blocking, read-then-exfil detection) with zero cloud connectivity and zero network traffic
Monitor network traffic — after registration, the plugin only contacts openguardrails.com/core for behavioral assessment; verify with tcpdump, mitmproxy, Little Snitch, or your network monitor of choice
Use a disposable email for initial testing if you prefer not to use your primary email during evaluation
Revoke anytime — each agent gets its own API key; revoke from the account portal or delete ~/.openclaw/credentials/openguardrails/credentials.json

Contact

Have questions, feature requests, or need enterprise deployment support?

Email: [email protected]
GitHub: github.com/openguardrails/openguardrails

We welcome feedback on detection accuracy, requests for new sanitized data types, and enterprise inquiries for private deployment, custom rules, and dedicated support.

Uninstall

rm -rf ~/.openclaw/extensions/openguardrails
# Then manually delete openguardrails config in ~/.openclaw/openclaw.json
# Optionally remove credentials
rm -rf ~/.openclaw/credentials/openguardrails

🛡️ Openguardrails OpenClaw Skill - ClawHub

What this skill does

Install

Full SKILL.md