geofrey.ai — Claude Code with a Local Safety Layer

Your AI agent just ran
rm -rf /
— and you weren't asked.

That's life with unguarded AI agents. geofrey.ai wraps Claude Code in a local safety layer, so nothing dangerous executes without your explicit approval.

$0 · safety layer / month
90% · classified locally
L0–L3 · risk tiers
6 · messaging platforms

Way better than OpenClaw — wait, sorry, BoltBot — no wait, what are they called now… Claudebot? Whatever. We just ship.

Why not OpenClaw?

Three rebrandings later and still the same problems: critical CVEs, fire-and-forget approvals, and $200–600/month in API bills. We built something fundamentally different.

Attack vector            | OpenClaw                         | geofrey.ai
Monthly cost             | $200–600                         | $0 orchestrator
Network exposure         | 42,000+ exposed instances        | 0 exposed ports
RCE vulnerabilities      | CVE-2026-25253 (CVSS 8.8)        | No web UI = no attack surface
Command injection        | CVE-2026-25157, CVE-2026-24763   | 4-layer defense + shlex decomposition
Approval mechanism       | Fire-and-forget (Issue #2402)    | Structural blocking (Promise)
Marketplace security     | 7.1% of skills leak credentials  | MCP with allowlist, no marketplace
Prompt injection defense | None                             | 3-layer + MCP sanitization
Secret handling          | Plaintext in local files         | Env-only, Zod-validated, no logging
Image metadata defense   | None                             | EXIF/XMP/IPTC stripping + injection scan
Audit trail              | Basic plaintext logs             | SHA-256 hash-chained JSONL
Data anonymization       | None — all data sent unfiltered  | Privacy rules + PII detection + email anonymization

Four-tier risk classification

Every action is classified before execution. 90% handled instantly by deterministic patterns. No single point of failure.

L0 — Auto-Approve
Execute immediately
Safe read-only operations. Zero latency, zero cost. The agent proceeds without interrupting you.
read_file · git status · ls · cat
L1 — Notify
Execute + inform
Safe write operations. Executed immediately, but you get a notification about what happened.
write_file · git add · git branch
L2 — Require Approval
Block until approved
Dangerous operations. The agent is structurally suspended until you tap Approve or Deny.
delete_file · git commit · npm install · shell_exec
L3 — Block Always
Refuse & log
Destructive commands. Always blocked, always logged. No override, no bypass mode, no exceptions.
rm -rf · sudo · curl | sh · push --force
1. Command decomposition

Shlex-style split on &&, ||, ;, |, \n — each segment classified individually.
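The decomposition step described above can be sketched in a few lines of TypeScript. This is illustrative only; the real implementation presumably handles escaping, subshells, and here-docs more carefully than this quote-aware split:

```typescript
// Splits a shell command on the chaining operators &&, ||, ;, | and
// newlines (but not inside quotes), so each segment can be
// risk-classified on its own.
function decompose(command: string): string[] {
  const segments: string[] = [];
  let current = "";
  let quote: '"' | "'" | null = null;

  for (let i = 0; i < command.length; i++) {
    const ch = command[i];
    if (quote) {
      current += ch;
      if (ch === quote) quote = null; // closing quote
      continue;
    }
    if (ch === '"' || ch === "'") {
      quote = ch;
      current += ch;
      continue;
    }
    // Two-character operators first, then single-character ones.
    if (command.slice(i, i + 2) === "&&" || command.slice(i, i + 2) === "||") {
      segments.push(current.trim());
      current = "";
      i++; // skip the second operator character
      continue;
    }
    if (ch === ";" || ch === "|" || ch === "\n") {
      segments.push(current.trim());
      current = "";
      continue;
    }
    current += ch;
  }
  if (current.trim()) segments.push(current.trim());
  return segments.filter((s) => s.length > 0);
}
```

Note that `echo "a && b"` stays one segment: the `&&` inside quotes is data, not an operator, and must not create a separately classified segment.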

2. Deterministic classifier

Regex patterns block known dangerous commands in <1ms. Handles ~90% of all classifications.
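A deterministic classifier of this kind can be an ordered rule list checked top-down. The patterns below are illustrative examples, not geofrey.ai's actual 53-pattern set:

```typescript
// Risk tiers as described above; "LLM" means no deterministic match,
// so the segment falls through to the local LLM classifier (~10% of cases).
type Tier = "L0" | "L1" | "L2" | "L3";

// Ordered: L3 (block) patterns are checked before the permissive ones,
// so "sudo ls" is blocked rather than auto-approved.
const RULES: Array<{ tier: Tier; pattern: RegExp }> = [
  { tier: "L3", pattern: /\brm\s+-[a-z]*r[a-z]*f\b|\bsudo\b|curl[^|]*\|\s*sh\b|push\s+--force/ },
  { tier: "L0", pattern: /^(ls|cat|pwd|git\s+(status|log|diff))\b/ },
  { tier: "L1", pattern: /^git\s+(add|branch)\b/ },
];

function classify(segment: string): Tier | "LLM" {
  for (const { tier, pattern } of RULES) {
    if (pattern.test(segment.trim())) return tier;
  }
  return "LLM"; // ambiguous: defer to the local model
}
```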

3. LLM classifier

Qwen3 8B evaluates ambiguous commands (~10%). XML output format, JSON fallback.

4. Structural approval gate

Promise-based blocking. The agent is suspended — not polling, not timing out. No code path from "pending" to "execute" without the Promise resolving.

Structural blocking, not policy checking

OpenClaw's approval is fire-and-forget — the tool returns before the user approves (Issue #2402). geofrey.ai uses a JavaScript Promise that structurally suspends the agent loop. Not a policy that can be overridden — a property of the execution flow.

// OpenClaw: fire-and-forget (broken)
void (async () => { /* returns in ~16ms */ })();

// geofrey.ai: structural blocking
const { nonce, promise } = createApproval(tool, args);
const approved = await promise; // suspended here
if (!approved) throw new Error('Denied');
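One plausible implementation of createApproval, sketched under assumptions (the resolver registry and the resolveApproval name are hypothetical, not geofrey.ai's real API): the Promise's resolver is parked in a map keyed by nonce, and nothing resolves it until the messaging adapter reports the user's tap.

```typescript
import { randomUUID } from "node:crypto";

// Nonce -> parked resolver. No timeout: the agent stays suspended
// until a human decides.
const pending = new Map<string, (approved: boolean) => void>();

function createApproval(tool: string, args: unknown) {
  const nonce = randomUUID();
  const promise = new Promise<boolean>((resolve) => {
    pending.set(nonce, resolve);
  });
  // In the real system, an approval prompt describing `tool` and `args`
  // would be sent to the messaging platform here, carrying the nonce.
  return { nonce, promise };
}

// Called by the messaging adapter when an Approve/Deny button is tapped.
function resolveApproval(nonce: string, approved: boolean): void {
  const resolve = pending.get(nonce);
  if (!resolve) return; // unknown or already-resolved nonce
  pending.delete(nonce);
  resolve(approved);
}
```

The point is that there is no code path from "pending" to "execute" except through resolveApproval: the awaiting agent loop is suspended by the runtime itself, not by a flag a prompt injection could flip.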

Built for control

Claude Code does the heavy lifting. Your local LLM guards every action. You approve via your messaging app.

</> Claude Code integration

Complex coding tasks delegated to Claude Code CLI with risk-scoped tool profiles. Live streaming to your messaging app.

#_ Multi-platform messaging

Telegram, WhatsApp, Signal, Slack, Discord, WebChat. Approve or deny from the app you already use. No web UI to expose.

MCP ecosystem

10,000+ community tool servers via Model Context Protocol. Every call wrapped by risk classifier. Explicit allowlist.

🔗 Hash-chained audit

Every action logged with SHA-256 hash chain. Tamper-evident: one modified entry breaks the entire chain.
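The chain invariant can be sketched in a few lines; the entry fields here are illustrative, not the actual JSONL schema:

```typescript
import { createHash } from "node:crypto";

interface AuditEntry {
  prevHash: string;  // hash of the previous entry ("genesis" for the first)
  action: string;
  timestamp: string;
  hash: string;      // SHA-256 over prevHash + this entry's fields
}

function hashEntry(prevHash: string, action: string, timestamp: string): string {
  return createHash("sha256").update(`${prevHash}|${action}|${timestamp}`).digest("hex");
}

function append(chain: AuditEntry[], action: string, timestamp: string): void {
  const prevHash = chain.length ? chain[chain.length - 1].hash : "genesis";
  chain.push({ prevHash, action, timestamp, hash: hashEntry(prevHash, action, timestamp) });
}

// Recompute every hash from the start: editing any entry breaks
// verification for that entry and everything after it.
function verify(chain: AuditEntry[]): boolean {
  let prev = "genesis";
  for (const e of chain) {
    if (e.prevHash !== prev || e.hash !== hashEntry(prev, e.action, e.timestamp)) return false;
    prev = e.hash;
  }
  return true;
}
```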

🧠 Hybrid classification

Deterministic regex handles 90% of classifications in <1ms. LLM fallback only for edge cases.

🛡 Prompt injection defense

3-layer isolation: user input, tool output, model response. MCP responses Zod-validated and instruction-filtered.
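A minimal sketch of the instruction-filtering idea for tool output. The patterns are examples only; a real filter would be much broader, and this shows the quarantine approach rather than geofrey.ai's actual rule set:

```typescript
// Signatures of text that tries to address the model rather than the user.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all|any|previous|prior) instructions/i,
  /you are now\b/i,
  /\bsystem prompt\b/i,
  /disregard (the )?(above|earlier)/i,
];

function containsInjection(toolOutput: string): boolean {
  return INJECTION_PATTERNS.some((p) => p.test(toolOutput));
}

function sanitizeToolOutput(toolOutput: string): string {
  // Quarantine rather than silently pass through: the model sees a
  // placeholder, and the raw output stays local for a human to inspect.
  return containsInjection(toolOutput)
    ? "[tool output withheld: possible prompt injection]"
    : toolOutput;
}
```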

🔒 Secret isolation

All credentials from env vars only. No token logging. Sensitive paths (.env, .ssh) are L3-blocked.
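The env-only rule can be sketched without any library. geofrey.ai uses Zod for the real schema validation; the dependency-free shape below just illustrates the idea, and the variable names are examples:

```typescript
type Env = Record<string, string | undefined>;

// Fail fast if a secret is missing, and report only the variable
// name, never the value, so nothing sensitive can end up in logs.
function requireEnv(name: string, env: Env = process.env): string {
  const value = env[name];
  if (!value || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Illustrative config shape; variable names are assumptions.
function loadConfig(env: Env = process.env) {
  return {
    telegramToken: requireEnv("TELEGRAM_BOT_TOKEN", env),
    anthropicKey: requireEnv("ANTHROPIC_API_KEY", env),
  };
}
```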

📁 Filesystem confinement

All file operations pass through confine() — paths outside the project directory are rejected.
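A plausible confine() sketch; the function name mirrors the one described above, but this implementation is an assumption:

```typescript
import { resolve, sep } from "node:path";

// Resolve the candidate against the project root and reject anything
// that escapes it, whether via ../ traversal or an absolute path.
function confine(projectRoot: string, candidate: string): string {
  const root = resolve(projectRoot);
  const target = resolve(root, candidate);
  // The trailing separator matters: without it, "/proj2/x" would
  // wrongly pass a prefix check against root "/proj".
  if (target !== root && !target.startsWith(root + sep)) {
    throw new Error(`Path escapes project directory: ${candidate}`);
  }
  return target;
}
```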

🖼 Image metadata defense

EXIF/XMP/IPTC stripped before images reach the LLM. Metadata scanned for prompt injection patterns.
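For JPEG specifically, stripping amounts to dropping the APP segments that carry metadata. A minimal sketch of the approach; real handling also covers PNG/WebP, embedded thumbnails, and malformed files:

```typescript
// Walk the JPEG marker segments and drop APP1 (0xE1, where EXIF/XMP
// live) and APP13 (0xED, IPTC) before the image is passed to the LLM.
function stripJpegMetadata(jpeg: Buffer): Buffer {
  if (jpeg.length < 2 || jpeg[0] !== 0xff || jpeg[1] !== 0xd8) {
    throw new Error("Not a JPEG (missing SOI marker)");
  }
  const kept: Buffer[] = [jpeg.subarray(0, 2)]; // keep SOI
  let i = 2;
  while (i + 4 <= jpeg.length && jpeg[i] === 0xff) {
    const marker = jpeg[i + 1];
    if (marker === 0xda) {
      // Start-of-scan: the rest is entropy-coded image data, keep it all.
      kept.push(jpeg.subarray(i));
      break;
    }
    const length = jpeg.readUInt16BE(i + 2); // segment length incl. these 2 bytes
    const segment = jpeg.subarray(i, i + 2 + length);
    if (marker !== 0xe1 && marker !== 0xed) kept.push(segment);
    i += 2 + length;
  }
  return Buffer.concat(kept);
}
```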

🌐 Privacy layer

Privacy rules DB with per-entity allow/anonymize/block decisions. Local vision model classifies images (faces → block). Emails anonymized before cloud APIs. Output filter catches leaked credentials.
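The email-anonymization step can be sketched as a stable placeholder substitution, with the mapping kept locally so replies can be de-anonymized. Illustrative only; the real privacy layer is rule-driven and broader:

```typescript
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Replace each address with a stable placeholder; the same address
// always maps to the same token, so references stay consistent.
function anonymizeEmails(text: string, mapping: Map<string, string>): string {
  return text.replace(EMAIL_RE, (email) => {
    let placeholder = mapping.get(email);
    if (!placeholder) {
      placeholder = `<email_${mapping.size + 1}>`;
      mapping.set(email, placeholder);
    }
    return placeholder;
  });
}

// Restore the originals in a cloud response before showing it locally.
function deanonymize(text: string, mapping: Map<string, string>): string {
  let result = text;
  for (const [email, placeholder] of mapping) {
    result = result.split(placeholder).join(email);
  }
  return result;
}
```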

💰 Cost transparency

Per-request cost display. Budget alerts. Every API call tracked with cloud vs. local token breakdown.

🌍 i18n support

German + English with typed translation keys. Setup wizard, approvals, and errors in your language.

🔧 Auto-tooling

Detects capability gaps and builds standalone programs in Docker-isolated Claude Code. Registers as cron job or background process automatically.

Proactive agent

Morning briefings, calendar reminders, email monitoring. All privacy-filtered through the local orchestrator. Runs on your schedule.

💻 20 local-ops tools

File, directory, text, system, and archive operations handled natively. Zero cloud tokens, instant execution, no API cost.

Stop paying for orchestration

The local LLM handles intent classification, risk assessment, and communication. Cloud APIs only for complex coding tasks.

OpenClaw
$200–600
per month (moderate use)
  • 10K token system prompt resent every API call
  • 4,320+ background API calls/month (monitoring)
  • Every classification = paid cloud API roundtrip
  • Power users report up to $3,600/month
geofrey.ai
$0–30
per month (same workload)
  • Orchestrator runs locally (Qwen3 8B, loaded once)
  • Zero background API calls (event-driven)
  • 90% of classifications handled by regex (<1ms, free)
  • Cloud API only for complex coding tasks

Runs on your machine

One tested default today, configurable via ORCHESTRATOR_MODEL. Fits on an M-series MacBook.

Coming Soon: Power tier (64GB+ RAM)
Qwen3 8B + Qwen3-Coder-Next · $0 / month
Tiered routing: simple code tasks handled locally, complex tasks sent to the Claude API. Saves ~30–40% in API costs.

Got a machine with 64GB+ RAM? Qwen3-Coder-Next is an 80B MoE model with only 3B active parameters, achieving 70.6% on SWE-Bench Verified at near-3B cost (~52GB Q4). Zero API costs for simple coding tasks.

Up and running in 5 minutes

Interactive setup wizard handles prerequisites, credentials, and platform configuration.

1. Clone & install

Clone the repository and install dependencies with pnpm.

2. Pull the model

Download Qwen3 8B via Ollama (~5GB, one-time download).

3. Run setup wizard

pnpm setup — auto-detects prerequisites, validates credentials, configures your messaging platform.

4. Start the agent

pnpm dev for development or pnpm build && pnpm start for production.

terminal
# Clone & install
$ git clone https://github.com/slavko-at-klincov-it/geofrey.ai.git
$ cd geofrey.ai && pnpm install

# Pull orchestrator model
$ ollama pull qwen3:8b
pulling manifest... done
pulling model... 100% ████████████ 5.0GB

# Interactive setup wizard
$ pnpm setup
✔ Node.js 22.4.0 detected
✔ Ollama running (qwen3:8b loaded)
✔ Claude Code CLI authenticated
? Platform: Telegram
✔ Bot token validated
✔ .env generated — run pnpm dev to start

# Start the agent
$ pnpm dev
geofrey.ai v1.0.0 — listening on Telegram
risk classifier: loaded (53 patterns)
audit log: ./data/audit/2026-02-14.jsonl

Take back control

geofrey.ai is MIT-licensed. Claude Code does the work. A local LLM makes sure nothing goes wrong. Read the code, verify the claims, run it on your machine.