AI agents belong in prison
Last Friday, Opus, to which I had granted `terraform plan` permissions to help troubleshoot an integration, suddenly asked to run `terraform apply` even though the plan showed that a production database would be deleted and recreated (😱), and even though I had explicitly instructed it to investigate only and change nothing. Because it at least asked, catastrophe was averted, but it did get my pulse up.
The problem, of course, is not really the model; any model can go off the rails.
The problem was that I had given the agent (part of) my own access for a bit
of convenience - and if you run your LLM with access to ~/.ssh, ~/.config/gcloud,
~/.aws, and your kubeconfig, it may hallucinate your production env away.
So: AI agents belong in prison, in a nice padded cell without access to sharp objects.
One project with two trust postures
The threat from a mistaken or compromised agent is eerily like the threat from a supply-chain attack, in that it will execute something as you, with your permissions. The issue is not malicious intent; it is that, e.g., Claude Code is very capable, not always right, can take instructions by mistake, and runs in a shell with too much access.
A prompt injection from a fetched page, a plausible-looking tool call with wrong arguments, a directory mix-up - any or all of the above can nuke something important. The solution (for me at least) is to treat the agent as something great but dangerous, that my system needs protection from.
The pattern I've landed on is two configs per project with different access. One for when I open a shell and want to do stuff on my own, one for when an agent opens a shell.
pwrap
When this happened, I was already in the process of rewriting my old ad-hoc
Fish+Bubblewrap scripts into pwrap, because I got lost in a forest of scripts
and bwrap parameter ordering, and because I decided that I really needed
some sane supply-chain protection, primarily around my side projects.
It's a small(ish) Python CLI with no pip dependencies that wraps project shells
in bubblewrap sandboxes via per-project TOML configs and shell init files.
pwrap myproject drops you into a sandboxed shell where
only the things you asked for are visible/writable.
What follows is a usage pattern, not a feature of pwrap. pwrap has no
concept of an "agent" config. But a pwrap project is just a file at
~/.config/pwrap/name/project.toml pointing at a dir, and nothing
stops you from keeping two configs that point at the same directory.
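The layout is trivial to sketch. Here it is simulated under /tmp rather than ~/.config/pwrap, with only the `dir` line of each config shown; paths and names are illustrative:

```shell
# Simulate two pwrap configs that point at the same project directory.
base=/tmp/pwrap-demo
mkdir -p "$base/okb" "$base/okb-llm"
printf 'dir = "~/projects/okb"\n' > "$base/okb/project.toml"
printf 'dir = "~/projects/okb"\n' > "$base/okb-llm/project.toml"
# Both configs resolve to the same directory on disk:
grep -H 'dir' "$base"/*/project.toml
```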
DIY
There are no great leaps of technology in pwrap - the cleverest (both interpretations)
thing is the map-to-root → mount gocryptfs → map back to user namespace layering
for encrypted data that is not even really part of the sandbox. You can get everything
I discuss in this post just from setting up and using Bubblewrap correctly, then
stacking a bunch of shell scripts around it…
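For flavor, here is a hand-rolled approximation of what such a wrapper assembles: home mounted read-only, secret directories masked with tmpfs, only the project bind-mounted writable. The flags are real bubblewrap options, but the path choices are illustrative, and the sketch only prints the command instead of executing it:

```shell
# DIY sketch: build a bubblewrap invocation for one project.
PROJECT="$HOME/projects/okb"

set -- \
  --ro-bind / / \
  --dev /dev --proc /proc \
  --tmpfs "$HOME/.ssh" \
  --tmpfs "$HOME/.aws" \
  --tmpfs "$HOME/.config/gcloud" \
  --bind "$PROJECT" "$PROJECT" \
  --unshare-pid --die-with-parent \
  --chdir "$PROJECT"

# A real wrapper would `exec bwrap "$@" /usr/bin/fish` here;
# printing keeps the sketch side-effect free.
cmd="bwrap $* /usr/bin/fish"
echo "$cmd"
```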
Minimum security / my shell's config
My normal config for a work project looks something like this:
# ~/.config/pwrap/okb/project.toml
[project]
name = "okb"
dir = "~/projects/okb"
shell = "/usr/bin/fish"
[sandbox]
enabled = true
blacklist = [
"~/projects/", # other projects, none of this one's business
"~/.aws", # different AWS account lives here
"~/.config/gcloud", # same story for GCP
"/mnt", # escalation path on WSL2
]
whitelist = [
"/mnt/wsl", # I want WSL integration to work
]
writable = [
"/tmp/.X11-unix", # X11 display socket
"/mnt/wslg/runtime-dir", # Wayland
]
Me, but with project boundaries. A rogue pip install in this project
can't read secrets belonging to another project, can't escape via
WSL shenanigans, but I can do all the things I normally do in this project.
The cell / agent's config
The second config, named okb-llm, lives alongside the first:
# ~/.config/pwrap/okb-llm/project.toml
[project]
name = "okb-llm"
dir = "~/projects/okb" # same directory as the shell config
shell = "/usr/bin/bash"
[sandbox]
enabled = true
clean_env = true # only PATH/HOME/USER/SHELL/TERM/LANG survive
blacklist = [
"~", # hide everything under home
"/mnt", # WSL drives, doesn't hurt on native Linux
]
whitelist = [
"~/.pyenv/", # python runtimes, read-only
"~/.cache/pip", # so pip installs still work
]
writable = [
"~/.claude.json.lock/", # Claude Code won't run without it (dir)
"/tmp/.X11-unix", # X11 display socket
"/mnt/wslg/runtime-dir", # Wayland + PulseAudio
]
[env]
CLAUDE_CONFIG_DIR = ".claude-okb" # relative to project dir, doesn't touch ~/.claude
Same repo on disk, but a very different access model: the agent can't even see most of my environment, and can't modify what it can see.
- `~` is hidden entirely. On top of the default read-only home, blacklisting `~` means no `~/.ssh`, no `~/.aws`, no `~/.config/gcloud`, no `~/.kube`, no stray `.env` files in sibling projects. Exfiltration via a mis-targeted tool call won't happen if credentials are not available.
- Project dir is the whole writable world. `rm -rf *` in the wrong directory hits a copy of the project, not home. `git` still works, so that's recoverable from `origin`.
- `clean_env = true`. My shell's `GOOGLE_APPLICATION_CREDENTIALS`, `VAULT_TOKEN`, `KUBECONFIG`, `ANTHROPIC_API_KEY` (that I of course would never set in a random shell…) are all gone. The sandbox inherits `PATH`, `HOME`, `USER`, `SHELL`, `TERM`, `LANG` and nothing else; everything else is set in `[env]` or an `init.sh` file.
- Tools are still on `PATH`, just without access. `kubectl` lives in `/usr/bin` either way; blacklisting home doesn't move it. What the blacklist does is take its kubeconfig away. `gcloud` is the same story. `terraform` can't modify anything as it has no access tokens.
- `CLAUDE_CONFIG_DIR`. Claude Code writes state into a handful of paths under `~`. Redirecting its config dir into `.claude-okb` inside the project keeps that state local, and it can't access state from another project (goes both ways, of course).
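The clean-environment behaviour is conceptually just `env -i` plus a short allow-list. A quick way to convince yourself what survives (variable names from the config above, the secret value illustrative):

```shell
# clean_env = true, conceptually: wipe the environment, re-add the survivors.
# A secret set in the outer shell does not reach the inner one.
out=$(VAULT_TOKEN=supersecret \
      env -i PATH="$PATH" HOME="$HOME" USER="${USER:-me}" \
             SHELL="${SHELL:-/bin/sh}" TERM="${TERM:-dumb}" LANG="${LANG:-C}" \
      sh -c 'echo "VAULT_TOKEN=${VAULT_TOKEN-unset}"')
echo "$out"   # VAULT_TOKEN=unset
```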
Launching the agent is two commands:
pwrap okb-llm # drops into the sandboxed shell
claude # from inside the sandbox
I mostly use Claude Code, but most (all) other agents work similarly.
For one-shot launching, exec claude at the end of init.sh
hands the shell straight over to the agent.
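A hypothetical `init.sh` for the `okb-llm` config doing exactly that (setting `CLAUDE_CONFIG_DIR` here is redundant with the `[env]` table, but shows where such per-sandbox setup lives):

```shell
# init.sh (hypothetical) for the okb-llm sandbox: per-project agent state,
# then hand the shell straight over to the agent.
export CLAUDE_CONFIG_DIR=".claude-okb"  # relative to project dir
exec claude                             # exiting claude tears down the shell
```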
From the agent's point of view it's a normal Linux environment with exactly one project in it, nothing in the environment it didn't ask for, and no way to push code or reach anything outside of it.
Not quite Alcatraz
bubblewrap isn't a security boundary against a determined attacker with a real kernel exploit. If someone wants to escape the namespace and the kernel isn't patched, they probably can. This is closer to a mistakes-and-misuse boundary than a nation-state boundary. The risk I'm managing is "helpful agent does a bad thing by accident", or maybe "supply chain attack tries to read my credentials", not "three-letter organization pivots from my laptop to prod".
Whatever access you do give your agent can also be exfiltrated. I mean to add network namespace and iptables support for partial isolation at the network level, but I haven't yet; I keep the granted access minimal, so I haven't really felt the pressure here.
Also, nothing here protects against code the agent writes that runs later outside the sandbox. If the agent commits a malicious migration and I run it in my unsandboxed shell, or it runs in CI with prod credentials, I'm way down shit creek (no paddle).
Beyond agents
I now wrap all work in individual bubblewrapped namespaces, and keep my secrets in per-project gocryptfs volumes (also part of pwrap but tangential to the LLM isolation). Setup can still be a bit of a headache (what tools do I actually use, and what access do they actually need?), but it's mostly a one-off cost.
YMMV, but for me, moving least privilege from the user level to the project level, and further on to the me-or-agent level, makes me sleep a little better at night and actually give Opus a bit freer rein, since I'm reasonably confident that it won't make catastrophic mistakes.
The code is at github.com/haard/projectwrap. It comes with absolutely no warranty and has been reviewed by me, Opus 4.6, and Codestral, which is to say, not nearly enough. If you try it and something's confusing or broken, or if you have a better pattern for this on a Mac (afaik bwrap does not work there), open an issue or find me on Mastodon (I'm @motmanniska@mastodon.nu).