Agents Need Their Own Rooms
Before the upstream merge, I had four agents living in one container: groups/main/agents/babi/, groups/main/agents/radar/, and so on. One container, shared state, everyone in the same room. Babi and Radar were roommates. That was the problem.
The agents were functionally isolated in terms of what I’d given them in their prompts, but they weren’t architecturally isolated. Moving to separate containers looked like cleanup from the outside. It wasn’t. It was the thing that made the whole architecture make sense.
The generic AI assistant problem is that one context holds everything — and the context window doesn’t discriminate.
I thought sub-agents solved this. They didn’t, not really. They solved the routing problem (send fitness tasks to Babi, send research tasks to Radar). They didn’t solve the contamination problem. The agents were running in the same container. Nothing in the architecture prevented Babi from seeing Radar’s files, or Radar from seeing Babi’s. It hadn’t caused a real problem yet, but the risk was structural. Shared state means shared exposure, and eventually that bites you.
NanoClaw already had per-group containers — that’s how it was designed to handle multiple WhatsApp groups. I wasn’t using multiple groups. But the isolation mechanism was already there, and nothing said I couldn’t use it for agents instead. Four agents. Four containers. The architecture was sitting right there waiting for me to notice it.
So I moved them. groups/babi/, groups/radar/, groups/quill/, groups/scout/. Each one is now a complete Claude Code project. Its own CLAUDE.md with specialized instructions. Its own .claude/skills/ directory. Its own persistent memory, stored separately. Babi cannot see Radar’s data. Radar cannot see Scout’s analyses. Scout cannot see the draft I’m working on right now.
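A sketch of the resulting layout, based on the paths and components described above (the skills and memory file names are my illustration, not the actual tree):

```text
groups/
├── babi/                # fitness trainer
│   ├── CLAUDE.md        # specialized instructions
│   ├── .claude/skills/  # e.g. log-workout
│   └── memory/          # persistent, private to Babi (illustrative name)
├── radar/               # AI industry briefings
├── quill/               # blog management
└── scout/               # deep research
```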
Buddy can’t talk to the agents directly. There’s no peer-to-peer channel between containers. Buddy talks to me, and when I (or a scheduled trigger) need a specialist, Buddy queues a task to the right container via NanoClaw’s IPC layer. The agent container spawns, processes the task, runs the skill, and sends the result back through Buddy. Buddy never needs to know the internals.
This is what the /log-workout skill looks like from Buddy’s side:
```
schedule_task(
  prompt: "Run /log-workout. Ankit said: [workout details]",
  schedule_type: "once",
  schedule_value: "2026-03-01T03:04:00Z",
  target_group_jid: "project:babi@local"
)
```
That’s it. Fire and forget. Buddy queues the task, the Babi container spawns, Babi runs /log-workout using its own specialized skills and its own persistent context (my injury history, my current program, my recovery state), and sends confirmation back. Buddy sees the result. Buddy forwards it to me. Neither agent ever touched the other’s memory.
The wrapper skills (/log-workout, /daily-brief, /draft-post) are what hide the complexity. From my end, on WhatsApp, I send Buddy a voice note about my workout. Buddy knows to queue it to Babi. I get a confirmation message back. The IPC, the container lifecycle, the task queue: none of that is visible to me. It’s just Buddy getting things done.
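What a wrapper skill does under the hood can be sketched as a routing table plus a payload builder. This is my illustration of the pattern, not NanoClaw's actual API: `routeSkill` and `AGENT_JIDS` are hypothetical names, and only the JID format and field names come from the `schedule_task` example above.

```typescript
// Hypothetical routing table: wrapper skill -> target agent container.
// The "project:<name>@local" JID format mirrors the schedule_task example.
const AGENT_JIDS: Record<string, string> = {
  "/log-workout": "project:babi@local",
  "/daily-brief": "project:radar@local",
  "/draft-post": "project:quill@local",
};

interface ScheduledTask {
  prompt: string;
  schedule_type: "once" | "recurring";
  schedule_value: string; // ISO timestamp when schedule_type is "once"
  target_group_jid: string;
}

// Build the fire-and-forget task Buddy would queue for a given skill.
function routeSkill(skill: string, userMessage: string, when: string): ScheduledTask {
  const jid = AGENT_JIDS[skill];
  if (!jid) throw new Error(`No agent registered for ${skill}`);
  return {
    prompt: `Run ${skill}. Ankit said: ${userMessage}`,
    schedule_type: "once",
    schedule_value: when,
    target_group_jid: jid,
  };
}
```

The point of the indirection is that the caller (me, on WhatsApp) only ever names a skill; the container targeting is an implementation detail of the table.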
BUDDY: “Hub-and-spoke” is a generous description of a system where I’m the hub and the spokes have no idea I exist.
The topology is hub-and-spoke, not mesh. Buddy is the router. The agents are never peers: they don’t share memory, and they don’t coordinate directly. If an agent needs to trigger something in another container, that goes through the IPC layer, mediated rather than direct. All coordination flows through Buddy, via wrapper skills that construct the right task payload and target the right container. This is intentional: clear boundaries, explicit interfaces, one coordination point.
The four agents, as they stand:
Babi is the fitness trainer. Knows my injury history (tennis elbow, left shoulder impingement from last summer), my current program structure, my recovery patterns. Has custom skills for logging workouts and generating plans. Gets a morning check-in task at 7am: “Here’s your plan for today.” Gets an evening accountability task at 9pm if I haven’t logged anything. Babi only knows fitness things. That’s the point.
Radar monitors AI industry sources and synthesizes a daily briefing. Understands that I’m thinking about this from an AI strategy angle (not “is this cool” but “does this change something we should be building”). The briefings are calibrated to that. Radar only knows AI industry things.
Quill manages this blog. Ghost drafts posts (you’re reading one now), Sentinel reviews them, and publishing moves the file to src/content/. Quill knows my writing voice, my editorial standards, my kill list. Quill does not know about my workout streak or my AI briefing topics. Those would be noise in the drafting context.
Scout handles deep research: multi-step web tasks, document synthesis, the kind of thing that takes 10 tool calls and 20 minutes to complete. Send it a company name and a question, get back a structured memo. Scout is also the most unfinished of the four — the memory layer isn’t wired up the way Babi’s is yet. That one gets its own post.
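The scheduled triggers described above (Babi's 7am check-in and 9pm accountability nudge, Radar's daily briefing) could be expressed as per-agent config. This is a hypothetical shape I'm sketching for illustration; the skill names `/daily-plan` and `/accountability-nudge` and the cron-style format are my invention, not NanoClaw's real config:

```typescript
// Illustrative per-agent trigger config, keyed by container JID.
interface Trigger {
  cron: string;        // five-field cron expression, local time
  prompt: string;
  condition?: string;  // optional guard, e.g. skip if already satisfied
}

const SCHEDULES: Record<string, Trigger[]> = {
  "project:babi@local": [
    { cron: "0 7 * * *", prompt: "Run /daily-plan. Morning check-in." },
    {
      cron: "0 21 * * *",
      prompt: "Run /accountability-nudge.",
      condition: "no workout logged today",
    },
  ],
  "project:radar@local": [
    { cron: "0 8 * * *", prompt: "Run /daily-brief." },
  ],
};
```

Keeping the schedule keyed by container JID preserves the isolation story: a trigger is just another task queued to one agent's room.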
What the upstream merge gave me was the building blocks (per-group containers with synthetic JIDs, already part of NanoClaw’s multi-channel infrastructure). I didn’t build isolation from scratch. I used the existing structure to finish the architecture I’d been trying to build.
The pattern also isn’t locked to NanoClaw. If I ever need to swap the underlying framework, each agent is a self-contained Claude Code project that happens to receive tasks from a queue. The queue implementation changes, the agents don’t. That matters because NanoClaw is still actively developed by a team building for many people, and I’m building for one. Those two things will keep getting out of sync.
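The seam that makes the framework swappable can be sketched as a small interface: the agent loop depends only on "give me the next task, let me report a result," never on NanoClaw itself. Everything here (the `TaskQueue` name, the in-memory stand-in) is my illustration of the decoupling, not real NanoClaw code:

```typescript
// The only contract an agent needs from the outside world.
interface TaskQueue {
  next(): Promise<string | null>;         // next queued prompt, or null when empty
  report(result: string): Promise<void>;  // send the result back through the hub
}

// A trivial in-memory implementation standing in for NanoClaw's IPC layer.
class InMemoryQueue implements TaskQueue {
  private tasks: string[] = [];
  public results: string[] = [];
  enqueue(prompt: string) { this.tasks.push(prompt); }
  async next() { return this.tasks.shift() ?? null; }
  async report(result: string) { this.results.push(result); }
}

// The agent loop: pull a task, handle it in the agent's own project
// context, report back. Swap the queue implementation; keep the agent.
async function runAgent(
  queue: TaskQueue,
  handle: (prompt: string) => Promise<string>,
): Promise<void> {
  let prompt: string | null;
  while ((prompt = await queue.next()) !== null) {
    await queue.report(await handle(prompt));
  }
}
```

Replacing NanoClaw then means writing one new `TaskQueue` adapter, while the four agent projects stay untouched.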
The isolation is what makes the whole thing maintainable. Changes to Babi don’t touch Radar. New skills in Scout don’t pollute Quill’s context. Each agent evolves on its own schedule, against its own scope. I can overhaul the workout logging logic without thinking about whether it breaks the briefing pipeline.
What I keep coming back to is how much cleaner everything feels now. Debugging Babi doesn’t require holding the whole system in my head. Adding a skill to Quill doesn’t make me nervous about Radar. The containers are simple. The interfaces are explicit. For the first time since I started building this, the architecture matches the mental model — and that’s the thing that makes it actually buildable.