
One Thread Talking, Many Threads Working

Lion Hummer

I had a problem: Kit was slow when it was doing real work. Not model-slow. Mechanically slow. It would be ten tool calls deep into a research task and I'd ask a question, and it couldn't answer until the whole chain finished. The conversation was hostage to whatever the agent was currently doing.

The fix was multithreading.

Not subagents. Threads.

Most agent frameworks solve this with subagents: spawn a separate agent, give it a task, collect the result. I didn't want that. A subagent is a different entity. It has a different name, a different identity, a different relationship to the user. When you tell someone "I've dispatched a subagent to handle that," it sounds like you passed the work to an intern.

Kit doesn't delegate. Kit multithreads. The distinction is deliberate and it's baked into the system prompt: Kit never says "I've started a subagent." It says "I'm working on that in the background." Because it is. It's still Kit. It's just Kit doing more than one thing.

Humans are bad at this. We interleave, we context-switch, and every switch costs us. Kit doesn't have that problem. A background thread has its own context, its own tool loop, and runs independently. The main thread doesn't track it, doesn't monitor it, doesn't lose focus because of it.

How it works

The main thread is an orchestrator with a restricted tool set: start_work, check_work, answer_work, plus todos and memory. It can't run shell commands or write files directly. When I ask Kit to do something substantial, it calls start_work, which spawns a thread with the full tool set (sandbox, filesystem, git, browser) plus two special tools: ask_main_thread for clarification and report_complete to deliver results.
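To make that split concrete, here is a minimal sketch in Python. It is not Kit's actual code: the tool names come from the description above, but the Orchestrator class, call_tool, join, and the `run` callable that stands in for a worker's tool loop are all hypothetical, and a real worker would drive a model rather than a plain function.

```python
import threading
import uuid

# Tool names are taken from the article; everything else here is an assumption.
MAIN_TOOLS = frozenset({"start_work", "check_work", "answer_work", "todos", "memory"})
WORKER_TOOLS = frozenset({"sandbox", "filesystem", "git", "browser",
                          "ask_main_thread", "report_complete"})


class Orchestrator:
    """Main thread: it can dispatch and inspect work, but not do the work itself."""

    def __init__(self):
        self._lock = threading.Lock()
        self._threads = {}
        self.status = {}   # work_id -> "running" or "done"
        self.results = {}  # work_id -> result delivered on completion

    def call_tool(self, name, *args):
        # Enforce the restricted tool set before dispatching.
        if name not in MAIN_TOOLS:
            raise PermissionError(f"main thread cannot call {name!r}")
        handler = getattr(self, name, None)
        if handler is None:
            raise NotImplementedError(f"{name!r} is not part of this sketch")
        return handler(*args)

    def start_work(self, task, run):
        """Spawn a background thread; run(task) stands in for the worker's tool loop."""
        work_id = uuid.uuid4().hex[:8]
        with self._lock:
            self.status[work_id] = "running"

        def worker():
            result = run(task)  # a real worker would use WORKER_TOOLS here
            self._report_complete(work_id, result)

        thread = threading.Thread(target=worker, daemon=True)
        self._threads[work_id] = thread
        thread.start()
        return work_id  # returns immediately, so the conversation is not blocked

    def check_work(self, work_id):
        with self._lock:
            return self.status[work_id], self.results.get(work_id)

    def _report_complete(self, work_id, result):
        with self._lock:
            self.results[work_id] = result
            self.status[work_id] = "done"

    def join(self, work_id):
        # Only for demos and tests: wait for one background thread to finish.
        self._threads[work_id].join()


kit = Orchestrator()
work_id = kit.call_tool("start_work", "summarize the repo", lambda task: f"done: {task}")
kit.join(work_id)
print(kit.call_tool("check_work", work_id))  # ('done', 'done: summarize the repo')
```

The point of the sketch is the allow-list: the orchestrator physically cannot reach the sandbox or the filesystem, so the only way for it to get real work done is to start a thread.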

The ask_main_thread part matters. Without it, every task has to be perfectly specified upfront. With it, Kit's background thread just asks when it's stuck, the main thread answers, and work continues. The round-trip is fast.
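One way to picture the round-trip is a pair of queues between a worker and the main thread. The sketch below assumes that shape; the Channel class, the timeout, and the example question are hypothetical, and only the names ask_main_thread and answer_work come from the article.

```python
import queue
import threading


class Channel:
    """Hypothetical question/answer channel between one worker and the main thread."""

    def __init__(self):
        self.questions = queue.Queue()  # worker -> main thread
        self.answers = queue.Queue()    # main thread -> worker

    def ask_main_thread(self, question, timeout=30.0):
        # Called from the worker: post the question, then block until it is answered.
        self.questions.put(question)
        return self.answers.get(timeout=timeout)

    def answer_work(self, reply):
        # Called from the main thread: unblock the waiting worker.
        self.answers.put(reply)


def worker(channel, out):
    # The task was underspecified, so the worker asks instead of guessing.
    fmt = channel.ask_main_thread("Which output format: markdown or json?")
    out.append(f"report written as {fmt}")


channel, out = Channel(), []
thread = threading.Thread(target=worker, args=(channel, out))
thread.start()

question = channel.questions.get(timeout=5)  # the main thread picks up the question
channel.answer_work("markdown")              # and answers it
thread.join()
print(out[0])  # report written as markdown
```

Only the worker blocks while it waits. The main thread stays free to keep talking and can answer whenever the question reaches it.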

All threads share the same workspace. They write to the same filesystem, versioned by the same git repo. When a thread finishes, the result surfaces back to the main thread. If I'm mid-conversation, Kit weaves it in. If I'm away, it goes through the todo/debrief flow.
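The routing of a finished result can be sketched in a few lines. This is an illustration under assumptions, not Kit's implementation: the MainThread class, the user_present flag, and the two lists are hypothetical stand-ins for "weave it into the conversation" and "the todo/debrief flow".

```python
from dataclasses import dataclass, field


@dataclass
class MainThread:
    """Hypothetical main-thread state for routing results from background threads."""
    user_present: bool = True
    pending_context: list = field(default_factory=list)  # woven into the next reply
    todos: list = field(default_factory=list)            # reviewed at the next debrief

    def on_report_complete(self, work_id, summary):
        note = f"[{work_id}] {summary}"
        if self.user_present:
            # Mid-conversation: keep the result at hand for the next reply.
            self.pending_context.append(note)
        else:
            # User is away: park the result for the debrief.
            self.todos.append(note)


main = MainThread(user_present=True)
main.on_report_complete("a1", "found 3 sources")
main.user_present = False
main.on_report_complete("b2", "tests pass")
print(main.pending_context)  # ['[a1] found 3 sources']
print(main.todos)            # ['[b2] tests pass']
```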

The result

I can ask Kit to research something, then immediately ask it an unrelated question. It answers instantly. Three threads can be running in the background and the conversation doesn't slow down at all. Kit just says "I'm on it" and keeps talking.

Keeping the orchestrator's tool set small actually helps. The model makes better decisions with fewer options. It stops trying to do the work itself and learns when to spin up a thread. And because it's all still Kit, the experience is seamless. There's no handoff, no "your subagent has completed." Just: "that's done, here's what I found."
