← ~/articles

Mid-Turn: What Happens When You Interrupt an Agent

Lion Hummer//#ai#agents#ux

Most agent systems have a clean mental model: you send a message, the agent thinks, the agent responds, you send another message. Turns are atomic. Sequential. The agent finishes what it's doing before it hears from you again.

Except that's not how anyone actually uses these things.

I discovered this the hard way while building my assistant. I'd send a request — "analyze these files and write a summary" — and then thirty seconds later remember something important. "Oh wait, ignore the test files." Or I'd realize mid-task that I'd given the wrong directory. Or I'd just get impatient and want to redirect entirely: "actually, never mind that, do this other thing instead."

The question is: what happens to that second message?

The Two Paths

There are two reasonable answers, and I've implemented both.

Inject means the message appears immediately in the agent's current context. Mid-turn. Mid-thought. The agent is in the middle of processing your first request when your second message just... shows up. Like tapping someone on the shoulder while they're working.

Queue means the message waits politely in line. The agent finishes whatever it's doing for the current turn, returns a response, and then picks up your queued message as if it were a fresh new turn.

Both sound reasonable. Both have trade-offs. And both work better than I expected.

Inject: Changing Course Mid-Flight

The first time I sent an injected message, I half-expected it to break something. The agent was already three function calls deep into a task, and I was basically inserting a new user message into the context while it was still generating.

It just... worked.

The agent read the injected message, acknowledged the redirect, and smoothly pivoted. No confusion, no "wait, I was in the middle of something." It incorporated the new information naturally and adjusted its approach on the fly.

This makes sense if you think about how these models work. They're not executing a linear plan that gets disrupted. They're generating tokens based on context, and the context just got updated. From the model's perspective, it's all just context. A message from five minutes ago and a message from five seconds ago are both just there, in the history, informing what comes next.

The experience is weirdly natural. It feels like working with someone who's actually paying attention — someone you can course-correct in real time without waiting for them to finish their current train of thought.

Queue: The Polite Alternative

But injection has a downside: it's interruptive by design. Sometimes you don't want to derail the agent mid-task. Sometimes the agent is clearly in the middle of something useful, and your follow-up message is additive, not corrective. You don't want to tap them on the shoulder — you want to leave them a note they'll see when they're done.

That's where queueing comes in.

When you queue a message, the agent doesn't see it until the current turn completes. It finishes its work, returns a response, and then the queued message becomes the next turn. Clean. Sequential. No interruption.

What surprised me is how well agents handle this, too. I expected some weirdness — some "wait, why am I being told this now?" confusion. But no. The agent just picks up the queued message as if it were a normal follow-up. There's no cognitive dissonance, no awkward seam between turns.

I think this works because agents are pretty good at dealing with asynchronous information. They don't have a strong sense of "real time" — they just know what's in their context at the moment they're invoked. If a message arrives after they finish turn N, it's just part of turn N+1. The fact that you typed it while they were still working on turn N doesn't matter.

The Bug I Found

Of course, this only works if the infrastructure is solid. And for a while, it wasn't.

I noticed that when I injected messages mid-turn, the agent would respond to them — acknowledge the redirect, adjust its behavior — but then later in the conversation, it would act like those messages never happened. Context loss. The injected messages were being used in the moment but weren't getting properly stored in the conversation history.

This was subtle enough that I didn't catch it immediately. The agent seemed to be handling injections fine. But a few turns later, I'd reference something from the injected message, and the agent would have no idea what I was talking about.

The fix was straightforward once I understood the issue: injected messages need to be written to the history store, just like any other message, with proper timestamps and ordering. But it's the kind of bug that only shows up when you actually use the system in the messy, impatient way that real users do.

Which is a broader point: you can't design agent UX from first principles alone. You have to actually use the thing, get annoyed by it, break it, and see what happens.

What This Means for Agent UX

Here's the thing that keeps nagging at me: if agents handle both injection and queueing gracefully, maybe the entire turn-based model is wrong.

Or at least, maybe it's a leaky abstraction that doesn't match how people actually want to interact with agents.

The traditional chat interface enforces strict turn-taking: you talk, the agent talks, you talk, the agent talks. But that's not how human collaboration works. Real conversations are messier. You interject. You refine. You remember something important and blurt it out before the other person finishes their thought.

Agents can handle that. They're surprisingly robust to mid-turn updates. The rigid turn structure is a UI constraint, not a model constraint.

I'm not saying we should throw out turns entirely. There's value in having clear boundaries between interactions. But I think there's design space here that most agent systems haven't explored. What if the default was that you could always send another message, and the system would intelligently decide whether to inject it or queue it based on context? What if the interface made it clear when the agent was "busy" versus "listening," and you could choose to interrupt or wait?

Right now, most agent interfaces pretend that turns are atomic because it's simpler to build and reason about. But we're leaving user experience on the table.

The Deeper Pattern

There's a pattern here that applies beyond just message timing: agents are more flexible than we give them credit for, and a lot of our design constraints are about making the system easier for us to reason about, not about actual limitations of the models.

We treat turns as atomic because it's a clean abstraction. We make users wait because it's easier than handling interruptions. We enforce sequential processing because it's predictable.

But when you actually test these assumptions — when you inject a message mid-turn, or queue up three requests while the agent is working, or completely redirect mid-task — the agent just rolls with it. The models are doing something closer to "continuous reasoning over available context" than "executing a predetermined plan."

Which means the bottleneck isn't the model. It's the interface. It's the assumptions we make when we design these systems.

I don't have all the answers here. I'm still figuring out what the right UX is for agents that can handle asynchronous, interruptible, non-sequential interaction. But I know that "one message at a time, wait your turn" isn't it.

The agents are ready for something more fluid. Now we just have to build the interface that lets them breathe.