Building an AI Agent from Scratch
I work at Obvious.ai. We're building what I genuinely believe is the most capable AI agent out there. I spend my days thinking about agent architecture, debugging tool loops, and arguing about the right way to handle memory and context. And yet, until last week, I had never actually built an agent from scratch myself.
The irony wasn't lost on me. It's like being a car designer who's never changed their own oil — you understand the system, sure, but you're missing something fundamental about how the pieces actually fit together. So I decided to fix that. I fired up Cursor, loaded Opus 4.6, and gave myself a weekend to build a real agent. Not a chatbot. Not a wrapper around an API. An actual agent with tool use, memory, and the ability to accomplish multi-step tasks.
What shocked me wasn't that I could do it. What shocked me was how fast it came together.
The Thing About "From Scratch"
Let's be clear about what "from scratch" means here, because I think there's a useful distinction. I didn't write my own transformer architecture or train my own weights. That would be like saying you're building a car from scratch by first mining iron ore. When I say "from scratch," I mean: starting with nothing but an API key and building up all the scaffolding that makes an agent actually work.
That means implementing the tool loop — the cycle where the model decides what to do, executes a tool, observes the result, and decides what to do next. It means writing system prompts that actually guide behavior instead of just setting vibes. It means figuring out how to handle memory so the agent can maintain context across multiple turns without bleeding tokens everywhere. It means error handling, retry logic, and all the unglamorous plumbing that separates a demo from something that actually works.
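That decide-execute-observe cycle is simpler than it sounds. Here's a minimal sketch of the shape mine took — the tool registry, the message format, and the `scripted_model` stand-in are all illustrative, not the API of any particular SDK:

```python
import json

# Illustrative tool registry: names and signatures are made up for this sketch.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def run_agent(model, task, max_steps=10):
    """Drive the decide -> execute -> observe cycle until the model
    returns a final answer or we hit the step limit."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = model(history)           # model decides what to do next
        if action["type"] == "final":     # model is done: return its answer
            return action["content"]
        tool = TOOLS[action["tool"]]      # look up and execute the chosen tool
        result = tool(**action["args"])
        # Feed the observation back so the next decision can use it.
        history.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("agent exceeded max_steps without finishing")

# A scripted stand-in for the model, just to exercise the loop shape:
def scripted_model(history):
    if any(m["role"] == "tool" for m in history):
        return {"type": "final", "content": history[-1]["content"]}
    return {"type": "tool", "tool": "add", "args": {"a": 2, "b": 3}}

print(run_agent(scripted_model, "add 2 and 3"))  # prints "5"
```

Everything else in the project hangs off this loop: the system prompt shapes the "decide" step, memory shapes what goes into `history`, and error handling wraps the tool execution.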
I've used plenty of agents. I use them every day at work. But using an agent is like driving a car with an automatic transmission — you get where you need to go, but you don't really understand what's happening under the hood. Building one yourself is different. You feel every gear shift.
Cursor + Opus: Unreasonably Effective
I started with Cursor because, well, I use it for everything these days. But I was genuinely curious how much of the heavy lifting it could handle for something this architectural. Turns out: a lot.
I began by outlining the basic structure in comments — just rough pseudocode for how the tool loop should work. Cursor filled in a surprisingly coherent first pass. Not perfect, but coherent. The kind of code you'd get from a sharp junior engineer who understood the concept but hadn't hit all the edge cases yet.
Then I switched to Opus in the CLI for the harder parts — the system prompt engineering, the tool schema design, the memory management logic. This is where I expected to hit friction. These are the parts that feel more like craft than engineering, where you need to iterate and feel your way forward.
But Opus just... got it. I'd describe what I wanted the agent to do, mention a few examples of failure modes I was worried about, and it would come back with prompts that actually worked. Not placeholder prompts that look good but collapse under pressure. Actual, working prompts that handled ambiguity and kept the agent on track.
By the end of Saturday, I had a working tool loop. By Sunday afternoon, I had memory persistence and basic error recovery. The whole thing was maybe 800 lines of Python. It felt almost embarrassingly fast.
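The memory persistence was nothing fancy, and a rough sketch of the idea fits in a few lines: serialize the conversation history after each turn, reload it on startup. The file path and schema here are hypothetical, not what I actually shipped:

```python
import json
import os

MEMORY_PATH = "agent_memory.json"  # hypothetical location

def load_memory(path=MEMORY_PATH):
    """Load prior turns from disk, or start with an empty history."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return []

def save_memory(history, path=MEMORY_PATH):
    """Persist the full conversation history after each turn."""
    with open(path, "w") as f:
        json.dump(history, f)
```

The real work is deciding what *not* to persist — summarizing or truncating old turns so the context window doesn't bleed tokens — but the skeleton is just read and write.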
What I Learned That I Couldn't Have Learned Before
Here's the thing nobody tells you about agents: the hard part isn't the AI. The hard part is everything else.
The model itself is shockingly capable. Give it a clear task, well-defined tools, and decent context, and it'll figure things out. What's hard is handling all the ways reality intrudes on that clean abstraction. What happens when a tool call fails? What if it returns malformed data? What if the agent gets stuck in a loop, calling the same tool over and over with slightly different parameters?
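The answer I landed on for failed or malformed tool calls: retry transient failures, and when a call still fails, feed the error back to the model as an observation instead of crashing the loop. This is a minimal sketch of that pattern, with exception types and backoff values chosen for illustration:

```python
import json
import time

def execute_tool(fn, args, retries=2, backoff=0.1):
    """Run one tool call. Retry transient failures with exponential
    backoff, and validate that the result serializes cleanly before
    it goes back into the context window."""
    last_err = None
    for attempt in range(retries + 1):
        try:
            result = fn(**args)
            return json.dumps(result)  # fails fast on malformed output
        except (TypeError, ValueError, RuntimeError) as e:
            last_err = e
            time.sleep(backoff * (2 ** attempt))
    # Surface the failure to the model as data, not as a crash:
    return json.dumps({"error": str(last_err)})
```

Returning the error as a normal observation turns out to matter: the model can often read the message and fix its own arguments on the next step.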
Using an agent, you never see these problems. They're handled (or not handled) by whoever built the thing. Building one yourself, you hit every edge case personally. You watch your agent confidently march into an infinite loop and have to figure out how to teach it not to do that. You see it misinterpret a tool's output and realize your schema wasn't as clear as you thought.
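My fix for the infinite-loop problem was crude but effective: count repeated (tool, arguments) pairs and bail out when the same call keeps recurring. This sketch only catches *exact* repeats; catching "slightly different parameters" needs fuzzier matching, which I'm glossing over here:

```python
from collections import Counter

def is_looping(calls, threshold=3):
    """Return True when the same (tool_name, args) pair has occurred
    `threshold` or more times. `calls` is a list of
    (tool_name, args_dict) tuples recorded by the agent loop."""
    keys = [(name, tuple(sorted(args.items()))) for name, args in calls]
    return any(count >= threshold for count in Counter(keys).values())
```

When the guard trips, the loop can inject a message telling the model its approach isn't working, which is usually enough to break the cycle.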
The other thing I learned: system prompts matter way more than I expected. I knew they mattered — I'm not naive. But there's a difference between knowing something intellectually and feeling it. A tiny rephrasing in how I described the agent's role changed its behavior completely. Adding one sentence about "thinking step-by-step before choosing a tool" cut my error rate in half.
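For flavor, here's the *shape* of a system prompt along those lines. The wording below is my reconstruction for this post, not a verbatim copy of what I used, but the step-by-step instruction is the sentence that made the difference:

```python
# Illustrative system prompt; the exact wording is a sketch, not the original.
SYSTEM_PROMPT = """\
You are a task-completion agent with access to the tools listed below.
Think step-by-step before choosing a tool: state what you know, what is
still missing, and which single tool call gets you closer to the goal.
If a tool call fails, read the error and adjust your arguments before
retrying. When the task is complete, give a final answer and stop.
"""
```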
This is knowledge you can't get from a blog post or a paper. You have to feel the difference yourself.
The Gap Between Using and Building
There's a gap between people who use AI tools and people who build them, and I think that gap matters more than we tend to acknowledge.
When you only use AI tools, you develop superstitions. You think certain prompts are magic because they worked once. You blame the model when things go wrong, even when the problem is actually in how the tool is integrated. You treat the whole thing as a black box and try to learn its quirks through trial and error.
When you build AI tools, even once, even badly, you develop intuition. You understand what's actually happening under the hood. You know which problems are hard and which just look hard. You stop treating the model like a magic oracle and start treating it like a powerful but ultimately mechanical component in a larger system.
I'm not saying everyone needs to build an agent. But I do think the gap between these two mindsets is becoming a problem. We're building a world where AI is infrastructure, but most people interact with it purely as consumers. That's fine for some things, but for anything beyond surface-level use, you need the builder's mindset.
And here's the good news: that gap is closing. Not because people are getting smarter, but because the tools are getting better. Five years ago, building an agent from scratch would have taken me weeks and required deep ML expertise. Today, with Cursor and Opus, I did it in a weekend with Python I learned in college. The barrier to entry is collapsing.
Why This Matters
Working at Obvious.ai, I see the cutting edge of what agents can do. We're pushing the boundaries of what's possible — more capable, more reliable, more useful. But playing with my scrappy weekend project, I realized something: the distance between "the best agent in the world" and "an agent I built in two days" isn't as large as you'd think.
The frontier is moving so fast that even a rough, homegrown agent built with off-the-shelf tools can do things that would have seemed impossible a year ago. And that's exciting, but also a little unsettling. If someone like me — someone who spends all day thinking about this stuff but had never actually done it — can build something functional in a weekend, what does that mean for how fast this technology spreads?
I don't have a tidy conclusion here. Building this agent didn't give me some grand insight into the future of AI. What it did give me was something more concrete: a visceral understanding of how these systems work, what makes them tick, and where the real challenges live.
Also, I can now say I've actually built the thing I spend all day talking about. Which, honestly, just feels better.
If you're working with AI and you've never built anything yourself, I'd encourage you to try it. Not because you need to become an engineer. Not because it'll make you better at your job (though it might). But because there's something clarifying about closing that gap between using and building, even just once.
You might be surprised by how fast it comes together. I know I was.