This is the English version of a previously published article.
What Is the Software 3.0 Era?
In June 2025, Andrej Karpathy gave a talk at Y Combinator AI Startup School. He broke software's evolution into three stages.
Software 1.0: What we've done for decades. Writing explicit logic in Python, Java, or C++. Branching with if-else, looping with for, abstracting with functions. Telling the computer exactly how to do things—in code.
Software 2.0: Kicked off with the deep learning boom in the 2010s. You stop writing rules by hand. Collect data, train a model, and the neural network weights become the program. Tesla Autopilot, for instance, replaced huge chunks of C++ with neural networks.
Software 3.0: Where we are now. You tell an LLM what you want, in plain language. The prompt is the program.

As Karpathy puts it: "Software 3.0 is eating 1.0/2.0." The new paradigm is swallowing the old ones.
📺 Andrej Karpathy: Software Is Changing (Again) — Y Combinator AI Startup School
Harness: Making LLMs Actually Useful
But the reality is messier.
You can't just tell ChatGPT, "Fix the bug in our service," and expect a patch to ship. LLMs are powerful—but on their own, they can't read your codebase, run commands, or touch a database.
That's where the idea of a harness comes in.
A harness is the gear you put on a horse: it's what lets humans actually put the animal's power to use. No matter how fast or strong the horse is, without a harness that power goes nowhere.

Same goes for LLMs. Raw capability isn't enough. You need tools and environments that fill the gaps and connect them to real work.
| Gap in the raw model | What the harness provides |
|---|---|
| Hallucination | Fact grounding, RAG |
| Lack of domain knowledge | Knowledge base |
| No state management | Session management, orchestration |
| No access to external systems | Tooling, MCP |
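The core of any harness is a loop: the model proposes a tool call, the harness executes it, and the result goes back into the conversation until the model produces an answer. Here is a minimal sketch of that loop; the model is a stub function and `read_file` is a fake tool, since a real harness would call an LLM API and the actual filesystem.

```python
# Minimal harness loop sketch. fake_model stands in for an LLM:
# it first requests a tool call, then answers once it has the result.

def fake_model(messages):
    """Stub LLM: ask for a tool call, then answer from its result."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "read_file", "args": {"path": "config.txt"}}
    return {"answer": "The config sets debug=true."}

TOOLS = {
    # Stub tool: a real harness would read the file from disk.
    "read_file": lambda path: "debug=true",
}

def run(model, user_request, max_steps=5):
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_steps):
        reply = model(messages)
        if "answer" in reply:                 # model is done
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])  # execute requested tool
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("no answer within step budget")

print(run(fake_model, "What does config.txt set?"))
```

Everything in the table above (RAG, session management, MCP) is, in one way or another, an elaboration of this loop.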
Claude Code Is Also a Harness
Claude Code is Anthropic's CLI-based coding agent. At its core, it's essentially a harness for Claude.
Here's what it provides:
| Capability | What it does |
|---|---|
| File system access | Lets Claude read and write code |
| Terminal execution | Lets Claude run commands |
| MCP (Model Context Protocol) | Connects to external systems |
| Sub-agent | Splits and handles complex tasks |
| Slash command | Routes user intent |
| Skills | Reusable functional units |
| Hooks | Event-driven automation |
Together, these turn Claude from a model into an agent that can actually ship things.
But look at this structure for a second. Doesn't it feel familiar?
Seeing It Through Software 1.0 Eyes
MCP, skills, sub-agents, slash commands...
New terms pile up fast, and with them, cognitive load. But look closely at this structure, and you might notice something: it maps surprisingly well onto layered architecture—something most of us have been working with for years. At least as a starting point.

Breaking Down Each Layer
Slash command = Controller
Like Spring's @RestController or Express's router.get(), a slash command is the entry point for user requests. Type /review and the review workflow kicks off. Type /refactor and refactoring begins.
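In Claude Code, a custom slash command is just a Markdown file under `.claude/commands/`: a file at `.claude/commands/review.md` is invoked as `/review`, and `$ARGUMENTS` carries whatever the user types after the command. The description and body below are illustrative, not a prescribed template:

```markdown
---
description: Review the current changes
---
Review the code changes in $ARGUMENTS.
Check correctness first, then readability, following the conventions in CLAUDE.md.
```

Like a controller, the file does no real work itself; it only states the request and hands it to the layers below.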
Sub-agent = Service Layer
Just as a Service layer coordinates multiple repositories and domain objects, a sub-agent orchestrates multiple skills to complete a workflow. Each sub-agent maintains its own independent context—think of it as a self-contained unit of work, separate from others.
Skills = Domain-level Component (SRP)
A skill is a single-purpose unit that follows the Single Responsibility Principle. "Review code," "Generate tests," "Write docs"—one clear job per skill. Just as classes shouldn't bloat into monoliths, a skill should do exactly one thing, and do it well.
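In Claude Code's skills feature, a skill is a folder whose SKILL.md starts with a name and a description telling Claude when to use it. A single-purpose sketch, with illustrative contents:

```markdown
---
name: generate-tests
description: Generate unit tests for one module. Use when the user asks for test coverage.
---
1. Read the target module and its existing tests.
2. Propose test cases for uncovered branches.
3. Write the tests using the project's test framework.
```

Note what is not here: no code review, no doc generation. One job per skill.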
MCP = Infrastructure / Adapter
Think of MCP as the layer that manages connections to external systems—databases, APIs, and similar outside interfaces. Much like the Repository or Adapter pattern, it's an abstraction boundary: internal logic doesn't need to know how the outside world is implemented.
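Concretely, Claude Code can read MCP server definitions from a `.mcp.json` file at the project root (servers can also be registered with `claude mcp add`). The server name, package, and connection string below are illustrative:

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres",
               "postgresql://localhost/mydb"]
    }
  }
}
```

Swap the database for another one and only this file changes; the skills and sub-agents on top of it stay untouched, which is exactly the Adapter property.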
CLAUDE.md = the project's constitution
Think of CLAUDE.md as the project's stable foundation—the norms and principles that don't change often: tech stack, coding conventions, build commands. Less a dependency manifest, more a shared understanding of how this project works.
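A sketch of what that constitution might contain; the stack and rules are illustrative:

```markdown
# CLAUDE.md
## Stack
- Java 21, Spring Boot 3, PostgreSQL
## Commands
- Build: ./gradlew build
- Test: ./gradlew test
## Conventions
- Controllers stay thin; business logic lives in the service layer.
- Never commit directly to main.
```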
Anti-patterns Apply Here Too
The anti-patterns from layered architecture carry over to agent design with surprising fidelity. The names even ring a bell.
| Traditional Anti-pattern | Agent Version | Symptom |
|---|---|---|
| God Class | God skill | One 3,000-line skill handling everything |
| Spaghetti Code | Spaghetti CLAUDE.md | All instructions dumped together with no structure |
| Tight Coupling | Hardcoding without MCP | Direct curl calls; breaks when the API changes |
| Leaky Abstraction | Sub-agent knows MCP internals | Abstraction boundaries collapse; reusability is lost |
| Circular Dependency | Circular skill calls | A→B→C→A, risking infinite loops |
Code smells apply too:
- Feature Envy: A skill excessively references another skill's data
- Duplication: Identical prompts copy-pasted across multiple skills
- Long Method: One sub-agent sequentially calling 10 skills
The Crucial Difference: What the Metaphor Misses
The layered architecture analogy holds up well. But there's one thing it doesn't quite capture.
Think about a traditional service layer. What happens when inventory runs out mid-order? You throw an OutOfStockException, or fall back to a back-order policy. Payment fails? Retry, or return an error.
Every branch has to be predefined.
But in real development, you hit moments like:
"This edge case... I need to check with the PM."
"This scenario isn't in the spec. What do I do?"
In traditional architecture, there's no way for the code to pause and ask. It throws an error, makes an arbitrary call, or logs it and moves on.
Agents Can Ask Questions
Agents are different. With Human-in-the-Loop (HITL), there's another option.
With tools like UserAskQuestion, an agent can delegate judgment mid-execution.
Exceptions become questions.
| Traditional architecture | Agent with HITL |
|---|---|
| All edge cases must be predefined | When unsure, just ask |
| Exception → error or default | Exception → request user judgment |
| All-or-nothing automation | Partial automation works fine |
| Mistakes require rollback | Catch mistakes before they happen |
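The contrast can be sketched in a few lines of Python. `ask_user` here is a hypothetical stand-in for an ask-the-user tool; the order logic is invented for illustration:

```python
# Traditional service: every branch must be decided at design time.
class OutOfStockError(Exception):
    pass

def place_order_traditional(stock, qty):
    if qty > stock:
        raise OutOfStockError("predefined branch: fail")
    return stock - qty

# Agent-style: an undefined case becomes a question, not an exception.
def place_order_agent(stock, qty, ask_user):
    if qty > stock:
        choice = ask_user("Stock is short. Back-order the rest, or cancel?")
        return "back-order" if choice == "back-order" else "cancel"
    return "fulfilled"

# A scripted human answer stands in for the interactive prompt.
print(place_order_agent(stock=3, qty=5, ask_user=lambda q: "back-order"))
```

The traditional version can only do what its author anticipated; the agent version has a third option for everything the author did not.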
When to Ask, When to Just Do It
HITL is great—but an agent that asks every two seconds is just annoying.
Ask when:
- The action is hard to reverse (deletions, deployments, external API calls)
- There are multiple valid paths and no clear winner
- The stakes are high
Just do it when:
- The task is safely repeatable
- A convention already covers it
- It's easy to undo
A great agent is one that knows when to ask.
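The checklist above can be folded into a simple heuristic. The fields and thresholds below are illustrative, not from any real agent API:

```python
# "When to ask" as a scoring function over an action's properties.

def should_ask(action):
    if not action.get("reversible", True):
        return True                              # hard to undo: always confirm
    if action.get("stakes") == "high":
        return True                              # high impact: confirm
    if len(action.get("valid_options", [])) > 1:
        return True                              # no clear winner: let the human pick
    return False                                 # safe, conventional, undoable: just do it

assert should_ask({"reversible": False}) is True          # e.g. a deletion
assert should_ask({"stakes": "high"}) is True             # e.g. a production deploy
assert should_ask({"valid_options": ["a", "b"]}) is True  # genuinely ambiguous
assert should_ask({"reversible": True}) is False          # routine work
```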
The Path from 1.0 to 3.0
The Software 3.0 era is here. But that doesn't make everything we've learned obsolete.
What to Leave Behind
- The compulsion to write every piece of logic explicitly
- The urge to predefine every conceivable edge case
- Seeing LLMs as little more than "smart autocomplete"
What to Carry Forward
- Layer separation, SRP, abstraction
- Dependency management, interface design
- Testability and debugging strategies
- Code reviews and iterative improvement
The tools have changed. The principles of good design—cohesion, coupling, abstraction—haven't.
When designing an MCP, think Adapter Pattern. When writing a skill, think SRP. When building a sub-agent, think Service Layer.
The architectural thinking you've built up is a solid foundation for building agents well.
Limitations: What the Metaphor Hides
The layered architecture analogy is a useful lens—but like any analogy, it papers over a few real-world gotchas worth keeping in mind.
Tokens Are the New RAM
On traditional servers, you watch RAM. With agents, you watch tokens.
Context Window = Working Memory
Token Usage = Memory Footprint
CLAUDE.md, skills, conversation history, MCP responses—it all competes for space in the context window. 200K tokens sounds like a lot, until you're working with a large codebase.
| Item | Approximate tokens | Note |
|---|---|---|
| CLAUDE.md (well-structured) | 500–2,000 | Per project |
| Single skill | 300–1,500 | Token cost when included in context |
| Conversation history | Cumulative | Grows throughout the session |
| MCP response (e.g. DB query) | Variable | Watch for large payloads |
Just as you guard against OOM crashes, you should anticipate token explosions. Before writing "analyze all test files" in CLAUDE.md, picture what that means across 50 test files. You don't need exact counts—a rough sense of files and line count is enough.
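That rough sense is cheap to compute. A common rule of thumb is about four characters per token (real tokenizers vary, and the file sizes below are made up):

```python
# Back-of-the-envelope token budget for "analyze all test files".

def estimate_tokens(file_sizes_chars):
    """~4 characters per token is a rough but common heuristic."""
    return sum(size // 4 for size in file_sizes_chars)

# 50 test files of ~6,000 characters (~150 lines) each:
files = [6_000] * 50
print(estimate_tokens(files))  # 75,000 tokens: over a third of a 200K window
```

One casual instruction in CLAUDE.md can quietly commit a third of your working memory.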
A useful trick: ask Claude, "If you ran this workflow, which files would you expect to read?" If the list is longer than expected, it's a signal to narrow your instructions or break the task into steps.
Another way to save tokens: extract deterministic logic, such as a naming-convention check, into a script.
From the LLM's perspective, it just runs the script and uses the output. No need to hold the rule in context, no tokens wasted re-deriving it. If a task doesn't require reasoning, offload it to a script.
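A minimal sketch of such a script, checking a branch-naming convention. The rule itself is illustrative:

```python
# check_branch.py: a deterministic convention check the agent can run
# instead of re-deriving the rule from CLAUDE.md on every invocation.
import re

BRANCH_RULE = re.compile(r"^(feature|fix|chore)/[a-z0-9-]+$")

def check_branch_name(name):
    """Return 'OK' or a short diagnostic the LLM can relay verbatim."""
    return "OK" if BRANCH_RULE.match(name) else f"BAD: {name}"

print(check_branch_name("feature/add-login"))  # OK
print(check_branch_name("MyBranch"))           # BAD: MyBranch
```

The agent spends tokens only on the one-line output, not on the rule.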
The Skill Separation Dilemma: Class Explosion and the Law of Demeter
In traditional architecture, blindly applying SRP leads to Class Explosion—hundreds of tiny files that are individually correct but collectively hard to navigate.
Skills have a similar problem. In practice, you often pay an upfront context cost for discoverability—names and descriptions loaded so Claude knows what's available—and you pay more when skills are actually invoked. With 20 skills, that overhead adds up.
It's the agent equivalent of class explosion: dozens of well-factored skills that are individually cheap but collectively crowd the context window.
Think about the Law of Demeter: "Don't talk to strangers." Objects should only interact with their immediate neighbors.
Applied to skills: SKILL.md should be the entry point. Delegate the heavy content to references/.
This mirrors the Facade pattern: callers see one simple entry point while the complex subsystem stays hidden behind it. Claude works the same way. SKILL.md acts as the Facade, and the files in references/ only enter the context when Claude actually needs them.
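A possible layout for such a skill; the folder and file names are illustrative:

```markdown
code-review/
├── SKILL.md            # short entry point: when to use this, what it does
└── references/
    ├── style-rules.md  # long checklist, loaded only when actually reviewing
    └── security.md     # deep-dive rules, loaded only for security reviews
```

SKILL.md stays small and always discoverable; the expensive detail pays its token cost only on demand.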
Finding the balance:
| Situation | Traditional Architecture | Skill Design |
|---|---|---|
| Independent workflow | Separate Service class | Separate skill |
| Detailed rules within same domain | Private method / inner class | Files in references/ |
| Reusable utility | Common module | scripts/ or MCP |
Practical Tips: The Setup & Config Pattern
Enough theory. What does this look like in practice?
Slash commands let you blend HITL with automation naturally, much like a familiar interactive CLI installer: sensible defaults where possible, a prompt only where needed.
HITL shines brightest during setup: the agent handles what's obvious automatically and asks only when something's genuinely ambiguous. You don't predefine every option upfront. You just let it pause at the forks.
The open-source claude-hud plugin demonstrates this pattern cleanly:
What /claude-hud:setup does:
- Detects the current environment (terminal type, Claude Code version)
- Auto-configures the statusline
- Registers the required hooks
The core principle: minimize manual configuration, and only interrupt the user when their input is genuinely needed.
Closing Thoughts
Development in the Software 3.0 era is shifting—from writing code to assembling and directing it.
But the principles behind that assembly aren't foreign. They're the same ones we've been working with.
If MCP, skills, sub-agents, and slash commands feel unfamiliar, try mapping them onto the layered architecture you already know. New technology, viewed through the lens of good engineering principles, tends to make a lot more sense.
One more thing worth holding onto: applications can now ask questions. Rather than trying to spec out every edge case upfront, it's worth considering a different approach—build systems that handle ambiguity by simply asking.
Start building by refactoring your mindset.
All images in this article were created using generative AI.
References
- Andrej Karpathy: Software Is Changing (Again) — Y Combinator
- claude-hud — Claude Code plugin example
- Claude Code Official Documentation







