This is the English version of a previously published article.
What Is the Software 3.0 Era?
In June 2025, Andrej Karpathy gave a talk at Y Combinator AI Startup School. He broke software's evolution into three stages.
Software 1.0: What we've done for decades. Writing explicit logic in Python, Java, or C++. Branching with if-else, looping with for, abstracting with functions. Telling the computer exactly how to do things—in code.
Software 2.0: Kicked off with the deep learning boom in the 2010s. You stop writing rules by hand. Collect data, train a model, and the neural network weights become the program. Tesla Autopilot, for instance, replaced huge chunks of C++ with neural networks.
Software 3.0: Where we are now. You tell an LLM what you want, in plain language. The prompt is the program.

As Karpathy puts it: "Software 3.0 is eating 1.0/2.0." The new paradigm is swallowing the old ones.
📺 Andrej Karpathy: Software Is Changing (Again) — Y Combinator AI Startup School
Harness: Making LLMs Actually Useful
But the reality is messier.
You can't just tell ChatGPT, "Fix the bug in our service," and expect a patch to ship. LLMs are powerful—but on their own, they can't read your codebase, run commands, or touch a database.
That's where the idea of a harness comes in.
A harness is the gear you put on a horse: it's what lets humans actually put the animal's power to use. No matter how fast or strong the horse is, without a harness that power goes nowhere.

Same goes for LLMs. Raw capability isn't enough. You need tools and environments that fill the gaps and connect them to real work.
| Gap in the raw model | What the harness provides |
|---|---|
| Hallucination | Fact grounding, RAG |
| Lack of domain knowledge | Knowledge base |
| No state management | Session management, orchestration |
| No access to external systems | Tooling, MCP |
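The core of any harness is a loop: the model proposes a tool call, the harness executes it, and the result goes back into the conversation until the model produces an answer. Here is a minimal sketch of that loop; the model is a stub function and `read_file` is a fake tool, since a real harness would call an LLM API and the actual filesystem.

```python
# Minimal harness loop sketch. fake_model stands in for an LLM:
# it first requests a tool call, then answers once it has the result.

def fake_model(messages):
    """Stub LLM: ask for a tool call, then answer from its result."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "read_file", "args": {"path": "config.txt"}}
    return {"answer": "The config sets debug=true."}

TOOLS = {
    # Stub tool: a real harness would read the file from disk.
    "read_file": lambda path: "debug=true",
}

def run(model, user_request, max_steps=5):
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_steps):
        reply = model(messages)
        if "answer" in reply:                 # model is done
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])  # execute requested tool
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("no answer within step budget")

print(run(fake_model, "What does config.txt set?"))
```

Everything in the table above (RAG, session management, MCP) is, in one way or another, an elaboration of this loop.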
Claude Code Is Also a Harness
Claude Code is Anthropic's CLI-based coding agent. At its core, it's essentially a harness for Claude.
Here's what it provides:
| Capability | What it does |
|---|---|
| File system access | Lets Claude read and write code |
| Terminal execution | Lets Claude run commands |
| MCP (Model Context Protocol) | Connects to external systems |
| Sub-agent | Splits and handles complex tasks |
| Slash command | Routes user intent |
| Skills | Reusable functional units |
| Hooks | Event-driven automation |
Together, these turn Claude from a model into an agent that can actually ship things.
But look at this structure for a second. Doesn't it feel familiar?
Seeing It Through Software 1.0 Eyes
MCP, skills, sub-agents, slash commands...
New terms pile up fast, and with them, cognitive load. But look closely at this structure, and you might notice something: it maps surprisingly well onto layered architecture—something most of us have been working with for years. At least as a starting point.

Breaking Down Each Layer
Slash command = Controller
Like Spring's @RestController or Express's router.get(), a slash command is the entry point for user requests. Type /review and the review workflow kicks off. Type /refactor and refactoring begins.
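In Claude Code, a custom slash command is just a Markdown file under `.claude/commands/`: a file at `.claude/commands/review.md` is invoked as `/review`, and `$ARGUMENTS` carries whatever the user types after the command. The description and body below are illustrative, not a prescribed template:

```markdown
---
description: Review the current changes
---
Review the code changes in $ARGUMENTS.
Check correctness first, then readability, following the conventions in CLAUDE.md.
```

Like a controller, the file does no real work itself; it only states the request and hands it to the layers below.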
Sub-agent = Service Layer
Just as a Service layer coordinates multiple repositories and domain objects, a sub-agent orchestrates multiple skills to complete a workflow. Each sub-agent maintains its own independent context—think of it as a self-contained unit of work, separate from others.
Skills = Domain-level Component (SRP)
A skill is a single-purpose unit that follows the Single Responsibility Principle. "Review code," "Generate tests," "Write docs"—one clear job per skill. Just as classes shouldn't bloat into monoliths, a skill should do exactly one thing, and do it well.
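In Claude Code's skills feature, a skill is a folder whose SKILL.md starts with a name and a description telling Claude when to use it. A single-purpose sketch, with illustrative contents:

```markdown
---
name: generate-tests
description: Generate unit tests for one module. Use when the user asks for test coverage.
---
1. Read the target module and its existing tests.
2. Propose test cases for uncovered branches.
3. Write the tests using the project's test framework.
```

Note what is not here: no code review, no doc generation. One job per skill.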
MCP = Infrastructure / Adapter
Think of MCP as the layer that manages connections to external systems—databases, APIs, and similar outside interfaces. Much like the Repository or Adapter pattern, it's an abstraction boundary: internal logic doesn't need to know how the outside world is implemented.
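Concretely, Claude Code can read MCP server definitions from a `.mcp.json` file at the project root (servers can also be registered with `claude mcp add`). The server name, package, and connection string below are illustrative:

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres",
               "postgresql://localhost/mydb"]
    }
  }
}
```

Swap the database for another one and only this file changes; the skills and sub-agents on top of it stay untouched, which is exactly the Adapter property.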
CLAUDE.md = the project's constitution
Think of CLAUDE.md as the project's stable foundation—the norms and principles that don't change often: tech stack, coding conventions, build commands. Less a dependency manifest, more a shared understanding of how this project works.
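A sketch of what that constitution might contain; the stack and rules are illustrative:

```markdown
# CLAUDE.md
## Stack
- Java 21, Spring Boot 3, PostgreSQL
## Commands
- Build: ./gradlew build
- Test: ./gradlew test
## Conventions
- Controllers stay thin; business logic lives in the service layer.
- Never commit directly to main.
```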
Anti-patterns Apply Here Too
The anti-patterns from layered architecture carry over to agent design with surprising fidelity. The names even ring a bell.
| Traditional Anti-pattern | Agent Version | Symptom |
|---|---|---|
| God Class | God skill | One 3,000-line skill handling everything |
| Spaghetti Code | Spaghetti CLAUDE.md | All instructions dumped together with no structure |
| Tight Coupling | Hardcoding without MCP | Direct curl calls; breaks when the API changes |
| Leaky Abstraction | Sub-agent knows MCP internals | Abstraction boundaries collapse; reusability is lost |
| Circular Dependency | Circular skill calls | A→B→C→A, risking infinite loops |
Code smells apply too:
- Feature Envy: A skill excessively references another skill's data
- Duplication: Identical prompts copy-pasted across multiple skills
- Long Method: One sub-agent sequentially calling 10 skills
The Crucial Difference: What the Metaphor Misses
The layered architecture analogy holds up well. But there's one thing it doesn't quite capture.
Think about a traditional service layer. What happens when inventory runs out mid-order? You throw an OutOfStockException, or fall back to a back-order policy. Payment fails? Retry, or return an error.
Every branch has to be predefined.
But in real development, you hit moments like:
"This edge case... I need to check with the PM."
"This scenario isn't in the spec. What do I do?"
In traditional architecture, there's no way for the code to pause and ask. It throws an error, makes an arbitrary call, or logs it and moves on.
Agents Can Ask Questions
Agents are different. With Human-in-the-Loop (HITL), there's another option.
With tools like UserAskQuestion, an agent can delegate judgment mid-execution.
Exceptions become questions.
| Traditional architecture | Agent with HITL |
|---|---|
| All edge cases must be predefined | When unsure, just ask |
| Exception → error or default | Exception → request user judgment |
| All-or-nothing automation | Partial automation works fine |
| Mistakes require rollback | Catch mistakes before they happen |
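The contrast can be sketched in a few lines of Python. `ask_user` here is a hypothetical stand-in for an ask-the-user tool; the order logic is invented for illustration:

```python
# Traditional service: every branch must be decided at design time.
class OutOfStockError(Exception):
    pass

def place_order_traditional(stock, qty):
    if qty > stock:
        raise OutOfStockError("predefined branch: fail")
    return stock - qty

# Agent-style: an undefined case becomes a question, not an exception.
def place_order_agent(stock, qty, ask_user):
    if qty > stock:
        choice = ask_user("Stock is short. Back-order the rest, or cancel?")
        return "back-order" if choice == "back-order" else "cancel"
    return "fulfilled"

# A scripted human answer stands in for the interactive prompt.
print(place_order_agent(stock=3, qty=5, ask_user=lambda q: "back-order"))
```

The traditional version can only do what its author anticipated; the agent version has a third option for everything the author did not.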
When to Ask, When to Just Do It
HITL is great—but an agent that asks every two seconds is just annoying.
Ask when:
- The action is hard to reverse (deletions, deployments, external API calls)
- There are multiple valid paths and no clear winner
- The stakes are high
Just do it when:
- The task is safely repeatable
- A convention already covers it
- It's easy to undo
A great agent is one that knows when to ask.
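The checklist above can be folded into a simple heuristic. The fields and thresholds below are illustrative, not from any real agent API:

```python
# "When to ask" as a scoring function over an action's properties.

def should_ask(action):
    if not action.get("reversible", True):
        return True                              # hard to undo: always confirm
    if action.get("stakes") == "high":
        return True                              # high impact: confirm
    if len(action.get("valid_options", [])) > 1:
        return True                              # no clear winner: let the human pick
    return False                                 # safe, conventional, undoable: just do it

assert should_ask({"reversible": False}) is True          # e.g. a deletion
assert should_ask({"stakes": "high"}) is True             # e.g. a production deploy
assert should_ask({"valid_options": ["a", "b"]}) is True  # genuinely ambiguous
assert should_ask({"reversible": True}) is False          # routine work
```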
The Path from 1.0 to 3.0
The Software 3.0 era is here. But that doesn't make everything we've learned obsolete.
What to Leave Behind
- The compulsion to write every piece of logic explicitly
- The urge to predefine every conceivable edge case
- Seeing LLMs as little more than "smart autocomplete"
What to Carry Forward
- Layer separation, SRP, abstraction
- Dependency management, interface design
- Testability and debugging strategies
- Code reviews and iterative improvement
The tools have changed. The principles of good design—cohesion, coupling, abstraction—haven't.
When designing an MCP, think Adapter Pattern. When writing a skill, think SRP. When building a sub-agent, think Service Layer.
The architectural thinking you've built up is a solid foundation for building agents well.
Limitations: What the Metaphor Hides
The layered architecture analogy is a useful lens—but like any analogy, it papers over a few real-world gotchas worth keeping in mind.
Tokens Are the New RAM
On traditional servers, you watch RAM. With agents, you watch tokens.
Context Window = Working Memory
Token Usage = Memory Footprint
CLAUDE.md, skills, conversation history, MCP responses—it all competes for space in the context window. 200K tokens sounds like a lot, until you're working with a large codebase.
| Item | Approximate tokens | Note |
|---|---|---|
| CLAUDE.md (well-structured) | 500–2,000 | Per project |
| Single skill | 300–1,500 | Token cost when included in context |
| Conversation history | Cumulative | Grows throughout the session |
| MCP response (e.g. DB query) | Variable | Watch for large payloads |
Just as you guard against OOM crashes, you should anticipate token explosions. Before writing "analyze all test files" in CLAUDE.md, picture what that means across 50 test files. You don't need exact counts—a rough sense of files and line count is enough.
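That rough sense is cheap to compute. A common rule of thumb is about four characters per token (real tokenizers vary, and the file sizes below are made up):

```python
# Back-of-the-envelope token budget for "analyze all test files".

def estimate_tokens(file_sizes_chars):
    """~4 characters per token is a rough but common heuristic."""
    return sum(size // 4 for size in file_sizes_chars)

# 50 test files of ~6,000 characters (~150 lines) each:
files = [6_000] * 50
print(estimate_tokens(files))  # 75,000 tokens: over a third of a 200K window
```

One casual instruction in CLAUDE.md can quietly commit a third of your working memory.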
A useful trick: ask Claude, "If you ran this workflow, which files would you expect to read?" If the list is longer than expected, it's a signal to narrow your instructions or break the task into steps.
Another way to save tokens: extract deterministic logic, such as a naming-convention check, into a script.
From the LLM's perspective, it just runs the script and uses the output. No need to hold the rule in context, no tokens wasted re-deriving it. If a task doesn't require reasoning, offload it to a script.
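A minimal sketch of such a script, checking a branch-naming convention. The rule itself is illustrative:

```python
# check_branch.py: a deterministic convention check the agent can run
# instead of re-deriving the rule from CLAUDE.md on every invocation.
import re

BRANCH_RULE = re.compile(r"^(feature|fix|chore)/[a-z0-9-]+$")

def check_branch_name(name):
    """Return 'OK' or a short diagnostic the LLM can relay verbatim."""
    return "OK" if BRANCH_RULE.match(name) else f"BAD: {name}"

print(check_branch_name("feature/add-login"))  # OK
print(check_branch_name("MyBranch"))           # BAD: MyBranch
```

The agent spends tokens only on the one-line output, not on the rule.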
The Skill Separation Dilemma: Class Explosion and the Law of Demeter
In traditional architecture, blindly applying SRP leads to Class Explosion—hundreds of tiny files that are individually correct but collectively hard to navigate.
Skills have a similar problem. In practice, you often pay an upfront context cost for discoverability—names and descriptions loaded so Claude knows what's available—and you pay more when skills are actually invoked. With 20 skills, that overhead adds up.
It's the agent equivalent of class explosion: dozens of well-factored skills that are individually cheap but collectively crowd the context window.
Think about the Law of Demeter: "Don't talk to strangers." Objects should only interact with their immediate neighbors.
Applied to skills: SKILL.md should be the entry point. Delegate the heavy content to references/.
This mirrors the Facade pattern: callers see one simple entry point while the complex subsystem stays hidden behind it. Claude works the same way. SKILL.md acts as the Facade, and the files in references/ only enter the context when Claude actually needs them.
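A possible layout for such a skill; the folder and file names are illustrative:

```markdown
code-review/
├── SKILL.md            # short entry point: when to use this, what it does
└── references/
    ├── style-rules.md  # long checklist, loaded only when actually reviewing
    └── security.md     # deep-dive rules, loaded only for security reviews
```

SKILL.md stays small and always discoverable; the expensive detail pays its token cost only on demand.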
Finding the balance:
| Situation | Traditional Architecture | Skill Design |
|---|---|---|
| Independent workflow | Separate Service class | Separate skill |
| Detailed rules within same domain | Private method / inner class | Files in references/ |
| Reusable utility | Common module | scripts/ or MCP |
Practical Tips: The Setup & Config Pattern
Enough theory. What does this look like in practice?
Slash commands let you blend HITL with automation naturally, much like a familiar interactive CLI installer: sensible defaults where possible, a prompt only where needed.
HITL shines brightest during setup: the agent handles what's obvious automatically and asks only when something's genuinely ambiguous. You don't predefine every option upfront. You just let it pause at the forks.
The open-source claude-hud plugin demonstrates this pattern cleanly:
What /claude-hud:setup does:
- Detects the current environment (terminal type, Claude Code version)
- Auto-configures the statusline
- Registers the required hooks
The core principle: minimize manual configuration, and only interrupt the user when their input is genuinely needed.
Closing Thoughts
Development in the Software 3.0 era is shifting—from writing code to assembling and directing it.
But the principles behind that assembly aren't foreign. They're the same ones we've been working with.
If MCP, skills, sub-agents, and slash commands feel unfamiliar, try mapping them onto the layered architecture you already know. New technology, viewed through the lens of good engineering principles, tends to make a lot more sense.
One more thing worth holding onto: applications can now ask questions. Rather than trying to spec out every edge case upfront, it's worth considering a different approach—build systems that handle ambiguity by simply asking.
Start building by refactoring your mindset.
All images in this article were created using generative AI.
References
- Andrej Karpathy: Software Is Changing (Again) — Y Combinator
- claude-hud — Claude Code plugin example
- Claude Code Official Documentation







