Agents, Skills, and Natural Language as the Next Abstraction Layer
If you're new to all this: an LLM (large language model) is the kind of AI that powers ChatGPT, Claude, and Gemini. It's auto-complete, but enormously bigger. Instead of guessing one word like your phone does, it can guess whole sentences, paragraphs, even entire letters or stories. It does this by predicting, one word at a time, what a sensible answer would look like. It learned how by reading more writing than a person could finish in a thousand lifetimes.
What strikes me about today's LLMs is that they understand natural language and generate the output you ask for, and it doesn't really matter how accurate the first attempt is.
That's the bit people skip past. The argument against LLMs is usually "it hallucinates, it gets things wrong". Sure. But the interesting capability isn't zero-shot perfection. It's that you describe what you want in plain English, and something comes back you can correct in plain English. The syntax wall is just gone.
A new programming paradigm
This is a new paradigm of programming, one where we finally stop memorizing syntax and focus on higher-level thinking. So no, Claude isn't taking over our jobs; it's helping us get things done faster and better.
The thing being automated is the part that was already a tax: remembering which method on which library takes which arguments in which order. The thing that's not being automated is knowing what you actually want, and being able to tell whether you got it.
All we need is a shift in mindset toward programming with natural language. That sounds easy. It isn't: being precise in English is its own skill, and it's probably the skill the next decade of software work will run on.
What is an agent?
An agent is an LLM with hands. That's it.
You take a model that can read and write text, you give it tools — read a file, run a command, search the web, call an API — and you put it in a loop. Now it can do things. Take a step. See what happened. Decide what to do next. Repeat until the job's done, or it's stuck, or it's out of budget.
That's the whole idea. The model is the brain, the tools are the hands, the loop is the heartbeat. Everything fancier — multi-agent setups, planners, critics, review passes — is just more brains, more hands, more loops, arranged differently. This is what people mean by AI programming: an LLM doing work in a loop, with tools, against a goal.
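The brain-hands-heartbeat shape above fits in a few lines. Here is a minimal sketch in Python, with a stand-in function in place of a real model API; the action format, the `calc` tool, and the budget check are all made up for illustration:

```python
def fake_model(context):
    # Stand-in for the brain: reads the context, "predicts" the next action.
    if "result: 8" in context:
        return {"type": "answer", "text": "5 + 3 = 8"}   # job looks done
    return {"type": "tool", "name": "calc", "args": "5 + 3"}

# One "hand": a toy calculator tool (only handles "a + b" strings).
TOOLS = {"calc": lambda expr: str(sum(int(t) for t in expr.split(" + ")))}

def run_agent(goal, budget=5):
    context = f"goal: {goal}"
    for _ in range(budget):                     # the heartbeat
        action = fake_model(context)
        if action["type"] == "answer":          # model decided it's finished
            return action["text"]
        result = TOOLS[action["name"]](action["args"])  # take a step
        context += f"\nresult: {result}"        # see what happened, go again
    return "out of budget"
```

Take a step, see what happened, decide what to do next: the loop is the whole trick, and everything else is detail.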
So yes — Claude Code, the thing you're talking to when you open a terminal, is an agent. The base model is the brain; the harness gives it tools and runs the loop.
You can also spawn specific agents — same brain, different briefing. One Claude with the system prompt "you are a careful security reviewer" and read-only file tools is a security-review agent. Another with "you are a test writer, output pytest cases" is a test-writing agent. Same model, different personality, different scope. The agent isn't the model; it's the model plus the role you wrap around it.
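That "model plus role" framing is just configuration. A hypothetical sketch, with made-up field names, of the two agents described above:

```python
# Same brain, different briefing: the agent is the model plus its wrapper.
security_reviewer = {
    "model": "claude",                                   # same model
    "system": "You are a careful security reviewer.",    # different role
    "tools": ["read_file"],                              # read-only hands
}

test_writer = {
    "model": "claude",
    "system": "You are a test writer; output pytest cases.",
    "tools": ["read_file", "write_file"],                # wider scope
}
```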
What is a skill?
A skill is a saved workflow. A small, focused set of instructions the agent pulls in when it sees a matching task — "when the user asks to publish a blog, do these steps."
If an agent is the running program, a skill is a function in its library. You don't teach the agent how to publish a blog every time. You write it down once, save it, and the agent reaches for it when it's needed.
A skill might be one paragraph: "When the user asks to publish a blog, read original.md, render markdown to HTML, update the sitemap, update the RSS feed, add an entry to the home page." That's a function. It just doesn't have parentheses.
Side by side, a Python function and a skill are the same shape:
```python
def publish_blog(slug):
    md = read(f"blog/{slug}/original.md")
    html = render_markdown(md)
    write(f"blog/{slug}/index.html", html)
    update_sitemap(slug)
    update_feed(slug)
    add_to_home(slug)
```
```
When the user asks to publish a blog:
1. Read original.md from the blog folder.
2. Render the markdown to HTML.
3. Write the HTML to index.html.
4. Update sitemap.xml.
5. Update feed.xml.
6. Add an entry to the home page.
```
Same logic, same control flow, same I/O — different language. The first runs on a Python interpreter; the second runs on an LLM. Both describe the same procedure. One uses parentheses, the other uses paragraphs.
This is how natural-language code starts to look like real software. You stop pasting the same long prompt over and over. You name a thing, save it, reuse it. Same instinct that turned shell one-liners into shell scripts.
Here's the loop that closes the system: you don't even have to write skills by hand. You tell Claude what you want — "I want a skill that publishes a blog from a markdown file" — and Claude writes the skill markdown for you, structured in a way Claude itself will follow more reliably than your original description. LLMs writing instructions for LLMs. The system writes its own programs.
Agents vs skills
Short version:
- Agent — the runner. Holds the tools, calls the model, keeps going until the job is done.
- Skill — the routine. A reusable "here's how to do this kind of task" the agent can grab.
An agent without skills still works — it just figures out everything from scratch each time. A skill without an agent is just a markdown file. You need both for the system to feel like programming, not chatting.
Once you see it this way — agents as programs, skills as functions — the rest follows.
Under the hood: it's all text prediction
Here's what actually happens when you run Claude Code.
The model reads a big block of text — your CLAUDE.md, the conversation so far, the contents of any file that's been opened, the descriptions of every available tool. All of that is just text in its context window.
Then it predicts the next text. That's the only thing an LLM does — predict what comes next.
But the predicted text isn't always a reply to you. It might be:
- a tool call (e.g. "read this file", "run this command")
- a block of thinking — reasoning to itself before acting
- a plain message back to you
The harness — Claude Code itself — looks at the predicted text, sees "this is a tool call", actually executes it, and feeds the result back as more text in the next prediction. Then the model predicts again. Loop until the job is done.
That's the agent. Strip away the marketing and it's a text predictor in a loop, with a harness that turns some of its predictions into real actions.
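That predict-inspect-execute cycle can be sketched directly. The JSON tool-call format below is an assumption for illustration, not Claude Code's actual wire format:

```python
import json

def dispatch(predicted_text, tools):
    # The model's output is just text; the harness decides what it means.
    try:
        call = json.loads(predicted_text)
    except json.JSONDecodeError:
        return ("message", predicted_text)        # plain reply to the user
    if isinstance(call, dict) and "tool" in call:
        result = tools[call["tool"]](call["args"])  # actually execute it
        return ("tool_result", result)            # fed back as more context text
    return ("message", predicted_text)

# A toy tool table; a real harness would read files, run commands, etc.
tools = {"read_file": lambda path: f"<contents of {path}>"}
```

The harness never "understands" anything; it pattern-matches the prediction, runs the side effects, and appends the result to the text the model sees next.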
This also explains why skills work. When you write a skill, you're not programming a robot — you're writing text that, when included in the model's context, biases its next prediction toward the steps you want. CLAUDE.md is the same. A user prompt is the same. Everything is just text going in and text coming out.
We program a programmer
There's a recursion buried in all this that's worth naming. When you write a skill, tune a system prompt, or set up an agent loop, you're not programming a machine in the old sense. You're programming a programmer.
The thing you're configuring (the LLM) is the thing that produces the actual code. You're one layer up. You're not the author of the code; you're the author of the thing that authors the code.
This isn't entirely unprecedented. Programs have been writing programs since the 60s — compilers, code generators, macros. Code that produces code is older than most of us.
What's new is the meta-language. For the first time, the layer above the code is plain English instead of more code. That's the actual shift. Recursion isn't new; doing it in a non-formal language is.
Which means the bottleneck moves. It's no longer your typing speed or your knowledge of which library handles dates correctly. It's how well you can specify intent. That's the skill that compounds in this paradigm. Everything else is downstream.
The model is replaceable
Claude, Codex, Gemini — they all work the same way. A model in a loop, with tools, predicting text. The teams behind them train differently — different internal formats, different strategies, different data — so the behaviour differs. But the shape doesn't.
What that means: a good harness is mostly model-agnostic. The skill of designing prompts, building verification loops, structuring tools, writing skills — that doesn't get thrown out when the next model lands. It transfers. A better model just makes the same harness produce better results.
It's like Python and C. A good engineer can build the same thing in either language — Python just makes everything faster. A better model is the same: it doesn't replace the engineer, it just makes them faster.
This is why "which model is best?" is the wrong question. What matters more is the harness around the model — Claude Code, Cursor, the agent loops you write yourself. Give two engineers the same model and you'll get wildly different results, just based on how well each one set up the work.
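One way to see why harness skill transfers: treat the model as a callable the harness was handed. This toy sketch swaps two made-up models behind the same verification-and-retry loop:

```python
def harness(model, prompt):
    # The harness owns the loop; the model is just a parameter.
    draft = model(prompt)
    if "TODO" in draft:                      # simple verification check
        draft = model(prompt + "\nFinish the TODOs.")  # one retry
    return draft

def model_a(prompt):
    # A weaker model: leaves a TODO unless nudged.
    return "done" if "Finish" in prompt else "TODO: write it"

def model_b(prompt):
    # A stronger model: gets it right the first time.
    return "fn body done"
```

Swap `model_a` for `model_b` and the harness code doesn't change; the better model just needs fewer passes through the same loop.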
Look at OpenClaw — a self-hosted personal AI assistant with channel adapters (WhatsApp, Slack, Discord, iMessage), a skills runtime, and an onboarding wizard, all built around a model you bring yourself. Pure harness. No training. The README puts it bluntly: "the Gateway is just the control plane — the product is the assistant." That's a real product competing with closed AI assistants on harness alone — because that's the part anyone can build.
Bet on the harness, not the model. Models get replaced. Harness skill keeps growing.
Just another abstraction layer
Although today's LLMs are not 100% accurate, with proper harness engineering — tools, verification, retries, agent orchestration — we are still essentially programming. It has just become natural language programming.
A workflow is a program. A program is a workflow. We've just stopped writing them in Python.
This is like how machine code evolved into assembly code, assembly code evolved into C, and C into Python. Take the same operation — adding two numbers — at each layer:
```
; machine code (x86-64)
B8 05 00 00 00
83 C0 03

; assembly
mov eax, 5
add eax, 3
```

```c
// C
int x = 5 + 3;
```

```python
# Python
x = 5 + 3
```

```
# Natural language
"Add 5 and 3."
```
At each step, the syntax shrinks and the intent gets louder. With each layer, humans can focus on describing what they want instead of on low-level syntax (not that the lower layers stop mattering: someone still has to write the compiler, and someone still has to write the harness).
No layer ever killed the one below it; each just changed what most people spend their day on. I think natural language is the next one.
Anyone can build now
And this changes who gets to build. Not just professional engineers shipping faster — anyone with an idea and the patience to describe it. A personal website. A small tool that saves ten minutes a day. A weird little game.
The barrier was never having ideas. The barrier was syntax. And syntax just got abstracted away.
Plot twist: this whole site was built by Claude. Every paragraph above. The HTML. The CSS. The RSS feed. The publishing skill itself. I described what I wanted in plain English. Claude wrote it.
This post was written the same way. That's the proof.