Building Your First Agentic AI Workflow: A Step-by-Step Guide

For the last couple of years, "using AI" mostly meant one thing for most of us: type a prompt, get back some text. Genuinely useful, but passive. The model waits for you, answers once, and forgets the whole thing the moment it's done.

Agentic AI changes the shape of that. Instead of a model that answers a question, you build a system that pursues a goal: one that decides what to do next, uses tools, checks its own work, and keeps going until the task is actually finished. That jump, from "answer me" to "handle this for me," is why so many people are calling 2026 the year of the agent.

This guide walks you through building your first agentic workflow from scratch. We'll get clear on what "agentic" actually means (without the buzzword fog), the loop that makes every agent tick, the building blocks you'll assemble, the handful of patterns that cover most real needs, and a proper step-by-step build with working code. At the end, we'll look at two agents running in production right now — one that worked beautifully, and one that taught the whole industry an expensive lesson.

Let's get into it.

First — what does "agentic" even mean?

Cut the hype and an AI agent is easy to define: it's a language model running in a loop, with access to tools, working toward a goal you handed it.

That's the whole thing. The magic isn't some new species of model — it's the loop and the tools wrapped around a model you already know.

It helps to compare three things you could build with the exact same LLM underneath:

A chatbot answers one message at a time. You ask, it replies, conversation over.
A workflow runs the model through a fixed set of steps that you designed. Predictable, like following a recipe.
An agent decides the steps itself. You give it a goal and a toolbox, and it works out the sequence on its own.

Anthropic's widely read guide on the subject, "Building Effective Agents," describes the foundational unit as the "augmented LLM," a model boosted with three things: tools (so it can act), retrieval (so it can look things up), and memory (so it can remember what happened). A plain model can only talk. An augmented one can search, call an API, read a file, and carry what it learned forward. Everything in this guide is built on that one idea.

Workflows vs. agents: the distinction that saves you money

Here's the single most important call you'll make, and most people new to this get it backwards. They reach for a fully autonomous agent when a plain workflow would have done the job cheaper, faster, and far more reliably.

The difference in plain terms:

A workflow is when you decide the path. The model still does the thinking at each step, but the steps are wired together in code. Step one always leads to step two.
An agent is when the model decides the path. It reads the situation and dynamically chooses what to do next, in what order, and when to stop.

Workflows are predictable, testable, and cheap. Agents are flexible and handle situations you never anticipated — but they're harder to debug and can quietly run up a big token bill if they start wandering.

Nearly everyone who has actually shipped these gives the same advice: start with a workflow, and only graduate to a full agent when the task truly needs to make decisions you can't script ahead of time. A good gut check: if you can draw the flowchart yourself, you probably don't need an agent yet.

How an agent actually thinks: the core loop

Every agent, under every framework, runs some version of the same loop. It's usually called the ReAct pattern — short for "Reason and Act" — and once you see it, it shows up everywhere:

Observe — the agent looks at the goal and everything it knows so far.
Reason — it decides what to do next ("I need to look up this customer's order").
Act — it calls a tool: queries a database, hits an API, runs some code.
See the result — the tool returns something, and that becomes new information.
Repeat — it loops back, now slightly better informed, until the goal is met.

That loop is the entire engine. A weather agent might go around twice (find location, then get the forecast). A deep-research agent might loop forty times. The loop is identical; only the number of turns and the choice of tools change.
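To see the loop with no API in the way, here is a minimal offline sketch of the weather example. Everything in it is a stand-in: `find_location` and `get_forecast` are hypothetical tools that return hard-coded values, and `scripted_reasoner` plays the part of the model's Reason step with fixed rules. In a real agent, that function would be an LLM call.

```python
from typing import Callable

# Hypothetical tools with hard-coded answers (no real location or weather API).
def find_location(user: str) -> str:
    return "Berlin"

def get_forecast(city: str) -> str:
    return f"{city}: 18 degrees C, light rain"

TOOLS: dict[str, Callable[[str], str]] = {
    "find_location": find_location,
    "get_forecast": get_forecast,
}

def scripted_reasoner(observations: list[str]) -> tuple[str, str]:
    """Stand-in for the model's Reason step: returns (action, argument)."""
    if len(observations) == 1:          # only the goal so far: need a location
        return "find_location", "current user"
    if len(observations) == 2:          # location known: need the forecast
        return "get_forecast", observations[-1]
    return "finish", observations[-1]   # enough information: answer

def react_loop(goal: str, max_turns: int = 10) -> str:
    observations = [goal]                              # Observe
    for _ in range(max_turns):
        action, arg = scripted_reasoner(observations)  # Reason
        if action == "finish":
            return arg
        result = TOOLS[action](arg)                    # Act
        observations.append(result)                    # See the result, then Repeat
    return "Stopped: turn limit reached."

print(react_loop("What's the weather where I am?"))  # Berlin: 18 degrees C, light rain
```

The agent goes around twice (location, then forecast) and stops on its own. Swap the scripted reasoner for a model call and the shape of the loop does not change.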

The building blocks you'll actually assemble

An agent is made of five parts. Get these right and the rest is detail.

1. The model — the brain. The LLM doing the reasoning and deciding which tool to reach for. Bigger models reason better but cost more and run slower. A smart, common move is to mix them: a small fast model for routine steps, a stronger one for the hard reasoning.

2. Tools — the hands. Tools are how the agent touches the real world. A tool is just a function you expose to the model: "search the web," "look up an order," "send an email," "run this SQL." You describe what each one does and what inputs it needs, and the model decides when to call it. Good tools are narrow, clearly named, and well described — a vague tool is a tool the model will misuse.

3. Memory. Short-term memory is the conversation so far — everything sitting in the model's context window right now. Long-term memory is what survives across sessions: past interactions, learned facts, retrieved documents. Without memory, your agent is an amnesiac solving the same problem from scratch every single time.

4. Orchestration — the loop. The code that actually runs the observe-reason-act cycle: feeding the model, catching its tool calls, running them, feeding the results back, and deciding when to stop. This is the part frameworks exist to handle for you.

5. Guardrails. The seatbelts. A cap on how many times it can loop, checks on what it's allowed to do, and a human checkpoint before anything irreversible — spending money, deleting data, sending a message. Skip these and you'll find out why they matter the hard way. (Klarna did. More on that below.)
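Of the five parts, tools are the easiest to get subtly wrong, because the description the model reads and the function your code runs live in two places. A minimal sketch, assuming you want one source of truth: a hypothetical `@tool` decorator that records each function's schema (in the same `name` / `description` / `input_schema` shape used in Step 3) next to the function itself. `lookup_order` and its data are illustrative only.

```python
from typing import Callable

TOOL_SCHEMAS: list[dict] = []                   # what the model is shown
TOOL_FUNCS: dict[str, Callable[..., str]] = {}  # what your code actually runs

def tool(description: str, params: dict[str, str]):
    """Register a function as a tool. `params` maps each required string argument to its description."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOL_SCHEMAS.append({
            "name": fn.__name__,
            "description": description,
            "input_schema": {
                "type": "object",
                "properties": {
                    name: {"type": "string", "description": desc}
                    for name, desc in params.items()
                },
                "required": list(params),
            },
        })
        TOOL_FUNCS[fn.__name__] = fn
        return fn
    return register

@tool(
    "Look up one order by its ID and return its shipping status. "
    "Use only when the user has given an order ID.",
    {"order_id": "The order ID, for example 'A1234'"},
)
def lookup_order(order_id: str) -> str:
    orders = {"A1234": "shipped"}  # hypothetical data standing in for a real database
    return orders.get(order_id, "not found")
```

Because the tool name comes from the function itself, the schema the model sees cannot drift out of sync with the function your code calls.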

The five patterns worth knowing

Before building anything complicated, it pays to know the small set of patterns that cover most real-world needs. Anthropic catalogued five that have become the shared vocabulary for this work. In plain language:

1. Prompt chaining. Break a task into a fixed sequence where each step's output feeds the next. Write a draft, improve the draft, then translate it. Simple and reliable when you know the steps up front.

2. Routing. Classify the input first, then send it down the right path. A billing question goes to the billing prompt; a technical one goes to the technical prompt. Each prompt stays focused instead of trying to do everything at once.

3. Parallelization. Run independent subtasks at the same time and combine the results. Either split the work into chunks (five agents each review one section) or run the same task several times and take a vote on the best answer.

4. Orchestrator-workers. A lead agent breaks the task into pieces on the fly and hands each to a worker. This is how coding agents handle a bug spread across many files — the orchestrator doesn't know the subtasks in advance, it discovers them as it goes.

5. Evaluator-optimizer. One agent produces the work, another critiques it, and the first revises. The loop repeats until the critic is happy. Perfect when there's a clear quality bar, like code that has to pass a test suite.

Notice these run from "barely an agent" (prompt chaining) to "genuinely autonomous" (orchestrator-workers). Reach for the simplest one that solves your problem.
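To make the five patterns concrete, here is each one reduced to a few lines of Python. This is a sketch, not a library: `fake_llm` is a hypothetical stand-in that just tags its input so you can watch data flow through, and in real use each call to it would be a model request. The keyword router, and the workers and critics you pass in, are likewise illustrative assumptions.

```python
from collections import Counter, deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

def fake_llm(instruction: str, text: str) -> str:
    """Hypothetical stand-in for a model call: tags the text with the instruction."""
    return f"[{instruction}] {text}"

# 1. Prompt chaining: a fixed sequence; each output feeds the next step.
def chain(text: str, steps: list[str]) -> str:
    for step in steps:
        text = fake_llm(step, text)
    return text

# 2. Routing: classify first, then hand off to a focused prompt.
def route(message: str) -> str:
    billing_words = ("refund", "invoice", "charge")  # a real router would ask a small model
    return "billing" if any(w in message.lower() for w in billing_words) else "technical"

# 3. Parallelization (voting): run the same task several times at once, keep the majority.
def vote(task: str, attempt: Callable[[str, int], str], n: int = 5) -> str:
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(lambda i: attempt(task, i), range(n)))
    return Counter(answers).most_common(1)[0][0]

# 4. Orchestrator-workers: subtasks are discovered while working, not planned up front.
def orchestrate(first_subtasks: list[str],
                work: Callable[[str], tuple[str, list[str]]],
                limit: int = 20) -> list[str]:
    queue, done, results = deque(first_subtasks), set(), []
    while queue and len(results) < limit:
        sub = queue.popleft()
        if sub in done:
            continue
        done.add(sub)
        result, discovered = work(sub)  # a worker may surface new subtasks
        results.append(result)
        queue.extend(discovered)
    return results

# 5. Evaluator-optimizer: generate, critique, revise until the critic has no feedback.
def refine(generate: Callable[[Optional[str]], str],
           critique: Callable[[str], Optional[str]],
           max_rounds: int = 5) -> str:
    feedback: Optional[str] = None
    draft = ""
    for _ in range(max_rounds):
        draft = generate(feedback)
        feedback = critique(draft)
        if feedback is None:  # the critic is happy
            break
    return draft

print(chain("agents are loops", ["Draft", "Improve", "Translate"]))
# [Translate] [Improve] [Draft] agents are loops
```

Note how the control flow moves: in `chain` and `route` your code owns every branch, while in `orchestrate` and `refine` the number of iterations depends on what the workers and the critic return.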

Step-by-step: building your first agent

Enough theory. Let's build a real one — a small research assistant that takes a question, searches for information, and writes a grounded answer. It's the "hello world" of agents because it needs exactly what every agent needs: a tool, a loop, and a way to know when to stop.

Step 1 — Pick one narrow, real task

The biggest rookie mistake is building an agent that tries to do everything. Don't. Pick a single task with a clear finish line. Ours: answer a factual question using web search, and back it up with what you found. Narrow tasks are easier to build, easier to test, and much easier to trust.

Step 2 — Choose your model

For a first agent, use a solid mid-tier model — capable enough to reason about when to use a tool, cheap enough that a runaway loop won't sting. You can always route the genuinely hard steps to a stronger model later.
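If you do later route the hard steps to a stronger model, the switch can be a single function. A minimal sketch: the model IDs and the set of "hard" step kinds below are hypothetical placeholders, to be replaced with your provider's real model names and your own breakdown of the task.

```python
# Hypothetical model IDs; substitute real ones from your provider.
FAST_MODEL = "your-small-fast-model"
STRONG_MODEL = "your-stronger-model"

# Steps that need real reasoning; everything else is treated as routine.
HARD_STEPS = {"plan", "resolve_conflict", "write_final_answer"}

def pick_model(step_kind: str) -> str:
    """Return the model ID to use for a given kind of step."""
    return STRONG_MODEL if step_kind in HARD_STEPS else FAST_MODEL

print(pick_model("extract_dates"))  # your-small-fast-model
```

In the Step 4 loop you would pass `model=pick_model(...)` instead of a fixed string.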

Step 3 — Define your tools

Our agent needs exactly one tool: web search. You describe it to the model with a name, a description, and the inputs it expects. Think of that description as the tool's instruction manual — the model reads it to decide when and how to use it.

```python
tools = [
    {
        "name": "web_search",
        "description": "Search the web for current information. Use this whenever you need facts you don't already know for certain.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query to run",
                }
            },
            "required": ["query"],
        },
    }
]
```

That `description` field is doing more work than it looks. Along with the tool's name and its parameter descriptions, that one plain-English sentence is all the model has to go on when deciding whether this tool fits the moment. Write it like you're explaining the tool to a new teammate: what it does, and when to reach for it.
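The loop in the next step calls a helper named `run_tool` to actually execute whatever the model asks for. One minimal way to write it, assuming a plain name-to-function table: `web_search` here is a hypothetical placeholder that you would replace with a call to whichever search API you use.

```python
def web_search(query: str) -> str:
    """Hypothetical placeholder: replace the body with a call to a real search API."""
    return f"(no search backend configured; would have searched for {query!r})"

# Maps the tool names the model sees to the Python functions that implement them.
TOOL_HANDLERS = {"web_search": web_search}

def run_tool(name: str, tool_input: dict) -> str:
    """Execute one tool call requested by the model and return its output as text."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f"Error: unknown tool {name!r}"
    try:
        return str(handler(**tool_input))
    except Exception as exc:
        # Report the failure to the model instead of crashing the loop,
        # so it can retry with different input or take another route.
        return f"Error running {name}: {exc}"
```

Returning errors as text, rather than raising, keeps one bad tool call from killing the whole run.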

Step 4 — Write the loop

This is the heart of the whole thing. The loop sends the goal to the model, checks whether the model wants to use a tool, runs the tool if so, feeds the result back, and repeats until the model returns a final answer instead of another tool request.

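```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Assumes `tools` from Step 3, plus a `run_tool(name, tool_input)` function you
# write yourself that executes the named tool and returns its output as a string.

def run_agent(question, max_steps=10):
    # The running conversation: the agent's short-term memory.
    messages = [{"role": "user", "content": question}]

    for _ in range(max_steps):  # guardrail: a fixed budget of turns
        response = client.messages.create(
            model="claude-sonnet-5",  # use any tool-capable model ID your account can access
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )

        # Keep the model's full reply (text and tool requests) in the history.
        messages.append({"role": "assistant", "content": response.content})

        # No tool requested: the model has produced its final answer.
        if response.stop_reason != "tool_use":
            return "".join(block.text for block in response.content if block.type == "text")

        # Run every tool the model asked for and collect the results.
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                output = run_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,  # ties this result to its request
                    "content": output,
                })

        # Tool results go back to the model as the next user message.
        messages.append({"role": "user", "content": tool_results})

    return "Stopped: reached the step limit without a final answer."
```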

Here is what each part is doing, walked through slowly:

We keep a running list called `messages` — the full conversation the model sees on every turn. This is the agent's short-term memory.
The `for` loop with `max_steps` is our first guardrail. Instead of looping forever, the agent gets a fixed budget of turns.
Each turn, we call the model with the history so far plus the tools it's allowed to use.
We append whatever the model said back onto the history, so nothing gets lost between turns.
Then we check `stop_reason`. If it is not `tool_use`, the model is done thinking and has produced a final answer — so we return it and exit.
If it is `tool_use`, the model is asking to call one or more tools. We run each one, wrap the output in a `tool_result` that points back to the request via `tool_use_id`, and hand those results back as the next message.
Then we loop again. The model now sees the tool results and decides its next move.

That is the entire engine. Frameworks dress this up with nicer syntax, but somewhere underneath, every one of them is running this exact cycle.

Step 5 — Add memory

Right now the agent only remembers within a single run. To make it remember across sessions — a returning user, a long-running project — you store the important bits somewhere persistent (a database, or a vector store for semantic recall) and load the relevant pieces back into `messages` at the start of the next run. The rule of thumb: don't cram everything back in. Retrieve only what's relevant to the current task, or you'll blow through your context window and your budget at the same time.
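A minimal sketch of that idea, assuming a plain JSON file as the persistent store and simple word overlap as the relevance test. Both are stand-ins: a real system would more likely use a database, or a vector store with embeddings for semantic recall. `LongTermMemory` and `build_messages` are hypothetical names.

```python
import json
import re
from pathlib import Path

def _words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

class LongTermMemory:
    """Notes that survive between runs, stored as a JSON list in a file."""

    def __init__(self, path: str):
        self.path = Path(path)
        self.notes: list[str] = (
            json.loads(self.path.read_text()) if self.path.exists() else []
        )

    def remember(self, note: str) -> None:
        self.notes.append(note)
        self.path.write_text(json.dumps(self.notes))

    def recall(self, query: str, k: int = 3) -> list[str]:
        """Return up to k notes that share the most words with the query."""
        q = _words(query)
        scored = [(len(q & _words(n)), n) for n in self.notes]
        relevant = [pair for pair in scored if pair[0] > 0]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [note for _, note in relevant[:k]]

def build_messages(question: str, memory: LongTermMemory) -> list[dict]:
    """Seed a new run with only the relevant notes, not the whole history."""
    notes = memory.recall(question)
    if not notes:
        return [{"role": "user", "content": question}]
    context = "\n".join(f"- {n}" for n in notes)
    return [{"role": "user",
             "content": f"Notes from earlier sessions:\n{context}\n\n{question}"}]
```

The `k` cap in `recall` is the "don't cram everything back in" rule in code form: however many notes pile up, only a few relevant ones reach the context window.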

Step 6 — Add guardrails and a human in the loop

You already met the first guardrail: the step cap in the loop. The second one matters even more — a confirmation step before anything irreversible. If a tool spends money, deletes data, or sends a message on someone's behalf, pause and get a human "yes" first. That single line of judgment is the difference between a helpful agent and an expensive incident.
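One way to wire that confirmation into the loop is to wrap the tool dispatcher. A minimal sketch, assuming hypothetical tool names for the irreversible actions and a terminal prompt as the "human"; in a real product the confirmation would more likely be a button in your UI or an approval queue.

```python
from typing import Callable

# Hypothetical names of tools whose effects can't be undone.
IRREVERSIBLE_TOOLS = {"send_email", "issue_refund", "delete_record"}

def ask_human(message: str) -> bool:
    """Default confirmation: ask on the terminal and treat anything but 'y' as no."""
    return input(f"{message} [y/N] ").strip().lower() == "y"

def guarded_run_tool(name: str, tool_input: dict,
                     run_tool: Callable[[str, dict], str],
                     confirm: Callable[[str], bool] = ask_human) -> str:
    """Run safe tools directly; pause for a human 'yes' before irreversible ones."""
    if name in IRREVERSIBLE_TOOLS:
        if not confirm(f"Agent wants to call {name} with {tool_input}. Allow?"):
            # Tell the model it was refused, so it can pick another path
            # instead of retrying the same call.
            return f"Refused by a human reviewer: {name} was not run."
    return run_tool(name, tool_input)
```

In the Step 4 loop, you would call `guarded_run_tool(block.name, block.input, run_tool)` in place of `run_tool(block.name, block.input)`.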

Step 7 — Observe and evaluate

You can't fix what you can't see. Log every step — every reason, every tool call, every result — so when the agent does something strange (and it will), you can trace exactly where it went sideways. Then build a small set of test cases with known-good answers and re-run them whenever you change a prompt or a tool. Agents fail in subtle ways; evaluation is how you catch a regression before your users do.
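A minimal sketch of both halves using only the standard library: a `log_step` helper to call from inside the loop, and an `evaluate` function that replays a small test set against any `question -> answer` function, such as `run_agent` from Step 4. The test cases and the "answer must contain this string" check are illustrative assumptions; real evaluations often need fuzzier grading.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(message)s")
log = logging.getLogger("agent")

def log_step(step: int, kind: str, detail: str) -> None:
    """Record one reason, tool call, or tool result (truncated to keep logs readable)."""
    log.info("step=%d kind=%s detail=%s", step, kind, detail[:200])

# Hypothetical regression set: each question paired with text a good answer must contain.
TEST_CASES = [
    ("What is the capital of France?", "Paris"),
    ("How many days are in a leap year?", "366"),
]

def evaluate(agent: Callable[[str], str],
             cases: list[tuple[str, str]] = TEST_CASES) -> float:
    """Run every case through the agent and return the pass rate, printing failures."""
    passed = 0
    for question, expected in cases:
        answer = agent(question)
        if expected.lower() in answer.lower():
            passed += 1
        else:
            print(f"FAIL: {question!r} -> {answer[:80]!r}")
    return passed / len(cases)
```

Run `evaluate` before and after every prompt or tool change; a drop in the pass rate is your regression alarm.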

Do you actually need a framework?

Short answer: not to start. Anthropic's own guidance is to begin with the model's API directly, because many of these patterns are only a few lines of code — and frameworks can hide the prompts and logic in ways that make debugging harder. Build the loop yourself once. You'll understand every agent you touch afterward.

That said, once you're coordinating multiple agents, juggling complex state, or shipping to production, a framework saves real work. The 2026 landscape, in brief:

LangGraph — the production default for complex, stateful workflows. Models your agent as a graph of nodes and edges, with built-in checkpointing and human-in-the-loop steps. Powers agents at companies like Klarna, Uber, and LinkedIn. Reach for it when you need control and auditability.
CrewAI — the fastest way to a working multi-agent prototype. You define a "crew" of role-based agents (researcher, writer, reviewer) and it handles the coordination. Roughly twenty lines to something running.
OpenAI Agents SDK — low-friction if you're already in the OpenAI ecosystem. Built around explicit "handoffs" between agents. Designed first for OpenAI's models, though it can also be pointed at other providers.
Claude Agent SDK — Anthropic's own, the same architecture behind Claude Code. Strong on tool use, subagents, and native MCP support.
Microsoft Agent Framework — the merged successor to AutoGen and Semantic Kernel, the obvious pick for .NET and Azure teams.

Two protocols are worth knowing, because they're quietly becoming the plumbing of this whole space. MCP (the Model Context Protocol) is a standard way to connect agents to tools and data — write a tool once, and any MCP-aware agent can use it. A2A (Agent-to-Agent) is a standard for agents to talk to each other, even across different frameworks. Between them, they're turning a pile of incompatible agents into something that can actually interoperate.

The honest advice: match the framework to your real constraint, not to GitHub stars. Building a quick multi-agent demo? CrewAI. Need auditability and human approvals in a regulated setting? LangGraph. Already all-in on one model provider? Their SDK. And if a plain loop and an API call solve your problem — which is more often than you'd expect — you don't need any of them yet.

Best practices, learned the expensive way

Start simple. Workflow before agent, one tool before ten, one model before a swarm. Add complexity only when the task forces you to.
Scope it tightly. An agent that does one thing well beats one that does five things unreliably.
Make tools boringly clear. Narrow, well-named, well-described. Most agent failures trace back to a tool the model misunderstood.
Keep a human near anything irreversible. Money, data, messages to real people — get a confirmation.
Watch cost and latency. Every loop is another model call. Cap the loops, use cheaper models where you can, and measure both.
Observe everything, then evaluate. Log every step and keep a test set. This is the difference between an agent you hope works and one you know works.
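For the "measure both" part, a small budget object can sit beside the step cap. A minimal sketch: the limits are arbitrary examples, and `track` assumes an Anthropic-style response object exposing `usage.input_tokens` and `usage.output_tokens`.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class RunBudget:
    """Per-run limits on model calls and tokens, plus total latency for reporting."""
    max_calls: int = 10
    max_tokens: int = 50_000
    calls: int = 0
    tokens: int = 0
    seconds: float = 0.0

    def track(self, call: Callable[[], Any]) -> Any:
        """Run one model call, timing it and adding its token usage to the totals."""
        start = time.perf_counter()
        response = call()
        self.seconds += time.perf_counter() - start
        self.calls += 1
        # Assumes an Anthropic-style response with usage.input_tokens / usage.output_tokens.
        self.tokens += response.usage.input_tokens + response.usage.output_tokens
        return response

    def exhausted(self) -> bool:
        return self.calls >= self.max_calls or self.tokens >= self.max_tokens
```

In the Step 4 loop you might write `response = budget.track(lambda: client.messages.create(...))`, stop early when `budget.exhausted()` is true, and log `budget.tokens` and `budget.seconds` at the end of every run.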

Two agents from the real world

1. Klarna's support agent — and the lesson underneath it

The fintech company Klarna built one of the most-cited agents in the business: an AI customer support assistant, launched in early 2024, that within its first month was handling around 2.3 million conversations — roughly two-thirds of the company's support chats. It cut average resolution time from about eleven minutes to under two, worked across dozens of markets and languages, and by 2025 was credited with tens of millions of dollars in savings.

And then came the part everyone should study. Through 2025, Klarna walked it back, rehiring human agents and shifting to a hybrid model. The volume numbers had looked fantastic, but on complex cases (disputes, fraud, hardship situations), quality had slipped, and customer satisfaction slipped with it. The company's own CEO admitted the cost-first push had gone too far.

The takeaway isn't "AI support doesn't work" — it clearly does, at enormous scale, for the high-volume tier. The takeaway is about scope and guardrails. Agents are brilliant at the routine and shaky on the exceptional, and the teams that win decide up front which conversations belong to the agent and which belong to a human. Klarna paid a lot to learn the lesson that Step 6 above tries to teach for free.

2. AI coding agents — the orchestrator-workers pattern in the wild

The clearest agentic success story is the one a lot of developers now lean on daily: AI coding agents like Claude Code and Cursor. Hand one a task — "fix this failing test," "add OAuth login" — and it plans the work, reads the relevant files, edits several of them, runs the tests, and iterates until things pass. Under the hood, that's the orchestrator-workers pattern: a lead loop breaks the task into pieces it discovers as it goes, sometimes spinning up subagents to explore parts of the codebase in parallel.

The results are measurable. A 2025 McKinsey study across large engineering teams found AI coding agents lifted developer output by roughly 40%, with the biggest gains on the repetitive middle of the job — boilerplate, tests, documentation, code review — rather than the net-new architecture that senior engineers still own. It's the honest version of the agent promise: not replacing the engineer, but handing them a tireless junior who never gets bored of the grunt work.

The bottom line

Agentic AI isn't a smarter chatbot — it's a shift from software that answers to software that acts. And the barrier to building it is far lower than the hype suggests. Strip it down and you have a model, a few well-described tools, a loop, and some guardrails. Everything else is refinement.

The momentum is real: the AI agent market was worth around $7.8 billion in 2025 and is projected to pass $50 billion by 2030, and Gartner expects 40% of enterprise applications to ship with task-specific agents by the end of 2026, up from under 5% a year earlier. But you don't need enterprise scale to start. You need one narrow task and the loop from Step 4.

So build the little research agent. Watch it reason, call a tool, read the result, and answer. Once that click happens — once you've seen the loop run with your own eyes — every agent, framework, and breathless headline afterward will suddenly make sense. Start simple, keep a human in the loop, and let the thing surprise you.

If you want the deeper dives — full framework comparisons, code for the more advanced patterns, and the build logs — they're on the blog at blog.lakshjain.com. And if you're building something agentic and want to compare notes, come find me on Instagram at @techwithlaksh.