Intro: The Shift from Prompts to Agents
For the past few years, our relationship with Large Language Models (LLMs) has been largely transactional: we write a prompt, and the model predicts a response. If the output is incomplete, incorrect, or requires external data, we must manually copy-paste the output, fetch the missing information, refine our prompt, and try again. As developers, we act as the manual glue connecting static reasoning engines to the real world.
But a paradigm shift is underway. We are moving from basic prompting to autonomous agentic systems.
An AI Agent is a software system designed to think, plan, use tools, observe outcomes, and iterate autonomously until it achieves a specific goal.
If you are looking to build in this space, terms like Agents, Tools, MCP, ADKs, and SDKs can quickly become a confusing alphabet soup. In this 3-part series, we will demystify these concepts from the ground up.
Let’s begin with the foundations: What is an agent, how does it use tools under the hood, and how does it “think” programmatically?
1. The Isolated Reasoner (The Brain in a Jar)
To understand how an agent works programmatically, we must first look at what an LLM actually is: an autoregressive next-token predictor. It has no active memory, no direct connection to the operating system, and no built-in ability to run code.
Think of a standalone LLM as a highly advanced cerebral cortex suspended in isolation—a “brain in a jar.”
The brain possesses deep logical reasoning, vocabulary, and planning capabilities. However, because it is isolated, it has no sensory nerves to read your database and no motor nerves to move appendages that could edit a local file or ping a web API. It can only read a stream of text, predict the text that follows, and stop.
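To make the "brain in a jar" concrete, here is a deliberately tiny sketch of autoregressive generation. The "model" is a hypothetical lookup table standing in for a neural network (a real LLM scores its whole vocabulary using the entire context). The point is the shape of the loop: read the text so far, predict one more token, append it, and stop at an end token. Nothing in this loop can touch a file or a network.

```javascript
// Hypothetical toy "model": maps the last token to the most likely next token.
// A real LLM conditions on the entire context, not just the last token.
const toyModel = {
  "<start>": "The",
  The: "user",
  user: "is",
  is: "active",
  active: "<eos>",
};

// Greedy autoregressive generation: predict one token at a time, append it,
// and repeat until the end-of-sequence token (or a length limit) is reached.
function generate(model, maxTokens = 20) {
  const tokens = ["<start>"];
  for (let i = 0; i < maxTokens; i++) {
    const next = model[tokens[tokens.length - 1]] ?? "<eos>";
    if (next === "<eos>") break; // the only "action" available: stop
    tokens.push(next);
  }
  return tokens.slice(1).join(" ");
}

console.log(generate(toyModel)); // The user is active
```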
2. Exposing the Appendages (Tools as Arms and Senses)
If the LLM is the brain, then Tools are the sensory and motor appendages we attach to it.
- A tool is simply a native code function (written in JavaScript, Python, etc.) that executes an action in the real world.
- Exposing a readFile function gives the brain eyes to inspect a workspace.
- Exposing a writeDatabase or sendEmail API gives the brain hands to modify state.
How the Brain Learns about its Arms: The JSON Schema
An LLM doesn’t natively “know” what functions exist on your server. When you initialize an agent system, you must supply a Tool Definition alongside your prompt. It states the function’s name, a plain-language description of its purpose, and its arguments, with the arguments described in JSON Schema format:
{
"name": "fetch_user_details",
"description": "Queries the database for user profile details using their system ID.",
"parameters": {
"type": "object",
"properties": {
"userId": { "type": "string", "description": "The unique system identifier." }
},
"required": ["userId"]
}
}
The schema is not learned or added to the model’s vocabulary; it is sent as part of the model’s input context on each request. By reading it, the LLM maps out which “appendages” it has access to and how to call them during execution.
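On the host side, each schema is typically paired with the function that actually runs. The sketch below assumes a hypothetical in-memory registry keyed by tool name: the schema list is what you would send to the model, while the implementations never leave your server. The fake user table and the checkRequired helper are illustrative only; a production system would use a full JSON Schema validator.

```javascript
// Hypothetical in-memory data source standing in for a real database.
const fakeUsers = { usr_99: { name: "Alice", role: "Developer", active: true } };

// Registry: tool name -> { schema sent to the LLM, native implementation kept on the host }.
const toolRegistry = {
  fetch_user_details: {
    schema: {
      name: "fetch_user_details",
      description: "Queries the database for user profile details using their system ID.",
      parameters: {
        type: "object",
        properties: {
          userId: { type: "string", description: "The unique system identifier." },
        },
        required: ["userId"],
      },
    },
    run: async ({ userId }) => fakeUsers[userId] ?? null,
  },
};

// Only the schemas are exposed to the model; the code stays private to the host.
function schemasForModel(registry) {
  return Object.values(registry).map((tool) => tool.schema);
}

// Minimal check of the schema's "required" list (not a full JSON Schema validator).
function checkRequired(schema, args) {
  const missing = (schema.parameters.required ?? []).filter((key) => !(key in args));
  if (missing.length > 0) {
    throw new Error(`Missing required argument(s): ${missing.join(", ")}`);
  }
}
```

Keeping schema and implementation side by side means the description the model reads and the code that runs are less likely to drift apart.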
3. The Control Flow: Loop and Steps
How does a text engine actually “reach out” and trigger these appendages? It happens through a continuous, structured conversation cycle known as the ReAct (Reasoning + Acting) loop.
Think of this process in two layers:
- The Big Picture (The Macro Loop): The continuous circuit showing how data and instructions circulate between the LLM Brain, the Toolbox, and the Environment.
- The 4 Steps (The Micro Milestones): What actually happens at a code level during a single trip around that loop.
Layer 1: The Big Picture Map
Information travels around the loop in a continuous cycle: Brain (thinking) → Toolbox (requesting an action) → Environment (executing it and returning the result) → Brain (observing the result).
Layer 2: The Step-by-Step Breakdown
To understand exactly how control shifts dynamically between your code (the Host) and the AI (the Brain) at each point of the cycle, let’s walk through a single trip around the loop in 4 core steps.
Thought (Reasoning)
The Brain processes the user's query and compares it against the descriptions of its registered tools. It recognizes that it lacks the data needed to answer (the user's profile lives in a database it cannot see) and plans to contract a muscle: call a tool.
Action (The Intercept)
The LLM emits a structured Tool Call request (a JSON block) and stops generating. Control now passes back to your host application, which must act on the request.
{ "tool_call": "fetch_user_details", "args": { "userId": "usr_99" } }
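The JSON above is simplified. In OpenAI-style chat APIs (the format used by the message array later in this article), the tool call arrives inside an assistant message, and its arguments are a JSON-encoded string that the host must parse itself. A minimal sketch of that intercept step, assuming that message shape:

```javascript
// Extract tool calls from an assistant message in the OpenAI-style shape:
// { role: "assistant", tool_calls: [{ id, type: "function", function: { name, arguments } }] }
function parseToolCalls(assistantMessage) {
  const calls = assistantMessage.tool_calls ?? [];
  return calls.map((call) => {
    let args;
    try {
      // The model writes `arguments` as text, so it may be malformed JSON.
      args = JSON.parse(call.function.arguments || "{}");
    } catch {
      throw new Error(`Tool call ${call.id} has invalid JSON arguments`);
    }
    return { id: call.id, name: call.function.name, args };
  });
}

const reply = {
  role: "assistant",
  tool_calls: [
    {
      id: "call_t99",
      type: "function",
      function: { name: "fetch_user_details", arguments: "{\"userId\":\"usr_99\"}" },
    },
  ],
};

// One parsed call: id "call_t99", name "fetch_user_details", args { userId: "usr_99" }
console.log(parseToolCalls(reply));
```

An assistant message with plain content and no tool_calls yields an empty list, which is how the host knows the model has produced a final answer instead of an action.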
Execution
The Host application intercepts and parses the JSON instruction, calls your native database function with the supplied arguments, and fetches the profile data from the environment.
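// Arguments come from the model's tool call: bind them as query parameters rather than
// interpolating them into the SQL string (placeholder syntax varies by driver, e.g. $1 in node-postgres).
const data = await db.query("SELECT * FROM users WHERE id = $1", [args.userId]);
// Returns (simplified): { name: "Alice", role: "Developer", active: true }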
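Because the model only names a tool, the host decides what actually runs. Here is a hedged sketch of a dispatcher, assuming a hypothetical tools object that maps names to async functions. It refuses unknown tool names and turns thrown errors into an observation string, so the model can see what went wrong instead of the loop crashing.

```javascript
// Execute one parsed tool call against a map of trusted host functions.
// Always returns a string, because tool results go back to the model as text.
async function executeToolCall(tools, { name, args }) {
  // Accept only the map's own keys, so a model-supplied name such as "toString"
  // cannot reach inherited object methods.
  if (!Object.prototype.hasOwnProperty.call(tools, name)) {
    return JSON.stringify({ error: `Unknown tool: ${name}` });
  }
  try {
    const result = await tools[name](args);
    return JSON.stringify(result ?? null);
  } catch (err) {
    // Report the failure as an observation instead of crashing the loop.
    const message = err instanceof Error ? err.message : String(err);
    return JSON.stringify({ error: message });
  }
}

// Hypothetical tool backed by an in-memory table instead of a real database.
const tools = {
  fetch_user_details: async ({ userId }) => {
    const users = { usr_99: { name: "Alice", role: "Developer", active: true } };
    if (!(userId in users)) throw new Error(`No user with id ${userId}`);
    return users[userId];
  },
};

executeToolCall(tools, { name: "fetch_user_details", args: { userId: "usr_99" } })
  .then(console.log); // {"name":"Alice","role":"Developer","active":true}
```

Returning errors as data is a design choice: the model can read "No user with id …" and decide to ask the user for a different ID, rather than the host having to anticipate every failure.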
Observation (Feedback)
The Host app appends the raw query results to the message history and sends the whole history back to the LLM. The brain reads this feedback as new input tokens and synthesizes the final answer.
[System] + [User] + [Tool Call] + [Observation: Alice is Developer, active]
💬 Synthesized output: "Alice is an active Developer under ID usr_99."
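Putting the four steps together, the loop itself is short. The sketch below assumes a hypothetical callModel(messages) function that returns the next assistant message in the OpenAI-style shape; here it is a mock scripted for this one scenario, so the example runs without an API key. The maxSteps cap is an added safeguard (an assumption, not part of ReAct itself) against a model that keeps requesting tools forever.

```javascript
// Scripted stand-in for a real LLM call: it sees only the messages it is given.
async function callModel(messages) {
  const last = messages[messages.length - 1];
  if (last.role === "user") {
    // Thought + Action: request the tool instead of answering.
    return {
      role: "assistant",
      tool_calls: [{
        id: "call_t99",
        type: "function",
        function: { name: "fetch_user_details", arguments: JSON.stringify({ userId: "usr_99" }) },
      }],
    };
  }
  // After an observation: synthesize the final answer from the tool output.
  const user = JSON.parse(last.content);
  return { role: "assistant", content: `${user.name} is ${user.active ? "an active" : "an inactive"} ${user.role}.` };
}

// Hypothetical tool backed by an in-memory table instead of a real database.
const users = { usr_99: { name: "Alice", role: "Developer", active: true } };
const tools = {
  fetch_user_details: async ({ userId }) => users[userId] ?? null,
};

async function runAgent(messages, maxSteps = 5) {
  for (let step = 0; step < maxSteps; step++) {
    const reply = await callModel(messages); // the full history is sent every turn
    messages.push(reply);
    if (!reply.tool_calls?.length) return reply.content; // no tool requested: done
    for (const call of reply.tool_calls) {
      // Execution: the host, not the model, runs the requested function.
      const name = call.function.name;
      const args = JSON.parse(call.function.arguments);
      const result = Object.prototype.hasOwnProperty.call(tools, name)
        ? await tools[name](args)
        : { error: `Unknown tool: ${name}` };
      // Observation: append the result, linked to its request by tool_call_id.
      messages.push({ role: "tool", tool_call_id: call.id, content: JSON.stringify(result) });
    }
  }
  throw new Error(`No final answer after ${maxSteps} steps`);
}

const history = [
  { role: "system", content: "You are an agent with access to user database tools." },
  { role: "user", content: "Find the profile details of user 'usr_99'." },
];
runAgent(history).then((answer) => {
  console.log(answer); // Alice is an active Developer.
  console.log(history.length); // 5: system, user, tool call, observation, final answer
});
```

Swapping the mock for a real provider call changes only callModel; the loop, the dispatch, and the growing history stay the same.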
4. State Management: The Message History Array
A common misconception is that AI agents possess an active, running “consciousness” or background thread that stays alive between steps.
In reality, the model keeps no internal state between calls: every model call in the ReAct loop is stateless. The “memory” of an agent is represented entirely by an ordinary, append-only array of JSON messages that grows chronologically.
Here is the exact message history array for our database query, written in the OpenAI-style chat format (roles system, user, assistant, and tool):
const messageHistory = [
// 1. System Prompt (The core operational guidelines for the brain)
{
role: "system",
content: "You are an agent with access to user database tools. Always check database before answering."
},
// 2. User Stimulus (The initial input trigger)
{
role: "user",
content: "Find the profile details of user 'usr_99' and check if she has access."
},
// 3. Assistant Tool Call Request (Brain deciding to contract a muscle)
{
role: "assistant",
tool_calls: [
{
id: "call_t99",
type: "function",
function: { name: "fetch_user_details", arguments: "{\"userId\":\"usr_99\"}" }
}
]
},
// 4. Tool Observation Response (Sensory nerve sending tactile feedback back to brain)
{
role: "tool",
tool_call_id: "call_t99",
content: "{\"name\":\"Alice\",\"role\":\"Developer\",\"active\":true}"
},
// 5. Final Answer (Brain synthesizes the entire chronological context)
{
role: "assistant",
content: "User 'usr_99' belongs to Alice, who is a Developer. Her account is active, meaning she has access."
}
];
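Because this array is the agent's entire memory, it has to stay well-formed. In OpenAI-style APIs, a tool message must answer a tool call issued by an earlier assistant message, and every issued call needs an answer; otherwise the request is typically rejected. A simplified, hypothetical validator for that pairing invariant (it does not check the stricter rule that tool messages immediately follow their call):

```javascript
// Check that every tool result answers a tool call issued earlier in the history,
// and that no tool call is left without a result.
function findHistoryProblems(history) {
  const pending = new Set(); // tool call ids still awaiting a result
  const problems = [];
  history.forEach((msg, i) => {
    if (msg.role === "assistant") {
      for (const call of msg.tool_calls ?? []) pending.add(call.id);
    } else if (msg.role === "tool") {
      if (pending.has(msg.tool_call_id)) pending.delete(msg.tool_call_id);
      else problems.push(`Message ${i}: unexpected tool_call_id ${msg.tool_call_id}`);
    }
  });
  for (const id of pending) problems.push(`Tool call ${id} has no result`);
  return problems;
}
```

Applied to the messageHistory array above, it returns an empty array; remove the tool message and it reports that call_t99 has no result.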
Every single turn of the loop re-sends this entire historical array back to the LLM. The model reads the whole timeline, predicts the next action or final response, and the loop continues.
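One way to see this statelessness: each model call receives a prefix of the same array. The hypothetical helper below lists, for every assistant message in a history, the request that must have produced it (all earlier messages), with a serialized character count as a rough proxy for payload size. For the five-message array above, the tool call was produced from 2 messages and the final answer from 4.

```javascript
// For each assistant turn, the request that produced it was every message before it.
function requestSizes(history) {
  const sizes = [];
  history.forEach((msg, i) => {
    if (msg.role === "assistant") {
      const request = history.slice(0, i); // the full context re-sent for this turn
      // Characters are only a rough stand-in for tokens, which is what providers bill.
      sizes.push({ turn: i, messages: request.length, chars: JSON.stringify(request).length });
    }
  });
  return sizes;
}

const demo = [
  { role: "system", content: "You are an agent with access to user database tools." },
  { role: "user", content: "Find the profile details of user 'usr_99'." },
  { role: "assistant", tool_calls: [{ id: "call_t99", type: "function",
      function: { name: "fetch_user_details", arguments: "{\"userId\":\"usr_99\"}" } }] },
  { role: "tool", tool_call_id: "call_t99", content: "{\"name\":\"Alice\",\"role\":\"Developer\",\"active\":true}" },
  { role: "assistant", content: "Alice is an active Developer." },
];

console.log(requestSizes(demo).map((s) => s.messages)); // [ 2, 4 ]
```

Since every turn re-sends everything before it, the total input processed over a long agent run grows much faster than the history itself.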
The Next Challenge: The Integration Bottleneck
By linking an isolated reasoning engine (the brain) with exposed code functions (the arms) through a stateless context loop, we have created an agent capable of autonomous real-world actions.
However, as you build more complex systems, you will run into a major engineering bottleneck: integration scaling.
If you build three different agent clients (a CLI assistant, a web dashboard, and a Slack bot) and want each to connect to five custom data sources (GitHub, local files, a SQL database, Google Search, and Jira), you have to write custom tool integration, schema handling, and authentication wrapper code for all fifteen combinations.
Every new tool and client requires custom engineering.
How do we standardize this connection so that any agent client can connect to any data source instantly, without custom wrappers? We need a universal port: an “AI USB-C.”
In Part 2, we will dive into the Model Context Protocol (MCP), see how it solves this standardization challenge, and write a complete, hands-on MCP server to connect our agent directly to live data.
Stay tuned!