Sumeesh Nagisetty

Beyond the Prompt (Part 1): Demystifying AI Agents and Tools

Intro: The Shift from Prompts to Agents

For the past few years, our relationship with Large Language Models (LLMs) has been largely transactional: we write a prompt, and the model predicts a response. If the output is incomplete, incorrect, or requires external data, we must manually copy-paste the output, fetch the missing information, refine our prompt, and try again. As developers, we act as the manual glue connecting static reasoning engines to the real world.

But a paradigm shift is underway. We are moving from basic prompting to autonomous agentic systems.

An AI Agent is a software system designed to think, plan, use tools, observe outcomes, and iterate autonomously until it achieves a specific goal.

If you are looking to build in this space, terms like Agents, Tools, MCP, ADKs, and SDKs can quickly become a confusing alphabet soup. In this 3-part series, we will demystify these concepts from the ground up.

Let’s begin with the foundations: What is an agent, how does it use tools under the hood, and how does it “think” programmatically?


1. The Isolated Reasoner (The Brain in a Jar)

To understand how an agent works programmatically, we must first look at what an LLM actually is: an autoregressive next-token predictor. It has no active memory, no direct connection to the operating system, and no built-in ability to run code.

Think of a standalone LLM as a highly advanced cerebral cortex suspended in isolation—a “brain in a jar.”

[Figure: an input token stream enters the LLM reasoner, the "brain in a jar", which outputs next-token predictions.]

The brain possesses deep logical reasoning, vocabulary, and planning capabilities. However, because it is isolated, it has no physical sensory nerves to read your database, and no motor nerves (appendages) to edit a local file or ping a web API. It can only read text streams, predict subsequent text streams, and stop.
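To make "read text, predict text, stop" concrete, here is a toy sketch of an autoregressive loop. The lookup table `NEXT_TOKEN` and the functions `predictNextToken` and `generate` are illustrative stand-ins invented for this example; a real model computes a probability distribution over its vocabulary rather than consulting a table.

```javascript
// Toy stand-in for a model: maps the last token to the next one.
// A real LLM scores every token in its vocabulary instead.
const NEXT_TOKEN = { "<start>": "Hello", Hello: ",", ",": "world", world: "<end>" };

function predictNextToken(tokens) {
  const last = tokens[tokens.length - 1];
  return NEXT_TOKEN[last] ?? "<end>";
}

// Autoregressive generation: read the stream, append one token, repeat.
function generate(promptTokens, maxTokens = 16) {
  const tokens = [...promptTokens];
  for (let i = 0; i < maxTokens; i++) {
    const next = predictNextToken(tokens);
    if (next === "<end>") break; // the model stops; nothing else happens
    tokens.push(next);
  }
  return tokens;
}

console.log(generate(["<start>"]).slice(1).join(" ")); // prints "Hello , world"
```

Nothing in this loop can touch a file, a database, or a network socket. All it ever produces is more tokens.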


2. Exposing the Appendages (Tools as Arms and Senses)

If the LLM is the brain, then Tools are the sensory and motor appendages we attach to it.

How the Brain Learns about its Arms: The JSON Schema

An LLM doesn’t natively “know” what functions exist on your server. When you initialize an agent system, you must supply a Tool Definition Schema alongside your prompt. This schema, written in JSON Schema format, describes the function’s name, purpose, and required arguments in plain text:

{
  "name": "fetch_user_details",
  "description": "Queries the database for user profile details using their system ID.",
  "parameters": {
    "type": "object",
    "properties": {
      "userId": { "type": "string", "description": "The unique system identifier." }
    },
    "required": ["userId"]
  }
}

The schema is placed in the model's context alongside your prompt. The model's vocabulary and weights do not change; it simply reads the metadata as text and learns which “appendages” it can request during execution.
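On the host side, each schema is usually paired with the function that does the actual work. The sketch below is one possible arrangement, not a prescribed API: `toolRegistry`, `schemasForModel`, `validateArgs`, and the stubbed `handler` are hypothetical names introduced for illustration, and the validator checks only required fields and primitive types rather than full JSON Schema.

```javascript
// The schema from above, paired with a hypothetical local handler.
const fetchUserDetailsTool = {
  schema: {
    name: "fetch_user_details",
    description: "Queries the database for user profile details using their system ID.",
    parameters: {
      type: "object",
      properties: {
        userId: { type: "string", description: "The unique system identifier." },
      },
      required: ["userId"],
    },
  },
  // Stub standing in for a real database call.
  handler: async ({ userId }) => ({ userId, name: "Alice", role: "Developer", active: true }),
};

// Registry: tool name -> { schema, handler }.
const toolRegistry = new Map([[fetchUserDetailsTool.schema.name, fetchUserDetailsTool]]);

// Only the schemas are sent to the model; handlers never leave the host.
function schemasForModel() {
  return [...toolRegistry.values()].map((tool) => tool.schema);
}

// Minimal argument check: required keys, unknown keys, primitive types.
function validateArgs(schema, args) {
  const errors = [];
  for (const key of schema.parameters.required ?? []) {
    if (!(key in args)) errors.push(`missing required argument: ${key}`);
  }
  for (const [key, value] of Object.entries(args)) {
    const spec = schema.parameters.properties[key];
    if (!spec) errors.push(`unknown argument: ${key}`);
    else if (typeof value !== spec.type) errors.push(`${key} should be a ${spec.type}`);
  }
  return errors;
}

console.log(validateArgs(fetchUserDetailsTool.schema, {})); // prints [ 'missing required argument: userId' ]
```

Validating arguments before execution matters because the model's output is only predicted text. It can omit a field or invent one.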


3. The Control Flow: Loop and Steps

How does a text engine actually “reach out” and trigger these appendages? It happens through a continuous, structured conversation cycle known as the ReAct (Reason + Act) loop.

Think of this process in two layers:

  1. The Big Picture (The Macro Loop): The continuous circuit showing how data and instructions circulate between the LLM Brain, the Toolbox, and the Environment.
  2. The 4 Steps (The Micro Milestones): What actually happens at a code level during a single trip around that loop.

Layer 1: The Big Picture Map

This auto-playing animation shows the continuous cycle of information traveling around the loop: Brain (thinking) → Toolbox (requesting an action) → Environment (executing it and returning the result) → Brain (observing the result).

[Animation: the agentic reaction cycle, one ReAct (Reason + Act) iteration. The LLM core ("the brain") sends (1) a tool call to the toolbox ("the arms"), which (2) executes it against the environment ("the world"); the brain then (3) observes the result.]

Layer 2: The Step-by-Step Breakdown

To understand exactly how control shifts dynamically between your code (the Host) and the AI (the Brain) at each point of the cycle, let’s walk through a single trip around the loop in 4 core steps.

Step 1: Thought (Reasoning)

The Brain processes the user's query and compares it against the tool descriptions it was given. It recognizes that the answer is not in its training data and plans to use a tool.

🧠 Brain State: "User wants profile data for ID usr_99. I do not have this in my static pre-trained weights. I must trigger the fetch_user_details arm."

Step 2: Action (The Intercept)

The LLM emits a structured Tool Call request (a JSON block) and stops generating. Control returns to your host application, which decides what to do with the request. The model itself never executes anything.

➡️ Generated tool call:
{ "tool_call": "fetch_user_details", "args": { "userId": "usr_99" } }
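
The hand-off in this step can be sketched as a small parser on the host side. This is a minimal illustration that assumes the simplified `{ tool_call, args }` shape shown above; `parseModelOutput` is a hypothetical helper, and real provider APIs typically return tool calls in a dedicated response field instead of free text.

```javascript
// Classify raw model output as either a tool call or a final answer.
// Assumes the simplified { tool_call, args } shape used in this walkthrough.
function parseModelOutput(raw) {
  try {
    const parsed = JSON.parse(raw);
    if (parsed !== null && typeof parsed === "object" && typeof parsed.tool_call === "string") {
      return { kind: "tool_call", name: parsed.tool_call, args: parsed.args ?? {} };
    }
  } catch {
    // Not JSON: fall through and treat the output as plain text.
  }
  return { kind: "final_answer", text: raw };
}

const action = parseModelOutput('{ "tool_call": "fetch_user_details", "args": { "userId": "usr_99" } }');
console.log(action.kind, action.name); // prints "tool_call fetch_user_details"
```

Anything that does not parse as a tool call is treated as the final answer, which is what ends the loop.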

Step 3: Execution

The Host application intercepts the JSON instruction, calls your own database connector function, and fetches the profile record from the environment.

⚙️ Host execution running:
const data = await db.query("SELECT * FROM users WHERE id = 'usr_99'");
// Returns: { name: "Alice", role: "Developer", active: true }
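
A slightly fuller sketch of the execution step is shown below. It assumes a database client with a `query(sql, params)` method and `?` placeholders, simulated here with an in-memory stand-in; `db`, `handlers`, and `executeToolCall` are hypothetical names. Because the ID comes from model output, the sketch passes it as a bound parameter instead of splicing it into the SQL string.

```javascript
// Hypothetical in-memory stand-in for a database client exposing query(sql, params).
const db = {
  rows: [{ id: "usr_99", name: "Alice", role: "Developer", active: true }],
  async query(_sql, [id]) {
    return this.rows.filter((row) => row.id === id);
  },
};

// Tool name -> host function. The model only names a tool; the host runs it.
const handlers = {
  async fetch_user_details({ userId }) {
    // The ID travels as a bound parameter, never inside the SQL text.
    const rows = await db.query("SELECT name, role, active FROM users WHERE id = ?", [userId]);
    if (rows.length === 0) throw new Error(`No user with id ${userId}`);
    const { name, role, active } = rows[0];
    return { name, role, active };
  },
};

// Execute one tool call and always return a string observation for the model.
async function executeToolCall({ name, args }) {
  if (!Object.prototype.hasOwnProperty.call(handlers, name)) {
    return JSON.stringify({ error: `Unknown tool: ${name}` });
  }
  try {
    return JSON.stringify(await handlers[name](args));
  } catch (err) {
    // Failures become observations so the model can react to them.
    return JSON.stringify({ error: err.message });
  }
}

executeToolCall({ name: "fetch_user_details", args: { userId: "usr_99" } }).then(console.log);
// prints {"name":"Alice","role":"Developer","active":true}
```

Returning errors as observations, rather than throwing, lets the next model call see what went wrong and try something else.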

Step 4: Observation (Feedback)

The Host app appends the raw query result to the LLM's chronological message array and calls the model again. The brain reads this feedback as new input tokens and synthesizes the final answer.

📥 Context stream update:
[System] + [User] + [Tool Call] + [Observation: Alice is Developer, active]
💬 Synthesized output: "Alice is an active Developer under ID usr_99."
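
The four steps above can be tied together in a single loop. The sketch below is self-contained and uses a scripted stand-in for the model so it runs without an API key; `callModel`, `tools`, and `runAgent` are hypothetical names, and the message shape is the simplified one from this walkthrough, not any specific provider's format.

```javascript
// Scripted stand-in for an LLM API: requests the tool once, then answers
// from the observation. A real model decides this from the context it reads.
async function callModel(messages) {
  const observation = messages.find((m) => m.role === "tool");
  if (!observation) {
    return { role: "assistant", tool_call: { name: "fetch_user_details", args: { userId: "usr_99" } } };
  }
  const user = JSON.parse(observation.content);
  const status = user.active ? "active" : "inactive";
  return { role: "assistant", content: `${user.name} is an ${status} ${user.role}.` };
}

// Host-side tools (stubbed data instead of a real database).
const tools = {
  fetch_user_details: async () => ({ name: "Alice", role: "Developer", active: true }),
};

async function runAgent(userQuery, maxSteps = 5) {
  const messages = [
    { role: "system", content: "You are an agent with access to user database tools." },
    { role: "user", content: userQuery },
  ];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await callModel(messages); // Steps 1-2: thought and action
    messages.push(reply);
    if (!reply.tool_call) return { answer: reply.content, messages }; // final answer ends the loop
    const { name, args } = reply.tool_call;
    const tool = tools[name];
    const result = tool ? await tool(args) : { error: `Unknown tool: ${name}` }; // Step 3: execution
    messages.push({ role: "tool", name, content: JSON.stringify(result) }); // Step 4: observation
  }
  throw new Error(`No final answer after ${maxSteps} steps`);
}

runAgent("Find the profile details of user 'usr_99'.").then(({ answer }) => console.log(answer));
// prints "Alice is an active Developer."
```

The `maxSteps` cap is a design choice worth keeping in a real system: without it, a model that keeps requesting tools would loop forever.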

Layer 3: Interactive ReAct Sandbox

To truly understand how this cycle operates programmatically, try running it yourself! Use the interactive dashboard below to select a target goal and step through the ReAct loop.

Watch the AI Brain Console show each shift of control, and see how the chronological JSON message array grows with every step. That array is the agent's only memory.

[Interactive simulator: select an agent goal, then step through 1. Thought, 2. Action, 3. Execution, and 4. Observe. The AI Brain Console shows the current step, and the JSON message memory array starts empty and gains a message at each step.]

4. State Management: The Message History Array

A common misconception is that AI agents possess an active, running “consciousness” or background thread that stays alive between steps.

In reality, the model has no internal running state. Every model call inside the ReAct loop is completely stateless. The “memory” of an agent is held entirely in a standard, append-only JSON message array that grows chronologically and lives in your host application.

Here is the message array representing our database query:

const messageHistory = [
  // 1. System Prompt (The core operational guidelines for the brain)
  { 
    role: "system", 
    content: "You are an agent with access to user database tools. Always check database before answering." 
  },
  
  // 2. User Stimulus (The initial input trigger)
  { 
    role: "user", 
    content: "Find the profile details of user 'usr_99' and check if she has access." 
  },
  
  // 3. Assistant Tool Call Request (Brain deciding to contract a muscle)
  { 
    role: "assistant", 
    tool_calls: [
      {
        id: "call_t99",
        type: "function",
        function: { name: "fetch_user_details", arguments: "{\"userId\":\"usr_99\"}" }
      }
    ]
  },
  
  // 4. Tool Observation Response (Sensory nerve sending tactile feedback back to brain)
  { 
    role: "tool", 
    tool_call_id: "call_t99", 
    content: "{\"name\":\"Alice\",\"role\":\"Developer\",\"active\":true}" 
  },
  
  // 5. Final Answer (Brain synthesizes the entire chronological context)
  { 
    role: "assistant", 
    content: "User 'usr_99' belongs to Alice, who is a Developer. Her account is active, meaning she has access." 
  }
];

Every single turn of the loop re-sends this entire historical array back to the LLM. The model reads the whole timeline, predicts the next action or final response, and the loop continues.
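
To see what “re-sends the entire array” means in practice, the sketch below reconstructs the request behind each assistant message in a finished history. `requestsSent` is a hypothetical helper written for this illustration, and the history is a shortened copy of the array above.

```javascript
// Each assistant message was produced by a request containing every message
// that came before it. This rebuilds those requests from a finished history.
function requestsSent(history) {
  return history
    .map((message, index) => ({ message, index }))
    .filter(({ message }) => message.role === "assistant")
    .map(({ index }) => history.slice(0, index));
}

const history = [
  { role: "system", content: "You are an agent with access to user database tools." },
  { role: "user", content: "Find the profile details of user 'usr_99'." },
  {
    role: "assistant",
    tool_calls: [
      { id: "call_t99", type: "function", function: { name: "fetch_user_details", arguments: "{\"userId\":\"usr_99\"}" } },
    ],
  },
  { role: "tool", tool_call_id: "call_t99", content: "{\"name\":\"Alice\",\"role\":\"Developer\",\"active\":true}" },
  { role: "assistant", content: "User 'usr_99' belongs to Alice, who is a Developer." },
];

requestsSent(history).forEach((request, turn) => {
  const size = JSON.stringify(request).length;
  console.log(`model call ${turn + 1}: ${request.length} messages, ${size} characters`);
});
```

The first call carries 2 messages and the second carries 4. Since every request is a prefix of the next one, input size grows with each turn, which is why long-running agents pay more per step as the loop continues.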


The Next Challenge: The Integration Bottleneck

By linking an isolated reasoning engine (the brain) with exposed code functions (the arms) through a stateless context loop, we have created an agent capable of autonomous real-world actions.

However, as you build more complex systems, you will run into a major engineering bottleneck: integration scaling.

If you build three different agent clients (a CLI assistant, a web dashboard, and a Slack bot) and want each to connect to five custom data sources (GitHub, local files, a SQL database, Google Search, and Jira), you have to write custom tool integration, schema handling, and authentication wrapper code for all fifteen combinations.

Every new tool and client requires custom engineering.

How do we standardize this connection so that any agent client can connect to any data source instantly, without custom wrappers? We need a universal port: an “AI USB-C.”
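
The arithmetic behind the bottleneck is easy to check. The sketch below counts point-to-point integrations for the three clients and five sources named above, then compares that with a shared-port design. The second count rests on an assumption, namely that a common protocol needs one adapter per client and one per data source; the function names are hypothetical.

```javascript
const clients = ["CLI assistant", "web dashboard", "Slack bot"];
const sources = ["GitHub", "local files", "SQL database", "Google Search", "Jira"];

// Point-to-point: every client needs its own wrapper for every source.
function pointToPointIntegrations(clientList, sourceList) {
  return clientList.flatMap((client) => sourceList.map((source) => `${client} -> ${source}`));
}

// Shared port (assumption): each side implements the common protocol once.
function sharedProtocolAdapters(clientList, sourceList) {
  return [...clientList, ...sourceList].map((name) => `${name} adapter`);
}

console.log(pointToPointIntegrations(clients, sources).length); // prints 15
console.log(sharedProtocolAdapters(clients, sources).length); // prints 8
```

Adding a sixth data source costs three new wrappers in the first design and one new adapter in the second, so the gap widens as the system grows.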

In Part 2, we will dive into the Model Context Protocol (MCP), see how it solves this standardization challenge, and write a complete, hands-on MCP server to connect our agent directly to live data.

Stay tuned!

