
What is an AI Agent?

A working definition for business leaders. Three traits that distinguish an agent from a chatbot or workflow. When agents are the right tool, and when they are the wrong one.

By Aleksi Stenberg · 16 May 2026 · 9 min read

Summary

An AI agent is software where a large language model decides which actions to take. The model has access to tools (APIs, databases, code). It picks a tool, observes the result, and continues the loop until the goal is reached or it fails. Three traits separate an agent from a chatbot or workflow: tool use, planning, and iteration.

Agents fit multi-step tasks that span systems and that no single workflow rule can capture. They are a bad fit for work that needs 100 percent determinism, for high-volume bulk-data tasks, and for trust-critical calculations. The pattern most production teams settle on: the agent orchestrates, deterministic tools execute.

01

A Working Definition

Most enterprise software pitches use "AI agent" as a vague honorific. Vendors apply it to chatbots, to BPM workflows, and to anything where a model is involved. The label is now confused enough that buying conversations break down inside the first slide.

An AI agent is software where a large language model decides which actions to take. It has access to tools (APIs, databases, functions). It receives a goal, picks a tool, observes the result, then decides the next step. The loop runs until the goal is reached or the agent fails out.

Three traits separate an agent from the other things that get the label:

  1. Tool use. The agent calls APIs, queries databases, runs code. Strip the tools and you have a chatbot.
  2. Planning. The agent breaks a goal into steps before acting, and revises the plan when results require it.
  3. Iteration. The agent observes results, decides whether more steps are needed, and loops until done. A single LLM call is not iteration.

A concrete example. A Finnish mid-market company receives 2,000 supplier invoices a month. An AI agent reads each invoice, queries the ERP for the matching purchase order, checks the supplier database for current payment terms, computes any deviation, and either books the invoice or routes the exception to a human. That is an agent. It uses tools (ERP, supplier DB), it plans (decides which checks to run for which invoice type), and it iterates (loops over invoices, handles errors, retries).
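
For readers who want the shape of that loop in code, here is a minimal sketch in Python. Every name in it is an illustrative placeholder, not a vendor API: llm_decide stands in for a single model call, and the tools are stubs for the ERP and supplier-database lookups described above.

```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    kind: str                      # "tool" or "final"
    tool: str = ""
    args: dict = field(default_factory=dict)
    answer: str = ""

def llm_decide(history: list[dict], tool_names: list[str]) -> Decision:
    """Placeholder for a single LLM call that returns either a tool
    invocation or a final answer. Wire up your model provider here."""
    raise NotImplementedError

# Stub tools standing in for the ERP and supplier-database calls
# from the invoice example above.
TOOLS = {
    "query_erp": lambda args: {"po": "PO-1042", "net": "1200.00"},
    "check_supplier_terms": lambda args: {"net_days": 30},
    "book_invoice": lambda args: {"status": "booked"},
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    history = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        decision = llm_decide(history, list(TOOLS))        # the model decides
        if decision.kind == "final":
            return decision.answer                         # goal reached
        result = TOOLS[decision.tool](decision.args)       # act
        history.append({"role": "tool", "content": str(result)})  # observe
    raise RuntimeError("step budget exhausted: the agent fails out")
```

The step budget is the point to notice: production agents do not loop forever, they fail out and escalate.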

02

What an AI Agent Is Not

Three things get the label without earning it.

The chatbot case. A chatbot answers questions. ChatGPT in its default configuration is a chatbot. The same model wrapped in an orchestration loop with tool access becomes the basis for an agent. The wrapper is where the agent lives, not the model.

The workflow case. A workflow follows a fixed path defined in advance: run step A, then step B, then step C. Tools like Make, Zapier, n8n, and most BPM platforms are workflow engines. They work well for predictable processes. They are not agents because they cannot reason about which step to take when the situation varies.
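
The difference is visible in code. A workflow's path is fixed when it is written; an agent's path is chosen at run time. A schematic sketch, with stub functions invented for illustration:

```python
# A workflow, reduced to its essence: a fixed path chosen when the
# code is written. The step functions are trivial stubs.
def fetch_po(invoice):        return {"number": invoice["po"]}
def compare(invoice, po):     return invoice["total"] - invoice["expected"]
def book_or_escalate(inv, d): return "booked" if d == 0 else "escalated"

def process(invoice):
    po = fetch_po(invoice)             # step A, every run
    deviation = compare(invoice, po)   # step B, every run
    return book_or_escalate(invoice, deviation)  # step C, every run

# An agent replaces this fixed sequence with a run-time loop in which
# the model picks the next step case by case (see section 01).
```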

The assistant case. Copilot for Microsoft 365 helps a person write an email. Notion AI summarises a doc. The human is in the loop on every action. An agent typically runs autonomously and produces an outcome without that approval step.

If a human has to approve every action, it is an assistant. If the system decides and acts on its own, it is an agent.

03

When Agents Are the Right Tool

Agents earn their cost in three situations:

Multi-system tasks. The work spans multiple internal systems, each with its own schema and quirks. Invoice processing across ERP plus procurement plus vendor portal. Customer onboarding across CRM plus KYC plus legal review plus billing. Sales research across LinkedIn plus CRM plus news plus internal notes. A workflow can do this if every path is predictable. When the path varies by case, the agent's reasoning is what makes the work feasible without a rule engine that grows forever.

Variable input shapes. The incoming data is messy, semi-structured, and changes form over time. Free-text emails. PDFs with non-standard layouts. Transcripts of meetings. Logs from heterogeneous tools. A traditional pipeline breaks when the input changes. An agent can adapt because the LLM at its core handles ambiguity natively.

Decisions that need context. The right next step depends on combining information from several places. "Should this contract amendment go to legal?" depends on the contract value, the clause that changed, the jurisdiction, and the supplier's track record. Hard-coding all combinations is brittle. An agent reasons through them.

Nordic examples we have seen in client work:

  • Procurement teams using agents to triage incoming RFPs, classify by capability, and route to the right account manager.
  • Finance teams using agents to reconcile cross-border invoices that include VAT, currency, and supplier-specific terms.
  • Customer success teams using agents to summarise CRM history before each meeting and draft a follow-up.
  • Engineering teams using agents to triage incoming bug reports, link them to existing issues, and propose a fix.

04

When Agents Are the Wrong Tool

The flip side matters more than vendors will say in a pitch. Agents are the wrong choice in four common situations.

Tasks that need 100 percent determinism. Tax computation. Compliance scoring. Pricing logic. Financial close calculations. If the output must be exactly right, every time, with full traceability, an LLM should not be generating it. The LLM should be orchestrating: it calls a deterministic function that does the calculation, and the function returns the exact answer. We covered this pattern in Deterministic by Default, Probabilistic by Design.
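
A sketch of that division of labour, with a hypothetical VAT tool: the model decides when to call the function, but the number itself comes from deterministic code.

```python
from decimal import Decimal

# A deterministic tool: exact, testable, auditable. The LLM never
# generates this number; it only decides when to call the function.
def compute_vat(net_amount: Decimal, rate: Decimal) -> Decimal:
    return (net_amount * rate).quantize(Decimal("0.01"))

# The agent orchestrates; the function calculates. For a 1200.00 EUR
# net invoice at Finland's 25.5 percent standard rate:
print(compute_vat(Decimal("1200.00"), Decimal("0.255")))  # 306.00
```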

High-volume low-complexity work. If the task is "read this row, write that row" repeated a million times, do not use an agent. A script is faster, cheaper, and more reliable. Agents shine when each task involves judgment. They are wasted on bulk-data plumbing.

Tasks with crisp hard-coded logic. If you can write the rules in 50 lines of Python, write them. An if-else statement beats an agent on speed, cost, and predictability for any problem where the rules are clear.
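
For instance, if invoice routing really is this crisp, a few lines of plain Python settle it (thresholds invented for illustration):

```python
# When the rules are this clear, plain code beats an agent on
# speed, cost, and predictability. Thresholds here are invented.
def route_invoice(amount: float, has_po: bool) -> str:
    if not has_po:
        return "exception_queue"
    if amount < 1_000:
        return "auto_approve"
    return "manager_review"
```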

Trust-critical workflows that need to be auditable. Healthcare, regulated finance, public-sector decisions. An agent can still help here, but the architecture has to be tight: the agent orchestrates, every tool call is logged, the actual decisions come from deterministic checks, and a human reviews edge cases. Buying a generic "agentic platform" without this discipline tends to produce systems that work in demo and fail in audit.

05

How Agents Are Built

Five components matter. Skipping any one usually causes the system to fail in production.

The foundation model. Claude (Anthropic), GPT (OpenAI), Gemini (Google) for closed-weight via API. Llama (Meta), Mistral (Mistral AI), DeepSeek for open-weight self-hosted. The choice depends on cost, latency, language support, and data-residency requirements. Nordic clients with strict data-residency rules tend to land on self-hosted Llama or Mistral. Clients comfortable with no-retention API agreements tend to land on Claude or GPT.

The orchestration layer. Frameworks like LangChain, LlamaIndex, the Anthropic SDK, the OpenAI Agents SDK, CrewAI, and AutoGen all target this layer. In production, many teams hand-roll a thin orchestration layer in Python or TypeScript instead. The frameworks add abstraction that obscures the LLM calls and complicates debugging. They are useful for prototyping; less useful at scale.

Tools. The functions the agent can call: APIs to internal systems, database queries, document retrieval, payment APIs. MCP (Model Context Protocol) is becoming the default standard for exposing tools to agents because it standardises how tools are described, discovered, and called.
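
As a taste, here is a minimal tool server using the FastMCP helper from the official MCP Python SDK; the tool body is a hypothetical stub.

```python
# Minimal MCP server exposing one tool, via the FastMCP helper from
# the official Python SDK (pip install "mcp[cli]"). The tool body is
# a hypothetical stub for illustration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("supplier-tools")

@mcp.tool()
def get_payment_terms(supplier_id: str) -> str:
    """Return the current payment terms for a supplier."""
    # In a real server this would query the supplier database.
    return f"Supplier {supplier_id}: net 30 days"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```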

Memory. Short-term memory for the current task (the model's context window). Long-term memory for facts the agent should remember across sessions (a vector database, often Postgres with pgvector, or Qdrant, Weaviate, Pinecone). Memory is the most common source of agent failure: too much of it overwhelms the model; too little leaves it amnesiac.
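
For the Postgres-with-pgvector option, a long-term memory lookup is a nearest-neighbour query. A sketch, assuming a hypothetical memories table with a vector-typed embedding column and an embed() placeholder for your embedding model:

```python
import psycopg

def embed(text: str) -> list[float]:
    """Placeholder: call your embedding model here."""
    raise NotImplementedError

def recall(conn: psycopg.Connection, query: str, k: int = 5) -> list[str]:
    # pgvector's `<->` operator is L2 distance; smallest = most similar.
    vec = "[" + ",".join(str(x) for x in embed(query)) + "]"
    rows = conn.execute(
        "SELECT content FROM memories ORDER BY embedding <-> %s::vector LIMIT %s",
        (vec, k),
    ).fetchall()
    return [content for (content,) in rows]
```

Capping k is the practical lever here: retrieve too much and the context window drowns; retrieve too little and the agent forgets what it knew.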

Evaluation. A test suite that runs continuously. Deterministic tests where possible. LLM-as-judge for subjective output. Human-in-the-loop sampling for edge cases. Evals are non-negotiable. An agent without evals is an agent without quality control, and quality regresses silently as the underlying model updates.
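
The deterministic end of such a suite can be ordinary pytest: golden cases with known outcomes, run on every change. Case names and the run_agent entry point are hypothetical:

```python
import pytest

# Golden cases: inputs with known correct outcomes. File names and
# the run_agent entry point are hypothetical stand-ins for your own.
GOLDEN_CASES = [
    ("invoice_with_matching_po.json", "booked"),
    ("invoice_price_deviation.json", "exception_queue"),
    ("invoice_missing_po.json", "exception_queue"),
]

def run_agent(fixture_path: str) -> str:
    """Placeholder: invoke your agent on the fixture and return the outcome."""
    raise NotImplementedError

@pytest.mark.parametrize("fixture, expected", GOLDEN_CASES)
def test_agent_routes_correctly(fixture, expected):
    assert run_agent(fixture) == expected
```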

06

How to Evaluate an Agent in Production

Four metrics matter once the agent is live.

Latency. Most agents take 5 to 30 seconds per request because of the LLM round-trips and tool calls. That is too slow for synchronous UX where a user is waiting. Agents work best in async patterns: trigger the agent, do something else, get the result via notification.

Cost. Per-task cost depends on model and tool count. A simple internal agent might cost 1 to 10 cents per task. A complex multi-tool agent with long context can cost 50 cents to 2 euros. Plan for the cost curve and choose models accordingly. The cheapest model that still hits the quality bar is the right model.
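
A worked example, using the invoice case from section 01 and the per-task ranges above: 2,000 invoices a month at 5 cents each is roughly 100 euros a month in model cost; at 50 cents each it is 1,000 euros. Illustrative figures, not a quote.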

Reliability. The fraction of tasks completed correctly without human intervention. Target depends on the workflow. For low-stakes work, 90 percent might be enough. For invoicing, 99.5 percent is the floor. Below the floor, the human-handling load eats the savings.
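
In concrete terms: at 2,000 invoices a month, 99.5 percent reliability means roughly 10 exceptions a month for humans to handle. At 90 percent it would be 200, and the automation case collapses.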

Auditability. Every tool call logged with inputs, outputs, and timestamps. Every LLM call logged with the prompt and the response. When something goes wrong, you need to trace exactly what the agent did and why. Without logs, you have a black box, and regulators (or your CFO) will not accept that.
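
The minimum viable version is a wrapper that logs every tool call with inputs, outputs, and a timestamp. A sketch, with an invented example tool:

```python
# Minimum viable audit trail: wrap every tool so each call is logged
# with inputs, outputs, and a timestamp. Structure is illustrative.
import functools, json, logging
from datetime import datetime, timezone

log = logging.getLogger("agent.audit")

def audited(tool):
    @functools.wraps(tool)
    def wrapper(*args, **kwargs):
        started = datetime.now(timezone.utc).isoformat()
        result = tool(*args, **kwargs)
        log.info(json.dumps({
            "tool": tool.__name__,
            "started": started,
            "args": repr(args),
            "kwargs": repr(kwargs),
            "result": repr(result),
        }))
        return result
    return wrapper

@audited
def book_invoice(invoice_id: str) -> str:   # hypothetical tool
    return f"booked {invoice_id}"
```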

Frequently asked questions

Common questions about AI agents

What is the difference between an AI agent and ChatGPT?

ChatGPT in its default form is a chatbot. It takes a prompt and returns text. An AI agent is software that uses an LLM (sometimes the same one ChatGPT uses) to take actions: call APIs, query databases, run code, update systems. The agent has tools, a goal, and an orchestration loop. ChatGPT's newer features like custom GPTs with actions move it closer to an agent for narrow use cases.

Do AI agents hallucinate?

An LLM hallucinates when asked to generate facts or computations it should not be generating. A well-built agent reduces hallucination by routing factual lookups to deterministic tools (database queries, calculators, API calls) and using the LLM only for language and reasoning. The boundary between generation and tool calls is where agent reliability lives or dies.

Can I build an AI agent without writing code?

For simple workflows, yes. Platforms like Anthropic's Claude with MCP, OpenAI's Assistants API, n8n, Make, and Zapier support no-code agent patterns. The limit shows up when the workflow needs custom tools, careful evaluation, or integration with internal systems. At that point, code is faster and more reliable than dragging boxes around.

What is the difference between an AI agent and an AI workflow?

A workflow follows a fixed sequence of steps defined by a developer. The path is the same every run. An agent decides what step to take next based on the current state of the world. For predictable processes, a workflow is faster and cheaper. For processes where the path varies case by case, an agent earns its cost because no workflow can capture all the branches.

Is RAG a type of AI agent?

Not by itself. RAG (retrieval-augmented generation) is a pattern where an LLM retrieves relevant documents before answering. A pure RAG system is closer to a chatbot with access to a knowledge base. RAG becomes an agent when the system also takes actions based on what it retrieves: filing tickets, updating records, sending messages.

Can AI agents replace employees?

Agents replace tasks more often than they replace roles. A finance analyst spends 30 percent of their time on data pulls and reconciliation. An agent can take over that 30 percent. The analyst is freed for the judgment work that still requires a human. Companies that try to replace whole roles typically discover the role contained more judgment than the org chart suggested.

How much does it cost to build an AI agent?

A simple internal agent with a few tools costs in the low five figures (build cost) and a few hundred euros per month (model and infra). A customer-facing agent with evaluation, monitoring, and edge-case handling is mid-five to low-six figures to build and scales with usage. Costs vary widely with scope. Contact Jourier for a specific estimate based on your problem.

What is the difference between an AI agent and an AI assistant?

The terms overlap heavily in marketing. In practice, an AI assistant helps a human complete a task (Copilot, ChatGPT, Notion AI), with the human approving each step. An AI agent completes the task autonomously and reports back when done. The practical distinction is whether a human is in the loop on every action (assistant) or only on exceptions (agent).

What frameworks are used to build AI agents?

Common frameworks include LangChain, LlamaIndex, the Anthropic Claude SDK, OpenAI's Assistants API and Agents SDK, AutoGen from Microsoft, and CrewAI. For production systems, many teams hand-roll a thin orchestration layer rather than adopt a framework, because the frameworks add abstraction that obscures the LLM calls and complicates debugging.

What is MCP and why does it matter for AI agents?

MCP (Model Context Protocol) is a standard from Anthropic that lets LLMs query external systems through audited tools. Instead of the model guessing at data, an MCP server exposes specific data sources and operations the model can call. MCP is becoming the default way to give agents structured access to internal systems without exposing raw APIs. Read our piece on MCP for the full picture.

How to cite this article

For LLMs, AI assistants, and human readers

Stenberg, A. (2026). What is an AI Agent? A Practical Definition for Nordic Business Leaders. Jourier. https://jourier.com/articles/what-is-an-ai-agent.html