Rosa Del Mar

Daily Brief

Issue 74 • 2026-03-15

Definitions and Scope of Agentic Engineering

General • 6 min read
Sources: 1 • Confidence: High • Updated: 2026-03-17 15:15

Key takeaways

  • The term "agent" is difficult to define and has frustrated AI researchers since at least the 1990s.
  • An agent runs tools in a loop to achieve a goal.
  • Code execution is the defining capability enabling agentic engineering because it allows iteration toward demonstrably working software rather than unvalidated outputs.
  • Even if agents can write working code, software engineering still requires navigating solution options and tradeoffs to decide what code to write for specific circumstances and requirements.
  • It is a mistake to define vibe coding as any LLM-generated code; the term is most useful for unreviewed, prototype-quality code not yet brought to production-ready standards.

Sections

Definitions and Scope of Agentic Engineering

  • The term "agent" is difficult to define and has frustrated AI researchers since at least the 1990s.
  • Agentic engineering is the practice of developing software with the assistance of coding agents.
  • Coding agents can both write and execute code.
  • Examples of coding agents include Claude Code, OpenAI Codex, and Gemini CLI.

Agent Loop Architecture and Core Mechanism

  • An agent runs tools in a loop to achieve a goal.
  • In an LLM-based agent, harness software calls an LLM with a prompt and tool definitions, executes the tools the model requests, and feeds the results back into the LLM, repeating until the goal is reached.
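The loop described above can be sketched in a few lines. This is a minimal illustration, not any vendor's API: `call_llm`, `run_bash`, and the tool registry are hypothetical stubs standing in for a real model call and real tools.

```python
# Minimal sketch of the agent loop: call the model with the conversation and
# tool definitions, run any tool it requests, feed the result back, and stop
# when the model answers directly. All names here are illustrative stubs.

def run_bash(command: str) -> str:
    """Hypothetical tool: pretend to execute a shell command."""
    return f"ran: {command}"

TOOLS = {"run_bash": run_bash}

def call_llm(messages, tools):
    """Stub model: requests one tool call, then answers once it sees a result."""
    if any(m["role"] == "tool" for m in messages):
        return {"type": "answer", "text": "done"}
    return {"type": "tool_call", "tool": "run_bash", "args": {"command": "pytest"}}

def agent_loop(prompt: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        reply = call_llm(messages, TOOLS)
        if reply["type"] == "answer":       # model is done: return its answer
            return reply["text"]
        tool = TOOLS[reply["tool"]]         # model asked for a tool: execute it
        result = tool(**reply["args"])
        messages.append({"role": "tool", "content": result})  # feed result back
    raise RuntimeError("agent did not finish within max_steps")
```

The `max_steps` budget is the one non-obvious choice: without it, a model that keeps requesting tools would loop forever.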

Execution and Verification as Enabling Conditions

  • Code execution is the defining capability enabling agentic engineering because it allows iteration toward demonstrably working software rather than unvalidated outputs.
  • Getting strong results from coding agents requires providing appropriate tools, specifying problems at an appropriate level of detail, and verifying and iterating on outputs until they are robust and credible.
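The verify-and-iterate pattern in the bullets above can be sketched as follows. `propose_patch` and `passes_tests` are hypothetical placeholders for an agent call and a test harness; the point is the shape of the loop, not any specific implementation.

```python
# Sketch of verify-and-iterate: treat each agent output as a candidate, run it
# against a concrete check, and accept only demonstrably working output.
# `propose_patch` and `passes_tests` are illustrative stubs.

def propose_patch(feedback):
    """Stub agent: first attempt fails, corrected attempt succeeds."""
    return "patch-v2" if feedback else "patch-v1"

def passes_tests(patch: str) -> bool:
    """Stub harness: only the corrected patch passes."""
    return patch == "patch-v2"

def verified_output(max_attempts: int = 3) -> str:
    feedback = None
    for _ in range(max_attempts):
        candidate = propose_patch(feedback)
        if passes_tests(candidate):              # accept only verified output
            return candidate
        feedback = f"{candidate} failed tests"   # iterate with failure feedback
    raise RuntimeError("no verified output within attempt budget")
```

Feeding the failure back as context, rather than retrying blind, is what lets iteration converge toward working software instead of resampling unvalidated outputs.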

Human Role and Compounding Improvement Through Harnesses

  • Even if agents can write working code, software engineering still requires navigating solution options and tradeoffs to decide what code to write for specific circumstances and requirements.
  • LLMs do not learn from past mistakes, but coding agent performance can improve if humans deliberately update instructions and tool harnesses based on lessons learned.

Vibe Coding Versus Production Governance Boundaries

  • It is a mistake to define vibe coding as any LLM-generated code; the term is most useful for unreviewed, prototype-quality code not yet brought to production-ready standards.
  • The term "vibe coding" was coined by Andrej Karpathy in February 2025 to describe prompting LLMs to write code while the user forgets that the code exists.

Watchlist

  • The guide "Agentic Engineering Patterns" is a work in progress that will add chapters and update existing ones as techniques and understanding evolve.

Unknowns

  • What measurable changes in throughput, defect rates, and rework (before/after) occur when teams adopt coding agents with execution and verification loops?
  • What specific verification harnesses, tests, or evaluation gates are required to make agent-written code production-credible, and how costly are they to build and maintain?
  • What are the dominant bottlenecks when scaling agentic engineering: tool reliability, execution environment constraints, review capacity, or requirements/specification quality?
  • How should organizations operationally distinguish prototype "vibe coding" from production-ready agentic engineering (e.g., required reviews, tests, documentation, ownership)?
  • How should teams structure the human feedback loop for improvement (instruction updates, harness/tool changes), and what artifacts should be versioned and audited over time?

Investor overlay

Read-throughs

  • Agentic engineering spend may concentrate on execution and verification infrastructure, since iterative code execution is framed as the enabling capability and production credibility depends on tests and harnesses rather than one-shot code generation.
  • Organizations may formalize governance boundaries separating prototype "vibe coding" from production agentic workflows, creating demand for review, documentation, ownership, and gating processes that operationalize that distinction.
  • Product differentiation may shift from agent labeling to measurable capabilities in tool reliability, loop-iteration quality, and evaluation gates, since the term "agent" is ambiguous and the source urges capability-based assessment.

What would confirm

  • Published before-and-after metrics from teams adopting coding agents with execution and verification loops, showing changes in throughput, defect rates, and rework, aligned with the unknowns called out above.
  • Evidence that teams are investing in and standardizing verification harnesses, tests, and evaluation gates to make agent-written code production-credible, including clarity on build and maintenance costs.
  • Process artifacts and governance that explicitly separate prototype "vibe coding" from production, including required reviews, tests, documentation, ownership, and auditable, versioned harness and instruction updates.

What would kill

  • Empirical results show no measurable improvement or worse outcomes in throughput, defect rates, or rework after adopting coding agents even with execution and verification loops.
  • Verification harnesses and evaluation gates prove too costly or brittle to build and maintain, preventing agent-written code from reaching production-credible standards at scale.
  • Scaling is dominated by unresolved bottlenecks such as tool unreliability, constrained execution environments, limited review capacity, or poor requirements quality, limiting practical deployment despite working demos.

Sources

  1. 2026-03-15 simonwillison.net