Rosa Del Mar

Daily Brief

Issue 74 • 2026-03-15

Enabling Condition And Workflow Requirements: Execution And Verification

General
Sources: 1 • Confidence: High • Updated: 2026-04-12 10:26

Key takeaways

  • LLMs do not learn from past mistakes during usage, but coding-agent performance can improve if humans update instructions and tool harnesses based on lessons learned.
  • Agentic engineering is the practice of developing software with the assistance of coding agents.
  • An agent runs tools in a loop to achieve a goal.
  • Even if agents can write working code, software engineering still requires navigating many solution options and tradeoffs to decide what code to write for specific circumstances and requirements.
  • The term "vibe coding" was coined by Andrej Karpathy in February 2025 to describe prompting LLMs to write code while the user "forgets that the code even exists."

Sections

Enabling Condition And Workflow Requirements: Execution And Verification

  • LLMs do not learn from past mistakes during usage, but coding-agent performance can improve if humans update instructions and tool harnesses based on lessons learned.
  • Code execution is the defining capability that enables agentic engineering by allowing iteration toward demonstrably working software rather than unvalidated outputs.
  • Getting strong results from coding agents requires providing appropriate tools, specifying problems at the right level of detail, and verifying and iterating on outputs until they are robust and credible.
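The verify-and-iterate workflow above can be sketched in code. This is a minimal illustration, not any specific vendor's harness: a hypothetical `verify` helper runs a candidate implementation together with its tests in a subprocess, so an agent (or human) can keep iterating until the code demonstrably passes rather than accepting unvalidated output.

```python
# Hedged sketch of a verification step for agent-written code: execute the
# candidate plus its tests in a fresh subprocess and report pass/fail.
# `verify`, and the sample snippets below, are illustrative assumptions.
import os
import subprocess
import sys
import tempfile

def verify(candidate_code: str, test_code: str) -> bool:
    """Write candidate + tests to a temp file and run them; True if tests pass."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n" + test_code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            timeout=30,  # bound runaway executions
        )
        return proc.returncode == 0
    finally:
        os.unlink(path)

good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
tests = "assert add(2, 3) == 5\n"

print(verify(good, tests), verify(bad, tests))  # prints: True False
```

A real harness would add sandboxing and resource limits, but the core loop is the same: run, check, and only accept code once the checks pass.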

Definitions: Agentic Engineering And Coding Agents

  • Agentic engineering is the practice of developing software with the assistance of coding agents.
  • Coding agents can both write and execute code; examples include Claude Code, OpenAI Codex, and Gemini CLI.

Core Mechanism: Tool-Use Loop Architecture

  • An agent runs tools in a loop to achieve a goal.
  • In an LLM-based agent loop, software calls an LLM with a prompt and tool definitions, executes the requested tools, and feeds the tool results back into the LLM.
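The loop described above can be sketched in a few lines. Everything here is a stand-in: `call_llm` stubs out a real chat-completion API, and the single `add` tool is a placeholder for real tool definitions, but the control flow (call model, execute requested tool, feed the result back, repeat until a final answer) is the one the bullet describes.

```python
# Minimal sketch of an LLM tool-use loop. `call_llm` is a stub standing in
# for a real model API call with a prompt and tool definitions.
import json

def call_llm(messages, tools):
    # Stub: requests one tool call the first time, then emits a final answer
    # once a tool result is present in the conversation.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "add", "arguments": {"a": 2, "b": 3}}}
    return {"content": "The sum is 5."}

TOOLS = {"add": lambda a, b: a + b}  # hypothetical tool registry

def run_agent(prompt):
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = call_llm(messages, tools=list(TOOLS))
        if "tool_call" not in reply:
            return reply["content"]  # no tool requested: final answer
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["arguments"])
        # Feed the tool result back into the model and loop again.
        messages.append({"role": "tool", "content": json.dumps(result)})

print(run_agent("What is 2 + 3?"))  # prints: The sum is 5.
```

Production agents layer error handling, context management, and sandboxed execution onto this loop, but the architecture is the same.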

Human Role And Limits: Tradeoffs Remain Central

  • Even if agents can write working code, software engineering still requires navigating many solution options and tradeoffs to decide what code to write for specific circumstances and requirements.

Terminology Disputes: Agent And Vibe Coding

  • The term "vibe coding" was coined by Andrej Karpathy in February 2025 to describe prompting LLMs to write code while the user "forgets that the code even exists."

Watchlist

  • The guide "Agentic Engineering Patterns" is a work in progress and will add chapters and update existing ones as techniques and understanding evolve.

Unknowns

  • What measurable changes in delivery throughput, defect rates, and maintainability occur when teams adopt coding agents under the described verification-oriented workflow?
  • What concrete verification harness designs (tests, sandboxes, evals) are necessary and sufficient to make agent-driven code execution safe and reliable in practice?
  • What are the dominant failure modes in the tool-use loop (e.g., incorrect tool invocation, compounding errors across iterations), and how should they be detected and mitigated?
  • How much ongoing human effort is required to maintain and improve instruction sets and tool harnesses over time?
  • What taxonomy or criteria would resolve the ambiguity around what counts as an "agent" and what should be labeled "vibe coding" versus production-oriented practice?

Investor overlay

Read-throughs

  • Verification-oriented workflows become a gating requirement for coding agents, increasing demand for automated testing, sandboxing, eval tooling, and tool-execution infrastructure tied to agent loops.
  • Agentic engineering shifts effort toward problem specification and evaluation, creating services and tooling opportunities around instruction set maintenance, harness updates, and workflow integration rather than purely model capability.
  • Terminology ambiguity around "agent" versus "vibe coding" drives market demand for clearer standards, taxonomies, and measurable performance reporting for agent tools used in production settings.

What would confirm

  • Teams report measurable improvements after adopting coding agents specifically when paired with repeated execution and verification, with metrics tracking throughput, defect rates, and maintainability.
  • Publishers and vendors standardize verification harness designs, including tests, sandboxes, and evals, and show they are necessary and sufficient for safe, reliable agent-driven code execution.
  • Clear failure-mode monitoring for tool-use loops becomes common, with documented detection and mitigation for incorrect tool invocation and compounding errors across iterations.

What would kill

  • Measured outcomes show no sustained throughput or quality gains from coding agents even with strong verification workflows, or maintainability worsens due to instruction and harness drift.
  • Verification harness requirements prove too costly or complex to maintain, with high ongoing human effort needed to keep instructions and tool interfaces effective over time.
  • Dominant tool-use loop failure modes remain unresolved or hard to detect, preventing safe, reliable execution at scale despite iterative validation.

Sources

  1. 2026-03-15 simonwillison.net