Rosa Del Mar

Daily Brief

Issue 68 2026-03-09

Training-Data Representation vs. Agent-Harness Mitigation

General
Sources: 1 • Confidence: Medium • Updated: 2026-03-10 08:29

Key takeaways

  • If prompted to run a new tool's "--help" and similar commands, a coding agent can pull enough documentation into a modern context window to use even brand-new tools effectively.
  • In the author's experience, working with coding agents did not materially push the author's technology choices toward a "Choose Boring Technology" stack.
  • A rapidly adopted "Skills" mechanism in coding-agent tools enables projects to publish official skills that help agents use their products.
  • With the latest models running in strong coding-agent harnesses, training-data representation may no longer be the dominant determinant of how well models help with a given tool or language.
  • In codebases that use private or very new libraries not present in training data, coding agents can still make progress by learning patterns from existing code examples and then iterating and testing to close gaps.

Sections

Training-Data Representation vs. Agent-Harness Mitigation

  • If prompted to run a new tool's "--help" and similar commands, a coding agent can pull enough documentation into a modern context window to use even brand-new tools effectively.
  • In codebases that use private or very new libraries not present in training data, coding agents can still make progress by learning patterns from existing code examples and then iterating and testing to close gaps.
  • A couple of years ago, LLMs appeared to perform better on questions about Python or JavaScript than on questions about less widely used programming languages.
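The "--help" mechanism above can be sketched in a few lines: a harness runs a tool's self-documentation commands and collects the output as context material for the model. This is a minimal illustration, not the actual implementation of any agent product; `gather_tool_docs` and its signature are invented for this sketch.

```python
import subprocess
import sys

def gather_tool_docs(tool_cmd, help_flags=("--help",)):
    """Run a tool's self-documentation commands and collect the output,
    so a harness could place it in an agent's context window.
    Hypothetical helper for illustration only."""
    docs = []
    for flag in help_flags:
        try:
            result = subprocess.run(
                [*tool_cmd, flag],
                capture_output=True, text=True, timeout=10,
            )
            # Some tools print help to stderr rather than stdout.
            docs.append(result.stdout or result.stderr)
        except (OSError, subprocess.TimeoutExpired):
            continue  # Tool missing or hung; skip this flag.
    return "\n\n".join(docs)

# Example: harvest the Python interpreter's own help text as context material.
help_text = gather_tool_docs([sys.executable])
```

In a real harness the collected text would be appended to the agent's prompt; the point of the post's observation is that modern context windows are large enough for this to substitute for pretrained familiarity.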

Recommendation Bias vs. Execution Under Constraints

  • In the author's experience, working with coding agents did not materially push the author's technology choices toward a "Choose Boring Technology" stack.
  • A referenced study titled "What Claude Code Actually Chooses" reports that, across more than 2,000 prompts, Claude Code showed a strong build-over-buy bias and a preferred stack in which GitHub Actions, Stripe, and shadcn/ui held a near-monopoly within their categories.
  • Technology recommendation bias by LLMs and agent performance under a human-imposed technology choice are distinct questions and should be evaluated separately.

Emerging Distribution Channel Via Agent Skills

  • A rapidly adopted "Skills" mechanism in coding-agent tools enables projects to publish official skills that help agents use their products.
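For a concrete sense of the mechanism, here is a minimal sketch of what a published skill might look like, following the SKILL.md convention used by the Skills feature in coding-agent tools (a directory containing a markdown file with YAML frontmatter). The tool name and commands below are invented for illustration; consult a vendor's actual published skill for the real format details.

```markdown
---
name: acme-deploy
description: Deploy services with the (hypothetical) acme CLI. Use when the
  user asks to deploy or roll back an acme-managed service.
---

# Using the acme CLI

1. Run `acme --help` first to confirm the installed version's flags.
2. Deploy with `acme deploy <service>`; verify with `acme status <service>`.
3. To roll back, run `acme rollback <service> --to <revision>`.
```

The frontmatter `description` is what lets the agent decide when the skill is relevant; the body is loaded into context only when the skill is invoked, which keeps the default context footprint small.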

Unknowns

  • Do modern coding agents still show systematically higher success rates on widely represented languages/tools compared to low-representation or novel tools, when evaluated under the same harness and workflow?
  • How much of agent competence on brand-new tools can be explained by pulling local documentation into context (e.g., "--help"/manpages) versus reliance on pretrained knowledge?
  • For private or novel internal libraries, what iteration counts and test-pass rates do coding agents achieve in practice, and what failure modes dominate?
  • How stable are tool/vendor recommendation biases (including build-over-buy bias and stack monopolies) across model versions, different coding-agent products, and different prompt/harness defaults?
  • What is the measurable impact of publishing official "Skills" on agent success rates and on downstream tool/vendor selection frequency?

Investor overlay

Read-throughs

  • Coding agent performance may become less dependent on pretrained coverage as harnesses pull documentation into context and iterate against tests, potentially improving tooling parity for newer or niche developer products.
  • A Skills distribution layer could become a meaningful channel for developer tools to improve agent usability and influence which tools get used, independent of training-data representation.
  • Agent recommendation concentration and build-over-buy bias may shape tool adoption even when agents can execute on diverse stacks, impacting competitive dynamics among vendors.

What would confirm

  • Benchmarks under the same harness show reduced performance gaps between widely represented and low-representation languages or tools, especially after adding local docs and iterative testing.
  • Measured lift in agent task success rates after vendors publish official Skills, and an observable increase in tool usage frequency in agent workflows tied to those Skills.
  • Stable evidence across models and products that agent-generated stack recommendations concentrate on a small set of vendors and favor building over buying.

What would kill

  • Controlled evaluations show training-data representation remains the dominant driver of agent success even with strong harnesses, doc retrieval, and test-driven iteration.
  • Skills publication shows little or no improvement in agent success rates or does not change downstream tool selection frequency in real workflows.
  • Recommendation concentration and build-over-buy bias fail to replicate across model versions, agent products, or default harness settings, indicating weak or transient effects.

Sources

  1. simonwillison.net (2026-03-09)