Rosa Del Mar

Daily Brief

Issue 68 2026-03-09

Training-Data Bias Vs Agent-Era Mitigation Mechanisms

6 min read
General
Sources: 1 • Confidence: High • Updated: 2026-04-12 10:23

Key takeaways

  • The author questions whether training-data representation still determines how well models help with a given tool or language now that strong coding-agent harnesses are available.
  • A referenced study titled "What Claude Code Actually Chooses" reports that after more than 2,000 prompts, Claude Code showed a strong build-over-buy bias and a preferred stack where GitHub Actions, Stripe, and shadcn/ui had a near monopoly in their categories.
  • A rapidly adopted "Skills" mechanism in coding-agent tools is enabling projects to publish official skills that help agents use their products.
  • The author reports being surprised that coding agents do not materially constrain their technology choices or push them toward a "Choose Boring Technology" approach.
  • If a coding agent is prompted to run a new tool's "--help" and similar commands, the resulting documentation can fit within modern context windows and be sufficient for the agent to use the tool effectively.

Sections

Training-Data Bias Vs Agent-Era Mitigation Mechanisms

  • The author questions whether training-data representation still determines how well models help with a given tool or language now that strong coding-agent harnesses are available.
  • The author reports being surprised that coding agents do not materially constrain their technology choices or push them toward a "Choose Boring Technology" approach.
  • If a coding agent is prompted to run a new tool's "--help" and similar commands, the resulting documentation can fit within modern context windows and be sufficient for the agent to use the tool effectively.
  • In codebases using private or very new libraries absent from training data, coding agents can still work by learning patterns from existing examples and iterating and testing their output.
  • LLM-assisted programming may steer technology choices toward tools that are well represented in training data, making adoption harder for newer tools.
  • A couple of years ago, models appeared to perform better when asked about Python or JavaScript than when asked about less widely used languages.
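The bootstrap-and-iterate pattern described in this section can be sketched in a few lines. The helper names (gather_tool_docs, iterate_until_tests_pass) and the loop structure below are illustrative assumptions, not anything from the referenced post:

```python
import subprocess

def gather_tool_docs(tool, subcommands=(), timeout=10):
    # Run `tool --help` (and optionally `tool <sub> --help`) and collect
    # the output so it can be placed in an agent's context window.
    docs = []
    invocations = [(tool, "--help")]
    invocations += [(tool, sub, "--help") for sub in subcommands]
    for args in invocations:
        try:
            result = subprocess.run(list(args), capture_output=True,
                                    text=True, timeout=timeout)
            # Some tools print help text to stderr instead of stdout.
            docs.append(result.stdout or result.stderr)
        except (OSError, subprocess.TimeoutExpired):
            docs.append("")
    return "\n\n".join(docs)

def iterate_until_tests_pass(generate, run_tests, max_attempts=5):
    # Generate a candidate change, run the tests, and feed any failure
    # output back into the next generation attempt.
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)
        passed, feedback = run_tests(candidate)
        if passed:
            return candidate, attempt
    return None, max_attempts
```

The point of the sketch: a tool's own help output is usually small enough to fit in a modern context window, and a test loop gives the agent a correction signal even for libraries absent from training data.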

Systematic Tool And Vendor Recommendation Concentration In Agent Tooling

  • A referenced study titled "What Claude Code Actually Chooses" reports that after more than 2,000 prompts, Claude Code showed a strong build-over-buy bias and a preferred stack where GitHub Actions, Stripe, and shadcn/ui had a near monopoly in their categories.
  • The author distinguishes between what technology LLMs recommend and how well agents perform when humans choose a different technology than the model or harness would prefer.

Skills And Official Integrations As A New Channel To Shape Agent Behavior

  • A rapidly adopted "Skills" mechanism in coding-agent tools is enabling projects to publish official skills that help agents use their products.
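The brief does not spell out the file format, but if the mechanism resembles the SKILL.md convention used by recent coding-agent tools, a vendor-published skill might look roughly like the following (the tool name, commands, and wording are all hypothetical):

```markdown
---
name: acme-deploy
description: Help the agent deploy services with the hypothetical `acme` CLI.
  Use when the user asks to deploy, roll back, or inspect an Acme environment.
---

# Acme deploy skill

Before making changes, run `acme --help` and `acme deploy --help` to confirm
available flags. Always run `acme plan --env <env>` and show the user the
output before running `acme deploy --env <env>`.
```

In this convention, the short frontmatter description tells the agent when to load the skill, while the body holds the detailed instructions, so a vendor can shape agent behavior without relying on training-data coverage.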

Unknowns

  • Across model generations and harnesses, how strongly does training-data representation predict task success when an agent is forced to use low-representation or novel tools?
  • How much do agent tool recommendations (preferred stacks, build-over-buy bias) actually influence real-world technology adoption, procurement, or architecture decisions?
  • When agents are bootstrapped via CLI help/manpages alone, what are the observed error modes and task boundaries (e.g., configuration, authentication, edge-case flags)?
  • In repositories with private or new libraries, what iteration count and test pass rates are typical for agent-generated changes, and how does this vary by codebase quality and test coverage?
  • Does publishing official Skills measurably improve agent success rates and/or increase selection of the corresponding tool compared to alternatives without Skills?

Investor overlay

Read-throughs

  • Coding agent ecosystems may concentrate tool and vendor selection around a few default stacks, potentially shaping demand toward those vendors if agent recommendations influence real adoption decisions.
  • Official, publishable Skills could become a new distribution channel for developer tooling, with vendors that ship high-quality Skills gaining better agent success and potentially higher selection rates.
  • If agents can reliably bootstrap unfamiliar tools via CLI help and repo examples, training-data coverage may matter less for execution, reducing the moat from language or tool representation and increasing competition among tools.

What would confirm

  • Independent measurements show agent tool recommendations strongly correlate with real-world tool adoption, procurement, or architecture choices across teams using coding agents.
  • Usage data from agent platforms shows official Skills measurably increase task success rates and increase selection of the corresponding tool versus alternatives without Skills.
  • Benchmarks across model generations show stable success on low-representation or novel tools when constrained, supported by CLI help and tests with acceptable iteration counts and pass rates.

What would kill

  • Studies show agent recommendation biases do not translate into real tool adoption decisions, with humans overriding agent-suggested stacks most of the time.
  • Official Skills adoption fails to improve agent outcomes or does not change tool selection, indicating minimal distribution impact.
  • When limited to CLI help and repo examples, agents show frequent failure modes in configuration, authentication, or edge cases, implying training-data coverage still dominates execution quality.

Sources

  1. 2026-03-09 simonwillison.net