Rosa Del Mar

Daily Brief

Issue 68 2026-03-09

Training-Data Representation vs. Agent Execution Capability

General
Sources: 1 • Confidence: High • Updated: 2026-04-13 03:57

Key takeaways

  • The author reports being unsure whether training-data representation still determines how well current models, running in strong coding-agent harnesses, can help with a given tool or language.
  • A referenced study titled "What Claude Code Actually Chooses" reportedly found that, across more than 2,000 prompts, Claude Code showed a strong build-over-buy bias and a preferred stack where GitHub Actions, Stripe, and shadcn/ui had near-monopoly positions in their categories.
  • A rapidly adopted "Skills" mechanism in coding-agent tools is enabling projects to publish official skills that help agents use their products.
  • The author reports being surprised that coding agents, far from pushing them toward a "Choose Boring Technology" approach, no longer materially constrain their technology choices.
  • Prompting a coding agent to run a new tool's "--help" (and similar commands) can pull enough documentation into a modern context window for the agent to use brand-new tools effectively.

Sections

Training-Data Representation vs. Agent Execution Capability

  • The author reports being unsure whether training-data representation still determines how well current models, running in strong coding-agent harnesses, can help with a given tool or language.
  • The author reports being surprised that coding agents, far from pushing them toward a "Choose Boring Technology" approach, no longer materially constrain their technology choices.
  • Prompting a coding agent to run a new tool's "--help" (and similar commands) can pull enough documentation into a modern context window for the agent to use brand-new tools effectively.
  • In codebases using private or very new libraries absent from training data, coding agents can still work by learning patterns from existing examples and iterating and testing their output.
  • A couple of years ago, LLMs appeared to perform better when asked about Python or JavaScript than when asked about less widely used programming languages.

Recommendation Bias vs. Performance Under Constraints

  • A referenced study titled "What Claude Code Actually Chooses" reportedly found that, across more than 2,000 prompts, Claude Code showed a strong build-over-buy bias and a preferred stack where GitHub Actions, Stripe, and shadcn/ui had near-monopoly positions in their categories.
  • The author distinguishes between what technologies LLMs recommend and how well agents perform when humans choose different technologies than the model or harness would prefer.

Agent Integrations As An Ecosystem Distribution Channel

  • A rapidly adopted "Skills" mechanism in coding-agent tools is enabling projects to publish official skills that help agents use their products.

Unknowns

  • Across current coding-agent harnesses, how much does training-data representation still affect task success rates for low-popularity languages and tools when controlling for documentation availability and test coverage?
  • When agents recommend a concentrated default stack, how often do teams follow those recommendations in real workflows, and does that meaningfully change procurement or architecture decisions?
  • Do official "Skills" measurably improve agent success rates (time-to-first-working-change, iterations to pass tests, error rates) compared with relying on CLI help and in-repo examples alone?
  • Under what conditions do the described mechanisms fail (e.g., sparse docs, missing tests, ambiguous CLI output), and how frequently do those conditions occur in typical enterprise or open-source repositories?
  • Is the reported build-over-buy bias stable across prompt wording, organizational constraints (approved vendors), and model versions, or is it highly sensitive to evaluation setup?

Investor overlay

Read-throughs

  • Agent tool vendors may gain leverage by shipping official Skills integrations that make their products easier for agents to use, shifting competition from training-data representation toward agent-consumable interfaces and guidance.
  • If coding agents recommend a concentrated default stack, the most frequently suggested tools may see increased mindshare and potential usage through workflow defaulting, even if agents can execute on alternative stacks when constrained.
  • Strong agent harnesses plus CLI help and in-repo pattern learning may reduce switching costs for adopting new or less popular tools, potentially weakening incumbency advantages tied to training-data representation.

What would confirm

  • Independent measurements show official Skills materially improve agent outcomes such as time to first working change, iterations to pass tests, and error rates versus relying on CLI help and repository examples.
  • Workflow data shows teams frequently follow agent-recommended stacks and that this changes procurement or architecture choices, rather than remaining advisory or being overridden by governance constraints.
  • Controlled evaluations show similar task success across low-popularity and mainstream tools when documentation is accessible in context and tests are present, indicating that execution capability dominates training-data representation.

What would kill

  • Skills adoption stalls or does not improve measurable agent success rates versus baseline methods, suggesting integrations are not a durable distribution or differentiation channel.
  • Teams routinely ignore agent stack recommendations, or governance constraints dominate choices, resulting in no observable shift in tool adoption or procurement tied to agent suggestions.
  • Under common conditions such as sparse docs, missing tests, or ambiguous CLI output, agent performance degrades sharply and correlates strongly with prior training-data representation, limiting practical impact.

Sources

  1. 2026-03-09 simonwillison.net