Training-Data Representation vs. Agent Execution Capability
Sources: 1 • Confidence: High • Updated: 2026-04-13 03:57
Key takeaways
- The author reports being unsure whether training-data representation still determines how well current models, running in strong coding-agent harnesses, can help with a given tool or language.
- A referenced study titled "What Claude Code Actually Chooses" reportedly found that, across more than 2,000 prompts, Claude Code showed a strong build-over-buy bias and a preferred stack where GitHub Actions, Stripe, and shadcn/ui had near-monopoly positions in their categories.
- A rapidly adopted "Skills" mechanism in coding-agent tools is enabling projects to publish official skills that help agents use their products.
- The author reports being surprised that coding agents do not push them toward a "Choose Boring Technology" approach; in practice, agents do not materially constrain their technology choices.
- Prompting a coding agent to run a new tool's "--help" (and similar commands) can pull enough documentation into a modern context window for the agent to use brand-new tools effectively.
Sections
Training-Data Representation vs. Agent Execution Capability
- The author reports being unsure whether training-data representation still determines how well current models, running in strong coding-agent harnesses, can help with a given tool or language.
- The author reports being surprised that coding agents do not push them toward a "Choose Boring Technology" approach; in practice, agents do not materially constrain their technology choices.
- Prompting a coding agent to run a new tool's "--help" (and similar commands) can pull enough documentation into a modern context window for the agent to use brand-new tools effectively.
- In codebases that use private or very new libraries absent from training data, coding agents can still work by learning patterns from existing code and by iterating on and testing their output.
- A couple of years ago, LLMs appeared to perform noticeably better when asked about Python or JavaScript than when asked about less widely used programming languages.
Recommendation Bias vs. Performance Under Constraints
- A referenced study titled "What Claude Code Actually Chooses" reportedly found that, across more than 2,000 prompts, Claude Code showed a strong build-over-buy bias and a preferred stack where GitHub Actions, Stripe, and shadcn/ui had near-monopoly positions in their categories.
- The author distinguishes between which technologies LLMs recommend and how well agents perform when humans choose technologies other than those the model or harness would prefer.
Agent Integrations As An Ecosystem Distribution Channel
- A rapidly adopted "Skills" mechanism in coding-agent tools is enabling projects to publish official skills that help agents use their products.
Unknowns
- Across current coding-agent harnesses, how much does training-data representation still affect task success rates for low-popularity languages and tools when controlling for documentation availability and test coverage?
- When agents recommend a concentrated default stack, how often do teams follow those recommendations in real workflows, and does that meaningfully change procurement or architecture decisions?
- Do official "Skills" measurably improve agent success rates (time-to-first-working-change, iterations to pass tests, error rates) compared with relying on CLI help and in-repo examples alone?
- Under what conditions do the described mechanisms fail (e.g., sparse docs, missing tests, ambiguous CLI output), and how frequently do those conditions occur in typical enterprise or open-source repositories?
- Is the reported build-over-buy bias stable across prompt wording, organizational constraints (approved vendors), and model versions, or is it highly sensitive to evaluation setup?