Rosa Del Mar

Daily Brief

Issue 58 2026-02-27

Agentic Coding Outputs And Scope Expansion

6 min read
Sources: 1 • Confidence: Medium • Updated: 2026-04-13 03:43

Key takeaways

  • Max Woolf describes a sequence of coding-agent projects that increase in ambition from simple YouTube metadata scrapers to substantially larger builds.
  • Max Woolf states that, in his view, Opus 4.5 and later models are an order of magnitude better at coding than models from just months earlier, while acknowledging that making that claim publicly can sound like hype.
  • Max Woolf claims that, using agents, he is developing a Rust crate named "rustlearn" that implements fast versions of standard machine-learning algorithms including logistic regression and k-means.
  • Max Woolf reports attempting to break Opus and Codex with complex tasks that would take him months alone, and reports that the models kept completing those tasks correctly.
  • The post is positioned within a growing genre asserting that coding agents became notably effective around November, implying a perceived recent inflection in capability.

Sections

Agentic Coding Outputs And Scope Expansion

  • Max Woolf describes a sequence of coding-agent projects that increase in ambition from simple YouTube metadata scrapers to substantially larger builds.
  • Simon Willison reports that Claude Code successfully produced a Rust word-cloud CLI tool after he asked it to build one.
  • Max Woolf frames porting scikit-learn to Rust with comparable features as an extremely ambitious task.

Perceived Recent Capability Inflection And Credibility Gap

  • Max Woolf states that, in his view, Opus 4.5 and later models are an order of magnitude better at coding than models from just months earlier, while acknowledging that making that claim publicly can sound like hype.
  • The post is positioned within a growing genre asserting that coding agents became notably effective around November, implying a perceived recent inflection in capability.

Agent-Enabled Reimplementation/Porting Of Core ML Algorithms Into Rust With Performance Claims

  • Max Woolf claims that, using agents, he is developing a Rust crate named "rustlearn" that implements fast versions of standard machine-learning algorithms including logistic regression and k-means.
  • Max Woolf asserts that his described three-step pipeline can outperform scikit-learn implementations even for simpler algorithms.
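The "rustlearn" crate itself is not cited as public, so as a point of reference only, here is a minimal pure-Python sketch of Lloyd's-algorithm k-means, one of the "standard machine-learning algorithms" the post describes porting. This is illustrative and does not reflect the actual rustlearn implementation or pipeline.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # naive init: k distinct data points
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(vals) / len(cl) for vals in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids

# Two well-separated clusters recover their means:
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents = sorted(kmeans(pts, 2))  # two centroids near (1/3, 1/3) and (31/3, 31/3)
```

A Rust port of exactly this loop is where a systems language can pay off: the per-point distance computation vectorizes and parallelizes well, which is presumably the kind of headroom behind the performance claims.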

Long-Horizon Task Robustness (Anecdotal)

  • Max Woolf reports attempting to break Opus and Codex with complex tasks that would take him months alone, and reports that the models kept completing those tasks correctly.

Watchlist

  • Whether the asserted November inflection in coding-agent capability gains objective corroboration (e.g., benchmark deltas over time) beyond the growing genre of posts claiming it.

Unknowns

  • Are the referenced artifacts (the "rustlearn" crate and the Rust word-cloud CLI) publicly available with reproducible build steps, tests, and CI?
  • What exactly is the described three-step pipeline, and under what conditions does it outperform scikit-learn (datasets, metrics, hardware, hyperparameters, preprocessing)?
  • What are the acceptance criteria for "completing correctly" on the months-scale tasks, and how often do such tasks fail without human patching?
  • Is there objective evidence for the claimed timing and magnitude of a coding-agent capability inflection (e.g., benchmark deltas over time), rather than a narrative impression?
  • What operational constraints dominate these workflows (tooling setup, context management, rate limits, model costs, review practices), and how do they scale with project size?
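The benchmarking unknown above has a standard shape regardless of what the pipeline turns out to be: fixed input data, warm-up runs, repeated timing, and a robust aggregate. A minimal sketch with stand-in workloads; the real comparison would pit the rustlearn crate against scikit-learn on stated datasets and hardware, neither of which is public here.

```python
import random
import statistics
import time

def bench(fn, data, repeats=5, warmup=1):
    """Median wall-clock time of fn(data) over several runs."""
    for _ in range(warmup):           # warm caches before timing
        fn(data)
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(data)
        times.append(time.perf_counter() - t0)
    return statistics.median(times)   # median resists outlier runs

# Stand-in workloads: the same computation, two implementations.
def sum_squares_loop(xs):
    total = 0.0
    for x in xs:
        total += x * x
    return total

def sum_squares_gen(xs):
    return sum(x * x for x in xs)

rng = random.Random(0)                # fixed seed => reproducible input
data = [rng.random() for _ in range(100_000)]
```

A checkable performance claim would report `bench(sum_squares_loop, data)` versus `bench(sum_squares_gen, data)` alongside dataset size, hardware, and the exact call being timed; the same discipline applied to the Rust crate versus scikit-learn is what the Unknowns above ask for.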

Investor overlay

Read-throughs

  • If coding agents are materially improving and being used for larger projects, demand could rise for developer tooling that supports agent workflows (orchestration, testing, and code review), contingent on reproducible outcomes beyond anecdotes.
  • If agent-assisted porting of ML algorithms into Rust is credible and faster than prior workflows, interest may rise in performance-oriented, systems-language ML tooling, contingent on public code and benchmark validation.
  • If a recent inflection in coding-agent capability is real, organizations may expand the scope of automation to longer-horizon engineering tasks, increasing the need for governance and verification tooling, contingent on measured reliability and failure rates.

What would confirm

  • Public availability of the referenced Rust artifacts with reproducible builds, tests, and CI that demonstrate correctness on defined tasks with minimal human patching.
  • Transparent benchmarks showing when and how the Rust ML crate outperforms scikit-learn, including datasets, metrics, hardware, preprocessing, and hyperparameters.
  • Objective time-series evidence of coding-agent capability improvement around the claimed period, such as benchmark deltas or tracked task success rates across model versions.

What would kill

  • Artifacts are not publicly available or lack reproducible build steps, tests, and CI, making performance and correctness claims unverifiable.
  • Benchmarks fail to reproduce or show no consistent advantage versus scikit-learn once conditions are specified, or require extensive manual intervention.
  • Measured evaluations show no step change in capability, or show high failure rates on long-horizon tasks without substantial human steering and correction.

Sources