Rosa Del Mar

Daily Brief

Issue 58 • 2026-02-27

Agent-Assisted Porting/Rewriting Into Rust And Claims Of Performance Wins

Issue 58 • 2026-02-27 • 5 min read
Sources: 1 • Confidence: Medium • Updated: 2026-04-12 10:09

Key takeaways

  • Simon Willison reports he asked Claude Code to build a Rust word-cloud CLI tool and that Claude Code successfully produced it.
  • Max Woolf states he believes Opus 4.5 and later models are an order of magnitude better at coding than models from just months earlier, while acknowledging that saying so publicly can sound like hype.
  • The post is presented as part of a broader narrative claiming coding agents became notably effective around November.
  • Max Woolf describes a sequence of coding-agent projects that increase in ambition from simple scripts (e.g., YouTube metadata scraping) to substantially larger builds.

Sections

Agent-Assisted Porting/Rewriting Into Rust And Claims Of Performance Wins

  • Simon Willison reports he asked Claude Code to build a Rust word-cloud CLI tool and that Claude Code successfully produced it.
  • Max Woolf claims he is using agents to develop a Rust crate named "rustlearn" that implements fast versions of standard machine-learning algorithms such as logistic regression and k-means (an illustrative sketch of one such algorithm follows this list).
  • Porting Python's scikit-learn to Rust with comparable features is characterized in the post as an extremely ambitious task.
  • Max Woolf asserts that a three-step pipeline described in the post can outperform scikit-learn's implementations even for simpler algorithms.
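
For scale, below is a minimal sketch of the kind of algorithm such a crate would contain: naive k-means (Lloyd's algorithm) over plain `Vec<f64>` rows. This is illustrative only; the function names and signatures are assumptions rather than the actual "rustlearn" API, and a competitive implementation would need parallelism, SIMD, and stronger initialization (e.g., k-means++) to support the performance claims.

```rust
// Illustrative sketch only: naive Lloyd's-algorithm k-means, NOT the actual
// "rustlearn" code or API described in the post.

fn squared_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Runs k-means for `max_iters` iterations and returns (centroids, assignments).
/// The first `k` points seed the centroids, a deliberately simple choice.
fn kmeans(data: &[Vec<f64>], k: usize, max_iters: usize) -> (Vec<Vec<f64>>, Vec<usize>) {
    assert!(k > 0 && data.len() >= k);
    let dim = data[0].len();
    let mut centroids: Vec<Vec<f64>> = data[..k].to_vec();
    let mut assignments = vec![0usize; data.len()];

    for _ in 0..max_iters {
        // Assignment step: each point goes to its nearest centroid.
        for (i, point) in data.iter().enumerate() {
            assignments[i] = (0..k)
                .min_by(|&a, &b| {
                    squared_distance(point, &centroids[a])
                        .partial_cmp(&squared_distance(point, &centroids[b]))
                        .unwrap()
                })
                .unwrap();
        }
        // Update step: recompute each centroid as the mean of its assigned points.
        let mut sums = vec![vec![0.0; dim]; k];
        let mut counts = vec![0usize; k];
        for (point, &cluster) in data.iter().zip(&assignments) {
            counts[cluster] += 1;
            for (s, &x) in sums[cluster].iter_mut().zip(point) {
                *s += x;
            }
        }
        for (centroid, (sum, &count)) in centroids.iter_mut().zip(sums.iter().zip(&counts)) {
            if count > 0 {
                for (c, &s) in centroid.iter_mut().zip(sum) {
                    *c = s / count as f64;
                }
            }
        }
    }
    (centroids, assignments)
}

fn main() {
    // Two obvious clusters around (0, 0) and (10, 10).
    let data = vec![
        vec![0.0, 0.1], vec![0.2, 0.0], vec![0.1, 0.2],
        vec![10.0, 10.1], vec![9.9, 10.0], vec![10.2, 9.8],
    ];
    let (centroids, assignments) = kmeans(&data, 2, 20);
    println!("centroids: {:?}", centroids);
    println!("assignments: {:?}", assignments);
}
```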

Anecdotal Step-Change In Coding-Agent Capability And Long-Horizon Robustness

  • Max Woolf states he believes Opus 4.5 and later models are an order of magnitude better at coding than models from just months earlier.
  • The post is presented as part of a broader narrative claiming coding agents became notably effective around November.
  • Max Woolf reports he tried to break Opus and Codex with complex tasks that would have taken him months on his own, but the models kept completing them correctly.

Credibility And Adoption Friction From Rapid Claims Vs. Verification

  • Max Woolf acknowledges that publicly claiming Opus 4.5 and later models are an order of magnitude better at coding than models from just months earlier can sound like hype, even though he believes the claim is accurate.

Watchlist

  • The post is presented as part of a broader narrative claiming coding agents became notably effective around November.

Unknowns

  • Are the referenced artifacts (e.g., the Rust crate and the Rust CLI tool) publicly available with reproducible build steps and licenses?
  • What acceptance tests or correctness criteria were used to judge that complex tasks were completed correctly, and what was the observed failure rate?
  • Which exact models/versions and tool configurations correspond to the reported improvement and robustness claims, and what changed across time?
  • What is the described three-step pipeline, and under what conditions does it allegedly outperform scikit-learn?
  • How representative are the reported tasks of typical production engineering work (integration, legacy constraints, security, deployment, observability)?

Investor overlay

Read-throughs

  • If coding agents are materially better at porting and rewriting into Rust, demand could rise for tools that automate code migration, testing, and performance validation in Rust ecosystems.
  • If anecdotes of step-change capability are real, software delivery timelines for internal tooling and niche CLIs could compress, increasing the value of agent-centric developer workflows and orchestration.
  • If credibility friction persists, the market may reward vendors who provide reproducible artifacts, benchmarks, and acceptance tests that turn agent success stories into verifiable outcomes.

What would confirm

  • Public repos or artifacts for the Rust CLI tool or crate with reproducible build steps, clear licenses, and automated tests that pass across environments.
  • Specific model and tool configurations disclosed alongside documented task specs, acceptance criteria, and measured failure rates over multiple projects (a sketch of one such acceptance test follows this list).
  • Benchmark results for the described pipeline showing when it outperforms scikit-learn, with datasets, methodology, and reproducible code.
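
Since none of the cited artifacts are public, the following is only a sketch of what one such acceptance test could look like: a Rust unit test that feeds a fixture with two known clusters to a hypothetical `kmeans` function (same signature as the sketch above, assumed to be in scope) and asserts the recovered centroids fall within a tolerance of the true centers. The fixture, tolerance, and function name are assumptions; published pass/fail criteria of this kind are what would move the claims from anecdote to evidence.

```rust
// Illustrative acceptance test only; the `kmeans` function under test and its
// signature are assumed (see the sketch earlier), not taken from any published artifact.

/// Returns true if some recovered centroid lies within `tol` (Euclidean) of `expected`.
fn has_centroid_near(centroids: &[Vec<f64>], expected: &[f64], tol: f64) -> bool {
    centroids.iter().any(|c| {
        c.iter()
            .zip(expected)
            .map(|(a, b)| (a - b) * (a - b))
            .sum::<f64>()
            .sqrt()
            < tol
    })
}

#[test]
fn kmeans_recovers_two_well_separated_clusters() {
    // Fixture: two tight clusters around (0, 0) and (10, 10).
    let data = vec![
        vec![0.0, 0.1], vec![0.2, 0.0], vec![0.1, 0.2],
        vec![10.0, 10.1], vec![9.9, 10.0], vec![10.2, 9.8],
    ];
    // `kmeans` is the hypothetical implementation under test.
    let (centroids, assignments) = kmeans(&data, 2, 50);

    // Acceptance criteria: both true centers recovered to within 0.5,
    // and every point assigned to some cluster.
    assert!(has_centroid_near(&centroids, &[0.1, 0.1], 0.5));
    assert!(has_centroid_near(&centroids, &[10.0, 10.0], 0.5));
    assert_eq!(assignments.len(), data.len());
}
```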

What would kill

  • Inability to produce publicly verifiable artifacts, tests, or benchmarks for the cited Rust projects, leaving performance and correctness claims uncheckable.
  • High or inconsistent failure rates on multi-step projects once acceptance tests are applied, contradicting claims of long-horizon robustness.
  • Failure of the claimed outperformance versus scikit-learn to reproduce under the stated conditions, or dependence on undisclosed constraints or cherry-picked setups.

Sources