Rosa Del Mar

Daily Brief

Issue 58 2026-02-27

Agentic Coding Applied To Non-Trivial Rust Builds

Sources: 1 • Confidence: Medium • Updated: 2026-03-02 19:33

Key takeaways

  • Max Woolf describes a sequence of coding-agent projects that increase in ambition from simple scripts to substantially larger builds.
  • Max Woolf says he believes Opus 4.5 and later models are an order of magnitude better at coding than models from months earlier, and that making such a claim publicly is difficult without sounding like hype.
  • Max Woolf reports that he tried to break Opus and Codex with complex tasks that would each take him months on his own, but that the models kept completing them correctly.
  • The post is presented within a broader narrative that coding agents became notably effective around November.
  • Simon Willison reports that he asked Claude Code to build a Rust word-cloud CLI tool and that Claude Code successfully produced it.
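The word-cloud CLI itself is not shown in the brief, but its core is simple to picture: tokenize input, tally frequencies, and surface the most common terms. A minimal sketch of that core, in illustrative Rust (these function names are assumptions, not the actual tool Claude Code produced):

```rust
use std::collections::HashMap;

/// Lowercase each word, strip non-alphanumeric characters, and tally counts.
fn word_counts(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        let cleaned: String = word
            .chars()
            .filter(|c| c.is_alphanumeric())
            .flat_map(|c| c.to_lowercase())
            .collect();
        if !cleaned.is_empty() {
            *counts.entry(cleaned).or_insert(0) += 1;
        }
    }
    counts
}

/// Return the `n` most frequent words; ties broken alphabetically.
fn top_n(counts: &HashMap<String, usize>, n: usize) -> Vec<(String, usize)> {
    let mut pairs: Vec<_> = counts.iter().map(|(w, &c)| (w.clone(), c)).collect();
    pairs.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    pairs.truncate(n);
    pairs
}

fn main() {
    let text = "the quick brown fox jumps over the lazy dog The END";
    for (word, count) in top_n(&word_counts(text), 3) {
        println!("{word}: {count}");
    }
}
```

A real CLI would read from stdin or a file path and render sized glyphs, but the frequency table above is the part every word-cloud tool shares.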

Sections

Agentic Coding Applied To Non-Trivial Rust Builds

  • Max Woolf describes a sequence of coding-agent projects that increase in ambition from simple scripts to substantially larger builds.
  • Simon Willison reports that he asked Claude Code to build a Rust word-cloud CLI tool and that Claude Code successfully produced it.
  • Max Woolf states that, using agents, he is developing a Rust crate called "rustlearn" that implements fast versions of standard machine-learning algorithms including logistic regression and k-means.
  • Max Woolf frames porting scikit-learn to Rust with comparable features as an extremely ambitious task.
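To ground what "fast versions of standard machine-learning algorithms" means here, a minimal sketch of k-means (Lloyd's algorithm) in Rust follows. This is an illustration under stated assumptions, not the actual "rustlearn" API: it uses naive first-k initialization, where a production crate would use something like k-means++ and SIMD-friendly layouts.

```rust
/// Squared Euclidean distance between two points of equal dimension.
fn squared_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Lloyd's algorithm: assign each point to its nearest centroid, then move
/// each centroid to the mean of its assigned points, until labels stabilize.
/// Returns (centroids, labels). Initialization is naive (first k points).
fn kmeans(points: &[Vec<f64>], k: usize, max_iter: usize) -> (Vec<Vec<f64>>, Vec<usize>) {
    let mut centroids: Vec<Vec<f64>> = points.iter().take(k).cloned().collect();
    let mut labels = vec![0usize; points.len()];
    for _ in 0..max_iter {
        // Assignment step.
        let mut changed = false;
        for (i, p) in points.iter().enumerate() {
            let best = (0..k)
                .min_by(|&a, &b| {
                    squared_distance(p, &centroids[a])
                        .partial_cmp(&squared_distance(p, &centroids[b]))
                        .unwrap()
                })
                .unwrap();
            if labels[i] != best {
                labels[i] = best;
                changed = true;
            }
        }
        // Update step: each centroid becomes the mean of its cluster.
        let dim = points[0].len();
        let mut sums = vec![vec![0.0; dim]; k];
        let mut counts = vec![0usize; k];
        for (p, &l) in points.iter().zip(&labels) {
            counts[l] += 1;
            for (s, x) in sums[l].iter_mut().zip(p) {
                *s += x;
            }
        }
        for (c, (s, &n)) in centroids.iter_mut().zip(sums.iter().zip(&counts)) {
            if n > 0 {
                for (cj, sj) in c.iter_mut().zip(s) {
                    *cj = sj / n as f64;
                }
            }
        }
        if !changed {
            break;
        }
    }
    (centroids, labels)
}

fn main() {
    // Two well-separated 2-D blobs; k-means should split them cleanly.
    let points = vec![
        vec![0.0, 0.0], vec![0.1, 0.2], vec![0.2, 0.1],
        vec![9.0, 9.0], vec![9.1, 9.2], vec![8.9, 9.1],
    ];
    let (centroids, labels) = kmeans(&points, 2, 100);
    println!("labels: {labels:?}");
    println!("centroids: {centroids:?}");
}
```

The point of a sketch like this is scale: one algorithm fits in a page, while matching scikit-learn's breadth of estimators, options, and numerical edge cases is the "extremely ambitious" part.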

Perceived Recent Inflection And Skepticism/Credibility Gap

  • Max Woolf says he believes Opus 4.5 and later models are an order of magnitude better at coding than models from months earlier, and that making such a claim publicly is difficult without sounding like hype.
  • The post is presented within a broader narrative that coding agents became notably effective around November.

Claims Of Performance And Long-Horizon Correctness

  • Max Woolf reports that he tried to break Opus and Codex with complex tasks that would each take him months on his own, but that the models kept completing them correctly.
  • Max Woolf asserts that his described three-step pipeline can outperform scikit-learn implementations even for simpler algorithms.

Watchlist

  • The post is presented within a broader narrative that coding agents became notably effective around November.

Unknowns

  • Are the referenced artifacts (e.g., the "rustlearn" crate and the Rust word-cloud CLI) publicly available with reproducible build steps, tests, and CI?
  • What were the exact task specifications, acceptance criteria, and observed failure rates for the reported long-horizon "months of work" tasks?
  • What is the three-step pipeline referenced for the performance claim, and how does it control for fairness versus scikit-learn (same algorithmic variants, convergence criteria, and preprocessing)?
  • What concrete measurements support the narrative that coding agents became notably effective around November (benchmarks, real-world task completion rates, or internal productivity metrics)?
  • Which exact model versions and environments are being compared when claiming order-of-magnitude coding improvements, and are the comparisons controlled?

Sources