Devtools Operating Constraints: Backlog Pressure And Dual-Track AI/Non-AI Support

Issue 58 • Edition 2026-02-27 • 8 min read

General

Sources: 1 • Confidence: Medium • Updated: 2026-03-02 13:06

Key takeaways

Burke Holland states that teams building reliability-critical software must develop internal AI workflows that preserve strict quality bars because they cannot tolerate regressions where core functionality breaks.
Burke Holland reports that the model referred to as Opus 4.5 was an inflection point for his coding workflow because, unlike earlier models, it could one-shot native Windows tooling with well-structured code.
Burke Holland states his greenfield workflow starts with an interactive plan mode to surface missing requirements before implementation.
Adam Stacoviak argues that critiques of agents failing at "software" often miss that different classes of software have different benchmarks, with personal/toy software differing from SLA-bound production systems.
Chris Kelly disputes media narratives equating fast revenue growth with business health when that growth is driven by pass-through payments to model providers.

Sections

Devtools Operating Constraints: Backlog Pressure And Dual-Track AI/Non-AI Support

Burke Holland states that teams building reliability-critical software must develop internal AI workflows that preserve strict quality bars because they cannot tolerate regressions where core functionality breaks.
Burke Holland states VS Code's open issue backlog grew from about 8,000 to about 15,000 year-over-year.
Burke Holland states the GitHub Copilot CLI reached general availability, making it accessible to enterprise customers that avoid preview releases.
Burke Holland states Copilot evolved from autocomplete-style ghost text about three years ago to full agentic workflows, and that VS Code moved from no agent mode a year ago to agentic capabilities including memory, deep research, and fleets of agents via Copilot CLI.
Burke Holland asserts developer tools must both pursue AI-focused updates to remain relevant and still support developers who want to be productive without AI.

Step-Change In Coding Capability Attributed To Opus 4.5

Burke Holland reports that the model referred to as Opus 4.5 was an inflection point for his coding workflow because, unlike earlier models, it could one-shot native Windows tooling with well-structured code.
Burke Holland reports he built a functioning prototype of a screen-capture-to-GIF tool and extended it toward basic screen video editing within a couple of hours using Opus 4.5.
A roughly 2x usage increase associated with Claude was observed, and Opus 4.5 is described as a major step-function improvement.
Adam Stacoviak states he has built more software in the last six months than in his entire life and attributes a major step-function shift to Opus 4.5.

Agentic Development Requires New Process Controls (Planning, Loop Control, Orchestration)

Burke Holland states his greenfield workflow starts with an interactive plan mode to surface missing requirements before implementation.
Burke Holland posits that scaling agent-built software may require an ongoing daily-standup-style review process rather than a single handoff.
Burke Holland asserts that in Copilot Autopilot, specifying an explicit confidence threshold (e.g., 95%) can drive the agent to iterate longer than vague instructions like "until it's done."
Burke Holland describes a custom Copilot agent called Anvil that classifies tasks as easy/medium/hard and changes the workflow accordingly, including use of sub-agents and multiple models for large refactors.
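
The two controls described above, difficulty-based routing and an explicit confidence threshold, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Anvil's or Copilot's actual implementation: the workflow table, the classification heuristic and its cutoffs, and the `step` callback that reports a confidence score are all hypothetical.

```python
from typing import Callable

# Hypothetical workflow table: step names are illustrative only,
# not the configuration of any real agent.
WORKFLOWS = {
    "easy": ["implement"],
    "medium": ["plan", "implement", "verify"],
    "hard": ["plan", "split_into_subagents", "implement",
             "cross_model_review", "verify"],
}


def classify(changed_files: int, touches_public_api: bool) -> str:
    """Toy difficulty heuristic; the cutoffs are assumptions."""
    if changed_files > 20 or (touches_public_api and changed_files > 5):
        return "hard"
    if changed_files > 3 or touches_public_api:
        return "medium"
    return "easy"


def iterate_until_confident(
    step: Callable[[int], float],
    threshold: float = 0.95,
    max_rounds: int = 10,
) -> tuple[int, float]:
    """Call step(round_no) until it reports confidence >= threshold.

    Returns (rounds_used, last_confidence). The round budget stops a
    run whose confidence never reaches the threshold.
    """
    confidence = 0.0
    for round_no in range(1, max_rounds + 1):
        confidence = step(round_no)
        if confidence >= threshold:
            return round_no, confidence
    return max_rounds, confidence


# Stand-in for an agent whose self-reported confidence rises per round.
reported = [0.6, 0.8, 0.96]
fake_step = lambda round_no: reported[round_no - 1]

print(WORKFLOWS[classify(30, False)])             # hard workflow steps
print(iterate_until_confident(fake_step, 0.75))   # (2, 0.8)
print(iterate_until_confident(fake_step, 0.95))   # (3, 0.96)
```

With the same stand-in agent, the stricter threshold costs one more round, which is the effect attributed to stating a number instead of "until it's done." A self-reported confidence score is only as trustworthy as the checks behind it, so a real setup would presumably tie it to tests or other verification.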

Personal Software Displacing Small SaaS Purchases

Adam Stacoviak argues that critiques of agents failing at "software" often miss that different classes of software have different benchmarks, with personal/toy software differing from SLA-bound production systems.
Adam Stacoviak reports he replaced an expensive invoicing service by letting an agent run unattended overnight to produce a working Rails app with authentication, invoices, email, and PDFs by morning.
Burke Holland reports he built an iOS app in an afternoon that generates Facebook post captions from sign photos using Gemini image analysis and conditions generation on the last ten accepted captions.
Burke Holland reports he replaced a paid driving-routing app by building a personal routing app and loading it directly onto a phone without App Store distribution.
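
The "last ten accepted captions" conditioning mentioned above amounts to a rolling few-shot window. The sketch below shows one way to keep such a window and fold it into a prompt. The class name, prompt wording, and sample strings are hypothetical, and the call to an image or text model is deliberately left out because the source does not describe it.

```python
from collections import deque


class CaptionHistory:
    """Keeps only the most recent accepted captions (assumed limit: 10)."""

    def __init__(self, limit: int = 10) -> None:
        # deque with maxlen silently drops the oldest entry when full.
        self._accepted: deque[str] = deque(maxlen=limit)

    def accept(self, caption: str) -> None:
        self._accepted.append(caption)

    def recent(self) -> list[str]:
        return list(self._accepted)

    def build_prompt(self, sign_description: str) -> str:
        """Assemble a prompt that shows prior captions as style examples."""
        examples = "\n".join(f"- {c}" for c in self._accepted)
        return (
            "Write a short social post caption for this sign.\n"
            f"Sign: {sign_description}\n"
            "Match the tone of these previously accepted captions:\n"
            f"{examples}"
        )


history = CaptionHistory()
for i in range(12):
    history.accept(f"cap-{i:02d}")  # placeholder captions

prompt = history.build_prompt("placeholder sign text")
print(len(history.recent()))  # 10
```

Because only accepted captions enter the window, the examples drift toward whatever the user keeps approving, without any explicit style configuration.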

Economics And Packaging: Request-Based Billing, Subsidization Risk, And Pass-Through Margins

Chris Kelly disputes media narratives equating fast revenue growth with business health when that growth is driven by pass-through payments to model providers.
Burke Holland asserts Copilot is priced at around $40/month under request-based billing, with about 1,500 premium requests included per month, where a single request may trigger many tool actions.
Chris Kelly asserts that selling AI coding assistants as discounted tokens can produce rapid revenue growth while sending most of that revenue, or even more than it, to model providers.
Burke Holland asserts that Claude Code at $200/month can be used at extremely high volume (on the order of a billion tokens per month) and that this level of subsidization is not sustainable long-term.
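
The figures quoted above support some back-of-envelope arithmetic. The sketch below uses only the approximate numbers as stated ($40 for about 1,500 premium requests, $200 for on the order of a billion tokens); the revenue and pass-through amounts in the margin example are hypothetical placeholders chosen to illustrate the point, not reported data.

```python
def unit_price(monthly_price: float, units: float) -> float:
    """Effective price per unit at full utilization of the plan."""
    return monthly_price / units


def gross_margin(revenue: float, pass_through_cost: float) -> float:
    """Share of revenue left after paying the model provider."""
    return (revenue - pass_through_cost) / revenue


# Figures as quoted in the interview summary (approximate).
per_premium_request = unit_price(40, 1_500)
per_million_tokens = unit_price(200, 1_000_000_000 / 1_000_000)

print(f"${per_premium_request:.4f} per premium request")  # $0.0267
print(f"${per_million_tokens:.2f} per million tokens")     # $0.20

# Hypothetical pass-through scenarios: revenue of 100 in both cases.
print(f"{gross_margin(100, 90):.0%}")   # 10%
print(f"{gross_margin(100, 120):.0%}")  # -20%
```

The per-request figure is an upper bound on price per tool action, since one request may trigger many actions. The margin examples show why revenue growth alone says little: if pass-through cost tracks or exceeds revenue, growth scales the payment to the model provider rather than the reseller's margin.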

Watchlist

Burke Holland is exploring workflows where an agent works unattended and pings him (e.g., via Telegram) for decisions, but he emphasizes this requires strong checks and balances to avoid constant babysitting.
Burke Holland warns that widespread AI code generation may devalue visible technical accomplishments and contribute to a loss of craftsmanship, raising the risk that the industry forgets how to build 'cathedral'-quality software.
Burke Holland warns that heavy agent use can become all-consuming and psychologically destabilizing, implying a need to monitor how agentic workflows affect attention, discipline, and well-being.

Unknowns

What objective benchmark or task-success data supports the claim that Opus 4.5 is a step-function improvement versus prior models for common dev tasks (web, native, refactors)?
What are the current official Copilot pricing tiers, what are the precise definitions and limits of "premium requests," and how do those requests map to tool actions?
Are Claude Code and comparable offerings meaningfully subsidized at current prices, and what policy or pricing changes would signal a shift toward cost being a binding constraint?
What quantitative evidence exists on quality outcomes (defects, regressions, security issues) for agent-generated code in reliability-critical products versus traditional workflows?
What tooling or practices best close the verification gap for native apps and non-browser contexts when agents generate large changes?

Investor overlay

Read-throughs

Devtools vendors may need dual-track support for AI-forward and traditional workflows, increasing operating complexity and backlog pressure, especially for reliability-critical software where regressions are unacceptable.
Agentic development may shift differentiation from base models to workflow controls such as planning, verification gates, routing, and loop control, favoring products that package process controls for high-reliability contexts.
AI product revenue growth could be overstated when driven by pass-through payments to model providers, making pricing units such as request-based billing and subsidization risk central to sustainability.

What would confirm

Devtools roadmaps and releases emphasize parallel AI and non-AI editing modes plus reliability safeguards, with disclosed metrics showing backlog stabilization or reduced regressions despite AI features.
Documented adoption of process-level controls in agent tooling such as plan-first modes, confidence thresholds, orchestration, and verification features, with reported improvements in defect and regression outcomes.
Clear disclosures of request definitions and limits plus pricing changes that tighten compute economics, alongside improved gross margins that indicate less pass-through and reduced subsidization.

What would kill

No measurable quality improvements emerge for agent-generated code in reliability-critical products, with persistent regressions or security issues leading teams to restrict AI use or revert to traditional workflows.
Request-based billing and premium request packaging fail to align with user value, driving churn, negative feedback, or increased support burden without offsetting monetization.
Revenue growth remains tightly linked to pass-through model costs without margin improvement, suggesting economics are constrained and pricing is not covering compute sustainably.

Sources

Opus 4.5 changed everything (Interview)

2026-02-27 changelog.com