Devtools Operating Constraints: Backlog Pressure And Dual-Track AI/Non-AI Support
Sources: 1 • Confidence: Medium • Updated: 2026-03-02 13:06
Key takeaways
- Burke Holland states that teams building reliability-critical software must develop internal AI workflows that preserve strict quality bars because they cannot tolerate regressions where core functionality breaks.
- Burke Holland reports that the model referred to as Opus 4.5 was an inflection point for his coding workflow because, unlike earlier models, it could one-shot native Windows tooling with well-structured code.
- Burke Holland states his greenfield workflow starts with an interactive plan mode to surface missing requirements before implementation.
- Adam Stacoviak argues that critiques of agents failing at "software" often miss that different classes of software have different benchmarks, with personal/toy software differing from SLA-bound production systems.
- Chris Kelly disputes media narratives equating fast revenue growth with business health when that growth is driven by pass-through payments to model providers.
Sections
Devtools Operating Constraints: Backlog Pressure And Dual-Track AI/Non-AI Support
- Burke Holland states that teams building reliability-critical software must develop internal AI workflows that preserve strict quality bars because they cannot tolerate regressions where core functionality breaks.
- Burke Holland states VS Code's open issue backlog grew from about 8,000 to about 15,000 year-over-year.
- Burke Holland states the GitHub Copilot CLI reached general availability, making it accessible to enterprise customers that avoid preview releases.
- Burke Holland states Copilot evolved from autocomplete-style ghost text about three years ago to full agentic workflows, and that VS Code moved from no agent mode a year ago to agentic capabilities including memory, deep research, and fleets of agents via Copilot CLI.
- Burke Holland asserts developer tools must both pursue AI-focused updates to remain relevant and still support developers who want to be productive without AI.
Step-Change In Coding Capability Attributed To Opus 4.5
- Burke Holland reports that the model referred to as Opus 4.5 was an inflection point for his coding workflow because, unlike earlier models, it could one-shot native Windows tooling with well-structured code.
- Burke Holland reports he built a functioning prototype of a screen-capture-to-GIF tool and extended it toward basic screen video editing within a couple of hours using Opus 4.5.
- A roughly 2x increase in usage was observed in association with Claude, and Opus 4.5 is described as a major step-function improvement.
- Adam Stacoviak states he has built more software in the last six months than in his entire life and attributes a major step-function shift to Opus 4.5.
Agentic Development Requires New Process Controls (Planning, Loop Control, Orchestration)
- Burke Holland states his greenfield workflow starts with an interactive plan mode to surface missing requirements before implementation.
- Burke Holland posits that scaling agent-built software may require an ongoing daily-standup-style review process rather than a single handoff.
- Burke Holland asserts that in Copilot Autopilot, specifying an explicit confidence threshold (e.g., 95%) can drive the agent to iterate longer than vague instructions like "until it's done."
- Burke Holland describes a custom Copilot agent called Anvil that classifies tasks as easy/medium/hard and changes the workflow accordingly, including use of sub-agents and multiple models for large refactors.
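The two controls described above (routing a task by difficulty, and iterating until an explicit confidence threshold is met) can be sketched as follows. This is a minimal illustration only: it does not reflect how Copilot Autopilot or Anvil is actually implemented, and every name in it (WORKFLOWS, pick_workflow, iterate_until_confident, the step names) is hypothetical. The point it shows is that a numeric threshold gives the loop a checkable exit condition, which "until it's done" does not.

```python
from typing import Callable

# Hypothetical step lists for the easy/medium/hard tiers; the step names
# are assumptions for illustration, not taken from the source.
WORKFLOWS = {
    "easy": ["implement"],
    "medium": ["plan", "implement", "review"],
    "hard": ["plan", "split_into_subagents", "implement", "cross_model_review"],
}


def pick_workflow(difficulty: str) -> list[str]:
    """Return the ordered steps for a task tier (illustrative only)."""
    return WORKFLOWS[difficulty]


def iterate_until_confident(
    step: Callable[[int], float],
    threshold: float = 0.95,
    max_rounds: int = 10,
) -> tuple[int, float]:
    """Call step(round_number) until the confidence it reports reaches
    the threshold, or until max_rounds is exhausted."""
    rounds, confidence = 0, 0.0
    for rounds in range(1, max_rounds + 1):
        confidence = step(rounds)
        if confidence >= threshold:
            break
    return rounds, confidence


# Stand-in for an agent that reports rising confidence on each pass.
scores = [0.70, 0.85, 0.96, 0.99]
rounds_used, final_confidence = iterate_until_confident(lambda r: scores[r - 1])
```

With the stand-in scores, the loop stops on the third pass, the first one at or above 0.95. The max_rounds cap is an added assumption so that a threshold the agent never reaches cannot run indefinitely.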
Personal Software Displacing Small SaaS Purchases
- Adam Stacoviak argues that critiques of agents failing at "software" often miss that different classes of software have different benchmarks, with personal/toy software differing from SLA-bound production systems.
- Adam Stacoviak reports he replaced an expensive invoicing service by letting an agent run unattended overnight to produce a working Rails app with authentication, invoices, email, and PDFs by morning.
- Burke Holland reports he built an iOS app in an afternoon that generates Facebook post captions from sign photos using Gemini image analysis and conditions generation on the last ten accepted captions.
- Burke Holland reports he replaced a paid driving-routing app by building a personal routing app and loading it directly onto a phone without App Store distribution.
Economics And Packaging: Request-Based Billing, Subsidization Risk, And Pass-Through Margins
- Chris Kelly disputes media narratives equating fast revenue growth with business health when that growth is driven by pass-through payments to model providers.
- Burke Holland asserts Copilot is priced around $40/month with request-based billing that includes about 1,500 premium requests per month, where a single request may trigger many tool actions.
- Chris Kelly asserts that selling AI coding assistants as discounted tokens can produce rapid revenue growth while sending most of the money, or more than is taken in, to model providers.
- Burke Holland asserts that Claude Code at $200/month can be used at extremely high volume (on the order of a billion tokens per month) and that this level of subsidization is not sustainable long-term.
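The unit economics implied by the figures above can be worked out directly. The sketch below uses only the numbers as stated by the speakers (about $40 for about 1,500 premium requests; $200 for on the order of a billion tokens). They are not verified pricing, and unit_cost is a hypothetical helper name.

```python
def unit_cost(monthly_price_usd: float, monthly_units: float) -> float:
    """Effective price per unit when a flat monthly fee covers monthly_units."""
    return monthly_price_usd / monthly_units


# Copilot, as stated: about $40/month for about 1,500 premium requests.
copilot_per_request = unit_cost(40, 1_500)

# Claude Code, as stated: $200/month at roughly one billion tokens,
# expressed per million tokens (1e9 tokens = 1,000 million-token units).
claude_per_million_tokens = unit_cost(200, 1_000_000_000 / 1_000_000)

print(f"Copilot: ${copilot_per_request:.4f} per premium request")
print(f"Claude Code: ${claude_per_million_tokens:.2f} per million tokens")
```

This gives roughly $0.027 per premium request and $0.20 per million tokens at the stated volumes. The calculation shows only what a heavy user effectively pays per unit. It says nothing about what serving those requests costs the provider, which is the open question behind the subsidization claim.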
Watchlist
- Burke Holland is exploring workflows where an agent works unattended and pings him (e.g., via Telegram) for decisions, but he emphasizes this requires strong checks and balances to avoid constant babysitting.
- Burke Holland warns that widespread AI code generation may devalue visible technical accomplishments and contribute to a loss of craftsmanship, raising the risk that the industry forgets how to build "cathedral"-quality software.
- Burke Holland warns that heavy agent use can become all-consuming and psychologically destabilizing, implying a need to monitor how agentic workflows affect attention, discipline, and well-being.
Unknowns
- What objective benchmark or task-success data supports the claim that Opus 4.5 is a step-function improvement versus prior models for common dev tasks (web, native, refactors)?
- What are the current official Copilot pricing tiers and precise definitions/limits of "premium requests" and how they map to tool actions?
- Are Claude Code and comparable offerings meaningfully subsidized at current prices, and what policy or pricing changes would signal a shift toward cost being a binding constraint?
- What quantitative evidence exists on quality outcomes (defects, regressions, security issues) for agent-generated code in reliability-critical products versus traditional workflows?
- What tooling or practices best close the verification gap for native apps and non-browser contexts when agents generate large changes?