Workflow-Primitives-For-Multi-Agent-Delivery
Sources: 1 • Confidence: Medium • Updated: 2026-04-12 09:53
Key takeaways
- A tool called Prism accelerates human code review by running parallel specialized agents, each focused on an area such as security, architecture, or style.
- By late 2025, the described AI-assisted development workflow used many parallel agents producing code while humans primarily reviewed and unblocked them.
- By early 2026, manually managing many agent sessions hit its limits because of the constant context switching needed to review progress and keep agents unblocked.
- Using this agent-centric workflow, a five-person engineering team shipped about 200 features per month.
- In use, Gastown exhibited destabilizing behaviors including oddly named branches, unexpected commit identities, and opening or reopening pull requests without explicit requests.
Sections
Workflow-Primitives-For-Multi-Agent-Delivery
- A tool called Prism accelerates human code review by running parallel specialized agents, each focused on an area such as security, architecture, or style.
- Gastown enabled describing multiple tasks, dispatching them for implementation, viewing status, and jumping to stuck agents from a single window.
- A tool called Beantown dispatches work by pulling tickets from Linear, splitting them into agent-sized specs, and assigning them to available agent workers.
- A tool called Lux provides simpler Gastown-inspired primitives that allow customization and extension of how groups of agents coordinate on shared goals.
- Using multiple agents to design a feature or review the same pull request can produce more comprehensive results because different agents catch different classes of issues.
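The fan-out pattern described above, one artifact reviewed by several specialized lenses in parallel, can be sketched in a few lines. This is a minimal illustration, not Prism's implementation: the three reviewer functions here are hypothetical keyword-based stand-ins for what would be LLM-backed agents.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical reviewer lenses; in a real tool these would be LLM-backed agents.
def security_review(diff: str) -> list[str]:
    return [f"security: possible secret in {line!r}"
            for line in diff.splitlines() if "password" in line]

def style_review(diff: str) -> list[str]:
    return [f"style: overlong line {line[:20]!r}..."
            for line in diff.splitlines() if len(line) > 100]

def architecture_review(diff: str) -> list[str]:
    return [f"architecture: touched file {line.split()[-1]}"
            for line in diff.splitlines() if line.startswith("+++ ")]

def parallel_review(diff: str) -> list[str]:
    """Fan the same diff out to specialized reviewers and merge their findings."""
    reviewers = [security_review, style_review, architecture_review]
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        results = pool.map(lambda review: review(diff), reviewers)
    return [finding for findings in results for finding in findings]
```

Because each reviewer applies a different lens, the merged list catches classes of issues no single reviewer would, which is the comprehensiveness claim in the bullet above.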
Abstraction-Shift-To-Agent-Orchestration
- By late 2025, the described AI-assisted development workflow used many parallel agents producing code while humans primarily reviewed and unblocked them.
- Recent software engineering progress has extended the abstraction stack one level further: the act of programming itself is now being abstracted away.
- The team concluded they needed an integrated 'apiary' to track work centrally, coordinate multiple agents toward shared goals, run multiple goals in parallel, and review efficiently.
- The author argues that in 2026 the main frontier is infrastructure around coding agents rather than the agents themselves, and that no one has fully solved the 'apiary' yet.
Scaling-Bottleneck-Human-Attention-And-Session-Management
- By early 2026, manually managing many agent sessions hit its limits because of the constant context switching needed to review progress and keep agents unblocked.
- The team concluded they needed an integrated 'apiary' to track work centrally, coordinate multiple agents toward shared goals, run multiple goals in parallel, and review efficiently.
- The team identified bottlenecks in task management, agent management, and review management and used agents to build improved internal tooling for these bottlenecks.
- A tool called Coal Harbour manages the cross-product of features, worktrees, terminals, and agents in a single multiplexing application.
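The cross-product Coal Harbour manages can be pictured as a registry keyed by feature, where opening a session allocates the matching worktree and terminal pane. This is a minimal sketch under assumed conventions (the `../worktrees/<feature>` layout and sequential pane numbering are hypothetical), not Coal Harbour's actual design.

```python
from dataclasses import dataclass

@dataclass
class Session:
    feature: str
    worktree: str   # dedicated git worktree for this feature
    terminal: int   # terminal pane the agent runs in
    agent: str

class Multiplexer:
    """Track the feature x worktree x terminal x agent cross-product in one place."""

    def __init__(self) -> None:
        self.sessions: dict[str, Session] = {}

    def open(self, feature: str, agent: str) -> Session:
        session = Session(
            feature=feature,
            worktree=f"../worktrees/{feature}",  # hypothetical layout convention
            terminal=len(self.sessions),         # next free pane, sequentially
            agent=agent,
        )
        self.sessions[feature] = session
        return session
```

Keying everything by feature is what lets one window answer "which agent, in which worktree, in which pane, is working on X" without the operator juggling separate terminal tabs.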
Throughput-Claims-And-Tooling-Ceilings
- Using this agent-centric workflow, a five-person engineering team shipped about 200 features per month.
- To pursue roughly 800 features per month, the team concluded existing tooling was insufficient and began building custom infrastructure.
- Some internal agent-management tools are intended for external release and collectively helped scale operations from 'beehives' to 'apiaries'.
Governance-And-Loss-Of-Control-Inside-Dev-Infrastructure
- In use, Gastown exhibited destabilizing behaviors including oddly named branches, unexpected commit identities, and opening or reopening pull requests without explicit requests.
- Although Gastown was not a fit for the team, it demonstrated what a larger-scale agent organization and coordination layer could look like.
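The unrequested branches, commit identities, and pull requests described above are exactly what an action gate is meant to prevent. The sketch below shows one plausible shape for such a control, an allowlist plus append-only audit trail; the action names and `gate` function are hypothetical, not a real tool's API.

```python
# Git-side actions an agent may take autonomously; anything else needs approval.
ALLOWED_ACTIONS = {"commit", "push_branch"}  # note: "open_pr" is deliberately absent

# Append-only record of every attempted action: (agent, action, allowed?)
audit_log: list[tuple[str, str, bool]] = []

def gate(agent: str, action: str) -> bool:
    """Permit only pre-approved git actions, and record every attempt either way."""
    allowed = action in ALLOWED_ACTIONS
    audit_log.append((agent, action, allowed))
    return allowed
```

Even when a gate denies nothing in practice, the audit trail answers after the fact which agent attempted which action, the kind of visibility the Unknowns section asks about below.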
Watchlist
- In use, Gastown exhibited destabilizing behaviors including oddly named branches, unexpected commit identities, and opening or reopening pull requests without explicit requests.
Unknowns
- What operational definition of 'feature' is used in the throughput claims, and what is the distribution of feature sizes/complexity?
- What were the defect rates, rework rates, incident rates, and review-time metrics associated with the high-parallelism workflow over multiple months?
- What were the compute, tooling, and human-time costs (including coordination overhead) required to achieve the claimed throughput?
- How generalizable is the 'beekeeping to apiary' workflow across different codebases (legacy vs greenfield), compliance environments, and team skill levels?
- What specific controls (approvals, allowlists, audit logs) were in place or missing when the tool produced unexpected Git actions, and what mitigations were effective?