VM-Based Desktop Agent Architecture For Safety And Enterprise Deployability

Issue 76 • Edition 2026-03-17 • 11 min read

General

Sources: 1 • Confidence: Medium • Updated: 2026-03-18 14:31

Key takeaways

The product team is actively weighing whether 'your computer' for Claude should be the local machine, a local VM, or a remote computer elsewhere.
Skill sharing for general knowledge workers remains an unsolved UX problem because GitHub-repository workflows are too technical for much of the target user base.
Claude Cowork is positioned as a superset of Claude Code rather than a dumbed-down version because it is highly extensible and workflow-integrable.
Felix is uncertain about the best product model for agentic computer use, weighing options such as a dedicated Claude-owned computer, opportunistic takeover when the user steps away, or a separate cloud-hosted computer.
Felix believes there is a model overhang: models are more capable than current scaffolding and user workflows allow. He is leaning toward adding safe capabilities and waiting for better models rather than investing in heavy scaffolding fixes.

Sections

VM-Based Desktop Agent Architecture For Safety And Enterprise Deployability

The product team is actively weighing whether 'your computer' for Claude should be the local machine, a local VM, or a remote computer elsewhere.
On Windows, Claude Cowork runs its VM using the Windows Host Compute System, the same subsystem used by WSL2.
Felix says reports that Claude Cowork takes up 10 GB on macOS are misleading: the macOS storage display can be confusing, and the VM image collapses empty space when stored on disk.
Felix argues that a useful AI entity must have access to the same tools a user has on their local machine and that Silicon Valley undervalues the local computer.
A key reason for the VM approach is to make Claude effective on default enterprise laptops that may lack Python or Node and may forbid installing untrusted software.
Felix argues prompting users to approve scripts is not scalable because users either cannot evaluate safety or will not read code once it becomes routine.
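
The point about the VM image collapsing empty space can be illustrated with a sparse file, where the size a file reports differs from the space it actually occupies. This is a minimal sketch of that general filesystem behavior, not a description of Claude Cowork's actual image format, which the source does not specify. The helper names (file_sizes, make_sparse_file) and the 64 MiB size are hypothetical, and the result depends on the filesystem supporting sparse files.

```python
import os
import tempfile


def file_sizes(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on POSIX systems; it is absent on
    # Windows, where we fall back to the apparent size.
    allocated = st.st_blocks * 512 if hasattr(st, "st_blocks") else st.st_size
    return st.st_size, allocated


def make_sparse_file(path, size_bytes):
    """Create a file of the given length without writing any data to it."""
    with open(path, "wb") as f:
        # Extending a file with truncate() leaves a "hole" that sparse-aware
        # filesystems do not back with real disk blocks.
        f.truncate(size_bytes)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "disk.img")  # hypothetical image name
        make_sparse_file(image, 64 * 1024 * 1024)
        apparent, allocated = file_sizes(image)
        print(f"apparent size: {apparent} bytes")
        print(f"allocated on disk: {allocated} bytes")
```

On a filesystem with sparse-file support, the allocated figure is typically far smaller than the apparent one, which is one way a storage display that reports apparent size can overstate real usage.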

Integration Strategy: Browser-First Automation And File-Based Skill Portability

Skill sharing for general knowledge workers remains an unsolved UX problem because GitHub-repository workflows are too technical for much of the target user base.
The industry has not yet solved how to separate the portable parts of a skill from user-specific private preferences in a clean way.
Claude Cowork becomes more effective when it can directly see the user's working context via a built-in browser or Chrome integration that can inspect the DOM and page state.
In Claude Cowork, skills are implemented as file-based artifacts (such as plain text or Markdown in folders) to make them inherently portable rather than proprietary in-product objects.
Claude Cowork can install plugins by pointing at a GitHub repository that functions as a skills/plugin marketplace source.
Claude Cowork leverages tight integration with Claude and Chrome via a sub-agent to execute tasks, partly to avoid the setup and limitations of many MCP connectors.
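
The file-based skill model described above can be sketched as follows. This is an illustration under stated assumptions, not Claude Cowork's implementation: the folder layout (one directory per skill containing a Markdown file), the file name SKILL.md as used here, and the functions discover_skills and share_skill are all hypothetical. The sketch only shows why plain files in folders are portable: discovering a skill is a directory walk, and sharing one is a folder copy.

```python
from pathlib import Path
import shutil
import tempfile

SKILL_FILE = "SKILL.md"  # assumed file name, chosen for illustration


def discover_skills(root: Path) -> dict[str, str]:
    """Map each skill folder name to the text of its Markdown file."""
    skills = {}
    for entry in sorted(root.iterdir()):
        skill_file = entry / SKILL_FILE
        if entry.is_dir() and skill_file.is_file():
            skills[entry.name] = skill_file.read_text(encoding="utf-8")
    return skills


def share_skill(root: Path, name: str, destination: Path) -> Path:
    """Copy one skill folder into another skills directory."""
    target = destination / name
    shutil.copytree(root / name, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        mine = Path(tmp) / "my-skills"
        theirs = Path(tmp) / "colleague-skills"
        (mine / "expense-report").mkdir(parents=True)
        theirs.mkdir()
        (mine / "expense-report" / SKILL_FILE).write_text(
            "# Expense report\nCollect receipts and total them by category.\n",
            encoding="utf-8",
        )
        share_skill(mine, "expense-report", theirs)
        print(sorted(discover_skills(theirs)))
```

The sketch copies the whole folder, so it does not address the open problem noted above: separating the portable parts of a skill from user-specific private preferences.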

Product Trigger, Positioning, And Iteration Tempo

Claude Cowork is positioned as a superset of Claude Code rather than a dumbed-down version because it is highly extensible and workflow-integrable.
Anthropic observed Claude Code increasingly being used by non-technical users for non-coding workloads such as expenses, receipts, and knowledge-base organization.
Claude Cowork was assembled by selecting and combining components from multiple internal prototypes rather than built entirely from scratch.
Anthropic increasingly prefers building multiple candidate implementations quickly and testing with users rather than writing extensive specs or committing early to a single path.
Felix expects Claude Cowork to ship frequent iterations, often weekly, and to double down on making both the user and Claude more effective on the user's computer.
Felix expects the product to move users from question-answering toward delegating larger and longer tasks where Claude operates more independently.

Human-Agent Interaction Constraints, Trust Boundaries, And Collaboration Models

Felix is uncertain about the best product model for agentic computer use, weighing options such as a dedicated Claude-owned computer, opportunistic takeover when the user steps away, or a separate cloud-hosted computer.
Remote control functionality for Claude Cowork is described as coming soon but is not yet available.
Felix says proximity-based skill sharing between nearby computers using Bluetooth LE could be powerful but may feel creepy and is therefore unlikely to ship.
Felix argues AI agents generally should not be booking flights, despite flight booking being a common agent demo.
Felix argues a simultaneous human-and-Claude 'second cursor' control model at the OS layer is impractical because operating systems assume a single foreground actor.
For multi-agent workflows, Felix is uncertain whether to build custom agent-to-agent scaffolding or to give agents standard identities (such as Gmail or Slack accounts) and let them interact through existing collaboration tools.

Evaluation And Scaffolding For Knowledge-Work Tasks

Felix believes there is a model overhang: models are more capable than current scaffolding and user workflows allow. He is leaning toward adding safe capabilities and waiting for better models rather than investing in heavy scaffolding fixes.
Anthropic evaluates Claude Code primarily on coding tasks and evaluates Claude Cowork on knowledge-work tasks such as finance or legal workflows, adjusting system prompts and tools accordingly.
For longer and more ambiguous tasks, Claude Cowork is steered to use planning and ask-user-question tools to reduce the risk of spending hours on the wrong work.
Anthropic's Claude Cowork evals replay full transcripts including tool availability and measure both token outputs and file outputs under different tweaks.
Felix recommends users avoid over-engineering prompts and skills and instead state the desired outcome because newer models can often infer the method.
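
The transcript-replay evaluation described above can be sketched roughly as follows. Everything here is an assumption made for illustration: the Transcript and RunResult types, the compare_tweaks function, the whitespace-based token count, and the fake_agent stand-in are hypothetical and do not reflect Anthropic's actual eval harness. The sketch shows only the shape of the idea: replay the same transcript and tool list under different tweaks (here, system prompts) and record both the text output and the files produced.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Transcript:
    messages: list[str]  # prior conversation turns to replay
    tools: list[str]     # names of tools available during the run


@dataclass
class RunResult:
    text: str                                           # final text output
    files: dict[str, str] = field(default_factory=dict)  # path -> contents


Agent = Callable[[Transcript, str], RunResult]


def compare_tweaks(agent: Agent, transcript: Transcript,
                   system_prompts: dict[str, str]) -> dict[str, dict]:
    """Replay one transcript under each system prompt and summarize outputs."""
    report = {}
    for label, prompt in system_prompts.items():
        result = agent(transcript, prompt)
        report[label] = {
            # Whitespace splitting is a crude proxy for a real tokenizer.
            "output_tokens": len(result.text.split()),
            "files": sorted(result.files),
        }
    return report


def fake_agent(transcript: Transcript, system_prompt: str) -> RunResult:
    """Deterministic stand-in for a model call, used only for this demo."""
    files = {"report.md": "summary"}
    text = "Report written"
    if "plan" in system_prompt and "write_file" in transcript.tools:
        files["plan.md"] = "steps"
        text = "Plan drafted then report written"
    return RunResult(text=text, files=files)


if __name__ == "__main__":
    transcript = Transcript(messages=["Summarize Q3 expenses"],
                            tools=["write_file"])
    prompts = {"baseline": "Be concise.", "planning": "Always plan first."}
    for label, row in compare_tweaks(fake_agent, transcript, prompts).items():
        print(label, row)
```

Comparing the per-tweak rows side by side is what lets a change in system prompt or tool set be judged on both what the agent said and what it left on disk.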

Watchlist

Felix believes there is a model overhang: models are more capable than current scaffolding and user workflows allow. He is leaning toward adding safe capabilities and waiting for better models rather than investing in heavy scaffolding fixes.
Skill sharing for general knowledge workers remains an unsolved UX problem because GitHub-repository workflows are too technical for much of the target user base.
The industry has not yet solved how to separate the portable parts of a skill from user-specific private preferences in a clean way.
Felix anticipates a possible future acceleration when agents can help train models by operating ML tooling such as TensorBoard and experiment dashboards.
Felix is uncertain about the best product model for agentic computer use, weighing options such as a dedicated Claude-owned computer, opportunistic takeover when the user steps away, or a separate cloud-hosted computer.
The product team is actively weighing whether 'your computer' for Claude should be the local machine, a local VM, or a remote computer elsewhere.
Remote control functionality for Claude Cowork is described as coming soon but is not yet available.
Felix is watching for a point where models can produce highly optimized native apps such that Electron becomes unnecessary, and he says this is not yet achievable.

Unknowns

What are Claude Cowork’s real-world task success rates, time-to-completion, and error modes across representative knowledge-work workflows (finance/legal/ops) versus alternative approaches?
What is the actual adoption/retention profile for non-technical users and which onboarding flows convert from small automations to sustained delegation?
What are the concrete security guarantees and enforcement mechanisms of the VM sandbox (filesystem boundaries, credential handling, logging/auditing, policy controls), and how do they perform under adversarial or accidental misuse?
How costly is the VM approach in practice (startup time, CPU/RAM footprint, battery impact) across typical enterprise hardware, and what optimization roadmap exists?
Will Anthropic productize a clear answer to where the agent runs (local machine vs local VM vs remote hosted), and what triggers that decision?

Investor overlay

Read-throughs

VM-sandboxed desktop agents could unlock enterprise deployments by improving containment, compatibility with locked-down laptops, and governance relative to direct local control, creating a path to broader paid adoption if performance and trust hold.
Browser-first automation and file-based skills may reduce reliance on third-party connectors, shifting value toward the agent platform and its extensibility, but success depends on making skill sharing usable for non-technical workers.
If agents can run longer delegated knowledge-work tasks with strong evaluation and auditing, that could expand use beyond developers, but this hinges on resolving where the agent runs and delivering reliable remote control and collaboration models.

What would confirm

Published or demonstrated enterprise-grade security guarantees for the VM approach, including filesystem boundaries, credential handling, auditing, and policy controls, plus evidence they hold under misuse scenarios.
A clear product decision and rollout on where the agent runs (local machine vs local VM vs remote hosted), with enterprise governance and acceptable latency, and remote control functionality actually shipped.
Task-level metrics for knowledge-work workflows showing success rates, time to completion, and error modes, plus retention for non-technical users and evidence that skill sharing and discovery work beyond GitHub workflows.

What would kill

The VM approach proves too costly on typical enterprise hardware, with unacceptable startup time, CPU and RAM footprint, or battery impact, and no credible optimization roadmap.
Security and trust issues emerge, such as weak credential handling or insufficient logging and policy enforcement, leading enterprises to block deployments or require per-action approvals that do not scale.
Non-technical adoption stalls because skill portability and sharing remain too technical, and delegated tasks frequently go wrong despite planning and clarification tools, preventing sustained expansion beyond developer use cases.

Sources

Why Anthropic Thinks AI Should Have Its Own Computer — Felix Rieseberg of Claude Cowork & Claude Code Desktop

2026-03-17 latent.space