VM-Based Desktop Agent Architecture For Safety And Enterprise Deployability
Sources: 1 • Confidence: Medium • Updated: 2026-03-18 14:31
Key takeaways
- The product team is actively weighing whether 'your computer' for Claude should be the local machine, a local VM, or a remote computer elsewhere.
- Skill sharing for general knowledge workers remains an unsolved UX problem because GitHub-repository workflows are too technical for much of the target user base.
- Claude Cowork is positioned as a superset of Claude Code rather than a dumbed-down version because it is highly extensible and workflow-integrable.
- Felix is uncertain about the best product model for agentic computer use, weighing options such as a dedicated Claude-owned computer, opportunistic takeover when the user steps away, or a separate cloud-hosted computer.
- Felix believes there is a model overhang: models are more capable than current scaffolding and user workflows allow. He is leaning toward adding safe capabilities and waiting for better models rather than investing in heavy scaffolding fixes.
Sections
VM-Based Desktop Agent Architecture For Safety And Enterprise Deployability
- The product team is actively weighing whether 'your computer' for Claude should be the local machine, a local VM, or a remote computer elsewhere.
- On Windows, Claude Cowork runs its VM using the Windows Host Compute System, the same subsystem used by WSL2.
- Felix says reports that Claude Cowork takes 10GB on macOS are misleading: the macOS storage display can be confusing, and the VM image's empty space is collapsed on disk, so the image occupies less than its nominal size.
- Felix argues that a useful AI entity must have access to the same tools a user has on their local machine and that Silicon Valley undervalues the local computer.
- A key reason for the VM approach is to make Claude effective on default enterprise laptops that may lack Python or Node and may forbid installing untrusted software.
- Felix argues that prompting users to approve scripts does not scale, because users either cannot evaluate whether a script is safe or stop reading the code once approval becomes routine.
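The disk-usage point in this section can be illustrated with a sparse file. This is a minimal sketch, assuming the VM image behaves like a sparse file; the source only says that empty space is collapsed on disk and does not name the image format. The helper names `make_sparse_file` and `sizes` are hypothetical, and the sketch is POSIX-only because it relies on `st_blocks`.

```python
import os
import tempfile

def make_sparse_file(path, apparent_bytes):
    """Create a file with a large nominal size without writing any data."""
    with open(path, "wb") as f:
        # Extends the file to the requested length; filesystems that support
        # sparse files do not allocate blocks for the unwritten range.
        f.truncate(apparent_bytes)

def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for a file.

    st_blocks is reported in 512-byte units on POSIX systems and is not
    available on Windows.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        image = os.path.join(d, "image.bin")  # hypothetical stand-in for a VM image
        make_sparse_file(image, 10 * 1024 * 1024)
        apparent, allocated = sizes(image)
        print(f"apparent: {apparent} bytes, allocated on disk: {allocated} bytes")
```

On a filesystem with sparse-file support, the allocated figure is typically much smaller than the apparent one, which is one way a tool that reports nominal size could overstate real disk usage.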
Integration Strategy: Browser-First Automation And File-Based Skill Portability
- Skill sharing for general knowledge workers remains an unsolved UX problem because GitHub-repository workflows are too technical for much of the target user base.
- The industry has not yet solved how to separate the portable parts of a skill from user-specific private preferences in a clean way.
- Claude Cowork becomes more effective when it can directly see the user's working context via a built-in browser or Chrome integration that can inspect the DOM and page state.
- In Claude Cowork, skills are implemented as file-based artifacts (such as plain text or Markdown in folders) to make them inherently portable rather than proprietary in-product objects.
- Claude Cowork can install plugins by pointing at a GitHub repository that functions as a skills/plugin marketplace source.
- Claude Cowork uses a sub-agent tightly integrated with Claude and Chrome to execute tasks, partly to avoid the setup burden and limitations of many MCP connectors.
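The file-based skill idea in this section can be sketched as follows. This is an illustration under stated assumptions, not Claude Cowork's actual loader: the folder layout (one folder per skill, each containing a Markdown file), the file name `SKILL.md`, and the functions `discover_skills` and `share_skill` are all hypothetical. The point it shows is that when a skill is only files in a folder, discovering it is a directory scan and sharing it is a folder copy.

```python
import shutil
from pathlib import Path

SKILL_FILE = "SKILL.md"  # assumed file name, for illustration only

def discover_skills(root):
    """Map skill folder name -> Markdown text for each folder containing SKILL_FILE."""
    skills = {}
    for entry in sorted(Path(root).iterdir()):
        skill_path = entry / SKILL_FILE
        if entry.is_dir() and skill_path.is_file():
            skills[entry.name] = skill_path.read_text(encoding="utf-8")
    return skills

def share_skill(src_root, name, dst_root):
    """Copy one skill folder into another skills directory."""
    # copytree raises FileExistsError if the destination skill already exists,
    # which avoids silently overwriting a user's own version.
    shutil.copytree(Path(src_root) / name, Path(dst_root) / name)
```

Note that this sketch copies the whole folder, so it does nothing to separate portable instructions from user-specific preferences, which the section identifies as an open problem.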
Product Trigger, Positioning, And Iteration Tempo
- Claude Cowork is positioned as a superset of Claude Code rather than a dumbed-down version because it is highly extensible and workflow-integrable.
- Anthropic observed Claude Code increasingly being used by non-technical users for non-coding workloads such as expenses, receipts, and knowledge-base organization.
- Claude Cowork was assembled by selecting and combining components from multiple internal prototypes rather than built entirely from scratch.
- Anthropic increasingly prefers building multiple candidate implementations quickly and testing with users rather than writing extensive specs or committing early to a single path.
- Felix expects Claude Cowork to ship frequent iterations, often weekly, and to double down on making both the user and Claude more effective on the user's computer.
- Felix expects the product to move users from question-answering toward delegating larger and longer tasks where Claude operates more independently.
Human-Agent Interaction Constraints, Trust Boundaries, And Collaboration Models
- Felix is uncertain about the best product model for agentic computer use, weighing options such as a dedicated Claude-owned computer, opportunistic takeover when the user steps away, or a separate cloud-hosted computer.
- Remote control functionality for Claude Cowork is described as coming soon but is not yet available.
- Felix says proximity-based skill sharing between nearby computers using Bluetooth LE could be powerful but may feel creepy and is therefore unlikely to ship.
- Felix argues AI agents generally should not be booking flights, despite flight booking being a common agent demo.
- Felix argues a simultaneous human-and-Claude 'second cursor' control model at the OS layer is impractical because operating systems assume a single foreground actor.
- For multi-agent workflows, Felix is uncertain whether to build custom agent-to-agent scaffolding or to give agents standard identities (such as Gmail or Slack accounts) and let them interact through existing collaboration tools.
Evaluation And Scaffolding For Knowledge-Work Tasks
- Felix believes there is a model overhang: models are more capable than current scaffolding and user workflows allow. He is leaning toward adding safe capabilities and waiting for better models rather than investing in heavy scaffolding fixes.
- Anthropic evaluates Claude Code primarily on coding tasks and evaluates Claude Cowork on knowledge-work tasks such as finance or legal workflows, adjusting system prompts and tools accordingly.
- For longer and more ambiguous tasks, Claude Cowork is steered to use planning and ask-user-question tools to reduce the risk of spending hours on the wrong work.
- Anthropic's Claude Cowork evals replay full transcripts, including tool availability, and measure both token outputs and file outputs to compare the effect of different tweaks.
- Felix recommends users avoid over-engineering prompts and skills and instead state the desired outcome because newer models can often infer the method.
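The comparison step of the transcript-replay evals described in this section might look roughly like the sketch below. It is an assumption-heavy illustration: the source does not describe the harness, so the `RunResult` shape, the `compare_runs` function, and the choice of "token delta plus changed files" as the reported metrics are all hypothetical. It covers only the comparison of two finished runs, not the replay itself.

```python
from dataclasses import dataclass, field

@dataclass
class RunResult:
    """Outcome of replaying one transcript under one configuration (hypothetical shape)."""
    output_tokens: int
    files: dict = field(default_factory=dict)  # relative path -> file contents

def compare_runs(baseline, variant):
    """Report how a tweak changed token output and file output relative to a baseline."""
    paths = set(baseline.files) | set(variant.files)
    # A file counts as changed if it was added, removed, or its contents differ.
    changed = sorted(p for p in paths if baseline.files.get(p) != variant.files.get(p))
    return {
        "token_delta": variant.output_tokens - baseline.output_tokens,
        "changed_files": changed,
    }
```

For example, a variant run that emits fewer tokens but rewrites `report.md` would show a negative `token_delta` and list `report.md` under `changed_files`, leaving the judgment of whether the rewrite is better to a separate grading step.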
Watchlist
- Felix believes there is a model overhang: models are more capable than current scaffolding and user workflows allow. He is leaning toward adding safe capabilities and waiting for better models rather than investing in heavy scaffolding fixes.
- Skill sharing for general knowledge workers remains an unsolved UX problem because GitHub-repository workflows are too technical for much of the target user base.
- The industry has not yet solved how to separate the portable parts of a skill from user-specific private preferences in a clean way.
- Felix anticipates a possible future acceleration when agents can help train models by operating ML tooling such as TensorBoard and experiment dashboards.
- Felix is uncertain about the best product model for agentic computer use, weighing options such as a dedicated Claude-owned computer, opportunistic takeover when the user steps away, or a separate cloud-hosted computer.
- The product team is actively weighing whether 'your computer' for Claude should be the local machine, a local VM, or a remote computer elsewhere.
- Remote control functionality for Claude Cowork is described as coming soon but is not yet available.
- Felix is watching for a point where models can produce highly optimized native apps such that Electron becomes unnecessary, and he says this is not yet achievable.
Unknowns
- What are Claude Cowork’s real-world task success rates, time-to-completion, and error modes across representative knowledge-work workflows (finance/legal/ops) versus alternative approaches?
- What is the actual adoption/retention profile for non-technical users and which onboarding flows convert from small automations to sustained delegation?
- What are the concrete security guarantees and enforcement mechanisms of the VM sandbox (filesystem boundaries, credential handling, logging/auditing, policy controls), and how do they perform under adversarial or accidental misuse?
- How costly is the VM approach in practice (startup time, CPU/RAM footprint, battery impact) across typical enterprise hardware, and what optimization roadmap exists?
- Will Anthropic productize a clear answer to where the agent runs (local machine vs local VM vs remote hosted), and what triggers that decision?