Rosa Del Mar

Daily Brief

Issue 56 2026-02-25

Alignment Monitorability And Chain Of Thought Risks

  • Chain-of-thought is human-readable primarily because models have a strong prior for English, not because training explicitly reinforces interpretability of those tokens.
  • OpenAI states that data connected in ChatGPT Health is not used to train its foundation models.
  • OpenAI created HealthBench, built from about 5,000 realistic health conversations with approximately 49,000 physician-authored evaluation criteria spanning many performance axes.

Tests As Spec And AI-Accelerated Reimplementation Risk

  • A comprehensive test suite can enable a fresh reimplementation of an open-source library from scratch, potentially in a different language.
  • tldraw is described as not technically open source because its custom license requires a commercial license for production use.
  • An issue that seemed to indicate tldraw would move its test suite to a private repository was later revealed to have been intended as a joke.

Tests As Behavioral Spec And Cloning Risk

  • A comprehensive public test suite can enable a fresh reimplementation of an open-source library from scratch, potentially in a different programming language.
  • tldraw filed a joke issue proposing translating its source code to Traditional Chinese as a purported defense against external AI coding agents replicating the project.
  • A tldraw maintainer argued that moving tests to another repository would complicate and slow development, and that development speed is a higher priority.

Tests As Behavioral Spec And Replication Surface

  • A comprehensive public test suite can enable a fresh reimplementation of an open-source library from scratch (including in a different programming language) by functioning as a behavioral specification.
  • tldraw filed a joke issue proposing translating its source code to Traditional Chinese as a purported defense against external AI coding agents replicating the project.
  • An issue that appeared to indicate tldraw would move its test suite to a private repository was later revealed to have been intended as a joke.

Closed Loop Materials Discovery Compute Plus Experiment

  • Max Welling proposes treating physical experiments as a "physics processing unit" (nature-as-compute) that should be integrated with data-center computation in a materials-discovery workflow.
  • Max Welling states CuspAI was started about 20 months prior to the interview to develop technology for carbon dioxide removal motivated by climate-change concerns.
  • Max Welling describes equivariance as hard-coding symmetry constraints (such as rotations, translations, permutations) into neural network weights to improve generalization across transformed inputs with less data.
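
To make the equivariance point concrete, here is a minimal sketch that is not taken from the interview: a toy layer whose weights are tied so that permuting the inputs permutes the outputs in the same way. The layer form (one shared weight per element plus one weight on the mean) and the names `equivariant_layer` and `permute` are illustrative assumptions.

```python
def equivariant_layer(xs, a, b):
    """Permutation-equivariant map: y_i = a * x_i + b * mean(xs).

    Every element shares the same two weights (a, b), so the symmetry is
    built into the parameterization rather than learned from data.
    """
    mean = sum(xs) / len(xs)
    return [a * x + b * mean for x in xs]


def permute(xs, perm):
    """Reorder xs so that position i holds xs[perm[i]]."""
    return [xs[p] for p in perm]


xs = [1.0, 2.0, 4.0]
perm = [2, 0, 1]

# Apply the layer then permute, versus permute then apply the layer.
layer_then_perm = permute(equivariant_layer(xs, 0.5, 2.0), perm)
perm_then_layer = equivariant_layer(permute(xs, perm), 0.5, 2.0)

print(layer_then_perm == perm_then_layer)
```

Because the constraint holds for any choice of `a` and `b`, the model never has to see every ordering of the inputs during training, which is the data-efficiency argument in the bullet above.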

Claude Code Remote Control: New Cross-Device Capability And Activation Path

  • Claude Code supports starting a remote control session on a computer and sending prompts to it from the Claude Code web interfaces (available on web, iOS, and the native desktop app).
  • The author reported seeing an error that remote control was not enabled for their account despite being their own administrator, and the issue resolved after logging out and back into the Claude Code terminal app.
  • Claude Code does not have a documented mechanism for running tasks on a schedule.

Scheduling Capability Split Across Claude Code And Cowork (And Its Local Availability Limits)

  • Claude Code does not have a documented mechanism for running tasks on a schedule.
  • Claude Code supports starting a remote control session on a computer and sending prompts to it from the Claude Code web interfaces (available on web, iOS, and the native desktop app).
  • A user may see an error that remote control is not enabled for their account even when they are their own administrator, and logging out and back into the Claude Code terminal app can resolve it.

Cross-Device Remote Control For Local Machines

  • Claude Code supports starting a remote control session on a computer and sending prompts to it from the Claude Code web interfaces (available on web, iOS, and the native desktop app).
  • Claude Code does not have a documented mechanism for running tasks on a schedule.
  • A user encountered an error stating remote control was not enabled for their account, and the issue resolved after logging out and back into the Claude Code terminal app.

Iran Objectives-To-Tools Framework And Escalation Risk

  • H.R. McMaster claimed Iranian elites are moving billions of dollars out of the country and that the U.S. Treasury is tracking these flows with intent to recover assets later for the Iranian people.
  • John Cochrane attributed to Alex Tabarrok the argument that emergency-power statutes omit tariff authority because tariffs are easily abused and selectively applied, whereas emergency actions are intended to be blunt and visibly costly tools.
  • Niall Ferguson claimed the macroeconomic fallout from Trump-era tariff increases has been modest so far, citing no recession and no major stock-market selloff despite a rise in tariff rates.

Tariff Authority Constraints, Substitution Across Statutes, And Deadline-Driven Policy Volatility

  • John Cochrane reports an argument attributed to Alex Tabarrok that emergency-power statutes omit tariff authority because tariffs are easily abused and selectively applied, while emergency measures are intended to be blunt and visibly costly tools that discourage opportunistic emergency declarations.
  • In the discussion, H.R. McMaster presents an Iran decision framework that starts by selecting clear objectives and then integrating military action with diplomatic and financial/economic pressure.
  • Some panelists expect the Epstein scandal to escalate further, while John Cochrane expects it to largely blow over due to news-cycle dynamics and limited illegality in many associations.

Reliability-Driven Requirements And Minimal Data Model

  • A prior presentation workflow was to open a browser window with one tab per web page and advance through the tabs as the deck.
  • A new macOS presentation app was built using vibe coding the night before a talk, with the build time described as approximately 45 minutes.
  • Present added remote control implemented as a web server listening on 0.0.0.0:9123 that serves a mobile-friendly page with controls for slide navigation and starting/stopping the presentation.
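
The remote-control pattern described above can be sketched in a few lines. This is an illustration in Python, not the app's actual code (the source describes a macOS app); the `Deck` class, the `apply_command` helper, the route names `/prev`, `/next`, `/toggle`, and the fixed slide count are all hypothetical. Only the bind address 0.0.0.0:9123 comes from the source.

```python
from dataclasses import dataclass
from http.server import BaseHTTPRequestHandler, HTTPServer


@dataclass
class Deck:
    """Hypothetical presentation state shared with the HTTP handler."""
    slide_count: int
    index: int = 0
    presenting: bool = False


def apply_command(deck, command):
    """Mutate the deck for one remote command; return False if unknown."""
    if command == "next":
        deck.index = min(deck.index + 1, deck.slide_count - 1)
    elif command == "prev":
        deck.index = max(deck.index - 1, 0)
    elif command == "toggle":
        deck.presenting = not deck.presenting
    else:
        return False
    return True


# Minimal mobile-friendly page: three large buttons that POST a command.
PAGE = b"""<!doctype html>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>button { font-size: 2rem; width: 100%; margin: 0.5rem 0; }</style>
<form method="post" action="/prev"><button>Prev</button></form>
<form method="post" action="/next"><button>Next</button></form>
<form method="post" action="/toggle"><button>Start/Stop</button></form>
"""

DECK = Deck(slide_count=10)


class RemoteHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the control page for any GET request.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        self.wfile.write(PAGE)

    def do_POST(self):
        # 204 keeps the phone's browser on the control page after a tap.
        ok = apply_command(DECK, self.path.strip("/"))
        self.send_response(204 if ok else 404)
        self.end_headers()


def serve(host="0.0.0.0", port=9123):
    """Blocks forever; binding to 0.0.0.0 accepts connections from the LAN."""
    HTTPServer((host, port), RemoteHandler).serve_forever()
```

Calling `serve()` starts the listener. Binding to 0.0.0.0 rather than 127.0.0.1 is what lets a phone on the same network reach the page, and it also means anyone else on that network can reach it, since this sketch has no authentication.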

URL As Slide Deck And Constrained Editor Model

  • The author sometimes presents using a browser window with one tab per page and advances through tabs.
  • The author stated the talk was delivered using the new macOS app they built in approximately 45 minutes the night before.
  • The app gained a remote-control feature implemented as a web server listening on 0.0.0.0:9123 that serves a mobile-friendly page with left/right buttons and a start/stop presentation toggle.

Presentation Reliability Via Constrained Format And Recovery

  • The author sometimes presents by opening a browser window with one tab per web page and advancing through the tabs.
  • The author delivered the talk using the new macOS app they built in approximately 45 minutes the night before.
  • Present gained a remote-control feature implemented as a web server listening on 0.0.0.0:9123 that serves a mobile-friendly page with left/right buttons and a start/stop presentation toggle.

LeanSig Core Construction And Validator Operational Fit

  • LeanSig constructs an L-time signature by committing many one-time public keys in a Merkle tree and attaching the appropriate one-time signature plus Merkle authentication path for each message index.
  • A proposed improvement called 'Top of the Hypercube' may yield both smaller and faster-to-verify signatures, but its encoding step remains difficult to implement efficiently inside SNARK circuits.
  • LeanSig is a hash-based post-quantum signature scheme proposed as a replacement for BLS in Ethereum consensus.
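
The Merkle-commitment step described above can be sketched as follows. This is a generic illustration under stated assumptions, not LeanSig's actual construction: SHA-256 stands in for whatever hash the scheme uses, the byte strings are placeholders for real one-time public keys, the one-time signature itself is omitted, and L = 8 is chosen only to keep the tree small. All function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash helper (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return all tree levels: level 0 is the hashed leaves, the last is [root].

    Assumes len(leaves) is a power of two.
    """
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def auth_path(levels, index):
    """Collect the sibling hash at each level, from leaf level up to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # sibling differs in the lowest bit
        index //= 2
    return path


def verify_path(root, leaf, index, path):
    """Recompute the root from one leaf and its authentication path."""
    node = h(leaf)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


# Placeholder one-time public keys, one per message index.
one_time_pks = [f"otpk-{i}".encode() for i in range(8)]
levels = build_tree(one_time_pks)
root = levels[-1][0]  # the long-lived public key a verifier would hold

path = auth_path(levels, 5)
print(verify_path(root, one_time_pks[5], 5, path))
```

In a full scheme, the signature for message index 5 would carry the one-time signature, the one-time public key, and `path`; the verifier checks the one-time signature and then the path against `root`.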

Data-Center Load Growth, Grid Bottlenecks, And Self-Generation/Off-Grid Pathways

  • A key watch item is how much new large load (especially data centers) will choose to go fully off-grid with no intent to interconnect.
  • Self-driving cars are already operating in several cities.
  • Over the next decade, it is uncertain whether utility reform, permitting reform, and transmission buildout will reduce electricity prices in an enduring way.

Electrification Durability Vs Affordability Politics (And Industrial Crowding-Out Risk From Data Centers)

  • Within the next 5 to 10 years, the trajectory of industrial electrification as a viable decarbonization pathway will become clearer and could appear to stall or fail.
  • A key 5-to-10-year uncertainty is how much new large load, especially data centers, will choose to be fully off-grid with no intent to interconnect to the grid.
  • In fast-growing battery demand regimes, recycled material supply reflects demand from roughly a decade earlier and therefore cannot fully solve near-term mineral supply constraints until growth plateaus.
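
The recycling-lag argument can be checked with back-of-the-envelope arithmetic. The sketch below assumes constant annual demand growth, a fixed product lifetime, and a fixed recovery rate; the numbers and the function name `recycled_share` are hypothetical, not figures from the discussion.

```python
def recycled_share(growth_rate: float, lifetime_years: int, recovery_rate: float) -> float:
    """Fraction of this year's demand that recycling can cover.

    Material returning today was sold `lifetime_years` ago, when demand was
    smaller by a factor of (1 + growth_rate) ** lifetime_years.
    """
    past_demand_ratio = 1.0 / (1.0 + growth_rate) ** lifetime_years
    return recovery_rate * past_demand_ratio


# Hypothetical scenarios: rapid growth versus a demand plateau.
fast_growth = recycled_share(growth_rate=0.25, lifetime_years=10, recovery_rate=0.9)
plateau = recycled_share(growth_rate=0.0, lifetime_years=10, recovery_rate=0.9)

print(f"25% annual growth: {fast_growth:.1%} of demand")
print(f"flat demand:       {plateau:.1%} of demand")
```

Under these assumptions, even a high recovery rate covers only a small fraction of demand while growth is rapid, and the share approaches the recovery rate only once demand stops growing.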

Aggressive ETF Structures Derivatives Feedback And Regulatory Boundary Testing

  • Multiple issuers repeatedly filed for forex-style versions of sector ETFs (and forex-style Bitcoin and Ethereum products), and the SEC stopped them each time.
  • US spot Bitcoin ETFs took in nearly $30B between the April sell-off bottom and October.
  • Fourth-quarter 13F data showed investment advisors were net sellers of Bitcoin ETF exposure, selling about 22,000 BTC in aggregate.

Short Thesis Discovery And Validation (Business Model First)

  • Roberts says successful short-selling research requires validating real-world business prospects by understanding the business model, competition, and customers through on-the-ground checks.
  • Roberts says retail short sellers are vulnerable to meme-stock style squeezes because they often concentrate in crowded, well-known shorts, and he advises generally avoiding popular shorts.
  • Roberts says he runs a discretionary long-volatility hedge using S&P 500 options only when he believes there is an opportunity to capture at least a 5% market decline.

ETF Product Proliferation And Derivatives-Driven Market-Structure Risk

  • Multiple issuers repeatedly file for forex-style versions of sector ETFs (and forex-style Bitcoin and Ethereum products) and the SEC stops them each time.
  • US spot Bitcoin ETFs took in nearly $30B between the April sell-off bottom and October.
  • There is growing issuer and investor pressure to package private assets (private equity and private credit) into the ETF wrapper, including a product (XOVR) that obtains SpaceX exposure via an SPV.

AI-Enabled Offensive Scaling And Shortened Exploitation Cycles

  • Akamai reported a CVSS 8.8 Internet Explorer/MSHTML (Trident) exploit chain that bypasses Mark-of-the-Web and IE security controls and has been observed exploited in the wild by Russian actors.
  • Defenders are moving from monolithic LLM usage toward agentic decomposition of investigations, which reduces hallucination risk and shifts limiting factors to data quality, agent architecture, and workflow-embedded expertise.
  • The Pentagon is reportedly standing up a new AI network/program with multiple frontier labs participating and Anthropic as the final holdout facing a near-term deadline to join.

Vendor And Platform Risk: Supply Chain, Ownership Incentives, And State Pressure On Communications/Identity Infrastructure

  • A viral blog post claimed Persona sends face-scan data to the government and is closely tied to ICE, and the hosts characterized the post as exaggerated and largely unsupported based on infrastructure fingerprinting.
  • Akamai reported a CVSS 8.8 Internet Explorer/MSHTML exploit chain that bypasses Mark-of-the-Web and IE security controls and has been observed exploited in the wild by Russian actors.
  • Defenders are moving from monolithic LLM usage toward decomposed agentic investigations, reducing hallucination risk and shifting limiting factors to data quality, agent architecture, and workflow-embedded expertise.

Tech-Workforce Sentiment And Cohort Motivation (Agency Vs Enjoyment/Job Security)

  • The passage attributed to Kellan Elliott-McCrea states that some people who entered technology in the last couple of decades primarily for a good job or because they enjoyed coding are now experiencing a feeling of loss about the current moment.
  • The same passage argues that the web can simultaneously be considered objectively awful as a technology and genuinely amazing in impact or experience.
  • The corpus attributes the quoted passage to Kellan Elliott-McCrea from “Code has always been the easy part.”

Tech-Worker Sentiment And Cohort Motivation (Agency Vs Enjoyment/Job Security)

  • In the cited passage, Kellan Elliott-McCrea states that some people who entered technology in the last couple of decades primarily for a good job or because they enjoyed coding are now experiencing a real feeling of loss about the current moment.
  • In the cited passage, Kellan Elliott-McCrea asserts that the web can be simultaneously considered objectively awful as a technology and genuinely amazing in impact or experience.
  • The corpus attributes the passage to Kellan Elliott-McCrea from the piece titled "Code has always been the easy part."

Tech-Workforce Sentiment And Cohort Motivation

  • The passage attributed to Kellan Elliott-McCrea asserts that some people who entered technology in the last couple of decades mainly for a good job or because they enjoyed coding are now experiencing a real feeling of loss about the current moment.
  • The passage presents the view that the web can be considered both objectively awful as a technology and genuinely amazing in impact or experience at the same time.
  • The corpus attributes the passage to Kellan Elliott-McCrea from the piece titled "Code has always been the easy part."

Linear Walkthrough Prompting As A Repeatable Documentation/Comprehension Workflow

  • Frontier models paired with an appropriate agent harness can generate detailed, step-by-step walkthroughs that explain how code works.
  • Showboat is a tool built by the author to help coding agents write documents demonstrating their work, and its help output is designed to be sufficient for a model to use the tool.
  • The author built a SwiftUI slide presentation app using Claude Code and Opus 4.6 and later found they did not understand how the generated code worked.

Grounded, Reproducible In-Repo Documentation Via Agent-Friendly Tooling

  • Showboat is a tool the author built to help coding agents write documents demonstrating their work, and its help output is designed to be sufficient for a model to use the tool.
  • Frontier models paired with an appropriate agent harness can generate detailed, step-by-step walkthroughs that explain how code works.
  • The author used Claude Code and Opus 4.6 to vibe code a SwiftUI slide presentation app and later found they did not understand how the generated code worked.

AI-Assisted Codebase Comprehension Via Linear Walkthroughs

  • Frontier models paired with an appropriate agent harness can generate detailed, step-by-step walkthroughs that explain how a codebase works.
  • Showboat is a tool built to help coding agents write documents demonstrating their work, and its help output is designed to be sufficient for a model to use the tool.
  • The author used Claude Code and Opus 4.6 to vibe-code a SwiftUI slide presentation app and later found they did not understand how the generated code worked.

Serving And Platform Patterns Kubernetes Disaggregation Routing

  • Kubernetes has no fundamental technical limitations for running AI data systems like vector databases, but it has significant usability and psychological adoption barriers.
  • Enterprise AI sovereignty is defined as the ability to control operations, infrastructure, and data while meeting jurisdiction-specific compliance requirements, including geographic constraints on data and staffing.
  • A major gap in current AI tooling and training is practical access to GPUs, which limits who can participate in generative and post-transformer innovation.