Rosa Del Mar

Daily Brief

Issue 75 2026-03-16

Agent-Assisted Data Analysis Packaged As End-To-End Training

General
Sources: 1 • Confidence: High • Updated: 2026-04-13 03:49

Key takeaways

  • A three-hour workshop, "Coding agents for data analysis", was delivered to data journalists at NICAR 2026, accompanied by a handout.
  • The workshop exercises used Python and SQLite, and some exercises used Datasette.
  • Workshop participants collectively used 23 USD worth of Codex tokens.
  • The workshop demonstrates using Claude Code and OpenAI Codex to explore, analyze, and clean data.
  • The handout includes modules on setting up Claude Code and Codex, querying a database, exploring and cleaning data, creating visualizations, and scraping data with agents.

Sections

Agent-Assisted Data Analysis Packaged As End-To-End Training

  • A three-hour workshop, "Coding agents for data analysis", was delivered to data journalists at NICAR 2026, accompanied by a handout.
  • The workshop demonstrates using Claude Code and OpenAI Codex to explore, analyze, and clean data.
  • The handout includes modules on setting up Claude Code and Codex, querying a database, exploring and cleaning data, creating visualizations, and scraping data with agents.
  • The handout was designed to be useful to non-attendees, and the author expected it to generalize beyond data journalism to anyone exploring data.
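The handout's "querying a database" and "exploring and cleaning data" modules describe the kind of orientation pass an agent typically runs against a SQLite file. A minimal sketch of that pass in plain Python follows; the `trees` table and its columns are invented here for illustration, not taken from the workshop materials.

```python
import sqlite3

# Build a tiny throwaway database standing in for the workshop data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trees (id INTEGER, species TEXT, diameter REAL)")
conn.executemany(
    "INSERT INTO trees VALUES (?, ?, ?)",
    [(1, "oak", 12.5), (2, "maple", None), (3, "oak", 8.0)],
)

# Step 1: list the tables, as an agent might do first to orient itself.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Step 2: profile missing values in a column -- a common cleaning check.
nulls = conn.execute(
    "SELECT COUNT(*) - COUNT(diameter) FROM trees").fetchone()[0]

print(tables, nulls)  # ['trees'] 1
```

An agent like Claude Code or Codex issues essentially these same queries itself; the value of the module is teaching journalists to read and verify that output rather than write it by hand.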

File-Based Visualization Prototyping Loop Integrated With A Running Data App

  • The workshop exercises used Python and SQLite, and some exercises used Datasette.
  • A highlighted workflow configured Datasette to serve static content from a visualization folder while Claude Code iteratively generated interactive visualizations directly into that folder.
  • Claude Code generated a heat map visualization for a trees database using Leaflet and Leaflet.heat.
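The output side of that loop is an ordinary HTML file dropped into the served folder. The sketch below writes a minimal Leaflet + Leaflet.heat page of the sort described; the table, coordinates, and CDN URLs are illustrative assumptions, not the actual code the agent produced.

```python
import json
import pathlib
import sqlite3
import tempfile

# Stand-in trees database with point coordinates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trees (latitude REAL, longitude REAL)")
conn.executemany("INSERT INTO trees VALUES (?, ?)",
                 [(37.77, -122.42), (37.78, -122.41)])
points = [[lat, lon, 1.0] for lat, lon in
          conn.execute("SELECT latitude, longitude FROM trees")]

# A self-contained heat-map page; Leaflet.heat takes [lat, lon, weight] triples.
page = f"""<!DOCTYPE html>
<html><head>
<link rel="stylesheet" href="https://unpkg.com/leaflet@1.9.4/dist/leaflet.css">
<script src="https://unpkg.com/leaflet@1.9.4/dist/leaflet.js"></script>
<script src="https://unpkg.com/leaflet.heat/dist/leaflet-heat.js"></script>
</head><body>
<div id="map" style="height: 100vh"></div>
<script>
const map = L.map("map").setView([37.77, -122.42], 13);
L.tileLayer("https://tile.openstreetmap.org/{{z}}/{{x}}/{{y}}.png").addTo(map);
L.heatLayer({json.dumps(points)}).addTo(map);
</script>
</body></html>"""

# Write into the folder Datasette is configured to serve.
viz = pathlib.Path(tempfile.mkdtemp()) / "heatmap.html"
viz.write_text(page)
```

Datasette's `--static mount:directory` option serves such a folder alongside the database, so each file the agent writes becomes immediately viewable in the browser, closing the iteration loop.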

Operational Rollout Pattern With Spend Governance For Hands-On Agent Usage

  • Workshop participants collectively used 23 USD worth of Codex tokens.
  • The workshop ran in GitHub Codespaces, which were used to distribute a budget-restricted OpenAI Codex API key to attendees, both as a cost control and to simplify setup.
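One plausible wiring for that distribution step is a dev container definition that surfaces the key as a Codespaces secret. This fragment is a hypothetical sketch, not the workshop's actual configuration: the image, secret name, and description are all assumptions.

```json
{
  "name": "coding-agents-workshop",
  "image": "mcr.microsoft.com/devcontainers/python:3.12",
  "secrets": {
    "OPENAI_API_KEY": {
      "description": "Budget-restricted workshop key (hypothetical name)"
    }
  }
}
```

Declaring the secret this way prompts each attendee's Codespace for the value and exposes it as an environment variable, while the budget cap on the key itself bounds total spend regardless of individual usage.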

Unknowns

  • How many attendees participated, and what was the per-attendee and per-exercise token spend distribution underlying the 23 USD total?
  • What was the measured effect on task completion time, correctness, or rework rates when using Claude Code/Codex versus a non-agent baseline for the same exercises?
  • What guardrails were used to prevent data leakage or unsafe actions (e.g., secrets handling, scraping constraints, database access limits) in the Codespaces + API key setup?
  • How maintainable and reviewable was the agent-generated visualization code over multiple iterations (dependency management, code structure, performance on larger datasets)?
  • To what extent has the handout been adopted or reused by non-journalism audiences, and what feedback indicates successful transfer beyond the original workshop?

Investor overlay

Read-throughs

  • Demand signal for agent-assisted coding tools in data analysis training, with practical curricula spanning setup, querying, cleaning, visualization, and scraping. Could reflect broader willingness to pay for structured enablement content beyond ad hoc code assistance.
  • Operational pattern for controlled enterprise rollouts of agent tools using managed environments and budget-restricted API keys. Suggests spend governance is a key adoption enabler for hands-on training deployments.
  • Agent-generated visualization code integrated into a running lightweight data app workflow. Indicates agents may drive faster prototyping of deployable analytics front ends, potentially increasing usage of developer tools that support app-oriented iteration.

What would confirm

  • Evidence of reuse of the handout beyond the original workshop, such as adoption by non-journalism groups or repeat trainings, indicating transferability of the curriculum.
  • Measured outcomes versus a non-agent baseline for the same exercises, such as reduced completion time or fewer errors and rework when using Claude Code and Codex.
  • Expanded rollout details showing repeated use of the Codespaces plus budget-restricted API key approach, with larger participant counts or sustained spend tracking across sessions.

What would kill

  • No measurable productivity or quality improvement versus a non-agent baseline, or increased rework due to agent-generated code requiring heavy manual review.
  • Governance or safety issues in the training setup, such as data leakage, unsafe scraping behavior, or inability to enforce API cost limits and access controls.
  • Low adoption or negative feedback indicating the workflow is too fragile or not maintainable, especially for visualization code over multiple iterations or larger datasets.

Sources

  1. 2026-03-16 simonwillison.net