Rosa Del Mar

Daily Brief

Issue 75 2026-03-16

End-To-End Agent-Assisted Data Analysis Workflow Packaging

General
Sources: 1 • Confidence: High • Updated: 2026-03-17 15:15

Key takeaways

  • A three-hour NICAR 2026 workshop titled "Coding agents for data analysis" was delivered for data journalists, and a handout was prepared for it.
  • Total Codex token spend across all workshop participants was $23.
  • A highlighted workflow configured Datasette to serve static content from a visualization folder, then used Claude Code to iteratively create interactive visualizations directly in that folder.
  • The handout was designed to be useful to people who did not attend the workshop, and the author expects it to apply beyond data journalism to anyone exploring data.
  • The workshop demonstrated using Claude Code and OpenAI Codex to explore, analyze, and clean data.

Sections

End-To-End Agent-Assisted Data Analysis Workflow Packaging

  • A three-hour NICAR 2026 workshop titled "Coding agents for data analysis" was delivered for data journalists, and a handout was prepared for it.
  • The workshop demonstrated using Claude Code and OpenAI Codex to explore, analyze, and clean data.
  • The handout covered setup for Claude Code and Codex; asking questions against a database; exploring and cleaning data; creating visualizations; and scraping data with agents.
  • Workshop exercises used Python and SQLite, and some exercises used Datasette.
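The Python-plus-SQLite pairing can be sketched with the standard library alone. This is a hedged illustration, not material from the workshop: the `trees` table and its contents are hypothetical stand-ins for the exercise datasets.

```python
import sqlite3

# Build a small in-memory database resembling a workshop dataset
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trees (id INTEGER PRIMARY KEY, species TEXT)")
conn.executemany(
    "INSERT INTO trees (species) VALUES (?)",
    [("oak",), ("maple",), ("oak",)],
)

# The kind of exploratory question an agent might answer with SQL:
# count rows per species, most common first
rows = conn.execute(
    "SELECT species, COUNT(*) AS n FROM trees"
    " GROUP BY species ORDER BY n DESC"
).fetchall()
print(rows)  # → [('oak', 2), ('maple', 1)]
```

The same database file could then be opened in Datasette for browsing, which is how some of the exercises reportedly combined the two tools.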

Cost Governance And Low-Friction Rollout Pattern For Agent Tooling

  • Total Codex token spend across all workshop participants was $23.
  • The workshop used GitHub Codespaces to give attendees a preconfigured environment and distributed a budget-restricted Codex API key, for cost control and ease of setup.
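One plausible way to realize this rollout pattern is a dev container definition that declares the key as a recommended Codespaces secret, so each attendee's environment prompts for the instructor-provided value. This is a sketch only; the base image, secret name, and setup command are assumptions, not details from the source.

```json
{
  // Hedged sketch of a Codespaces devcontainer.json for a group rollout
  "image": "mcr.microsoft.com/devcontainers/python:3.12",
  "secrets": {
    "OPENAI_API_KEY": {
      "description": "Budget-restricted Codex API key provided by the instructor"
    }
  },
  "postCreateCommand": "pip install datasette"
}
```

Because the key itself is spend-limited server-side, a leaked or overused key caps out at the preset budget rather than running up an open-ended bill.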

Agent-In-The-Loop Visualization Prototyping Integrated With A Served App Directory

  • A highlighted workflow configured Datasette to serve static content from a visualization folder, then used Claude Code to iteratively create interactive visualizations directly in that folder.
  • Claude Code produced a heat map visualization for a trees database using Leaflet and Leaflet.heat.
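Datasette can mount a local directory as static content (e.g. `datasette trees.db --static viz:visualizations/`, where the database and folder names here are assumptions), so an agent-written HTML file becomes immediately browsable. A generated heat-map page might look roughly like this minimal Leaflet + Leaflet.heat sketch; the coordinates, CDN versions, and file layout are illustrative, not the workshop's actual output.

```html
<!DOCTYPE html>
<html>
<head>
  <link rel="stylesheet" href="https://unpkg.com/leaflet@1.9.4/dist/leaflet.css"/>
  <script src="https://unpkg.com/leaflet@1.9.4/dist/leaflet.js"></script>
  <script src="https://unpkg.com/leaflet.heat@0.2.0/dist/leaflet-heat.js"></script>
  <style>#map { height: 100vh; }</style>
</head>
<body>
  <div id="map"></div>
  <script>
    // Hypothetical tree coordinates; real points would come from the trees database
    const map = L.map("map").setView([37.77, -122.42], 13);
    L.tileLayer("https://tile.openstreetmap.org/{z}/{x}/{y}.png").addTo(map);
    L.heatLayer([[37.77, -122.42, 1], [37.78, -122.41, 0.5]], {radius: 25}).addTo(map);
  </script>
</body>
</html>
```

Because the folder is already served, each agent iteration is a browser refresh away, which is what makes the tight create-and-inspect loop described above practical.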

Expected Transferability Beyond The Initial Audience

  • The handout was designed to be useful to people who did not attend the workshop, and the author expects it to apply beyond data journalism to anyone exploring data.

Unknowns

  • How many participants were in the workshop, and what was the distribution of token spend per attendee and per exercise?
  • What model versions, prompting patterns, and guardrails were used (e.g., constraints, system prompts, tool permissions) when using Claude Code and Codex?
  • What were the observed quality outcomes (correctness, data cleaning accuracy, visualization correctness, hallucination rates) versus a non-agent baseline?
  • What review and testing practices were used for agent-generated code and data transformations in the exercises?
  • How maintainable were the generated visualization artifacts over multiple iterations (dependency management, code structure, performance on larger datasets)?

Investor overlay

Read-throughs

  • Hands-on training using Claude Code and OpenAI Codex for data exploration, cleaning, and visualization suggests growing institutional interest in agent-assisted analytics workflows, which could translate into broader tool adoption across data-centric teams.
  • The described rollout pattern using Codespaces and budget-restricted API keys points to a governance-friendly path for deploying agent tooling in group settings, potentially increasing organizational willingness to trial paid LLM tooling.
  • Generating visualization artifacts directly into a served Datasette folder highlights demand for rapid agent-in-the-loop prototyping tied to deployable outputs, implying value for platforms that tightly integrate coding agents with lightweight data apps.

What would confirm

  • Evidence of sustained adoption of the handout beyond attendees, such as reuse in other trainings, public forks, or reports of teams applying the workflow outside data journalism.
  • Additional cohorts replicating the same low-friction setup, specifically standardized environments plus spend-limited keys, with consistent aggregate costs that remain manageable at larger group sizes.
  • Published evaluation of agent-generated code quality, including correctness of cleaning steps and visualization behavior, plus described review and testing practices that make the workflow reliable and maintainable.

What would kill

  • Reports that agent-generated transformations or visualizations frequently require heavy rework, fail correctness checks, or become hard to maintain across iterations, limiting practical usefulness.
  • Observed token spend per attendee or per exercise materially exceeds expectations, making budget governance less effective for classroom or team rollouts.
  • Little to no follow-on usage outside the workshop context, indicating the workflow does not generalize well or lacks sufficient utility for broader audiences.

Sources

  1. 2026-03-16 simonwillison.net