Agent-Assisted Data Work Packaged As End-To-End Curriculum
Sources: 1 • Confidence: High • Updated: 2026-04-12 10:16
Key takeaways
- A three-hour NICAR 2026 workshop titled "Coding agents for data analysis" was delivered to an audience of data journalists.
- Workshop participants collectively spent US$23 worth of Codex tokens.
- One workshop workflow configured Datasette to serve static content from a visualization folder, then used Claude Code to iteratively build interactive visualizations directly in that folder.
- The workshop demonstrated using Claude Code and OpenAI Codex to explore, analyze, and clean data.
- The handout covers setup for Claude Code and Codex, asking questions of a database, exploring and cleaning data, creating visualizations, and agent-assisted scraping.
Sections
Agent-Assisted Data Work Packaged As End-To-End Curriculum
- A three-hour NICAR 2026 workshop titled "Coding agents for data analysis" was delivered to an audience of data journalists.
- The workshop demonstrated using Claude Code and OpenAI Codex to explore, analyze, and clean data.
- The handout covers setup for Claude Code and Codex, asking questions of a database, exploring and cleaning data, creating visualizations, and agent-assisted scraping.
- Workshop exercises used Python and SQLite, and some exercises used Datasette.
- The handout was designed to be useful beyond in-person attendees, and the author expects it to apply beyond data journalism to general data exploration.
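The exercises pair Python with SQLite, so the "asking questions of a database" pattern can be sketched with the standard-library `sqlite3` module. The `trees` schema below is a hypothetical stand-in; the workshop's actual databases are not described in this summary.

```python
# Minimal sketch of querying a SQLite database the way the workshop
# exercises do. Uses an in-memory database so the example is self-contained.
import sqlite3


def ask(conn: sqlite3.Connection, sql: str, params=()) -> list[tuple]:
    """Run a query and return all result rows."""
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trees (species TEXT, lat REAL, lng REAL)")
    conn.executemany(
        "INSERT INTO trees VALUES (?, ?, ?)",
        [("oak", 37.77, -122.42), ("oak", 37.78, -122.41), ("elm", 37.76, -122.43)],
    )
    # A typical exploratory question: how many trees of each species?
    print(ask(conn, "SELECT species, COUNT(*) FROM trees GROUP BY species"))
```

In an agent-assisted session the SQL itself would typically be proposed by Claude Code or Codex rather than written by hand.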
Controlled Rollout Pattern For Hands-On Agent Tooling (Environment + Spend Governance)
- Workshop participants collectively spent US$23 worth of Codex tokens.
- The workshop used GitHub Codespaces as the exercise environment and distributed a budget-restricted OpenAI Codex API key to attendees.
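From an exercise script's point of view, a Codespaces-distributed key simply appears as an environment variable. The sketch below assumes the conventional `OPENAI_API_KEY` name; the workshop's exact distribution mechanism and variable name are not detailed in this summary.

```python
# Hedged sketch: read a budget-restricted API key injected into the
# environment (e.g. as a GitHub Codespaces secret), failing loudly if absent.
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the named key from the environment, or raise a clear error."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(
            f"{name} is not set; in a workshop setting it would be "
            "injected into the Codespace as a budget-restricted secret."
        )
    return key
```

Spend governance itself (the budget cap) is enforced server-side by the key's restrictions, not by anything in the participant's code.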
Agent-In-The-Loop Visualization Iteration Within A Running Data App
- One workshop workflow configured Datasette to serve static content from a visualization folder, then used Claude Code to iteratively build interactive visualizations directly in that folder.
- Claude Code generated a heat map visualization for a trees database using Leaflet and Leaflet.heat.
Unknowns
- How many participants attended, and what was the per-attendee distribution of token spend across exercises?
- What objective outcomes were observed (task completion time, error rates, learning gains, or quality of produced analyses/visualizations) compared to a non-agent baseline?
- How often did agent outputs require manual correction, and what kinds of failures (logic errors, data misinterpretation, security issues, dependency problems) occurred during the exercises?
- What were the specific parameters of the budget-restricted API key (limits, enforcement behavior, participant friction), and did the restriction affect the learning experience?
- Is there evidence of reuse or adoption of the handout beyond the original workshop (downloads, forks, citations, or follow-on trainings)?