Agent-Assisted Data Work Packaged As End-To-End Curriculum
Sources: 1 • Confidence: High • Updated: 2026-04-12 10:16
Key takeaways
- A three-hour NICAR 2026 workshop titled "Coding agents for data analysis" was delivered to an audience of data journalists.
- Workshop participants collectively spent US$23 worth of Codex tokens.
- One workshop workflow configured Datasette to serve static content from a visualization folder, then used Claude Code to iteratively build interactive visualizations directly in that folder.
- The workshop demonstrated using Claude Code and OpenAI Codex to explore, analyze, and clean data.
- The handout covers setup for Claude Code and Codex, asking questions of a database, exploring and cleaning data, creating visualizations, and agent-assisted scraping.
Sections
Agent-Assisted Data Work Packaged As End-To-End Curriculum
- A three-hour NICAR 2026 workshop titled "Coding agents for data analysis" was delivered to an audience of data journalists.
- The workshop demonstrated using Claude Code and OpenAI Codex to explore, analyze, and clean data.
- The handout covers setup for Claude Code and Codex, asking questions of a database, exploring and cleaning data, creating visualizations, and agent-assisted scraping.
- Workshop exercises used Python and SQLite, and some exercises used Datasette.
- The handout was designed to be useful beyond in-person attendees, and the author expects it to apply beyond data journalism to general data exploration.
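The exercises pair Python with SQLite, so the "asking questions of a database" pattern can be sketched with the standard-library `sqlite3` module. The `trees` schema below is a hypothetical stand-in; the workshop's actual databases are not described in this summary.

```python
# Minimal sketch of querying a SQLite database the way the workshop
# exercises do. Uses an in-memory database so the example is self-contained.
import sqlite3


def ask(conn: sqlite3.Connection, sql: str, params=()) -> list[tuple]:
    """Run a query and return all result rows."""
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trees (species TEXT, lat REAL, lng REAL)")
    conn.executemany(
        "INSERT INTO trees VALUES (?, ?, ?)",
        [("oak", 37.77, -122.42), ("oak", 37.78, -122.41), ("elm", 37.76, -122.43)],
    )
    # A typical exploratory question: how many trees of each species?
    print(ask(conn, "SELECT species, COUNT(*) FROM trees GROUP BY species"))
```

In an agent-assisted session the SQL itself would typically be proposed by Claude Code or Codex rather than written by hand.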
Controlled Rollout Pattern For Hands-On Agent Tooling (Environment + Spend Governance)
- Workshop participants collectively spent US$23 worth of Codex tokens.
- The workshop used GitHub Codespaces as the exercise environment and distributed a budget-restricted OpenAI Codex API key to attendees.
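From an exercise script's point of view, a Codespaces-distributed key simply appears as an environment variable. The sketch below assumes the conventional `OPENAI_API_KEY` name; the workshop's exact distribution mechanism and variable name are not detailed in this summary.

```python
# Hedged sketch: read a budget-restricted API key injected into the
# environment (e.g. as a GitHub Codespaces secret), failing loudly if absent.
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the named key from the environment, or raise a clear error."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(
            f"{name} is not set; in a workshop setting it would be "
            "injected into the Codespace as a budget-restricted secret."
        )
    return key
```

Spend governance itself (the budget cap) is enforced server-side by the key's restrictions, not by anything in the participant's code.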
Agent-In-The-Loop Visualization Iteration Within A Running Data App
- One workshop workflow configured Datasette to serve static content from a visualization folder, then used Claude Code to iteratively build interactive visualizations directly in that folder.
- Claude Code generated a heat map visualization for a trees database using Leaflet and Leaflet.heat.
Unknowns
- How many participants attended, and what was the per-attendee distribution of token spend across exercises?
- What objective outcomes were observed (task completion time, error rates, learning gains, or quality of produced analyses/visualizations) compared to a non-agent baseline?
- How often did agent outputs require manual correction, and what kinds of failures (logic errors, data misinterpretation, security issues, dependency problems) occurred during the exercises?
- What were the specific parameters of the budget-restricted API key (limits, enforcement behavior, participant friction), and did the restriction affect the learning experience?
- Is there evidence of reuse or adoption of the handout beyond the original workshop (downloads, forks, citations, or follow-on trainings)?