Agent-Assisted Data Analysis Packaged As End-To-End Training
Sources: 1 • Confidence: High • Updated: 2026-04-13 03:49
Key takeaways
- A three-hour workshop, "Coding agents for data analysis", was delivered to data journalists at NICAR 2026, along with an accompanying handout.
- The workshop exercises used Python and SQLite; some also used Datasette.
- Workshop participants collectively used 23 USD worth of Codex tokens.
- The workshop demonstrated how to use Claude Code and OpenAI Codex to explore, analyze, and clean data.
- The handout includes modules on setting up Claude Code and Codex, querying a database, exploring and cleaning data, creating visualizations, and scraping data with agents.
Sections
Agent-Assisted Data Analysis Packaged As End-To-End Training
- A three-hour workshop, "Coding agents for data analysis", was delivered to data journalists at NICAR 2026, along with an accompanying handout.
- The workshop demonstrated how to use Claude Code and OpenAI Codex to explore, analyze, and clean data.
- The handout includes modules on setting up Claude Code and Codex, querying a database, exploring and cleaning data, creating visualizations, and scraping data with agents.
- The handout was designed to be useful to people who did not attend, and the author expects it to apply beyond data journalism to anyone exploring data.
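To make the "exploring and cleaning data" module concrete, here is a minimal sketch of the kind of Python and SQLite code an agent might produce when profiling and tidying a table. It is not taken from the handout: the `trees` table, the `species` column, and the helper names `profile_table` and `clean_text_column` are hypothetical.

```python
import sqlite3


def profile_table(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Summarize each column of `table`: row count, NULL count, distinct values."""
    # Identifiers cannot be bound as SQL parameters, so `table` must be a trusted name.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    summary = []
    for col in columns:
        nulls, distinct = conn.execute(
            f'SELECT SUM("{col}" IS NULL), COUNT(DISTINCT "{col}") FROM "{table}"'
        ).fetchone()
        # SUM() returns NULL on an empty table, hence the `or 0`.
        summary.append(
            {"column": col, "rows": total, "nulls": nulls or 0, "distinct": distinct}
        )
    return summary


def clean_text_column(conn: sqlite3.Connection, table: str, col: str) -> int:
    """Trim surrounding whitespace and turn empty strings into NULL.

    Returns the number of rows changed. Intended for TEXT columns.
    """
    sql = f"""
        UPDATE "{table}"
        SET "{col}" = NULLIF(TRIM("{col}"), '')
        WHERE "{col}" != TRIM("{col}") OR TRIM("{col}") = ''
    """
    with conn:  # commit on success, roll back on error
        return conn.execute(sql).rowcount
```

Comparing `profile_table(conn, "trees")` before and after `clean_text_column(conn, "trees", "species")` shows the NULL and distinct counts shift as stray whitespace and blank strings are normalized, which is a quick sanity check on any cleaning step an agent proposes.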
File-Based Visualization Prototyping Loop Integrated With A Running Data App
- The workshop exercises used Python and SQLite; some also used Datasette.
- One highlighted workflow configured Datasette to serve static files from a visualization folder, while Claude Code iteratively wrote interactive visualizations into that same folder, so each new version could be viewed alongside the running data app.
- Claude Code generated a heat map visualization for a trees database using Leaflet and Leaflet.heat.
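A minimal sketch of that file-based loop, assuming a SQLite database with a hypothetical `trees` table holding `latitude` and `longitude` columns: a Python script writes a self-contained Leaflet.heat page into a `viz/` folder. The CDN URLs and pinned versions, the table and column names, and the helpers `fetch_points` and `write_heat_map` are assumptions for illustration; the workshop's generated code is not reproduced here.

```python
import json
import sqlite3
from pathlib import Path

# Assumed CDN locations for Leaflet and the Leaflet.heat plugin.
LEAFLET_CSS = "https://unpkg.com/leaflet@1.9.4/dist/leaflet.css"
LEAFLET_JS = "https://unpkg.com/leaflet@1.9.4/dist/leaflet.js"
LEAFLET_HEAT_JS = "https://unpkg.com/leaflet.heat@0.2.0/dist/leaflet-heat.js"

# Doubled braces are literal braces for str.format; single braces are placeholders.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Tree density heat map</title>
<link rel="stylesheet" href="{css}">
<style>html, body, #map {{ height: 100%; margin: 0; }}</style>
</head>
<body>
<div id="map"></div>
<script src="{js}"></script>
<script src="{heat}"></script>
<script>
const points = {points};
const map = L.map("map");
L.tileLayer("https://tile.openstreetmap.org/{{z}}/{{x}}/{{y}}.png", {{
  attribution: "&copy; OpenStreetMap contributors"
}}).addTo(map);
L.heatLayer(points, {{radius: 15, blur: 20}}).addTo(map);
map.fitBounds(points.map(p => [p[0], p[1]]));
</script>
</body>
</html>
"""


def fetch_points(db_path: str, table: str = "trees",
                 lat_col: str = "latitude", lon_col: str = "longitude") -> list[list[float]]:
    """Return [lat, lon] pairs, skipping rows with missing coordinates."""
    # Identifiers cannot be bound as SQL parameters; these are trusted, hard-coded names.
    sql = (f'SELECT "{lat_col}", "{lon_col}" FROM "{table}" '
           f'WHERE "{lat_col}" IS NOT NULL AND "{lon_col}" IS NOT NULL')
    conn = sqlite3.connect(db_path)
    try:
        return [[float(lat), float(lon)] for lat, lon in conn.execute(sql)]
    finally:
        conn.close()


def write_heat_map(db_path: str, out_dir: str = "viz") -> Path:
    """Write viz/heatmap.html with the points embedded as a JSON array."""
    points = fetch_points(db_path)
    if not points:
        raise ValueError("no rows with coordinates; nothing to plot")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    page = out / "heatmap.html"
    html = PAGE.format(css=LEAFLET_CSS, js=LEAFLET_JS, heat=LEAFLET_HEAT_JS,
                       points=json.dumps(points))
    page.write_text(html, encoding="utf-8")
    return page
```

Datasette's `--static` option could then serve that folder next to the data, for example `datasette trees.db --static viz:viz`, which would expose the page at `/viz/heatmap.html`; regenerating the file and reloading the browser is the whole iteration loop. Embedding the points directly keeps the page independent of Datasette's JSON API and its row limits, at the cost of larger files for big tables.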
Operational Rollout Pattern With Spend Governance For Hands-On Agent Usage
- Workshop participants collectively used 23 USD worth of Codex tokens.
- Attendees worked in GitHub Codespaces with OpenAI Codex, using a budget-restricted Codex API key distributed to everyone; this both capped spending and simplified setup.
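A budget-restricted key caps spending at the provider. As a hypothetical client-side analogue, the sketch below tracks per-attendee spend against a shared cap and refuses new requests once it is reached. The class names, the per-million-token prices, and the attendee labels are placeholders, not OpenAI's actual pricing or API.

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when the shared spending cap has been used up."""


@dataclass
class SpendBudget:
    cap_usd: float
    input_usd_per_mtok: float   # placeholder price per 1M input tokens
    output_usd_per_mtok: float  # placeholder price per 1M output tokens
    spent_usd: float = 0.0
    by_attendee: dict[str, float] = field(default_factory=dict)

    def check(self) -> None:
        """Call before each request: refuse new work once the cap is reached."""
        if self.spent_usd >= self.cap_usd:
            raise BudgetExceeded(f"budget of ${self.cap_usd:.2f} exhausted")

    def record(self, attendee: str, input_tokens: int, output_tokens: int) -> float:
        """Call after each request with its reported token usage; returns its cost."""
        cost = (input_tokens * self.input_usd_per_mtok
                + output_tokens * self.output_usd_per_mtok) / 1_000_000
        self.spent_usd += cost
        self.by_attendee[attendee] = self.by_attendee.get(attendee, 0.0) + cost
        return cost
```

Checking before a request and recording after it mirrors how usage-based caps typically behave: the last request can overshoot slightly, because its output size is only known once it completes. The `by_attendee` map is the kind of breakdown that would show how a total such as the workshop's 23 USD was distributed.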
Unknowns
- How many attendees participated, and what was the per-attendee and per-exercise token spend distribution underlying the 23 USD total?
- What was the measured effect on task completion time, correctness, or rework rates when using Claude Code/Codex versus a non-agent baseline for the same exercises?
- What guardrails were used to prevent data leakage or unsafe actions (e.g., secrets handling, scraping constraints, database access limits) in the Codespaces + API key setup?
- How maintainable and reviewable was the agent-generated visualization code over multiple iterations (dependency management, code structure, performance on larger datasets)?
- To what extent has the handout been adopted or reused by non-journalism audiences, and what feedback indicates successful transfer beyond the original workshop?