Curvenote: Positioning, Adoption Wedge, and Architecture
Sources: 1 • Confidence: Medium • Updated: 2026-03-25 17:57
Key takeaways
- Most Curvenote users are already part of a Jupyter-like computational community and have existing skills with datasets, computation, and coding.
- A major barrier to beyond-PDF research communication is social and incentive-driven: researchers still need downloadable PDFs to submit to journals and to receive credit.
- Continuous Science Foundation emphasizes community and incentive alignment (including earlier attribution and licensing) to reduce fear of being scooped and to increase credit for sharing.
- Improving scientific reproducibility requires both (a) integrity (auditable datasets and pipelines) and (b) reuse (others can access and apply the methods and data).
- Scientific data formats are evolving from HDF5 toward Zarr-based, cloud-optimized formats designed for object storage, with self-describing metadata and more efficient partial access.
Sections
Curvenote: Positioning, Adoption Wedge, and Architecture
- Most Curvenote users are already part of a Jupyter-like computational community and have existing skills with datasets, computation, and coding.
- Curvenote's early product work focused on lowering technical barriers via a WYSIWYG editor integrated with Jupyter, including copying notebook cell outputs inline into documents.
- JupyterBook, using MyST Markdown, was created to turn notebooks into publishable narratives packaging environment, code, data, and narrative, and it has been used to build tens of thousands of educational texts and courses.
- Curvenote is positioned as a scientific content management system intended to bridge modern research authoring (e.g., notebooks) and legacy publisher workflows (e.g., FTP/XML) that are not designed for data- or compute-rich publishing.
- Curvenote does not store large-scale datasets itself and instead integrates with external repositories or partners for storage and access.
- Curvenote originated in computational geoscience, and current work is focused mainly on computational bioscience and computational neuroscience sharing contexts.
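The JupyterBook/MyST workflow described above can be illustrated with a short MyST Markdown fragment. The cell content, tag, and prose are illustrative assumptions, not taken from the source; the `{code-cell}` directive is MyST-NB's mechanism for executable cells inside a narrative document:

````markdown
A computational narrative keeps prose and executable cells together, so
outputs are regenerated from code and data instead of pasted in as images.

```{code-cell} ipython3
:tags: [hide-input]

import numpy as np

# This output is rebuilt on every execution of the book,
# rather than being a static screenshot.
np.mean([1.0, 2.0, 3.0])
```
````

Because the environment, code, and narrative travel together, the rendered output stays traceable to the computation that produced it.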
Publication-Layer Mismatch and Static Outputs
- A major barrier to beyond-PDF research communication is social and incentive-driven: researchers still need downloadable PDFs to submit to journals and to receive credit.
- There is a workflow mismatch between computationally reproducible research practices and the paper-publication process, which often forces static screenshots and weak sharing of code and data.
- Scientific communication systems have not kept pace with the shift to terabyte-scale datasets and complex processing pipelines.
- In some fields (e.g., large imaging datasets), scientists often publish screenshots rather than integrated, zoomable views that can be interrogated from within the narrative, which limits verification and exploration.
- Common scientific data-sharing practice is to upload an uncurated zip file to repositories (e.g., Zenodo or Dryad) without sufficient context.
Incentives, Credit, and Upstream Dissemination
- Continuous Science Foundation emphasizes community and incentive alignment (including earlier attribution and licensing) to reduce fear of being scooped and to increase credit for sharing.
- New journals such as the Journal of Open Source Software emerged to provide career credit for widely reused software labor that was historically undervalued in traditional publication incentives.
- Some journals and societies show resistance to beyond-PDF change because PDFs and current workflows are sufficient for their existing business models, reducing their incentive to invest in new approaches.
- Policy and funding shifts from major foundations are pushing research dissemination upstream toward preprint repositories.
Reproducibility as Integrity Plus Reuse
- Improving scientific reproducibility requires both (a) integrity (auditable datasets and pipelines) and (b) reuse (others can access and apply the methods and data).
- Integrating data, code, and visuals into a single computational narrative (e.g., notebook-style tools) is a key lever for improving reuse and comprehension of scientific results.
Cloud-Native Data and Compute-to-Data
- Scientific data formats are evolving from HDF5 toward Zarr-based, cloud-optimized formats designed for object storage, with self-describing metadata and more efficient partial access.
- Source Cooperative is presented as an example storage approach: built on AWS S3 buckets with a minimal data model, it supports bringing compute directly to datasets better than many archival systems do.
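The HDF5-to-Zarr shift above can be sketched in plain Python. A Zarr-style layout splits an array into independently addressable chunk objects plus a small JSON metadata document, which maps naturally onto object storage; this is a toy sketch of the Zarr v2 layout under stated assumptions, not the real `zarr` library:

```python
import json
import numpy as np

def zarr_like_store(arr: np.ndarray, chunks: tuple[int, int]) -> dict[str, bytes]:
    """Toy sketch of a Zarr-v2-style layout: one JSON metadata object
    (".zarray") plus one object per chunk, keyed "row.col" so that each
    chunk can be fetched independently from object storage (e.g. an S3
    bucket) without downloading the whole file."""
    store = {
        ".zarray": json.dumps({
            "shape": list(arr.shape),
            "chunks": list(chunks),
            "dtype": arr.dtype.str,
        }).encode(),
    }
    for i in range(0, arr.shape[0], chunks[0]):
        for j in range(0, arr.shape[1], chunks[1]):
            key = f"{i // chunks[0]}.{j // chunks[1]}"
            block = arr[i:i + chunks[0], j:j + chunks[1]]
            store[key] = np.ascontiguousarray(block).tobytes()
    return store

# A reader needing only one region fetches one small object, not the file.
data = np.arange(16, dtype="f4").reshape(4, 4)
store = zarr_like_store(data, (2, 2))
chunk = np.frombuffer(store["1.0"], dtype="f4").reshape(2, 2)  # rows 2-3, cols 0-1
```

This per-chunk addressing is what lets compute run next to the data: a worker reads only the chunk keys it needs, where an HDF5 file typically has to be opened as a whole (or accessed through byte-range tricks).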
Watchlist
- Rowan Cockett and Tracy Teal are attempting to rally stakeholders around an Open Exchange Architecture standard intended to be as widely adopted in science as the PDF while supporting modular, computational publishing with graceful degradation.
Unknowns
- What measurable impact do executable/interactive articles have on reuse outcomes (e.g., time-to-first-successful-run, verification time, downstream reuse rates) compared with traditional PDF-plus-repository workflows?
- How prevalent are the described failure modes (screenshots for rich datasets, uncurated zip uploads) across fields, and which disciplines experience the largest bottlenecks?
- Which specific publisher workflows and constraints (submission formats, archival requirements, compliance needs) most strongly enforce PDF-centric outputs today?
- What are Curvenote’s real-world integration patterns with external storage and repositories, and what operational limits result (latency, access control, identity, cost allocation)?
- What is the current state of the Open Exchange Architecture effort (spec maturity, governance, reference implementations, and adopters), and what interoperability problems does it concretely solve?