Derivative-Work Dispute And Clean-Room Validity Under Prior Exposure

Issue 64 • Edition 2026-03-05 • 7 min read

Sources: 1 • Confidence: High • Updated: 2026-04-12 10:23

Key takeaways

Mark Pilgrim argues that chardet 7.0.0 cannot be relicensed to MIT because it is a modification of LGPL-licensed work and is not a valid clean-room implementation given the maintainers' prior exposure to the code.
The chardet project was originally released in 2006 by Mark Pilgrim under the LGPL.
AI coding agents can produce a fresh codebase from a specification and tests quickly enough to approximate a clean-room reimplementation, far faster than traditional multi-team clean-room processes.
Dan Blanchard reports using the JPlag tool and obtaining low similarity scores for chardet 7.0.0 versus prior versions, while older versions show high similarity with each other.
A key ecosystem watch item is whether low-cost reimplementation from test suites will cause software to re-emerge under more permissive open-source or proprietary licenses at scale.

Mark Pilgrim argues that chardet 7.0.0 cannot be relicensed to MIT because it is a modification of LGPL-licensed work and is not a valid clean-room implementation given the maintainers' prior exposure to the code.
Dan Blanchard asserts that chardet 7.0.0 is an independent work and therefore can be MIT licensed despite the project's LGPL history.
Dan Blanchard acknowledges that a traditional clean-room separation did not exist because he had extensive prior knowledge of the original chardet codebase from maintaining it for over a decade.
The dispute over whether chardet 7.0.0 can be MIT licensed is expected to be difficult to resolve definitively in the near term.

The chardet project was originally released in 2006 by Mark Pilgrim under the LGPL.
chardet has been maintained by others since 2011, and Dan Blanchard has made every release since version 1.1 in July 2012.
Dan Blanchard released chardet 7.0.0 and described it as a ground-up rewrite under the MIT license that keeps the same package name and public API as a drop-in replacement for 5.x/6.x.

AI coding agents can produce a fresh codebase from a specification and tests quickly enough to approximate a clean-room reimplementation, far faster than traditional multi-team clean-room processes.
Dan Blanchard reports that the rewrite process began in an empty repository, included instructing Claude not to use LGPL/GPL-licensed code, and proceeded via iterative review, testing, and refinement.
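
The idea of treating a test suite as the specification can be sketched as follows. This is only an illustration under stated assumptions: `toy_detect`, `CASES`, and `run_conformance` are hypothetical names, the dictionary shape is loosely modeled on a detect-style API, and nothing here is taken from the chardet codebase or its actual test suite. The point is that the cases refer only to observable behavior, so any implementation can be checked against them.

```python
import codecs
from typing import Callable, Iterable

# A detector maps raw bytes to a result dict (shape assumed for illustration).
Detector = Callable[[bytes], dict]


def toy_detect(data: bytes) -> dict:
    """Hypothetical stand-in: BOM sniffing, then a strict UTF-8 decode attempt."""
    if data.startswith(codecs.BOM_UTF8):
        return {"encoding": "utf-8-sig", "confidence": 1.0}
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return {"encoding": "utf-16", "confidence": 1.0}
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return {"encoding": None, "confidence": 0.0}
    return {"encoding": "utf-8", "confidence": 0.9}


# Behavioral cases: input bytes and the expected encoding label.
# They say nothing about how an implementation reaches the answer.
CASES = [
    ("caf\u00e9".encode("utf-8"), "utf-8"),
    (codecs.BOM_UTF8 + b"hi", "utf-8-sig"),
    ("hi".encode("utf-16"), "utf-16"),
]


def run_conformance(detect: Detector, cases: Iterable[tuple[bytes, str]]) -> list[str]:
    """Return a description of every case the detector gets wrong."""
    failures = []
    for data, expected in cases:
        got = detect(data).get("encoding")
        if got != expected:
            failures.append(f"{data[:8]!r}: expected {expected}, got {got}")
    return failures
```

Calling `run_conformance(toy_detect, CASES)` returns an empty list when every case passes. Swapping in a different detector requires no change to the cases, which is what lets a test suite act as a specification for a rewrite.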

Dan Blanchard reports using the JPlag tool and obtaining low similarity scores for chardet 7.0.0 versus prior versions, while older versions show high similarity with each other.
Dan Blanchard argues that non-derivation can be supported by measurement of structural independence rather than strict clean-room process separation alone.
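
To make "structural similarity" concrete, here is a minimal sketch of a token-based comparison. It is not JPlag's algorithm and does not use JPlag. It is a simplified stand-in that collapses identifiers and literals, then compares overlapping token windows with a Jaccard score. The function names and the window size `n=4` are assumptions chosen for illustration.

```python
import io
import keyword
import tokenize

# Token types that carry layout or comments rather than structure.
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalized_tokens(source: str) -> list[str]:
    """Reduce Python source to structural tokens, hiding names and literals."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME:
            # Keep keywords; collapse identifiers so renaming has no effect.
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        else:
            out.append(tok.string)  # operators and punctuation
    return out


def shingles(tokens: list[str], n: int = 4) -> set[tuple[str, ...]]:
    """All overlapping windows of n consecutive tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def structural_similarity(a: str, b: str, n: int = 4) -> float:
    """Jaccard overlap of token shingles, from 0.0 to 1.0."""
    sa = shingles(normalized_tokens(a), n)
    sb = shingles(normalized_tokens(b), n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

Under this measure, a function whose variables have merely been renamed scores as identical to the original, while a function that solves the same problem with a different structure scores lower. That property is the reason such metrics are offered as evidence about structure rather than surface text, and also why a low score alone says nothing about how the code was produced.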

A key ecosystem watch item is whether low-cost reimplementation from test suites will cause software to re-emerge under more permissive open-source or proprietary licenses at scale.
Well-funded litigation is expected to emerge around AI-assisted, clean-room-like rewrites as commercial firms come to see cheap reimplementation as a threat to their IP.

Is chardet 7.0.0 legally considered an independent work or a derivative of prior LGPL-licensed versions?
Can similarity measurements such as the reported JPlag scores serve as persuasive evidence of non-derivation in legal proceedings or in widely accepted community processes?
Are the reported JPlag similarity results reproducible with independent tools, configurations, and reviewers?
Was the model used in the rewrite trained on the original chardet repository, and if so, how should that training exposure affect clean-room and derivative-work analysis?
What, if any, authoritative third-party positions (legal analyses, foundations, distributors) will emerge that materially change how the ecosystem treats chardet 7.0.0?
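
One way to approach the reproducibility question above is to compute the same pairwise comparison with more than one independent metric and check whether the metrics agree on the overall pattern. The sketch below uses two simple standard-library measures as stand-ins. They are not JPlag, the function and metric names are hypothetical, and any inputs passed to it would need to be real source trees rather than the toy strings a quick check would use.

```python
from difflib import SequenceMatcher
from itertools import combinations


def line_jaccard(a: str, b: str) -> float:
    """Share of distinct non-blank lines the two texts have in common."""
    la = {ln.strip() for ln in a.splitlines() if ln.strip()}
    lb = {ln.strip() for ln in b.splitlines() if ln.strip()}
    union = la | lb
    return len(la & lb) / len(union) if union else 1.0


def sequence_ratio(a: str, b: str) -> float:
    """Character-level similarity from difflib, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Two unrelated metrics; agreement between them is weak evidence of robustness.
METRICS = {"line_jaccard": line_jaccard, "sequence_ratio": sequence_ratio}


def pairwise(versions: dict[str, str]) -> dict[tuple[str, str], dict[str, float]]:
    """Score every pair of versions under every metric."""
    return {
        (x, y): {name: fn(versions[x], versions[y]) for name, fn in METRICS.items()}
        for x, y in combinations(versions, 2)
    }
```

If every metric ranks the older versions as closer to each other than to the rewrite, the reported pattern is at least not an artifact of one tool's configuration. Disagreement between metrics would be a reason to examine the configurations before drawing conclusions.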

AI-assisted reimplementation from tests could lower the cost of producing drop-in replacements that shift licenses toward more permissive or proprietary terms, potentially changing how software value is captured across ecosystems.
If similarity metrics become accepted as evidence of non-derivation, compliance and due-diligence workflows may shift toward tooling-based arguments rather than provenance-based arguments.
High-profile disputes over derivative-work status in AI-assisted rewrites could increase legal uncertainty around relicensing and alter risk tolerance for maintainers and downstream distributors.

Authoritative third-party positions emerge on chardet 7.0.0's derivative-work status or clean-room adequacy, such as legal analyses, foundations, or major distributors adopting a clear stance.
Independent reproduction of the low similarity results, using multiple tools and consistent configurations, gains community acceptance as a proxy for non-derivation in disputes.
Multiple widely used packages successfully re-emerge as drop-in replacements under new licenses via test-driven, AI-assisted rewrites, indicating the workflow scales beyond a one-off.

chardet 7.0.0 is widely judged or ruled to be a derivative of the prior LGPL work, undermining the premise that AI-assisted rewrites can reliably enable license transitions.
Similarity metrics are broadly rejected as persuasive evidence in community processes or legal analysis, reducing their usefulness for establishing independence claims.
Evidence surfaces that the rewrite relied on exposure to the original code in ways considered disqualifying for clean-room approaches, limiting broader adoption of the pattern.