Rosa Del Mar

Daily Brief

Issue 64 2026-03-05

Derivative-Work And Clean-Room Legitimacy Dispute Under Maintainer Prior Exposure

General
Sources: 1 • Confidence: High • Updated: 2026-04-13 03:56

Key takeaways

  • Mark Pilgrim argues that chardet 7.0.0 cannot be relicensed to MIT because it is a modification of LGPL-licensed work and is not a valid clean-room implementation given the maintainers’ prior exposure to the code.
  • A key watch item is whether low-cost reimplementation from test suites will cause software to re-emerge under more permissive, open source, or proprietary licenses at scale.
  • AI coding agents can rapidly approximate a clean-room implementation workflow by generating a fresh codebase from a specification and tests faster than traditional multi-team clean-room processes.
  • The software package chardet was originally released in 2006 by Mark Pilgrim under the LGPL and has been maintained by others since 2011, with Dan Blanchard responsible for every release since version 1.1 in July 2012.
  • Dan Blanchard reports using the JPlag plagiarism-detection tool and obtaining low similarity scores for chardet 7.0.0 versus prior versions (a maximum of around 1.29% versus recent releases and 0.64% versus 1.1), while older versions show 80–93% similarity with one another.

Sections

Derivative-Work And Clean-Room Legitimacy Dispute Under Maintainer Prior Exposure

  • Mark Pilgrim argues that chardet 7.0.0 cannot be relicensed to MIT because it is a modification of LGPL-licensed work and is not a valid clean-room implementation given the maintainers’ prior exposure to the code.
  • Dan Blanchard asserts that chardet 7.0.0 is an independent work and therefore can legitimately be MIT licensed despite the project’s LGPL history.
  • Dan Blanchard acknowledges that a traditional clean-room separation did not exist because he had extensive prior knowledge of the original chardet codebase from maintaining it for over a decade.
  • A stated complication is that the model used for the rewrite may have been trained on the original chardet repository, raising questions about whether model-mediated reproduction can qualify as a morally or legally defensible clean-room implementation.

Expected Persistence Of Uncertainty And Broader Enforcement Pressures

  • A key watch item is whether low-cost reimplementation from test suites will cause software to re-emerge under more permissive, open source, or proprietary licenses at scale.
  • Simon expects the chardet relicensing dispute to be difficult to resolve definitively and personally leans toward the rewrite being legitimate while considering both sides credible.
  • Simon anticipates that, as commercial firms come to perceive their closely held IP as threatened by cheap reimplementation, well-funded litigation will emerge around AI-assisted clean-room-like rewrites.

AI-Assisted Reimplementation As A Low-Cost Clean-Room-Like Mechanism

  • AI coding agents can rapidly approximate a clean-room implementation workflow by generating a fresh codebase from a specification and tests faster than traditional multi-team clean-room processes.
  • Dan Blanchard describes a rewrite process where he began in an empty repository, instructed Claude not to use LGPL/GPL-licensed code, and then iteratively reviewed, tested, and refined the generated implementation.

Chardet Relicensing Event And Project Provenance

  • The software package chardet was originally released in 2006 by Mark Pilgrim under the LGPL and has been maintained by others since 2011, with Dan Blanchard responsible for every release since version 1.1 in July 2012.
  • Dan Blanchard released chardet 7.0.0 describing it as a ground-up MIT-licensed rewrite that keeps the same package name and public API as a drop-in replacement for 5.x/6.x.

Similarity Measurement Proposed As Evidence Of Non-Derivation

  • Dan Blanchard reports using the JPlag plagiarism-detection tool and obtaining low similarity scores for chardet 7.0.0 versus prior versions (a maximum of around 1.29% versus recent releases and 0.64% versus 1.1), while older versions show 80–93% similarity with one another.
  • Dan Blanchard argues that clean-room methodology is a means to prove non-derivation and that structural independence can be demonstrated by measurement rather than strict process separation alone.
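To make the measurement claim above concrete: JPlag compares token streams across submissions, so it tolerates renamed identifiers and reordered code. As a rough stdlib-only illustration of the general idea (this is not JPlag's algorithm, and the snippets below are hypothetical, not chardet code), Python's difflib can score line-level similarity between two source files:

```python
import difflib

def similarity_pct(src_a: str, src_b: str) -> float:
    """Rough line-based similarity between two source files, as a
    percentage. Illustrative stand-in for token-based tools like
    JPlag; a real comparison would tokenize and normalize identifiers."""
    ratio = difflib.SequenceMatcher(
        None, src_a.splitlines(), src_b.splitlines()
    ).ratio()
    return round(100 * ratio, 2)

# Hypothetical snippets standing in for two versions of a module.
original = "def detect(data):\n    state = scan(data)\n    return best(state)\n"
rewrite = "def detect(payload):\n    return pick_best(walk(payload))\n"

print(similarity_pct(original, original))  # identical inputs score 100.0
print(similarity_pct(original, rewrite))   # an independent rewrite scores low
```

Note the methodological gap this sketch exposes: a line-based ratio is easily defeated by trivial renaming, which is why the reported numbers rely on a token-based tool and why the Unknowns below ask what standards such measurements would need to meet as evidence.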

Watchlist

  • A key watch item is whether low-cost reimplementation from test suites will cause software to re-emerge under more permissive, open source, or proprietary licenses at scale.

Unknowns

  • Is chardet 7.0.0 legally considered an independent work eligible for MIT licensing, or a derivative work of the prior LGPL codebase?
  • Can similarity measurements (as described) serve as persuasive evidence of non-derivation in licensing disputes, and what methodological standards would be required?
  • Was the model used for the rewrite trained on the original chardet repository, and if so, does that exposure affect the legal or ethical standing of the rewrite as ‘clean-room’?
  • What independent reproducibility checks (using other tools or configurations) would yield similar or different similarity results for the rewrite versus prior versions?
  • Will this dispute reach a clear resolution (community consensus, foundation guidance, or formal legal action), or remain unresolved for an extended period?

Investor overlay

Read-throughs

  • Low-cost reimplementation from tests and specifications could accelerate shifts from copyleft to permissive or proprietary licensing, increasing licensing uncertainty across widely used open source dependencies.
  • AI coding agents may make clean-room-like rewrites faster and cheaper, potentially raising the frequency of derivative-work and contamination disputes when maintainers have prior exposure to original code.

What would confirm

  • Clear resolution emerges through community consensus, foundation guidance, or formal legal action establishing whether chardet 7.0.0 qualifies as an independent work eligible for MIT licensing.
  • Independent reproducibility checks using multiple similarity tools and configurations broadly replicate low similarity between the rewrite and prior LGPL versions, strengthening non-derivation claims.
  • Evidence clarifies whether any model used for the rewrite was trained on the original repository and whether that exposure is treated as contamination in clean-room analysis.

What would kill

  • Authoritative determination finds the rewrite is derivative of the prior LGPL codebase or that maintainer prior exposure defeats clean-room defenses, limiting relicensing viability.
  • Similarity measurements are deemed methodologically insufficient or non-persuasive for licensing disputes, reducing their usefulness as evidence of independence.
  • The dispute remains unresolved for an extended period without accepted guidance, weakening the premise of near-term licensing equilibrium shifts driven by this specific event.

Sources