Chardet 7.0.0 Relicensing Event And Legitimacy Dispute

Issue 64 • Edition 2026-03-05 • 7 min read

General

Sources: 1 • Confidence: High • Updated: 2026-03-08 21:23

Key takeaways

Mark Pilgrim argues that chardet 7.0.0 cannot be relicensed to MIT because it is a modification of LGPL-licensed work and, given the maintainers' prior exposure to the original code, is not a valid clean-room implementation.
A key item to watch is whether low-cost reimplementation from test suites will cause software to re-emerge at scale under more permissive open source licenses or under proprietary ones.
Coding agents can generate a fresh codebase from a specification and tests much faster than traditional multi-team clean-room processes, approximating a clean-room implementation workflow.
Dan Blanchard reports that JPlag gives low similarity scores between chardet 7.0.0 and prior versions (a maximum of about 1.29% against recent releases and 0.64% against version 1.1), whereas older versions score 80–93% similar to one another.
Dan Blanchard asserts that chardet 7.0.0 is an independent work and therefore can be MIT licensed despite the project’s LGPL history.

Mark Pilgrim argues that chardet 7.0.0 cannot be relicensed to MIT because it is a modification of LGPL-licensed work and, given the maintainers' prior exposure to the original code, is not a valid clean-room implementation.
Dan Blanchard asserts that chardet 7.0.0 is an independent work and therefore can be MIT licensed despite the project’s LGPL history.
The chardet project was released in 2006 under the LGPL by Mark Pilgrim and has been maintained by others since 2011, with Dan Blanchard responsible for every release since version 1.1 (July 2012).
Dan Blanchard released chardet 7.0.0, described as a ground-up, MIT-licensed rewrite that keeps the same package name and public API so that it works as a drop-in replacement for 5.x/6.x.
Dan Blanchard acknowledges that no traditional clean-room separation existed because he had extensive prior knowledge of the original chardet codebase from maintaining it for over a decade.
Simon expects the chardet relicensing dispute to be hard to resolve definitively; he personally leans toward the rewrite being legitimate while finding both sides credible.

A key item to watch is whether low-cost reimplementation from test suites will cause software to re-emerge at scale under more permissive open source licenses or under proprietary ones.
A stated complication is that the model used for the rewrite may have been trained on the original chardet repository, raising questions about whether model-mediated reproduction can qualify as a morally or legally defensible clean-room implementation.
Simon anticipates that once commercial firms see their closely held IP threatened by cheap reimplementation, well-funded litigation over AI-assisted, clean-room-style rewrites will follow.

Coding agents can generate a fresh codebase from a specification and tests much faster than traditional multi-team clean-room processes, approximating a clean-room implementation workflow.
Dan Blanchard describes a rewrite process starting from an empty repository, instructing Claude not to use LGPL/GPL-licensed code, and iteratively reviewing, testing, and refining the generated implementation.

Dan Blanchard reports that JPlag gives low similarity scores between chardet 7.0.0 and prior versions (a maximum of about 1.29% against recent releases and 0.64% against version 1.1), whereas older versions score 80–93% similar to one another.
Dan Blanchard argues that clean-room methodology is a means of proving non-derivation, and that structural independence can be demonstrated by measurement rather than by strict process separation alone.
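JPlag is a token-based detector: it turns each program into a sequence of normalized tokens and looks for shared runs using greedy string tiling, so renaming variables barely moves its score while a structural rewrite does. The following is a minimal, hypothetical Python sketch of that idea, meant only to show what a measurement-based non-derivation argument is measuring. It is not JPlag's code, tokenizer, or default configuration. The helper names (`normalized_tokens`, `greedy_string_tiling`, `token_similarity`), the `min_match` threshold, the similarity formula (twice the tiled tokens over the combined token count), and the toy snippets `OLD`, `RENAMED`, and `REWRITTEN` are all assumptions; the snippets are only tokenized, never executed.

```python
import io
import keyword
import tokenize

# Token types that carry layout rather than structure; ignored in this sketch.
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
         tokenize.DEDENT, tokenize.ENCODING, tokenize.ENDMARKER}


def normalized_tokens(source: str) -> list[str]:
    """Tokenize Python source, keeping keywords and operators but collapsing
    every identifier to "ID" and every literal to "LIT" (an assumed scheme)."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME:
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


def greedy_string_tiling(a: list[str], b: list[str], min_match: int) -> int:
    """Count tokens of `a` covered by non-overlapping tiles, i.e. runs of at
    least `min_match` equal tokens shared with `b` (simplified, no hashing)."""
    if min_match < 1:
        raise ValueError("min_match must be at least 1")
    marked_a, marked_b = [False] * len(a), [False] * len(b)
    covered = 0
    while True:
        longest, candidates = min_match, []
        # Find the longest runs of equal, not-yet-tiled tokens in this pass.
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and not marked_a[i + k] and not marked_b[j + k]
                       and a[i + k] == b[j + k]):
                    k += 1
                if k > longest:
                    longest, candidates = k, [(i, j)]
                elif k == longest:
                    candidates.append((i, j))
        # Turn each candidate into a tile unless an earlier tile overlaps it.
        for i, j in candidates:
            if not any(marked_a[i:i + longest]) and not any(marked_b[j:j + longest]):
                for k in range(longest):
                    marked_a[i + k] = marked_b[j + k] = True
                covered += longest
        # Stop once a pass finds nothing longer than the threshold.
        if longest == min_match:
            return covered


def token_similarity(src_a: str, src_b: str, min_match: int = 8) -> float:
    """Share of both token sequences covered by common tiles, in [0, 1]."""
    a, b = normalized_tokens(src_a), normalized_tokens(src_b)
    if not a and not b:
        return 0.0
    return 2 * greedy_string_tiling(a, b, min_match) / (len(a) + len(b))


OLD = '''
def detect(data):
    result = {"encoding": None}
    for prober in PROBERS:
        if prober.feed(data):
            result["encoding"] = prober.name
    return result
'''
# Same structure with different identifier names only.
RENAMED = OLD.replace("prober", "p").replace("result", "r")
# Same purpose, different structure.
REWRITTEN = '''
class Detector:
    def run(self, buf):
        best = max(CANDIDATES, key=lambda c: c.score(buf))
        return best.name if best else None
'''

if __name__ == "__main__":
    print("renamed:", token_similarity(OLD, RENAMED))
    for m in (3, 8):
        print(f"rewritten, min_match={m}:", token_similarity(OLD, REWRITTEN, min_match=m))
```

In this toy setup, `RENAMED` scores 1.0 against `OLD` because only identifiers changed, while `REWRITTEN` scores 0.0 at `min_match=8` but above zero at `min_match=3`. That shift is a small-scale example of why the configuration-sensitivity question about the reported JPlag numbers matters.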


Open questions

Is chardet 7.0.0 legally considered an independent work or a derivative of prior LGPL-licensed chardet code under applicable copyright and licensing interpretations?
Can the reported JPlag similarity results be independently reproduced, and how sensitive are they to configuration, language parsing choices, and tool selection?
Was the model used for the rewrite trained on the original chardet repository or close derivatives, and what evidence (if any) exists about memorization or verbatim reproduction risk for this codebase?
What, if any, third-party legal analyses, foundation positions, or formal actions emerge that clarify acceptable standards for AI-assisted rewrites and measurement-based non-derivation arguments?
How will downstream distributors and major dependents respond in practice (e.g., pinning to older versions, forking, or accepting 7.0.0), and what signals indicate emerging consensus?

Implications

If low-cost, AI-assisted rewrites from specs and tests scale, more LGPL or other copyleft code could be functionally replaced and reissued under permissive or proprietary licenses, creating licensing churn and new compliance workflows for software distributors.
Measurement-based non-derivation arguments using similarity tools could gain traction as a practical standard, shifting how enterprises evaluate provenance risk when adopting rewritten open source components.
Downstream reaction to chardet 7.0.0 could become a bellwether for how major dependents treat disputed AI-assisted rewrites, affecting upgrade behavior, forking frequency, and acceptance of relicensing claims.

Signals that would strengthen the MIT relicensing claim

Independent reproductions of the low similarity results between chardet 7.0.0 and prior LGPL versions, using multiple tools and configurations, with consistent outcomes and a transparent methodology.
Public positions or formal guidance from recognized foundations, distributors, or legal analysts that endorse or operationalize measurement-based independence claims for AI-assisted rewrites.
Observable downstream acceptance signals, such as major dependents upgrading to 7.0.0, reduced pinning to older versions, or distributions broadly packaging 7.0.0 as MIT-licensed without added exceptions.

Signals that would weaken the MIT relicensing claim

Credible evidence that chardet 7.0.0 is legally treated as a derivative work, including formal actions or authoritative analyses that reject the independence claim under applicable copyright and licensing interpretations.
Independent analysis showing similarity metrics are not robust, are highly configuration sensitive, or identify meaningful copied structure, increasing perceived derivation or memorization risk.
Broad downstream refusal signals such as widespread pinning to older LGPL versions, major forks that avoid 7.0.0, or distributors declining to ship 7.0.0 under MIT due to unresolved provenance.