Chardet 7.0.0 Relicensing Event And Legitimacy Dispute
Sources: 1 • Confidence: High • Updated: 2026-03-08 21:23
Key takeaways
- Mark Pilgrim argues that chardet 7.0.0 cannot be relicensed to MIT because it is a modification of LGPL-licensed work and, given the maintainer's prior exposure to the original code, is not a valid clean-room implementation.
- A key watch item is whether low-cost reimplementation from test suites will cause software to re-emerge at scale under more permissive open source licenses or under proprietary licenses.
- Coding agents can generate a fresh codebase from a specification and tests much faster than a traditional multi-team clean-room process can, yielding a workflow that approximates clean-room implementation.
- Dan Blanchard reports that JPlag similarity scores between chardet 7.0.0 and prior versions are low (at most about 1.29% against recent releases and 0.64% against version 1.1), whereas older versions score 80–93% similarity against each other.
- Dan Blanchard asserts that chardet 7.0.0 is an independent work and therefore can be MIT licensed despite the project’s LGPL history.
Sections
Chardet 7.0.0 Relicensing Event And Legitimacy Dispute
- Mark Pilgrim argues that chardet 7.0.0 cannot be relicensed to MIT because it is a modification of LGPL-licensed work and, given the maintainer's prior exposure to the original code, is not a valid clean-room implementation.
- Dan Blanchard asserts that chardet 7.0.0 is an independent work and therefore can be MIT licensed despite the project’s LGPL history.
- The chardet project was released in 2006 under the LGPL by Mark Pilgrim and has been maintained by others since 2011, with Dan Blanchard responsible for every release since version 1.1 (July 2012).
- Dan Blanchard released chardet 7.0.0, described as a ground-up, MIT-licensed rewrite that keeps the same package name and public API so that it works as a drop-in replacement for 5.x/6.x.
- Dan Blanchard acknowledges that no traditional clean-room separation existed because he had extensive prior knowledge of the original chardet codebase from maintaining it for over a decade.
- Simon expects the chardet relicensing dispute to be difficult to resolve definitively and personally leans toward the rewrite being legitimate while considering both sides credible.
AI-Specific Provenance Uncertainty And Ecosystem-Level Watch Items
- A key watch item is whether low-cost reimplementation from test suites will cause software to re-emerge at scale under more permissive open source licenses or under proprietary licenses.
- A stated complication is that the model used for the rewrite may have been trained on the original chardet repository, raising questions about whether model-mediated reproduction can qualify as a morally or legally defensible clean-room implementation.
- Simon anticipates that, as commercial firms come to see their closely held IP as threatened by cheap reimplementation, well-funded litigation will emerge around AI-assisted clean-room-like rewrites.
AI-Assisted Rewrite Workflow As A Clean-Slate Reimplementation Pattern
- Coding agents can generate a fresh codebase from a specification and tests much faster than a traditional multi-team clean-room process can, yielding a workflow that approximates clean-room implementation.
- Dan Blanchard describes a rewrite process starting from an empty repository, instructing Claude not to use LGPL/GPL-licensed code, and iteratively reviewing, testing, and refining the generated implementation.
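The "specification and tests" side of such a workflow can be pictured as a black-box conformance check: a table of inputs and expected outputs that any candidate implementation must satisfy, with no reference to the original source. The sketch below is illustrative only. The names `CASES`, `candidate_detect`, and `failures` are hypothetical, the three cases are invented for the example, and nothing here reflects chardet's actual API, test suite, or the process Blanchard followed.

```python
import codecs
from typing import Callable, List, Optional, Tuple

# Hypothetical behavioral spec: (input bytes, expected encoding label).
CASES: List[Tuple[bytes, str]] = [
    (codecs.BOM_UTF8 + "h\u00e9llo".encode("utf-8"), "utf-8-sig"),
    (codecs.BOM_UTF16_LE + "hi".encode("utf-16-le"), "utf-16"),
    (b"plain ascii", "ascii"),
]


def candidate_detect(data: bytes) -> Optional[str]:
    """Toy stand-in for a freshly written implementation (assumption, not chardet)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return None
    return "ascii"


def failures(
    detect: Callable[[bytes], Optional[str]],
    cases: List[Tuple[bytes, str]],
) -> List[Tuple[bytes, str, Optional[str]]]:
    """Return (input, expected, actual) for every case the candidate gets wrong."""
    wrong = []
    for data, expected in cases:
        got = detect(data)
        if got != expected:
            wrong.append((data, expected, got))
    return wrong


if __name__ == "__main__":
    # An empty list means the candidate satisfies this (tiny) spec.
    print(failures(candidate_detect, CASES))
```

The harness only observes behavior, which is why a passing result says nothing about how the candidate was written or what its author, human or model, had previously seen.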
Measurement-Based Evidence Claims For Non-Derivation
- Dan Blanchard reports that JPlag similarity scores between chardet 7.0.0 and prior versions are low (at most about 1.29% against recent releases and 0.64% against version 1.1), whereas older versions score 80–93% similarity against each other.
- Dan Blanchard argues that clean-room methodology is a means to prove non-derivation and that structural independence can be demonstrated by measurement rather than strict process separation alone.
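To give a rough sense of what a structural similarity score measures, the sketch below reduces Python source to a stream of token kinds (so renamed identifiers and changed literals do not matter) and compares two streams with the standard library's `difflib.SequenceMatcher`. This is a simplified stand-in chosen for illustration, not JPlag's algorithm or configuration, and its scores are not comparable to the percentages reported above. The function names and sample snippets are hypothetical.

```python
import difflib
import io
import keyword
import tokenize
from typing import List

# Token types that carry layout or comments rather than program structure.
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def token_kinds(source: str) -> List[str]:
    """Map source code to token kinds, discarding identifier names and literal values."""
    kinds = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.OP or (
            tok.type == tokenize.NAME and keyword.iskeyword(tok.string)
        ):
            kinds.append(tok.string)  # keep operators and keywords verbatim
        else:
            kinds.append(tokenize.tok_name[tok.type])  # e.g. NAME, NUMBER, STRING
    return kinds


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between the token-kind streams of two sources."""
    matcher = difflib.SequenceMatcher(None, token_kinds(a), token_kinds(b), autojunk=False)
    return matcher.ratio()


ORIGINAL = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
RENAMED = "def add_all(items):\n    acc = 0\n    for item in items:\n        acc += item\n    return acc\n"
UNRELATED = "class Box:\n    def __init__(self, v):\n        self.v = v\n"

if __name__ == "__main__":
    print(similarity(ORIGINAL, RENAMED))    # identical structure, different names
    print(similarity(ORIGINAL, UNRELATED))  # different structure
```

Even in this toy form, the result depends on choices such as which tokens are ignored and how matches are scored, which is one reason reproducibility and configuration sensitivity matter when a similarity figure is offered as evidence.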
Watchlist
- A key watch item is whether low-cost reimplementation from test suites will cause software to re-emerge at scale under more permissive open source licenses or under proprietary licenses.
Unknowns
- Is chardet 7.0.0 legally considered an independent work or a derivative of prior LGPL-licensed chardet code under applicable copyright and licensing interpretations?
- Can the reported JPlag similarity results be independently reproduced, and how sensitive are they to configuration, language parsing choices, and tool selection?
- Was the model used for the rewrite trained on the original chardet repository or close derivatives, and what evidence (if any) exists about memorization or verbatim reproduction risk for this codebase?
- What, if any, third-party legal analyses, foundation positions, or formal actions emerge that clarify acceptable standards for AI-assisted rewrites and measurement-based non-derivation arguments?
- How will downstream distributors and major dependents respond in practice (e.g., pinning to older versions, forking, or accepting 7.0.0), and what signals indicate emerging consensus?