AI-Text Detection As A Product: Methods And Claimed Metrics
Sources: 1 • Confidence: Medium • Updated: 2026-04-11 19:16
Key takeaways
- Pangram Labs offers a paid product and a free service where users can paste text to receive an estimated probability of human versus AI authorship.
- Major platforms are simultaneously encouraging AI-generated responses in products while also trying to suppress AI slop in search and feeds, reflecting mixed incentives.
- The boundary of unacceptable AI assistance is philosophically unclear because tools like spellcheck or AI copy-editing may be treated differently despite similar functional roles.
- Max Spiro believes detector evasion could become practical by optimizing simultaneously for detector 'human' scores and a separate LLM-based coherence judge, and views this as a valuable red-team target.
- Extending AI-origin detection methods from text to video is constrained by the much higher cost of generating video at scale compared with text.
Sections
AI-Text Detection As A Product: Methods And Claimed Metrics
- Pangram Labs offers a paid product and a free service where users can paste text to receive an estimated probability of human versus AI authorship.
- Pangram does not use perplexity and instead uses a deep learning approach that it claims exceeds perplexity-based detection performance.
- In Pangram's initial human-baseline testing, a human evaluator could classify AI versus human text with about 90% accuracy.
- Pangram reports a false-positive rate of about 1 in 10,000 on human writing.
- Pangram reports roughly a 1% false-negative rate for detecting straightforward AI-generated outputs, with worse performance under adversarial prompting.
- Pangram's detector infers AI authorship by learning many small writing-choice patterns across a passage rather than relying on a few explicit 'tells'.
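The claimed error rates above can be combined with an assumed base rate of AI-written text to see what a positive detection actually implies. This is a hypothetical worked example via Bayes' rule, not Pangram's published analysis; the 40% prevalence figure is Max Spiro's estimate from later in this document.

```python
def detector_precision(fpr: float, fnr: float, ai_prevalence: float) -> float:
    """P(text is AI | detector flags it as AI), via Bayes' rule."""
    tpr = 1.0 - fnr                               # sensitivity on AI text
    true_positives = tpr * ai_prevalence          # AI text correctly flagged
    false_positives = fpr * (1.0 - ai_prevalence) # human text wrongly flagged
    return true_positives / (true_positives + false_positives)

# Claimed rates: FPR ~1/10,000 on human writing, FNR ~1% on
# straightforward AI output (non-adversarial).
# Assumed prevalence: ~40% of pages AI-written (Spiro's estimate).
p = detector_precision(fpr=1e-4, fnr=0.01, ai_prevalence=0.40)
```

At that prevalence a flag is almost always correct; the precision only degrades meaningfully in populations where AI text is rare, which is exactly the deployment-risk concern raised below.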
Content Ecosystem Shift: AI Slop, Incentives, And Platform Structure
- Major platforms are simultaneously encouraging AI-generated responses in products while also trying to suppress AI slop in search and feeds, reflecting mixed incentives.
- A downside scenario is that open internet spaces become flooded by bots, pushing authentic communication into walled gardens where identity can be enforced.
- A practical motivation for detecting AI writing is deciding whether to engage with social media replies that may be bots rather than real people.
- Max Spiro estimates roughly 40% of internet pages are AI-written, driven largely by SEO-focused article production switching to AI for cost reasons.
- A Pangram scan of Medium, conducted about a year and a half before the recording, found that over 50% of newly published articles were AI-generated.
- One incentive for AI posting on Reddit is paid 'organic mention' campaigns where bot-like accounts blend in with normal replies and occasionally insert brand recommendations.
Deployment Risk: False Positives, Governance, And Policy Boundary Disputes
- The boundary of unacceptable AI assistance is philosophically unclear because tools like spellcheck or AI copy-editing may be treated differently despite similar functional roles.
- Perplexity-based detectors can produce false positives for non-native English writers.
- If black-box AI-detection models falsely label human writing as AI-generated and are treated as authoritative judges, they can create reputational and career risk.
- Actively trying to identify whether everyday incoming writing is AI-generated can impose a large ongoing cognitive burden on journalists.
- AI-writing concerns are expected to be especially acute in education and legal work where authorship and accountability matter.
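The perplexity-based detectors mentioned above (the approach Pangram says it does not use) score how "predictable" a passage is under a language model: low perplexity is read as machine-like. A minimal sketch of the metric itself, assuming token log-probabilities have already been obtained from some language model:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood) over a token sequence.

    Lower values mean the model found the text more predictable;
    perplexity-based detectors flag low-perplexity text as likely AI.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Toy illustration: if every token had probability 0.25 under the model,
# perplexity is exactly 4 (the effective branching factor).
perplexity([math.log(0.25)] * 5)  # → 4.0
```

This also makes the ESL false-positive mechanism concrete: writers using simpler, more predictable phrasing produce lower perplexity and can fall below an "AI" threshold despite being human.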
Generator–Detector Arms Race And Evasion Pathways
- Max Spiro believes detector evasion could become practical by optimizing simultaneously for detector 'human' scores and a separate LLM-based coherence judge, and views this as a valuable red-team target.
- In Jill Weisenthal's initial tests, Pangram correctly classified her own writing versus AI outputs, and still flagged text as AI-generated after it had been round-tripped through multiple languages to obfuscate style.
- A black-box evasion attempt that iteratively searched for prompts scoring as 'human' on Pangram succeeded only by producing largely incoherent or grammatically incorrect text.
- An adversary could attempt to evade detection by iteratively prompting an LLM to jointly optimize for a detector 'human' score and an LLM-based coherence judge.
- As LLMs become more capable, detector models may need to become larger or more powerful to keep pace with more complex output distributions.
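The joint-optimization attack described above can be sketched as a black-box search loop. Everything here is a hypothetical stand-in: `generate_variant` represents re-prompting an LLM to rephrase the text, and the two scorer callables represent the detector's 'human' score and an LLM-based coherence judge; a real attacker would call external APIs for both.

```python
import random

def generate_variant(text: str, rng: random.Random) -> str:
    # Stand-in for "ask an LLM to rephrase"; here we just shuffle words.
    words = text.split()
    rng.shuffle(words)
    return " ".join(words)

def evade(text, detector_human_score, coherence_score,
          rounds=50, min_coherence=0.8, seed=0):
    """Hill-climb toward the variant maximizing the detector's 'human'
    score, subject to a coherence floor from a separate judge.

    The coherence constraint is what distinguishes this from the naive
    attack above, which 'succeeded' only by emitting incoherent text.
    """
    rng = random.Random(seed)
    best, best_score = text, detector_human_score(text)
    for _ in range(rounds):
        candidate = generate_variant(best, rng)
        if coherence_score(candidate) < min_coherence:
            continue  # reject incoherent candidates outright
        score = detector_human_score(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

With an impossibly high `min_coherence` the loop degenerates to returning the input unchanged, which mirrors the trade-off in the failed black-box attempt: evasion is easy only if you are willing to sacrifice coherence.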
Provenance Vs Detection And The Hardware-Signing Approach
- Extending AI-origin detection methods from text to video is constrained by the much higher cost of generating video at scale compared with text.
- The C2PA initiative is working with hardware and phone makers to embed signatures that prove images and videos were captured by real devices rather than generated.
- Pangram anticipates a future data-provenance problem as internet text increasingly contains AI content, and plans to rely more on pre-2023 corpora and on identifying trusted actors for contemporary human text.
- Solving provenance or detection for images and possibly audio could be sufficient to generalize to video in a zero-shot manner, since video is essentially a sequence of image frames plus an audio track.
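The capture-time signing flow C2PA pursues can be sketched in miniature. This is a deliberate simplification: real C2PA signs a structured manifest with asymmetric keys and X.509 certificate chains rooted in the device maker; here a symmetric HMAC over the media hash stands in so the example is self-contained.

```python
import hashlib
import hmac
import os

# In a real device this key material would live in a secure element
# and the verifier would check a public-key signature instead.
DEVICE_KEY = os.urandom(32)

def sign_capture(media_bytes: bytes) -> str:
    """Attest at capture time that these exact bytes came off the sensor."""
    digest = hashlib.sha256(media_bytes).digest()
    return hmac.new(DEVICE_KEY, digest, hashlib.sha256).hexdigest()

def verify_capture(media_bytes: bytes, signature: str) -> bool:
    """Any modification to the media invalidates the signature."""
    digest = hashlib.sha256(media_bytes).digest()
    expected = hmac.new(DEVICE_KEY, digest, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

The design point this illustrates: provenance shifts the question from "does this content look generated?" (a statistical judgment that degrades as generators improve) to "does this content carry a valid capture-time signature?" (a cryptographic check that does not).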
Watchlist
- Some books are beginning to include explicit disclaimers stating they were written only by humans with no AI used.
- Max Spiro believes detector evasion could become practical by optimizing simultaneously for detector 'human' scores and a separate LLM-based coherence judge, and views this as a valuable red-team target.
Unknowns
- What are Pangram's independently verified false-positive and false-negative rates across diverse domains (student writing, journalism, forums), languages, and demographics (including ESL)?
- How robust is Pangram-style detection to systematic adversarial optimization that maintains coherence and style (e.g., joint optimization against detector score plus a coherence judge)?
- How quickly does detector performance drift as 'clean' contemporary human corpora become harder to source due to AI contamination, and what mitigations work best?
- What is the current, independently estimated share of AI-written content on the web and on major UGC platforms, and how sensitive are estimates to methodology?
- Where will institutions draw and enforce the line between acceptable AI-assisted editing and unacceptable AI-generated drafting, and what evidence standards will they require?