Rosa Del Mar

Daily Brief

Issue 92 • 2026-04-02

AI-Text Detection As A Product: Methods And Claimed Metrics

9 min read
General
Sources: 1 • Confidence: Medium • Updated: 2026-04-11 19:16

Key takeaways

  • Pangram Labs offers a paid product and a free service where users can paste text to receive an estimated probability of human versus AI authorship.
  • Major platforms are simultaneously encouraging AI-generated responses in products while also trying to suppress AI slop in search and feeds, reflecting mixed incentives.
  • The boundary of unacceptable AI assistance is philosophically unclear because tools like spellcheck or AI copy-editing may be treated differently despite similar functional roles.
  • Max Spiro believes detector evasion could become practical by optimizing simultaneously for detector 'human' scores and a separate LLM-based coherence judge, and views this as a valuable red-team target.
  • Extending AI-origin detection methods from text to video is constrained by the much higher cost of generating video at scale compared with text.

Sections

AI-Text Detection As A Product: Methods And Claimed Metrics

  • Pangram Labs offers a paid product and a free service where users can paste text to receive an estimated probability of human versus AI authorship.
  • Pangram does not use perplexity; it instead uses a deep-learning classifier that it claims outperforms perplexity-based detection (a sketch of that baseline follows this list).
  • In Pangram's initial human-baseline testing, a human evaluator could classify AI versus human text with about 90% accuracy.
  • Pangram reports a false-positive rate of about 1 in 10,000 on human writing.
  • Pangram reports roughly a 1% false-negative rate for detecting straightforward AI-generated outputs, with worse performance under adversarial prompting.
  • Pangram's detector infers AI authorship by learning many small writing-choice patterns across a passage rather than relying on a few explicit 'tells'.
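
For context on the baseline Pangram claims to beat, the sketch below shows classic perplexity-based detection. It assumes the Hugging Face transformers package, GPT-2 as the scoring model, and an uncalibrated threshold; all three are illustrative choices, not details of Pangram's system.

```python
# Minimal sketch of perplexity-based AI-text detection, the baseline
# Pangram claims to outperform. GPT-2 and the threshold are
# illustrative choices, not details of Pangram's system.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Average per-token perplexity of `text` under the scoring model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean
        # next-token cross-entropy; exponentiating gives perplexity.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

# Heuristic: text the scoring model finds unusually predictable (low
# perplexity) is flagged as likely AI-generated. Real systems would
# calibrate this threshold on labeled data.
AI_THRESHOLD = 20.0

def looks_ai_generated(text: str) -> bool:
    return perplexity(text) < AI_THRESHOLD
```

Per the bullets above, Pangram instead trains a deep classifier over many small writing-choice signals rather than thresholding a single predictability statistic.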

Content Ecosystem Shift: AI Slop, Incentives, And Platform Structure

  • Major platforms are simultaneously encouraging AI-generated responses in products while also trying to suppress AI slop in search and feeds, reflecting mixed incentives.
  • A downside scenario is that open internet spaces become flooded by bots, pushing authentic communication into walled gardens where identity can be enforced.
  • A practical motivation for detecting AI writing is deciding whether to engage with social media replies that may be bots rather than real people.
  • Max Spiro estimates roughly 40% of internet pages are AI-written, driven largely by SEO-focused article production switching to AI for cost reasons.
  • Pangram's scan of Medium, run about a year and a half before the episode was recorded, found that over 50% of newly published Medium articles were AI-generated.
  • One incentive for AI posting on Reddit is paid 'organic mention' campaigns where bot-like accounts blend in with normal replies and occasionally insert brand recommendations.

Deployment Risk: False Positives, Governance, And Policy Boundary Disputes

  • The boundary of unacceptable AI assistance is philosophically unclear because tools like spellcheck or AI copy-editing may be treated differently despite similar functional roles.
  • Perplexity-based detectors can produce false positives for non-native English writers, whose simpler, more predictable phrasing tends to score as low-perplexity.
  • If black-box AI-detection models falsely label human writing as AI-generated and are treated as authoritative judges, they can create reputational and career risk (see the base-rate sketch after this list).
  • Actively trying to identify whether everyday incoming writing is AI-generated can impose a large ongoing cognitive burden on journalists.
  • AI-writing concerns are expected to be especially acute in education and legal work where authorship and accountability matter.
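
To make the claimed error rates concrete, a back-of-the-envelope Bayes calculation: given Pangram's reported false-positive and false-negative rates from the first section, how often is a "flagged as AI" verdict actually correct? The prevalence values are assumptions for illustration (the 40% case echoes Spiro's web-wide estimate, which flagging pipelines will not necessarily see).

```python
# Back-of-the-envelope Bayes check: with Pangram's claimed error rates,
# how often is a "flagged as AI" verdict correct? Prevalence values are
# assumptions for illustration, not measurements.
FPR = 1 / 10_000  # claimed false-positive rate on human writing
FNR = 0.01        # claimed false-negative rate on straightforward AI text

def precision_when_flagged(prevalence: float) -> float:
    """P(text is AI | detector says AI) at a given AI-text prevalence."""
    true_pos = prevalence * (1 - FNR)
    false_pos = (1 - prevalence) * FPR
    return true_pos / (true_pos + false_pos)

for p in (0.001, 0.01, 0.40):  # 0.1%, 1%, and Spiro's ~40% web estimate
    print(f"prevalence {p:.1%}: P(AI | flagged) = {precision_when_flagged(p):.4f}")
# prints roughly 0.9083, 0.9901, and 0.9998
```

Even with precision above 90% in every scenario, a 1-in-10,000 false-positive rate still wrongly flags one human author per 10,000 documents screened, which is what makes authoritative treatment of black-box verdicts risky at institutional scale.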

Generator–Detector Arms Race And Evasion Pathways

  • Max Spiro believes detector evasion could become practical by optimizing simultaneously for detector 'human' scores and a separate LLM-based coherence judge, and views this as a valuable red-team target.
  • In Jill Weisenthal's initial tests, Pangram correctly classified her own writing versus AI outputs, and it still flagged AI text after round-trip translation through multiple languages intended to obfuscate style.
  • A black-box evasion attempt that iteratively searched for prompts scoring as 'human' on Pangram succeeded only by producing largely incoherent or grammatically incorrect text.
  • An adversary could attempt to evade detection by iteratively prompting an LLM to jointly optimize for a detector 'human' score and an LLM-based coherence judge; a minimal sketch of this loop follows this list.
  • As LLMs become more capable, detector models may need to become larger or more powerful to keep pace with more complex output distributions.
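
A minimal sketch of that evasion loop. The three callables are hypothetical stand-ins for a detector API, an LLM coherence judge, and an LLM paraphraser; this names the red-team target, it is not a demonstrated attack.

```python
from typing import Callable

def evade(
    draft: str,
    detector_human_score: Callable[[str], float],       # detector API stand-in
    coherence_score: Callable[[str], float],            # LLM-judge stand-in
    rewrite_variants: Callable[[str, int], list[str]],  # LLM paraphraser stand-in
    rounds: int = 20,
    beam: int = 8,
    min_coherence: float = 0.8,
) -> str:
    """Hill-climb on the detector's 'human' score, rejecting incoherent text.

    The coherence constraint is the difference from the earlier black-box
    attempt, which optimized detector score alone and converged on gibberish.
    """
    best = draft
    best_score = detector_human_score(best)
    for _ in range(rounds):
        # Generate paraphrases and keep only those the judge rates coherent.
        candidates = [
            c for c in rewrite_variants(best, beam)
            if coherence_score(c) >= min_coherence
        ]
        if not candidates:
            continue
        challenger = max(candidates, key=detector_human_score)
        score = detector_human_score(challenger)
        if score > best_score:
            best, best_score = challenger, score
    return best
```

Nothing here guarantees the loop converges on coherent evasions; that open question is exactly why Spiro frames it as a red-team target.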

Provenance Vs Detection And The Hardware-Signing Approach

  • Extending AI-origin detection methods from text to video is constrained by the much higher cost of generating video at scale compared with text.
  • The C2PA initiative is working with hardware and phone makers to embed signatures that prove images and videos were captured by real devices rather than generated (a minimal signing sketch follows this list).
  • Pangram anticipates a future data-provenance problem as internet text increasingly contains AI content, and plans to rely more on pre-2023 corpora and on identifying trusted actors for contemporary human text.
  • Solving provenance or detection for images and possibly audio could be sufficient to generalize to video in a zero-shot manner.
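
To illustrate the capture-signing idea (not the actual C2PA manifest format), a minimal sketch assuming the Python cryptography package, with a single Ed25519 keypair standing in for device-provisioned keys.

```python
# Minimal sketch of the hardware-signing idea behind C2PA-style
# provenance; this is NOT the C2PA manifest format. Assumes the Python
# `cryptography` package; one Ed25519 keypair stands in for
# device-provisioned keys held in tamper-resistant hardware.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

device_key = Ed25519PrivateKey.generate()  # lives inside the camera/phone
public_key = device_key.public_key()       # published by the manufacturer

def sign_capture(media_bytes: bytes) -> bytes:
    """Signature the device attaches to the file at capture time."""
    return device_key.sign(media_bytes)

def verify_capture(media_bytes: bytes, signature: bytes) -> bool:
    """True only if the media is byte-identical to what the device signed."""
    try:
        public_key.verify(signature, media_bytes)
        return True
    except InvalidSignature:
        return False

photo = b"...raw image bytes..."
sig = sign_capture(photo)
assert verify_capture(photo, sig)              # untouched capture verifies
assert not verify_capture(photo + b"x", sig)   # any alteration breaks proof
```

Real C2PA manifests carry much more (signed metadata, certificate chains, recorded edits), but the core trust move is the same: a signature only a genuine capture device can produce.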

Watchlist

  • Some books are beginning to include explicit disclaimers stating they were written only by humans with no AI used.
  • Max Spiro believes detector evasion could become practical by optimizing simultaneously for detector 'human' scores and a separate LLM-based coherence judge, and views this as a valuable red-team target.

Unknowns

  • What are Pangram's independently verified false-positive and false-negative rates across diverse domains (student writing, journalism, forums), languages, and demographics (including ESL)?
  • How robust is Pangram-style detection to systematic adversarial optimization that maintains coherence and style (e.g., joint optimization against detector score plus a coherence judge)?
  • How quickly does detector performance drift as 'clean' contemporary human corpora become harder to source due to AI contamination, and what mitigations work best?
  • What is the current, independently estimated share of AI-written content on the web and on major UGC platforms, and how sensitive are estimates to methodology?
  • Where will institutions draw and enforce the line between acceptable AI-assisted editing and unacceptable AI-generated drafting, and what evidence standards will they require?

Investor overlay

Read-throughs

  • Demand tailwind for AI text detection products and services as platforms and institutions try to suppress AI slop and manage authenticity disputes, creating workflow and governance needs beyond pure model performance.
  • Arms race dynamic could favor vendors that can continuously improve detectors via active learning and scaling capacity, but also raises ongoing cost and drift risk as clean human corpora become harder to source.
  • A shift from detection toward provenance approaches such as device-level capture signatures could become a parallel or substitute authenticity route, especially where high stakes and false positives make probabilistic detection hard to govern.

What would confirm

  • Independent evaluations publish false-positive and false-negative rates across domains, languages, and demographics, and results are strong enough for high-stakes workflows without unacceptable fairness failures such as ESL misclassification.
  • Demonstrated robustness against systematic adversarial optimization that targets both human scores and a coherence judge, with bounded quality loss and reasonable defender cost to maintain performance over time.
  • Clear institutional and platform policies define acceptable AI-assisted editing versus drafting, with specified evidence standards that incorporate detectors or provenance tooling into routine enforcement at scale.

What would kill

  • Independent tests show high or unstable false positives, especially in education, journalism, or ESL cohorts, leading to governance backlash and limited willingness to rely on black-box adjudication.
  • Practical evasion becomes reliable through joint optimization against detector scoring and coherence judging, reducing detector utility or forcing costs and friction that users and platforms will not accept.
  • Detector performance drifts quickly as human corpora are contaminated by AI text, and mitigations fail, shifting attention and budgets toward provenance solutions or identity-enforced walled gardens instead of detection.

Sources