Operational Pipeline From Public Text To LLM-Produced Profiles
Sources: 1 • Confidence: High • Updated: 2026-03-25 17:54
Key takeaways
- The described profiling workflow: fetch roughly the last 1000 HN comments by a user via a purpose-built tool, copy them, and paste them into an LLM with an instruction to profile the user.
- The Algolia Hacker News API can list a user's most recent comments sorted by date by querying tags of the form "comment,author_<username>" with hitsPerPage up to 1000.
- The author states he finds it creepy that substantial information about someone can be derived easily from public content accessible via an API.
- The author hypothesizes that a model inferred his real name because his HN comments often link to his own website, providing URLs that could connect the username to a public persona.
- The author reports using generated profiles mainly to avoid being drawn into extended arguments with people he believes have a history of bad-faith debate.
Sections
Operational Pipeline From Public Text To LLM-Produced Profiles
- The described profiling workflow: fetch roughly the last 1000 HN comments by a user via a purpose-built tool, copy them, and paste them into an LLM with an instruction to profile the user.
- The author reports that LLM-generated profiles from recent HN comments can be startlingly effective.
- The author reports using incognito mode when running profiling prompts in an attempt to reduce the chance the model recognizes him and produces overly flattering responses.
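The copy-and-paste step of this workflow can be sketched in Python. This is a minimal illustration, not the author's tool: the function names, the instruction wording "profile this user", and the choice to strip HTML from comment bodies are all assumptions made here for clarity.

```python
import html
import re

# Hypothetical instruction wording; the source only says the LLM is
# told to profile the user.
PROFILE_INSTRUCTION = "profile this user"


def clean_comment(raw: str) -> str:
    """Strip HTML tags, decode entities, and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()


def build_profile_prompt(comments, instruction=PROFILE_INSTRUCTION, limit=1000):
    """Join up to `limit` cleaned comments and append the instruction.

    The result is plain text meant to be pasted into an LLM chat window.
    """
    cleaned = [clean_comment(c) for c in comments[:limit]]
    cleaned = [c for c in cleaned if c]  # drop comments that end up empty
    body = "\n\n".join(cleaned)
    return f"{body}\n\n{instruction}"


if __name__ == "__main__":
    sample = ["<p>Hello &amp; welcome</p>", "   ", "Second comment"]
    print(build_profile_prompt(sample))
```

Putting the instruction after the comments mirrors the paste-then-ask order described in the workflow; whether that ordering affects output quality is not something the source addresses.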
Low-Friction Public Data Access For Per-User Comment History
- The Algolia Hacker News API can list a user's most recent comments sorted by date by querying tags of the form "comment,author_<username>" with hitsPerPage up to 1000.
- The Algolia HN API is served with open CORS headers, allowing cross-origin calls from JavaScript on arbitrary web pages.
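A request of this shape can be sketched with the Python standard library. The tags format and the hitsPerPage limit come from the description above; the endpoint path `search_by_date`, the `hits` and `comment_text` response fields, and all function names are assumptions here and should be checked against the Algolia HN API documentation before use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for date-sorted results; not stated in the source.
API_BASE = "https://hn.algolia.com/api/v1/search_by_date"
MAX_HITS_PER_PAGE = 1000  # upper bound described above


def build_comments_url(username: str, hits_per_page: int = MAX_HITS_PER_PAGE) -> str:
    """Build the query URL for a user's most recent comments."""
    if not 1 <= hits_per_page <= MAX_HITS_PER_PAGE:
        raise ValueError("hits_per_page must be between 1 and 1000")
    query = urlencode({
        "tags": f"comment,author_{username}",
        "hitsPerPage": hits_per_page,
    })
    return f"{API_BASE}?{query}"


def extract_comment_texts(payload: dict) -> list:
    """Pull comment bodies out of a decoded response.

    Assumes each hit carries its body under "comment_text".
    """
    return [
        hit["comment_text"]
        for hit in payload.get("hits", [])
        if hit.get("comment_text")
    ]


def fetch_recent_comments(username: str, timeout: float = 10.0) -> list:
    """Fetch and decode one page of results (performs a network call)."""
    with urlopen(build_comments_url(username), timeout=timeout) as resp:
        return extract_comment_texts(json.load(resp))


if __name__ == "__main__":
    # "example_user" is a placeholder, not a real account reference.
    print(build_comments_url("example_user"))
```

The sketch makes a single request and does not paginate, so it says nothing about behavior beyond the first 1000 results. The open CORS headers matter only for browser JavaScript; a server-side script like this one is not subject to them.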
Privacy, Ethics, And Publication Constraints For LLM-Based Profiling From Public Forums
- The author states he finds it creepy that substantial information about someone can be derived easily from public content accessible via an API.
- The author states he considers it invasive to publish LLM-generated profiles about other users and therefore only shares an example profile about himself.
Identity Linkage And De-Anonymization Uncertainty
- The author hypothesizes that a model inferred his real name because his HN comments often link to his own website, providing URLs that could connect the username to a public persona.
- The author reports that he has not seen LLM profiling outputs guess real names for the other accounts he profiled.
Defensive Use-Case: Engagement Avoidance And Reputational Triage
- The author reports using generated profiles mainly to avoid being drawn into extended arguments with people he believes have a history of bad-faith debate.
Watchlist
- The author states he finds it creepy that substantial information about someone can be derived easily from public content accessible via an API.
Unknowns
- How accurate and reproducible are LLM-generated user profiles when evaluated against consenting users' ground truth, under blinded assessment and across multiple models/prompts?
- What are the practical scaling limits and failure modes of the Algolia HN API approach (rate limits, pagination behavior beyond 1000, completeness for high-volume users)?
- How stable are the permissive CORS headers over time, and are there policy or technical changes that could remove browser-based access?
- Under what conditions do models attempt explicit identity resolution (e.g., real-name guessing), and how strongly does self-linking behavior increase that risk?
- Does using incognito or otherwise removing account history materially change profiling outputs in content (not just tone), and which evaluation controls are needed to isolate the effect?