Inference Capability And Evaluation Contamination Risks
Sources: 1 • Confidence: High • Updated: 2026-04-13 03:52
Key takeaways
- The author reports that asking an LLM to profile a user from their recent Hacker News comments produces startlingly effective results.
- The Algolia Hacker News API can list a user's most recent comments sorted by date using tags of the form "comment,author_<username>" with hitsPerPage up to 1000.
- The author states that it feels creepy that substantial information about someone can be derived easily from publicly shared content available via an API.
- The described profiling workflow is: fetch roughly the user's 1,000 most recent Hacker News comments, copy them with a tool, and paste them into an LLM with the prompt "profile this user".
- The Algolia Hacker News API is served with open CORS headers that allow cross-origin calls from JavaScript on any web page.
Sections
Inference Capability And Evaluation Contamination Risks
- The author reports that asking an LLM to profile a user from their recent Hacker News comments produces startlingly effective results.
- The author ran the profiling prompt in incognito mode to reduce the chance that the model would recognize the author and respond sycophantically.
- The author suspects the model inferred the author's real name because the author's Hacker News comments frequently link to the author's website, and those URLs connect the account to a public persona.
- The author reports that profiles generated for other accounts have not, so far, guessed real names.
Low-Friction Data Access Enabling Browser-Native Profiling
- The Algolia Hacker News API can list a user's most recent comments sorted by date using tags of the form "comment,author_<username>" with hitsPerPage up to 1000.
- The described profiling workflow is: fetch roughly the user's 1,000 most recent Hacker News comments, copy them with a tool, and paste them into an LLM with the prompt "profile this user".
- The Algolia Hacker News API is served with open CORS headers that allow cross-origin calls from JavaScript on any web page.
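The fetch step above can be sketched in a few lines. This is a minimal illustration, not the author's tool: it assumes the public `search_by_date` endpoint of the Algolia Hacker News API, assumes each hit carries its HTML body in a `comment_text` field, and uses hypothetical helper names (`build_comments_url`, `extract_comment_texts`, `build_prompt`, `fetch_recent_comments`). The source describes a browser-based workflow relying on the open CORS headers; Python is used here only to show the request shape.

```python
import html
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: returns results sorted by date, newest first.
API_BASE = "https://hn.algolia.com/api/v1/search_by_date"


def build_comments_url(username: str, hits_per_page: int = 1000) -> str:
    """Build the query URL for one user's most recent comments."""
    query = urlencode(
        {"tags": f"comment,author_{username}", "hitsPerPage": hits_per_page},
        safe=",",  # keep the comma in the tags filter readable
    )
    return f"{API_BASE}?{query}"


def extract_comment_texts(payload: dict) -> list[str]:
    """Pull plain-text comment bodies out of a parsed API response.

    Assumes each hit stores its HTML body under "comment_text".
    """
    texts = []
    for hit in payload.get("hits", []):
        raw = hit.get("comment_text") or ""
        # Crude tag stripping, then entity decoding and whitespace cleanup.
        plain = html.unescape(re.sub(r"<[^>]+>", " ", raw))
        plain = " ".join(plain.split())
        if plain:
            texts.append(plain)
    return texts


def build_prompt(texts: list[str], instruction: str = "profile this user") -> str:
    """Join the comments under the short prompt quoted in the workflow."""
    return instruction + "\n\n" + "\n\n".join(texts)


def fetch_recent_comments(username: str) -> list[str]:
    """Perform the network request (not exercised in the examples here)."""
    with urlopen(build_comments_url(username), timeout=30) as response:
        return extract_comment_texts(json.load(response))
```

Calling `build_prompt(fetch_recent_comments("<username>"))` would yield text ready to paste into an LLM. Rate limits and history beyond the first 1,000 comments are not handled in this sketch.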
Ethical Constraints And Defensive Use-Case
- The author states that it feels creepy that substantial information about someone can be derived easily from publicly shared content available via an API.
- The author mainly uses generated profiles to avoid getting drawn into extended arguments with people who have a history of bad-faith debate.
- The author considers it invasive to quote profiles generated about other users and therefore only shares an example profile generated about himself.
Watchlist
- The author states that it feels creepy that substantial information about someone can be derived easily from publicly shared content available via an API.
Unknowns
- How accurate are LLM-generated profiles from ~1000 public comments when judged against consented, ground-truth self-reports or known biographical facts?
- How sensitive are profile outputs to prompt wording, model choice, and account/session context (including personalization and recognition effects)?
- What is the true rate of identity inference or real-name guessing across a large, diverse sample of accounts, and how much is explained by self-linking behavior?
- What operational limits exist in practice (API rate limits, pagination completeness, account history depth beyond 1000 comments, and stability of the endpoint behavior over time)?
- Will platforms or API providers introduce policy, technical restrictions, or enforcement actions in response to automated profiling from public forum data?