LLM-Driven Profiling Pipeline And Evaluation Caveats
Sources: 1 • Confidence: High • Updated: 2026-04-12 10:18
Key takeaways
- The described profiling workflow is to fetch roughly a user's last 1,000 HN comments, copy them with a tool, and paste them into an LLM with the prompt "profile this user".
- The Algolia Hacker News API can list a user's most recent comments sorted by date using tags formatted as "comment,author_<username>" with hitsPerPage up to 1000.
- The author finds it creepy that substantial information about someone can be derived easily from publicly shared content available via an API.
- The author expects the model inferred his real name because his HN comments frequently link to his own website, providing URLs that connect the account to a public persona.
- The author mainly uses generated profiles to avoid getting drawn into extended arguments with people who have a history of bad-faith debate.
Sections
LLM-Driven Profiling Pipeline And Evaluation Caveats
- The described profiling workflow is to fetch roughly a user's last 1,000 HN comments, copy them with a tool, and paste them into an LLM with the prompt "profile this user".
- The author reports that LLM profiling based on a user's recent HN comments can be startlingly effective.
- The author runs the profiling prompt in an incognito session to reduce the chance that the model recognizes him and produces overly flattering output.
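The copy-and-paste step of the workflow above can be sketched as a simple prompt-assembly function. This is an illustrative sketch, not the author's actual tooling; the function name and the separator between comments are assumptions.

```python
def build_profile_prompt(comments: list[str]) -> str:
    """Assemble the described 'profile this user' prompt from comment texts.

    Illustrative sketch of the paste-into-an-LLM step; the separator and
    function name are assumptions, not the author's actual tool.
    """
    body = "\n\n---\n\n".join(comments)
    return f"profile this user\n\n{body}"

# Hypothetical sample comments standing in for a real fetched history.
sample = ["I think static sites are underrated.", "CORS headers make this easy."]
prompt = build_profile_prompt(sample)
print(prompt.splitlines()[0])  # the literal instruction handed to the LLM
```

In the described workflow the resulting text is pasted into an LLM chat session rather than sent via an API, so no model client is shown here.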
Low-Friction Access To Per-User Public Comment History
- The Algolia Hacker News API can list a user's most recent comments sorted by date using tags formatted as "comment,author_<username>" with hitsPerPage up to 1000.
- The Algolia Hacker News API is served with open CORS headers, allowing it to be called from JavaScript on any web page.
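Under these constraints, retrieving a user's recent comments reduces to a single GET against the `search_by_date` endpoint. The helper below only constructs the query URL (no network call); the helper name is an assumption, while the endpoint, tag format, and the 1,000-hit cap come from the source.

```python
from urllib.parse import urlencode

# Public Algolia Hacker News search endpoint; search_by_date sorts newest first.
ALGOLIA_HN = "https://hn.algolia.com/api/v1/search_by_date"

def comment_history_url(username: str, hits: int = 1000) -> str:
    """Build the query for a user's most recent comments.

    tags=comment,author_<username> ANDs the two filters; hitsPerPage is
    capped at 1000 by the API, per the source.
    """
    params = {
        "tags": f"comment,author_{username}",
        "hitsPerPage": min(hits, 1000),
    }
    return f"{ALGOLIA_HN}?{urlencode(params)}"

# Example: the most recent comments for a hypothetical username.
url = comment_history_url("pg")
print(url)
```

Because the endpoint is served with open CORS headers, the same URL can be fetched from browser JavaScript on any page with `fetch(url)`, which is what makes the low-friction, client-side version of this pipeline possible.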
Privacy, Publication Norms, And Governance Pressure From Inference (Not Collection)
- The author finds it creepy that substantial information about someone can be derived easily from publicly shared content available via an API.
- The author considers it invasive to quote profiles generated about other users and therefore only shares an example profile produced about himself.
Potential Deanonymization Via Self-Linkage And Observed Limits
- The author expects the model inferred his real name because his HN comments frequently link to his own website, providing URLs that connect the account to a public persona.
- The author reports that he has not seen the profiling output guess a real name for any of the other accounts he has profiled.
Defensive Reputation Assessment As An Individual Moderation Behavior
- The author mainly uses generated profiles to avoid getting drawn into extended arguments with people who have a history of bad-faith debate.
Watchlist
- The author finds it creepy that substantial information about someone can be derived easily from publicly shared content available via an API.
Unknowns
- What are the practical API constraints (rate limits, pagination behavior beyond 1000, retention window, availability guarantees) for at-scale per-user comment retrieval?
- How accurate and consistent are LLM-generated profiles when evaluated against consenting ground truth across many users and multiple models/prompts?
- How large is the effect of subject recognition and account/context leakage (including sycophancy) on profiling outputs?
- What is the true frequency of real-name or identity guesses, and how strongly is it correlated with self-linking behavior (links to personal websites, handles, etc.)?
- What norms, policies, or technical restrictions (platform-level or tool-level) will emerge regarding republishing or operationalizing profiles derived from public forum text?