Hacker News Scraper - Stories, Comments & AI Digest

Pricing

from $1.00 / 1,000 items scraped

Scrape HN top stories, search, Ask HN, Show HN, job posts and user profiles via Algolia & Firebase APIs. Zero anti-bot, no API key. AI topic digest (themes, trends, TLDR) via 5 LLM providers. x402-ready. $0.001/item.

Developer: Nick (maintained by Community)

Actor stats: 0 bookmarked, 4 total users, 2 monthly active users, last modified 2 days ago

Scrape Hacker News at $0.001/item: top stories, full-text search, Ask HN, Show HN, job posts, and user profiles via the official Algolia HN Search API and Firebase HN API. There is no anti-bot layer to work around, and no HN account or API key is required. An optional AI topic digest (themes, trends, hot topics, TLDR) runs through a 5-provider LLM router. Pay-per-result, no cookies, no monthly rental fee.

Built for: developer-relations teams, VC scouts, technical founders, AI agent builders, content writers, and researchers. Pairs with harvestlab/github-trending-scraper for a complete "developer intelligence" pipeline.


What this does

Hacker News is the highest-signal feed for early-stage tech, AI launches, and indie-dev culture. This actor exposes six scraping modes through two official, freely available APIs:

  • Algolia HN Search API (hn.algolia.com/api/v1) for full-text search and user profiles - no API key, paginated to any depth.
  • Firebase HN API (hacker-news.firebaseio.com/v0) for live feeds (top, ask, show, jobs) - the same backend the official HN site uses.

Every item is normalized to a portfolio-standard shape: id, title, url, text, author, points, commentCount, type, createdAt, hackerNewsUrl. The optional AI analysis step summarizes the batch using your choice of 5 LLM providers.
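
As a rough illustration of the normalization described above, the sketch below maps a raw Firebase item onto the ten-field shape. The raw field names (by, score, descendants, time) come from the public Firebase HN API; the function name normalize_firebase_item and the fallback values are assumptions made for this example, not the actor's actual code.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def normalize_firebase_item(raw: dict[str, Any]) -> dict[str, Any]:
    """Map a raw Firebase HN item onto the normalized shape (hypothetical helper)."""
    item_id = str(raw["id"])
    created: Optional[str] = None
    if raw.get("time") is not None:
        # Firebase reports creation time as Unix seconds (UTC).
        created = datetime.fromtimestamp(raw["time"], tz=timezone.utc).isoformat()
    return {
        "id": item_id,
        "title": raw.get("title"),
        "url": raw.get("url"),
        "text": raw.get("text"),
        "author": raw.get("by"),
        "points": raw.get("score", 0),             # assumed fallback of 0
        "commentCount": raw.get("descendants", 0),  # assumed fallback of 0
        "type": raw.get("type"),
        "createdAt": created,
        "hackerNewsUrl": f"https://news.ycombinator.com/item?id={item_id}",
    }
```

Items coming from Algolia use different raw field names, so a real pipeline would likely need a second mapper with the same output shape.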

Modes

| Mode   | API used | What you get |
|--------|----------|--------------|
| top    | Firebase | Today's front-page top stories |
| search | Algolia  | Full-text search results (stories, comments, Ask HN, etc.) |
| ask    | Firebase | Ask HN posts |
| show   | Firebase | Show HN posts |
| jobs   | Firebase | Job posts (Who's Hiring threads) |
| user   | Algolia  | User profile + recent submissions |
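
To make the mode-to-API mapping concrete, here is a minimal sketch that builds the public endpoint URL a given mode would plausibly start from. The routes themselves (topstories.json, askstories.json, showstories.json, jobstories.json, /search, /users/<name>) are part of the public Firebase and Algolia APIs; the helper endpoint_for and its parameters are hypothetical, and the actor's internal routing may differ.

```python
from urllib.parse import quote, urlencode

FIREBASE_BASE = "https://hacker-news.firebaseio.com/v0"
ALGOLIA_BASE = "https://hn.algolia.com/api/v1"

# Feed modes map onto Firebase story-list endpoints.
FIREBASE_FEEDS = {
    "top": "topstories",
    "ask": "askstories",
    "show": "showstories",
    "jobs": "jobstories",
}


def endpoint_for(mode, search_query=None, username=None,
                 search_tags="story", hits_per_page=30):
    """Return the first URL to request for a mode (hypothetical helper)."""
    if mode in FIREBASE_FEEDS:
        return f"{FIREBASE_BASE}/{FIREBASE_FEEDS[mode]}.json"
    if mode == "search":
        if not search_query:
            raise ValueError("searchQuery is required when mode=search")
        params = urlencode({
            "query": search_query,
            "tags": search_tags,
            "hitsPerPage": hits_per_page,
        })
        return f"{ALGOLIA_BASE}/search?{params}"
    if mode == "user":
        if not username:
            raise ValueError("username is required when mode=user")
        return f"{ALGOLIA_BASE}/users/{quote(username, safe='')}"
    raise ValueError(f"unknown mode: {mode!r}")
```

The Firebase list endpoints return only item IDs, so a feed mode needs one further request per item (item/<id>.json) to get titles and scores.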

Why this beats the alternatives

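| Approach | Cost | Search | AI digest | Live feeds |
|----------|------|--------|-----------|------------|
| harvestlab/hacker-news-scraper | $0.001/item | Yes (Algolia) | Yes (5 LLMs) | Yes (Firebase) |
| HN Algolia API direct | $0 | Yes | No | No |
| HN RSS feeds | $0 | No | No | Limited |
| Third-party HN services | $5-29/mo | Limited | No | Limited |
| Custom Firebase + Algolia DIY | Engineering cost | DIY | DIY | DIY |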

The wedge: search + live feeds + AI analysis in one pay-per-event actor. The Algolia API gives you full-text search and user profiles; Firebase gives you real-time feed data with full score/comment counts. Add enableAiAnalysis=true for a structured AI digest covering themes, trends, hot topics, and a newsletter-ready TLDR.

Use cases

  1. Developer-relations teams - monitor show feed for competitor launches and early adopters.
  2. VC scouts - search for specific technology keywords + AI digest = pre-seed deal flow before the post hits 100 points.
  3. Indie founders - track top + ask for validation signals and competitive intel.
  4. Content writers / Substackers - auto-curated story shortlist for "this week in AI/dev" newsletters.
  5. AI agent builders - feed top-10 items + AI analysis into a research agent for trend detection.
  6. Recruiters - jobs mode to find startups actively hiring in your tech stack.
  7. Product managers - competitive intelligence on AI tools shipping in your category.
  8. Researchers / academics - longitudinal study of HN content trends with reproducible queries.

Inputs

| Field | Type | Default | Notes |
|-------|------|---------|-------|
| mode | enum | top | One of: top, search, ask, show, jobs, user |
| searchQuery | string | - | Required when mode=search. Aliases: q, query |
| username | string | - | Required when mode=user. Alias: user |
| searchTags | string | story | Algolia tags filter for search mode (e.g., comment, ask_hn) |
| maxItems | integer | 30 | Items to fetch (1-500) |
| minPoints | integer | 0 | Skip stories below this score (feed modes only) |
| enableAiAnalysis | boolean | false | AI digest (themes, trends, TLDR). Adds ~$0.05/run |
| llmProvider | enum | openrouter | AI provider: openrouter / anthropic / google / openai / ollama |
| llmModel | string | - | Model override (uses provider default if blank) |
| openrouterApiKey | string | - | OpenRouter key (or OPENROUTER_API_KEY env var) |
| anthropicApiKey | string | - | Anthropic key (or ANTHROPIC_API_KEY env var) |
| googleApiKey | string | - | Google AI key (or GOOGLE_API_KEY env var) |
| openaiApiKey | string | - | OpenAI key (or OPENAI_API_KEY env var) |
| ollamaBaseUrl | string | http://localhost:11434 | Self-hosted Ollama URL |
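
The constraints in the table above can be checked client-side before starting a run. The sketch below is one possible pre-flight validator using the canonical field names only; validate_input, passes_min_points, and the error messages are assumptions for illustration and do not reflect the actor's own validation.

```python
VALID_MODES = {"top", "search", "ask", "show", "jobs", "user"}
FEED_MODES = {"top", "ask", "show", "jobs"}

# Defaults copied from the inputs table.
DEFAULTS = {
    "mode": "top",
    "searchTags": "story",
    "maxItems": 30,
    "minPoints": 0,
    "enableAiAnalysis": False,
    "llmProvider": "openrouter",
}


def validate_input(run_input: dict) -> dict:
    """Apply defaults and check documented constraints (hypothetical helper)."""
    cfg = {**DEFAULTS, **run_input}
    if cfg["mode"] not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if cfg["mode"] == "search" and not cfg.get("searchQuery"):
        raise ValueError("searchQuery is required when mode=search")
    if cfg["mode"] == "user" and not cfg.get("username"):
        raise ValueError("username is required when mode=user")
    if not 1 <= int(cfg["maxItems"]) <= 500:
        raise ValueError("maxItems must be between 1 and 500")
    return cfg


def passes_min_points(item: dict, cfg: dict) -> bool:
    """minPoints is documented as applying to feed modes only."""
    if cfg["mode"] not in FEED_MODES:
        return True
    return (item.get("points") or 0) >= cfg["minPoints"]
```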

Input aliases

To reduce friction for programmatic use, several fields accept shorter aliases:

  • q or query -> searchQuery
  • user -> username
  • feed -> mode

Alias resolution order: the alias is checked before the canonical field, so q=rust overrides a schema default on searchQuery.
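
The resolution rule can be sketched as follows. This is an illustrative reimplementation, not the actor's source: resolve_aliases is a hypothetical name, and the tie-break when two aliases for the same field are both supplied (here, the later entry in ALIASES wins) is an assumption the listing does not specify.

```python
# Alias -> canonical field, as listed above.
ALIASES = {
    "q": "searchQuery",
    "query": "searchQuery",
    "user": "username",
    "feed": "mode",
}


def resolve_aliases(run_input: dict) -> dict:
    """Fold aliases into canonical fields; a non-empty alias wins (sketch)."""
    # Start from every key that is not an alias.
    resolved = {k: v for k, v in run_input.items() if k not in ALIASES}
    for alias, canonical in ALIASES.items():
        value = run_input.get(alias)
        if value not in (None, ""):
            # Alias is checked first, so it overrides the canonical value.
            resolved[canonical] = value
    return resolved
```

For example, an input carrying both q="rust" and a schema-default searchQuery resolves to searchQuery="rust".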

Outputs

Standard item (all modes except user)

{
  "id": "39481275",
  "title": "Show HN: My open-source HN scraper",
  "url": "https://github.com/example/hn-scraper",
  "text": null,
  "author": "alice",
  "points": 142,
  "commentCount": 38,
  "type": "story",
  "createdAt": "2024-04-27T09:00:00+00:00",
  "hackerNewsUrl": "https://news.ycombinator.com/item?id=39481275"
}

User item (mode=user)

{
  "id": "pg",
  "title": "User: pg",
  "url": "https://news.ycombinator.com/user?id=pg",
  "text": "Lisp hacker. Co-founder of YC.",
  "author": "pg",
  "points": 155000,
  "commentCount": 0,
  "type": "user",
  "createdAt": "2006-10-09T00:00:00+00:00",
  "hackerNewsUrl": "https://news.ycombinator.com/user?id=pg",
  "submissions": [
    {
      "id": "39481275",
      "title": "...",
      "url": "...",
      "points": 142,
      "commentCount": 38,
      "createdAt": "2024-04-27T09:00:00+00:00"
    }
  ]
}

AI analysis item (appended when enableAiAnalysis=true)

{
  "report_type": "ai_analysis",
  "analysis": {
    "top_themes": [
      {"label": "AI/LLM tooling", "description": "Multiple new open-source LLM inference projects launched."}
    ],
    "notable_stories": [
      {"title": "...", "points": 420, "comments": 120, "domain": "github.com", "why_it_matters": "..."}
    ],
    "tech_trends": [
      {"name": "Rust", "context": "Three high-upvote stories about Rust systems projects."}
    ],
    "community_mood": {"tone": "excited", "explanation": "Several major AI launches drove high engagement."},
    "hot_topics": [
      {"title": "...", "note": "High comment-to-points ratio indicates controversy."}
    ],
    "tldr": "Today's HN front page was dominated by AI infrastructure launches and Rust systems projects, with an undercurrent of privacy skepticism in the comments."
  }
}

Pricing

| Event | Price | When |
|-------|-------|------|
| item-scraped | $0.001 | Per item successfully scraped (story, user, comment, etc.) |
| ai-analysis-completed | $0.05 | Per AI analysis digest (once per run) |

Typical run costs:

  • 30 top stories, no AI: $0.03
  • 30 top stories + AI digest: $0.08
  • 100 search results: $0.10
  • 500 top stories + AI: $0.55
  • Daily watch (top 30 + AI digest): ~$2.40/month
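
The figures above follow directly from the two event prices. A small estimator, assuming a 30-day month and using Decimal to avoid floating-point drift (the function names run_cost and monthly_cost are made up for this sketch):

```python
from decimal import Decimal

ITEM_PRICE = Decimal("0.001")  # item-scraped event
AI_PRICE = Decimal("0.05")     # ai-analysis-completed event, once per run


def run_cost(items: int, with_ai: bool = False) -> Decimal:
    """Cost in USD of a single run."""
    return ITEM_PRICE * items + (AI_PRICE if with_ai else Decimal("0"))


def monthly_cost(items: int, with_ai: bool, runs_per_day: int, days: int = 30) -> Decimal:
    """Cost in USD of a recurring schedule, assuming a 30-day month by default."""
    return run_cost(items, with_ai) * runs_per_day * days


if __name__ == "__main__":
    print(run_cost(30))               # 30 top stories, no AI
    print(run_cost(30, True))         # 30 top stories + AI digest
    print(monthly_cost(30, True, 1))  # daily watch
```

These estimates cover the actor's own events only; tokens billed by your LLM provider are separate.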

vs. commercial alternatives: NewsAPI charges $449+/month for commercial use. This actor is pay-per-event with zero subscription fees.

AI Analysis (5 providers)

When enableAiAnalysis=true, the actor sends the scraped items to your chosen LLM and returns a structured digest with:

  • top_themes - 3-6 dominant technology or industry themes
  • notable_stories - top 5 by combined points + comments signal
  • tech_trends - specific technologies, frameworks, or companies appearing prominently
  • community_mood - overall tone (excited/skeptical/mixed/critical/optimistic)
  • hot_topics - stories with high comment-to-points ratio (controversy signals)
  • tldr - 2-3 sentence morning-briefing summary

| Provider | Default model | API key env var | Input field |
|----------|---------------|-----------------|-------------|
| OpenRouter (recommended) | google/gemini-2.0-flash-001 | OPENROUTER_API_KEY | openrouterApiKey |
| Anthropic | claude-sonnet-4-20250514 | ANTHROPIC_API_KEY | anthropicApiKey |
| Google AI | gemini-2.0-flash | GOOGLE_API_KEY | googleApiKey |
| OpenAI | gpt-4o-mini | OPENAI_API_KEY | openaiApiKey |
| Ollama (self-hosted) | llama3.1 | - | ollamaBaseUrl |
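
The hot_topics signal (a high comment-to-points ratio) can also be approximated locally without an LLM, for instance to sanity-check a digest. In the actor this judgment is made by the chosen model; the sketch below is a deterministic stand-in, and the threshold min_ratio=1.0, the limit of 5, and both function names are assumptions.

```python
def controversy_ratio(item: dict) -> float:
    """Comments per point; points are floored at 1 to avoid division by zero."""
    points = item.get("points") or 0
    comments = item.get("commentCount") or 0
    return comments / max(points, 1)


def hot_topics(items: list, min_ratio: float = 1.0, limit: int = 5) -> list:
    """Return the most-discussed items relative to their score (sketch)."""
    # Skip user profiles and the AI digest item; neither is a story.
    candidates = [
        i for i in items
        if i.get("type") != "user" and i.get("report_type") != "ai_analysis"
    ]
    flagged = [i for i in candidates if controversy_ratio(i) >= min_ratio]
    return sorted(flagged, key=controversy_ratio, reverse=True)[:limit]
```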

Code examples

Python - search for AI stories

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Run the actor in search mode and wait for it to finish.
run = client.actor("harvestlab/hacker-news-scraper").call(run_input={
    "mode": "search",
    "searchQuery": "large language models",
    "searchTags": "story",
    "maxItems": 50,
})

# Stream results from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["points"], item["url"])

Python - top stories with AI digest

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("harvestlab/hacker-news-scraper").call(run_input={
    "mode": "top",
    "maxItems": 30,
    "enableAiAnalysis": True,
    "llmProvider": "openrouter",
    "openrouterApiKey": "sk-or-...",
})

items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
# The digest item is marked by "report_type", not "type", so filter on that key.
stories = [i for i in items if i.get("report_type") != "ai_analysis"]
digest = next((i for i in items if i.get("report_type") == "ai_analysis"), None)
print(digest["analysis"]["tldr"] if digest else "No digest")

LangChain tool integration

from langchain.tools import Tool
from apify_client import ApifyClient

apify = ApifyClient("YOUR_APIFY_TOKEN")


def search_hacker_news(query: str) -> list:
    run = apify.actor("harvestlab/hacker-news-scraper").call(run_input={
        "mode": "search",
        "q": query,
        "maxItems": 20,
    })
    return list(apify.dataset(run["defaultDatasetId"]).iterate_items())


hn_search_tool = Tool(
    name="search_hacker_news",
    description="Search Hacker News for stories, comments, and discussions about any topic.",
    func=search_hacker_news,
)
# agent.invoke({"input": "What are people saying about Rust on HN?"})

n8n / Zapier workflow

Configure a scheduled trigger calling this actor with mode=top, maxItems=30, enableAiAnalysis=true. The tldr field from the AI analysis item is ready to paste directly into a Slack message or email digest.
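
If you post the digest from code rather than from a no-code tool, the step looks roughly like this. The sketch only builds the JSON body; it assumes a Slack-style incoming webhook that accepts a {"text": ...} payload, and digest_to_slack_payload is a hypothetical helper name. Sending the request is left to whatever HTTP client your workflow uses.

```python
import json
from typing import Optional


def digest_to_slack_payload(items: list) -> Optional[str]:
    """Build a webhook JSON body from the AI analysis item, if present (sketch)."""
    digest = next((i for i in items if i.get("report_type") == "ai_analysis"), None)
    if digest is None:
        return None  # enableAiAnalysis was off, or the digest was not produced
    analysis = digest.get("analysis", {})
    lines = ["*Hacker News digest*", analysis.get("tldr", "")]
    themes = [t.get("label", "") for t in analysis.get("top_themes", [])]
    if themes:
        lines.append("Themes: " + ", ".join(themes))
    return json.dumps({"text": "\n".join(lines)})
```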

Scheduling for daily watch

  1. Set mode: "top" + enableAiAnalysis: true + preferred LLM provider.
  2. Schedule via Apify Scheduler (hourly or daily).
  3. Estimated monthly cost at daily cadence: ~$2.40/month for top 30 + AI digest.
  4. Estimated monthly cost at hourly cadence: ~$57.60/month for top 30 + AI digest every hour ($0.08 x 24 runs x 30 days).

Pair with other harvestlab actors

  • harvestlab/github-trending-scraper - pair HN's "what's launched today" with GitHub's "what's accelerating in stars" for complete developer intelligence.
  • harvestlab/news-monitor - combine HN front page with Google News for cross-source coverage of AI launches.
  • harvestlab/google-search-scraper - when an HN story hits #1, scrape the SERP for backlinks and downstream coverage.
  • harvestlab/contact-extractor - feed each new Show HN URL into the contact extractor to build a founder lead list.

The Hacker News Firebase API is public, documented, and maintained by Y Combinator (https://github.com/HackerNews/API). The Algolia HN Search API is officially provided by Algolia under agreement with YC (https://hn.algolia.com/api). Both are intended for third-party use with no terms of service restrictions on read access.

Users are responsible for:

  • Respecting rate norms: avoid launching many concurrent runs to work around the maxItems=500 per-run cap.
  • GDPR / CCPA compliance when storing usernames, karma, or comment text (HN user data is public but may constitute PII under some jurisdictions).
  • Attribution when redistributing data: link back to https://news.ycombinator.com/item?id=<id> per fair-use norms.
  • Not using the data for spam, harassment, or other violations of YC/HN community guidelines.

This actor does not scrape authenticated routes, bypass any security measures, or violate any robots.txt directives.

Contact

Built by harvestlab - 25+ monetized Apify Actors covering EU/US e-commerce, B2B intelligence, jobs, government procurement, social, and developer tools. Bug reports and feature requests via the Apify Store issue tracker on this actor's listing page.