Hacker News Scraper - Stories, Comments & AI Digest

Scrape HN top stories, search, Ask HN, Show HN, job posts and user profiles via Algolia & Firebase APIs. Zero anti-bot, no API key. AI topic digest (themes, trends, TLDR) via 5 LLM providers. x402-ready. $0.001/item.

Pricing

Pay per usage

Developer

Nick

Maintained by Community

Hacker News Scraper: Top/Show/Ask + Velocity + AI Digest

🧩 Part of the harvestlab MCP suite: 36 RAG-ready, AI-agent-payment-ready Apify actors covering ecommerce, social, travel, news, jobs, EU B2B, dev-tools, and government data.

Track Hacker News stories at $0.001/story. Fetches top, new, best, show, ask, and jobs feeds via the official Hacker News Firebase API: no API key, no rate limits, no anti-bot. Within-run velocity tracking and a cross-run score-delta snapshot flag trending stories. AI executive digest (themes, trends, TLDR) via a 5-provider LLM router. Webhook alerts on trending detection. pay-per-result · no-cookies · no-rental-tax.

Built for: developer-relations teams, VC scouts, technical founders, content writers, AI agent builders. Pairs with harvestlab/github-trending-scraper for a complete "developer intelligence" pipeline.


What this does

The Hacker News front page is the single highest-signal feed for early-stage tech, AI launches, and indie-dev culture. But polling it manually is tedious, and existing scrapers either hit Algolia's downstream API (lossy) or scrape the HTML (fragile). This actor uses the official Hacker News Firebase API, the same backend the YC site uses, which means:

  • Zero rate limits (Firebase API is unmetered for read traffic)
  • No API key required
  • No anti-bot to bypass
  • Full fidelity: scores, descendants, kids, raw item structure
  • Six feeds in one input: top, new, best, show, ask, jobs
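For readers curious what the Firebase route looks like in practice, here is a minimal sketch of reading one feed directly. The endpoint paths follow the public HackerNews/API documentation; the names `FEEDS`, `feed_url`, `item_url`, `fetch_json`, and `fetch_stories` are illustrative assumptions and not the actor's internals.

```python
import json
from urllib.request import urlopen

BASE = "https://hacker-news.firebaseio.com/v0"

# Actor feed name -> Firebase endpoint name (per the public HN API docs).
FEEDS = {
    "top": "topstories",
    "new": "newstories",
    "best": "beststories",
    "show": "showstories",
    "ask": "askstories",
    "jobs": "jobstories",
}


def feed_url(feed: str) -> str:
    """URL that returns the list of item IDs for a feed."""
    return f"{BASE}/{FEEDS[feed]}.json"


def item_url(item_id: int) -> str:
    """URL that returns a single item (story, comment, job, ...)."""
    return f"{BASE}/item/{item_id}.json"


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def fetch_stories(feed: str = "top", max_stories: int = 30, fetch=fetch_json) -> list:
    """Fetch up to max_stories items from a feed; `fetch` is injectable for testing."""
    ids = fetch(feed_url(feed))[:max_stories]
    items = (fetch(item_url(i)) for i in ids)
    return [item for item in items if item]  # missing items come back as null
```

Calling `fetch_stories("show", 10)` would issue one request for the ID list and one per story, which is why a cap like `maxStories` matters for run time.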

Every story is normalized to a portfolio-standard shape (id, title, url, score, by, time, descendants, kids, text, trending). Optional comment fan-out fetches the top-N comments per story. Optional AI digest summarizes the top 10 stories using your choice of 5 LLM providers (OpenRouter, Anthropic, Google AI, OpenAI, or self-hosted Ollama).
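The normalization step can be pictured as a projection of the raw Firebase item onto the fixed field list above. This is a sketch under assumptions: the defaults for missing fields (`None`, `0`, `[]`) and the helper names `normalize_story` and `top_comment_ids` are hypothetical, since the listing does not specify them.

```python
from typing import Optional

# Field list taken from the output shape described above.
STORY_FIELDS = ("id", "title", "url", "score", "by", "time", "descendants", "kids", "text")


def normalize_story(raw: dict, trending: Optional[bool] = None) -> dict:
    """Project a raw Firebase item onto the fixed output shape."""
    story = {field: raw.get(field) for field in STORY_FIELDS}
    # Assumed defaults so downstream code can rely on numeric/list types.
    story["score"] = story["score"] or 0
    story["descendants"] = story["descendants"] or 0
    story["kids"] = story["kids"] or []
    story["trending"] = trending
    return story


def top_comment_ids(story: dict, max_comments: int = 20) -> list:
    """Firebase lists `kids` in ranked display order, so the top-N are the first N."""
    return story["kids"][:max_comments]
```

Ask HN posts typically have `text` but no `url`, and link posts the reverse, so keeping both keys present with `None` gives every row the same columns.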

Why this beats the alternatives

| Approach | Cost | Limits | AI digest | Trending detection |
|---|---|---|---|---|
| harvestlab/hacker-news-scraper | $0.001/story | None | Yes (5 LLMs) | Yes (within-run + cross-run score-delta) |
| HN Algolia API (free) | $0 | 10k/hr soft cap | No | No |
| Algolia HN third-party scrapers | $5-29/mo | Plan-tiered | No | No |
| Custom Firebase fetch | DIY | None | DIY | DIY |
| RSS aggregators | $5-15/mo | Limited fields | No | No |

The wedge: dual-layer velocity tracking in a pay-per-event scraper. Within-run velocity (trending) fires at ≥30 pts/hr after the first 30 min; the cross-run KV-store snapshot (trending_cross_run) fires when a story gains ≥20 points since the last run. Together they catch both fast-rising newcomers and stories re-entering the front page. Algolia gives you a search index but no velocity signal, so you can't tell whether a story is accelerating in real time. Add enableAiAnalysis=true for an AI executive digest (themes, trends, TLDR) via your choice of 5 LLM providers.
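The two signals can be sketched as follows. The thresholds (30 pts/hr, 30 min, 20 points) come from the text; the velocity formula (score divided by story age in hours), the snapshot layout (story ID as a string key mapped to the previous score), and all function names are assumptions made for illustration.

```python
MIN_AGE_SECONDS = 30 * 60    # ignore stories younger than 30 minutes
VELOCITY_THRESHOLD = 30.0    # points per hour
DELTA_THRESHOLD = 20         # points gained since the previous run


def points_per_hour(score: int, posted_at: int, now: int) -> float:
    """Assumed velocity definition: score divided by age in hours."""
    age_seconds = max(now - posted_at, 1)
    return score * 3600.0 / age_seconds


def is_trending(score: int, posted_at: int, now: int) -> bool:
    """Within-run signal: fast enough, and old enough to be meaningful."""
    if now - posted_at < MIN_AGE_SECONDS:
        return False
    return points_per_hour(score, posted_at, now) >= VELOCITY_THRESHOLD


def cross_run_flags(story: dict, previous_scores: dict) -> dict:
    """Cross-run signal: compare against the score saved by the last run.

    previous_scores is a hypothetical snapshot {str(story_id): score}.
    """
    previous = previous_scores.get(str(story["id"]))
    delta = story["score"] - previous if previous is not None else 0
    return {"score_delta": delta, "trending_cross_run": delta >= DELTA_THRESHOLD}
```

The age gate matters because a story with 5 points after two minutes would otherwise register 150 pts/hr. A story seen for the first time has no snapshot entry, so this sketch reports a delta of 0 for it.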

Use cases (8 personas)

  1. Developer-relations teams: daily watch on Show HN for early adopters of competing tools.
  2. VC scouts: Show HN + AI digest = pre-seed deal flow before the post hits 100 points.
  3. Indie founders: track competitor launches and AI announcements.
  4. Content writers / Substackers: auto-curated story shortlist for "this week in AI" newsletters.
  5. AI agent builders: feed top-10 + comments into a research agent for trend analysis.
  6. Recruiters: the jobs feed (Firebase jobstories) + KvK-style enrichment finds startups actively hiring.
  7. Product managers: competitive intel on AI tools shipping in your category.
  8. Researchers / academics: longitudinal study of HN trending dynamics with a reproducible velocity flag.

Inputs

| Field | Type | Default | Notes |
|---|---|---|---|
| feed | enum | top | One of: top, new, best, show, ask, jobs |
| maxStories | integer | 30 | Stories to fetch (1-500) |
| includeComments | boolean | false | Adds ~$0.0005/comment |
| maxCommentsPerStory | integer | 20 | Cap top-level comments (cost control) |
| minPoints | integer | 0 | Skip stories below this score |
| trackVelocity | boolean | true | Flag stories with >30 pts/hr (after 30min) |
| alertWebhookUrl | string | (none) | Slack/Zapier/n8n/Discord webhook URL |
| enableAiAnalysis | boolean | false | AI executive digest (themes, trends, TLDR) |
| llmProvider | enum | openrouter | AI provider: openrouter / anthropic / google / openai / ollama |
| llmModel | string | (none) | Model override (uses provider default if blank) |
| openrouterApiKey / anthropicApiKey / googleApiKey / openaiApiKey / ollamaBaseUrl | string | (none) | API key for the chosen provider (base URL for Ollama) |
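As an illustration of how these fields fit together, the sketch below builds a run input from the documented defaults and rejects out-of-range values before a paid run starts. The field names, defaults, and ranges come from the table; the helper `build_run_input` and its validation rules are hypothetical and may differ from the actor's own input schema checks.

```python
# Defaults as documented in the inputs table.
DEFAULTS = {
    "feed": "top",
    "maxStories": 30,
    "includeComments": False,
    "maxCommentsPerStory": 20,
    "minPoints": 0,
    "trackVelocity": True,
    "enableAiAnalysis": False,
    "llmProvider": "openrouter",
}
# Fields with no default; included only when the caller sets them.
OPTIONAL = {
    "alertWebhookUrl", "llmModel", "openrouterApiKey", "anthropicApiKey",
    "googleApiKey", "openaiApiKey", "ollamaBaseUrl",
}
FEEDS = {"top", "new", "best", "show", "ask", "jobs"}
PROVIDERS = {"openrouter", "anthropic", "google", "openai", "ollama"}


def build_run_input(**overrides) -> dict:
    """Merge overrides into the defaults and validate the documented ranges."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    run_input = {**DEFAULTS, **overrides}
    if run_input["feed"] not in FEEDS:
        raise ValueError(f"feed must be one of {sorted(FEEDS)}")
    if run_input["llmProvider"] not in PROVIDERS:
        raise ValueError(f"llmProvider must be one of {sorted(PROVIDERS)}")
    if not 1 <= run_input["maxStories"] <= 500:
        raise ValueError("maxStories must be between 1 and 500")
    return run_input
```

For example, `build_run_input(feed="show", maxStories=50, includeComments=True)` yields a dict that can be passed as `run_input` to the Apify client.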

Outputs

One dataset item per story:

{
  "id": 39481275,
  "type": "story",
  "title": "Show HN: Hacker News Scraper at $0.001/story",
  "url": "https://apify.com/harvestlab/hacker-news-scraper",
  "score": 142,
  "by": "harvestlab",
  "time": 1714162800,
  "descendants": 38,
  "kids": [39481276, 39481289, ...],
  "trending": true,
  "score_delta": 42,
  "trending_cross_run": true,
  "comments": [
    {"id": 39481276, "by": "alice", "time": 1714163700, "text": "Nice work!"}
  ]
}

When enableAiAnalysis=true, an additional dataset item with report_type: "ai_digest" is appended containing: top_themes, notable_stories, tech_trends, community_mood, standout_discussions, and a tldr field suitable for newsletter intros. Charged only when the LLM returns parseable JSON.
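Because the digest shares the dataset with the story rows, consumers usually need to separate the two. A minimal sketch, assuming only the `report_type: "ai_digest"` marker described above (the helper name `split_items` is hypothetical):

```python
def split_items(items):
    """Separate story rows from the optional AI digest row."""
    stories, digest = [], None
    for item in items:
        if item.get("report_type") == "ai_digest":
            digest = item
        else:
            stories.append(item)
    return stories, digest
```

When no digest was produced (analysis disabled, or the LLM reply was not parseable), `digest` is simply `None`, so a newsletter pipeline can fall back to story titles.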

Pricing

| Event | Price | When |
|---|---|---|
| story-scraped | $0.001 | Per story successfully fetched |
| comment-scraped | $0.0005 | Per comment when includeComments=true |
| alert-dispatched | $0.002 | Per webhook POST on trending detection |
| ai-analysis-completed | $0.05 | Per AI digest run |

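Typical run cost: 30 top stories, no comments, no AI = $0.03 (30 × $0.001). With comments at the default cap of 20 per story: $0.33 ($0.03 + 600 × $0.0005). With AI digest and no comments: $0.08 ($0.03 + $0.05). A daily watch (top 30 + AI digest, one run per day) therefore comes to about 30 × $0.08 ≈ $2.40/month.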
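The per-run figures follow directly from the event prices, and a small estimator makes it easy to try other settings. This is a sketch: the prices are copied from the table above, while the function `run_cost` and its parameters are illustrative assumptions, and it assumes every story returns the full number of comments.

```python
from decimal import Decimal

# Event prices from the pricing table (Decimal avoids float rounding noise).
PRICES = {
    "story-scraped": Decimal("0.001"),
    "comment-scraped": Decimal("0.0005"),
    "alert-dispatched": Decimal("0.002"),
    "ai-analysis-completed": Decimal("0.05"),
}


def run_cost(stories: int, comments_per_story: int = 0,
             alerts: int = 0, ai_digest: bool = False) -> Decimal:
    """Upper-bound cost of one run, assuming every story has the full comment count."""
    cost = stories * PRICES["story-scraped"]
    cost += stories * comments_per_story * PRICES["comment-scraped"]
    cost += alerts * PRICES["alert-dispatched"]
    if ai_digest:
        cost += PRICES["ai-analysis-completed"]
    return cost


print(run_cost(30))                         # 30 stories only
print(run_cost(30, comments_per_story=20))  # with comments
print(run_cost(30, ai_digest=True))         # with AI digest
```

Multiplying the per-run result by the number of scheduled runs per month gives a monthly estimate for any cadence.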

vs. commercial alternatives: NewsAPI charges $449+/mo for commercial use, and custom RSS monitoring solutions require ongoing infrastructure. This actor uses pay-per-event with no subscription: $0.001/story and zero monthly fees.

Webhook payload schema

When a trending story fires, the webhook receives:

{
  "type": "trending_story",
  "story": {
    "id": 39481275,
    "title": "...",
    "url": "https://...",
    "score": 142,
    "by": "alice"
  }
}

Compatible with Slack incoming webhooks (when the payload is wrapped in a Slack-shaped message via Zapier or n8n, which formats the text automatically), Discord, and any HTTP endpoint accepting JSON. Failed dispatches are not charged.
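If you prefer to do the Slack wrapping yourself instead of through Zapier or n8n, a small relay can reshape the payload. The sketch below assumes the payload schema shown above and Slack's standard incoming-webhook body (`{"text": ...}`); the function names and the message wording are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def to_slack_message(payload: dict) -> dict:
    """Turn a trending_story payload into a Slack incoming-webhook body."""
    story = payload["story"]
    # Fall back to the HN discussion page when the story has no external URL.
    link = story.get("url") or f"https://news.ycombinator.com/item?id={story['id']}"
    text = (
        f"Trending on HN: <{link}|{story['title']}> "
        f"({story['score']} points, by {story['by']})"
    )
    return {"text": text}


def post_json(url: str, body: dict, timeout: float = 10.0) -> int:
    """POST a JSON body and return the HTTP status code."""
    request = Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=timeout) as response:
        return response.status
```

A relay endpoint would receive the actor's POST, call `to_slack_message`, and forward the result with `post_json` to the Slack webhook URL.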

Scheduling for daily watch

  1. Set feed: "top" + enableAiAnalysis: true + alertWebhookUrl: "<your-slack-incoming-webhook>".
  2. Schedule the actor to run hourly via Apify Scheduler.
  3. Two velocity signals work together: trending (within-run ≥30 pts/hr) and trending_cross_run (score gained ≥20 points vs. the previous run's KV snapshot). Both trigger webhook alerts.
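  4. Estimated monthly cost at hourly cadence, using the event prices above: 720 runs × $0.08 (top 30 + AI digest) ≈ $57.60/month, plus $0.002 per Slack alert. Without the AI digest the same cadence costs about 720 × $0.03 ≈ $21.60/month.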

This replaces a $5/month RSS aggregator with a feed that surfaces trending stories instead of all stories, which means far less noise.

Pair with other harvestlab actors

  • harvestlab/github-trending-scraper: pair HN's "what's launched today" with GitHub's "what's accelerating in stars" for a complete developer-intelligence pipeline.
  • harvestlab/contact-extractor: feed each new Show HN URL into the contact extractor to build a Show HN founder lead list.
  • harvestlab/news-monitor: combine the HN front page with Google News for cross-source coverage of AI launches.
  • harvestlab/google-search-scraper: when an HN story hits #1, scrape the SERP for backlinks and downstream coverage.

Use with AI agents

hacker-news-scraper outputs HN stories + top-level comments + dual velocity signals (trending within-run + trending_cross_run score-delta across runs) as structured JSON from the official Hacker News Firebase API ($0.001/story, no key, no rate limits). Enable AI digest for a structured briefing: top themes, tech trends, notable stories, community mood, and a newsletter-ready TLDR. RAG-ready for developer-research agents and Show-HN deal-flow scouts.

LangChain (mine_hacker_news tool):

import json

from apify_client import ApifyClient
from langchain.tools import Tool

client = ApifyClient("YOUR_APIFY_TOKEN")


def mine_hacker_news(params) -> list:
    # Tool passes the agent's input as a single string, so accept a JSON
    # string as well as a dict and fall back to defaults otherwise.
    if isinstance(params, str):
        try:
            params = json.loads(params)
        except ValueError:
            params = {}
    if not isinstance(params, dict):
        params = {}
    run = client.actor("harvestlab/hacker-news-scraper").call(run_input={
        "feed": params.get("feed", "top"),
        "maxStories": params.get("maxStories", 30),
        # "fetchComments" is the tool-level name for the actor's includeComments.
        "includeComments": params.get("fetchComments", False),
        "trackVelocity": True,
    })
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


mine_hn_tool = Tool(
    name="mine_hacker_news",
    description="Fetch HN stories (top/new/best/show/ask/jobs) with within-run velocity 'trending' flags and optional comments via the official Firebase API.",
    func=mine_hacker_news,
)
# agent.invoke({"input": "What's trending on Show HN right now?"})

LangGraph (node in a developer-research graph):

from apify_client import ApifyClient
from langgraph.graph import StateGraph

client = ApifyClient("YOUR_APIFY_TOKEN")


def hn_node(state: dict) -> dict:
    # Fetch the top feed with comments and within-run velocity flags.
    run = client.actor("harvestlab/hacker-news-scraper").call(run_input={
        "feed": "top",
        "maxStories": 50,
        "includeComments": True,
        "maxCommentsPerStory": 20,
        "trackVelocity": True,
    })
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    # Keep stories whose title mentions any topic keyword (case-insensitive);
    # with no keywords configured, keep every story instead of none.
    keywords = [k.lower() for k in state.get("topic_keywords", [])]
    matched = [
        s for s in items
        if not keywords or any(k in (s.get("title") or "").lower() for k in keywords)
    ]
    return {**state, "hn_threads": matched, "trending": [s for s in matched if s.get("trending")]}


graph = StateGraph(dict)
graph.add_node("hn", hn_node)
# wire into downstream sentiment-on-comments / summarizer / Slack-digest nodes

See Apify's actor-templates/js-langchain and js-langgraph-agent for full reference setups.

The Hacker News Firebase API is public, documented, and free (https://github.com/HackerNews/API). Y Combinator publishes it explicitly for third-party use. No ToS violation, no scraping of authenticated routes, no fingerprinting.

That said, users are responsible for:

  • Respecting rate norms (don't fan-out 100 concurrent fetches per second; use maxStories ≤ 500 and reasonable cadence)
  • GDPR / CCPA compliance when storing usernames + comment text (HN comments may contain PII that authors chose to post)
  • Attribution when redistributing data (link back to the HN item via https://news.ycombinator.com/item?id=<id>)
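For anyone building their own fetcher or redistribution step, the first and last points can be sketched together: cap concurrency with a small worker pool, and attach the HN item link to every record. The pool size of 8, the `hn_url` field name, and the helper names are assumptions chosen for illustration, not values taken from the actor.

```python
from concurrent.futures import ThreadPoolExecutor


def hn_item_link(item_id: int) -> str:
    """Canonical HN discussion URL, used for attribution."""
    return f"https://news.ycombinator.com/item?id={item_id}"


def with_attribution(item: dict) -> dict:
    """Return a copy of the item with a link back to its HN page."""
    return {**item, "hn_url": hn_item_link(item["id"])}


def fetch_bounded(ids, fetch_one, max_workers: int = 8) -> list:
    """Fetch items with at most max_workers requests in flight.

    fetch_one is any callable taking an item ID; results keep input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, ids))
```

A fixed worker pool keeps the request rate proportional to response latency rather than to the size of the ID list, which is a simple way to stay well below the "100 concurrent fetches" pattern the first bullet warns against.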

Roadmap

  • v0.3: Algolia downstream search fallback for historical queries (>14 days old).
  • v0.3: User-profile fetching (/user/<by>.json) for author rep + karma.

Contact

Built by harvestlab: 25 monetized Apify Actors covering EU/US e-commerce, B2B intelligence, jobs/salary, government procurement, and developer tools. Bug reports + feature requests via the Apify Store issue tracker on this actor's listing page.