Hacker News Scraper - Stories, Comments & AI Digest
Pricing
Pay per usage
Scrape HN top stories, search, Ask HN, Show HN, job posts and user profiles via Algolia & Firebase APIs. Zero anti-bot, no API key. AI topic digest (themes, trends, TLDR) via 5 LLM providers. x402-ready. $0.001/item.
Developer
Nick
Hacker News Scraper: Top/Show/Ask + Velocity + AI Digest
🧩 Part of the harvestlab MCP suite: 36 RAG-ready, AI-agent-payment-ready Apify actors covering ecommerce, social, travel, news, jobs, EU B2B, dev-tools, and government data. See the full suite.
Track Hacker News stories at $0.001/story. Fetches top, new, best, show, ask, and jobs feeds via the official Hacker News Firebase API: no API key, no rate limits, no anti-bot. Within-run velocity tracking plus a cross-run score-delta snapshot flag trending stories. AI executive digest (themes, trends, TLDR) via 5-provider LLM router. Webhook alerts on trending detection. pay-per-result · no-cookies · no-rental-tax.
Built for: developer-relations teams, VC scouts, technical founders, content writers, AI agent builders. Pairs with harvestlab/github-trending-scraper for a complete "developer intelligence" pipeline.
What this does
The Hacker News front page is the single highest-signal feed for early-stage tech, AI launches, and indie-dev culture. But polling it manually is tedious, and existing scrapers either hit Algolia's downstream API (lossy) or scrape the HTML (fragile). This actor uses the official Hacker News Firebase API, which Y Combinator publishes for third-party use. That means:
- Zero rate limits (Firebase API is unmetered for read traffic)
- No API key required
- No anti-bot to bypass
- Full fidelity: scores, descendants, kids, raw item structure
- Six feeds in one input: top, new, best, show, ask, jobs
Every story is normalized to a portfolio-standard shape (id, title, url, score, by, time, descendants, kids, text, trending). Optional comment fan-out fetches the top-N comments per story. Optional AI digest summarizes the top 10 stories using your choice of 5 LLM providers (OpenRouter, Anthropic, Google AI, OpenAI, or self-hosted Ollama).
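To illustrate what reading the Firebase API and normalizing an item could look like, here is a minimal sketch. It is not the actor's actual code. The endpoint paths follow the public HackerNews/API documentation; the names `FEED_ENDPOINTS`, `fetch_json`, `normalize_story`, and `fetch_feed` are hypothetical, and defaulting `trending` to `False` before any velocity check runs is an assumption.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://hacker-news.firebaseio.com/v0"

# Feed name (as used in this actor's input) -> Firebase endpoint name.
FEED_ENDPOINTS = {
    "top": "topstories",
    "new": "newstories",
    "best": "beststories",
    "show": "showstories",
    "ask": "askstories",
    "jobs": "jobstories",
}


def feed_url(feed: str) -> str:
    """URL that returns the list of item ids for a feed."""
    return f"{BASE_URL}/{FEED_ENDPOINTS[feed]}.json"


def item_url(item_id: int) -> str:
    """URL that returns a single item (story, comment, job, ...)."""
    return f"{BASE_URL}/item/{item_id}.json"


def fetch_json(url: str):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def normalize_story(raw: dict) -> dict:
    """Map a raw Firebase item onto the flat shape described above.

    Missing optional fields get neutral defaults (an assumption).
    """
    return {
        "id": raw["id"],
        "title": raw.get("title"),
        "url": raw.get("url"),
        "score": raw.get("score", 0),
        "by": raw.get("by"),
        "time": raw.get("time"),
        "descendants": raw.get("descendants", 0),
        "kids": raw.get("kids", []),
        "text": raw.get("text"),
        "trending": False,  # set later by a velocity check
    }


def fetch_feed(feed: str = "top", max_stories: int = 30) -> list:
    """Fetch the first max_stories items of a feed, skipping null items."""
    ids = fetch_json(feed_url(feed))[:max_stories]
    raw_items = (fetch_json(item_url(i)) for i in ids)
    return [normalize_story(r) for r in raw_items if r]
```

Calling `fetch_feed("show", 10)` would issue one request for the id list and one per item, which is why a cap such as maxStories matters for run time.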
Why this beats the alternatives
| Approach | Cost | Limits | AI digest | Trending detection |
|---|---|---|---|---|
| harvestlab/hacker-news-scraper | $0.001/story | None | Yes (5 LLMs) | Yes (within-run + cross-run score-delta) |
| HN Algolia API (free) | $0 | 10k/hr soft cap | No | No |
| Algolia HN third-party scrapers | $5-29/mo | Plan-tiered | No | No |
| Custom Firebase fetch | DIY | None | DIY | DIY |
| RSS aggregators | $5-15/mo | Limited fields | No | No |
The wedge: dual-layer velocity tracking in a pay-per-event scraper. Within-run velocity (trending) fires at ≥30 pts/hr after the first 30 min; cross-run KV-store snapshot (trending_cross_run) fires when a story gains ≥20 points since the last run. Together they catch both fast-rising newcomers and stories re-entering the front page. Algolia gives you a search index but no velocity signal, so you can't tell if a story is accelerating in real time. Add enableAiAnalysis=true for an AI executive digest (themes, trends, TLDR) via your choice of 5 LLM providers.
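The two signals can be sketched as follows. This is a rough illustration under stated assumptions, not the actor's implementation: it assumes within-run velocity means score divided by story age in hours, evaluated only once the story is at least 30 minutes old, and that the KV snapshot is a plain mapping from story id (as a string) to the score seen in the previous run. All function names are hypothetical; the thresholds are the ones quoted above.

```python
import time

VELOCITY_THRESHOLD = 30.0   # points per hour
MIN_AGE_SECONDS = 30 * 60   # ignore stories younger than 30 minutes
CROSS_RUN_DELTA = 20        # points gained since the previous run


def is_trending(score, posted_at, now=None):
    """Within-run signal: average points per hour since posting (assumed formula)."""
    now = time.time() if now is None else now
    age_seconds = now - posted_at
    if age_seconds < MIN_AGE_SECONDS:
        return False
    return score / (age_seconds / 3600) >= VELOCITY_THRESHOLD


def cross_run_delta(story_id, score, snapshot):
    """Points gained since the last run, or None if the story was not seen before."""
    previous = snapshot.get(str(story_id))
    return None if previous is None else score - previous


def annotate(story, snapshot, now=None):
    """Return a copy of the story with both velocity flags attached."""
    delta = cross_run_delta(story["id"], story["score"], snapshot)
    return {
        **story,
        "trending": is_trending(story["score"], story["time"], now),
        "score_delta": delta,
        "trending_cross_run": delta is not None and delta >= CROSS_RUN_DELTA,
    }
```

Under these assumptions a story first seen in the current run has no cross-run delta, so only the within-run flag can fire for it; the cross-run flag needs at least one earlier snapshot.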
Use cases (8 personas)
- Developer-relations teams: daily watch on Show HN for early adopters of competing tools.
- VC scouts: Show HN + AI digest = pre-seed deal flow before the post hits 100 points.
- Indie founders: track competitor launches and AI announcements.
- Content writers / Substackers: auto-curated story shortlist for "this week in AI" newsletters.
- AI agent builders: feed top-10 + comments into a research agent for trend analysis.
- Recruiters: jobstories feed + KvK-style enrichment finds startups actively hiring.
- Product managers: competitive intel on AI tools shipping in your category.
- Researchers / academics: longitudinal study of HN trending dynamics with a reproducible velocity flag.
Inputs
| Field | Type | Default | Notes |
|---|---|---|---|
| feed | enum | top | One of: top, new, best, show, ask, jobs |
| maxStories | integer | 30 | Stories to fetch (1-500) |
| includeComments | boolean | false | Adds ~$0.0005/comment |
| maxCommentsPerStory | integer | 20 | Cap top-level comments (cost control) |
| minPoints | integer | 0 | Skip stories below this score |
| trackVelocity | boolean | true | Flag stories with ≥30 pts/hr (after 30 min) |
| alertWebhookUrl | string | - | Slack/Zapier/n8n/Discord webhook URL |
| enableAiAnalysis | boolean | false | AI executive digest (themes, trends, TLDR) |
| llmProvider | enum | openrouter | AI provider: openrouter / anthropic / google / openai / ollama |
| llmModel | string | - | Model override (uses provider default if blank) |
| openrouterApiKey / anthropicApiKey / googleApiKey / openaiApiKey / ollamaBaseUrl | string | - | API key for the chosen provider (base URL for Ollama) |
Outputs
One dataset item per story:
    {
      "id": 39481275,
      "type": "story",
      "title": "Show HN: Hacker News Scraper at $0.001/story",
      "url": "https://apify.com/harvestlab/hacker-news-scraper",
      "score": 142,
      "by": "harvestlab",
      "time": 1714162800,
      "descendants": 38,
      "kids": [39481276, 39481289, ...],
      "trending": true,
      "score_delta": 42,
      "trending_cross_run": true,
      "comments": [
        {"id": 39481276, "by": "alice", "time": 1714163700, "text": "Nice work!"}
      ]
    }
When enableAiAnalysis=true, an additional dataset item with report_type: "ai_digest" is appended containing: top_themes, notable_stories, tech_trends, community_mood, standout_discussions, and a tldr field suitable for newsletter intros. Charged only when the LLM returns parseable JSON.
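Because the digest arrives in the same dataset as the stories, a consumer has to separate the two. A minimal sketch of that step, assuming the items have already been loaded into a list of dicts; `split_dataset` and `newsletter_intro` are hypothetical helper names, and only the `report_type`, `tldr`, `trending`, `trending_cross_run`, `title`, and `score` fields described above are used:

```python
def split_dataset(items):
    """Separate story items from the optional ai_digest item."""
    stories, digest = [], None
    for item in items:
        if item.get("report_type") == "ai_digest":
            digest = item
        else:
            stories.append(item)
    return stories, digest


def newsletter_intro(stories, digest, limit=3):
    """Build a short intro: the digest TLDR plus the highest-scoring flagged stories."""
    lines = []
    if digest and digest.get("tldr"):
        lines.append(digest["tldr"])
    flagged = [s for s in stories if s.get("trending") or s.get("trending_cross_run")]
    for s in sorted(flagged, key=lambda s: s.get("score", 0), reverse=True)[:limit]:
        lines.append(f"- {s['title']} ({s['score']} points)")
    return "\n".join(lines)
```

When enableAiAnalysis is off, or the LLM response could not be parsed, `digest` is simply `None` and the intro falls back to the flagged stories alone.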
Pricing
| Event | Price | When |
|---|---|---|
| story-scraped | $0.001 | Per story successfully fetched |
| comment-scraped | $0.0005 | Per comment when includeComments=true |
| alert-dispatched | $0.002 | Per webhook POST on trending detection |
| ai-analysis-completed | $0.05 | Per AI digest run |
Typical run cost: 30 top stories, no comments, no AI = $0.03. With comments (20 per story): $0.33. With AI digest: $0.08. Daily watch (top 30 + AI digest) ≈ $2.40/month (30 runs × $0.08).
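The per-run arithmetic can be checked with a small calculator built from the event prices in the table above. This is an illustrative sketch; `run_cost` and its parameters are hypothetical names, and it assumes every story yields the full number of comments, which is an upper bound.

```python
from decimal import Decimal

# Event prices copied from the pricing table above.
PRICES = {
    "story-scraped": Decimal("0.001"),
    "comment-scraped": Decimal("0.0005"),
    "alert-dispatched": Decimal("0.002"),
    "ai-analysis-completed": Decimal("0.05"),
}


def run_cost(stories, comments_per_story=0, alerts=0, ai_digest=False):
    """Upper-bound cost of one run in USD, using exact decimal arithmetic."""
    total = stories * PRICES["story-scraped"]
    total += stories * comments_per_story * PRICES["comment-scraped"]
    total += alerts * PRICES["alert-dispatched"]
    if ai_digest:
        total += PRICES["ai-analysis-completed"]
    return total


# run_cost(30)                          -> 0.03
# run_cost(30, comments_per_story=20)   -> 0.33
# run_cost(30, ai_digest=True)          -> 0.08
# 30 * run_cost(30, ai_digest=True)     -> 2.40 (one run per day for 30 days)
```

Multiplying the per-run figure by the number of scheduled runs gives the monthly estimate, so cadence dominates the bill far more than story count does.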
vs. commercial alternatives: NewsAPI charges $449+/mo for commercial use, and custom RSS monitoring solutions require ongoing infrastructure. This actor uses pay-per-event with no subscription: $0.001/story and zero monthly fees.
Webhook payload schema
When a trending story fires, the webhook receives:
    {
      "type": "trending_story",
      "story": {
        "id": 39481275,
        "title": "...",
        "url": "https://...",
        "score": 142,
        "by": "alice"
      }
    }
Compatible with Slack incoming webhooks (formats text automatically when wrapped in a Slack-shaped payload via Zapier or n8n), Discord, and any HTTP endpoint accepting JSON. Failed dispatches are not charged.
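If you would rather not route through Zapier or n8n, the wrapping step can be done by a small relay of your own. The sketch below is an assumption about how such a relay might look, not something the actor ships: `to_slack_message` and `post_json` are hypothetical helpers, and the message wording is arbitrary. The `{"text": ...}` body and the `<url|label>` link syntax are standard Slack incoming-webhook conventions.

```python
import json
from urllib.request import Request, urlopen


def to_slack_message(payload: dict) -> dict:
    """Wrap the actor's trending_story payload in a Slack-shaped body."""
    story = payload["story"]
    # Fall back to the HN discussion page when the story has no external URL.
    link = story.get("url") or f"https://news.ycombinator.com/item?id={story['id']}"
    text = (
        f"Trending on HN: <{link}|{story['title']}> "
        f"({story['score']} points, by {story['by']})"
    )
    return {"text": text}


def post_json(url: str, body: dict) -> int:
    """POST a JSON body and return the HTTP status code."""
    req = Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req, timeout=10) as resp:
        return resp.status
```

A relay endpoint would receive the actor's POST, call `to_slack_message` on the decoded body, and forward the result to the Slack webhook URL with `post_json`.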
Scheduling for daily watch
- Set feed: "top" + enableAiAnalysis: true + alertWebhookUrl: "<your-slack-incoming-webhook>".
- Schedule the actor to run hourly via Apify Scheduler.
- Two velocity signals work together: trending (within-run ≥30 pts/hr) and trending_cross_run (score gained ≥20 points vs. the previous run's KV snapshot). Both trigger webhook alerts.
- Estimated monthly cost for top 30 + AI digest at $0.08/run: about $57.60 at hourly cadence (720 runs) or about $2.40 at daily cadence (30 runs), plus $0.002 per webhook alert.
This replaces a $5/month RSS aggregator with one that surfaces trending stories instead of all stories, so there is far less noise.
Pair with other harvestlab actors
- harvestlab/github-trending-scraper: pair HN's "what's launched today" with GitHub's "what's accelerating in stars" for a complete developer-intelligence pipeline.
- harvestlab/contact-extractor: feed each new HN Show HN URL into the contact extractor to build a Show HN founder lead list.
- harvestlab/news-monitor: combine HN front page with Google News for cross-source coverage of AI launches.
- harvestlab/google-search-scraper: when an HN story hits #1, scrape the SERP for backlinks and downstream coverage.
Use with AI agents
hacker-news-scraper outputs HN stories + top-level comments + dual velocity signals (trending within-run + trending_cross_run score-delta across runs) as structured JSON from the official Hacker News Firebase API ($0.001/story, no key, no rate limits). Enable AI digest for a structured briefing: top themes, tech trends, notable stories, community mood, and a newsletter-ready TLDR. RAG-ready for developer-research agents and Show-HN deal-flow scouts.
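LangChain: mine_hacker_news tool:

    import json

    from apify_client import ApifyClient
    from langchain_core.tools import Tool

    client = ApifyClient("YOUR_APIFY_TOKEN")


    def mine_hacker_news(tool_input: str) -> list:
        # Tool passes its input as a single string, so accept a JSON object
        # such as '{"feed": "show", "maxStories": 10}' and fall back to defaults.
        try:
            params = json.loads(tool_input)
        except (TypeError, ValueError):
            params = {}
        if not isinstance(params, dict):
            params = {}
        run = client.actor("harvestlab/hacker-news-scraper").call(run_input={
            "feed": params.get("feed", "top"),
            "maxStories": params.get("maxStories", 30),
            "includeComments": params.get("includeComments", False),
            "trackVelocity": True,
        })
        return list(client.dataset(run["defaultDatasetId"]).iterate_items())


    mine_hn_tool = Tool(
        name="mine_hacker_news",
        description=(
            "Fetch HN stories (top/new/best/show/ask/jobs) with within-run velocity "
            "'trending' flags and optional comments via the official Firebase API. "
            "Input: a JSON object with optional feed, maxStories, includeComments."
        ),
        func=mine_hacker_news,
    )

    # agent.invoke({"input": "What's trending on Show HN right now?"})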
LangGraph: node in a developer-research graph:

    from langgraph.graph import StateGraph
    from apify_client import ApifyClient

    client = ApifyClient("YOUR_APIFY_TOKEN")


    def hn_node(state: dict) -> dict:
        # Run the actor and load every dataset item from the finished run.
        run = client.actor("harvestlab/hacker-news-scraper").call(run_input={
            "feed": "top",
            "maxStories": 50,
            "includeComments": True,
            "maxCommentsPerStory": 20,
            "trackVelocity": True,
        })
        items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
        # Keep only stories whose title mentions one of the topic keywords.
        keywords = [k.lower() for k in state.get("topic_keywords", [])]
        matched = [
            s for s in items
            if any(k in (s.get("title") or "").lower() for k in keywords)
        ]
        return {
            **state,
            "hn_threads": matched,
            "trending": [s for s in matched if s.get("trending")],
        }


    graph = StateGraph(dict)
    graph.add_node("hn", hn_node)
    # wire into downstream sentiment-on-comments / summarizer / Slack-digest nodes
See Apify's actor-templates/js-langchain and js-langgraph-agent for full reference setups.
Compliance & legal
The Hacker News Firebase API is public, documented, and free (https://github.com/HackerNews/API). Y Combinator publishes it explicitly for third-party use. No ToS violation, no scraping of authenticated routes, no fingerprinting.
That said, users are responsible for:
- Respecting rate norms (don't fan out 100 concurrent fetches per second; use maxStories ≤ 500 and a reasonable cadence)
- GDPR / CCPA compliance when storing usernames + comment text (HN comments contain PII per author choice)
- Attribution when redistributing data (link back to the HN item via https://news.ycombinator.com/item?id=<id>)
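For anyone post-processing the data or fetching extra items themselves, the first and last points can be sketched like this. It is an illustration only: the worker cap of 5 is an arbitrary assumption rather than a documented limit, `fetch_one` stands in for whatever function you use to fetch a single item, and the `hn_url` field name is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def hn_item_link(item_id: int) -> str:
    """Canonical discussion URL to use for attribution."""
    return f"https://news.ycombinator.com/item?id={item_id}"


def with_attribution(story: dict) -> dict:
    """Return a copy of the story carrying a link back to Hacker News."""
    return {**story, "hn_url": hn_item_link(story["id"])}


def fetch_bounded(ids, fetch_one, max_workers=5):
    """Fetch items with a small, fixed number of concurrent workers.

    Results come back in the same order as ids.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, ids))
```

Keeping the pool small bounds the number of in-flight requests regardless of how many ids are queued.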
Roadmap
- v0.3: Algolia downstream search fallback for historical queries (>14 days old).
- v0.3: User-profile fetching (/user/<by>.json) for author rep + karma.
Contact
Built by harvestlab: 25 monetized Apify Actors covering EU/US e-commerce, B2B intelligence, jobs/salary, government procurement, and developer tools. Bug reports + feature requests via the Apify Store issue tracker on this actor's listing page.