Hacker News Scraper - Stories, Comments & AI Digest

Pricing

from $1.00 / 1,000 items scraped

Scrape Hacker News top stories, search results, Ask HN, Show HN, jobs, comments, and user profiles for developer relations, market research, tech scouting, AI digests, and MCP connector summaries.

Developer: Nick (maintained by Community)

Scrape Hacker News at $0.001 per item: top stories, full-text search, Ask HN, Show HN, job posts, and user profiles via the official Algolia HN Search API and the Firebase HN API. There are no anti-bot measures to work around, and no account or API key is required for scraping. An optional AI topic digest (themes, trends, hot topics, TLDR) runs through a 5-provider LLM router. Pricing is pay-per-result, with no cookies and no monthly rental fee.

Built for: developer-relations teams, VC scouts, technical founders, AI agent builders, content writers, and researchers. Pairs with harvestlab/github-trending-scraper for a complete "developer intelligence" pipeline.


What this does

Hacker News is the highest-signal feed for early-stage tech, AI launches, and indie-dev culture. This actor exposes six scraping modes through two official, freely available APIs:

  • Algolia HN Search API (hn.algolia.com/api/v1) for full-text search and user profiles - no API key, paginated to any depth.
  • Firebase HN API (hacker-news.firebaseio.com/v0) for live feeds (top, ask, show, jobs) - the same backend the official HN site uses.

Every item is normalized to a portfolio-standard shape: id, title, url, text, author, points, commentCount, type, createdAt, hackerNewsUrl. The optional AI analysis step summarizes the batch using your choice of 5 LLM providers.
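As a rough illustration of that normalization, the sketch below maps one Algolia HN Search hit onto the ten-field shape. The Algolia field names (objectID, story_text, comment_text, num_comments, created_at_i, _tags) are the ones the public search API returns; the function name and the fallback rules are assumptions for illustration, not the actor's actual code.

```python
from datetime import datetime, timezone
from typing import Any

# Item kinds that Algolia exposes through the _tags array.
KNOWN_TYPES = ("story", "comment", "poll", "job")


def normalize_algolia_hit(hit: dict[str, Any]) -> dict[str, Any]:
    """Map one Algolia HN Search hit onto the normalized item shape (sketch)."""
    item_id = str(hit["objectID"])
    tags = hit.get("_tags") or []
    # Assumption: fall back to "story" when no known kind appears in the tags.
    item_type = next((t for t in tags if t in KNOWN_TYPES), "story")

    # created_at_i is a Unix timestamp; render it as ISO 8601 in UTC.
    created_ts = hit.get("created_at_i")
    created_at = (
        datetime.fromtimestamp(created_ts, tz=timezone.utc).isoformat()
        if created_ts is not None
        else None
    )

    return {
        "id": item_id,
        "title": hit.get("title"),
        "url": hit.get("url"),
        "text": hit.get("story_text") or hit.get("comment_text"),
        "author": hit.get("author"),
        "points": hit.get("points") or 0,
        "commentCount": hit.get("num_comments") or 0,
        "type": item_type,
        "createdAt": created_at,
        "hackerNewsUrl": f"https://news.ycombinator.com/item?id={item_id}",
    }
```

Missing numeric fields default to 0 here so downstream filters such as minPoints can compare without None checks; whether the actor makes the same choice is not stated on this page.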

Modes

| Mode | API used | What you get |
| --- | --- | --- |
| top | Firebase | Today's front-page top stories |
| search | Algolia | Full-text search results (stories, comments, Ask HN, etc.) |
| ask | Firebase | Ask HN posts |
| show | Firebase | Show HN posts |
| jobs | Firebase | Job posts (Who's Hiring threads) |
| user | Algolia | User profile + recent submissions |
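The mode-to-API mapping above can be sketched as a small URL builder. The base URLs come from this page, and the endpoint paths (topstories.json, askstories.json, showstories.json, jobstories.json, item/<id>.json, /search, /users/<username>) are the public ones exposed by the two APIs. The function names and parameter choices are illustrative assumptions, not the actor's internals.

```python
from urllib.parse import quote, urlencode

ALGOLIA_BASE = "https://hn.algolia.com/api/v1"
FIREBASE_BASE = "https://hacker-news.firebaseio.com/v0"

# Firebase list endpoints for the four feed modes; each returns a list of item IDs.
FIREBASE_FEEDS = {
    "top": "topstories",
    "ask": "askstories",
    "show": "showstories",
    "jobs": "jobstories",
}


def endpoint_for(mode, search_query=None, username=None, search_tags="story", page=0):
    """Return the first URL a client would request for the given mode (sketch)."""
    if mode in FIREBASE_FEEDS:
        return f"{FIREBASE_BASE}/{FIREBASE_FEEDS[mode]}.json"
    if mode == "search":
        params = urlencode({"query": search_query or "", "tags": search_tags, "page": page})
        return f"{ALGOLIA_BASE}/search?{params}"
    if mode == "user":
        return f"{ALGOLIA_BASE}/users/{quote(username or '', safe='')}"
    raise ValueError(f"unknown mode: {mode!r}")


def firebase_item_url(item_id):
    """Feed endpoints return only IDs; each item is fetched separately."""
    return f"{FIREBASE_BASE}/item/{item_id}.json"
```

Because the Firebase feeds return bare IDs, a feed-mode run needs one extra request per item, while a search-mode run gets full records in each Algolia page.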

Why this beats the alternatives

| Approach | Cost | Search | AI digest | Live feeds |
| --- | --- | --- | --- | --- |
| harvestlab/hacker-news-scraper | $0.001/item | Yes (Algolia) | Yes (5 LLMs) | Yes (Firebase) |
| HN Algolia API direct | $0 | Yes | No | No |
| HN RSS feeds | $0 | No | No | Limited |
| Third-party HN services | $5-29/mo | Limited | No | Limited |
| Custom Firebase + Algolia DIY | Engineering cost | DIY | DIY | DIY |

The wedge: search + live feeds + AI analysis in one pay-per-event actor. The Algolia API gives you full-text search and user profiles; Firebase gives you real-time feed data with full score/comment counts. Add enableAiAnalysis=true for a structured AI digest covering themes, trends, hot topics, and a newsletter-ready TLDR.

Use cases

  1. Developer-relations teams - monitor show feed for competitor launches and early adopters.
  2. VC scouts - search for specific technology keywords + AI digest = pre-seed deal flow before the post hits 100 points.
  3. Indie founders - track top + ask for validation signals and competitive intel.
  4. Content writers / Substackers - auto-curated story shortlist for "this week in AI/dev" newsletters.
  5. AI agent builders - feed top-10 items + AI analysis into a research agent for trend detection.
  6. Recruiters - jobs mode to find startups actively hiring in your tech stack.
  7. Product managers - competitive intelligence on AI tools shipping in your category.
  8. Researchers / academics - longitudinal study of HN content trends with reproducible queries.

Inputs

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| mode | enum | top | One of: top, search, ask, show, jobs, user |
| searchQuery | string | - | Required when mode=search. Aliases: q, query |
| username | string | - | Required when mode=user. Alias: user |
| searchTags | string | story | Algolia tags filter for search mode (e.g., comment, ask_hn) |
| maxItems | integer | 30 | Items to fetch (1-500) |
| minPoints | integer | 0 | Skip stories below this score (feed modes only) |
| outputConnectors | array | - | Optional MCP connectors for feed or search summaries. Sends a compact message plus structured payload to authorized Slack, Notion, GitHub, Sheets, CRM, or other connector tools. |
| connectorAlertTarget | string | - | Optional connector destination, such as a Slack channel, Notion database ID, sheet name, table, or CRM list. |
| enableAiAnalysis | boolean | false | AI digest (themes, trends, TLDR). Adds ~$0.05/run |
| llmProvider | enum | openrouter | AI provider: openrouter / anthropic / google / openai / ollama |
| llmModel | string | - | Model override (uses provider default if blank) |
| openrouterApiKey | string | - | OpenRouter key (or OPENROUTER_API_KEY env var) |
| anthropicApiKey | string | - | Anthropic key (or ANTHROPIC_API_KEY env var) |
| googleApiKey | string | - | Google AI key (or GOOGLE_API_KEY env var) |
| openaiApiKey | string | - | OpenAI key (or OPENAI_API_KEY env var) |
| ollamaBaseUrl | string | http://localhost:11434 | Self-hosted Ollama URL |
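Before starting a run it can help to check an input against the constraints in the table above. The sketch below is a client-side pre-flight check only: the field names, defaults, and ranges come from the table, while the function name and the decision to raise rather than clamp are assumptions. How the actor itself reacts to invalid input is not documented here.

```python
VALID_MODES = {"top", "search", "ask", "show", "jobs", "user"}


def validate_input(run_input: dict) -> dict:
    """Apply the documented defaults and reject inputs the table rules out (sketch)."""
    mode = run_input.get("mode", "top")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")

    # searchQuery and username are only required for their own modes.
    if mode == "search" and not run_input.get("searchQuery"):
        raise ValueError("searchQuery is required when mode=search")
    if mode == "user" and not run_input.get("username"):
        raise ValueError("username is required when mode=user")

    max_items = int(run_input.get("maxItems", 30))
    if not 1 <= max_items <= 500:
        raise ValueError("maxItems must be between 1 and 500")

    min_points = int(run_input.get("minPoints", 0))
    if min_points < 0:
        raise ValueError("minPoints must not be negative")

    # Return a copy with the defaults filled in; the caller's dict is untouched.
    return {**run_input, "mode": mode, "maxItems": max_items, "minPoints": min_points}
```

For example, `validate_input({"mode": "search", "searchQuery": "rust"})` fills in maxItems=30 and minPoints=0, while `validate_input({"mode": "user"})` raises because username is missing.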

Input aliases

To reduce friction for programmatic use, several fields accept shorter aliases:

  • q or query -> searchQuery
  • user -> username
  • feed -> mode

Alias resolution order: an alias is checked before its canonical field, so q=rust overrides a schema default on searchQuery.
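The alias-first rule can be sketched as follows. The alias names come from the list above; the function name, the treatment of empty values, and the tie-break between q and query (first listed wins) are assumptions made for illustration.

```python
# Canonical field -> aliases, in the order they are checked (order is an assumption).
ALIASES = {
    "searchQuery": ("q", "query"),
    "username": ("user",),
    "mode": ("feed",),
}


def resolve_aliases(raw: dict) -> dict:
    """Return a copy of raw with aliases folded into their canonical fields."""
    alias_names = {alias for names in ALIASES.values() for alias in names}
    # Start from everything that is not an alias, including canonical values.
    resolved = {key: value for key, value in raw.items() if key not in alias_names}

    for canonical, names in ALIASES.items():
        for alias in names:
            value = raw.get(alias)
            # Assumption: None and "" count as "not provided".
            if value is not None and value != "":
                # Alias wins over whatever the canonical field held.
                resolved[canonical] = value
                break
    return resolved
```

With this ordering, `resolve_aliases({"q": "rust", "searchQuery": "schema default"})` yields `{"searchQuery": "rust"}`, which matches the override behavior described above.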

Outputs

Standard item (all modes except user)

{
  "id": "39481275",
  "title": "Show HN: My open-source HN scraper",
  "url": "https://github.com/example/hn-scraper",
  "text": null,
  "author": "alice",
  "points": 142,
  "commentCount": 38,
  "type": "story",
  "createdAt": "2024-04-27T09:00:00+00:00",
  "hackerNewsUrl": "https://news.ycombinator.com/item?id=39481275"
}

User item (mode=user)

{
  "id": "pg",
  "title": "User: pg",
  "url": "https://news.ycombinator.com/user?id=pg",
  "text": "Lisp hacker. Co-founder of YC.",
  "author": "pg",
  "points": 155000,
  "commentCount": 0,
  "type": "user",
  "createdAt": "2006-10-09T00:00:00+00:00",
  "hackerNewsUrl": "https://news.ycombinator.com/user?id=pg",
  "submissions": [
    {
      "id": "39481275",
      "title": "...",
      "url": "...",
      "points": 142,
      "commentCount": 38,
      "createdAt": "2024-04-27T09:00:00+00:00"
    }
  ]
}

AI analysis item (appended when enableAiAnalysis=true)

{
  "report_type": "ai_analysis",
  "analysis": {
    "top_themes": [
      {"label": "AI/LLM tooling", "description": "Multiple new open-source LLM inference projects launched."}
    ],
    "notable_stories": [
      {"title": "...", "points": 420, "comments": 120, "domain": "github.com", "why_it_matters": "..."}
    ],
    "tech_trends": [
      {"name": "Rust", "context": "Three high-upvote stories about Rust systems projects."}
    ],
    "community_mood": {"tone": "excited", "explanation": "Several major AI launches drove high engagement."},
    "hot_topics": [
      {"title": "...", "note": "High comment-to-points ratio indicates controversy."}
    ],
    "tldr": "Today's HN front page was dominated by AI infrastructure launches and Rust systems projects, with an undercurrent of privacy skepticism in the comments."
  }
}
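The hot_topics note in the example above refers to a comment-to-points ratio. In the actor that judgment is produced by the LLM, but a comparable signal can be approximated locally from the standard items without enabling AI analysis. The sketch below is such an approximation; the function name, the min_points floor, and the top_n cutoff are hypothetical choices, not the actor's logic.

```python
def hot_topics(items, min_points=10, top_n=5):
    """Rank stories by comments per point, highest first (local approximation)."""
    # Keep only stories; this also skips user items and the AI digest item.
    stories = [
        item for item in items
        if item.get("type") == "story" and (item.get("points") or 0) >= min_points
    ]

    def ratio(item):
        # max(..., 1) guards against division by zero if min_points is set to 0.
        return (item.get("commentCount") or 0) / max(item.get("points") or 0, 1)

    return sorted(stories, key=ratio, reverse=True)[:top_n]
```

The min_points floor keeps very low-score posts from dominating the ranking, since a story with 2 points and 50 comments would otherwise outrank everything else.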

Pricing

| Event | Price | When |
| --- | --- | --- |
| item-scraped | $0.001 | Per item successfully scraped (story, user, comment, etc.) |
| connector-alert-dispatched | $0.002 | Per successful feed or search summary delivered through an MCP output connector |
| ai-analysis-completed | $0.05 | Per AI analysis digest (once per run) |

Typical run costs:

  • 30 top stories, no AI: $0.03
  • 30 top stories + AI digest: $0.08
  • 100 search results: $0.10
  • 500 top stories + AI: $0.55
  • Daily watch (top 30 + AI digest): ~$2.40/month
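The figures above follow directly from the event prices in the pricing table. A minimal calculator, assuming those three prices and a 30-day month (the function names are illustrative), reproduces them:

```python
# Event prices from the pricing table, in thousandths of a dollar
# so the arithmetic stays in exact integers.
ITEM_SCRAPED = 1          # $0.001 per item
CONNECTOR_ALERT = 2       # $0.002 per delivered connector summary
AI_ANALYSIS = 50          # $0.05 per run with the AI digest


def run_cost_usd(items, ai_digest=False, connector_alerts=0):
    """Cost of a single run in US dollars."""
    total = items * ITEM_SCRAPED + connector_alerts * CONNECTOR_ALERT
    if ai_digest:
        total += AI_ANALYSIS
    return total / 1000


def monthly_cost_usd(items, runs_per_day, ai_digest=False, days=30):
    """Cost of a recurring schedule, assuming every run scrapes `items` items."""
    per_run = items * ITEM_SCRAPED + (AI_ANALYSIS if ai_digest else 0)
    return per_run * runs_per_day * days / 1000


print(run_cost_usd(30))                            # 0.03
print(run_cost_usd(30, ai_digest=True))            # 0.08
print(run_cost_usd(500, ai_digest=True))           # 0.55
print(monthly_cost_usd(30, 1, ai_digest=True))     # 2.4
```

These totals cover the actor's own events only. Charges from your chosen LLM provider for the digest call, if any, are billed separately by that provider.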

Compared with commercial alternatives: NewsAPI charges $449+/month for commercial use, whereas this actor is pay-per-event with no subscription fee.

AI Analysis (5 providers)

When enableAiAnalysis=true, the actor sends the scraped items to your chosen LLM and returns a structured digest with:

  • top_themes - 3-6 dominant technology or industry themes
  • notable_stories - top 5 by combined points + comments signal
  • tech_trends - specific technologies, frameworks, or companies appearing prominently
  • community_mood - overall tone (excited/skeptical/mixed/critical/optimistic)
  • hot_topics - stories with high comment-to-points ratio (controversy signals)
  • tldr - 2-3 sentence morning-briefing summary

| Provider | Default model | API key env var | Input field |
| --- | --- | --- | --- |
| OpenRouter (recommended) | google/gemini-2.0-flash-001 | OPENROUTER_API_KEY | openrouterApiKey |
| Anthropic | claude-sonnet-4-20250514 | ANTHROPIC_API_KEY | anthropicApiKey |
| Google AI | gemini-2.0-flash | GOOGLE_API_KEY | googleApiKey |
| OpenAI | gpt-4o-mini | OPENAI_API_KEY | openaiApiKey |
| Ollama (self-hosted) | llama3.1 | - | ollamaBaseUrl |
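The provider table can be read as a lookup: pick the provider, use llmModel if set or the provider default otherwise, and take the key from the input field or the environment variable. The sketch below encodes that reading. The defaults and names come from the table; the function name and the precedence of the input field over the environment variable are assumptions, since this page does not say which wins when both are set.

```python
import os

# One row per provider, copied from the table above.
PROVIDERS = {
    "openrouter": ("google/gemini-2.0-flash-001", "OPENROUTER_API_KEY", "openrouterApiKey"),
    "anthropic": ("claude-sonnet-4-20250514", "ANTHROPIC_API_KEY", "anthropicApiKey"),
    "google": ("gemini-2.0-flash", "GOOGLE_API_KEY", "googleApiKey"),
    "openai": ("gpt-4o-mini", "OPENAI_API_KEY", "openaiApiKey"),
    "ollama": ("llama3.1", None, None),  # self-hosted: base URL instead of a key
}


def resolve_llm_config(run_input, env=None):
    """Work out provider, model, and credentials for the AI digest (sketch)."""
    env = os.environ if env is None else env
    provider = run_input.get("llmProvider") or "openrouter"
    if provider not in PROVIDERS:
        raise ValueError(f"unknown llmProvider: {provider!r}")

    default_model, env_var, field = PROVIDERS[provider]
    config = {
        "provider": provider,
        "model": run_input.get("llmModel") or default_model,
    }

    if provider == "ollama":
        config["baseUrl"] = run_input.get("ollamaBaseUrl") or "http://localhost:11434"
        return config

    # Assumption: an explicit input field takes precedence over the env var.
    api_key = run_input.get(field) or env.get(env_var)
    if not api_key:
        raise ValueError(f"set {field} in the input or {env_var} in the environment")
    config["apiKey"] = api_key
    return config
```

Passing `env` explicitly keeps the function easy to test; in normal use it falls back to `os.environ`.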

Code examples

Python - search for AI stories

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Run the actor in search mode and wait for the run to finish.
run = client.actor("harvestlab/hacker-news-scraper").call(run_input={
    "mode": "search",
    "searchQuery": "large language models",
    "searchTags": "story",
    "maxItems": 50,
})

# Read the results from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["points"], item["url"])

Python - top stories with AI digest

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("harvestlab/hacker-news-scraper").call(run_input={
    "mode": "top",
    "maxItems": 30,
    "enableAiAnalysis": True,
    "llmProvider": "openrouter",
    "openrouterApiKey": "sk-or-...",
})
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())

# The digest item is marked by "report_type", not "type", so filter on that key.
stories = [i for i in items if i.get("report_type") != "ai_analysis"]
digest = next((i for i in items if i.get("report_type") == "ai_analysis"), None)
print(digest["analysis"]["tldr"] if digest else "No digest")

LangChain tool integration

from apify_client import ApifyClient
from langchain.tools import Tool

apify = ApifyClient("YOUR_APIFY_TOKEN")


def search_hacker_news(query: str) -> list:
    """Run the actor in search mode and return the scraped items."""
    run = apify.actor("harvestlab/hacker-news-scraper").call(run_input={
        "mode": "search",
        "q": query,  # alias for searchQuery
        "maxItems": 20,
    })
    return list(apify.dataset(run["defaultDatasetId"]).iterate_items())


hn_search_tool = Tool(
    name="search_hacker_news",
    description="Search Hacker News for stories, comments, and discussions about any topic.",
    func=search_hacker_news,
)

# agent.invoke({"input": "What are people saying about Rust on HN?"})

n8n / Zapier workflow

Configure a scheduled trigger calling this actor with mode=top, maxItems=30, enableAiAnalysis=true. The tldr field from the AI analysis item is ready to paste directly into a Slack message or email digest.

To deliver the daily watch summary without a separate webhook workflow, select a Slack, Notion, GitHub, Sheets, or CRM MCP connector in outputConnectors and set connectorAlertTarget to the destination channel, page, table, or list. The actor will push a compact top-items summary plus structured payload after each successful scrape.

Scheduling for daily watch

  1. Set mode: "top" + enableAiAnalysis: true + preferred LLM provider.
  2. Schedule via Apify Scheduler (hourly or daily).
  3. Estimated monthly cost at daily cadence: ~$2.40/month for top 30 + AI digest (30 runs x $0.08).
  4. Estimated monthly cost at hourly cadence: ~$57.60/month for top 30 + AI digest every hour (720 runs x $0.08).

Pair with other harvestlab actors

  • harvestlab/github-trending-scraper - pair HN's "what's launched today" with GitHub's "what's accelerating in stars" for complete developer intelligence.
  • harvestlab/news-monitor - combine HN front page with Google News for cross-source coverage of AI launches.
  • harvestlab/google-search-scraper - when an HN story hits #1, scrape the SERP for backlinks and downstream coverage.
  • harvestlab/contact-extractor - feed each new Show HN URL into the contact extractor to build a founder lead list.

The Hacker News Firebase API is public, documented, and maintained by Y Combinator (https://github.com/HackerNews/API). The Algolia HN Search API is officially provided by Algolia under agreement with YC (https://hn.algolia.com/api). Both are intended for third-party use with no terms of service restrictions on read access.

Users are responsible for:

  • Respecting rate norms: avoid aggressive concurrent fetching beyond maxItems=500 per run.
  • GDPR / CCPA compliance when storing usernames, karma, or comment text (HN user data is public but may constitute PII under some jurisdictions).
  • Attribution when redistributing data: link back to https://news.ycombinator.com/item?id=<id> per fair-use norms.
  • Not using the data for spam, harassment, or other violations of YC/HN community guidelines.

This actor does not scrape authenticated routes, bypass any security measures, or violate any robots.txt directives.

Contact

Built by harvestlab - 25+ monetized Apify Actors covering EU/US e-commerce, B2B intelligence, jobs, government procurement, social, and developer tools. Bug reports and feature requests via the Apify Store issue tracker on this actor's listing page.