Hacker News Scraper - Stories, Comments & AI Digest
Pricing
from $1.00 / 1,000 items scraped
Scrape Hacker News top stories, search results, Ask HN, Show HN, jobs, comments, and user profiles for developer relations, market research, tech scouting, AI digests, and MCP connector summaries.
Developer
Nick
Maintained by Community
Scrape Hacker News at $0.001/item. Top stories, full-text search, Ask HN, Show HN, job posts, and user profiles via the official Algolia HN Search API and Firebase HN API. No anti-bot measures to work around, and no account or API key required. AI topic digest (themes, trends, hot topics, TLDR) via a 5-provider LLM router. Pay-per-result, no cookies, no monthly rental fee.
Built for: developer-relations teams, VC scouts, technical founders, AI agent builders, content writers, and researchers. Pairs with harvestlab/github-trending-scraper for a complete "developer intelligence" pipeline.
What this does
Hacker News is the highest-signal feed for early-stage tech, AI launches, and indie-dev culture. This actor exposes six scraping modes through two official, freely available APIs:
- Algolia HN Search API (`hn.algolia.com/api/v1`) for full-text search and user profiles - no API key, paginated to any depth.
- Firebase HN API (`hacker-news.firebaseio.com/v0`) for live feeds (top, ask, show, jobs) - the same backend the official HN site uses.
Every item is normalized to a portfolio-standard shape: id, title, url, text, author, points, commentCount, type, createdAt, hackerNewsUrl. The optional AI analysis step summarizes the batch using your choice of 5 LLM providers.
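As a rough illustration of that shape, the sketch below maps one raw Algolia search hit onto the ten fields listed above. The Algolia-side keys (`objectID`, `story_text`, `num_comments`, `created_at`, `_tags`) are the ones the public HN Search API returns; the names `HNItem` and `normalize_algolia_hit` are hypothetical, and the actor's own mapping may differ.

```python
from datetime import datetime
from typing import Any, Optional, TypedDict


class HNItem(TypedDict):
    """The normalized item shape described above."""
    id: str
    title: Optional[str]
    url: Optional[str]
    text: Optional[str]
    author: Optional[str]
    points: int
    commentCount: int
    type: str
    createdAt: Optional[str]
    hackerNewsUrl: str


def normalize_algolia_hit(hit: dict[str, Any]) -> HNItem:
    """Map a raw Algolia HN Search hit to the normalized shape (illustrative only)."""
    item_id = str(hit["objectID"])
    tags = hit.get("_tags") or []
    # Assumption: the first matching tag decides the type, defaulting to "story".
    item_type = next((t for t in ("comment", "job", "poll") if t in tags), "story")
    created = hit.get("created_at")
    if created:
        # Algolia timestamps end in "Z"; convert to an explicit +00:00 offset.
        created = datetime.fromisoformat(created.replace("Z", "+00:00")).isoformat()
    return {
        "id": item_id,
        "title": hit.get("title"),
        "url": hit.get("url"),
        "text": hit.get("story_text") or hit.get("comment_text"),
        "author": hit.get("author"),
        "points": hit.get("points") or 0,
        "commentCount": hit.get("num_comments") or 0,
        "type": item_type,
        "createdAt": created,
        "hackerNewsUrl": f"https://news.ycombinator.com/item?id={item_id}",
    }
```

Missing numeric fields fall back to 0 here so downstream sorting and filtering never have to handle `None`; that default is a design choice of this sketch, not documented actor behavior.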
Modes
| Mode | API used | What you get |
|---|---|---|
| `top` | Firebase | Today's front-page top stories |
| `search` | Algolia | Full-text search results (stories, comments, Ask HN, etc.) |
| `ask` | Firebase | Ask HN posts |
| `show` | Firebase | Show HN posts |
| `jobs` | Firebase | Job posts (Who's Hiring threads) |
| `user` | Algolia | User profile + recent submissions |
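For readers who want to see what each mode corresponds to upstream, here is a minimal sketch that maps a mode to a public endpoint URL. The endpoint paths (`topstories.json`, `askstories.json`, `showstories.json`, `jobstories.json`, `/search`, `/users/<name>`) are the documented public ones; the helper `endpoint_for` is hypothetical and is not taken from the actor's source.

```python
from typing import Optional
from urllib.parse import quote, urlencode

FIREBASE_BASE = "https://hacker-news.firebaseio.com/v0"
ALGOLIA_BASE = "https://hn.algolia.com/api/v1"

# Firebase feed endpoints return a list of item IDs; each item is then
# fetched separately from f"{FIREBASE_BASE}/item/<id>.json".
FIREBASE_FEEDS = {
    "top": "topstories",
    "ask": "askstories",
    "show": "showstories",
    "jobs": "jobstories",
}


def endpoint_for(mode: str, query: Optional[str] = None,
                 username: Optional[str] = None, tags: str = "story") -> str:
    """Return the upstream URL a given mode would start from (illustrative)."""
    if mode in FIREBASE_FEEDS:
        return f"{FIREBASE_BASE}/{FIREBASE_FEEDS[mode]}.json"
    if mode == "search":
        if not query:
            raise ValueError("search mode needs a query")
        return f"{ALGOLIA_BASE}/search?" + urlencode({"query": query, "tags": tags})
    if mode == "user":
        if not username:
            raise ValueError("user mode needs a username")
        return f"{ALGOLIA_BASE}/users/{quote(username, safe='')}"
    raise ValueError(f"unknown mode: {mode!r}")
```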
Why this beats the alternatives
| Approach | Cost | Search | AI digest | Live feeds |
|---|---|---|---|---|
| harvestlab/hacker-news-scraper | $0.001/item | Yes (Algolia) | Yes (5 LLMs) | Yes (Firebase) |
| HN Algolia API direct | $0 | Yes | No | No |
| HN RSS feeds | $0 | No | No | Limited |
| Third-party HN services | $5-29/mo | Limited | No | Limited |
| Custom Firebase + Algolia DIY | Engineering cost | DIY | DIY | DIY |
The wedge: search + live feeds + AI analysis in one pay-per-event actor. The Algolia API gives you full-text search and user profiles; Firebase gives you real-time feed data with full score/comment counts. Add enableAiAnalysis=true for a structured AI digest covering themes, trends, hot topics, and a newsletter-ready TLDR.
Use cases
- Developer-relations teams - monitor the `show` feed for competitor launches and early adopters.
- VC scouts - search for specific technology keywords + AI digest = pre-seed deal flow before the post hits 100 points.
- Indie founders - track `top` + `ask` for validation signals and competitive intel.
- Content writers / Substackers - auto-curated story shortlist for "this week in AI/dev" newsletters.
- AI agent builders - feed top-10 items + AI analysis into a research agent for trend detection.
- Recruiters - `jobs` mode to find startups actively hiring in your tech stack.
- Product managers - competitive intelligence on AI tools shipping in your category.
- Researchers / academics - longitudinal study of HN content trends with reproducible queries.
Inputs
| Field | Type | Default | Notes |
|---|---|---|---|
| `mode` | enum | `top` | One of: `top`, `search`, `ask`, `show`, `jobs`, `user` |
| `searchQuery` | string | - | Required when `mode=search`. Aliases: `q`, `query` |
| `username` | string | - | Required when `mode=user`. Alias: `user` |
| `searchTags` | string | `story` | Algolia tags filter for search mode (e.g., `comment`, `ask_hn`) |
| `maxItems` | integer | 30 | Items to fetch (1-500) |
| `minPoints` | integer | 0 | Skip stories below this score (feed modes only) |
| `outputConnectors` | array | - | Optional MCP connectors for feed or search summaries. Sends a compact message plus structured payload to authorized Slack, Notion, GitHub, Sheets, CRM, or other connector tools. |
| `connectorAlertTarget` | string | - | Optional connector destination, such as a Slack channel, Notion database ID, sheet name, table, or CRM list. |
| `enableAiAnalysis` | boolean | false | AI digest (themes, trends, TLDR). Adds ~$0.05/run |
| `llmProvider` | enum | `openrouter` | AI provider: `openrouter` / `anthropic` / `google` / `openai` / `ollama` |
| `llmModel` | string | - | Model override (uses provider default if blank) |
| `openrouterApiKey` | string | - | OpenRouter key (or `OPENROUTER_API_KEY` env var) |
| `anthropicApiKey` | string | - | Anthropic key (or `ANTHROPIC_API_KEY` env var) |
| `googleApiKey` | string | - | Google AI key (or `GOOGLE_API_KEY` env var) |
| `openaiApiKey` | string | - | OpenAI key (or `OPENAI_API_KEY` env var) |
| `ollamaBaseUrl` | string | `http://localhost:11434` | Self-hosted Ollama URL |
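If you build inputs programmatically, it can help to check them before starting a run. The sketch below applies the defaults and constraints from the table (mode enum, `maxItems` between 1 and 500, required fields per mode). The function `validate_input` is hypothetical client-side code; the actor's own validation may be stricter or looser.

```python
from typing import Any

VALID_MODES = {"top", "search", "ask", "show", "jobs", "user"}


def validate_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Apply table defaults and raise ValueError on inputs the table rules out."""
    mode = raw.get("mode", "top")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")

    max_items = int(raw.get("maxItems", 30))
    if not 1 <= max_items <= 500:
        raise ValueError("maxItems must be between 1 and 500")

    if mode == "search" and not raw.get("searchQuery"):
        raise ValueError("searchQuery is required when mode=search")
    if mode == "user" and not raw.get("username"):
        raise ValueError("username is required when mode=user")

    # Return a copy with defaults filled in; unknown keys pass through untouched.
    return {
        **raw,
        "mode": mode,
        "maxItems": max_items,
        "minPoints": int(raw.get("minPoints", 0)),
        "searchTags": raw.get("searchTags", "story"),
    }
```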
Input aliases
To reduce friction for programmatic use, several fields accept shorter aliases:
- `q` or `query` -> `searchQuery`
- `user` -> `username`
- `feed` -> `mode`
Alias resolution order: an alias is checked before its canonical field, so `q=rust` overrides a schema default on `searchQuery`.
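The alias-before-canonical rule can be sketched as follows. This is an assumed reconstruction of the behavior described above, not the actor's code; `ALIASES` and `resolve_aliases` are hypothetical names.

```python
from typing import Any

# Alias -> canonical field, as listed above.
ALIASES = {
    "q": "searchQuery",
    "query": "searchQuery",
    "user": "username",
    "feed": "mode",
}


def resolve_aliases(raw: dict[str, Any]) -> dict[str, Any]:
    """Fold alias keys into canonical keys, letting a non-empty alias win."""
    # Start from every non-alias key, including canonical values and schema defaults.
    resolved = {k: v for k, v in raw.items() if k not in ALIASES}
    for alias, canonical in ALIASES.items():
        value = raw.get(alias)
        if value not in (None, ""):
            # The alias overrides whatever the canonical field held.
            # If both "q" and "query" are set, the later table entry wins
            # (an arbitrary choice made for this sketch).
            resolved[canonical] = value
    return resolved
```

With this ordering, `resolve_aliases({"q": "rust", "searchQuery": "default"})` yields a `searchQuery` of `"rust"`, matching the rule stated above.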
Outputs
Standard item (all modes except user)
```json
{
  "id": "39481275",
  "title": "Show HN: My open-source HN scraper",
  "url": "https://github.com/example/hn-scraper",
  "text": null,
  "author": "alice",
  "points": 142,
  "commentCount": 38,
  "type": "story",
  "createdAt": "2024-04-27T09:00:00+00:00",
  "hackerNewsUrl": "https://news.ycombinator.com/item?id=39481275"
}
```
User item (mode=user)
```json
{
  "id": "pg",
  "title": "User: pg",
  "url": "https://news.ycombinator.com/user?id=pg",
  "text": "Lisp hacker. Co-founder of YC.",
  "author": "pg",
  "points": 155000,
  "commentCount": 0,
  "type": "user",
  "createdAt": "2006-10-09T00:00:00+00:00",
  "hackerNewsUrl": "https://news.ycombinator.com/user?id=pg",
  "submissions": [
    {
      "id": "39481275",
      "title": "...",
      "url": "...",
      "points": 142,
      "commentCount": 38,
      "createdAt": "2024-04-27T09:00:00+00:00"
    }
  ]
}
```
AI analysis item (appended when enableAiAnalysis=true)
```json
{
  "report_type": "ai_analysis",
  "analysis": {
    "top_themes": [
      {"label": "AI/LLM tooling", "description": "Multiple new open-source LLM inference projects launched."}
    ],
    "notable_stories": [
      {"title": "...", "points": 420, "comments": 120, "domain": "github.com", "why_it_matters": "..."}
    ],
    "tech_trends": [
      {"name": "Rust", "context": "Three high-upvote stories about Rust systems projects."}
    ],
    "community_mood": {"tone": "excited", "explanation": "Several major AI launches drove high engagement."},
    "hot_topics": [
      {"title": "...", "note": "High comment-to-points ratio indicates controversy."}
    ],
    "tldr": "Today's HN front page was dominated by AI infrastructure launches and Rust systems projects, with an undercurrent of privacy skepticism in the comments."
  }
}
```
Pricing
| Event | Price | When |
|---|---|---|
| `item-scraped` | $0.001 | Per item successfully scraped (story, user, comment, etc.) |
| `connector-alert-dispatched` | $0.002 | Per successful feed or search summary delivered through an MCP output connector |
| `ai-analysis-completed` | $0.05 | Per AI analysis digest (once per run) |
Typical run costs:
- 30 top stories, no AI: $0.03
- 30 top stories + AI digest: $0.08
- 100 search results: $0.10
- 500 top stories + AI: $0.55
- Daily watch (top 30 + AI digest): ~$2.40/month
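The figures above follow directly from the event prices in the table. A small estimator, using only those three prices, makes the arithmetic reproducible; `estimate_run_cost` is a hypothetical helper, and it assumes every requested item is actually scraped and billed.

```python
from decimal import Decimal

# Event prices from the pricing table above.
PRICE_ITEM = Decimal("0.001")
PRICE_CONNECTOR_ALERT = Decimal("0.002")
PRICE_AI_DIGEST = Decimal("0.05")


def estimate_run_cost(items: int, ai_digest: bool = False,
                      connector_alerts: int = 0) -> Decimal:
    """Estimate the cost of one run in USD from the per-event prices."""
    cost = PRICE_ITEM * items + PRICE_CONNECTOR_ALERT * connector_alerts
    if ai_digest:
        cost += PRICE_AI_DIGEST  # charged once per run
    return cost


def estimate_monthly_cost(items: int, runs_per_day: int,
                          ai_digest: bool = False, days: int = 30) -> Decimal:
    """Scale a single-run estimate to a month of scheduled runs."""
    return estimate_run_cost(items, ai_digest) * runs_per_day * days
```

For example, `estimate_monthly_cost(30, runs_per_day=1, ai_digest=True)` reproduces the $2.40 daily-watch figure (30 runs at $0.08 each). `Decimal` is used instead of `float` so that sums of small prices stay exact.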
vs. commercial alternatives: NewsAPI charges $449+/month for commercial use. This actor is pay-per-event with zero subscription fees.
AI Analysis (5 providers)
When enableAiAnalysis=true, the actor sends the scraped items to your chosen LLM and returns a structured digest with:
- top_themes - 3-6 dominant technology or industry themes
- notable_stories - top 5 by combined points + comments signal
- tech_trends - specific technologies, frameworks, or companies appearing prominently
- community_mood - overall tone (excited/skeptical/mixed/critical/optimistic)
- hot_topics - stories with high comment-to-points ratio (controversy signals)
- tldr - 2-3 sentence morning-briefing summary
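The digest itself is produced by the LLM, but two of the signals above (combined points + comments, and comment-to-points ratio) are simple enough to approximate locally, for instance to sanity-check a digest or to rank items without enabling AI analysis. The sketch below is a local heuristic under that assumption; the function names and the `min_points` floor are hypothetical and not part of the actor.

```python
from typing import Any


def _num(item: dict[str, Any], key: str) -> int:
    """Read a numeric field, treating missing or null values as 0."""
    return item.get(key) or 0


def notable_stories(items: list[dict[str, Any]], limit: int = 5) -> list[dict[str, Any]]:
    """Rank by combined points + comment count, highest first."""
    return sorted(
        items,
        key=lambda i: _num(i, "points") + _num(i, "commentCount"),
        reverse=True,
    )[:limit]


def hot_topics(items: list[dict[str, Any]], min_points: int = 10,
               limit: int = 5) -> list[dict[str, Any]]:
    """Rank by comment-to-points ratio, a rough controversy signal."""
    floor = max(min_points, 1)  # also guards against division by zero
    eligible = [i for i in items if _num(i, "points") >= floor]
    return sorted(
        eligible,
        key=lambda i: _num(i, "commentCount") / _num(i, "points"),
        reverse=True,
    )[:limit]
```

The points floor keeps very low-scoring posts from dominating the ratio ranking; tune or remove it to taste.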
| Provider | Default model | API key env var | Input field |
|---|---|---|---|
| OpenRouter (recommended) | google/gemini-2.0-flash-001 | OPENROUTER_API_KEY | openrouterApiKey |
| Anthropic | claude-sonnet-4-20250514 | ANTHROPIC_API_KEY | anthropicApiKey |
| Google AI | gemini-2.0-flash | GOOGLE_API_KEY | googleApiKey |
| OpenAI | gpt-4o-mini | OPENAI_API_KEY | openaiApiKey |
| Ollama (self-hosted) | llama3.1 | - | ollamaBaseUrl |
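The table pairs each provider with an input field and an environment variable. One plausible way to resolve the key is sketched below, assuming the input field takes precedence over the environment variable; the listing only says "or", so that precedence, along with the name `resolve_api_key`, is an assumption of this sketch.

```python
import os
from typing import Any, Optional

# Provider -> (input field, environment variable), from the table above.
PROVIDER_KEYS = {
    "openrouter": ("openrouterApiKey", "OPENROUTER_API_KEY"),
    "anthropic": ("anthropicApiKey", "ANTHROPIC_API_KEY"),
    "google": ("googleApiKey", "GOOGLE_API_KEY"),
    "openai": ("openaiApiKey", "OPENAI_API_KEY"),
}


def resolve_api_key(actor_input: dict[str, Any], provider: str) -> Optional[str]:
    """Return the API key for a provider, or None for keyless Ollama."""
    if provider == "ollama":
        return None  # self-hosted; configured via ollamaBaseUrl instead of a key
    if provider not in PROVIDER_KEYS:
        raise ValueError(f"unknown provider: {provider!r}")
    field, env_var = PROVIDER_KEYS[provider]
    # Assumed precedence: explicit input field first, then environment variable.
    key = actor_input.get(field) or os.environ.get(env_var)
    if not key:
        raise ValueError(f"set {field} in the input or {env_var} in the environment")
    return key
```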
Code examples
Python - search for AI stories
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("harvestlab/hacker-news-scraper").call(run_input={
    "mode": "search",
    "searchQuery": "large language models",
    "searchTags": "story",
    "maxItems": 50,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["points"], item["url"])
```
Python - top stories with AI digest
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("harvestlab/hacker-news-scraper").call(run_input={
    "mode": "top",
    "maxItems": 30,
    "enableAiAnalysis": True,
    "llmProvider": "openrouter",
    "openrouterApiKey": "sk-or-...",
})

items = list(client.dataset(run["defaultDatasetId"]).iterate_items())

# The digest item is identified by "report_type", not "type", so filter on that key.
stories = [i for i in items if i.get("report_type") != "ai_analysis"]
digest = next((i for i in items if i.get("report_type") == "ai_analysis"), None)

print(digest["analysis"]["tldr"] if digest else "No digest")
```
LangChain tool integration
```python
from langchain.tools import Tool
from apify_client import ApifyClient

apify = ApifyClient("YOUR_APIFY_TOKEN")


def search_hacker_news(query: str) -> list:
    run = apify.actor("harvestlab/hacker-news-scraper").call(run_input={
        "mode": "search",
        "q": query,
        "maxItems": 20,
    })
    return list(apify.dataset(run["defaultDatasetId"]).iterate_items())


hn_search_tool = Tool(
    name="search_hacker_news",
    description="Search Hacker News for stories, comments, and discussions about any topic.",
    func=search_hacker_news,
)

# agent.invoke({"input": "What are people saying about Rust on HN?"})
```
n8n / Zapier workflow
Configure a scheduled trigger calling this actor with mode=top, maxItems=30, enableAiAnalysis=true. The tldr field from the AI analysis item is ready to paste directly into a Slack message or email digest.
To deliver the daily watch summary without a separate webhook workflow, select a Slack, Notion, GitHub, Sheets, or CRM MCP connector in outputConnectors and set connectorAlertTarget to the destination channel, page, table, or list. The actor will push a compact top-items summary plus structured payload after each successful scrape.
Scheduling for daily watch
- Set `mode: "top"` + `enableAiAnalysis: true` + preferred LLM provider.
- Schedule via Apify Scheduler (hourly or daily).
- Estimated monthly cost at daily cadence: ~$2.40/month for top 30 + AI digest.
- Estimated monthly cost at hourly cadence: ~$57.60/month for top 30 + AI digest every hour (24 runs/day x 30 days x $0.08).
Pair with other harvestlab actors
- `harvestlab/github-trending-scraper` - pair HN's "what's launched today" with GitHub's "what's accelerating in stars" for complete developer intelligence.
- `harvestlab/news-monitor` - combine HN front page with Google News for cross-source coverage of AI launches.
- `harvestlab/google-search-scraper` - when an HN story hits #1, scrape the SERP for backlinks and downstream coverage.
- `harvestlab/contact-extractor` - feed each new Show HN URL into the contact extractor to build a founder lead list.
Legal and compliance
The Hacker News Firebase API is public, documented, and maintained by Y Combinator (https://github.com/HackerNews/API). The Algolia HN Search API is officially provided by Algolia under agreement with YC (https://hn.algolia.com/api). Both are intended for third-party use with no terms of service restrictions on read access.
Users are responsible for:
- Respecting rate norms: avoid aggressive concurrent fetching beyond `maxItems=500` per run.
- GDPR / CCPA compliance when storing usernames, karma, or comment text (HN user data is public but may constitute PII under some jurisdictions).
- Attribution when redistributing data: link back to `https://news.ycombinator.com/item?id=<id>` per fair-use norms.
- Not using the data for spam, harassment, or other violations of YC/HN community guidelines.
This actor does not scrape authenticated routes, bypass any security measures, or violate any robots.txt directives.
Contact
Built by harvestlab - 25+ monetized Apify Actors covering EU/US e-commerce, B2B intelligence, jobs, government procurement, social, and developer tools. Bug reports and feature requests via the Apify Store issue tracker on this actor's listing page.