Hacker News Scraper - Stories, Comments & AI Digest
Pricing: from $1.00 / 1,000 items scraped
Scrape HN top stories, search, Ask HN, Show HN, job posts and user profiles via Algolia & Firebase APIs. Zero anti-bot, no API key. AI topic digest (themes, trends, TLDR) via 5 LLM providers. x402-ready. $0.001/item.
Developer: Nick (Maintained by Community)
Actor stats: 0 bookmarked, 4 total users, 2 monthly active users. Last modified 2 days ago.
Scrape Hacker News at $0.001/item. Top stories, full-text search, Ask HN, Show HN, job posts, and user profiles via the official Algolia HN Search API and Firebase HN API. No anti-bot protection to work around, and no account or API key required. AI topic digest (themes, trends, hot topics, TLDR) via a 5-provider LLM router. Pay-per-result, no cookies, no monthly rental fee.
Built for: developer-relations teams, VC scouts, technical founders, AI agent builders, content writers, and researchers. Pairs with harvestlab/github-trending-scraper for a complete "developer intelligence" pipeline.
What this does
Hacker News is the highest-signal feed for early-stage tech, AI launches, and indie-dev culture. This actor exposes six scraping modes through two official, freely available APIs:
- Algolia HN Search API (`hn.algolia.com/api/v1`) for full-text search and user profiles - no API key, paginated to any depth.
- Firebase HN API (`hacker-news.firebaseio.com/v0`) for live feeds (top, ask, show, jobs) - the same backend the official HN site uses.
Every item is normalized to a portfolio-standard shape: id, title, url, text, author, points, commentCount, type, createdAt, hackerNewsUrl. The optional AI analysis step summarizes the batch using your choice of 5 LLM providers.
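To make the normalization concrete, here is a minimal sketch of how a raw Firebase item and a raw Algolia search hit could be mapped to that shape. The source field names (`by`, `score`, `descendants`, `time` for Firebase; `objectID`, `points`, `num_comments`, `created_at_i`, `story_text`, `comment_text`, `_tags` for Algolia) come from the public APIs, but the helper functions themselves are hypothetical and are not the actor's actual code.

```python
from datetime import datetime, timezone
from typing import Any

HN_ITEM_URL = "https://news.ycombinator.com/item?id={}"


def _iso(ts: int) -> str:
    # Unix seconds -> ISO 8601 in UTC, e.g. "2024-04-27T09:00:00+00:00"
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


def normalize_firebase_item(raw: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical mapping of a Firebase /v0/item/<id>.json payload."""
    item_id = str(raw["id"])
    return {
        "id": item_id,
        "title": raw.get("title"),
        "url": raw.get("url"),          # absent for Ask HN / text posts
        "text": raw.get("text"),
        "author": raw.get("by"),
        "points": raw.get("score", 0),
        "commentCount": raw.get("descendants", 0),
        "type": raw.get("type"),
        "createdAt": _iso(raw["time"]),
        "hackerNewsUrl": HN_ITEM_URL.format(item_id),
    }


def normalize_algolia_hit(hit: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical mapping of one hit from Algolia /api/v1/search."""
    item_id = str(hit["objectID"])
    tags = hit.get("_tags", [])
    return {
        "id": item_id,
        "title": hit.get("title"),
        "url": hit.get("url"),
        "text": hit.get("story_text") or hit.get("comment_text"),
        "author": hit.get("author"),
        "points": hit.get("points") or 0,          # null on comments
        "commentCount": hit.get("num_comments") or 0,
        # Simplification: anything not tagged "comment" is treated as a story
        "type": "comment" if "comment" in tags else "story",
        "createdAt": _iso(hit["created_at_i"]),
        "hackerNewsUrl": HN_ITEM_URL.format(item_id),
    }
```

Mapping both sources onto one dictionary shape is what lets downstream consumers (and the AI step) ignore which API a given item came from.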
Modes
| Mode | API used | What you get |
|---|---|---|
| top | Firebase | Current front-page top stories |
| search | Algolia | Full-text search results (stories, comments, Ask HN, etc.) |
| ask | Firebase | Ask HN posts |
| show | Firebase | Show HN posts |
| jobs | Firebase | Job posts from the HN jobs feed (mostly YC startup listings) |
| user | Algolia | User profile + recent submissions |
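For orientation, the sketch below shows which public endpoint each mode most plausibly corresponds to. The URLs are the documented Firebase and Algolia HN endpoints; the helper names and the `FEEDS` mapping are hypothetical and only illustrate the idea. Firebase feeds return a JSON array of item IDs, so each story still needs a follow-up `/item/<id>.json` request, whereas Algolia returns full hits in one response.

```python
import json
import urllib.request
from urllib.parse import urlencode

FIREBASE = "https://hacker-news.firebaseio.com/v0"
ALGOLIA = "https://hn.algolia.com/api/v1"

# Hypothetical mode -> Firebase feed name mapping
FEEDS = {"top": "topstories", "ask": "askstories", "show": "showstories", "jobs": "jobstories"}


def feed_url(mode: str) -> str:
    # Returns a JSON array of item IDs, not full items
    return f"{FIREBASE}/{FEEDS[mode]}.json"


def item_url(item_id: int) -> str:
    return f"{FIREBASE}/item/{item_id}.json"


def search_url(query: str, tags: str = "story", hits_per_page: int = 30, page: int = 0) -> str:
    params = {"query": query, "tags": tags, "hitsPerPage": hits_per_page, "page": page}
    return f"{ALGOLIA}/search?{urlencode(params)}"


def user_url(username: str) -> str:
    return f"{ALGOLIA}/users/{username}"


def fetch_json(url: str, timeout: float = 10.0):
    # Plain stdlib GET + JSON decode; no API key needed for either API
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Usage, with network access: `ids = fetch_json(feed_url("top"))[:30]`, then `fetch_json(item_url(i))` for each ID.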
Why this beats the alternatives
| Approach | Cost | Search | AI digest | Live feeds |
|---|---|---|---|---|
| harvestlab/hacker-news-scraper | $0.001/item | Yes (Algolia) | Yes (5 LLMs) | Yes (Firebase) |
| HN Algolia API direct | $0 | Yes | No | No |
| HN RSS feeds | $0 | No | No | Limited |
| Third-party HN services | $5-29/mo | Limited | No | Limited |
| Custom Firebase + Algolia DIY | Engineering cost | DIY | DIY | DIY |
The wedge: search + live feeds + AI analysis in one pay-per-event actor. The Algolia API gives you full-text search and user profiles; Firebase gives you real-time feed data with full score/comment counts. Add enableAiAnalysis=true for a structured AI digest covering themes, trends, hot topics, and a newsletter-ready TLDR.
Use cases
- Developer-relations teams - monitor the `show` feed for competitor launches and early adopters.
- VC scouts - combine keyword search with the AI digest to surface pre-seed deal flow before a post hits 100 points.
- Indie founders - track `top` + `ask` for validation signals and competitive intel.
- Content writers / Substackers - auto-curated story shortlist for "this week in AI/dev" newsletters.
- AI agent builders - feed top-10 items + AI analysis into a research agent for trend detection.
- Recruiters - use `jobs` mode to find startups actively hiring in your tech stack.
- Product managers - competitive intelligence on AI tools shipping in your category.
- Researchers / academics - longitudinal study of HN content trends with reproducible queries.
Inputs
| Field | Type | Default | Notes |
|---|---|---|---|
| mode | enum | top | One of: top, search, ask, show, jobs, user |
| searchQuery | string | - | Required when mode=search. Aliases: q, query |
| username | string | - | Required when mode=user. Alias: user |
| searchTags | string | story | Algolia tags filter for search mode (e.g., comment, ask_hn) |
| maxItems | integer | 30 | Items to fetch (1-500) |
| minPoints | integer | 0 | Skip stories below this score (feed modes only) |
| enableAiAnalysis | boolean | false | AI digest (themes, trends, TLDR). Adds ~$0.05/run |
| llmProvider | enum | openrouter | AI provider: openrouter / anthropic / google / openai / ollama |
| llmModel | string | - | Model override (uses provider default if blank) |
| openrouterApiKey | string | - | OpenRouter key (or OPENROUTER_API_KEY env var) |
| anthropicApiKey | string | - | Anthropic key (or ANTHROPIC_API_KEY env var) |
| googleApiKey | string | - | Google AI key (or GOOGLE_API_KEY env var) |
| openaiApiKey | string | - | OpenAI key (or OPENAI_API_KEY env var) |
| ollamaBaseUrl | string | http://localhost:11434 | Self-hosted Ollama URL |
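The constraints in the Inputs table (required fields per mode, the 1-500 range on `maxItems`, and `minPoints` applying only to feed modes) can be expressed as a small validation step. This is a hypothetical sketch: `validate_input` and `apply_min_points` are illustrative names, and here an out-of-range `maxItems` is rejected, whereas the actor might clamp it instead.

```python
from typing import Any

MODES = {"top", "search", "ask", "show", "jobs", "user"}
FEED_MODES = {"top", "ask", "show", "jobs"}  # minPoints applies only to these


def validate_input(inp: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical: apply table defaults and reject invalid combinations."""
    cfg = {
        "mode": inp.get("mode", "top"),
        "searchQuery": inp.get("searchQuery"),
        "username": inp.get("username"),
        "searchTags": inp.get("searchTags", "story"),
        "maxItems": inp.get("maxItems", 30),
        "minPoints": inp.get("minPoints", 0),
        "enableAiAnalysis": inp.get("enableAiAnalysis", False),
    }
    if cfg["mode"] not in MODES:
        raise ValueError(f"unknown mode: {cfg['mode']!r}")
    if cfg["mode"] == "search" and not cfg["searchQuery"]:
        raise ValueError("searchQuery is required when mode=search")
    if cfg["mode"] == "user" and not cfg["username"]:
        raise ValueError("username is required when mode=user")
    if not 1 <= cfg["maxItems"] <= 500:
        raise ValueError("maxItems must be between 1 and 500")
    return cfg


def apply_min_points(items: list[dict[str, Any]], cfg: dict[str, Any]) -> list[dict[str, Any]]:
    """Drop low-score stories in feed modes; leave search/user results untouched."""
    if cfg["mode"] not in FEED_MODES:
        return items
    return [i for i in items if (i.get("points") or 0) >= cfg["minPoints"]]
```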
Input aliases
To reduce friction for programmatic use, several fields accept shorter aliases:
- `q` or `query` -> `searchQuery`
- `user` -> `username`
- `feed` -> `mode`
Alias resolution order: aliases are checked before canonical field names, so `q=rust` overrides a schema default on `searchQuery`.
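The alias-first rule matters because the platform fills schema defaults into the canonical field, so a canonical-first check would silently ignore `q`. A minimal sketch of that precedence follows; the `ALIASES` table mirrors the list above, but `resolve_aliases` is a hypothetical helper, and the choice that `q` wins over `query` when both are set is an assumption.

```python
from typing import Any

# canonical field -> aliases, checked in this order (q before query is assumed)
ALIASES: dict[str, list[str]] = {
    "searchQuery": ["q", "query"],
    "username": ["user"],
    "mode": ["feed"],
}


def resolve_aliases(raw: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of raw where any non-empty alias overrides its canonical field."""
    resolved = dict(raw)
    for canonical, aliases in ALIASES.items():
        for alias in aliases:
            value = raw.get(alias)
            if value not in (None, ""):
                resolved[canonical] = value  # alias beats canonical/default
                break                        # first matching alias wins
    return resolved
```

For example, `resolve_aliases({"q": "rust", "searchQuery": "default"})["searchQuery"]` evaluates to `"rust"`.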
Outputs
Standard item (all modes except user)
```json
{
  "id": "39481275",
  "title": "Show HN: My open-source HN scraper",
  "url": "https://github.com/example/hn-scraper",
  "text": null,
  "author": "alice",
  "points": 142,
  "commentCount": 38,
  "type": "story",
  "createdAt": "2024-04-27T09:00:00+00:00",
  "hackerNewsUrl": "https://news.ycombinator.com/item?id=39481275"
}
```
User item (mode=user)
```json
{
  "id": "pg",
  "title": "User: pg",
  "url": "https://news.ycombinator.com/user?id=pg",
  "text": "Lisp hacker. Co-founder of YC.",
  "author": "pg",
  "points": 155000,
  "commentCount": 0,
  "type": "user",
  "createdAt": "2006-10-09T00:00:00+00:00",
  "hackerNewsUrl": "https://news.ycombinator.com/user?id=pg",
  "submissions": [
    {
      "id": "39481275",
      "title": "...",
      "url": "...",
      "points": 142,
      "commentCount": 38,
      "createdAt": "2024-04-27T09:00:00+00:00"
    }
  ]
}
```
AI analysis item (appended when enableAiAnalysis=true)
```json
{
  "report_type": "ai_analysis",
  "analysis": {
    "top_themes": [
      {"label": "AI/LLM tooling", "description": "Multiple new open-source LLM inference projects launched."}
    ],
    "notable_stories": [
      {"title": "...", "points": 420, "comments": 120, "domain": "github.com", "why_it_matters": "..."}
    ],
    "tech_trends": [
      {"name": "Rust", "context": "Three high-upvote stories about Rust systems projects."}
    ],
    "community_mood": {"tone": "excited", "explanation": "Several major AI launches drove high engagement."},
    "hot_topics": [
      {"title": "...", "note": "High comment-to-points ratio indicates controversy."}
    ],
    "tldr": "Today's HN front page was dominated by AI infrastructure launches and Rust systems projects, with an undercurrent of privacy skepticism in the comments."
  }
}
```
Pricing
| Event | Price | When |
|---|---|---|
| item-scraped | $0.001 | Per item successfully scraped (story, user, comment, etc.) |
| ai-analysis-completed | $0.05 | Per AI analysis digest (once per run) |
Typical run costs:
- 30 top stories, no AI: $0.03
- 30 top stories + AI digest: $0.08
- 100 search results: $0.10
- 500 top stories + AI: $0.55
- Daily watch (top 30 + AI digest): ~$2.40/month
Compared with commercial alternatives (NewsAPI, for example, charges $449+/month for commercial use), this actor is pay-per-event with zero subscription fees.
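The typical run costs above follow directly from the two pricing events. The sketch below reproduces that arithmetic with `Decimal` to avoid floating-point rounding. It assumes every run returns exactly `maxItems` billable items and a 30-day month; with `minPoints` filtering, fewer items may be returned and billed.

```python
from decimal import Decimal

ITEM_PRICE = Decimal("0.001")  # item-scraped event
AI_PRICE = Decimal("0.05")     # ai-analysis-completed event, at most once per run


def run_cost(items: int, ai: bool = False) -> Decimal:
    """Cost of a single run: per-item charge plus an optional AI digest."""
    return items * ITEM_PRICE + (AI_PRICE if ai else Decimal(0))


def monthly_cost(items: int, ai: bool, runs_per_day: int, days: int = 30) -> Decimal:
    """Cost of a recurring schedule, assuming identical runs."""
    return run_cost(items, ai) * runs_per_day * days
```

For example, `monthly_cost(30, True, 24)` gives the hourly-watch figure of $57.60 (720 runs at $0.08).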
AI Analysis (5 providers)
When enableAiAnalysis=true, the actor sends the scraped items to your chosen LLM and returns a structured digest with:
- top_themes - 3-6 dominant technology or industry themes
- notable_stories - top 5 by combined points + comments signal
- tech_trends - specific technologies, frameworks, or companies appearing prominently
- community_mood - overall tone (excited/skeptical/mixed/critical/optimistic)
- hot_topics - stories with high comment-to-points ratio (controversy signals)
- tldr - 2-3 sentence morning-briefing summary
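Two of these fields rest on simple numeric signals that can also be computed locally, for example to pre-filter items before the LLM call or to sanity-check the digest. This is a hypothetical sketch over normalized items: the scoring (points + comments) follows the descriptions above, but the thresholds `min_ratio` and `min_comments` are arbitrary assumptions, and the actor's LLM may rank differently.

```python
from typing import Any


def notable_stories(items: list[dict[str, Any]], k: int = 5) -> list[dict[str, Any]]:
    """Top-k stories by a combined points + comments signal."""
    return sorted(items, key=lambda i: i["points"] + i["commentCount"], reverse=True)[:k]


def hot_topics(
    items: list[dict[str, Any]], min_ratio: float = 1.0, min_comments: int = 20
) -> list[dict[str, Any]]:
    """Stories whose comment-to-points ratio suggests controversy.

    min_comments guards against tiny threads producing noisy ratios;
    max(points, 1) avoids division by zero.
    """
    return [
        i
        for i in items
        if i["commentCount"] >= min_comments
        and i["commentCount"] / max(i["points"], 1) >= min_ratio
    ]
```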
| Provider | Default model | API key env var | Input field |
|---|---|---|---|
| OpenRouter (recommended) | google/gemini-2.0-flash-001 | OPENROUTER_API_KEY | openrouterApiKey |
| Anthropic | claude-sonnet-4-20250514 | ANTHROPIC_API_KEY | anthropicApiKey |
| Google AI | gemini-2.0-flash | GOOGLE_API_KEY | googleApiKey |
| OpenAI | gpt-4o-mini | OPENAI_API_KEY | openaiApiKey |
| Ollama (self-hosted) | llama3.1 | - | ollamaBaseUrl |
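Each provider takes its key either from the input field or from the matching environment variable, and Ollama needs only a base URL. A minimal sketch of that lookup follows, using the defaults from the table above. `resolve_llm` is a hypothetical helper, and the precedence (input field before env var) is an assumption.

```python
import os
from collections.abc import Mapping
from typing import Any, Optional

# provider -> (input field, env var, default model), per the table above
PROVIDERS: dict[str, tuple[Optional[str], Optional[str], str]] = {
    "openrouter": ("openrouterApiKey", "OPENROUTER_API_KEY", "google/gemini-2.0-flash-001"),
    "anthropic": ("anthropicApiKey", "ANTHROPIC_API_KEY", "claude-sonnet-4-20250514"),
    "google": ("googleApiKey", "GOOGLE_API_KEY", "gemini-2.0-flash"),
    "openai": ("openaiApiKey", "OPENAI_API_KEY", "gpt-4o-mini"),
    "ollama": (None, None, "llama3.1"),
}


def resolve_llm(inp: dict[str, Any], env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Hypothetical: pick provider, model, and credentials for the AI step."""
    provider = inp.get("llmProvider", "openrouter")
    field, env_var, default_model = PROVIDERS[provider]
    model = inp.get("llmModel") or default_model  # blank override -> default
    if provider == "ollama":
        # Self-hosted: no key, just a base URL
        return {"provider": provider, "model": model,
                "baseUrl": inp.get("ollamaBaseUrl", "http://localhost:11434")}
    api_key = inp.get(field) or env.get(env_var)  # input field wins (assumed)
    if not api_key:
        raise ValueError(f"{provider}: set {field} or the {env_var} env var")
    return {"provider": provider, "model": model, "apiKey": api_key}
```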
Code examples
Python - search for AI stories
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("harvestlab/hacker-news-scraper").call(
    run_input={
        "mode": "search",
        "searchQuery": "large language models",
        "searchTags": "story",
        "maxItems": 50,
    }
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["points"], item["url"])
```
Python - top stories with AI digest
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("harvestlab/hacker-news-scraper").call(
    run_input={
        "mode": "top",
        "maxItems": 30,
        "enableAiAnalysis": True,
        "llmProvider": "openrouter",
        "openrouterApiKey": "sk-or-...",
    }
)

items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
# The digest item is marked by "report_type", not "type"
stories = [i for i in items if i.get("report_type") != "ai_analysis"]
digest = next((i for i in items if i.get("report_type") == "ai_analysis"), None)
print(digest["analysis"]["tldr"] if digest else "No digest")
```
LangChain tool integration
```python
from langchain.tools import Tool
from apify_client import ApifyClient

apify = ApifyClient("YOUR_APIFY_TOKEN")


def search_hacker_news(query: str) -> list:
    run = apify.actor("harvestlab/hacker-news-scraper").call(
        run_input={
            "mode": "search",
            "q": query,
            "maxItems": 20,
        }
    )
    return list(apify.dataset(run["defaultDatasetId"]).iterate_items())


hn_search_tool = Tool(
    name="search_hacker_news",
    description="Search Hacker News for stories, comments, and discussions about any topic.",
    func=search_hacker_news,
)

# agent.invoke({"input": "What are people saying about Rust on HN?"})
```
n8n / Zapier workflow
Configure a scheduled trigger calling this actor with mode=top, maxItems=30, enableAiAnalysis=true. The tldr field from the AI analysis item is ready to paste directly into a Slack message or email digest.
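If you skip n8n/Zapier and post straight to Slack, a minimal sketch might look like this. It picks the digest out of the dataset items by `report_type` and sends its `tldr` to a Slack incoming webhook, which accepts a JSON body with a `text` field. The webhook URL is a placeholder you supply, and the `"HN digest: "` prefix and helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Optional


def slack_payload(items: list[dict[str, Any]]) -> Optional[dict[str, str]]:
    """Build an incoming-webhook body from the AI analysis item, if present."""
    digest = next((i for i in items if i.get("report_type") == "ai_analysis"), None)
    if digest is None:
        return None  # run had enableAiAnalysis=false or the AI step failed
    return {"text": "HN digest: " + digest["analysis"]["tldr"]}


def post_to_slack(webhook_url: str, payload: dict[str, str]) -> int:
    """POST the payload as JSON; returns the HTTP status code."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Usage: build `items` from the run's dataset as in the Python examples, then call `post_to_slack(WEBHOOK_URL, payload)` when `payload` is not `None`.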
Scheduling for daily watch
- Set `mode: "top"` + `enableAiAnalysis: true` + your preferred LLM provider.
- Schedule via Apify Scheduler (hourly or daily).
- Estimated monthly cost at daily cadence: ~$2.40/month for top 30 + AI digest.
- Estimated monthly cost at hourly cadence: ~$57.60/month for top 30 + AI digest every hour (24 runs × 30 days × $0.08).
Pair with other harvestlab actors
- harvestlab/github-trending-scraper - pair HN's "what's launched today" with GitHub's "what's accelerating in stars" for complete developer intelligence.
- harvestlab/news-monitor - combine the HN front page with Google News for cross-source coverage of AI launches.
- harvestlab/google-search-scraper - when an HN story hits #1, scrape the SERP for backlinks and downstream coverage.
- harvestlab/contact-extractor - feed each new Show HN URL into the contact extractor to build a founder lead list.
Legal and compliance
The Hacker News Firebase API is public, documented, and maintained by Y Combinator (https://github.com/HackerNews/API). The Algolia HN Search API is officially provided by Algolia under agreement with YC (https://hn.algolia.com/api). Both are intended for third-party use with no terms of service restrictions on read access.
Users are responsible for:
- Respecting rate norms: avoid aggressive concurrent fetching beyond `maxItems=500` per run.
- GDPR / CCPA compliance when storing usernames, karma, or comment text (HN user data is public but may constitute PII under some jurisdictions).
- Attribution when redistributing data: link back to `https://news.ycombinator.com/item?id=<id>` per fair-use norms.
- Not using the data for spam, harassment, or other violations of YC/HN community guidelines.
This actor does not scrape authenticated routes, bypass any security measures, or violate any robots.txt directives.
Contact
Built by harvestlab - 25+ monetized Apify Actors covering EU/US e-commerce, B2B intelligence, jobs, government procurement, social, and developer tools. Bug reports and feature requests via the Apify Store issue tracker on this actor's listing page.