Hacker News — CSV, Stories + Comments + Users, No API Key
Pricing
Pay per usage
Scrape HN top/new/Show HN/Ask HN/jobs in minutes. No rate limits. Title, URL, score, comments, author as JSON/CSV. 26+ runs. For launch monitoring, competitor tracking, market trends. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai
Developer
Alex
Maintained by Community
Last modified
18 days ago
Hacker News Scraper
Scrape stories and comments from Hacker News — extract top, new, best, Ask HN, Show HN, and job posts with full comment threads. Uses the official HN API and Algolia search for fast, reliable data extraction.
Features
- 6 story types — top, new, best, Ask HN, Show HN, and job stories
- Full comment threads — nested comments with author, text, timestamp, depth level, and child count (up to 3 levels deep)
- Algolia search — find stories by keyword with relevance ranking across all of Hacker News history
- Score filtering — set a minimum score threshold to extract only high-quality stories
- Batch processing — fetches stories in parallel batches of 10 for maximum speed
- Domain extraction — automatically extracts the domain from story URLs
- Real-time data — uses the official Firebase HN API for live scores and comment counts
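As a rough illustration of how the six story types and the batch size of 10 could map onto the public Firebase HN API, here is a minimal Python sketch. The helper names (`story_list_url`, `item_url`, `batches`) are hypothetical, and the mapping from `scrapeType` values to list endpoints is an assumption about the actor, not taken from its source.

```python
HN_API_BASE = "https://hacker-news.firebaseio.com/v0"

# Assumed mapping from scrapeType values to the public HN list endpoints.
STORY_ENDPOINTS = {
    "top": "topstories",
    "new": "newstories",
    "best": "beststories",
    "ask": "askstories",
    "show": "showstories",
    "job": "jobstories",
}


def story_list_url(scrape_type: str) -> str:
    """Return the URL of the ID list for one story type."""
    if scrape_type not in STORY_ENDPOINTS:
        raise ValueError(f"unknown scrapeType: {scrape_type!r}")
    return f"{HN_API_BASE}/{STORY_ENDPOINTS[scrape_type]}.json"


def item_url(item_id: int) -> str:
    """Return the URL of a single story or comment item."""
    return f"{HN_API_BASE}/item/{item_id}.json"


def batches(ids, size=10):
    """Split a list of IDs into consecutive batches of `size` (10 in this README)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

Each batch would then be fetched concurrently, one `item_url` request per ID.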
Output Example
{
  "id": 39876543,
  "title": "Show HN: I built an open-source alternative to Notion",
  "url": "https://github.com/user/project",
  "author": "developer_123",
  "score": 487,
  "commentCount": 234,
  "time": "2026-03-17T14:20:00.000Z",
  "type": "story",
  "hnUrl": "https://news.ycombinator.com/item?id=39876543",
  "domain": "github.com",
  "source": "top",
  "comments": [
    {
      "id": 39876600,
      "author": "tech_reviewer",
      "text": "This is impressive! I especially like the.",
      "time": "2026-03-17T14:35:00.000Z",
      "depth": 0,
      "childCount": 5
    }
  ],
  "scrapedAt": "2026-03-18T12:00:00.000Z"
}
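Records of this shape are nested (the `comments` array), so a CSV export needs a flattening step. The sketch below shows one possible way to do that in Python after downloading the JSON output. The column selection in `CSV_FIELDS` and the function name `records_to_csv` are illustrative choices, not part of the actor.

```python
import csv
import io

# Illustrative column choice: top-level scalar fields from the example record.
CSV_FIELDS = [
    "id", "title", "url", "author", "score",
    "commentCount", "time", "domain", "source", "hnUrl",
]


def records_to_csv(records):
    """Flatten story records to CSV text, dropping the nested comments array."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_FIELDS)
    writer.writeheader()
    for record in records:
        # Missing keys (for example, a job post without a url) become empty cells.
        writer.writerow({field: record.get(field, "") for field in CSV_FIELDS})
    return buffer.getvalue()
```

Comments could be written to a second CSV keyed by the story `id` if the thread data is needed in tabular form.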
Use Cases
- Tech trend monitoring — track what topics, tools, and technologies the developer community is discussing
- Content research — discover high-performing content topics and formats that resonate with technical audiences
- Competitive intelligence — monitor mentions of your product, competitors, and industry on the #1 tech news site
- Startup discovery — scrape Show HN posts to find new product launches and early-stage startups
- Job market analysis — extract HN job postings to analyze hiring trends, salaries, and in-demand skills
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| scrapeType | String | "top" | Story type: top, new, best, ask, show, job (6 options per input_schema enum). For keyword search, use searchQueries array — the search loop fires independently of scrapeType. |
| searchQueries | Array | [] | Keywords to search across HN history (via Algolia) |
| maxStories | Number | 100 | Maximum stories to extract |
| includeComments | Boolean | true | Whether to extract comment threads |
| maxCommentsPerStory | Number | 30 | Maximum comments per story (includes nested replies) |
| minScore | Number | 0 | Minimum score threshold (filter out low-scoring stories) |
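To make the parameters concrete, here is a small Python sketch that assembles a run input from the defaults in the table and rejects values the table rules out. The helper `build_input` and its validation rules are hypothetical; the actor's real input schema may check more or less than this.

```python
import copy
import json

ALLOWED_SCRAPE_TYPES = {"top", "new", "best", "ask", "show", "job"}

# Defaults copied from the parameter table above.
DEFAULT_INPUT = {
    "scrapeType": "top",
    "searchQueries": [],
    "maxStories": 100,
    "includeComments": True,
    "maxCommentsPerStory": 30,
    "minScore": 0,
}


def build_input(**overrides):
    """Merge overrides into the defaults and run basic sanity checks."""
    unknown = set(overrides) - set(DEFAULT_INPUT)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    run_input = copy.deepcopy(DEFAULT_INPUT)
    run_input.update(overrides)
    if run_input["scrapeType"] not in ALLOWED_SCRAPE_TYPES:
        raise ValueError(f"invalid scrapeType: {run_input['scrapeType']!r}")
    if run_input["maxStories"] < 1 or run_input["maxCommentsPerStory"] < 0:
        raise ValueError("maxStories must be >= 1 and maxCommentsPerStory >= 0")
    return run_input


# Example: Show HN posts with at least 100 points, no comment threads.
print(json.dumps(build_input(scrapeType="show", minScore=100, includeComments=False), indent=2))
```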
Pricing
Standard Apify per-run compute pricing — no per-story or per-comment fee. With includeComments: true, each comment requires a separate Firebase API request, so a 100-story run with full comment threads can issue 1000+ requests. Use maxCommentsPerStory and minScore to bound cost.
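A back-of-the-envelope estimate can help when choosing these bounds. The Python sketch below assumes one request for the story ID list, one per story, and one per comment up to `maxCommentsPerStory`; it ignores retries, search queries, and any extra stories fetched for `minScore` filtering, so treat the result as a rough upper bound rather than the actor's actual request count. The function name `estimate_requests` is hypothetical.

```python
def estimate_requests(max_stories=100, include_comments=True, max_comments_per_story=30):
    """Rough upper bound on Firebase requests for one type-branch run."""
    list_requests = 1                      # one call for the story ID list
    story_requests = max_stories           # one call per story item
    comment_requests = (
        max_stories * max_comments_per_story if include_comments else 0
    )                                      # one call per comment, capped per story
    return list_requests + story_requests + comment_requests


print(estimate_requests())                          # 3101 with the defaults
print(estimate_requests(include_comments=False))    # 101
print(estimate_requests(max_comments_per_story=5))  # 601
```

Under these assumptions the comment cap dominates the total, which is why lowering `maxCommentsPerStory` tends to be the most effective cost lever.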
FAQ
Q: Can I search across all of Hacker News history?
A: Yes, for stories only. The search feature uses Algolia's HN Search API with the tags=story filter, so comment-text matches are excluded. Use the dedicated HN Algolia UI (https://hn.algolia.com/) if you need comment search.
Q: Why are comments more expensive to scrape?
A: Each comment requires a separate API call to the HN Firebase API. A story with 200 comments can require 30+ individual requests to fetch the top-level and nested replies.
Q: What's the difference between "top" and "best" stories?
A: "Top" shows the current front page ranking (changes frequently). "Best" shows the highest-scoring stories over a longer period. "New" shows the most recently submitted stories regardless of score.
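For reference, the story-only search described in the first FAQ answer corresponds to a request against the public Algolia HN Search endpoint with `tags=story`. The Python sketch below only builds such a URL; the function name and the `hitsPerPage`/`page` values are illustrative, and how the actor actually pages through results is not stated here.

```python
from urllib.parse import urlencode

ALGOLIA_SEARCH_URL = "https://hn.algolia.com/api/v1/search"


def algolia_story_search_url(query, hits_per_page=20, page=0):
    """Build a relevance-ranked, story-only search URL for one keyword query."""
    params = {
        "query": query,
        "tags": "story",              # restricts results to stories, excluding comments
        "hitsPerPage": hits_per_page,
        "page": page,
    }
    return f"{ALGOLIA_SEARCH_URL}?{urlencode(params)}"


print(algolia_story_search_url("open source notion"))
```

Swapping `tags` to `comment` in the same endpoint is how comment search works on the Algolia side, which is the part this actor does not cover.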
Honest Limitations
- Comment recursion is bounded. Each parent comment fetches at most its first 5 child IDs (`item.kids.slice(0, 5)`). So a comment with 50 replies emits only 5 of them per branch. Depth limit is 3 (top-level depth=0 plus up to 3 recursion levels = max 4 levels including root). `maxCommentsPerStory` is a global per-story cap, applied across all depths simultaneously. A wide thread that hits 30 top-level comments before recursion reaches deeper replies will leave deeper branches unfetched.
- Failed item fetches are silently skipped: `.catch(() => null)` returns null and the iteration continues. There is no retry logic.
- Output schema differs by branch. The type-branch (`top/new/best/ask/show/job`) emits 13 fields including `text` (for ask/show story body). The search-branch (`searchQueries`) emits 12 fields including `tags` (Algolia `_tags` array) but no `text`. The example above is the type-branch shape; search records lack `text` and gain `tags`.
- Algolia search filters to `tags=story`, so comment-only matches are not returned by this actor.
- Story IDs from `*stories.json` endpoints are returned in HN's display order. The actor pre-fetches `maxStories * 2` IDs to allow for `minScore` filtering. If your `minScore` is high and the front page is light on high-score stories, the actor returns fewer than `maxStories` rows.
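The comment limits above interact in a way that is easier to see in code. The following Python sketch models them with a level-by-level traversal: at most 5 children per parent, depths 0 through 3, a global per-story cap, and failed fetches skipped without retry. It is an approximation written for illustration, not the actor's source; the traversal order, the function name `collect_comments`, and the `fetch_item` callback are assumptions.

```python
from collections import deque

MAX_CHILDREN_PER_COMMENT = 5   # mirrors item.kids.slice(0, 5)
MAX_DEPTH = 3                  # top-level comments are depth 0


def collect_comments(story, fetch_item, max_comments=30):
    """Collect comments level by level under the limits described above.

    `fetch_item` is a hypothetical callable that returns an HN item dict
    for an ID, or None (or raises) when the fetch fails.
    """
    collected = []
    queue = deque((kid, 0) for kid in story.get("kids", []))
    while queue and len(collected) < max_comments:
        comment_id, depth = queue.popleft()
        try:
            item = fetch_item(comment_id)
        except Exception:
            item = None            # failed fetches are skipped, no retry
        if not item:
            continue
        kids = item.get("kids", [])
        collected.append({
            "id": item["id"],
            "author": item.get("by"),
            "text": item.get("text"),
            "depth": depth,
            "childCount": len(kids),
        })
        if depth < MAX_DEPTH:
            # Only the first 5 replies of each comment are followed.
            queue.extend((kid, depth + 1) for kid in kids[:MAX_CHILDREN_PER_COMMENT])
    return collected
```

With this ordering, a story that has 30 or more reachable top-level comments fills the cap before any reply is visited, which matches the wide-thread behavior noted in the first limitation.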
Related scrapers
- Walmart Reviews Scraper — Product reviews to CSV/JSON/Excel, 17 fields per review, bypasses Walmart's 100-review UI cap
- Trustpilot Review Scraper — 951 lifetime production runs, full review schema export
- Reddit Discussion Scraper — Discussion threads from any subreddit
- Google News Scraper — News headlines and source attribution
- Bluesky Scraper — Posts and profiles from Bluesky
- MCP Trend Detector — AI-powered trend detection across news + social
Proof of delivery: This Hacker News scraper has 27 lifetime production runs as of May 2026. Author maintains 31 published actors (78 total) and shipped a paid 3-article series in March 2026 ($150, proxy industry). Pilot pricing locked through May 2026.
Sample request? Reply sample to spinov001@gmail.com and we'll send 2 published case-study articles within 24 hours.
Custom scraping — pilot tiers
Need data, not infrastructure? We build, you query. Three tiers:
- Pilot — $97 · 1 actor, basic config, 7-day support. Good entry point for one-off jobs.
- Standard — $297 · custom actor + Slack/email alerts on results, 30-day support. Most projects fit here.
- Premium — $797 · custom actor + dashboard + 90-day support + 1 modification round. For ongoing data pipelines.
Email: spinov001@gmail.com — drop specs, schema, or target URLs and get a quote within 48h.
Proof of work: 31 published Apify scrapers (78 total in portfolio) — Trustpilot 951 runs, Reddit 82, Google News 45, Glassdoor 39, Email Extractor 107, Hacker News 27. Recently delivered a paid 3-article series for a client in the proxy industry ($150).
More tips: t.me/scraping_ai · blog.spinov.online
Honest disclosure: this scraper uses the public HN Firebase API and Algolia HN Search — no scraping behind login walls, no personal data beyond what HN publishes, robots.txt respected. Not affiliated with Y Combinator.