Hacker News Scraper
Pricing
from $1.30 / 1,000 items scraped
Scrape HN stories, comments, and profiles via the Firebase API. Get top, new, best, Ask, Show, and job stories with scores, authors, and timestamps, plus full comment tree threading and user profiles with karma. Fast and API-based.
Developer

junipr
Extract stories, comments, user profiles, and search results from Hacker News using the official Firebase API. This actor provides full comment threading with recursive replies, clean markdown output optimized for LLMs, and powerful search via the Algolia HN Search API — all with zero scraping, zero legal risk, and no API key required.
Why Use This Actor?
Unlike web scrapers that parse HTML and break when the layout changes, this actor uses Hacker News's official public APIs. This means reliable, structured data every time with no rate-limit headaches or CAPTCHAs. Comment threads are fetched recursively and returned as nested trees, making it easy to analyze discussions. All text content is converted to clean markdown alongside the original HTML, making the output ready for LLM pipelines, RAG systems, and content analysis.
Features
- All feed types: Top, New, Best, Ask HN, Show HN, and Jobs
- Full comment threading: Recursive comment fetching with configurable depth and limits
- Algolia search: Full-text search across all HN content with tag filtering
- User profiles: Karma, about text, account age, and recent submissions
- Markdown output: HTML body content converted to clean markdown for LLM consumption
- Flexible filtering: Filter by score, comment count, and sort by score/date/comments
- Poll support: Full poll option extraction with scores
- Zero configuration: Works out of the box — just run it and get the top 30 stories
Input Examples
Scrape Top Stories
{
  "scrapeType": "topStories",
  "maxItems": 30,
  "includeComments": true,
  "maxCommentsPerItem": 50,
  "commentDepth": 5
}
Search Hacker News
{
  "scrapeType": "search",
  "searchQuery": "machine learning",
  "searchTags": "story",
  "maxItems": 20,
  "minScore": 50
}
Fetch User Profiles
{
  "scrapeType": "user",
  "usernames": ["pg", "dang", "patio11"]
}
Fetch Specific Items
{
  "scrapeType": "item",
  "itemIds": [8863, 121003],
  "includeComments": true
}
Output Example
Each story in the dataset looks like this:
{
  "id": 12345678,
  "type": "story",
  "title": "Show HN: A new open-source tool for data extraction",
  "url": "https://example.com/tool",
  "text": null,
  "textMarkdown": null,
  "by": "builder42",
  "score": 287,
  "time": 1710150000,
  "timeISO": "2024-03-11T09:40:00.000Z",
  "descendants": 143,
  "comments": [
    {
      "id": 12345679,
      "type": "comment",
      "by": "techfan",
      "text": "<p>This is really impressive work. The API design reminds me of...</p>",
      "textMarkdown": "This is really impressive work. The API design reminds me of...",
      "time": 1710151000,
      "timeISO": "2024-03-11T09:56:40.000Z",
      "parent": 12345678,
      "replies": []
    }
  ],
  "hnUrl": "https://news.ycombinator.com/item?id=12345678"
}
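Because comments arrive as a nested tree, downstream code usually needs to walk it. The following is a minimal sketch, assuming each comment carries a "replies" list as in the example above; the function name walk_comments and the sample data are illustrative and not part of the actor.

```python
from typing import Any, Dict, Iterator, List, Tuple

Comment = Dict[str, Any]


def walk_comments(comments: List[Comment], depth: int = 0) -> Iterator[Tuple[int, Comment]]:
    """Yield (depth, comment) pairs depth-first, each parent before its replies."""
    for comment in comments:
        yield depth, comment
        # "replies" holds the nested children, as in the output example.
        yield from walk_comments(comment.get("replies") or [], depth + 1)


# Made-up sample data in the shape of the output example.
sample_story = {
    "id": 1,
    "comments": [
        {"id": 2, "by": "alice", "textMarkdown": "First!", "replies": [
            {"id": 3, "by": "bob", "textMarkdown": "Reply to alice", "replies": []},
        ]},
        {"id": 4, "by": "carol", "textMarkdown": "Another thread", "replies": []},
    ],
}

# Flatten the tree into (depth, id) pairs, e.g. for loading into a table.
flat = [(depth, c["id"]) for depth, c in walk_comments(sample_story["comments"])]

# Print an indented view of the discussion.
for depth, c in walk_comments(sample_story["comments"]):
    print("  " * depth + f'{c["by"]}: {c["textMarkdown"]}')
```

The same generator can feed a flat export or an LLM prompt, since the depth value preserves the threading.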
How It Works
The actor uses two official APIs:
- HN Firebase API (hacker-news.firebaseio.com/v0/): Fetches story feeds, individual items, comments, and user profiles. This is the official API published by Hacker News, exposing public HN data in near real time.
- HN Algolia API (hn.algolia.com/api/v1/): Provides full-text search across all HN content with filtering by story, comment, poll, or job.
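For orientation, here is a rough sketch of the request URLs these two APIs expose. The endpoint paths and query parameters follow the public HN Firebase and HN Search documentation; the helper names and the short feed keys ("top", "ask", and so on) are made up for illustration, and how the actor builds its requests internally is not stated here.

```python
from typing import Optional
from urllib.parse import urlencode

FIREBASE_BASE = "https://hacker-news.firebaseio.com/v0"
ALGOLIA_BASE = "https://hn.algolia.com/api/v1"

# Illustrative feed keys mapped to the Firebase feed endpoints.
FEEDS = {
    "top": "topstories",
    "new": "newstories",
    "best": "beststories",
    "ask": "askstories",
    "show": "showstories",
    "job": "jobstories",
}


def feed_url(feed: str) -> str:
    """URL returning a JSON array of story IDs for the given feed."""
    return f"{FIREBASE_BASE}/{FEEDS[feed]}.json"


def item_url(item_id: int) -> str:
    """URL for a single story, comment, poll, or job."""
    return f"{FIREBASE_BASE}/item/{item_id}.json"


def user_url(username: str) -> str:
    """URL for a user profile (usernames are case-sensitive)."""
    return f"{FIREBASE_BASE}/user/{username}.json"


def search_url(query: str, tags: Optional[str] = None,
               min_points: Optional[int] = None, hits_per_page: int = 20) -> str:
    """URL for a full-text search, optionally filtered by tag and score."""
    params = {"query": query, "hitsPerPage": hits_per_page}
    if tags:
        params["tags"] = tags
    if min_points is not None:
        params["numericFilters"] = f"points>={min_points}"
    return f"{ALGOLIA_BASE}/search?{urlencode(params)}"


print(feed_url("top"))
print(search_url("machine learning", tags="story", min_points=50))
```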
Comment threads are fetched recursively — each item contains a list of child IDs, which are fetched individually and assembled into a nested tree structure. Rate limiting is applied automatically to stay within API limits.
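The recursive assembly described above can be sketched as follows. This is an approximation under stated assumptions, not the actor's actual code: it assumes top-level comments count as depth 1, that the comment limit is consumed depth-first, and that deleted or dead items are skipped. The "kids", "deleted", and "dead" fields come from the Firebase item format; build_comment_tree and fetch_item_http are hypothetical names.

```python
import json
from typing import Any, Callable, Dict, List, Optional
from urllib.request import urlopen

Item = Dict[str, Any]


def fetch_item_http(item_id: int) -> Optional[Item]:
    """Fetch one item from the Firebase API (returns None for unknown IDs)."""
    url = f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def build_comment_tree(item_id: int,
                       fetch_item: Callable[[int], Optional[Item]],
                       max_depth: int = 5,
                       max_comments: int = 50) -> List[Item]:
    """Return the item's comments as nested dicts with a "replies" list."""
    remaining = max_comments

    def visit(comment_id: int, depth: int) -> Optional[Item]:
        nonlocal remaining
        if depth > max_depth or remaining <= 0:
            return None
        item = fetch_item(comment_id)
        # Skip missing, deleted, and dead (flagged/killed) comments.
        if not item or item.get("deleted") or item.get("dead"):
            return None
        remaining -= 1
        node = dict(item)
        node["replies"] = []
        # "kids" lists the child IDs; each child needs its own request.
        for kid in item.get("kids") or []:
            child = visit(kid, depth + 1)
            if child is not None:
                node["replies"].append(child)
        return node

    root = fetch_item(item_id)
    if not root:
        return []
    tree = []
    for kid in root.get("kids") or []:
        node = visit(kid, 1)
        if node is not None:
            tree.append(node)
    return tree
```

Passing the fetch function as a parameter keeps the traversal testable with an in-memory dict, for example build_comment_tree(1, fake_items.get), while build_comment_tree(8863, fetch_item_http) would hit the live API.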
Pricing
This actor uses pay-per-event pricing at $1.30 per 1,000 items scraped. Each story, comment batch, user profile, or search result counts as one item. You only pay for what you use — no minimum fees or subscriptions.
Pricing includes all platform compute costs — no hidden fees.
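As a quick worked example of the rate above, a run's cost is the number of billable items times $1.30 / 1,000. The sketch below assumes you already know the billable item count; how many comments make up one "comment batch" is not specified here, so it is left out of the calculation. The function name is illustrative.

```python
PRICE_PER_1000_ITEMS_USD = 1.30


def estimate_cost_usd(billable_items: int) -> float:
    """Estimated run cost in USD, rounded to the nearest cent."""
    return round(billable_items * PRICE_PER_1000_ITEMS_USD / 1000, 2)


# A metadata-only run of 500 stories (no comments).
print(estimate_cost_usd(500))  # 0.65
```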
Tips for Best Results
- Speed vs. completeness: Set includeComments: false for fast metadata-only runs. Comment fetching is the slowest part since each comment requires an individual API call.
- Large runs: For 500 stories with comments, expect 5-15 minutes depending on comment density. The actor respects HN API rate limits to avoid being blocked.
- Search: The Algolia API is fast and supports standard search operators. Use searchTags to filter by content type.
- Filtering: Use minScore and minComments to focus on high-quality, well-discussed content.
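If you prefer to filter after the run instead, the same idea can be applied to the dataset locally. This is a minimal sketch, assuming "score" and "descendants" (comment count) fields as in the output example; the sort_by parameter and its values are hypothetical and may not match the actor's own input names.

```python
from typing import Any, Dict, List

Story = Dict[str, Any]

# Hypothetical sort keys mapped to output fields.
SORT_FIELDS = {"score": "score", "date": "time", "comments": "descendants"}


def filter_and_sort(stories: List[Story], min_score: int = 0,
                    min_comments: int = 0, sort_by: str = "score") -> List[Story]:
    """Keep stories meeting both thresholds, highest value first."""
    field = SORT_FIELDS[sort_by]
    kept = [
        s for s in stories
        # Missing or null values (e.g. jobs have no comment count) count as 0.
        if (s.get("score") or 0) >= min_score
        and (s.get("descendants") or 0) >= min_comments
    ]
    return sorted(kept, key=lambda s: s.get(field) or 0, reverse=True)


stories = [
    {"id": 1, "score": 287, "descendants": 143, "time": 1710150000},
    {"id": 2, "score": 12, "descendants": 3, "time": 1710160000},
    {"id": 3, "score": 95, "descendants": 210, "time": 1710140000},
]
top = filter_and_sort(stories, min_score=50, min_comments=10, sort_by="comments")
print([s["id"] for s in top])  # [3, 1]
```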
FAQ
How many items can I scrape at once?
You can scrape up to 500 items per run. For story feeds, the Firebase API returns up to 500 story IDs for the top and new feeds and up to 200 for Ask HN, Show HN, and jobs. The maxItems parameter limits how many are processed.
Does this actor require an API key?
No. Both the HN Firebase API and the Algolia HN Search API are public and free to use. No authentication is needed.
Can I get historical data?
Use the search feature with date filters via the Algolia API. The Firebase API exposes only the current feed rankings (top, new, best), but individual items remain retrievable by ID.
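For reference, a date-bounded query against the Algolia HN Search API looks roughly like this. The search_by_date endpoint and the created_at_i numeric filter are part of the public HN Search API; the helper name is illustrative, and how the actor exposes date filters in its own input is not shown here.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def historical_search_url(query: str, start: datetime, end: datetime,
                          tags: str = "story") -> str:
    """URL for results created in [start, end), most recent first."""
    # created_at_i is the item's creation time as a Unix timestamp.
    filters = (f"created_at_i>={int(start.timestamp())},"
               f"created_at_i<{int(end.timestamp())}")
    params = {"query": query, "tags": tags, "numericFilters": filters}
    return "https://hn.algolia.com/api/v1/search_by_date?" + urlencode(params)


# Stories mentioning "rust" posted during March 2024 (UTC).
march = historical_search_url(
    "rust",
    datetime(2024, 3, 1, tzinfo=timezone.utc),
    datetime(2024, 4, 1, tzinfo=timezone.utc),
)
print(march)
```

Use timezone-aware datetimes as shown; naive datetimes would be interpreted in the local timezone by timestamp().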
Why are some comments missing?
Deleted and dead (flagged/killed) comments are excluded by default. The maxCommentsPerItem and commentDepth settings also limit how many comments are fetched per story.
How is this different from other HN scrapers?
This actor uses the official APIs instead of web scraping, provides full recursive comment threading, converts HTML to LLM-ready markdown, and supports all HN content types including polls, jobs, and user profiles. Most competitors only scrape the front page HTML.