Hacker News Scraper: Stories & Tech Trends
Pricing
from $8.00 / 1,000 results
Scrape Hacker News stories: top, new, best, ask, show, jobs. Engagement tracking, trend analysis, and tech topic monitoring.
Rating: 0.0 (0 reviews)
Developer: Stephan Corbeil
Maintained by Community
Actor stats: 0 bookmarked, 27 total users, 7 monthly active users
Last modified: a day ago
Hacker News Scraper & Tech Trend Tracker
Extract trending stories, comments, and metadata from Hacker News at scale. An alternative to the HN Algolia API and the official Firebase API, with bulk pagination, comment-thread expansion, structured JSON output, and no rate-limit headaches.
Why This HN Scraper Beats HN Algolia, Firebase API & Manual Polling
| Feature | NexGenData Hacker News Scraper | HN Algolia API | HN Official Firebase API | Manual cron + scraping |
|---|---|---|---|---|
| Cost | $5 / 1,000 results, pay-per-event | Free but rate-limited | Free but no batch | Engineering time + infra |
| Bulk pagination | Up to 500 stories per run | Plan-limited | One ID at a time | Build it yourself |
| Comment threads | Full nested comments per story | Separate calls | Walk descendants tree manually | Build it yourself |
| Story feeds | top, new, best, ask, show, job | Limited categories | Yes (one per call) | Build it yourself |
| Output format | JSON / CSV / Excel | JSON | JSON | Whatever you write |
| Schedule + webhook | Native cron + webhook on completion | None | None | Build it yourself |
| Time-to-first-row | < 60 seconds | Immediate, no signup | Immediate (slow per call) | Days |
| Auth | Apify token | None (anon) | None | Your IP / proxy |
| Maintenance | We handle it | Algolia handles it | You handle Firebase quirks | You handle everything |
Teams pick this scraper because it is faster than walking the Firebase comment tree by hand and more flexible than Algolia's fixed search index, and it ships JSON straight to a dataset with no Firebase SDK required.
What This Actor Does
The Hacker News Scraper connects directly to Hacker News' Firebase API to extract stories, comments, and metadata in seconds. No parsing, no rate limits, no complex API documentation. Whether you're tracking tech trends, monitoring startup mentions, or feeding AI training data, this actor delivers structured JSON output you can use immediately.
Perfect for:
- Startups building competitive intelligence systems
- Data scientists gathering training datasets
- Content strategists tracking industry discussions
- Researchers analyzing tech community behavior
- Automated news feeds and aggregators
Why Scrape Hacker News?
Hacker News data extraction powers decision-making across tech companies. HN discussions reveal product launches before major announcements, engineering challenges competitors are solving, investor and founder sentiment shifts, early signals for emerging technologies, and real-time feedback on industry trends.
Key Features
Search Multiple Story Types
Need top Hacker News stories? Use searchType: top. Want trending HN tech news? Try searchType: best. The actor supports all six story feeds: top (frontpage stories), new (recently submitted), best (ranked by score with visibility weighting), ask (Ask HN discussions), show (Show HN project submissions), and job (job postings and hiring).
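Each of these feeds is a list of story IDs published by the public Hacker News API on Firebase. The sketch below maps the searchType values to those endpoints. It assumes the actor reads the standard feeds; FEED_ENDPOINTS and feed_url are illustrative names, not part of the actor.

```python
# Sketch: map the actor's searchType values to the public HN Firebase feeds.
# Assumption: the actor reads these standard endpoints; its internals are not documented here.
FIREBASE_BASE = "https://hacker-news.firebaseio.com/v0"

FEED_ENDPOINTS = {
    "top": "topstories",
    "new": "newstories",
    "best": "beststories",
    "ask": "askstories",
    "show": "showstories",
    "job": "jobstories",
}


def feed_url(search_type: str = "top") -> str:
    """Return the Firebase URL that lists story IDs for one feed."""
    try:
        endpoint = FEED_ENDPOINTS[search_type]
    except KeyError:
        raise ValueError(f"unknown searchType: {search_type!r}") from None
    return f"{FIREBASE_BASE}/{endpoint}.json"


print(feed_url("show"))  # https://hacker-news.firebaseio.com/v0/showstories.json
```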
Fetch Exact Result Counts
Set maxResults from 1 to 500. Whether you need the top 10 Hacker News articles for a daily brief or 500 HN stories for machine learning training data, you get up to the count you specify; a feed that lists fewer stories than requested returns everything it has.
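A sketch of how maxResults could be applied once a feed has been fetched as a ranked list of IDs: take the first N. Per the public HN API documentation, the ask, show, and job feeds list up to 200 IDs, so larger requests on those feeds come back shorter. take_top is a hypothetical helper, not the actor's code.

```python
def take_top(feed_ids: list[int], max_results: int) -> list[int]:
    """Return the first max_results IDs from an already-ranked feed.

    Feeds are ordered by HN's own ranking, so slicing keeps the top stories.
    If the feed holds fewer IDs than requested, all of them are returned.
    """
    return feed_ids[:max_results]


print(len(take_top(list(range(200)), 500)))  # 200: the feed is shorter than the request
```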
Include Full Comment Threads
Set includeComments: true to attach every comment under each story. Extract sentiment, track discussions, build comment datasets. With includeComments: false, run faster and leaner when you only need stories.
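Comments on Hacker News form a tree: each item lists its direct replies in a kids field. A minimal sketch of flattening that tree into a list like the actor's comments array, assuming a fetch_item callable that returns public-API item dicts (or None). The function name and output fields are illustrative.

```python
from typing import Callable, Optional


# Hypothetical helper: fetch_item(id) returns an item dict shaped like the
# public HN API (id, by, text, parent, kids, deleted, dead) or None.
def flatten_comments(story: dict, fetch_item: Callable[[int], Optional[dict]]) -> list[dict]:
    """Return every comment under `story`, depth-first in HN display order."""
    comments: list[dict] = []
    # Push kids in reverse so the first reply is popped (and emitted) first.
    stack = list(reversed(story.get("kids", [])))
    while stack:
        item = fetch_item(stack.pop())
        if item is None:
            continue
        # Deleted or dead comments are skipped, but their replies are still visited.
        if not (item.get("deleted") or item.get("dead")):
            comments.append({
                "id": item["id"],
                "by": item.get("by"),
                "text": item.get("text", ""),
                "parent": item.get("parent"),
            })
        stack.extend(reversed(item.get("kids", [])))
    return comments
```

A depth-first walk keeps each reply next to its parent, which matches how threads read on the site.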
Fast Execution
Leverages HN's Firebase backend for speed. Most requests complete in under 30 seconds.
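The Firebase API serves one item per request, so throughput comes from issuing requests in parallel. A sketch of that pattern with the standard library, assuming one GET per item URL; fetch_item and fetch_many are hypothetical helpers, not the actor's code, and 16 workers is an arbitrary choice.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{}.json"


def fetch_item(item_id: int) -> Optional[dict]:
    """One GET per item; the API returns JSON null (decoded as None) for unknown IDs."""
    with urllib.request.urlopen(ITEM_URL.format(item_id), timeout=10) as resp:
        return json.load(resp)


def fetch_many(ids: list[int], fetch: Callable[[int], Optional[dict]] = fetch_item,
               workers: int = 16) -> list[Optional[dict]]:
    """Fetch items concurrently; results come back in the same order as `ids`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, ids))
```

ThreadPoolExecutor.map returns results in input order, so the feed's ranking survives the parallel fetch.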
Real-World Use Cases
1. Competitive Intelligence Dashboard
Automatically surface mentions of competitors, their products, and industry discussions daily. Feed results into a dashboard that flags stories mentioning competitor names. Set it to run daily on searchType: new with maxResults: 100. Sales teams get alerts when competitors are discussed, along with what people like about them and what criticism appears in comments.
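The flagging step can be as simple as a whole-word, case-insensitive match on story titles. A sketch, where COMPETITORS is a hypothetical watch list and the story dicts follow the Sample Output fields:

```python
import re

# Hypothetical watch list; replace with the names you track.
COMPETITORS = ["Acme", "Globex"]


def flag_mentions(stories: list[dict], names: list[str]) -> list[dict]:
    """Return stories whose titles mention any name (whole word, any case)."""
    if not names:
        return []
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b", re.IGNORECASE)
    flagged = []
    for story in stories:
        title = story.get("title") or ""
        found = sorted({match.lower() for match in pattern.findall(title)})
        if found:
            flagged.append({"id": story["id"], "title": title, "mentions": found})
    return flagged
```

Word boundaries keep "Acme" from matching "Acmeville"; scanning comment text as well would need includeComments: true.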
2. AI Training Dataset for Tech Sentiment Analysis
Build production-grade datasets for fine-tuning LLMs on real tech conversations. A 500-result scrape with includeComments: true gives you 10,000-50,000 comments across stories. Combined with story scores and timestamps, those comments carry engagement signals you can use as weak labels for sentiment work.
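Comment text from the HN API arrives as HTML (entities, <p> paragraph breaks, links), so a dataset export usually needs a cleanup pass. A sketch that turns actor output shaped like the Sample Output section into JSONL rows. The row field names (story_id, story_score, text) are assumptions, and the tag-stripping regex is a rough heuristic rather than a full HTML parser.

```python
import html
import json
import re

TAG_RE = re.compile(r"<[^>]+>")


def comment_to_text(raw_html: str) -> str:
    """Convert HN comment HTML to plain text (heuristic)."""
    # HN separates paragraphs with <p>; turn those into newlines, then drop other tags.
    text = TAG_RE.sub("", raw_html.replace("<p>", "\n"))
    # Unescape after stripping tags so escaped "&lt;" is never mistaken for a tag.
    return html.unescape(text).strip()


def to_jsonl(stories: list[dict]) -> str:
    """One JSON line per comment, carrying its story's score and timestamp."""
    lines = []
    for story in stories:
        for comment in story.get("comments", []):
            lines.append(json.dumps({
                "story_id": story["id"],
                "story_score": story.get("score"),
                "time": story.get("time"),
                "text": comment_to_text(comment.get("text", "")),
            }))
    return "\n".join(lines)
```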
3. Automated Newsletter Content
Run the actor daily on searchType: top, extract titles and top-voted comments, feed into your newsletter template. Readers see what the HN community is discussing with context from real discussions.
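A sketch of the digest step: rank stories by score and render titles with links. It uses only story fields shown in the Sample Output; render_digest is a hypothetical helper, and text posts without a url fall back to the HN discussion link.

```python
def render_digest(stories: list[dict], top_n: int = 10) -> str:
    """Plain-text digest of the highest-scoring stories."""
    ranked = sorted(stories, key=lambda s: s.get("score") or 0, reverse=True)[:top_n]
    lines = []
    for rank, story in enumerate(ranked, start=1):
        # Text posts such as Ask HN have no url; link to the HN thread instead.
        link = story.get("url") or f"https://news.ycombinator.com/item?id={story['id']}"
        lines.append(f"{rank}. {story['title']} "
                     f"({story.get('score') or 0} points, {story.get('descendants') or 0} comments)")
        lines.append(f"   {link}")
    return "\n".join(lines)
```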
4. Job Board Aggregation
Set searchType: job and maxResults: 100 to scrape HN's job listings. Automatically notify candidates when companies in target cities are hiring. Extract company names and roles from structured output.
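Job titles on HN often follow a pattern like "Company (YC W21) Is Hiring a Senior Engineer", which a regular expression can split into company, batch, and role. This is a heuristic sketch, not the actor's parser: titles that don't match return None and need manual handling.

```python
import re
from typing import Optional

# Heuristic pattern: "<Company> [(YC <batch>)] is hiring [a|an] <role>".
HIRING_RE = re.compile(
    r"^(?P<company>.+?)\s*(?:\((?P<batch>YC [A-Z]\d{2})\))?\s+is hiring\s+(?:an?\s+)?(?P<role>.+)$",
    re.IGNORECASE,
)


def parse_job_title(title: str) -> Optional[dict]:
    """Split an HN job title into company, YC batch (if any), and role."""
    match = HIRING_RE.match(title.strip())
    if not match:
        return None
    return {"company": match["company"], "batch": match["batch"], "role": match["role"]}
```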
Input Parameters
| Parameter | Type | Range | Description |
|---|---|---|---|
| searchType | string | top, new, best, ask, show, job | Which HN feed to scrape. Default: top |
| maxResults | number | 1-500 | How many stories to extract. Default: 30 |
| includeComments | boolean | true/false | Attach all comments under each story. Default: false |
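A sketch of checking a run_input against the table above and filling in the documented defaults before calling the actor. normalize_input is hypothetical; the actor performs its own validation, which may differ.

```python
# Defaults and allowed values as documented in the Input Parameters table.
DEFAULTS = {"searchType": "top", "maxResults": 30, "includeComments": False}
FEEDS = {"top", "new", "best", "ask", "show", "job"}


def normalize_input(run_input: dict) -> dict:
    """Merge defaults into run_input and reject out-of-range values."""
    merged = {**DEFAULTS, **run_input}
    if merged["searchType"] not in FEEDS:
        raise ValueError(f"searchType must be one of {sorted(FEEDS)}")
    max_results = merged["maxResults"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int) or not 1 <= max_results <= 500:
        raise ValueError("maxResults must be an integer from 1 to 500")
    if not isinstance(merged["includeComments"], bool):
        raise ValueError("includeComments must be true or false")
    return merged
```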
Quick Start
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("nexgendata/hacker-news-scraper").call(run_input={
    "searchType": "top",
    "maxResults": 100,
    "includeComments": False,
})

# Stream the results from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item.get("title"), item.get("score"))
```
Sample Output
```json
{
  "id": 42840302,
  "title": "Building a machine learning model for production",
  "url": "https://example.com/ml-guide",
  "score": 487,
  "descendants": 142,
  "time": 1711723200,
  "type": "story",
  "by": "techauthor",
  "comments": [
    {"id": 42840910, "text": "Great breakdown...", "score": 52, "by": "commentor1"}
  ]
}
```
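Unix timestamps and bare IDs are easier to work with once flattened into a row. A sketch that converts one item shaped like the sample above into a spreadsheet-friendly dict; to_row and its output column names are illustrative. In the HN API, descendants is a story's total comment count.

```python
import json
from datetime import datetime, timezone

SAMPLE = '''{"id": 42840302, "title": "Building a machine learning model for production",
"url": "https://example.com/ml-guide", "score": 487, "descendants": 142,
"time": 1711723200, "type": "story", "by": "techauthor", "comments": []}'''


def to_row(item: dict) -> dict:
    """Flatten one dataset item into a spreadsheet-friendly row."""
    posted = datetime.fromtimestamp(item["time"], tz=timezone.utc)
    return {
        "id": item["id"],
        "title": item.get("title", ""),
        "score": item.get("score", 0),
        "comments": item.get("descendants", 0),  # total comment count reported by HN
        "posted_utc": posted.isoformat(),
        "hn_link": f"https://news.ycombinator.com/item?id={item['id']}",
    }


print(to_row(json.loads(SAMPLE)))
```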
Pricing: $5 per 1,000 Results
Cost breakdown: Scrape 30 stories = $0.15. Scrape 100 stories = $0.50. Scrape 500 stories = $2.50.
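The breakdown above is linear: cost = results × $5 / 1,000. A sketch using Decimal to avoid float rounding; it counts result events only and leaves out the per-run Actor Start charge described under How NexGenData Pricing Works. result_cost is a hypothetical helper.

```python
from decimal import Decimal

# Result price quoted in this section; the Actor Start charge is not included.
PRICE_PER_1000_RESULTS = Decimal("5.00")


def result_cost(n_results: int) -> Decimal:
    """Cost in USD of n result events at the per-1,000 rate."""
    return PRICE_PER_1000_RESULTS * n_results / 1000


for n in (30, 100, 500):
    print(n, result_cost(n))
```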
Building it yourself costs more: 40+ hours to write, test, and deploy a reliable HN scraper (~$2,000 in dev time), plus 5-10 hours/month in maintenance when things change.
FAQ
Will this scraper get blocked or rate-limited? Unlikely. The actor uses Hacker News' own Firebase API, which is public and official, and its documentation states that there is currently no rate limit. HN publicly documents and allows automated access via this API.
How fresh is the data? Each run reads HN's live data through the API, so a story posted seconds before a run starts can appear in that run's output. Between runs, freshness depends on your schedule.
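Each item carries a Unix time field (seconds since the epoch), so freshness can be checked directly on the output. A small sketch; age_minutes is a hypothetical helper, and now can be passed in to make the check reproducible.

```python
import time
from typing import Optional


def age_minutes(story: dict, now: Optional[float] = None) -> float:
    """Minutes since the story was posted, from its Unix `time` field."""
    current = time.time() if now is None else now
    return (current - story["time"]) / 60
```

For example, keeping only stories where age_minutes(story) < 60 limits a run's output to the last hour.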
Can I schedule this to run daily automatically? Yes. Apify handles scheduling natively. Set up a daily run on your preferred search type and let it populate your database automatically.
Is my data private? Completely. All data stays within your Apify account. nexgendata has no access to results, metadata, or usage patterns.
How is this different from the HN Algolia API? HN Algolia is a search index built on top of HN, great for full-text search across years of HN history, but it is rate-limited and its search results don't include the comment tree (threads need a separate per-item call). This actor walks the Firebase tree for you and ships flat comment arrays.
Agentic payments (AI agent buyers welcome)
This actor supports autonomous payment via Skyfire: AI agents (Claude Desktop, OpenCode, Cursor, Vercel AI SDK, OpenAI Agents SDK) can discover, fund, and invoke it without a human-in-the-loop credit card flow.
Agents using Apify's MCP server can find this actor by searching for Hacker News stories, YC News trend monitoring, or tech community signals and pay via a Skyfire PAY token (minimum $5 prefund). The existing pay-per-event pricing applies unchanged: the agent funds a token, runs the actor at the published per-result rate, and unused balance returns to the wallet on expiry.
Compatible agent frameworks:
- Apify's official MCP server (mcp.apify.com)
- Claude Desktop with Apify MCP integration
- OpenCode + Apify MCP
- OpenAI Agents SDK + Skyfire toolkit (via Composio)
- Vercel AI SDK + Skyfire toolkit (via Composio)
No code changes are needed on the actor side; the integration runs entirely on Apify's infrastructure. AI agents discover the actor via the allowsAgenticUsers=true filter on Apify's store API.
Related NexGenData Actors
| Use case | Actor |
|---|---|
| Show HN launch tracker | HN Show HN Tracker |
| Reddit subreddit trend & post tracker | Reddit Subreddit Trends |
| News & media monitoring for AI agents | News MCP Server |
| Indie Hackers product launches | Indie Hackers Products Tracker |
| Product Hunt launches tracker | Product Hunt Launches Scraper |
| Wikipedia structured-knowledge scraper | Wikipedia Scraper |
| Google Scholar paper search | Google Scholar Scraper |
| arXiv preprint search | arXiv Scraper |
About NexGenData
NexGenData publishes 260+ buyer-intent actors covering SEC filings, YC alumni, lead generation, competitive intelligence, stock fundamentals across 30+ exchanges, and more. All pay-per-result. Browse the full catalog at https://apify.com/nexgendata?fpr=2ayu9b
How NexGenData Pricing Works
Every NexGenData actor uses pay-per-event pricing: a start charge per run plus a charge for each result that actually lands in your dataset. No monthly minimum, no seat fees, no surprise overage bills.
- Actor Start: a single-event charge each time you spin the actor up (scaled to memory size)
- Result: charged per item written to the default dataset
- No charge for retries, internal proxy rotation, or failed sub-requests; those are absorbed by the platform
If you only need the data once a quarter, you only pay once a quarter. If you scale to millions of records, the unit cost stays the same.
Apify Platform Bonus
New to Apify? Sign up with the NexGenData referral link: you get free platform credits on signup (enough for several thousand free results) and you help fund the maintenance of this actor fleet.
Integration Surface
Every actor in the NexGenData catalog can be triggered from:
- Apify console: point-and-click run
- Apify API: REST + webhooks
- Apify Python / JS SDKs: programmatic batch
- Zapier, Make.com, n8n: official integrations
- MCP: many actors are exposed as MCP tools for Claude / ChatGPT / Cursor agents
- Schedules: built-in cron for daily / weekly / monthly runs
- Webhooks: POST a notification to any HTTPS endpoint when a run finishes (for example, on the ACTOR.RUN.SUCCEEDED event), then fetch the results from the run's dataset
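Apify webhooks carry run metadata, not the scraped items themselves. A sketch of a receiver step, assuming Apify's default payload template (an eventType plus a resource object holding the run): it extracts the dataset ID, which you can then read with client.dataset(dataset_id).iterate_items() as in Quick Start. dataset_id_from_webhook is a hypothetical helper.

```python
import json
from typing import Optional


def dataset_id_from_webhook(body: bytes) -> Optional[str]:
    """Return the run's default dataset ID for a successful run, else None.

    Assumes Apify's default webhook payload template, where "resource"
    holds the run object.
    """
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    return payload.get("resource", {}).get("defaultDatasetId")
```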
Support
NexGenData maintains 260+ Apify actors and ships updates regularly. Bug reports via the Apify console issues tab get a response within 24 hours. Roadmap requests are welcome โ high-demand features ship in the next version.
Home: thenextgennexus.com
Full catalog: apify.com/nexgendata