Hacker News Scraper
Pricing
Pay per event
Scrape Hacker News stories (top, new, best, ask, show, jobs) plus per-story metadata in one call. We handle pagination, fan-out, retries, and rate-limit pacing — you get typed dataset rows with title, URL, score, author, comment count, timestamp, and permalink.
Rating
0.0
(0)
Developer
DevilScrapes
Maintained by Community
Actor stats
0
Bookmarked
2
Total users
1
Monthly active users
7 days ago
Last modified
🎯 What this scrapes
This Actor walks Hacker News for the story list you pick — top, new, best, ask, show, or jobs — fans out to each story's record, handles retries when the upstream hiccups, and writes one typed dataset row per story. You pick the feed; we deliver clean rows on a schedule, in JSON / CSV / Excel, ready to wire into Sheets, S3, a warehouse, or a webhook.
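The walk described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: the endpoint paths come from the public Hacker News Firebase API, while `scrape_feed`, `fetch_json`, and the injectable `fetch` parameter are hypothetical names chosen for the example, and plain `urllib` stands in for whatever HTTP client the Actor really uses.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

API_BASE = "https://hacker-news.firebaseio.com/v0"

# Feed name -> list endpoint in the public Hacker News API.
FEED_ENDPOINTS = {
    "top": "topstories",
    "new": "newstories",
    "best": "beststories",
    "ask": "askstories",
    "show": "showstories",
    "jobs": "jobstories",
}


def fetch_json(url):
    """Default fetcher: one plain GET, no retries (assumption for the sketch)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def scrape_feed(feed="top", max_results=100, concurrency=8, fetch=fetch_json):
    """Return one dict per story, in feed order, with a 1-indexed rank."""
    ids = fetch(f"{API_BASE}/{FEED_ENDPOINTS[feed]}.json")[:max_results]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map preserves input order, so rank matches the feed position.
        items = list(pool.map(lambda i: fetch(f"{API_BASE}/item/{i}.json"), ids))
    rows = []
    for rank, item in enumerate(items, start=1):
        # Skip missing, dead, or deleted items.
        if not item or item.get("dead") or item.get("deleted"):
            continue
        rows.append({**item, "rank": rank})
    return rows
```

Passing `fetch` as a parameter keeps the fan-out logic testable without network access; a skipped item leaves a gap in `rank` so the remaining ranks still reflect the feed position.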
🔥 What we handle for you
- 🛡️ Browser fingerprint rotation: `curl-cffi` impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python.
- 🌐 Residential proxy rotation via Apify Proxy (off by default; enable it in `proxyConfiguration`): fresh session and exit IP on every block.
- 🔁 Retries with exponential backoff on `408 / 429 / 5xx`: up to 5 attempts per page, `Retry-After` honoured.
- 🧱 Rate-limit-aware pacing: when the target pushes back, we slow down instead of getting banned.
- 🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
- 💰 Pay-Per-Event pricing: apart from the one-off start charge per run, you only pay for results that reach your dataset.
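To make the retry policy concrete, here is a small sketch of how such a schedule could be computed. The retryable statuses and the 5-attempt limit come from the list above; the base delay, the cap, the jitter parameter, and the function names are assumptions for illustration, and only the delay-seconds form of `Retry-After` is handled.

```python
import random

RETRYABLE_STATUSES = {408, 429} | set(range(500, 600))
MAX_ATTEMPTS = 5  # attempts per page, as stated above


def should_retry(status, attempt):
    """`attempt` is 1-indexed; the fifth failed attempt is final."""
    return status in RETRYABLE_STATUSES and attempt < MAX_ATTEMPTS


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, jitter=0.0):
    """Seconds to wait after failed attempt number `attempt`.

    base, cap, and jitter are hypothetical defaults, not the Actor's values.
    """
    if retry_after is not None:
        try:
            # Honour Retry-After when it is given as delay-seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form: fall back to exponential backoff
    delay = min(cap, base * 2 ** (attempt - 1))
    return delay + random.uniform(0, jitter)
```

With the defaults, the waits after attempts 1 to 4 are 1, 2, 4, and 8 seconds; a non-zero `jitter` spreads concurrent workers so they do not retry in lockstep.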
💡 Use cases
- Trend monitoring: diff top stories hourly to see which posts gain traction fastest.
- Comment-volume alerts: pipe rows into Slack when a story passes 100 comments.
- Lead gen for dev tools: surface Show HN launches that mention your stack and reach out.
- Newsletter curation: feed the top 10 stories from `best` into a weekly digest.
- ML training data: historical top story metadata for clickbait or score-prediction models.
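As a rough illustration of the first two use cases, the sketch below diffs two snapshots of the same feed and filters busy threads. It assumes rows shaped like this Actor's output (`id`, `title`, `score`, `descendants`); the function names and the threshold parameter are hypothetical.

```python
def score_gains(previous, current):
    """Return (id, title, gain) tuples, largest score gain first."""
    earlier = {row["id"]: row.get("score") or 0 for row in previous}
    gains = []
    for row in current:
        if row["id"] not in earlier:
            continue  # new to the feed, so there is no baseline to diff against
        gain = (row.get("score") or 0) - earlier[row["id"]]
        gains.append((row["id"], row["title"], gain))
    return sorted(gains, key=lambda g: g[2], reverse=True)


def busy_threads(rows, min_comments=100):
    """Rows whose comment count has passed the threshold."""
    return [r for r in rows if (r.get("descendants") or 0) > min_comments]
```

Both helpers treat a `null` score or comment count as zero, which matches how jobs appear in the output.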
⚙️ How to use it
1. Click Try for free at the top of the page.
2. Fill in the input form; most fields have sensible defaults.
3. Click Start. Output streams into the run's dataset.
4. Export from Storage → Dataset as JSON, CSV, or Excel, or fetch via the API.
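For the API route, a dataset's items can be downloaded from the Apify API's dataset items endpoint. The sketch below only builds the download URL; `dataset_items_url` is a hypothetical helper, the dataset ID is whatever your run reports, and private datasets would additionally need your API token (for example in an `Authorization: Bearer` header).

```python
from urllib.parse import urlencode

APIFY_API = "https://api.apify.com/v2"
EXPORT_FORMATS = {"json", "csv", "xlsx"}  # xlsx is the Excel export


def dataset_items_url(dataset_id, fmt="json", limit=None):
    """Build the items URL for a dataset in the requested export format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    params = {"format": fmt}
    if limit is not None:
        params["limit"] = limit
    return f"{APIFY_API}/datasets/{dataset_id}/items?{urlencode(params)}"
```

The resulting URL can be fetched with any HTTP client, a spreadsheet import, or a scheduled job.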
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `feed` | string | no | `"top"` | Which Hacker News story feed to pull from. `top` mirrors the front page; `best` is the time-decaye… |
| `maxResults` | integer | no | `100` | Total dataset items to keep. The `top`, `new`, and `best` feeds expose up to 500 items each (`ask`, `show`, and `jobs` up to 200); pulling a full 500-item feed is billed as ~500 results. Set to … |
| `includeText` | boolean | no | `true` | Fetch the full self-post body for Ask HN / Show HN entries. Has no effect on regular link stories. |
| `concurrency` | integer | no | `8` | How many story records to fetch in parallel. The Firebase endpoint is generous; 8 is comfortable. |
| `proxyConfiguration` | object | no | `{"useApifyProxy": false}` | Apify Proxy is optional here: the Firebase API is happy to be hit directly. Enable proxy only if you're routing all tra… |
Example input
{"feed": "top","maxResults": 3,"includeText": false,"concurrency": 4,"proxyConfiguration": {"useApifyProxy": false}}
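A client-side check of the input before starting a run might look like the sketch below. The field names and defaults mirror the input table; the `ActorInput` class, the `parse_input` helper, and the lower bounds on `maxResults` and `concurrency` are assumptions for illustration, not the Actor's own schema.

```python
from dataclasses import dataclass, field

FEEDS = ("top", "new", "best", "ask", "show", "jobs")


@dataclass(frozen=True)
class ActorInput:
    feed: str = "top"
    maxResults: int = 100
    includeText: bool = True
    concurrency: int = 8
    proxyConfiguration: dict = field(
        default_factory=lambda: {"useApifyProxy": False}
    )

    def __post_init__(self):
        if self.feed not in FEEDS:
            raise ValueError(f"feed must be one of {FEEDS}, got {self.feed!r}")
        # Assumed bounds: at least one result and one worker.
        if self.maxResults < 1:
            raise ValueError("maxResults must be >= 1")
        if self.concurrency < 1:
            raise ValueError("concurrency must be >= 1")


def parse_input(raw):
    """Apply defaults to a raw input dict; unknown keys raise TypeError."""
    return ActorInput(**raw)
```

Calling `parse_input({})` yields the documented defaults, so omitted fields behave the same as in the input form.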
📤 Output
Every row is one dataset item.
| Field | Type | Notes |
|---|---|---|
| `id` | integer | Hacker News story ID (stable, monotonically increasing). |
| `type` | string | HN item type: story, job, ask, show, comment, poll. |
| `title` | string | Story headline. |
| `url` | string or null | Outbound link (null for self-posts). |
| `permalink` | string | Hacker News permalink (news.ycombinator.com/item?id=...). |
| `by` | string | Author username on Hacker News. |
| `score` | integer or null | Upvotes; null for jobs / dead items. |
| `descendants` | integer or null | Total comment count, including replies. |
| `text` | string or null | Self-post body (Ask HN / Show HN). HTML, only present when `includeText=true`. |
| `time` | integer | Unix epoch seconds when the story was posted. |
| `posted_at` | string | ISO-8601 UTC timestamp derived from `time`. |
| `scraped_at` | string | ISO-8601 UTC timestamp of when this row was recorded. |
| `rank` | integer | Position of this story in the feed at scrape time (1-indexed). |
Example output
{"id": 39000000, "type": "story", "title": "Show HN: Devil Scrapes \u2014 public-data Apify Actors with honest pricing", "url": "https://apify.com/DevilScrapes", "permalink": "https://news.ycombinator.com/item?id=39000000", "by": "devilscrapes", "score": 142, "descendants": 33, "text": null, "time": 1778875200, "posted_at": "2026-05-15T20:00:00+00:00", "scraped_at": "2026-05-15T20:05:00+00:00", "rank": 1}
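The derived columns can be reproduced from a raw API item with the standard library alone. This is a sketch of one plausible derivation, assuming `posted_at` is simply `time` rendered as an ISO-8601 UTC string; `derive_fields` and its `now` parameter are hypothetical names.

```python
from datetime import datetime, timezone


def derive_fields(item, rank, now=None):
    """Compute permalink, posted_at, scraped_at, and rank for one raw item."""
    now = now or datetime.now(timezone.utc)
    posted = datetime.fromtimestamp(item["time"], tz=timezone.utc)
    return {
        "permalink": f"https://news.ycombinator.com/item?id={item['id']}",
        "posted_at": posted.isoformat(),
        "scraped_at": now.isoformat(),
        "rank": rank,
    }


row = derive_fields({"id": 1, "time": 1700000000}, rank=1)
print(row["posted_at"])  # 2023-11-14T22:13:20+00:00
```

Keeping both `time` and `posted_at` in each row lets you sort numerically while still having a human-readable timestamp.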
💰 Pricing
Pay-Per-Event — you pay only when these events fire:
| Event | USD | What it is |
|---|---|---|
actor-start | $0.005 | One-off warm-up charge per run |
result | $0.002 | Per dataset item |
Example: 1,000 results at the rates above cost 1,000 × $0.002 = $2.00, plus the $0.005 start charge, so about $2.005 for the run. No subscription, no minimum, no card to start: Apify gives every new account $5 of free credit.
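The arithmetic can be checked with a few lines of code. The two rates are taken from the table above; `run_cost` is a hypothetical helper, and it assumes the start charge is billed exactly once per run.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.005")  # one-off charge per run
RESULT_USD = Decimal("0.002")       # per dataset item


def run_cost(results):
    """Estimated USD cost of a single run that stores `results` items."""
    if results < 0:
        raise ValueError("results must be non-negative")
    return ACTOR_START_USD + RESULT_USD * results


print(run_cost(1000))  # 2.005
```

`Decimal` avoids the floating-point rounding noise that would otherwise show up when summing many small charges.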
🚧 Limitations
Comment threads aren't expanded — we return descendants (the count) but not the tree. Dead and deleted stories are skipped. The text field for self-posts is HTML, not Markdown — render it through your own sanitiser before display.
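If you only need plain text rather than rendered HTML, one option is to strip the markup with the standard library. The sketch below is an assumption about how you might post-process the `text` field, not part of the Actor; `html_to_text` is a hypothetical helper, and it is a text extractor, not an HTML sanitiser for rendering markup in a browser.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes and turn <p> tags into paragraph breaks."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the text content of an HTML fragment; None becomes ''."""
    if not html:
        return ""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts).strip()
```

Because `text` is `null` for link stories, the helper returns an empty string instead of raising.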
❓ FAQ
Is this legal?
Yes — Hacker News publishes this data through its documented Firebase API (https://github.com/HackerNews/API). We respect their terms, surface every fetch in the logs, and pace requests so we stay a good citizen on the upstream.
Why use this instead of the Firebase API directly?
You get scheduled Apify runs, typed output rows, ISO timestamps, and the rank column the raw API doesn't provide. Plus it integrates with every Apify downstream — Sheets, S3, webhooks.
Can I scrape comments too?
Not in this Actor — comment trees fan out 10–100x per story and would blow up cost. We'll ship a sibling hacker-news-comments-scraper if there's demand.
How fresh is the data?
The Firebase API updates in near-real-time. Your run reflects whatever the front page looked like the moment your scrape hit each story.
💬 Your feedback
Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.