Hacker News Scraper
Pricing
Pay per event
Scrape Hacker News stories (top, new, best, ask, show, jobs) plus per-story metadata in one call — title, URL, score, author, comment count, posted-at — export to JSON or CSV. A Hacker News API wrapper that handles pagination, fan-out, retries, and rate-limit pacing.
Developer
DevilScrapes
Maintained by Community
🎯 What this scrapes
This Actor fetches Hacker News story lists from any of the six available feeds — top, new, best, ask, show, or jobs — fans out to each individual story record, and writes one typed dataset row per story. The underlying Firebase API returns only item IDs at the feed level; we perform the full N+1 enrichment call per story and assemble the complete record before it hits your dataset.
You pick the feed and the result cap; we deliver clean, schema-validated rows on a schedule in JSON, CSV, or Excel — ready to pipe into Google Sheets, S3, a data warehouse, a webhook, or a RAG pipeline.
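The fan-out described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: it assumes the public Firebase endpoints documented in the Hacker News API README, uses plain `urllib` rather than `curl-cffi`, and the names `enrich_feed` and `fetch_json` are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

BASE = "https://hacker-news.firebaseio.com/v0"
FEED_ENDPOINTS = {
    "top": "topstories", "new": "newstories", "best": "beststories",
    "ask": "askstories", "show": "showstories", "jobs": "jobstories",
}


def fetch_json(url):
    """Plain stdlib GET, standing in for whatever HTTP client the Actor uses."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def enrich_feed(feed, max_results=100, concurrency=8, fetch=fetch_json):
    """One call for the ID list, then one call per story (the N+1 pattern)."""
    ids = fetch(f"{BASE}/{FEED_ENDPOINTS[feed]}.json")
    if max_results:  # 0 means "the full feed"
        ids = ids[:max_results]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map keeps results in feed order, so rank can be derived below.
        items = list(pool.map(lambda i: fetch(f"{BASE}/item/{i}.json"), ids))
    rows = []
    for rank, item in enumerate(items, start=1):
        # Skip missing, dead, and deleted records; rank stays the feed position.
        if not item or item.get("dead") or item.get("deleted"):
            continue
        rows.append({**item, "rank": rank})
    return rows
```

Passing `fetch` as a parameter keeps the fan-out logic testable without network access; `enrich_feed("show", max_results=10)` would hit the live API.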
🔥 Features
- 🛡️ Browser fingerprint rotation: `curl-cffi` impersonates real Chrome, Firefox, and Safari TLS handshakes so we look like a browser, not a Python script.
- 🌐 Residential proxy rotation via Apify Proxy: fresh session and exit IP on every block or rate-limit response.
- 🔁 Retries with exponential backoff on `408 / 429 / 5xx`: up to 5 attempts per request, `Retry-After` header honoured.
- 🧱 Rate-limit-aware pacing: when the target pushes back we slow down and surface a clear status message instead of silently returning an empty dataset.
- 🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, stable IDs, feed rank included.
- 💰 Pay-Per-Event pricing: you pay only for results that land in your dataset. No data, no charge.
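The retry behaviour listed above might look something like the sketch below. It is an assumption-laden illustration rather than the Actor's implementation: `send` is a hypothetical zero-argument callable returning `(status, headers, body)`, the base delay and jitter are made up, and only the seconds form of `Retry-After` is handled (not the HTTP-date form).

```python
import random
import time

RETRYABLE = {408, 429} | set(range(500, 600))


def fetch_with_retries(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt == max_attempts:
            break  # no point sleeping after the final attempt
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            # The server told us how long to wait: honour it.
            delay = float(retry_after)
        else:
            # Exponential backoff (1s, 2s, 4s, ...) plus a little jitter.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
        sleep(delay)
    raise RuntimeError(f"gave up after {max_attempts} attempts (last status {status})")
```

Injecting `sleep` makes the pacing observable in tests; in production the default `time.sleep` applies.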
💡 Use cases
- Trend monitoring: diff top stories hourly to see which posts gain traction fastest.
- Comment-volume alerts: pipe rows into Slack when a story passes 100 comments.
- Lead gen for dev tools: surface Show HN launches that mention your stack and reach out early.
- Newsletter curation: feed the top 10 stories from the `best` feed into a weekly digest.
- ML training data: historical top-story metadata for score-prediction or topic-classification models.
- Show HN tracker: schedule a daily run against the `show` feed to watch new product launches.
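As a rough illustration of the trend-monitoring idea, the sketch below compares two dataset snapshots and reports which stories climbed the feed. The function name `rank_movers` is hypothetical; it assumes each row carries the `id`, `title`, and `rank` fields from the output schema.

```python
def rank_movers(previous, current):
    """Return stories that moved up the feed between two runs, biggest climb first."""
    prev_rank = {row["id"]: row["rank"] for row in previous}
    movers = []
    for row in current:
        old = prev_rank.get(row["id"])
        # A smaller rank number means a higher position in the feed.
        if old is not None and old > row["rank"]:
            movers.append(
                {"id": row["id"], "title": row["title"], "climbed": old - row["rank"]}
            )
    return sorted(movers, key=lambda m: m["climbed"], reverse=True)
```

Stories that appear only in the newer snapshot are ignored here; treating them as new entrants would be a reasonable extension.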
⚙️ How to use it
- Click Try for free at the top of the Store page.
- Fill in the input form. Most fields have sensible defaults (feed: `top`, max results: 100).
- Click Start. Output streams into the run's dataset in real time.
- Export from Storage → Dataset as JSON, CSV, or Excel, or pull via the Apify REST API.
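For the REST route, a minimal sketch is shown below. It assumes the Apify v2 "get dataset items" endpoint with its `format`, `clean`, and `token` query parameters; the helper names are hypothetical, and the dataset ID and token are placeholders you would copy from Apify Console.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def dataset_items_url(dataset_id, token, fmt="json"):
    """Build the export URL for a run's dataset (fmt: json, csv, xlsx, ...)."""
    query = urlencode({"format": fmt, "clean": "true", "token": token})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"


def download_items(dataset_id, token):
    """Fetch all rows as a list of dicts (JSON format only)."""
    with urlopen(dataset_items_url(dataset_id, token), timeout=60) as resp:
        return json.load(resp)
```

Putting the token in the query string is convenient for quick scripts; for anything shared, sending it in an `Authorization: Bearer` header is the safer choice.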
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `feed` | string | no | `top` | Which story feed to pull: `top` (front page), `new` (most recent), `best` (time-decayed best), `ask`, `show`, or `jobs`. |
| `maxResults` | integer | no | `100` | Total dataset rows to produce. The upstream API exposes up to 500 items for `top`, `new`, and `best` and up to 200 for `ask`, `show`, and `jobs`; set to 0 for the full feed length. |
| `includeText` | boolean | no | `true` | Fetch the full self-post body for Ask HN and Show HN entries. Has no effect on regular link stories. |
| `concurrency` | integer | no | `8` | How many story records to fetch in parallel (1–32). |
| `proxyConfiguration` | object | no | `{"useApifyProxy": false}` | Apify Proxy configuration. Enable residential proxies if you need to route traffic through Apify for compliance or high-volume runs. |
Example input
{"feed": "top", "maxResults": 3, "includeText": false, "concurrency": 4, "proxyConfiguration": {"useApifyProxy": false}}
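The constraints in the input table can be checked before a run is started. The sketch below is one way to do that with a stdlib dataclass; the Actor itself is described as Pydantic-validated, so treat `ActorInput` and `parse_input` as hypothetical stand-ins. `proxyConfiguration` is deliberately ignored here.

```python
from dataclasses import dataclass

FEEDS = ("top", "new", "best", "ask", "show", "jobs")


@dataclass(frozen=True)
class ActorInput:
    feed: str = "top"
    maxResults: int = 100
    includeText: bool = True
    concurrency: int = 8

    def __post_init__(self):
        if self.feed not in FEEDS:
            raise ValueError(f"feed must be one of {FEEDS}, got {self.feed!r}")
        if self.maxResults < 0:
            raise ValueError("maxResults must be 0 (full feed) or a positive integer")
        if not 1 <= self.concurrency <= 32:
            raise ValueError("concurrency must be between 1 and 32")


def parse_input(raw):
    """Apply defaults and validate; keys not modelled here are dropped."""
    known = {k: v for k, v in raw.items() if k in ActorInput.__dataclass_fields__}
    return ActorInput(**known)
```

Calling `parse_input({})` yields the documented defaults, while an out-of-range value such as `{"concurrency": 64}` raises `ValueError`.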
📤 Output
Every row is one dataset item.
| Field | Type | Notes |
|---|---|---|
| `id` | integer | Hacker News story ID (stable, monotonically increasing). |
| `type` | string | HN item type: story, job, ask, show, comment, poll. |
| `title` | string | Story headline. |
| `url` | string \| null | Outbound link (null for self-posts). |
| `permalink` | string | Hacker News permalink (news.ycombinator.com/item?id=...). |
| `by` | string | Author username on Hacker News. |
| `score` | integer \| null | Upvotes; null for jobs and dead items. |
| `descendants` | integer \| null | Total comment count, including replies. |
| `text` | string \| null | Self-post body (Ask HN / Show HN). HTML; only present when `includeText` is true. |
| `time` | integer | Unix epoch seconds: when the story was posted. |
| `posted_at` | string | ISO-8601 UTC timestamp derived from `time`. |
| `scraped_at` | string | ISO-8601 UTC timestamp of when this row was recorded. |
| `rank` | integer | Position of this story in the feed at scrape time (1-indexed). |
Example output
{"id": 39000000, "type": "story", "title": "Show HN: Devil Scrapes — public-data Apify Actors with honest pricing", "url": "https://apify.com/DevilScrapes", "permalink": "https://news.ycombinator.com/item?id=39000000", "by": "devilscrapes", "score": 142, "descendants": 33, "text": null, "time": 1778875200, "posted_at": "2026-05-15T20:00:00+00:00", "scraped_at": "2026-05-15T20:05:00+00:00", "rank": 1}
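The derived columns (`permalink`, `posted_at`, `scraped_at`, `rank`) can be computed from a raw API item along these lines. This is a sketch under the assumption that the Actor uses standard UTC ISO-8601 formatting; `derive_fields` is a hypothetical name.

```python
from datetime import datetime, timezone


def derive_fields(item, rank, now=None):
    """Add the columns the raw Hacker News API does not provide."""
    now = now or datetime.now(timezone.utc)
    posted = datetime.fromtimestamp(item["time"], tz=timezone.utc)
    return {
        **item,
        "permalink": f"https://news.ycombinator.com/item?id={item['id']}",
        "posted_at": posted.isoformat(),
        "scraped_at": now.replace(microsecond=0).isoformat(),
        "rank": rank,
    }
```

For instance, a `time` of `1778875200` converts to `2026-05-15T20:00:00+00:00`.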
💰 Pricing
Pay-Per-Event — you pay only when these events fire:
| Event | USD | What it is |
|---|---|---|
actor-start | $0.005 | One-off warm-up charge per run |
result | $0.002 | Per dataset item written |
Example: 1,000 results in one run cost 1,000 × $0.002 + $0.005 = $2.005, or about $2.00. No subscription, no minimum, no card to start: Apify gives every new account $5 of free credit.
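To budget a schedule, the two event prices can be combined in a small calculator. The rates below are copied from the pricing table; the function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.005")  # charged once per run
PER_RESULT = Decimal("0.002")   # charged per dataset item written


def estimate_cost(results, runs=1):
    """Total USD for `results` rows spread over `runs` runs."""
    return ACTOR_START * runs + PER_RESULT * results
```

An hourly schedule pulling 100 stories per run works out to `estimate_cost(2400, runs=24)`, which is $4.92 per day at these rates.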
🚧 Limitations
Comment threads are not expanded: we return descendants (the count) but not the full tree. Dead and deleted stories are skipped automatically. The text field for self-posts is raw HTML, not Markdown, so run it through your own sanitiser before display. The upstream API caps the top, new, and best feeds at 500 items and the ask, show, and jobs feeds at 200; we cannot exceed that without supplementing via search.
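If you only need plain text from the `text` field, a stdlib-only extraction like the sketch below may be enough. It is not a full HTML sanitiser: it drops all tags, decodes entities, and treats `<p>` as a paragraph break, which matches how Hacker News typically separates paragraphs. The name `html_to_text` is hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; and &#x27;
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.parts.append("\n\n")  # paragraph break

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Strip tags and decode entities; returns '' for None or empty input."""
    parser = _TextExtractor()
    parser.feed(html or "")
    parser.close()
    return "".join(parser.parts).strip()
```

Link targets are discarded in this version; keeping `href` values would mean inspecting `attrs` in `handle_starttag`.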
❓ FAQ
Is scraping Hacker News legal?
Yes — Y Combinator makes Hacker News data available through a documented, open API at github.com/HackerNews/API. We fetch only what that API surfaces, pace requests responsibly, and surface every call in the run log.
Why use this instead of calling the API myself?
The raw API returns an array of item IDs at the feed level — you need a separate round-trip per story to get title, score, and comment count. At 500 stories that is 501 HTTP calls to coordinate, de-duplicate, and fan out concurrently. We do that work, add ISO timestamps, attach the feed rank column (which the API does not expose), and deliver structured rows you can export or schedule without writing a line of code.
How do I set up a Show HN tracker?
Set feed to show and schedule your run on a cron. Each run captures the Show HN feed at that point in time with title, score, comment count, and author — ready for a Slack alert or spreadsheet diff without any glue code.
Can I export Hacker News data to a spreadsheet?
Yes — finish a run, open Storage → Dataset, and click Export as CSV or Export as Excel. Every field in the output table maps cleanly to a spreadsheet column. You can also connect the dataset URL directly to a Google Sheets IMPORTDATA formula.
Can I scrape comments too?
Not in this Actor — comment trees fan out 10-100x per story and would multiply cost significantly. A sibling hacker-news-comments-scraper will follow if there is enough demand.
How fresh is the data?
The upstream API reflects changes in near-real time. Your run captures whatever the feed contained the moment each story record was fetched.
💬 Your feedback
Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.