Pricing

Pay per event

Hacker News Scraper

Scrape Hacker News stories (top, new, best, ask, show, jobs) with per-story metadata in a single run: title, URL, score, author, comment count, and posted-at timestamp. Export to JSON, CSV, or Excel. A Hacker News API wrapper that handles pagination, fan-out, retries, and rate-limit pacing.

Developer

DevilScrapes

Last modified

2 months ago

🎯 What this scrapes

This Actor fetches Hacker News story lists from any of the six available feeds (top, new, best, ask, show, or jobs), fans out to each individual story record, and writes one typed dataset row per story. The underlying Firebase API returns only item IDs at the feed level, so we make one follow-up request per story (N+1 requests for N stories) and assemble the complete record before it hits your dataset.

You pick the feed and the result cap; we deliver clean, schema-validated rows on a schedule in JSON, CSV, or Excel — ready to pipe into Google Sheets, S3, a data warehouse, a webhook, or a RAG pipeline.
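For readers who want to see what the fan-out involves, here is a minimal sketch against the public Hacker News API. It is not this Actor's source code: the function names, the injectable `fetch` parameter, and the thread-pool approach are assumptions made for illustration, and it omits retries and proxies.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

API_BASE = "https://hacker-news.firebaseio.com/v0"
FEED_ENDPOINTS = {
    "top": "topstories", "new": "newstories", "best": "beststories",
    "ask": "askstories", "show": "showstories", "jobs": "jobstories",
}

def fetch_json(url):
    """Fetch a URL and decode its JSON body (no retries in this sketch)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)

def scrape_feed(feed="top", max_results=100, concurrency=8, fetch=fetch_json):
    """Return story rows for one feed, each tagged with its 1-indexed rank."""
    # Request 1: the feed endpoint returns a plain list of item IDs.
    ids = fetch(f"{API_BASE}/{FEED_ENDPOINTS[feed]}.json")
    if max_results > 0:  # 0 means "the full feed"
        ids = ids[:max_results]
    # Requests 2..N+1: one item lookup per ID, fetched in parallel.
    urls = [f"{API_BASE}/item/{item_id}.json" for item_id in ids]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        items = list(pool.map(fetch, urls))  # map() preserves feed order
    rows = []
    for rank, item in enumerate(items, start=1):
        if item is None or item.get("dead") or item.get("deleted"):
            continue  # skipped items leave a gap in the rank sequence
        rows.append({**item, "rank": rank})
    return rows
```

Passing a stub as `fetch` lets you exercise the ranking and skipping logic without touching the network.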

🔥 Features

🛡️ Browser fingerprint rotation — curl-cffi impersonates real Chrome, Firefox, and Safari TLS handshakes so we look like a browser, not a Python script.
🌐 Residential proxy rotation via Apify Proxy — fresh session and exit IP on every block or rate-limit response.
🔁 Retries with exponential backoff on 408 / 429 / 5xx — up to 5 attempts per request, Retry-After header honoured.
🧱 Rate-limit-aware pacing — when the target pushes back we slow down and surface a clear status message instead of silently returning an empty dataset.
🧊 Clean, typed dataset rows — Pydantic-validated, ISO-8601 timestamps, stable IDs, feed rank included.
💰 Pay-Per-Event pricing — you pay only for results that land in your dataset. No data, no charge.
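The retry policy described in the list above can be sketched roughly as follows. This is an illustration only, not the Actor's implementation: `send` is a hypothetical callable returning `(status, headers, body)`, and the base delay and cap are assumed values.

```python
import time

RETRYABLE = {408, 429} | set(range(500, 600))

def backoff_delay(attempt, retry_after=None, base=1.0, cap=30.0):
    """Seconds to wait before the next try, given the 1-based attempt number."""
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)  # server-provided delay wins
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled here
    return min(base * 2 ** (attempt - 1), cap)  # 1s, 2s, 4s, 8s, ...

def request_with_retries(send, max_attempts=5, sleep=time.sleep):
    """Call send() until the status is not retryable or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    return status, body  # last retryable response after giving up
```

Injecting `sleep` keeps the function easy to test without real waiting.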

💡 Use cases

Trend monitoring — diff top stories hourly to see which posts gain traction fastest.
Comment-volume alerts — pipe rows into Slack when a story passes 100 comments.
Lead gen for dev tools — surface Show HN launches that mention your stack and reach out early.
Newsletter curation — feed the top 10 stories from the best feed into a weekly digest.
ML training data — historical top-story metadata for score-prediction or topic-classification models.
Show HN tracker — schedule a daily run against the show feed to watch new product launches.
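As a rough illustration of the trend-monitoring and comment-alert ideas above, the sketch below compares two runs of dataset rows. The function names are hypothetical; the only assumption about the data is that rows carry the `id`, `title`, `score`, and `descendants` fields from the output table.

```python
def score_gains(previous, current):
    """Return (id, title, gain) tuples, largest score gain first."""
    before = {row["id"]: row.get("score") or 0 for row in previous}
    gains = [
        (row["id"], row["title"], (row.get("score") or 0) - before[row["id"]])
        for row in current
        if row["id"] in before  # only stories present in both runs
    ]
    return sorted(gains, key=lambda g: g[2], reverse=True)

def newly_over_threshold(previous, current, threshold=100):
    """Return rows whose comment count reached the threshold since the last run."""
    before = {row["id"]: row.get("descendants") or 0 for row in previous}
    return [
        row for row in current
        if (row.get("descendants") or 0) >= threshold
        and before.get(row["id"], 0) < threshold
    ]
```

Feeding the result of `newly_over_threshold` to a Slack webhook would give the alert described above; that wiring is left out here.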

⚙️ How to use it

1. Click Try for free at the top of the Store page.
2. Fill in the input form. Every field has a sensible default (feed: top, max results: 100).
3. Click Start. Output streams into the run's dataset in real time.
4. Export from Storage → Dataset as JSON, CSV, or Excel, or pull via the Apify REST API.
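If you prefer pulling results programmatically, the following sketch builds the dataset items URL for the Apify API and downloads the rows. The helper names are made up for this example, and the dataset ID and token are placeholders you would copy from your own run and account.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://api.apify.com/v2"

def dataset_items_url(dataset_id, fmt="json", limit=None):
    """Build the items URL for a dataset; fmt can be json, csv, xlsx, and others."""
    params = {"format": fmt}
    if limit is not None:
        params["limit"] = limit
    return f"{API}/datasets/{dataset_id}/items?{urlencode(params)}"

def fetch_items(dataset_id, token, limit=None):
    """Download dataset rows as a list of dicts (performs a network call)."""
    request = Request(
        dataset_items_url(dataset_id, "json", limit),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

For example, `dataset_items_url("<your-dataset-id>", "csv")` gives a link you can hand to a spreadsheet import.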

📥 Input

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| `feed` | `string` | no | `top` | Which story feed to pull: `top` (front page), `new` (most recent), `best` (time-decayed best), `ask`, `show`, or `jobs`. |
| `maxResults` | `integer` | no | `100` | Total dataset rows to produce. Feeds are capped upstream (documented as up to 500 items for `top` and `new`, and up to 200 for `ask`, `show`, and `jobs`); set to `0` for the full feed length. |
| `includeText` | `boolean` | no | `true` | Fetch the full self-post body for Ask HN and Show HN entries. Has no effect on regular link stories. |
| `concurrency` | `integer` | no | `8` | How many story records to fetch in parallel (1–32). |
| `proxyConfiguration` | `object` | no | `{"useApifyProxy": false}` | Apify Proxy configuration. Enable residential proxies if you need to route traffic through Apify for compliance or high-volume runs. |

Example input

{
  "feed": "top",
  "maxResults": 3,
  "includeText": false,
  "concurrency": 4,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
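To make the defaults and ranges from the input table concrete, here is a small sketch that fills in missing fields and rejects out-of-range values. It is a hypothetical helper written for this page, not the Actor's own validation code.

```python
FEEDS = {"top", "new", "best", "ask", "show", "jobs"}
DEFAULTS = {
    "feed": "top",
    "maxResults": 100,
    "includeText": True,
    "concurrency": 8,
    "proxyConfiguration": {"useApifyProxy": False},
}

def normalize_input(raw):
    """Merge user input over the documented defaults and check the ranges."""
    cfg = {**DEFAULTS, **raw}
    if cfg["feed"] not in FEEDS:
        raise ValueError(f"unknown feed: {cfg['feed']!r}")
    if cfg["maxResults"] < 0:
        raise ValueError("maxResults must be 0 (full feed) or a positive number")
    if not 1 <= cfg["concurrency"] <= 32:
        raise ValueError("concurrency must be between 1 and 32")
    return cfg
```

Calling `normalize_input({"feed": "show", "maxResults": 3})` returns the full configuration with the remaining fields set to their defaults.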

📤 Output

Every row is one dataset item.

| Field | Type | Notes |
| --- | --- | --- |
| `id` | `integer` | Hacker News story ID (stable, monotonically increasing). |
| `type` | `string` | HN item type: `story`, `job`, or `poll`. Ask HN and Show HN posts are `story` items. |
| `title` | `string` | Story headline. |
| `url` | `string \| null` | Outbound link (null for self-posts). |
| `permalink` | `string` | Hacker News permalink (`news.ycombinator.com/item?id=...`). |
| `by` | `string` | Author username on Hacker News. |
| `score` | `integer \| null` | Upvotes; null for jobs and dead items. |
| `descendants` | `integer \| null` | Total comment count, including replies. |
| `text` | `string \| null` | Self-post body (Ask HN / Show HN) as HTML; null unless `includeText` is `true`. |
| `time` | `integer` | Unix epoch seconds at which the story was posted. |
| `posted_at` | `string` | ISO-8601 UTC timestamp derived from `time`. |
| `scraped_at` | `string` | ISO-8601 UTC timestamp of when this row was recorded. |
| `rank` | `integer` | Position of this story in the feed at scrape time (1-indexed). |

Example output

{
  "id": 39000000,
  "type": "story",
  "title": "Show HN: Devil Scrapes - public-data Apify Actors with honest pricing",
  "url": "https://apify.com/DevilScrapes",
  "permalink": "https://news.ycombinator.com/item?id=39000000",
  "by": "devilscrapes",
  "score": 142,
  "descendants": 33,
  "text": null,
  "time": 1747353600,
  "posted_at": "2025-05-16T00:00:00+00:00",
  "scraped_at": "2025-05-16T00:05:00+00:00",
  "rank": 1
}
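The `posted_at` and `permalink` fields are derived from `time` and `id`. A minimal sketch of that derivation, using hypothetical helper names and only the Python standard library:

```python
from datetime import datetime, timezone

def to_iso_utc(epoch_seconds):
    """Convert Unix epoch seconds to an ISO-8601 UTC timestamp string."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()

def permalink(item_id):
    """Build the Hacker News discussion URL for an item ID."""
    return f"https://news.ycombinator.com/item?id={item_id}"

print(to_iso_utc(1747353600))  # 2025-05-16T00:00:00+00:00
print(permalink(39000000))     # https://news.ycombinator.com/item?id=39000000
```

Passing `tz=timezone.utc` matters: without it, `fromtimestamp` uses the machine's local timezone and the string would carry no offset.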

💰 Pricing

Pay-Per-Event — you pay only when these events fire:

Event	USD	What it is
`actor-start`	$0.005	One-off warm-up charge per run
`result`	$0.002	Per dataset item written

Example: 1,000 results at the rates above cost 1,000 × $0.002 = $2.00, plus the $0.005 start charge, for $2.005 per run. No subscription, no minimum, and no card to start: Apify gives every new account $5 of free credit.
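To estimate a bill for your own schedule, the arithmetic can be sketched as below. The rates are the ones in the table above; the function name is made up for this example, and `Decimal` is used so the cents add up exactly.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.005")  # charged once per run
PER_RESULT = Decimal("0.002")   # charged per dataset item written

def run_cost(results, runs=1):
    """Total USD for `runs` runs that write `results` rows in total."""
    return ACTOR_START * runs + PER_RESULT * results

print(run_cost(1000))           # one run, 1,000 rows
print(run_cost(2400, runs=24))  # hourly top-100 runs for one day
```

The second call models 24 hourly runs of 100 rows each, which comes to $0.12 in start charges plus $4.80 in results.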

🚧 Limitations

Comment threads are not expanded: we return descendants (the count) but not the full tree. Dead and deleted stories are skipped automatically. The text field for self-posts is raw HTML, not Markdown, so run it through your own sanitiser before display. Feed length is capped by the upstream API (documented as up to 500 items for top and new, and up to 200 for ask, show, and jobs); we cannot exceed that without supplementing via search.
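If you only need plain text from the `text` field, a standard-library sketch like the one below may be enough. It strips tags and decodes entities; it is a text extractor rather than a security sanitiser, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect text content, turning <p> tags into paragraph breaks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; and friends
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(html):
    """Return the plain-text content of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts).strip()
```

If you intend to render the HTML itself in a web page, use a dedicated sanitising library instead of this helper.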

❓ FAQ

Is scraping Hacker News legal?

Yes — Y Combinator makes Hacker News data available through a documented, open API at github.com/HackerNews/API. We fetch only what that API surfaces, pace requests responsibly, and surface every call in the run log.

Why use this instead of calling the API myself?

The raw API returns an array of item IDs at the feed level — you need a separate round-trip per story to get title, score, and comment count. At 500 stories that is 501 HTTP calls to coordinate, de-duplicate, and fan out concurrently. We do that work, add ISO timestamps, attach the feed rank column (which the API does not expose), and deliver structured rows you can export or schedule without writing a line of code.

How do I set up a Show HN tracker?

Set feed to show and schedule your run on a cron. Each run captures the Show HN feed at that point in time with title, score, comment count, and author — ready for a Slack alert or spreadsheet diff without any glue code.

Can I export Hacker News data to a spreadsheet?

Yes — finish a run, open Storage → Dataset, and click Export as CSV or Export as Excel. Every field in the output table maps cleanly to a spreadsheet column. You can also connect the dataset URL directly to a Google Sheets IMPORTDATA formula.

Can I scrape comments too?

Not in this Actor — comment trees fan out 10-100x per story and would multiply cost significantly. A sibling hacker-news-comments-scraper will follow if there is enough demand.

How fresh is the data?

The upstream API reflects changes in near-real time. Your run captures whatever the feed contained the moment each story record was fetched.

💬 Your feedback

Spotted a bug, hit a weird edge case, or need a new field? Open an issue in the Actor's Issues tab in Apify Console. We ship fixes weekly and read every report.
