Hacker News Scraper

Pricing

Pay per event

Scrape Hacker News stories (top, new, best, ask, show, jobs) plus per-story metadata in one call. We handle pagination, fan-out, retries, and rate-limit pacing — you get typed dataset rows with title, URL, score, author, comment count, timestamp, and permalink.

Developer

DevilScrapes

Maintained by Community

🎯 What this scrapes

This Actor walks Hacker News for the story list you pick — top, new, best, ask, show, or jobs — fans out to each story's record, handles retries when the upstream hiccups, and writes one typed dataset row per story. You pick the feed; we deliver clean rows on a schedule, in JSON / CSV / Excel, ready to wire into Sheets, S3, a warehouse, or a webhook.
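The walk described above can be sketched against the public Hacker News Firebase API. This is a minimal illustration, not the Actor's actual code: the endpoint names come from the public API documentation, while `feed_url`, `item_url`, and `plan_fetches` are hypothetical helper names.

```python
# Base URL of the public Hacker News Firebase API (github.com/HackerNews/API).
HN_API = "https://hacker-news.firebaseio.com/v0"

# Feed names from this Actor's input mapped to the API's list endpoints.
FEED_ENDPOINTS = {
    "top": "topstories",
    "new": "newstories",
    "best": "beststories",
    "ask": "askstories",
    "show": "showstories",
    "jobs": "jobstories",
}


def feed_url(feed: str) -> str:
    """Return the URL that lists story IDs for a feed."""
    return f"{HN_API}/{FEED_ENDPOINTS[feed]}.json"


def item_url(story_id: int) -> str:
    """Return the URL of a single story record."""
    return f"{HN_API}/item/{story_id}.json"


def plan_fetches(story_ids, max_results):
    """Pair each kept story with its 1-indexed feed position and record URL."""
    kept = story_ids[:max_results]
    return [(rank, item_url(sid)) for rank, sid in enumerate(kept, start=1)]
```

The list endpoint returns IDs in feed order, so the position in that list is what a `rank` value would be derived from in this sketch.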

🔥 What we handle for you

  • 🛡️ Browser fingerprint rotation: curl-cffi impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python.
  • 🌐 Residential proxy rotation via Apify Proxy — fresh session and exit IP on every block.
  • 🔁 Retries with exponential backoff on 408 / 429 / 5xx — up to 5 attempts per page, Retry-After honoured.
  • 🧱 Rate-limit-aware pacing — when the target pushes back, we slow down instead of getting banned.
  • 🧊 Clean, typed dataset rows — Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
  • 💰 Pay-Per-Event pricing — you only pay for results that hit your dataset. No data, no charge.
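To make the retry bullet concrete, here is a rough sketch of exponential backoff that honours Retry-After. The status codes and the 5-attempt limit come from the list above; the base delay, cap, jitter fraction, and function names are assumptions chosen for illustration, not the Actor's real settings.

```python
import random

RETRYABLE_STATUSES = {408, 429}
MAX_ATTEMPTS = 5  # "up to 5 attempts per page"


def is_retryable(status: int) -> bool:
    """True for 408, 429, and any 5xx response."""
    return status in RETRYABLE_STATUSES or 500 <= status <= 599


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, jitter=0.1):
    """Seconds to wait before retrying after failed attempt `attempt` (1-indexed).

    base, cap, and jitter are illustrative assumptions.
    """
    if retry_after is not None:
        # The server told us how long to wait, so use that value.
        return max(0.0, float(retry_after))
    # Double the delay on each attempt, capped at `cap`.
    delay = min(cap, base * 2 ** (attempt - 1))
    # Add a little random jitter so parallel workers do not retry in lockstep.
    return delay + random.uniform(0.0, delay * jitter)
```

A caller would loop up to `MAX_ATTEMPTS` times, sleep for `backoff_delay(attempt, retry_after)` whenever `is_retryable(status)` is true, and give up otherwise.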

💡 Use cases

  • Trend monitoring — diff top stories hourly to see which posts gain traction fastest.
  • Comment-volume alerts — pipe rows into Slack when a story passes 100 comments.
  • Lead gen for dev tools — surface Show HN launches that mention your stack and reach out.
  • Newsletter curation — feed the top 10 stories from best into a weekly digest.
  • ML training data — historical top story metadata for clickbait or score-prediction models.
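Two of these use cases reduce to a few lines over the dataset rows. The sketch below assumes rows are plain dicts with the `id`, `rank`, and `descendants` fields from the Output section; the function names and the default threshold are hypothetical.

```python
def stories_over_threshold(rows, min_comments=100):
    """Rows whose comment count exceeds min_comments (null counts are treated as 0)."""
    return [r for r in rows if (r.get("descendants") or 0) > min_comments]


def rank_gains(previous, current):
    """Map story id to positions gained between two snapshots (positive = moved up).

    Stories that appear in only one snapshot are ignored.
    """
    prev_rank = {r["id"]: r["rank"] for r in previous}
    return {
        r["id"]: prev_rank[r["id"]] - r["rank"]
        for r in current
        if r["id"] in prev_rank
    }
```

Feeding `rank_gains` with the datasets of two consecutive hourly runs gives a simple traction signal; `stories_over_threshold` yields the rows to forward to an alert channel.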

⚙️ How to use it

  1. Click Try for free at the top of the page.
  2. Fill in the input form — most fields have sensible defaults.
  3. Click Start. Output streams into the run's dataset.
  4. Export from Storage → Dataset as JSON, CSV, or Excel — or fetch via the API.

📥 Input

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| feed | string | no | "top" | Which Hacker News story feed to pull from. top mirrors the front page; best is the time-decaye… |
| maxResults | integer | no | 100 | Total dataset items to keep. The feeds expose up to 500 items each; pulling all of them costs ~500 results. Set to … |
| includeText | boolean | no | true | Fetch the full self-post body for Ask HN / Show HN entries. Has no effect on regular link stories. |
| concurrency | integer | no | 8 | How many story records to fetch in parallel. The Firebase endpoint is generous; 8 is comfortable. |
| proxyConfiguration | object | no | {"useApifyProxy": false} | Apify Proxy is optional here; the Firebase API is happy to be hit directly. Enable proxy only if you're routing all tra… |

Example input

{
  "feed": "top",
  "maxResults": 3,
  "includeText": false,
  "concurrency": 4,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
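If you build the input programmatically, a small helper can apply the documented defaults and catch typos before a run starts. This is a sketch under assumptions: `build_input` is a hypothetical name, and the validation rules are inferred from the Input table rather than taken from the Actor's real schema.

```python
import copy

ALLOWED_FEEDS = ("top", "new", "best", "ask", "show", "jobs")

# Defaults as listed in the Input table.
DEFAULTS = {
    "feed": "top",
    "maxResults": 100,
    "includeText": True,
    "concurrency": 8,
    "proxyConfiguration": {"useApifyProxy": False},
}


def build_input(**overrides):
    """Merge overrides into the defaults and reject obviously invalid values."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    run_input = copy.deepcopy(DEFAULTS)  # never mutate the shared defaults
    run_input.update(overrides)
    if run_input["feed"] not in ALLOWED_FEEDS:
        raise ValueError(f"feed must be one of {ALLOWED_FEEDS}")
    if run_input["maxResults"] < 1 or run_input["concurrency"] < 1:
        raise ValueError("maxResults and concurrency must be positive")
    return run_input
```

`build_input(maxResults=3, includeText=False, concurrency=4)` produces the same settings as the example input above; serialise the result with `json.dumps` to paste it into the input form.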

📤 Output

Every row is one dataset item.

| Field | Type | Notes |
| --- | --- | --- |
| id | integer | Hacker News story ID (stable, monotonically increasing). |
| type | string | HN item type: story, job, ask, show, comment, poll. |
| title | string | Story headline. |
| url | string or null | Outbound link (null for self-posts). |
| permalink | string | Hacker News permalink (news.ycombinator.com/item?id=...). |
| by | string | Author username on Hacker News. |
| score | integer or null | Upvotes; null for jobs / dead items. |
| descendants | integer or null | Total comment count, including replies. |
| text | string or null | Self-post body (Ask HN / Show HN). HTML, only present when includeText=true. |
| time | integer | Unix epoch seconds: when the story was posted. |
| posted_at | string | ISO-8601 UTC timestamp derived from time. |
| scraped_at | string | ISO-8601 UTC timestamp of when this row was recorded. |
| rank | integer | Position of this story in the feed at scrape time (1-indexed). |

Example output

{
  "id": 39000000,
  "type": "story",
  "title": "Show HN: Devil Scrapes \u2014 public-data Apify Actors with honest pricing",
  "url": "https://apify.com/DevilScrapes",
  "permalink": "https://news.ycombinator.com/item?id=39000000",
  "by": "devilscrapes",
  "score": 142,
  "descendants": 33,
  "text": null,
  "time": 1778875200,
  "posted_at": "2026-05-15T20:00:00+00:00",
  "scraped_at": "2026-05-15T20:05:00+00:00",
  "rank": 1
}
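The derived fields can be recomputed from the raw ones, which is handy for sanity-checking rows downstream. This is a minimal sketch: the helper names are hypothetical, and the output format is assumed to match the `+00:00` style shown in the example.

```python
from datetime import datetime, timezone


def posted_at(epoch_seconds: int) -> str:
    """ISO-8601 UTC timestamp for a Unix epoch value in seconds."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


def permalink(story_id: int) -> str:
    """Hacker News discussion URL for a story ID."""
    return f"https://news.ycombinator.com/item?id={story_id}"


print(posted_at(1700000000))  # 2023-11-14T22:13:20+00:00
print(permalink(39000000))    # https://news.ycombinator.com/item?id=39000000
```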

💰 Pricing

Pay-Per-Event — you pay only when these events fire:

| Event | USD | What it is |
| --- | --- | --- |
| actor-start | $0.005 | One-off warm-up charge per run |
| result | $0.002 | Per dataset item |

Example: a single run that stores 1,000 results costs $0.005 + 1,000 × $0.002 = $2.005, or roughly $2.00. No subscription, no minimum, no card to start: Apify gives every new account $5 of free credit.
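For budgeting scheduled runs, the two event prices combine into a one-line estimate. The sketch below uses the rates from the table above; `estimate_cost` is a hypothetical helper, and `Decimal` is used only to avoid floating-point rounding noise.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.005")  # charged once per run
RESULT_USD = Decimal("0.002")       # charged per dataset item


def estimate_cost(results: int, runs: int = 1) -> Decimal:
    """Estimated USD cost for `results` dataset items spread across `runs` runs."""
    return ACTOR_START_USD * runs + RESULT_USD * results


print(estimate_cost(1000))            # 2.005
print(estimate_cost(72000, runs=720)) # 147.600
```

The second call models an hourly schedule of 100 results per run over 30 days (720 runs, 72,000 results).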

🚧 Limitations

Comment threads aren't expanded — we return descendants (the count) but not the tree. Dead and deleted stories are skipped. The text field for self-posts is HTML, not Markdown — render it through your own sanitiser before display.
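When plain text is enough (a Slack alert or a digest line, say), one conservative option is to drop the markup entirely. The sketch below uses only the standard library and reduces the HTML `text` field to plain text. It is an assumption-laden illustration and not a replacement for a proper HTML sanitiser if you intend to render the markup.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content, dropping tags and script/style bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "p":
            self.parts.append("\n\n")  # keep paragraph breaks readable

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Return the plain-text content of an HTML fragment ('' for None)."""
    if html is None:
        return ""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```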

❓ FAQ

Is this legal?

Yes — Hacker News publishes this data through its documented Firebase API (https://github.com/HackerNews/API). We respect their terms, surface every fetch in the logs, and pace requests so we stay a good citizen on the upstream.

Why use this instead of the Firebase API directly?

You get scheduled Apify runs, typed output rows, ISO timestamps, and the rank column the raw API doesn't provide. It also plugs into Apify's downstream integrations such as Sheets, S3, and webhooks.

Can I scrape comments too?

Not in this Actor — comment trees fan out 10–100x per story and would blow up cost. We'll ship a sibling hacker-news-comments-scraper if there's demand.

How fresh is the data?

The Firebase API updates in near-real-time. Each row reflects the story as it was at the moment the run fetched it, and scraped_at records when the row was written.

💬 Your feedback

Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.