Hacker News Scraper

Pricing

from $0.40 per 1,000 items fetched

Scrapes Hacker News stories, comments, jobs, polls, and user profiles via the official Firebase and Algolia APIs. Supports full-text search, Who's Hiring thread extraction, author karma snapshots, and deep comment trees.

Developer

Omar Eldeeb

Maintained by Community

Scrape Hacker News stories, comments, jobs, polls, and user profiles with no proxy, no captcha, no headless browser. Built directly on the official Firebase and Algolia APIs, so it's fast, reliable, and cheap.

What does Hacker News Scraper do?

This actor extracts structured data from Hacker News — stories, comments, jobs, user profiles, and the monthly "Ask HN: Who is hiring?" threads — via HN's two official public APIs (the Firebase API and the Algolia search API). It produces a unified JSON output with titles, URLs, points, authors, timestamps, and optional full comment trees with author karma snapshots.

Unlike most HN scrapers on the Store, which parse HTML and require proxies, this one calls the official APIs directly. You can schedule it, call it via REST, webhooks, or MCP from the Apify platform, and pipe its output to downstream actors, with no captchas or IP blocks to work around.
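For context, the Firebase API mentioned above can be queried directly with nothing but the standard library. The sketch below builds the request URLs and fetches JSON; the helper names are illustrative assumptions and say nothing about how the actor is implemented internally.

```python
import json
from urllib.request import urlopen

# Public base URL of the official Hacker News Firebase API.
FIREBASE_BASE = "https://hacker-news.firebaseio.com/v0"


def firebase_list_url(list_name: str) -> str:
    """URL of an ID list such as 'topstories' or 'showstories'."""
    return f"{FIREBASE_BASE}/{list_name}.json"


def firebase_item_url(item_id: int) -> str:
    """URL of a single item (story, comment, job, poll, or pollopt)."""
    return f"{FIREBASE_BASE}/item/{item_id}.json"


def fetch_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode its JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json(firebase_list_url("topstories"))` returns a list of item IDs, and each ID can then be resolved with `fetch_json(firebase_item_url(item_id))`.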

Why use Hacker News Scraper?

  • 🏢 Recruiters & talent teams — pull the latest "Who is hiring?" monthly thread as structured job rows, filter by role keywords, feed into your ATS.
  • 📈 Market intelligence & PR teams — monitor mentions of your company, competitors, or product categories with full-text search.
  • 🧠 AI/LLM data pipelines — build clean, deduplicated training or retrieval-augmented-generation datasets from HN discussions, with author karma signals.
  • 📰 Newsletters & aggregators — daily digest of top / new / best stories above a score threshold, optionally filtered to specific domains.
  • 🎯 Research & academia — reproducible HN corpora for discourse analysis, link-graph research, or longitudinal studies of tech discussion.

How to use Hacker News Scraper

  1. Click Try for free on this actor's page to open the Apify Console input form.
  2. Pick a mode — start with topstories to see how it works.
  3. Set Max items (default 30). For a first run, try 10.
  4. Click Save & Start.
  5. When the run finishes, open the Output tab to see the scraped items, or use Export to download as JSON, CSV, or Excel.
  6. For more advanced runs, enable Include comments to attach comment trees, or use mode=search to run a keyword query.

Input

Every field is configurable via the input form. See the Input tab for live validation and tooltips. The only required field is mode.

Minimal examples (copy into the Console's "JSON" tab):

Top 30 front-page stories:

{ "mode": "topstories", "maxItems": 30 }

Stories mentioning "Claude", newest first, ≥ 50 points:

{
  "mode": "search",
  "searchQuery": "Claude",
  "sortSearchBy": "date",
  "minScore": 50,
  "maxItems": 100
}

Show HN from github.com or arxiv.org with heavy discussion:

{
  "mode": "showstories",
  "minScore": 50,
  "minComments": 20,
  "domainFilter": ["github.com", "arxiv.org"],
  "maxItems": 50
}

Latest "Who is hiring?" thread as structured job rows:

{ "mode": "hiring_threads", "maxItems": 500 }

Top 5 stories, each with its comment tree down to 3 levels and author karma:

{
  "mode": "topstories",
  "maxItems": 5,
  "includeComments": true,
  "maxCommentDepth": 3,
  "includeUserProfiles": true
}
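The same inputs can be built and submitted from code. The sketch below assumes the `apify-client` Python package; `build_run_input`, `run_actor`, the set of modes (only those this README names), and the rule that search needs a query or tags are illustrative assumptions. The actor ID is not stated here, so it is left as a parameter.

```python
from typing import Any

# Modes named in this README; the actor may accept others.
KNOWN_MODES = {"topstories", "showstories", "search", "hiring_threads", "user"}


def build_run_input(mode: str, **options: Any) -> dict[str, Any]:
    """Assemble an input object; 'mode' is the only required field."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # Assumption: a search run is only meaningful with a query or tags.
    if mode == "search" and not (options.get("searchQuery") or options.get("searchTags")):
        raise ValueError("search mode needs searchQuery or searchTags")
    return {"mode": mode, **options}


def run_actor(token: str, actor_id: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start a run, wait for it to finish, and return the dataset rows."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, `build_run_input("topstories", maxItems=30)` produces the first minimal example above, ready to pass to `run_actor` together with your API token and the actor's ID.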

Output

Every row — whether it's a story, comment, job, or poll — uses the same unified shape. You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.

{
  "id": 47822805,
  "type": "story",
  "title": "SPEAKE(a)R: Turn Speakers to Microphones for Fun and Profit",
  "url": "https://www.usenix.org/system/files/conference/woot17/woot17-paper-guri.pdf",
  "domain": "usenix.org",
  "hnUrl": "https://news.ycombinator.com/item?id=47822805",
  "by": "Eridanus2",
  "byUserKarma": 1847,
  "byUserCreated": 1625097600,
  "score": 67,
  "time": 1776588348,
  "createdAt": "2026-04-19T08:45:48.000Z",
  "descendants": 26,
  "comments": [
    {
      "id": 47823010,
      "by": "someuser",
      "text": "<p>Interesting approach...</p>",
      "depth": 1,
      "replies": [{ "id": 47823201, "by": "replyer", "text": "...", "depth": 2 }]
    }
  ],
  "scrapedAt": "2026-04-19T11:21:58.601Z",
  "source": "firebase"
}
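If you receive the nested `comments` array and want one row per comment, a short recursive walk is enough. This is a minimal sketch that assumes every comment carries its children under a `replies` key, as in the sample above; `iter_comments` is an illustrative name, not part of the actor.

```python
from typing import Any, Iterator


def iter_comments(comments: list[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield every comment in depth-first order, without its nested replies."""
    for comment in comments:
        # Copy the comment's own fields and leave the subtree out of the row.
        yield {key: value for key, value in comment.items() if key != "replies"}
        # Then descend into the replies, if there are any.
        yield from iter_comments(comment.get("replies") or [])
```

Applied to the sample row, `list(iter_comments(row["comments"]))` gives two rows, one at depth 1 and one at depth 2.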

Data fields

| Field | Type | Description |
| --- | --- | --- |
| id | number | Unique HN item ID |
| type | enum | story / comment / job / poll / pollopt |
| title | string | Story / job title (null for comments) |
| url | string | Outbound URL for stories (null for text-only) |
| domain | string | Extracted hostname (e.g., github.com) |
| hnUrl | string | Deep link to the item on news.ycombinator.com |
| by | string | Author username |
| byUserKarma | number | Author's karma (when includeUserProfiles=true) |
| byUserCreated | number | Unix timestamp of author account creation |
| score | number | Points (votes) |
| time | number | Unix timestamp of submission |
| createdAt | string | ISO 8601 timestamp |
| text | string | HTML body (comments, Ask HN, job descriptions) |
| descendants | number | Total comments on a story |
| parent | number | Parent ID for comments |
| comments | array | Nested comment tree (when includeComments=true) |
| flatComments | array | Flat list with depth (when flattenComments=true) |
| deleted / dead | boolean | Item status flags |
| scrapedAt | string | ISO timestamp when this row was fetched |
| source | enum | firebase or algolia (which API returned it) |
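Several fields are derived from others: `createdAt` from `time`, `hnUrl` from `id`, and `domain` from `url`. The sketch below shows one way to reproduce those values from the sample row. Stripping a leading `www.` is an assumption inferred from that sample, and the function names are illustrative.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlparse


def to_iso(unix_seconds: int) -> str:
    """Convert Unix seconds to an ISO 8601 UTC string with milliseconds."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def hn_url(item_id: int) -> str:
    """Deep link to an item on news.ycombinator.com."""
    return f"https://news.ycombinator.com/item?id={item_id}"


def domain_of(url: Optional[str]) -> Optional[str]:
    """Hostname of a URL without a leading 'www.', or None for text-only items."""
    host = urlparse(url).hostname if url else None
    return host.removeprefix("www.") if host else None  # Python 3.9+
```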

How much does it cost to scrape Hacker News?

This actor uses pay-per-event pricing — you only pay for the rows you get, no monthly subscription.

| Event | Price |
| --- | --- |
| story-fetched | $0.00040 / item ($0.40 per 1,000 stories or jobs) |
| comment-fetched | $0.00015 / comment ($0.15 per 1,000 comments) |
| user-profile-fetched | $0.00030 / profile ($0.30 per 1,000 profiles; only when includeUserProfiles=true) |

The first 50 chargeable events in every run are free. That means any run with ≤ 50 total output rows + comments + profiles costs nothing beyond the platform's trivial startup fee. You only pay for events 51+ within the same run.

Typical run costs (with the 50 free events per run already deducted):

  • 100 top stories (no comments): 100 stories → 50 free + 50 paid = $0.020
  • 30 top stories + 10 comments each (≈ 330 events): $0.046
  • Monthly "Who is hiring?" full extract (≈ 500 jobs): $0.180
  • 1,000 search hits on a keyword: $0.380
  • A 30-story smoke test with no comments: $0 (entirely free)
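The arithmetic above can be reproduced with a small estimator. This README does not say which events the 50 free ones are applied to in a mixed run, so the sketch returns a lower and an upper bound rather than a single figure. The function name and the bounding approach are assumptions, not the platform's billing logic.

```python
from decimal import Decimal

# Per-event prices from the pricing table.
PRICES = {
    "story": Decimal("0.00040"),
    "comment": Decimal("0.00015"),
    "profile": Decimal("0.00030"),
}
FREE_EVENTS = 50  # free chargeable events per run


def cost_bounds(stories: int = 0, comments: int = 0, profiles: int = 0):
    """Return (lowest, highest) run cost in USD for the given event counts."""
    counts = {"story": stories, "comment": comments, "profile": profiles}

    def total(order):
        # Spend the free allowance on event types in the given order.
        free, cost = FREE_EVENTS, Decimal("0")
        for kind in order:
            waived = min(free, counts[kind])
            free -= waived
            cost += (counts[kind] - waived) * PRICES[kind]
        return cost

    cheapest_first = sorted(PRICES, key=PRICES.get)
    highest = total(cheapest_first)                 # free events wasted on cheap ones
    lowest = total(list(reversed(cheapest_first)))  # free events cover costly ones
    return lowest, highest
```

For single-type runs the two bounds coincide, so `cost_bounds(stories=100)` gives $0.02 for both. For 30 stories with 300 comments the bounds are $0.042 and $0.0495, which bracket the approximate figure quoted above.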

Tips & advanced options

  • Comment depth: maxCommentDepth=3 is a good default. 5+ is only useful for megathreads; 1 gives you only top-level replies.
  • Domain filter: combine with topstories or showstories for a recruiter-style feed of GitHub / arXiv / company-blog stories.
  • Date ranges: pass dateFromUnix / dateToUnix (Unix seconds) to scope search mode. Works best with sortSearchBy=date.
  • User karma snapshot: includeUserProfiles=true adds byUserKarma + byUserCreated to every row. Useful for weighting by poster reputation.
  • Flatten comments for spreadsheets: set flattenComments=true to get a flat array with a depth field, which exports cleanly to CSV/Excel.
  • Schedule it: use Apify's scheduler to run topstories every hour for a live HN feed, or hiring_threads monthly.
  • Chain it: pipe the output into another actor (e.g., a text classifier, Slack poster) via Apify's integrations.
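For the date-range tip, the bounds must be Unix seconds. A minimal sketch of building such an input from calendar dates, interpreted as midnight UTC, follows. The helper names are illustrative, and whether dateToUnix is treated as inclusive is not stated in this README.

```python
from datetime import datetime, timezone


def unix_seconds(year: int, month: int, day: int) -> int:
    """Unix timestamp (seconds) of midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def search_input_for_range(query: str, start: tuple, end: tuple, max_items: int = 100) -> dict:
    """Build a search-mode input scoped to the range from start to end."""
    return {
        "mode": "search",
        "searchQuery": query,
        "sortSearchBy": "date",  # the tip above recommends date sorting here
        "dateFromUnix": unix_seconds(*start),
        "dateToUnix": unix_seconds(*end),
        "maxItems": max_items,
    }
```

For example, `search_input_for_range("Claude", (2026, 4, 1), (2026, 4, 19))` scopes the keyword search to the first weeks of April 2026.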

FAQ, disclaimers & support

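Does this need a proxy? No. Both HN APIs are public and require no authentication. The Firebase API currently has no rate limit; the Algolia search API limits requests per IP, and the actor stays within that limit.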

How fresh is the data? Firebase list endpoints update roughly every 5 minutes (HN's own cadence). The user and hiring_threads modes return live data.

What counts as a "comment" for billing? Only comments that actually appear in your output dataset — i.e. inside the depth limit you set. Comments skipped because they're deleted or beyond maxCommentDepth are free.

Can I search for comments by a specific author? Yes — set mode=search with searchTags: ["comment", "author_dang"].
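The tag syntax in that answer matches the Algolia HN Search API, where comma-separated tags are combined with AND and `author_<username>` restricts results to one user. Assuming searchTags is passed through to Algolia's `tags` parameter (not confirmed here), the equivalent direct request URL could be built as follows; `algolia_url` is an illustrative helper.

```python
from urllib.parse import urlencode

# Public base URL of the Algolia-backed Hacker News Search API.
ALGOLIA_BASE = "https://hn.algolia.com/api/v1"


def algolia_url(tags, query: str = "", newest_first: bool = True, hits_per_page: int = 100) -> str:
    """Build a search URL; comma-joined tags are ANDed together by Algolia."""
    endpoint = "search_by_date" if newest_first else "search"
    params = {"tags": ",".join(tags), "hitsPerPage": hits_per_page}
    if query:
        params["query"] = query
    return f"{ALGOLIA_BASE}/{endpoint}?{urlencode(params)}"
```

`algolia_url(["comment", "author_dang"])` yields a URL that lists the newest comments by the user dang.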

Why are some by fields null? Deleted items have no author. They're filtered out by default.

Legality. Hacker News data is publicly accessible and both APIs are officially sanctioned by Y Combinator. This actor respects those APIs' rate limits and does not bypass any access control. You are responsible for compliant use of the data under Y Combinator's Terms of Service and any applicable privacy laws (GDPR, CCPA, etc.) when processing personal data such as usernames.

Found a bug or want a feature? Open an issue on the actor's Issues tab. Custom extensions (e.g., Slack / Discord forwarding, semantic dedupe, dashboards) are available on request.