HN Top Stories Scraper

Scrape Hacker News top stories — extract title, URL, score, author, comment count, and submission time. Monitor the HN front page in real time. CSV/JSON export.

  • Pricing: $5.00 / 1,000 story scrapes
  • Rating: 0.0 (0 reviews)
  • Developer: Web Data Labs (Maintained by Community)
  • Actor stats: 0 bookmarks · 4 total users · 3 monthly active users · last modified 5 days ago

Hacker News Top Stories Scraper

Pull the current Hacker News front page (or top / new / best / ask / show / job lists) as structured JSON. Title, URL, score, author, comment count, age, and the discussion thread link — ready for dashboards, digests, alerts, and ML pipelines.

Built by Web Data Labs and hosted on Apify with managed retries and uptime.


Why use a scraper for HN?

Hacker News exposes a public Firebase API — and for many use cases, that API is the right tool. So why does this actor exist?

  • The API gives you IDs, not stories. To assemble a top-30 list with titles, URLs, scores, and comment counts you need 1 list call + 30 item calls + handling for missing/dead/deleted items. This actor does that for you in one call.
  • You don't want to build the plumbing. Caching, retries, rate handling, schema normalization, edge cases (jobs without URLs, polls, "Ask HN" titles, dead items) — all already solved.
  • You want filtering at the source. Minimum score? Story type? Posted in the last N hours? One JSON input field instead of a custom Lambda.
  • You want one consistent output schema across HN, Reddit, Lobsters, Substack, and other community-news sources. This actor's shape matches the rest of the cryptosignals catalogue so you can stitch them together with no glue code.
  • You want it as a job, not a dependency. Runs on a schedule, dumps to a dataset, hits your webhook. No server, no cron, no error pages at 3am.

If you just want to play with HN data interactively, the public API is great. If you want a reliable feed into a product, dashboard, or notification pipeline, this actor is faster.
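For comparison, here is roughly the plumbing the actor replaces: a minimal Python sketch against HN's public Firebase endpoints, with no caching, retries, or rate handling. The `normalize` helper and the output subset shown are illustrative, not the actor's actual code.

```python
import json
from urllib.request import urlopen

BASE = "https://hacker-news.firebaseio.com/v0"

def fetch(path):
    """One call to the public HN Firebase API."""
    with urlopen(f"{BASE}/{path}.json") as resp:
        return json.load(resp)

def normalize(item):
    """Map a raw HN item to a subset of this actor's output shape.
    Returns None for missing/dead/deleted items so callers can drop them."""
    if not item or item.get("dead") or item.get("deleted"):
        return None
    return {
        "id": item["id"],
        "title": item.get("title"),
        "url": item.get("url"),  # absent for Ask HN / text posts
        "score": item.get("score", 0),
        "by": item.get("by"),
        "commentCount": item.get("descendants", 0),
    }

def top_stories(count=30):
    """1 list call + `count` item calls, skipping unusable items."""
    items = (fetch(f"item/{i}") for i in fetch("topstories")[:count])
    return [s for s in map(normalize, items) if s]
```

Add error handling, concurrency, and scheduling on top of this and the build-vs-buy tradeoff becomes clear.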


What you get

Each item in the output represents one Hacker News story:

  • id — HN item ID (canonical, stable).
  • title — Story title as posted.
  • url — Outbound URL (null for Ask HN / Show HN text posts).
  • score — Current upvote score.
  • by — Username of the submitter.
  • commentCount — Number of comments (descendants in HN's terminology).
  • time — Unix timestamp (seconds) of submission.
  • ageHours — Hours since submission (computed at scrape time).
  • hnUrl — Direct link to the HN discussion thread.
  • domain — Hostname of url (e.g. github.com); null for text posts.
  • type — One of story, job, ask, show, poll.
  • source — The list this story came from (top, new, best, etc.).

Sample output

[
  {
    "id": 39842715,
    "title": "Show HN: I built a tool to extract data from any website",
    "url": "https://example.com/launch",
    "score": 412,
    "by": "founder123",
    "commentCount": 138,
    "time": 1709905200,
    "ageHours": 4.2,
    "hnUrl": "https://news.ycombinator.com/item?id=39842715",
    "domain": "example.com",
    "type": "story",
    "source": "top"
  },
  {
    "id": 39842901,
    "title": "Ask HN: How do you keep up with new ML papers?",
    "url": null,
    "score": 187,
    "by": "ml_curious",
    "commentCount": 96,
    "time": 1709908800,
    "ageHours": 3.2,
    "hnUrl": "https://news.ycombinator.com/item?id=39842901",
    "domain": null,
    "type": "ask",
    "source": "top"
  }
]

Stories are returned in HN's native ranking order for each list (i.e. top-of-list first).


Use cases

1. Daily digest emails / Slack bots. Run once a day at 9am, take the top 10 stories with score >= 100, format them into a digest, post to Slack or send via email. Five-line glue script.

2. Trending-topics dashboards. Feed scores and comment counts into a time-series store and chart momentum. Catch stories that are climbing fast before they peak.

3. Competitive monitoring. Filter for stories where domain matches your company's domain — or your competitors'. Get notified the moment something hits the front page.

4. Tech news ingestion for ML. Pull top and best daily, push to a vector store, run topic classification or summarization. Build a personalized "what's interesting today" feed.

5. Ask HN / Show HN watchlist. Filter by type=ask or type=show to track community questions and product launches without scrolling the site.

6. Hiring signal. Watch the monthly "Who is hiring?" thread and Show HN launches to identify hot startups, technologies, and hiring trends.
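Use case 1 really is a few lines of glue. The sketch below formats actor output into a Slack-ready digest; `format_digest` is a hypothetical helper (the field names are this actor's output schema, and the `<url|text>` link markup is Slack's).

```python
def format_digest(stories, min_score=100, limit=10):
    """Turn this actor's output items into a Slack-style digest message."""
    picks = [s for s in stories if s["score"] >= min_score][:limit]
    lines = ["*Today's HN digest*"]
    for s in picks:
        lines.append(
            f"• <{s['hnUrl']}|{s['title']}> — "
            f"{s['score']} points, {s['commentCount']} comments"
        )
    return "\n".join(lines)
```

Feed it the dataset items from any of the run methods below, then POST the resulting string to a Slack incoming webhook.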


Input

The actor accepts a JSON input. The example default is:

{
  "count": 30,
  "type": "top",
  "minScore": 50
}

Typical fields:

  • type — which HN list to pull. One of: top, new, best, ask, show, job. Default top.
  • count — how many stories to fetch from the chosen list. Default 30.
  • minScore — drop stories below this score. Useful for "front page worth reading" filters.
  • maxAgeHours — drop stories older than this many hours.
  • domains — optional allowlist of domains (e.g. ["github.com", "arxiv.org"]).
  • excludeDomains — optional blocklist.

Open the actor in the Apify Console and the form-style editor documents every field with examples. You don't need to memorize anything.
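As an illustration, here is an input combining several of these filters: fetch 100 stories from the top list, then keep only recent, high-scoring ones from an allowlist of domains. The values are arbitrary examples.

```json
{
  "type": "top",
  "count": 100,
  "minScore": 150,
  "maxAgeHours": 12,
  "domains": ["github.com", "arxiv.org"],
  "excludeDomains": ["example.com"]
}
```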


How to run it

1. Apify Console (no code)

Open the actor, edit the input, and click Start. Output lands in the run's dataset and exports as JSON, CSV, Excel, or an RSS feed.

2. Apify API

Synchronous run that returns dataset items in the response — perfect for cron jobs and webhooks:

curl -X POST "https://api.apify.com/v2/acts/cryptosignals~hn-top-stories/run-sync-get-dataset-items?token=YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"count": 30, "type": "top", "minScore": 100}'

Async run:

curl -X POST "https://api.apify.com/v2/acts/cryptosignals~hn-top-stories/runs?token=YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"count": 100, "type": "best", "minScore": 200}'

Then poll GET /v2/acts/cryptosignals~hn-top-stories/runs/{runId} and fetch items from defaultDatasetId.
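A minimal polling loop for the async flow might look like the sketch below. It assumes the statuses SUCCEEDED, FAILED, ABORTED, and TIMED-OUT are terminal (per the Apify API), and that the run ID and token come from the curl call above.

```python
import json
import time
from urllib.request import urlopen

API = "https://api.apify.com/v2"
ACTOR = "cryptosignals~hn-top-stories"
# Assumed terminal run statuses per the Apify platform docs.
TERMINAL = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}

def run_finished(status):
    """True once the run has reached a terminal state."""
    return status in TERMINAL

def wait_for_items(run_id, token, poll_seconds=5):
    """Poll the async run until it finishes, then fetch its dataset items."""
    while True:
        with urlopen(f"{API}/acts/{ACTOR}/runs/{run_id}?token={token}") as resp:
            run = json.load(resp)["data"]
        if run_finished(run["status"]):
            break
        time.sleep(poll_seconds)
    items_url = f"{API}/datasets/{run['defaultDatasetId']}/items?token={token}"
    with urlopen(items_url) as resp:
        return json.load(resp)
```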

3. Apify JavaScript SDK

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const run = await client.actor('cryptosignals/hn-top-stories').call({
    count: 30,
    type: 'top',
    minScore: 100,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach(s => console.log(`${s.score}\t${s.title}\t${s.hnUrl}`));

4. Apify Python SDK

from apify_client import ApifyClient

client = ApifyClient(token="YOUR_TOKEN")

run = client.actor("cryptosignals/hn-top-stories").call(run_input={
    "count": 30,
    "type": "top",
    "minScore": 100,
})

for s in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{s['score']:>4} {s['title']}")
    print(f"     {s['hnUrl']}")

5. Schedule it

In the Apify Console go to Schedules → Create new and pick this actor. Set a cron expression (e.g. 0 9 * * * for daily 9am UTC) and an input. Use a webhook on the schedule to push results into Slack / Discord / your API on every run.


Pricing

Pay Per Event:

  • $0.005 per scraped story.
  • No compute-minute charges, no proxy charges, no per-request fees.
  • A typical front-page run (30 stories) costs $0.15. A count: 100 run costs $0.50.
  • Failed runs that produce no items cost you nothing.
  • Apify free accounts get monthly credit — usually enough to run a daily 30-story digest at no cost.

Output destinations

  • Apify dataset (default) — query later via API, export as JSON/CSV.
  • Webhooks — fire on run completion and POST the dataset URL to your endpoint.
  • Apify integrations — Zapier, Make, Slack, Google Sheets, Airtable, Pipedream all available out of the box from the actor's run page.
  • RSS — the dataset has a built-in RSS view if you'd rather treat HN as a feed.
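If you point a webhook at your own endpoint, a minimal Python receiver could look like this sketch. The payload shape (the run object under "resource", including defaultDatasetId) follows Apify's default webhook payload; the token and port are placeholders.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

APIFY_TOKEN = "YOUR_TOKEN"  # placeholder

def dataset_url(payload, token):
    """Build the dataset-items URL from a webhook payload's run resource."""
    dataset_id = payload["resource"]["defaultDatasetId"]
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?token={token}"

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        payload = json.loads(body)
        # Fetch the finished run's items and hand them to your pipeline.
        with urlopen(dataset_url(payload, APIFY_TOKEN)) as resp:
            items = json.load(resp)
        print(f"run produced {len(items)} items")
        self.send_response(200)
        self.end_headers()

# To run: HTTPServer(("", 8080), WebhookHandler).serve_forever()
```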

FAQ

How fresh is the data? Each run pulls live from HN at the time of the request. The HN ranking algorithm itself updates roughly every minute or two.

Why is url sometimes null? Ask HN, Show HN (text-only), and poll posts have no outbound URL. Use hnUrl to get to the discussion.

Can I get the comments too? This actor focuses on the story listing — the high-frequency, low-cost feed. Comment-tree extraction is a separate concern (much larger payloads, much higher cost). Reach out via the actor page if you need it.

What if a story is deleted between list-fetch and detail-fetch? The actor silently drops it. Your dataset will never contain null rows or items missing core fields.

Does this comply with HN's terms? HN's API and front-end content are publicly accessible and intended for programmatic use. The actor uses public endpoints only and respects rate limits. Don't use the data for spam, mass-DM campaigns, or to harass posters — that's on you.


Other actors you might like

  • amazon-scraper — Amazon products, prices, ratings, reviews across all major locales.
  • See the full catalogue at apify.com/cryptosignals — Reddit, Lobsters, Product Hunt, GitHub trending, and more community-news / market-data sources, all using the same input/output conventions.

Support

  • Web: web-data-labs.com
  • Issues: open an issue on the actor page on Apify.
  • Updates: actively maintained. If HN changes its layout or API behavior, fixes typically ship within 24 hours.