HN Top Stories Scraper
Pricing
$5.00 / 1,000 scraped stories
Scrape Hacker News top stories — extract title, URL, score, author, comment count, and submission time. Monitor HN front page in real time. CSV/JSON.
Hacker News Top Stories Scraper
Pull the current Hacker News front page (or top / new / best / ask / show / job lists) as structured JSON. Title, URL, score, author, comment count, age, and the discussion thread link — ready for dashboards, digests, alerts, and ML pipelines.
Built by Web Data Labs and hosted on Apify with managed retries and uptime.
Why use a scraper for HN?
Hacker News exposes a public Firebase API — and for many use cases, that API is the right tool. So why does this actor exist?
- The API gives you IDs, not stories. To assemble a top-30 list with titles, URLs, scores, and comment counts you need 1 list call + 30 item calls + handling for missing/dead/deleted items. This actor does that for you in one call.
- You don't want to build the plumbing. Caching, retries, rate handling, schema normalization, edge cases (jobs without URLs, polls, "Ask HN" titles, dead items) — all already solved.
- You want filtering at the source. Minimum score? Story type? Posted in the last N hours? One JSON input field instead of a custom Lambda.
- You want one consistent output schema across HN, Reddit, Lobsters, Substack, and other community-news sources. This actor's shape matches the rest of the cryptosignals catalogue so you can stitch them together with no glue code.
- You want it as a job, not a dependency. Runs on a schedule, dumps to a dataset, hits your webhook. No server, no cron, no error pages at 3am.
If you just want to play with HN data interactively, the public API is great. If you want a reliable feed into a product, dashboard, or notification pipeline, this actor gets you there faster, since the fetching, retries, and normalization are already done.
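For comparison, the do-it-yourself route described above (one list call, then one item call per story, skipping missing, dead, or deleted items) can be sketched in a few lines of Python. This is a minimal illustration against the public HN Firebase endpoints, not the actor's actual implementation; the function names and the injectable `fetch` parameter are assumptions made here for the example.

```python
import json
from urllib.request import urlopen

HN_API = "https://hacker-news.firebaseio.com/v0"


def fetch_json(path):
    """Fetch one JSON document from the public HN Firebase API."""
    with urlopen(f"{HN_API}/{path}.json", timeout=10) as resp:
        return json.load(resp)


def assemble_stories(list_name="topstories", count=30, fetch=fetch_json):
    """Make one list call for IDs, then one item call per ID."""
    ids = fetch(list_name) or []
    stories = []
    for item_id in ids[:count]:
        item = fetch(f"item/{item_id}")
        # Items can come back as null, or flagged as deleted or dead.
        if not item or item.get("deleted") or item.get("dead"):
            continue
        stories.append(item)
    return stories
```

Calling `assemble_stories()` with the defaults issues 31 HTTP requests for a 30-story list. Retries, caching, and rate handling are deliberately left out of this sketch.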
What you get
Each item in the output represents one Hacker News story:
| Field | Description |
|---|---|
| id | HN item ID (canonical, stable). |
| title | Story title as posted. |
| url | Outbound URL (null for Ask HN / Show HN text posts). |
| score | Current upvote score. |
| by | Username of the submitter. |
| commentCount | Number of comments (descendants in HN's terminology). |
| time | Unix timestamp (seconds) of submission. |
| ageHours | Hours since submission (computed at scrape time). |
| hnUrl | Direct link to the HN discussion thread. |
| domain | Hostname of url (e.g. github.com), null for text posts. |
| type | story, job, ask, show, poll. |
| source | The list this story came from (top, new, best, etc.). |
Sample output

    [
      {
        "id": 39842715,
        "title": "Show HN: I built a tool to extract data from any website",
        "url": "https://example.com/launch",
        "score": 412,
        "by": "founder123",
        "commentCount": 138,
        "time": 1709905200,
        "ageHours": 4.2,
        "hnUrl": "https://news.ycombinator.com/item?id=39842715",
        "domain": "example.com",
        "type": "story",
        "source": "top"
      },
      {
        "id": 39842901,
        "title": "Ask HN: How do you keep up with new ML papers?",
        "url": null,
        "score": 187,
        "by": "ml_curious",
        "commentCount": 96,
        "time": 1709908800,
        "ageHours": 3.2,
        "hnUrl": "https://news.ycombinator.com/item?id=39842901",
        "domain": null,
        "type": "ask",
        "source": "top"
      }
    ]

Stories are returned in HN's native ranking order for each list (i.e. top-of-list first).
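To make the derived fields concrete, here is a rough sketch of how a raw HN API item could be mapped onto the output fields listed in the table. It assumes `ageHours` is rounded to one decimal and `domain` is the plain hostname of `url`; the actor's real rules may differ. The `type` field is left out because its classification rules are not documented here, and the helper name `normalize` is hypothetical.

```python
from urllib.parse import urlparse


def normalize(item, source, now):
    """Map a raw HN API item onto the output field names (sketch only)."""
    url = item.get("url")
    return {
        "id": item["id"],
        "title": item.get("title"),
        "url": url,
        "score": item.get("score", 0),
        "by": item.get("by"),
        # HN calls the total comment count "descendants".
        "commentCount": item.get("descendants", 0),
        "time": item["time"],
        # Hours between submission and scrape time (assumed 1-decimal rounding).
        "ageHours": round((now - item["time"]) / 3600, 1),
        "hnUrl": f"https://news.ycombinator.com/item?id={item['id']}",
        # Text posts have no outbound URL, so no domain either.
        "domain": urlparse(url).hostname if url else None,
        "source": source,
    }
```

Passing `now` explicitly (a Unix timestamp in seconds) keeps the function deterministic, which makes it easy to check against the sample output above.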
Use cases
1. Daily digest emails / Slack bots. Run once a day at 9am, take the top 10 stories with score >= 100, format them into a digest, post to Slack or send via email. Five-line glue script.
2. Trending-topics dashboards. Feed scores and comment counts into a time-series store and chart momentum. Catch stories that are climbing fast before they peak.
3. Competitive monitoring. Filter for stories where domain matches your company's domain — or your competitors'. Get notified the moment something hits the front page.
4. Tech news ingestion for ML. Pull top and best daily, push to a vector store, run topic classification or summarization. Build a personalized "what's interesting today" feed.
5. Ask HN / Show HN watchlist. Filter by type=ask or type=show to track community questions and product launches without scrolling the site.
6. Hiring signal. Watch the monthly "Who is hiring?" thread and Show HN launches to identify hot startups, technologies, and hiring trends.
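As an illustration of the first use case, the glue for a daily digest might look like the sketch below. It takes items shaped like the sample output and returns plain text that could be posted to Slack or sent by email. The function name, the text layout, and the fallback to `hnUrl` for text posts are choices made for this example.

```python
def build_digest(stories, min_score=100, limit=10):
    """Format the first `limit` stories at or above `min_score` as text."""
    picked = [s for s in stories if s["score"] >= min_score][:limit]
    lines = []
    for rank, s in enumerate(picked, start=1):
        # Text posts have url = None, so link to the HN thread instead.
        link = s["url"] or s["hnUrl"]
        lines.append(
            f"{rank}. {s['title']} "
            f"({s['score']} points, {s['commentCount']} comments)"
        )
        lines.append(f"   {link}")
    return "\n".join(lines)
```

Because the actor returns stories in HN's ranking order, slicing after the score filter keeps the highest-ranked qualifying stories.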
Input
The actor accepts a JSON input. The example default is:

    {
      "count": 30,
      "type": "top",
      "minScore": 50
    }

Typical fields:
- type: which HN list to pull. One of: top, new, best, ask, show, job. Default top.
- count: how many stories to fetch from the chosen list. Default 30.
- minScore: drop stories below this score. Useful for "front page worth reading" filters.
- maxAgeHours: drop stories older than this many hours.
- domains: optional allowlist of domains (e.g. ["github.com", "arxiv.org"]).
- excludeDomains: optional blocklist.
Open the actor in the Apify Console and the form-style editor documents every field with examples. You don't need to memorize anything.
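One plausible reading of the filter fields, written as a client-side predicate over output items, is sketched below. This is an interpretation rather than the actor's documented behavior. In particular, it assumes boundary values are kept (a story exactly at `minScore` passes) and that text posts, whose `domain` is null, are dropped whenever a `domains` allowlist is set.

```python
def passes_filters(story, min_score=None, max_age_hours=None,
                   domains=None, exclude_domains=None):
    """Return True if an output item survives the given filters (sketch)."""
    if min_score is not None and story["score"] < min_score:
        return False
    if max_age_hours is not None and story["ageHours"] > max_age_hours:
        return False
    domain = story.get("domain")
    # Assumption: with an allowlist, items without a domain are dropped.
    if domains and domain not in domains:
        return False
    if exclude_domains and domain in exclude_domains:
        return False
    return True
```

The same predicate can be applied to a dataset that was fetched without filters, for example `[s for s in items if passes_filters(s, min_score=100)]`.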
How to run it
1. Apify Console (no code)
Open the actor, edit input, click Start. Output lands in the run's dataset and exports as JSON, CSV, Excel, or RSS feed.
2. Apify API
Synchronous run that returns dataset items in the response — perfect for cron jobs and webhooks:

    curl -X POST "https://api.apify.com/v2/acts/cryptosignals~hn-top-stories/run-sync-get-dataset-items?token=YOUR_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"count": 30, "type": "top", "minScore": 100}'

Async run:

    curl -X POST "https://api.apify.com/v2/acts/cryptosignals~hn-top-stories/runs?token=YOUR_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"count": 100, "type": "best", "minScore": 200}'

Then poll GET /v2/acts/cryptosignals~hn-top-stories/runs/{runId} and fetch items from defaultDatasetId.
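The polling step could be sketched in Python roughly as follows. The `get_json` argument is a hypothetical helper you supply: it should perform an authenticated GET against `https://api.apify.com` for the given path and return the parsed JSON. The sketch assumes the run object sits under a top-level `data` key and that `SUCCEEDED`, `FAILED`, `ABORTED`, and `TIMED-OUT` are the terminal statuses; check both against the Apify API reference before relying on them.

```python
import time

# Assumed terminal run statuses; verify against the Apify API reference.
TERMINAL = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def wait_for_run(get_json, run_id, actor="cryptosignals~hn-top-stories",
                 interval=5.0, max_polls=60, sleep=time.sleep):
    """Poll a run until it finishes, then return its run object."""
    path = f"/v2/acts/{actor}/runs/{run_id}"
    for _ in range(max_polls):
        run = get_json(path)["data"]
        if run["status"] in TERMINAL:
            if run["status"] != "SUCCEEDED":
                raise RuntimeError(f"run {run_id} ended with {run['status']}")
            return run
        sleep(interval)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

Once the run object is returned, the items can be fetched from the dataset named by `run["defaultDatasetId"]`. For short runs, the synchronous endpoint shown earlier avoids polling altogether.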
3. Apify JavaScript SDK

    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

    const run = await client.actor('cryptosignals/hn-top-stories').call({
        count: 30,
        type: 'top',
        minScore: 100,
    });

    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach(s => console.log(`${s.score}\t${s.title}\t${s.hnUrl}`));

4. Apify Python SDK

    from apify_client import ApifyClient

    client = ApifyClient(token="YOUR_TOKEN")

    run = client.actor("cryptosignals/hn-top-stories").call(run_input={
        "count": 30,
        "type": "top",
        "minScore": 100,
    })

    for s in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(f"{s['score']:>4} {s['title']}")
        print(f" {s['hnUrl']}")

5. Schedule it
In the Apify Console go to Schedules → Create new and pick this actor. Set a cron expression (e.g. 0 9 * * * for daily 9am UTC) and an input. Use a webhook on the schedule to push results into Slack / Discord / your API on every run.
Pricing
Pay Per Event:
- $0.005 per scraped story.
- No compute-minute charges, no proxy charges, no per-request fees.
- A typical front-page run (30 stories) costs $0.15. A count: 100 run costs $0.50.
- Failed runs that produce no items cost you nothing.
- Apify free accounts get monthly credit — usually enough to run a daily 30-story digest at no cost.
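The arithmetic behind these figures is easy to reproduce when budgeting a schedule. The sketch below uses the $0.005 per-story price from the list above and assumes every requested story is returned and billed; a run with filters may return fewer. The function names are made up for this example.

```python
from decimal import Decimal

PRICE_PER_STORY = Decimal("0.005")  # per-story price from the pricing list


def run_cost(stories):
    """Cost in USD of one run that returns `stories` items."""
    return PRICE_PER_STORY * stories


def monthly_cost(stories_per_run, runs_per_day, days=30):
    """Cost in USD of a fixed schedule over `days` days."""
    return run_cost(stories_per_run) * runs_per_day * days
```

For example, `run_cost(30)` gives 0.15 and `run_cost(100)` gives 0.50, matching the list above, while a daily 30-story digest over 30 days comes to `monthly_cost(30, 1)`, or 4.50. `Decimal` is used so the cent values stay exact.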
Output destinations
- Apify dataset (default) — query later via API, export as JSON/CSV.
- Webhooks — fire on run completion and POST the dataset URL to your endpoint.
- Apify integrations — Zapier, Make, Slack, Google Sheets, Airtable, Pipedream all available out of the box from the actor's run page.
- RSS — the dataset has a built-in RSS view if you'd rather treat HN as a feed.
FAQ
How fresh is the data?
Each run pulls live from HN at the time of the request. The HN ranking algorithm itself updates roughly every minute or two.
Why is url sometimes null?
Ask HN, Show HN (text-only), and poll posts have no outbound URL. Use hnUrl to get to the discussion.
Can I get the comments too?
This actor focuses on the story listing: the high-frequency, low-cost feed. Comment-tree extraction is a separate concern (much larger payloads, much higher cost). Reach out via the actor page if you need it.
What if a story is deleted between list-fetch and detail-fetch?
The actor silently drops it. Your dataset will never contain null rows or items missing core fields.
Does this comply with HN's terms?
HN's API and front-end content are publicly accessible and intended for programmatic use. The actor uses public endpoints only and respects rate limits. Don't use the data for spam, mass-DM campaigns, or to harass posters; that's on you.
Other actors you might like
- amazon-scraper — Amazon products, prices, ratings, reviews across all major locales.
- See the full catalogue at apify.com/cryptosignals — Reddit, Lobsters, Product Hunt, GitHub trending, and more community-news / market-data sources, all using the same input/output conventions.
Support
- Web: web-data-labs.com
- Issues: open an issue on the actor page on Apify.
- Updates: actively maintained. If HN changes its layout or API behavior, fixes typically ship within 24 hours.