🟧 Hacker News Watchlist Scraper

Pricing

from $2.00 / 1,000 hacker news records

Scrape Hacker News stories and comments across top, new, best, ask, show, jobs streams. Normalized JSON, ISO dates, author karma + account age, AI-ready markdown. Watchlist mode emits only new records since the last run. Export, run via API, schedule, or integrate with other tools.

Rating: 0.0 (0)

Developer: Skootle (Maintained by Community)

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 18 days ago

TL;DR

Monitor Hacker News stories and comments across top, new, best, ask, show, and jobs streams. Returns clean structured JSON with story-type enum, ISO timestamps, author karma + account age, and a 300-500 character markdown summary per story. Watchlist mode emits only NEW records since the previous run. Built on HN's official Firebase API. Zero authentication, zero anti-bot, no rate-limit issues in practice.


Try it on a small dataset, then let us know what you think in a review.


What does Hacker News Watchlist do?

Hacker News Watchlist extracts stories and comments from any Hacker News stream (top, new, best, ask, show, jobs). For each story you get: title, URL, external domain, score, comment count, author, author's karma + account age, rank in the stream, story-type enum (story, ask_hn, show_hn, job, poll), and ISO 8601 timestamps.

With fetchComments: true, the actor walks each story's comment tree (capped by commentsPerStory, max 500 per story) and emits one record per comment with depth, parent ID, body, and author info.
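A capped comment-tree walk can be pictured as a depth-first traversal over HN items. The Python below is a minimal sketch and not the actor's code. It assumes the items are already loaded into an in-memory dict keyed by ID, shaped like HN API items (`by`, `text`, `kids`). The helper name `walk_comments`, the zero-based `depth`, and the rule that deleted or dead comments are skipped are all assumptions.

```python
from typing import Dict, List


def walk_comments(story_id: int, items: Dict[int, dict], cap: int) -> List[dict]:
    """Depth-first walk of one story's comment tree, stopping after `cap` comments.

    Hypothetical helper: `items` stands in for HN API item lookups, and each
    value follows the HN item shape ({"id", "by", "text", "kids", ...}).
    """
    records: List[dict] = []
    # Stack entries are (comment_id, parent_id, depth); top-level comments get depth 0
    # (an assumption). Kids are pushed reversed so the first kid is visited first.
    stack = [(kid, story_id, 0) for kid in reversed(items[story_id].get("kids", []))]
    while stack and len(records) < cap:
        comment_id, parent_id, depth = stack.pop()
        item = items.get(comment_id)
        if item is None or item.get("deleted") or item.get("dead"):
            continue  # assumption: removed comments are skipped along with their replies
        records.append({
            "itemId": comment_id,
            "storyId": story_id,
            "parentId": parent_id,
            "depth": depth,
            "author": item.get("by"),
            "text": item.get("text", ""),
        })
        # Push replies so they are visited before the next sibling (thread reading order).
        stack.extend((kid, comment_id, depth + 1) for kid in reversed(item.get("kids", [])))
    return records
```

Visiting replies before siblings keeps the output in thread reading order. With a small cap, this returns the first threads in full rather than a shallow slice of every thread. A breadth-first walk would make the opposite trade-off.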

Watchlist mode (watchlistMode: true) is what makes scheduled runs useful. Seen IDs persist across runs in the actor's key-value store, so a daily scheduled run emits only the stories and comments that are NEW since the previous run.
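At its core, the watchlist diff is a set-membership check against IDs saved by the previous run. The Python below is a minimal sketch under a few assumptions: records are keyed by their `recordId`, and the saved state is a JSON object with a hypothetical `seenIds` list. The actor's real WATCHLIST_STATE layout may differ, and the helper names are illustrative.

```python
import json
from typing import Iterable, List, Set, Tuple


def diff_new_records(records: Iterable[dict], seen: Set[str]) -> Tuple[List[dict], Set[str]]:
    """Keep only records whose recordId is not in `seen`; return them plus the updated set."""
    updated = set(seen)  # copy, so the caller's previous state is left untouched
    fresh: List[dict] = []
    for record in records:
        rid = record["recordId"]
        if rid in updated:
            continue  # already emitted in an earlier run (or earlier in this batch)
        updated.add(rid)
        fresh.append(record)
    return fresh, updated


def dump_state(seen: Set[str]) -> str:
    # Hypothetical key-value-store payload; sorted for stable, diff-friendly output.
    return json.dumps({"seenIds": sorted(seen)})


def load_state(raw: str) -> Set[str]:
    # An empty or missing payload means this is the first run.
    return set(json.loads(raw)["seenIds"]) if raw else set()
```

The state only grows. A long-running schedule could prune IDs that the streams can no longer surface, which this sketch does not do.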

Why scrape Hacker News?

HN moves fast: important threads age out within 12 hours. Watch top, new, best, Show HN, Ask HN, and jobs across the day without keeping 30 tabs open. This is useful for founders watching for product mentions, VC scouts watching Show HN for early signal, journalists watching for breaking tech news, and recruiters scanning the monthly Who Is Hiring? thread.

Daily AI-driven HN summary digests, brand-mention alerts on competitor domains, and labeled tech-discourse corpora for LLM training all run off one watchlist feed.

Who needs this?

  • DevRel teams monitoring HN for new posts about competitor or category technologies
  • Recruiters scanning the monthly Who Is Hiring? thread plus the jobs stream
  • Brand and product teams watching for unexpected HN posts about their tools
  • VC and tech-scouting analysts filtering Show HN for new product launches
  • AI / LLM teams building training corpora from high-quality tech discourse
  • AI agents consuming a daily filtered HN digest as a topic-of-interest feed

How to use Hacker News Watchlist

  1. Open the Input tab on the actor page
  2. Pick streams in the streams field (top, new, best, ask, show, jobs). One run handles many.
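  3. Set storiesPerStream (default 20, range 1-500) and maxItems (default 50, a hard cap on stories + comments combined)
  4. Leave fetchAuthorProfile on (default true) for author karma + account age, or turn it off to skip author lookups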
  5. Optionally enable fetchComments and set commentsPerStory (max 500 per story tree)
  6. Optionally set domainAllowlist to filter to specific external domains
  7. Optionally set minScore to ignore low-engagement posts
  8. Optionally enable watchlistMode for daily diffs
  9. Click Start

How much will scraping Hacker News cost?

This actor is priced per event:

  • Actor Start: $0.01 once per run
  • Hacker News record (story or comment): tiered, charged per record written

| Apify plan | $ / 1,000 records |
|---|---|
| FREE | $20.00 |
| BRONZE | $17.00 |
| SILVER | $14.00 |
| GOLD | $12.00 |
| PLATINUM | $12.00 |
| DIAMOND | $10.80 |

A daily watchlist on the top stream with storiesPerStream: 30 and fetchAuthorProfile: true costs roughly $0.45-$0.50/day on the GOLD plan after the first day. Most stories have already been seen by then, and the watchlist filters them out.
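The per-event pricing reduces to a single line of arithmetic. The Python below is a small sketch. It assumes every record written is billed at the flat per-plan rate listed above and that the Actor Start fee is charged exactly once per run. `run_cost` is a hypothetical helper, and Apify's own billing is authoritative.

```python
# USD per 1,000 records, copied from the plan table above.
PRICE_PER_1000 = {
    "FREE": 20.00, "BRONZE": 17.00, "SILVER": 14.00,
    "GOLD": 12.00, "PLATINUM": 12.00, "DIAMOND": 10.80,
}
ACTOR_START_USD = 0.01  # charged once per run


def run_cost(records_written: int, plan: str) -> float:
    """Estimated charge for one run: start event plus the tiered per-record price."""
    return round(ACTOR_START_USD + records_written * PRICE_PER_1000[plan] / 1000, 4)
```

For example, a first watchlist run that writes 30 stories on GOLD comes to 0.01 + 30 × 12.00 / 1000 = $0.37. Later runs are billed only for the new records they write.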

Is it legal to scrape Hacker News?

Yes. Hacker News's Firebase API (hacker-news.firebaseio.com/v0/) is explicitly published as a public, read-only API for developers, and HN encourages programmatic access. There is no authentication, no terms-of-service block on commercial use, and the data (titles, URLs, author handles, scores, public comments) is freely visible to anyone in a browser.

You can use the data for research, AI training, brand monitoring, recruiting, and internal analytics. Standard practice is to attribute HN as a source if you republish content, but the API itself is unrestricted.

Examples

Example 1: Daily top-30 digest

{
"streams": ["top"],
"storiesPerStream": 30,
"fetchAuthorProfile": true,
"watchlistMode": true,
"maxItems": 30
}

Example 2: New posts above 100 score, last 24h

{
"streams": ["new"],
"storiesPerStream": 100,
"minScore": 100,
"fetchAuthorProfile": true,
"maxItems": 30
}

Example 3: Show HN product-launch tracker

{
"streams": ["show"],
"storiesPerStream": 50,
"watchlistMode": true,
"fetchAuthorProfile": true,
"maxItems": 50
}

Example 4: Ask HN community question feed

{
"streams": ["ask"],
"storiesPerStream": 30,
"fetchComments": true,
"commentsPerStory": 30,
"watchlistMode": true,
"maxItems": 1000
}

Example 5: Job-listing watchlist

{
"streams": ["jobs"],
"storiesPerStream": 100,
"watchlistMode": true,
"maxItems": 100
}

Example 6: Brand monitoring (filter by domain)

{
"streams": ["top", "new"],
"storiesPerStream": 100,
"domainAllowlist": ["yourcompany.com", "competitor1.com", "competitor2.com"],
"watchlistMode": true,
"maxItems": 50
}
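The docs say only stories whose external URL "matches" are emitted, but they do not define the matching rule. The Python below sketches one plausible rule, which is an assumption: compare the URL's host case-insensitively, ignore a leading `www.`, and accept subdomains of an allowed domain. `domain_of` and `matches_allowlist` are hypothetical names.

```python
from typing import Optional, Sequence
from urllib.parse import urlsplit


def domain_of(url: str) -> Optional[str]:
    """Lower-cased host of `url` without a leading "www.", or None if there is no host."""
    host = urlsplit(url).hostname  # urllib already lower-cases the hostname
    if host is None:
        return None
    return host[4:] if host.startswith("www.") else host


def matches_allowlist(url: Optional[str], allowlist: Sequence[str]) -> bool:
    """True if the story should be emitted under `allowlist` (hypothetical rules)."""
    if not allowlist:
        return True  # assumption: the default empty list means "no domain filter"
    if not url:
        return False  # Ask HN / text posts have no external URL to match
    domain = domain_of(url)
    if domain is None:
        return False
    for allowed in allowlist:
        allowed = allowed.lower()
        # Exact match, or a subdomain such as blog.yourcompany.com.
        if domain == allowed or domain.endswith("." + allowed):
            return True
    return False
```

The leading-dot check in `endswith` means notyourcompany.com does not match yourcompany.com, while blog.yourcompany.com does.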

Example 7: AI / LLM corpus build

{
"streams": ["top"],
"storiesPerStream": 500,
"fetchComments": true,
"commentsPerStory": 100,
"fetchAuthorProfile": true,
"maxItems": 50000
}

Run weekly to accumulate a labeled tech-discourse dataset for fine-tuning.

Example 8: Author-trust filter

{
"streams": ["new"],
"storiesPerStream": 200,
"fetchAuthorProfile": true,
"minScore": 5,
"maxItems": 100
}

Filter the output downstream for authorAccountAge != 'today' and authorKarma > 100 to skip brand-new spam accounts.
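That downstream filter takes only a few lines of Python over the exported dataset records. The sketch below assumes the records are loaded as dicts, for example from a JSON export. It also assumes that stories without author data should be dropped rather than kept. `trusted_stories` is a hypothetical helper.

```python
from typing import Iterable, List


def trusted_stories(records: Iterable[dict], min_karma: int = 100) -> List[dict]:
    """Keep stories whose author is not brand-new and has more than `min_karma` karma."""
    kept = []
    for r in records:
        if r.get("recordType") != "hn_story":
            continue  # comments are ignored by this filter
        karma = r.get("authorKarma")
        if karma is None:
            continue  # no profile data (e.g. fetchAuthorProfile: false), so it can't be judged
        if r.get("authorAccountAge") != "today" and karma > min_karma:
            kept.append(r)
    return kept
```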

Input parameters

| Field | Type | Default | Description |
|---|---|---|---|
| streams | enum[] | ["top"] | top, new, best, ask, show, jobs. One run handles many. |
| storiesPerStream | int | 20 | Range 1-500 |
| fetchAuthorProfile | bool | true | Adds author karma + account age. One extra API call per unique author, cached. |
| fetchComments | bool | false | Walks the comment tree per story |
| commentsPerStory | int | 0 | Max 500 |
| domainAllowlist | string[] | [] | Only emit stories whose external URL matches an allowed domain |
| minScore | int | 0 | Score threshold |
| watchlistMode | bool | false | Idempotent diff against seen IDs stored in the key-value store |
| maxItems | int | 50 | Hard cap on records (stories + comments) |
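If you want to check an input locally before starting a run, you can test it against this table. The Python below is a sketch that uses only the defaults and ranges listed above. The actor's own validation may differ, and `normalize_input` is a hypothetical helper.

```python
import copy

# Defaults copied from the input table above.
DEFAULTS = {
    "streams": ["top"], "storiesPerStream": 20, "fetchAuthorProfile": True,
    "fetchComments": False, "commentsPerStory": 0, "domainAllowlist": [],
    "minScore": 0, "watchlistMode": False, "maxItems": 50,
}
VALID_STREAMS = {"top", "new", "best", "ask", "show", "jobs"}


def normalize_input(raw: dict) -> dict:
    """Fill in defaults and check the documented ranges (hypothetical client-side check)."""
    cfg = copy.deepcopy(DEFAULTS)  # deep copy so list defaults are never shared between calls
    cfg.update(raw)
    unknown = set(cfg["streams"]) - VALID_STREAMS
    if unknown:
        raise ValueError(f"unknown streams: {sorted(unknown)}")
    if not 1 <= cfg["storiesPerStream"] <= 500:
        raise ValueError("storiesPerStream must be between 1 and 500")
    if not 0 <= cfg["commentsPerStory"] <= 500:
        raise ValueError("commentsPerStory must be between 0 and 500")
    return cfg
```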

Story-type enum

| Value | Meaning |
|---|---|
| story | Standard linked story |
| ask_hn | Ask HN: question to the community |
| show_hn | Show HN: project/product launch |
| job | YC company job posting |
| poll | HN poll |
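HN's Firebase items have a `type` field (`story`, `job`, `poll`, `comment`, `pollopt`), but Ask HN and Show HN do not get their own type, so they have to be derived. The Python below sketches one way to map items onto this enum. It assumes Ask HN and Show HN posts are recognized by their title prefix. `story_type` is a hypothetical helper, not the actor's code.

```python
def story_type(item: dict) -> str:
    """Map an HN API item onto story / ask_hn / show_hn / job / poll."""
    kind = item.get("type")
    if kind in ("job", "poll"):
        return kind  # these API types map one-to-one onto the enum
    # Assumption: Ask HN / Show HN posts are API "story" items recognized by title prefix.
    title = (item.get("title") or "").strip().lower()
    if title.startswith("ask hn"):
        return "ask_hn"
    if title.startswith("show hn"):
        return "show_hn"
    return "story"
```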

Hacker News output format

The dataset has two record types. Filter by recordType.

hn_story

| Field | Type | Description |
|---|---|---|
| outputSchemaVersion, recordType, recordId | string | Discriminated identity |
| itemId, url, hnUrl | int/string | HN ID + external URL + HN comment-page URL |
| storyType | enum | See the story-type enum table |
| title, text, textPlain | string | Title + body (HTML + stripped) |
| externalUrl, domain | string | External link + parsed domain |
| author, authorKarma, authorAccountAge | string/int/string | Author profile (when fetchAuthorProfile: true); authorAccountAge as '12y', '5mo', etc. |
| score, descendants, rank | int | Score, comment count, position in stream |
| stream | enum | Source stream |
| createdAt, scrapedAt | ISO 8601 | Creation and scrape timestamps |
| fieldCompletenessScore, agentMarkdown | int / string | Quality score + LLM-ready summary |
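HN's user endpoint returns `karma` and `created` (Unix seconds), so authorAccountAge is presumably derived from `created`. The Python below sketches a formatter that produces the documented '12y' / '5mo' style. Several details are assumptions: the 365-day year, the 30-day month, the 'Nd' form for accounts under a month old, and 'today' for accounts under a day old. `account_age` is a hypothetical name.

```python
from datetime import datetime, timezone


def account_age(created_unix: int, now: datetime) -> str:
    """Compact account age such as '9y', '5mo', '3d' or 'today' (assumed thresholds)."""
    created = datetime.fromtimestamp(created_unix, tz=timezone.utc)
    days = (now - created).days  # whole days elapsed; `now` must be timezone-aware
    if days >= 365:
        return f"{days // 365}y"
    if days >= 30:
        return f"{days // 30}mo"
    if days >= 1:
        return f"{days}d"
    return "today"
```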

hn_comment

| Field | Type | Description |
|---|---|---|
| outputSchemaVersion, recordType, recordId | string | Discriminated identity |
| itemId, url | int/string | Comment ID + URL |
| storyId, parentId, depth | int | Tree linkage |
| text, textPlain | string | Body (HTML + stripped) |
| author, createdAt, scrapedAt | string / ISO 8601 | Author handle + creation and scrape timestamps |
| fieldCompletenessScore, agentMarkdown | int / string | Quality score + LLM-ready summary |
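A typical first step with the mixed dataset is to split it on recordType and link each comment back to its story. The Python below is a sketch that uses only the field names from the two tables above. It assumes top-level comments carry the story's ID as their parentId, as HN's own API does. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_by_record_type(records: Iterable[dict]) -> Tuple[Dict[int, dict], Dict[int, List[dict]]]:
    """Index stories by itemId and group comments under their storyId."""
    stories: Dict[int, dict] = {}
    comments: Dict[int, List[dict]] = defaultdict(list)
    for record in records:
        if record["recordType"] == "hn_story":
            stories[record["itemId"]] = record
        elif record["recordType"] == "hn_comment":
            comments[record["storyId"]].append(record)
    return stories, dict(comments)


def direct_replies(comments: List[dict], parent_id: int) -> List[dict]:
    """Comments whose parentId is `parent_id` (a story ID yields top-level comments)."""
    return [c for c in comments if c["parentId"] == parent_id]
```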

Hacker News scraper output example (story)

{
"outputSchemaVersion": "2026-05-08",
"recordType": "hn_story",
"recordId": "hn:story:48067119",
"itemId": 48067119,
"stream": "top",
"rank": 1,
"storyType": "story",
"title": "Google broke reCAPTCHA for de-googled Android users",
"url": "https://reclaimthenet.org/google-broke-recaptcha-for-de-googled-android-users",
"domain": "reclaimthenet.org",
"score": 656,
"descendants": 234,
"author": "anonymousiam",
"authorKarma": 5099,
"authorAccountAge": "9y",
"createdAt": "2026-05-08T14:22:53.000Z",
"fieldCompletenessScore": 100,
"agentMarkdown": "**📰 HN · Google broke reCAPTCHA for de-googled Android users**\n- ⬆ 656 · 💬 234 · #1\n- 👤 u/anonymousiam · 5099 karma · 9y\n- 🌐 reclaimthenet.org\n- 🔗 https://reclaimthenet.org/..."
}

During the Actor run

The actor pulls stories, comments, and author profiles from HN's official Firebase API with respectful pacing. No authentication required, no rate-limit issues in practice; author lookups are cached per run so a busy thread doesn't multiply API calls. Alongside the dataset, three artifacts land in the actor's key-value store: OUTPUT (run summary), AGENT_BRIEFING (markdown digest with top stories by score), and WATCHLIST_STATE (seen story + comment IDs, when watchlistMode: true).
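The fetch steps described above map onto a handful of public endpoints under hacker-news.firebaseio.com/v0/. The Python below sketches that plumbing with only the standard library. The stream-to-endpoint mapping follows HN's published API. The 50 ms pacing interval and the helper names are assumptions, not the actor's actual settings. `functools.lru_cache` stands in for the per-run author cache, so each username is fetched at most once per process.

```python
import json
import time
import urllib.request
from functools import lru_cache
from typing import Optional

BASE_URL = "https://hacker-news.firebaseio.com/v0"

# Stream name (as used in the streams input) -> HN Firebase endpoint.
STREAM_ENDPOINTS = {
    "top": "topstories", "new": "newstories", "best": "beststories",
    "ask": "askstories", "show": "showstories", "jobs": "jobstories",
}

MIN_INTERVAL_S = 0.05  # assumption: at most ~20 requests per second
_last_request = 0.0


def stream_url(stream: str) -> str:
    return f"{BASE_URL}/{STREAM_ENDPOINTS[stream]}.json"


def item_url(item_id: int) -> str:
    return f"{BASE_URL}/item/{item_id}.json"


def user_url(username: str) -> str:
    return f"{BASE_URL}/user/{username}.json"


def fetch_json(url: str):
    """GET a JSON document, sleeping first if the previous request was too recent."""
    global _last_request
    wait = MIN_INTERVAL_S - (time.monotonic() - _last_request)
    if wait > 0:
        time.sleep(wait)
    _last_request = time.monotonic()
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


@lru_cache(maxsize=None)
def get_user(username: str) -> Optional[dict]:
    """Author profile with `karma` and `created` (None if the user doesn't exist); cached."""
    return fetch_json(user_url(username))
```

For example, `fetch_json(stream_url("show"))` returns a list of item IDs, and each ID can then be passed to `fetch_json(item_url(item_id))`.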

FAQ

Is there a rate limit?

HN's Firebase API doesn't publish a hard rate limit and is generous in practice. The actor paces its requests so a daily run shouldn't trip a soft cap.

Can I monitor for new stories only?

Yes. Set watchlistMode: true. The first run captures everything; subsequent runs only emit records new since the previous run.

Can I get author karma and account age?

Yes, that's the default (fetchAuthorProfile: true). Adds one HN API call per unique author, cached within the run.

Can I get comments along with stories?

Yes. Set fetchComments: true and commentsPerStory to your cap.

Can I filter by domain?

Yes. Set domainAllowlist to a list of allowed domains (e.g., ["yourcompany.com"]). Only stories whose external URL matches will be emitted.

Can I filter by score?

Yes. Set minScore to your threshold. Stories below it are skipped.

Can I use this with the Apify API?

Yes. POST the input JSON to https://api.apify.com/v2/acts/skootle~hackernews-watchlist/runs, authenticated with your Apify API token.
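The Python below is a minimal sketch of that call using only the standard library. It passes the token through Apify's `token` query parameter and sends the actor input as the JSON request body. `build_run_request` is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request

ACTOR_ID = "skootle~hackernews-watchlist"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """POST request that starts one actor run with `actor_input` as its input."""
    query = urllib.parse.urlencode({"token": token})
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs?{query}"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
```

Passing the request to `urllib.request.urlopen` starts the run. The JSON response wraps the run object (ID, status, default dataset ID) in a `data` field.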

Can I integrate with Make / Zapier / n8n / Slack?

Yes. Click Integrations on the actor page.

Why use this when HN's API is free?

The free API requires per-item fetches and gives you no schema. This actor handles all the orchestration (stream → IDs → items → authors → comments → watchlist diff), normalizes into a versioned typed schema, joins author profile data per story, computes ranks, and ships agent-ready markdown summaries. If you're feeding this into a daily Slack digest or an AI agent, it pays back the per-record cost in saved engineering time.

Your feedback

Hit a bug or want a feature? Open an issue on the Issues tab rather than the reviews page, and we'll fix it fast (typically within 48 hours).

Why choose Hacker News Watchlist

  • Watchlist mode emits only what's new since the last run, so a daily Slack digest or AI agent feed never replays yesterday's stories
  • All 6 streams in one run (top, new, best, ask, show, jobs) instead of refreshing HN tab by tab throughout the day
  • Author profile join: authorKarma and authorAccountAge per story, so brand monitors and VC scouts can tell spam from trusted contributors immediately
  • Story-type filter without grep: the typed enum (story, ask_hn, show_hn, job, poll) lets Show HN trackers and job watchlists work with a single downstream query
  • Typically sub-minute runtime, built on HN's official public Firebase API: no anti-bot, no auth, no rate-limit issues in practice
  • Agent-ready markdown per record drops straight into an LLM context window
  • Stories and comments in one dataset, filterable by recordType
  • Safe to dedupe across re-runs: stable hn:story:<id> and hn:comment:<id> record IDs
  • Schema won't silently break your pipeline: it is versioned and bumped on every breaking change
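With stable recordIds, merging the datasets of several non-watchlist runs comes down to keeping one record per key. The Python below sketches this, keeping the most recently scraped copy so scores and comment counts stay fresh. It assumes scrapedAt uses the same UTC ISO 8601 format as createdAt in the output example, which makes plain string comparison chronological. `latest_per_record` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def latest_per_record(records: Iterable[dict]) -> List[dict]:
    """One record per recordId, keeping the copy with the newest scrapedAt."""
    latest: Dict[str, dict] = {}
    for record in records:
        key = record["recordId"]  # e.g. "hn:story:48067119"
        current = latest.get(key)
        # Same-format UTC ISO 8601 strings sort chronologically, so plain comparison works.
        if current is None or record["scrapedAt"] > current["scrapedAt"]:
            latest[key] = record
    return list(latest.values())
```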

Support and contact

File issues on this actor's Issues tab; we reply within 48 hours. Feature requests are welcome: tag them with enhancement.