🟧 Hacker News Watchlist Scraper

Pricing

from $2.00 / 1,000 hacker news records

Scrape Hacker News stories and comments across top, new, best, ask, show, jobs streams. Normalized JSON, ISO dates, author karma + account age, AI-ready markdown. Watchlist mode emits only new records since the last run. Export, run via API, schedule, or integrate with other tools.

Developer

Skootle

Maintained by Community

TL;DR

Monitor Hacker News stories and comments across top, new, best, ask, show, and jobs streams. Returns clean structured JSON with story-type enum, ISO timestamps, author karma + account age, and a 300-500 character markdown summary per story. Watchlist mode emits only NEW records since the previous run. Built on HN's official Firebase API. Zero authentication, zero anti-bot, no rate-limit issues in practice.


Try it on a small dataset, then let us know what you think in a review.


What does Hacker News Watchlist do?

Hacker News Watchlist extracts stories and comments from any Hacker News stream (top, new, best, ask, show, jobs). For each story you get: title, URL, external domain, score, comment count, author, author's karma + account age, rank in the stream, story-type enum (story, ask_hn, show_hn, job, poll), and ISO 8601 timestamps.

With fetchComments: true, the actor walks the comment tree per story (configurable cap, max 500 per story) and emits one record per comment with depth, parent ID, body, score, and author info.

Watchlist mode (watchlistMode: true) makes this scraper schedulable. State persists across runs in the actor's key-value store, so a daily cron only emits stories and comments NEW since the last run.
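At its core, the watchlist diff is set membership. Below is a minimal Python sketch that assumes the persisted state is just the set of recordId values emitted so far. The real WATCHLIST_STATE layout isn't documented here, and the helper name diff_new_records is hypothetical. Loading the set from the key-value store and saving it back is left out.

```python
from typing import Iterable, List, Set, Tuple


def diff_new_records(records: Iterable[dict], seen_ids: Set[str]) -> Tuple[List[dict], Set[str]]:
    """Return (records not seen before, updated set of seen recordIds).

    seen_ids is a hypothetical in-memory stand-in for WATCHLIST_STATE.
    """
    fresh: List[dict] = []
    updated = set(seen_ids)  # copy so the previous run's state is not mutated
    for record in records:
        rid = record["recordId"]
        if rid not in updated:
            fresh.append(record)
            updated.add(rid)  # also dedupes repeats within the same run
    return fresh, updated


# Run 1: empty state, so every record is emitted.
run1 = [{"recordId": "hn:story:1"}, {"recordId": "hn:story:2"}]
emitted1, state = diff_new_records(run1, set())

# Run 2: story 2 is still in the stream, story 3 is new.
run2 = [{"recordId": "hn:story:2"}, {"recordId": "hn:story:3"}]
emitted2, state = diff_new_records(run2, state)
print([r["recordId"] for r in emitted2])  # ['hn:story:3']
```

Keying on recordId keeps stories and comments in separate namespaces. HN item IDs are already unique across all item types, though, so the raw itemId would work as well.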

Why scrape Hacker News?

Hacker News is the canonical curated tech-and-startup discussion feed on the internet. Y Combinator's community surfaces breaking technical stories, OSS launches, founder posts, hiring threads (Who Is Hiring?), and product announcements, often hours before they hit Twitter or mainstream media. For dev relations, recruiting, technology radar, AI training corpora, and startup intelligence, HN offers an unusually high signal-to-noise ratio.

The HN Firebase API is free and public, but it's chatty. Every item is a separate fetch: one call for the topstories list, then one per story, then one per comment. This actor handles that orchestration, along with author lookups (cached per run) and the watchlist diff logic.
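To show what the fan-out looks like, here is a rough Python sketch of fetching one stream from the public endpoints at hacker-news.firebaseio.com/v0/. It is not the actor's code, and the function names are hypothetical. get_json is injectable, so the logic can be exercised without network access.

```python
import json
from typing import Any, Callable, List
from urllib.request import urlopen

BASE = "https://hacker-news.firebaseio.com/v0"

# Stream name -> list endpoint of HN's public Firebase API.
STREAM_ENDPOINTS = {
    "top": "topstories",
    "new": "newstories",
    "best": "beststories",
    "ask": "askstories",
    "show": "showstories",
    "jobs": "jobstories",
}


def http_get_json(url: str) -> Any:
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def fetch_stream(stream: str, limit: int,
                 get_json: Callable[[str], Any] = http_get_json) -> List[dict]:
    """One request for the ID list, then one request per item."""
    ids = get_json(f"{BASE}/{STREAM_ENDPOINTS[stream]}.json")[:limit]
    items = []
    for rank, item_id in enumerate(ids, start=1):
        item = get_json(f"{BASE}/item/{item_id}.json")
        if item:  # IDs that no longer resolve come back as JSON null
            items.append({**item, "rank": rank})  # rank = position in the stream
    return items
```

Calling fetch_stream("show", 5) with the default fetcher makes six live requests: one for the list and five for the items. Comments and author profiles would add more on top of that.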

Who needs this?

  • DevRel teams monitoring HN for new posts about competitor or category technologies
  • Recruiters scanning the monthly Who Is Hiring? thread plus the jobs stream
  • Brand and product teams watching for unexpected HN posts about their tools
  • VC and tech-scouting analysts filtering Show HN for new product launches
  • AI / LLM teams building training corpora from high-quality tech discourse
  • AI agents consuming a daily filtered HN digest as a topic-of-interest feed

How to use Hacker News Watchlist

  1. Open the Input tab on the actor page
  2. Pick one or more streams in the streams field (top, new, best, ask, show, jobs); a single run can cover several.
  3. Set storiesPerStream (default 20)
  4. Leave fetchAuthorProfile on (default true) for author karma + account age, or turn it off to skip the extra per-author API call
  5. Optionally enable fetchComments and set commentsPerStory (max 500 per story tree)
  6. Optionally set domainAllowlist to filter to specific external domains
  7. Optionally set minScore to ignore low-engagement posts
  8. Optionally enable watchlistMode for daily diffs
  9. Click Start
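Before you start a run, you may want to sanity-check the input against the limits in the Input parameters table. The following is a minimal Python sketch. validate_input is a hypothetical helper, and the actor applies its own validation, which may differ in the details.

```python
import copy

ALLOWED_STREAMS = {"top", "new", "best", "ask", "show", "jobs"}

# Defaults copied from the Input parameters table.
DEFAULTS = {
    "streams": ["top"],
    "storiesPerStream": 20,
    "fetchAuthorProfile": True,
    "fetchComments": False,
    "commentsPerStory": 0,
    "domainAllowlist": [],
    "minScore": 0,
    "watchlistMode": False,
    "maxItems": 50,
}


def validate_input(raw: dict) -> dict:
    """Fill in defaults and enforce the documented limits (hypothetical helper)."""
    cfg = {**copy.deepcopy(DEFAULTS), **raw}  # deepcopy keeps DEFAULTS' lists unshared
    unknown = set(cfg["streams"]) - ALLOWED_STREAMS
    if unknown:
        raise ValueError(f"unknown streams: {sorted(unknown)}")
    if not 1 <= cfg["storiesPerStream"] <= 500:
        raise ValueError("storiesPerStream must be between 1 and 500")
    if not 0 <= cfg["commentsPerStory"] <= 500:
        raise ValueError("commentsPerStory must be between 0 and 500")
    return cfg


cfg = validate_input({"streams": ["show"], "watchlistMode": True})
print(cfg["storiesPerStream"], cfg["maxItems"])  # 20 50
```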

How much will scraping Hacker News cost?

This actor is priced per event:

  • Actor Start: $0.01 once per run
  • Hacker News record (story or comment): tiered, charged per record written
Apify plan | $ per 1,000 records
FREE | $20.00
BRONZE | $17.00
SILVER | $14.00
GOLD | $12.00
PLATINUM | $12.00
DIAMOND | $10.80

A daily watchlist on top with storiesPerStream: 30 and fetchAuthorProfile: true runs ~$0.45-$0.50/day on GOLD after the first day (most stories already seen, watchlist filters them out).
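To budget other configurations, note that the two charged events add up linearly. The following is a rough Python estimator. It assumes the Actor Start fee and the per-record tier above are the only charges, and the rates are copied from the table, so they may change.

```python
# $ per 1,000 records, copied from the pricing table above.
RATE_PER_1000 = {
    "FREE": 20.00,
    "BRONZE": 17.00,
    "SILVER": 14.00,
    "GOLD": 12.00,
    "PLATINUM": 12.00,
    "DIAMOND": 10.80,
}
START_FEE = 0.01  # Actor Start event, charged once per run


def estimate_run_cost(records_written: int, plan: str) -> float:
    """Start fee plus the per-record tier, rounded to cents."""
    return round(START_FEE + records_written * RATE_PER_1000[plan] / 1000, 2)
```

Records are charged as they are written, so in watchlist mode records_written is the count left after the diff, not storiesPerStream.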

Is it legal to scrape Hacker News?

Yes. Hacker News's Firebase API is explicitly published as a public read-only API for developers (hacker-news.firebaseio.com/v0/). HN encourages programmatic access. There is no authentication, no terms-of-service block on commercial use, and the data (titles, URLs, author handles, scores, public comments) is freely visible to anyone in a browser.

Use the data for research, AI training, brand monitoring, recruiting, internal analytics. Standard practice is to attribute HN as a source if you republish content, but the API itself is unrestricted.

Examples

Example 1: Daily top-30 digest

{
"streams": ["top"],
"storiesPerStream": 30,
"fetchAuthorProfile": true,
"watchlistMode": true,
"maxItems": 30
}

Example 2: New posts scoring 100+ (among the 100 most recent)

{
"streams": ["new"],
"storiesPerStream": 100,
"minScore": 100,
"fetchAuthorProfile": true,
"maxItems": 30
}

Example 3: Show HN product-launch tracker

{
"streams": ["show"],
"storiesPerStream": 50,
"watchlistMode": true,
"fetchAuthorProfile": true,
"maxItems": 50
}

Example 4: Ask HN community question feed

{
"streams": ["ask"],
"storiesPerStream": 30,
"fetchComments": true,
"commentsPerStory": 30,
"watchlistMode": true,
"maxItems": 1000
}

Example 5: Job-listing watchlist

{
"streams": ["jobs"],
"storiesPerStream": 100,
"watchlistMode": true,
"maxItems": 100
}

Example 6: Brand monitoring (filter by domain)

{
"streams": ["top", "new"],
"storiesPerStream": 100,
"domainAllowlist": ["yourcompany.com", "competitor1.com", "competitor2.com"],
"watchlistMode": true,
"maxItems": 50
}

Example 7: AI / LLM corpus build

{
"streams": ["top"],
"storiesPerStream": 500,
"fetchComments": true,
"commentsPerStory": 100,
"fetchAuthorProfile": true,
"maxItems": 50000
}

Run weekly to accumulate a labeled tech-discourse dataset for fine-tuning.

Example 8: Author-trust filter

{
"streams": ["new"],
"storiesPerStream": 200,
"fetchAuthorProfile": true,
"minScore": 5,
"maxItems": 100
}

Filter the output downstream for authorAccountAge != 'today' and authorKarma > 100 to skip brand-new spam accounts.
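Here is one way to apply that filter in Python. It assumes authorAccountAge is either 'today' or a compact string like '9y' or '5mo', as in the output schema. The 'w' and 'd' suffixes and the day conversions are guesses, and the filter treats any value it can't parse as untrusted.

```python
import re
from typing import Optional

# Approximate days per unit. 'y' and 'mo' appear in the schema docs;
# 'w' and 'd' are assumed.
UNIT_DAYS = {"y": 365, "mo": 30, "w": 7, "d": 1}


def account_age_days(age: Optional[str]) -> Optional[int]:
    """Convert '9y' / '5mo' style strings to rough day counts; None if unparseable."""
    if age == "today":
        return 0
    m = re.fullmatch(r"(\d+)(y|mo|w|d)", age or "")
    return int(m.group(1)) * UNIT_DAYS[m.group(2)] if m else None


def is_trusted(record: dict, min_karma: int = 100) -> bool:
    """Reject same-day or unparseable accounts and karma at or below min_karma."""
    days = account_age_days(record.get("authorAccountAge"))
    karma = record.get("authorKarma") or 0
    return days is not None and days > 0 and karma > min_karma
```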

Input parameters

Field | Type | Default | Description
streams | enum[] | ["top"] | top, new, best, ask, show, jobs. One run handles many.
storiesPerStream | int | 20 | 1-500 stories per stream
fetchAuthorProfile | bool | true | Adds author karma + account age. One extra API call per unique author, cached.
fetchComments | bool | false | Walks the comment tree per story
commentsPerStory | int | 0 | Max 500 comments per story
domainAllowlist | string[] | [] | Only emit stories whose external URL matches an allowed domain
minScore | int | 0 | Skip stories scoring below this threshold
watchlistMode | bool | false | Idempotent diff against KV-stored seen IDs
maxItems | int | 50 | Hard cap on records (stories + comments)

Story-type enum

Value | Meaning
story | Standard linked story
ask_hn | Ask HN: question to the community
show_hn | Show HN: project/product launch
job | YC company job posting
poll | HN poll
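The page doesn't say how storyType is derived. A plausible heuristic, sketched here in Python, takes job and poll from the raw Firebase type field and falls back to the conventional 'Ask HN:' / 'Show HN:' title prefixes. This is an assumption, not the actor's documented logic.

```python
def classify_story_type(item: dict) -> str:
    """Map a raw Firebase item to the story-type enum (heuristic sketch)."""
    raw_type = item.get("type")
    if raw_type in ("job", "poll"):
        return raw_type  # Firebase already distinguishes these
    title = (item.get("title") or "").strip().lower()
    if title.startswith("ask hn:"):
        return "ask_hn"
    if title.startswith("show hn:"):
        return "show_hn"
    return "story"
```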

Hacker News output format

The dataset has two record types. Filter by recordType.

hn_story

Field | Type | Description
outputSchemaVersion, recordType, recordId | string | Discriminated identity
itemId, url, hnUrl | int / string | HN ID + external URL + HN comment-page URL
storyType | enum | See the story-type enum table
title, text, textPlain | string | Title + body (HTML + stripped)
externalUrl, domain | string | External link + parsed domain
author, authorKarma, authorAccountAge | string / int / string | Author profile (when fetchAuthorProfile: true); authorAccountAge is formatted as '12y', '5mo', etc.
score, descendants, rank | int | Score, comment count, position in stream
stream | enum | Source stream
createdAt, scrapedAt | string (ISO 8601) | Post time and scrape time
fieldCompletenessScore, agentMarkdown | int / string | Quality score + LLM-ready summary

hn_comment

Field | Type | Description
outputSchemaVersion, recordType, recordId | string | Discriminated identity
itemId, url | int / string | Comment ID + URL
storyId, parentId, depth | int | Tree linkage
text, textPlain | string | Body (HTML + stripped)
author, createdAt, scrapedAt | string (timestamps in ISO 8601) | Comment author, post time, scrape time
fieldCompletenessScore, agentMarkdown | int / string | Quality score + LLM-ready summary
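Because both record types share one dataset, a consumer usually starts by splitting on recordType. The following is a short Python sketch over dataset items you have already downloaded, for example from a JSON export, as a list of dicts. It groups comments by the storyId linkage field from the table above. split_dataset is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_dataset(items: List[dict]) -> Tuple[List[dict], Dict[int, List[dict]]]:
    """Return (stories, {storyId: [comments...]}) from a mixed dataset."""
    stories = [r for r in items if r["recordType"] == "hn_story"]
    comments_by_story: Dict[int, List[dict]] = defaultdict(list)
    for r in items:
        if r["recordType"] == "hn_comment":
            comments_by_story[r["storyId"]].append(r)
    return stories, dict(comments_by_story)
```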

Hacker News scraper output example (story)

{
"outputSchemaVersion": "2026-05-08",
"recordType": "hn_story",
"recordId": "hn:story:48067119",
"itemId": 48067119,
"stream": "top",
"rank": 1,
"storyType": "story",
"title": "Google broke reCAPTCHA for de-googled Android users",
"url": "https://reclaimthenet.org/google-broke-recaptcha-for-de-googled-android-users",
"domain": "reclaimthenet.org",
"score": 656,
"descendants": 234,
"author": "anonymousiam",
"authorKarma": 5099,
"authorAccountAge": "9y",
"createdAt": "2026-05-08T14:22:53.000Z",
"fieldCompletenessScore": 100,
"agentMarkdown": "**📰 HN · Google broke reCAPTCHA for de-googled Android users**\n- ⬆ 656 · 💬 234 · #1\n- 👤 u/anonymousiam · 5099 karma · 9y\n- 🌐 reclaimthenet.org\n- 🔗 https://reclaimthenet.org/..."
}

During the Actor run

The actor first fetches the stream's ID list (one call per stream). Then for each ID it fetches the item details (hacker-news.firebaseio.com/v0/item/<id>.json). When fetchAuthorProfile: true, the author's profile is fetched once and cached for the rest of the run (so a story with 50 comments by the same person costs one extra API call, not 50). When fetchComments: true, the comment tree is walked depth-first per story.
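This paragraph describes two techniques: a depth-first comment walk with a per-story cap, and a run-scoped author cache. Here is roughly what they look like in Python. The code relies on the raw Firebase item fields kids and by. The callbacks get_item and get_user stand in for HTTP fetches. Skipping deleted or dead comments while still visiting their replies is an assumption about the actor's behavior.

```python
from typing import Callable, Dict, List, Optional


def walk_comments(story: dict, get_item: Callable[[int], Optional[dict]],
                  cap: int) -> List[dict]:
    """Depth-first (pre-order) walk of a story's comment tree, stopping at `cap`."""
    out: List[dict] = []
    # Stack entries are (comment_id, parent_id, depth); kids are pushed in
    # reverse so the first reply is popped, and therefore visited, first.
    stack = [(kid, story["id"], 1) for kid in reversed(story.get("kids", []))]
    while stack and len(out) < cap:
        cid, parent_id, depth = stack.pop()
        item = get_item(cid)
        if item is None:
            continue
        stack.extend((kid, cid, depth + 1) for kid in reversed(item.get("kids", [])))
        if item.get("deleted") or item.get("dead"):
            continue  # not emitted, but its replies stay reachable
        out.append({"itemId": cid, "storyId": story["id"], "parentId": parent_id,
                    "depth": depth, "author": item.get("by")})
    return out


def make_author_lookup(get_user: Callable[[str], dict]) -> Callable[[str], dict]:
    """Memoize profile fetches so each author costs one call per run."""
    cache: Dict[str, dict] = {}

    def lookup(username: str) -> dict:
        if username not in cache:
            cache[username] = get_user(username)
        return cache[username]

    return lookup
```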

Each record is validated against a Zod schema before it is pushed to the dataset. The actor also writes these key-value store records:

  1. OUTPUT: compact run summary
  2. AGENT_BRIEFING: markdown digest with top stories by score
  3. WATCHLIST_STATE: seen story + comment IDs (only when watchlistMode: true)

FAQ

How does Hacker News Watchlist work?

The actor calls HN's official Firebase API at hacker-news.firebaseio.com/v0/. Every list (top, new, best, ask, show, jobs) is one HTTP call returning an array of IDs. Every item is one HTTP call returning the full item JSON.

Is there a rate limit?

HN's Firebase API doesn't publish a hard rate limit and is generous in practice. The actor adds a 60ms delay between requests as a courtesy to avoid spiking concurrent connections.
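Here is a minimal Python sketch of a fixed courtesy delay between sequential requests. Only the 60 ms figure comes from this page. The helper name and the sequential design are assumptions, and sleep is injectable for testing.

```python
import time
from typing import Callable, Iterable, List, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def fetch_politely(keys: Iterable[K], fetch: Callable[[K], V],
                   delay_s: float = 0.060,
                   sleep: Callable[[float], None] = time.sleep) -> List[V]:
    """Call fetch(key) sequentially, pausing delay_s between consecutive requests."""
    results: List[V] = []
    for i, key in enumerate(keys):
        if i > 0:  # no pause before the first request
            sleep(delay_s)
        results.append(fetch(key))
    return results
```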

Can I monitor for new stories only?

Yes. Set watchlistMode: true. The first run captures everything; subsequent runs only emit records new since the previous run.

Can I get author karma and account age?

Yes, that's the default (fetchAuthorProfile: true). It adds one HN API call per unique author, cached within the run.

Can I get comments along with stories?

Yes. Set fetchComments: true and commentsPerStory to your cap.

Can I filter by domain?

Yes. Set domainAllowlist to a list of allowed domains (e.g., ["yourcompany.com"]). Only stories whose external URL matches will be emitted.
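The page doesn't define "matches". One reasonable interpretation, sketched here in Python, accepts the exact host or any subdomain of an allowlisted entry, so blog.yourcompany.com passes for yourcompany.com. It also treats an empty allowlist as no filtering. Both rules are assumptions.

```python
from typing import List, Optional
from urllib.parse import urlparse


def domain_allowed(external_url: Optional[str], allowlist: List[str]) -> bool:
    """Exact-host or subdomain match against the allowlist (assumed semantics)."""
    if not allowlist:
        return True  # empty allowlist (the default): no filtering
    if not external_url:
        return False  # text-only posts have no external domain to match
    host = (urlparse(external_url).hostname or "").lower()
    return any(host == d.lower() or host.endswith("." + d.lower()) for d in allowlist)
```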

Can I filter by score?

Yes. Set minScore to your threshold. Stories below it are skipped.

Can I use this with the Apify API?

Yes. POST to https://api.apify.com/v2/acts/skootle~hackernews-watchlist/runs.
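Here is a minimal Python sketch that builds that request with the standard library. It follows the Apify API convention of sending the run input as the JSON request body and the API token as a Bearer header. The token value is a placeholder for your own.

```python
import json
from urllib.request import Request

ACTOR_RUNS_URL = "https://api.apify.com/v2/acts/skootle~hackernews-watchlist/runs"


def build_run_request(run_input: dict, token: str) -> Request:
    """Prepare (but do not send) a POST that starts a run with the given input."""
    return Request(
        ACTOR_RUNS_URL,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_run_request({"streams": ["top"], "watchlistMode": True}, "<YOUR_APIFY_TOKEN>")
```

To actually start the run, pass the request to urllib.request.urlopen(). The JSON response describes the new run.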

Can I integrate with Make / Zapier / n8n / Slack?

Yes. Click Integrations on the actor page.

Why use this when HN's API is free?

The free API requires per-item fetches and gives you no schema. This actor handles all the orchestration (stream → IDs → items → authors → comments → watchlist diff), normalizes into a versioned typed schema, joins author profile data per story, computes ranks, and ships agent-ready markdown summaries. If you're feeding this into a daily Slack digest or an AI agent, it pays back the per-record cost in saved engineering time.

Your feedback

Hit a bug or want a feature? Open an issue on the Issues tab rather than the reviews page, and we'll fix it fast (typically within 48 hours).

Why choose Hacker News Watchlist

  • All 6 streams covered: top, new, best, ask, show, jobs. One run can handle all of them.
  • Author profile join: authorKarma and authorAccountAge, ready for separating spam from trusted contributors
  • Story-type enum: story, ask_hn, show_hn, job, poll. No need to grep titles.
  • Watchlist diff mode: only emits NEW records since the last run
  • Versioned schema: outputSchemaVersion is the literal '2026-05-08'
  • Idempotent record IDs: hn:story:<id>, hn:comment:<id> stable across runs
  • Discriminated union: stories + comments share one dataset, filterable by recordType
  • Agent-grade output: agentMarkdown ready to paste into an LLM context
  • Zero anti-bot, zero auth: built on HN's official public Firebase API


Support and contact

File issues on this actor's Issues tab; we reply within 48 hours. Feature requests are welcome: tag them with enhancement.