Hacker News Scraper — Stories, Jobs, Comments & Users API


Pricing

from $0.18 / 1,000 hacker news items


Scrape Hacker News stories, comments, jobs, and user profiles via the official Firebase and Algolia APIs. No proxy, no auth. Supports top/new/best/ask/show/job feeds, full-text search, comment trees, and user data. Pay per result.


Developer: Vitalii Bondarev

Maintained by Community


Last modified: 19 hours ago


Hacker News Scraper — Stories, Jobs & Comments API | from $1/1K No Login

Built for developer-tooling teams, content-intelligence platforms, and AI agents monitoring tech discourse.

Scrape Hacker News stories, comments, jobs, and user profiles via the official HN Firebase API and the Algolia HN Search API. Both APIs are public and need no authentication, so there is no login and no proxy. Pay per result.

Features

  • 6 live feeds: Top, New, Best, Ask HN, Show HN, Job stories (via Firebase)
  • Full-text search: Algolia HN Search with relevance or date sort
  • Comments: fetch top-level comment threads, each with parent_id for threading
  • User profiles: karma, created date, bio, submission count
  • Direct item fetch: supply any HN item IDs
  • parse_confidence: machine-readable quality score in every record
  • Flat schema: 18 fields, ready for CSV/JSON/BigQuery
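Because each comment row carries parent_id and story_id, nested threads can be rebuilt client-side. A minimal Python sketch, assuming rows with the item_id, type, parent_id and story_id fields from the output schema; build_threads is a hypothetical helper, not part of the actor:

```python
from collections import defaultdict
from typing import Any


def build_threads(rows: list[dict[str, Any]]) -> dict[int, list[dict[str, Any]]]:
    """Group comment rows into nested threads keyed by their root story.

    Returns {story_id: [top-level comment, ...]}; every comment gets a
    "replies" list holding its direct children, recursively.
    """
    comments = [r for r in rows if r.get("type") == "comment"]
    # Copy each comment so adding "replies" does not mutate the input rows.
    by_id = {c["item_id"]: {**c, "replies": []} for c in comments}
    threads: dict[int, list[dict[str, Any]]] = defaultdict(list)
    for node in by_id.values():
        parent = by_id.get(node.get("parent_id"))
        if parent is not None:
            # Reply to another comment that is present in the dataset.
            parent["replies"].append(node)
        else:
            # Parent is the story itself (or was not fetched): top level.
            threads[node.get("story_id")].append(node)
    return dict(threads)
```

Comments whose parent comment was not fetched (for example because of maxCommentsPerStory) end up at the top level of their story instead of being dropped.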

Why this beats the alternatives

| Actor | Price | API | parse_confidence | Proxy needed |
|---|---|---|---|---|
| This actor | $1.00/1k | Firebase + Algolia | Yes | No |
| epctex/hackernews-scraper | $10/mo rental | HTML scraping | No | Required |
| harvestlab/hacker-news-scraper | $1.00/1k | Firebase + Algolia | No | No |
| automation-lab/hackernews-scraper | $1.00/1k | HTML scraping | No | No |
| andok/hackernews-scraper | ? | Firebase | No | No |

Our edge: this is the only actor in the comparison that combines Firebase feeds, Algolia search, comments, user profiles and parse_confidence, all via official APIs. No proxy is required, because the HN Firebase API is public, needs no authentication, and is stable.

Modes

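| Mode | API | Returns |
|---|---|---|
| topStories | Firebase | Top stories (up to 500) |
| newStories | Firebase | Newest stories (up to 500) |
| bestStories | Firebase | Best (highest-voted recent) stories, up to 200 |
| askStories | Firebase | Latest Ask HN posts (up to 200) |
| showStories | Firebase | Latest Show HN posts (up to 200) |
| jobStories | Firebase | Latest job posts (up to 200) |
| searchQueries | Algolia | Full-text search results |
| itemIds | Firebase | Specific items fetched by ID |
| users | Firebase | User profiles |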
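The table names the API behind each mode but not the exact endpoints. Assuming the actor uses the standard public URLs of the HN Firebase API and the Algolia HN Search API, the modes map roughly as follows; feed_url, item_url, user_url and search_url are hypothetical helpers for illustration:

```python
from urllib.parse import urlencode

FIREBASE = "https://hacker-news.firebaseio.com/v0"
ALGOLIA = "https://hn.algolia.com/api/v1"

# Assumed mapping of feed modes onto the public Firebase feed endpoints.
# Each endpoint returns a JSON array of item IDs, not the items themselves.
FEED_ENDPOINTS = {
    "topStories": "topstories",
    "newStories": "newstories",
    "bestStories": "beststories",
    "askStories": "askstories",
    "showStories": "showstories",
    "jobStories": "jobstories",
}


def feed_url(mode: str) -> str:
    return f"{FIREBASE}/{FEED_ENDPOINTS[mode]}.json"


def item_url(item_id: int) -> str:
    # Used both for itemIds mode and to fetch each story from a feed.
    return f"{FIREBASE}/item/{item_id}.json"


def user_url(username: str) -> str:
    return f"{FIREBASE}/user/{username}.json"


def search_url(query: str, sort_by: str = "relevance", tags: str = "story", hits: int = 50) -> str:
    # Algolia ranks by relevance at /search and newest-first at /search_by_date.
    endpoint = "search_by_date" if sort_by == "date" else "search"
    params = urlencode({"query": query, "tags": tags, "hitsPerPage": hits})
    return f"{ALGOLIA}/{endpoint}?{params}"
```

Because feed endpoints return only IDs, every story body costs one extra item request; that two-step fetch is also why a story deleted mid-run can simply go missing.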

Output schema

| Field | Type | Description |
|---|---|---|
| item_id | int | HN item ID |
| type | string | story / comment / job / poll / user |
| title | string | Story or job title |
| text | string | Self-post body or comment text (HTML decoded) |
| url | string | External link (link stories) |
| domain | string | Bare domain extracted from url |
| author | string | Submitter username |
| score | int | Upvote points |
| num_comments | int | Comment count (descendants) |
| created_at | string | ISO 8601 UTC timestamp |
| hn_url | string | Direct HN link |
| parent_id | int | Parent item ID (comments only) |
| story_id | int | Root story ID (comments only) |
| query | string | Query or feed name |
| feed | string | Mode used |
| scraped_at | string | Run timestamp (ISO 8601 UTC) |
| parse_confidence | float | Data quality score, 0.0–1.0 |
| warnings | array | Machine-readable quality issues |
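As a rough illustration of how a raw Firebase item could map onto these fields, here is a Python sketch. It covers only fields that come straight from the item (by to author, descendants to num_comments, time to created_at, parent to parent_id); story_id, query, scraped_at, parse_confidence and warnings are left out because the page does not say how the actor derives them. Stripping a leading "www." for domain and decoding HTML entities only (not tags) are assumptions, and to_row is a hypothetical helper:

```python
import html
from datetime import datetime, timezone
from typing import Any, Optional
from urllib.parse import urlparse


def bare_domain(url: Optional[str]) -> Optional[str]:
    if not url:
        return None
    host = urlparse(url).hostname or ""
    # Assumption: "bare" means the hostname without a leading "www.".
    return host[4:] if host.startswith("www.") else (host or None)


def to_row(item: dict[str, Any], feed: str) -> dict[str, Any]:
    """Flatten one Firebase item into a subset of the output schema."""
    url = item.get("url")
    text = item.get("text")
    ts = item.get("time")  # Firebase "time" is Unix seconds
    return {
        "item_id": item["id"],
        "type": item.get("type"),
        "title": item.get("title"),
        # Decode entities such as &amp; only; tags like <p> are kept.
        "text": html.unescape(text) if text else None,
        "url": url,
        "domain": bare_domain(url),
        "author": item.get("by"),
        "score": item.get("score"),
        "num_comments": item.get("descendants"),
        "created_at": (
            datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
            if ts is not None
            else None
        ),
        "hn_url": f"https://news.ycombinator.com/item?id={item['id']}",
        "parent_id": item.get("parent"),
        "feed": feed,
    }
```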

Usage examples

Get today's top 50 stories

{
  "mode": "topStories",
  "maxItems": 50
}

Search for AI stories (newest first)

{
  "mode": "searchQueries",
  "searchQueries": ["large language models", "AI agents"],
  "searchSortBy": "date",
  "searchTags": "story",
  "maxItems": 100
}

Top stories with comments

{
  "mode": "topStories",
  "maxItems": 10,
  "fetchComments": true,
  "maxCommentsPerStory": 10
}

Fetch specific items

{
  "mode": "itemIds",
  "itemIds": ["48343683", "48340411"]
}

User profiles

{
  "mode": "users",
  "usernames": ["pg", "tptacek", "dang"]
}

Pricing

Base: $1.00 per 1,000 items (stories, jobs, user profiles). You pay only for the items you receive.

With comments (fetchComments=true): each comment is billed as a separate hn-comment event, at the same rate as items (100 comments cost $0.10 in the examples below). Billing comments as their own event keeps runs with comments transparent: you can see exactly what the comments cost.

  • 50 top stories (no comments): ~$0.05
  • 100 search results: ~$0.10
  • Daily top-50 monitoring (no comments): ~$1.50/month
  • Top-10 stories + 10 comments each: $0.01 + $0.10 = ~$0.11

Platform compute is negligible — pure API, no browser, no proxy.

Pricing examples

| Run | Items | Cost |
|---|---|---|
| Top 50 stories (no comments) | 50 | ~$0.05 |
| 100 search results | 100 | ~$0.10 |
| Daily top-50 monitoring, 30 days | 1,500 | ~$1.50/mo |
| Top-10 stories + 10 comments each | 110 | ~$0.11 |

You only pay for items successfully pushed to the dataset.
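The arithmetic behind these examples, as a small Python sketch. It assumes a flat $1.00 per 1,000 rate for both items and hn-comment events (100 comments = $0.10 above) and ignores the lower "from $0.18" store price and platform compute; estimate_cost is a hypothetical helper:

```python
from decimal import ROUND_HALF_UP, Decimal

# Assumed flat rates: $1.00 per 1,000 items and per 1,000 comments.
ITEM_RATE = Decimal("1.00") / 1000
COMMENT_RATE = Decimal("1.00") / 1000


def estimate_cost(items: int, comments: int = 0, runs: int = 1) -> Decimal:
    """Estimated USD cost of `runs` identical runs, rounded to cents."""
    total = (items * ITEM_RATE + comments * COMMENT_RATE) * runs
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For instance, estimate_cost(50, runs=30) reproduces the daily top-50 monitoring row, and estimate_cost(10, comments=100) the top-10-with-comments row. Decimal avoids binary floating-point rounding surprises on cent amounts.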

FAQ

Do I need a proxy or API key? No. The HN Firebase and Algolia APIs are public and require zero auth. No proxy needed — HN data is served globally with no geo-blocks.

What formats can I export the results in? JSON, CSV, JSONL, Excel, and XML — all via Apify's dataset export. Connect to Google Sheets, Airtable, or any BI tool with no code.

Can I run this on a schedule? Yes. Use Apify Schedules to run daily/hourly on topStories or a searchQueries query and monitor tech discourse automatically.

What if the actor returns empty results? In feed modes (topStories etc.), the Firebase API first returns a list of item IDs, and each story body is then fetched separately; if a story is deleted between those two steps, it is skipped silently. In searchQueries mode, check that your searchTags filter matches the content type you expect (e.g. story vs comment). The OUTPUT record in the run's key-value store always explains the failure reason.
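The skip-on-delete behaviour can be sketched like this, assuming a get_item callable that returns the parsed JSON of an item endpoint (null, i.e. None, for an unknown ID) and a limit standing in for maxItems. resolve_feed is hypothetical, and the real actor may differ, for instance in whether it keeps reading further IDs to make up for skipped ones:

```python
from typing import Any, Callable, Iterable, Optional


def resolve_feed(
    ids: Iterable[int],
    get_item: Callable[[int], Optional[dict[str, Any]]],
    limit: int,
) -> list[dict[str, Any]]:
    """Turn a feed's ID list into item bodies, skipping items that disappeared."""
    items: list[dict[str, Any]] = []
    for item_id in ids:
        if len(items) >= limit:
            break  # limit plays the role of maxItems in this sketch
        item = get_item(item_id)
        # Firebase returns null for an ID it no longer knows, and a removed
        # item comes back with "deleted": true; neither becomes a record.
        if item is None or item.get("deleted"):
            continue
        items.append(item)
    return items
```

In this sketch skipped IDs do not count toward the limit, so a run only comes back short when the feed itself runs out of live items.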

Monitoring mode

Run daily on topStories or searchQueries to track trending topics. Pair with Apify Schedules for automated tech-signal monitoring — no code required.
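One way to turn scheduled runs into a trend signal is to diff consecutive datasets by item_id. A minimal sketch, assuming you keep the previous run's rows; new_entries is a hypothetical helper, not an actor feature:

```python
from typing import Any


def new_entries(today: list[dict[str, Any]], yesterday: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Rows from the latest run whose item_id was absent from the previous run."""
    seen = {row["item_id"] for row in yesterday}
    # Preserve the feed's order so the highest-ranked newcomers come first.
    return [row for row in today if row["item_id"] not in seen]
```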

Use with AI agents (MCP)

This actor is callable as a tool by AI agents (Claude Desktop, Cursor, VS Code, n8n, LangGraph, CrewAI, or any MCP-compatible client) via Apify's hosted Model Context Protocol server. An agent uses it to look up live Hacker News stories, search for content by keyword, or fetch a user's profile mid-conversation — e.g. "what's trending on HN right now?", "find recent HN discussions about AI agents", or "what's pg's karma on HN?".

Point your MCP client at this tool:

{
  "mcpServers": {
    "apify": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://mcp.apify.com/?tools=bovi/hacker-news-scraper",
        "--header",
        "Authorization: Bearer <YOUR_APIFY_TOKEN>"
      ]
    }
  }
}

Minimal calls an agent can make:

{ "mode": "topStories", "maxItems": 10 }
{ "mode": "searchQueries", "searchQueries": ["AI agents"], "maxItems": 20 }
{ "mode": "users", "usernames": ["pg"] }

Flat output rows the agent can reason over directly:

{ "item_id": 48343683, "type": "story", "title": "Show HN: We built an MCP server for live data",
"url": "https://example.com/mcp", "domain": "example.com", "author": "hnuser",
"score": 312, "num_comments": 87, "created_at": "2026-06-03T10:22:00Z",
"hn_url": "https://news.ycombinator.com/item?id=48343683",
"query": "topStories", "feed": "topStories", "parse_confidence": 1.0 }

Reliability for agents: both APIs are official, documented endpoints (the Firebase API is published by Y Combinator, and HN Search is run by Algolia), so results don't silently break when the HN website's HTML changes. Every row carries a parse_confidence score (0.0–1.0) and a warnings array, a machine-readable quality signal your agent can filter on. No proxy is needed, and no API key lives inside the tool: auth is your Apify token in the client config above.
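Filtering on those two fields might look like this in Python. The 0.9 threshold is an arbitrary example, and treating any non-empty warnings array as disqualifying is an assumption; trusted_rows is a hypothetical helper:

```python
from typing import Any


def trusted_rows(rows: list[dict[str, Any]], min_confidence: float = 0.9) -> list[dict[str, Any]]:
    """Keep rows at or above the confidence threshold that carry no warnings."""
    return [
        r
        for r in rows
        # A row without parse_confidence is treated as 0.0, i.e. untrusted.
        if r.get("parse_confidence", 0.0) >= min_confidence and not r.get("warnings")
    ]
```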

Integrations

Built for developer-tooling teams and content-intelligence platforms monitoring tech discourse, trending stories, and HN job posts — the JSON/dataset output drops into the tools you already run, no glue code:

  • n8n / Make / Zapier: trigger a run, or pipe every new dataset item into 500+ apps (Google Sheets, Airtable, Slack, HubSpot, your database) with no code.
  • Webhooks: call your own endpoint the moment a run finishes, to push results straight into your pipeline.
  • MCP server: expose this actor as a tool to Claude, Cursor, or any MCP client so an AI agent can pull this data mid-conversation.
  • API & SDKs: fetch the dataset as JSON, CSV, or Excel through the Apify REST API or the Python / JS SDKs.

See all Apify integrations.
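For the REST API route, a finished run's dataset can be downloaded in an export format of your choice. A minimal standard-library sketch, assuming you already have the dataset ID (the run's defaultDatasetId) and an Apify API token; dataset_export_request is a hypothetical helper that only builds the request:

```python
import urllib.request
from urllib.parse import urlencode


def dataset_export_request(dataset_id: str, token: str, fmt: str = "csv") -> urllib.request.Request:
    """Build a GET request for a dataset's items in the given export format."""
    url = f"https://api.apify.com/v2/datasets/{dataset_id}/items?" + urlencode({"format": fmt})
    # Send the token as a bearer header rather than in the URL.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

Pass the request to urllib.request.urlopen to download the file; fmt can also be json, jsonl, xlsx, xml or html.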

The HN Firebase API is publicly documented and maintained by Y Combinator for third-party use. The Algolia HN Search API is officially provided under agreement with YC. Neither requires authentication, though the Algolia API applies a per-IP rate limit.

Built by the Apify actor factory. Not affiliated with Y Combinator or Hacker News.


Use it from your existing tools

Use with Claude Desktop / Cursor / Cline (MCP)

Load this actor as a tool in your AI assistant through the Apify MCP server, with no Store browsing needed. Paste this into your MCP client config (e.g. claude_desktop_config.json) and restart the client:

{
  "mcpServers": {
    "apify-hacker-news-scraper": {
      "command": "npx",
      "args": [
        "-y",
        "@apify/actors-mcp-server",
        "--tools",
        "bovi/hacker-news-scraper"
      ],
      "env": {
        "APIFY_TOKEN": "YOUR_APIFY_TOKEN"
      }
    }
  }
}

Replace YOUR_APIFY_TOKEN with your own Apify API token (free at apify.com → Settings → Integrations). The --tools argument limits the server to this actor, so the agent sees only a handful of tools and selects the right one reliably.

Works with Clay

Run this actor as an HTTP enrichment step inside a Clay table:

  • Method: POST
  • URL: https://api.apify.com/v2/acts/bovi~hacker-news-scraper/run-sync-get-dataset-items?token={{apify_token}}
  • Body (JSON): map your Clay columns to the actor input (see the Usage examples above), e.g. {"mode": "{{clay_column}}"}

The run finishes synchronously and returns the dataset rows straight into your Clay table. It runs on Apify's cloud under your own token and usage. Synchronous runs must complete within 300 seconds.
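The same synchronous call works outside Clay. A Python standard-library sketch that builds the POST request described above; sync_run_request is a hypothetical helper, and the token travels as a query parameter exactly as in the Clay URL:

```python
import json
import urllib.request
from urllib.parse import urlencode

# In Apify API paths the "/" in "username/actor-name" becomes "~".
ACTOR_ID = "bovi~hacker-news-scraper"


def sync_run_request(run_input: dict, token: str) -> urllib.request.Request:
    """Build a POST that runs the actor and returns its dataset rows."""
    url = (
        f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items?"
        + urlencode({"token": token})
    )
    body = json.dumps(run_input).encode("utf-8")  # actor input as the JSON body
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send it with urllib.request.urlopen(req, timeout=310) and parse the response with json.load to get the rows. Because of the 300-second limit, keep maxItems (and maxCommentsPerStory) modest for synchronous calls.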