Hacker News — CSV, Stories + Comments + Users, No API Key

Scrape HN top/new/Show HN/Ask HN/jobs in minutes. No rate limits. Title, URL, score, comments, author as JSON/CSV. 26+ runs. For launch monitoring, competitor tracking, market trends. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai

Pricing: Pay per usage
Rating: 0.0 (0)
Developer: Alex (maintained by Community)
Actor stats: 0 bookmarked, 2 total users, 0 monthly active users
Last modified: 18 days ago

Hacker News Scraper

Scrape stories and comments from Hacker News — extract top, new, best, Ask HN, Show HN, and job posts with full comment threads. Uses the official HN API and Algolia search for fast, reliable data extraction.

Features

  • 6 story types — top, new, best, Ask HN, Show HN, and job stories
  • Full comment threads — nested comments with author, text, timestamp, depth level, and child count (up to 3 levels deep)
  • Algolia search — find stories by keyword with relevance ranking across all of Hacker News history
  • Score filtering — set a minimum score threshold to extract only high-quality stories
  • Batch processing — fetches stories in parallel batches of 10 for maximum speed
  • Domain extraction — automatically extracts the domain from story URLs
  • Real-time data — uses the official Firebase HN API for live scores and comment counts
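
The batching described in the list above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual source: `fetchItem` and `fetchInBatches` are hypothetical names, and the only details taken from this page are the batch size of 10, the Firebase item endpoint, and the skip-on-failure behaviour.

```javascript
// Hypothetical sketch: fetch HN items in parallel batches of 10.
const hnItemUrl = (id) => `https://hacker-news.firebaseio.com/v0/item/${id}.json`;

// Default fetcher; relies on the global fetch available in Node 18+.
async function fetchItem(id) {
  const res = await fetch(hnItemUrl(id));
  if (!res.ok) throw new Error(`HTTP ${res.status} for item ${id}`);
  return res.json(); // the API returns JSON null for unknown ids
}

// Walk the id list in chunks; each chunk is fetched concurrently.
// A failed fetch resolves to null and is dropped (no retry).
async function fetchInBatches(ids, fetcher = fetchItem, batchSize = 10) {
  const items = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    const batch = ids.slice(i, i + batchSize);
    const results = await Promise.all(
      batch.map((id) => Promise.resolve().then(() => fetcher(id)).catch(() => null))
    );
    items.push(...results.filter((item) => item !== null));
  }
  return items;
}
```

Passing the fetcher as a parameter keeps the batching logic testable without network access; whether the real actor is structured this way is an assumption.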

Output Example

{
  "id": 39876543,
  "title": "Show HN: I built an open-source alternative to Notion",
  "url": "https://github.com/user/project",
  "author": "developer_123",
  "score": 487,
  "commentCount": 234,
  "time": "2026-03-17T14:20:00.000Z",
  "type": "story",
  "hnUrl": "https://news.ycombinator.com/item?id=39876543",
  "domain": "github.com",
  "source": "top",
  "comments": [
    {
      "id": 39876600,
      "author": "tech_reviewer",
      "text": "This is impressive! I especially like the.",
      "time": "2026-03-17T14:35:00.000Z",
      "depth": 0,
      "childCount": 5
    }
  ],
  "scrapedAt": "2026-03-18T12:00:00.000Z"
}
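
As a rough guide to where these fields could come from, the sketch below maps a raw Firebase item (which exposes `by`, `descendants`, and a Unix `time` in seconds) onto the record shape shown above. `toRecord` and `extractDomain` are hypothetical helpers, and both stripping a leading `www.` and the null fallbacks are assumptions rather than documented behaviour of this actor.

```javascript
// Hypothetical helper: derive the "domain" field from a story URL.
function extractDomain(url) {
  if (!url) return null; // Ask HN and job posts often have no URL
  try {
    return new URL(url).hostname.replace(/^www\./, ''); // assumption: drop "www."
  } catch {
    return null; // malformed URL
  }
}

// Hypothetical mapper from a Firebase item to the output record (without comments).
function toRecord(item, source, now = new Date()) {
  return {
    id: item.id,
    title: item.title ?? null,
    url: item.url ?? null,
    author: item.by ?? null,
    score: item.score ?? 0,
    commentCount: item.descendants ?? 0,
    time: new Date(item.time * 1000).toISOString(), // seconds to ISO 8601
    type: item.type,
    hnUrl: `https://news.ycombinator.com/item?id=${item.id}`,
    domain: extractDomain(item.url),
    source,
    scrapedAt: now.toISOString(),
  };
}
```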

Use Cases

  • Tech trend monitoring — track what topics, tools, and technologies the developer community is discussing
  • Content research — discover high-performing content topics and formats that resonate with technical audiences
  • Competitive intelligence — monitor mentions of your product, competitors, and industry on the #1 tech news site
  • Startup discovery — scrape Show HN posts to find new product launches and early-stage startups
  • Job market analysis — extract HN job postings to analyze hiring trends, salaries, and in-demand skills

Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| scrapeType | String | "top" | Story type, one of six values: top, new, best, ask, show, job. For keyword search, use the searchQueries array; the search loop runs independently of scrapeType. |
| searchQueries | Array | [] | Keywords to search across HN history (via Algolia) |
| maxStories | Number | 100 | Maximum stories to extract |
| includeComments | Boolean | true | Whether to extract comment threads |
| maxCommentsPerStory | Number | 30 | Maximum comments per story (includes nested replies) |
| minScore | Number | 0 | Minimum score threshold (filters out low-scoring stories) |
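
To make the defaults concrete, here is a small sketch of how an input object could be merged with them before a run. `normalizeInput` is a hypothetical helper written for illustration; the parameter names, defaults, and allowed scrapeType values come from the table above, while the validation rules are assumptions.

```javascript
// Defaults and allowed values as listed in the Input Parameters table.
const DEFAULTS = {
  scrapeType: 'top',
  searchQueries: [],
  maxStories: 100,
  includeComments: true,
  maxCommentsPerStory: 30,
  minScore: 0,
};
const SCRAPE_TYPES = ['top', 'new', 'best', 'ask', 'show', 'job'];

// Hypothetical helper: fill in defaults and reject obviously invalid input.
function normalizeInput(input = {}) {
  const merged = { ...DEFAULTS, ...input };
  if (!SCRAPE_TYPES.includes(merged.scrapeType)) {
    throw new Error(`Unknown scrapeType: ${merged.scrapeType}`);
  }
  if (!Array.isArray(merged.searchQueries)) {
    throw new Error('searchQueries must be an array of strings');
  }
  return { ...merged, searchQueries: [...merged.searchQueries] };
}

// Example: high-scoring Show HN posts without comments, plus one keyword search.
const exampleInput = normalizeInput({
  scrapeType: 'show',
  searchQueries: ['open source'],
  includeComments: false,
  minScore: 100,
});
```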

Pricing

Standard Apify per-run compute pricing; there is no per-story or per-comment fee. With includeComments: true, each comment requires a separate Firebase API request, so a 100-story run with full comment threads can issue well over 1,000 requests (up to 100 × 30 = 3,000 comment fetches at the default maxCommentsPerStory). Use maxCommentsPerStory and minScore to bound cost.
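
For budgeting, a rough upper bound on Firebase requests can be computed from the input alone. The sketch below assumes one request for the story ID list, up to maxStories * 2 story fetches (the prefetch factor mentioned under Honest Limitations), and one request per emitted comment. `estimateMaxRequests` is a hypothetical helper, and the real actor may issue fewer requests than this bound.

```javascript
// Hypothetical upper-bound estimate of Firebase requests for one type-branch run.
function estimateMaxRequests({
  maxStories = 100,
  includeComments = true,
  maxCommentsPerStory = 30,
} = {}) {
  const listRequests = 1; // the *stories.json ID list
  const storyRequests = maxStories * 2; // assumption: every prefetched ID is fetched
  const commentRequests = includeComments ? maxStories * maxCommentsPerStory : 0;
  return listRequests + storyRequests + commentRequests;
}

console.log(estimateMaxRequests()); // 3201 at the defaults
console.log(estimateMaxRequests({ includeComments: false })); // 201
```

Lowering maxCommentsPerStory shrinks the dominant term directly, which is why it is the first knob to turn when a run is too expensive.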

FAQ

Q: Can I search across all of Hacker News history?
A: Yes, for stories only. The search feature uses Algolia's HN Search API with a tags=story filter, so comment-text matches are excluded. Use the HN Algolia UI (https://hn.algolia.com/) if you need comment search.
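
For reference, a story-only query against the public HN Search API looks roughly like the sketch below. The endpoint and the `query`, `tags`, `hitsPerPage`, and `numericFilters` parameters belong to Algolia's HN Search API; `buildSearchUrl` is a hypothetical helper, and whether this actor applies minScore server-side like this or filters after fetching is an assumption.

```javascript
// Hypothetical helper: build a story-only HN Search API URL.
function buildSearchUrl(query, { hitsPerPage = 100, minScore = 0 } = {}) {
  const params = new URLSearchParams({
    query,
    tags: 'story', // restricts hits to stories, so comment matches are excluded
    hitsPerPage: String(hitsPerPage),
  });
  if (minScore > 0) {
    params.set('numericFilters', `points>=${minScore}`); // assumption: server-side filter
  }
  return `https://hn.algolia.com/api/v1/search?${params}`;
}

// Usage (Node 18+), left commented out to avoid a network call here:
// const { hits } = await (await fetch(buildSearchUrl('web scraping'))).json();
```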

Q: Why are comments more expensive to scrape?
A: Each comment requires a separate API call to the HN Firebase API. At the default maxCommentsPerStory of 30, a story with 200 comments costs about 30 individual requests for top-level comments and nested replies, and more if you raise the cap.

Q: What's the difference between "top" and "best" stories?
A: "Top" shows the current front page ranking (changes frequently). "Best" shows the highest-scoring stories over a longer period. "New" shows the most recently submitted stories regardless of score.
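
Each story type corresponds to one ID-list endpoint of the official HN Firebase API. The mapping below is a sketch: the endpoint names are the API's own, while `storyListUrl` and the use of the scrapeType values as keys are illustrative assumptions.

```javascript
// scrapeType value -> Firebase ID-list endpoint name.
const STORY_ENDPOINTS = new Map([
  ['top', 'topstories'],
  ['new', 'newstories'],
  ['best', 'beststories'],
  ['ask', 'askstories'],
  ['show', 'showstories'],
  ['job', 'jobstories'],
]);

// Hypothetical helper: resolve a scrapeType to its ID-list URL.
function storyListUrl(scrapeType) {
  const name = STORY_ENDPOINTS.get(scrapeType);
  if (!name) throw new Error(`Unknown scrapeType: ${scrapeType}`);
  return `https://hacker-news.firebaseio.com/v0/${name}.json`;
}
```

Each of these endpoints returns a JSON array of item IDs, which are then resolved one by one through the item endpoint.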


Honest Limitations

  • Comment recursion is bounded. Each parent comment fetches at most its first 5 child IDs (item.kids.slice(0, 5)), so a comment with 50 replies yields only 5 of them. Recursion stops at depth 3: top-level comments are depth 0 and replies are followed down to depth 3, for at most 4 levels in total.
  • maxCommentsPerStory is a global per-story cap, applied across all depths simultaneously. A wide thread that hits 30 top-level comments before recursion reaches deeper replies will leave deeper branches unfetched.
  • Failed item fetches are silently skipped. Each fetch ends in .catch(() => null), so a failed request returns null and the iteration continues. There is no retry logic.
  • Output schema differs by branch. The type-branch (top/new/best/ask/show/job) emits 13 fields including text (for ask/show story body). The search-branch (searchQueries) emits 12 fields including tags (Algolia _tags array) but no text. The example above is the type-branch shape — search records lack text and gain tags.
  • Algolia search filters to tags=story — comment-only matches are not returned by this actor.
  • Story IDs from *stories.json endpoints are returned in HN's display order. The actor pre-fetches maxStories * 2 IDs to allow for minScore filtering. If your minScore is high and the front page is light on high-score stories, the actor returns fewer than maxStories rows.
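
The comment limits above interact in ways that are easier to see in code. The sketch below is not the actor's source: `collectComments` is a hypothetical function, and the level-by-level traversal order is an assumption chosen to match the described behaviour, where a wide thread can exhaust the cap on top-level comments before deeper replies are reached. The constants 5 and 3, the global per-story cap, and the skip-on-failure rule come from the list above.

```javascript
const MAX_CHILDREN = 5; // mirrors item.kids.slice(0, 5)
const MAX_DEPTH = 3; // depths 0..3, so at most 4 levels

// Hypothetical sketch: collect comments level by level under a global cap.
// `fetcher(id)` resolves to a Firebase item; failures are skipped, not retried.
async function collectComments(topLevelIds, fetcher, maxComments = 30) {
  const out = [];
  let level = topLevelIds;
  for (let depth = 0; depth <= MAX_DEPTH && level.length > 0; depth++) {
    const next = [];
    for (const id of level) {
      if (out.length >= maxComments) return out; // global per-story cap
      const item = await Promise.resolve().then(() => fetcher(id)).catch(() => null);
      if (!item) continue; // failed or missing item
      const kids = item.kids ?? [];
      out.push({
        id: item.id,
        author: item.by,
        text: item.text,
        time: new Date(item.time * 1000).toISOString(),
        depth,
        childCount: kids.length, // assumption: count of direct replies
      });
      next.push(...kids.slice(0, MAX_CHILDREN)); // only the first 5 replies
    }
    level = next;
  }
  return out;
}
```

Under these assumptions, a story with 40 top-level comments and a cap of 30 would emit 30 depth-0 comments and no replies at all, which is the situation the maxCommentsPerStory limitation describes.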

Proof of delivery: This Hacker News scraper has 27 lifetime production runs as of May 2026. Author maintains 31 published actors (78 total) and shipped a paid 3-article series in March 2026 ($150, proxy industry). Pilot pricing locked through May 2026.

Sample request? Email spinov001@gmail.com with the word "sample" and we'll send 2 published case-study articles within 24 hours.

Custom scraping — pilot tiers

Need data, not infrastructure. We build, you query. Three tiers:

  • Pilot — $97 · 1 actor, basic config, 7-day support. Good entry point for one-off jobs.
  • Standard — $297 · custom actor + Slack/email alerts on results, 30-day support. Most projects fit here.
  • Premium — $797 · custom actor + dashboard + 90-day support + 1 modification round. For ongoing data pipelines.

Email: spinov001@gmail.com — drop specs, schema, or target URLs and get a quote within 48h.

Proof of work: 31 published Apify scrapers (78 total in portfolio) — Trustpilot 951 runs, Reddit 82, Google News 45, Glassdoor 39, Email Extractor 107, Hacker News 27. Recently delivered a paid 3-article series for a client in the proxy industry ($150).

More tips: t.me/scraping_ai · blog.spinov.online


Honest disclosure: this scraper uses the public HN Firebase API and Algolia HN Search — no scraping behind login walls, no personal data beyond what HN publishes, robots.txt respected. Not affiliated with Y Combinator.