Hacker News API Scraper

Pricing

from $3.00 / 1,000 results

Hacker News scraper & HN API alternative. Scrape stories, comments, Ask HN & Show HN without login; export to CSV/JSON. No key, no proxy.

Rating

0.0

(0)

Developer

Logiover

Maintained by Community

Actor stats

0

Bookmarked

3

Total users

2

Monthly active users

2 days ago

Last modified

🟠 Hacker News API Scraper β€” Scrape HN Posts, Comments & Polls at Scale

Hacker News Search Scraper

Bulk-scrape Hacker News stories, comments, polls, Show HN, Ask HN and front-page items through the official HN Algolia search API, filtered by keyword, item type and points threshold. Returns title, author, URL, text, points, comment counts, tags, created date and HN permalink for tens of thousands of items per run, thanks to date-windowed pagination that breaks past the Algolia 1,000-result cap.

Built for founders, growth marketers, journalists, dev-tool product teams, AI/LLM researchers and trend hunters who need a clean, structured Hacker News dataset on a schedule, without writing pagination, retry or rate-limit code by hand.

🟒 No API key. No proxy. No login. No browser. Pure public REST.


🚀 Why this scraper

Hacker News is the homepage of the technical internet. Every Show HN launch, every Ask HN thread, every front-page debate is a signal about technologies on the rise, products gaining traction, AI labs hiring, infrastructure choices, security issues, founder sentiment and a thousand other things you can only learn by reading what builders are saying right now. But pulling HN at scale yourself means:

  • Working around the Algolia HN Search API's 1,000-result hard cap per query
  • Splitting queries into date windows to backfill years of history
  • Distinguishing stories vs. comments vs. polls vs. Show HN vs. Ask HN
  • Joining Algolia tags into a clean schema
  • Filtering by points and dates without burning quota
  • Persisting flat rows for spreadsheets, warehouses or LLM training pipelines

This scraper does all of that for you. Feed it a keyword (or no keyword) plus an item type and points threshold, and get back a clean, paginated, flat dataset of every matching HN item, ready for Excel, BigQuery, your newsletter, your fine-tuning dataset or your competitive intelligence dashboard.


✨ Key features

FeatureWhat it gives you
πŸ”Œ Official HN Algolia Search APIStable, well-documented, rate-limit friendly β€” no HTML scraping or anti-bot games
πŸ“ˆ Beyond the 1,000-row capDate-windowed pagination automatically splits queries to pull tens of thousands of items per run
πŸ—‚οΈ Every item typeStories, comments, polls, Show HN, Ask HN and front-page items in one Actor
πŸ”Ž Free-text keyword searchTrack any topic, product, company, language, framework, person or technology across all of HN history
⭐ Points filterCut noise by returning only items above a minimum upvote threshold
🏷️ Rich tag metadataAlgolia tags preserved so you can filter to specific user accounts, story types or front-page items downstream
♾️ Unlimited modeSet maxItems to 0 to pull every matching item
πŸ“¦ Flat, export-ready rows12 columns, no nested JSON β€” direct to CSV, Excel, JSON or XML
⏱️ Schedule-friendlyIdempotent, deterministic β€” perfect for recurring runs that build a continuously growing HN dataset
πŸ”“ No auth, no proxyPublic Algolia endpoint β€” no HN account, no API key, no residential proxy required
🧱 Stable schemaField names match HN/Algolia conventions; safe to use in downstream pipelines
πŸ’Ύ All Apify storage formatsJSON, CSV, Excel, HTML, XML, JSONL β€” straight out of the Dataset

🎯 Built for these use cases

1. Tech & product trend research

What did HN talk about most this month? Which programming languages, frameworks or AI models keep showing up on the front page? Pull stories on a schedule, group by tags or by keyword, and chart the evolution of attention over time.

2. "Show HN" launch tracker

Every Tuesday a few hundred new products ship on Show HN. Scrape show_hn items daily to maintain a launch board and find the next devtool, AI wrapper, indie SaaS or open-source project before it explodes.

3. "Ask HN" knowledge mining

Ask HN is one of the best sources of qualitative founder/engineer wisdom on the internet. Pull years of ask_hn threads to build a knowledge base, a search corpus, or a fine-tuning dataset for a developer/founder assistant.

4. Brand & competitor sentiment

Track every HN mention of your company, competitor, framework or product. Combine the points filter and keyword search to surface high-signal discussions instead of one-off mentions buried in long threads.

5. AI / LLM training data

Hacker News comments are dense, opinionated, technically literate and well-attributed. Scrape millions of comments for a topic, anonymize, then use them as fine-tuning material for code/tech assistants or as benchmark prompts.

6. Newsletter & content curation

Power "best of HN" newsletters by scheduling the Actor to pull the previous day's top stories above a points threshold. Filter by tag (e.g. ai, rust, security) for niche editions.

7. Hiring & talent intel

Look at who is posting Show HN launches, answering Ask HN questions or writing widely-upvoted comments in your domain. It is a near-perfect sourcing channel for senior engineers and founders.

8. Journalism & market research

Build a longitudinal dataset on a topic ("crypto", "remote work", "openai", "rust adoption"): story counts per week, sentiment shifts, who's commenting, what's getting traction. Backfill years of history with date-windowed pagination.


📥 Inputs

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | No | Free-text keyword across all of Hacker News (e.g. "openai", "rust", "acquired", "AGI"). Leave empty to scrape everything matching the other filters. |
| itemType | enum | No | What kind of HN item to scrape. One of: story, comment, poll, show_hn, ask_hn, front_page. Default story. |
| minPoints | integer | No | Only return items with at least this many points. Use to cut noise. Default 0 (no filter). |
| maxItems | integer | No | Cap on items saved. Set 0 to pull everything matching your query. Date-windowed pagination is used to break past the Algolia 1,000-row cap. |
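
As a rough illustration of how these four fields fit together, the sketch below validates a raw input object and fills in the documented defaults. The `ActorInput` class and `parse_input` function are hypothetical names, not part of the Actor, and treating a missing maxItems as 0 (unlimited) is an assumption, since the table does not state a default for it.

```python
from dataclasses import dataclass

# Item types listed in the inputs table.
ITEM_TYPES = ("story", "comment", "poll", "show_hn", "ask_hn", "front_page")


@dataclass(frozen=True)
class ActorInput:
    """Hypothetical container for the four documented input fields."""
    query: str = ""
    item_type: str = "story"
    min_points: int = 0
    max_items: int = 0  # 0 means "no cap" (default assumed for this sketch)


def parse_input(raw: dict) -> ActorInput:
    """Validate a raw input dict and apply the documented defaults."""
    item_type = raw.get("itemType", "story")
    if item_type not in ITEM_TYPES:
        raise ValueError(f"itemType must be one of {ITEM_TYPES}, got {item_type!r}")

    min_points = int(raw.get("minPoints", 0))
    max_items = int(raw.get("maxItems", 0))
    if min_points < 0 or max_items < 0:
        raise ValueError("minPoints and maxItems must be non-negative")

    query = str(raw.get("query") or "").strip()
    return ActorInput(query, item_type, min_points, max_items)
```

Rejecting unknown item types early keeps a typo such as "showhn" from silently falling back to the default story search.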

Example inputs

Track every AI front-page story:

{
  "query": "AI",
  "itemType": "front_page",
  "minPoints": 50,
  "maxItems": 0
}

Mine Show HN launches above 100 points:

{
  "itemType": "show_hn",
  "minPoints": 100,
  "maxItems": 5000
}

Build an Ask HN knowledge corpus:

{
  "itemType": "ask_hn",
  "minPoints": 20,
  "maxItems": 0
}

Track every comment about Rust:

{
  "query": "rust",
  "itemType": "comment",
  "minPoints": 10,
  "maxItems": 20000
}

📤 Output

Each HN item becomes one flat dataset record. Sample:

{
  "objectId": "44321987",
  "type": "story",
  "title": "Show HN: I built a self-hosted ChatGPT alternative in 200 lines of Go",
  "author": "indiehacker42",
  "url": "https://github.com/example/mini-llm",
  "text": null,
  "points": 487,
  "numComments": 132,
  "tags": ["story", "show_hn", "author_indiehacker42"],
  "createdAt": "2026-05-15T08:21:00.000Z",
  "hnUrl": "https://news.ycombinator.com/item?id=44321987",
  "scrapedAt": "2026-05-16T10:00:00.000Z"
}

Full field reference

| Field | Type | Meaning |
| --- | --- | --- |
| objectId | string | Unique Hacker News item ID (same as the ?id= in news.ycombinator.com) |
| type | string | Item type: story, comment, poll, pollopt, job |
| title | string | Item title (null for raw comments) |
| author | string | HN username of the author |
| url | string | External URL (for stories with a link) |
| text | string | Body text (for Ask HN, comments, polls, self-posts) |
| points | number | Net upvote count |
| numComments | number | Total comment count under the item |
| tags | array | Algolia tags (e.g. story, show_hn, front_page, author_<username>) |
| createdAt | string | ISO 8601 timestamp of creation |
| hnUrl | string | Direct permalink to the item on news.ycombinator.com |
| scrapedAt | string | ISO 8601 timestamp of the scrape |
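
To make the mapping concrete, here is a minimal sketch of how a raw Algolia HN hit could be turned into the 12-field record above. The input keys (`objectID`, `story_text`, `comment_text`, `num_comments`, `_tags`, `created_at`) are the ones the public HN Algolia API returns; the function name `normalize_hit` and the rule for deriving `type` from the tags are assumptions, not the Actor's confirmed logic.

```python
from datetime import datetime, timezone
from typing import Optional

# Base item types from the field reference, checked against the tag list.
BASE_TYPES = ("story", "comment", "poll", "pollopt", "job")


def normalize_hit(hit: dict, scraped_at: Optional[str] = None) -> dict:
    """Map one Algolia hit to the flat 12-field record (illustrative only)."""
    tags = list(hit.get("_tags") or [])
    # Assumption: the item type is the first base type present in the tags.
    item_type = next((t for t in BASE_TYPES if t in tags), None)
    object_id = str(hit["objectID"])
    return {
        "objectId": object_id,
        "type": item_type,
        "title": hit.get("title"),
        "author": hit.get("author"),
        "url": hit.get("url"),
        # Stories carry story_text, comments carry comment_text.
        "text": hit.get("story_text") or hit.get("comment_text"),
        "points": hit.get("points"),
        "numComments": hit.get("num_comments"),
        "tags": tags,
        # Passed through as returned by the API, without reformatting.
        "createdAt": hit.get("created_at"),
        "hnUrl": f"https://news.ycombinator.com/item?id={object_id}",
        "scrapedAt": scraped_at or datetime.now(timezone.utc).isoformat(),
    }
```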

⚙️ How it works

  1. Parses input: keyword, item type, minimum points, max items.
  2. Picks search endpoint: uses hn.algolia.com/api/v1/search (relevance) or search_by_date, depending on the strategy required to break past the 1,000-row cap.
  3. Builds Algolia filters: combines tags, numericFilters (points>=N, created_at_i<timestamp) and query into a single request.
  4. Date-windowed pagination: when more than ~1,000 matches exist, the Actor slices the time range into smaller windows (year → month → week → day) until each window fits under Algolia's hard cap, then concatenates results.
  5. Deduplicates by objectId so overlapping windows never produce duplicate rows.
  6. Normalizes each hit into the flat 12-field schema above.
  7. Streams rows directly into the Apify Dataset as they arrive, so a run is safe to interrupt and resume.
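
Steps 4 and 5 can be sketched as follows. This is not the Actor's code: it swaps the year → month → week → day slicing for simple halving of the time window, and it takes two hypothetical callables, `count(start, end)` and `fetch(start, end)`, standing in for the real API requests over `created_at_i` in `[start, end)`.

```python
from typing import Callable, Dict, List

ALGOLIA_CAP = 1000  # per-query result ceiling described above


def collect_windowed(
    count: Callable[[int, int], int],
    fetch: Callable[[int, int], List[dict]],
    start: int,
    end: int,
    cap: int = ALGOLIA_CAP,
) -> List[dict]:
    """Split [start, end) until each window holds <= cap hits, then merge.

    Timestamps are Unix seconds. Hits are deduplicated by objectID.
    """
    seen: Dict[str, dict] = {}
    stack = [(start, end)]
    while stack:
        lo, hi = stack.pop()
        if hi <= lo:
            continue
        n = count(lo, hi)
        if n == 0:
            continue
        if n <= cap or hi - lo <= 1:
            # Window fits under the cap (or cannot be split further;
            # in that edge case results may still be truncated).
            for hit in fetch(lo, hi):
                seen.setdefault(hit["objectID"], hit)
        else:
            mid = lo + (hi - lo) // 2
            stack.append((lo, mid))
            stack.append((mid, hi))
    return list(seen.values())
```

Halving adapts automatically to bursty periods: a quiet year is fetched in one request, while a busy week keeps splitting until every slice fits.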

The Actor uses ONLY the official, publicly-documented Algolia HN Search API. No HTML scraping, no headless browser, no proxy, no Cloudflare bypass.
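
For orientation, a single request against that API might be assembled as in the sketch below. The endpoint paths and the `query`, `tags`, `numericFilters`, `page` and `hitsPerPage` parameters are part of the public HN Algolia API; the helper name `build_search_url` and its argument names are hypothetical, and the page size of 100 is an assumption.

```python
from typing import Optional
from urllib.parse import urlencode

BASE = "https://hn.algolia.com/api/v1"


def build_search_url(
    query: str,
    tag: str,
    min_points: int = 0,
    created_before: Optional[int] = None,  # Unix seconds, exclusive
    created_after: Optional[int] = None,   # Unix seconds, inclusive
    page: int = 0,
    hits_per_page: int = 100,
    by_date: bool = True,
) -> str:
    """Build one search URL with tag and numeric filters applied."""
    numeric = []
    if min_points > 0:
        numeric.append(f"points>={min_points}")
    if created_after is not None:
        numeric.append(f"created_at_i>={created_after}")
    if created_before is not None:
        numeric.append(f"created_at_i<{created_before}")

    params = {"query": query, "tags": tag, "hitsPerPage": hits_per_page, "page": page}
    if numeric:
        # Comma-separated numeric filters are combined with AND.
        params["numericFilters"] = ",".join(numeric)

    endpoint = "search_by_date" if by_date else "search"
    return f"{BASE}/{endpoint}?{urlencode(params)}"
```

For example, `build_search_url("AI", "front_page", min_points=50)` yields a `search_by_date` URL restricted to front-page items with at least 50 points.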


⚑ Performance

WorkloadApprox timeAPI calls
100 front-page stories, single keyword~3 seconds1
1,000 Show HN launches, no keyword~20 seconds~10
5,000 stories matching "ai"~1 minute~50 (date-windowed)
50,000 comments matching "rust"~8 minutes~500
Full Ask HN backfill (>200k items)~30–60 minutes~2,000

Algolia HN Search allows generous rates β€” the Actor stays comfortably within published guidelines.
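
The call counts in the table are consistent with roughly 100 items per request. That page size is an inference from the figures above, not a documented setting, and the helper below is only a back-of-the-envelope estimate under that assumption.

```python
import math


def estimate_api_calls(items: int, hits_per_page: int = 100) -> int:
    """Estimate requests needed, assuming ~100 hits per request."""
    return max(1, math.ceil(items / hits_per_page))


for workload in (100, 1_000, 5_000, 50_000, 200_000):
    print(f"{workload:>7} items -> ~{estimate_api_calls(workload)} calls")
```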


💰 Cost model

Pay-Per-Result. You only pay for the actual HN items saved (after minPoints and maxItems filters). Items that don't pass the filter are not billed.

Typical costs (rounded):

  • Daily front-page snapshot (~30 stories) → tiny
  • Weekly Show HN sweep (~300 launches) → small
  • Topic backfill (10,000 items) → moderate
  • Full corpus run (100,000+ comments) → larger but bounded
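
To put rough numbers on those tiers, the sketch below applies the listed starting price of $3.00 per 1,000 results. That rate is a "from" price, so actual charges may differ by plan; the function name is hypothetical.

```python
def estimate_cost_usd(items: int, usd_per_1000: float = 3.00) -> float:
    """Pay-per-result estimate at the listed starting rate (assumed flat)."""
    return round(items * usd_per_1000 / 1000, 2)


for label, items in [
    ("Daily front-page snapshot", 30),
    ("Weekly Show HN sweep", 300),
    ("Topic backfill", 10_000),
    ("Full corpus run", 100_000),
]:
    print(f"{label}: ~${estimate_cost_usd(items):.2f}")
```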

🔄 Schedule for continuous monitoring

Pairs beautifully with Apify's scheduler. Common patterns:

  • Every 15 minutes for real-time brand/competitor alerts
  • Hourly for front-page + Show HN tracking
  • Daily at 8:00 UTC for newsletter ingestion
  • Weekly for trend dashboards
  • One-off backfill to capture years of historical data

Use Apify Webhooks to push new items into Slack, Discord, your database, Notion or any HTTP endpoint as soon as they appear.
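
The recurring patterns above correspond to standard five-field cron expressions, which Apify schedules accept. The mapping below is a sketch: the dictionary keys are hypothetical labels, the weekly day and hour (Monday 08:00) are an arbitrary choice, and the schedule's timezone is assumed to be set to UTC.

```python
# Five-field cron expressions: minute hour day-of-month month day-of-week.
SCHEDULES = {
    "brand_alerts_every_15_min": "*/15 * * * *",
    "front_page_hourly": "0 * * * *",
    "newsletter_daily_0800_utc": "0 8 * * *",
    "trend_dashboard_weekly": "0 8 * * 1",  # Mondays at 08:00 (assumed)
}

for name, expr in SCHEDULES.items():
    print(f"{name}: {expr}")
```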


🛠️ FAQ

Is this a Hacker News API alternative?

Yes. It wraps the official public HN Algolia Search API and handles pagination, retries and the 1,000-result cap for you, so you get a clean dataset without writing any API code yourself.

How do I export Hacker News data to CSV or JSON?

Every run writes flat 12-column records to the Apify Dataset, which you can download directly as CSV, JSON, Excel, XML or JSONL, with no post-processing needed.
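
If you have already pulled the records as JSON and want a CSV locally, a conversion along these lines should work. It is an optional sketch, not something the Actor requires: the function name is hypothetical, and joining the `tags` array with `|` is an arbitrary choice for fitting a list into one cell.

```python
import csv
import io
from typing import Iterable

# Column order follows the field reference.
FIELDS = [
    "objectId", "type", "title", "author", "url", "text",
    "points", "numComments", "tags", "createdAt", "hnUrl", "scrapedAt",
]


def records_to_csv(records: Iterable[dict]) -> str:
    """Serialize flat records to CSV text; None values become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for rec in records:
        row = dict(rec)
        row["tags"] = "|".join(rec.get("tags") or [])
        writer.writerow(row)
    return buf.getvalue()
```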

Can I scrape Hacker News without login or an API key?

Yes. The Algolia HN Search endpoint is fully public: no HN account, no API key, no proxy and no login are required.

Do I need a Hacker News or Algolia API key?

No. The HN Algolia Search API is fully public and unauthenticated.

How does the Actor break past the 1,000-result cap?

Algolia limits any single query to 1,000 results. The Actor splits your query into successive date windows (created_at_i filters), each fitting under 1,000 rows, then merges and deduplicates the output by objectId.

Is scraping Hacker News allowed?

The Actor reads only publicly available HN content via the official Algolia API endpoint that HN itself provides for programmatic access. Use the data in compliance with HN's terms and applicable law.

Can I scrape both stories and comments in one run?

No. One Actor run targets one itemType. Schedule two runs (one for story, one for comment); they share the same dataset format and can be merged downstream.

What's the difference between story and front_page?

story = any submitted HN story. front_page = stories that hit the HN front page. front_page is a strong popularity filter on its own.

What does the tags field contain?

Algolia tags like story, comment, poll, show_hn, ask_hn, front_page, author_<username> and story_<parent_id> (for comments). Use them for downstream filtering.

Can I filter by date range?

Not through an input field. Date filters are applied internally, for pagination only; use query, minPoints and maxItems to scope the run. To get an exact date range, post-filter on createdAt in your dataset export.
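
A post-filter on createdAt could look like the sketch below. It assumes timestamps in the ISO 8601 form shown in the sample record (for example 2026-05-15T08:21:00.000Z); the function names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List


def parse_iso(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def filter_by_created_at(records: Iterable[dict], start: str, end: str) -> List[dict]:
    """Keep records with start <= createdAt < end (half-open range)."""
    lo, hi = parse_iso(start), parse_iso(end)
    return [r for r in records if lo <= parse_iso(r["createdAt"]) < hi]


rows = [
    {"objectId": "1", "createdAt": "2026-05-15T08:21:00.000Z"},
    {"objectId": "2", "createdAt": "2026-04-01T00:00:00.000Z"},
]
may_only = filter_by_created_at(rows, "2026-05-01T00:00:00Z", "2026-06-01T00:00:00Z")
```

The half-open range means consecutive months can be filtered without counting a boundary item twice.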

Does the Actor get comment text and the parent story?

Yes. Comment items include the full text, and the tags array includes the parent story ID (story_<id>) so you can join comments back to their stories.
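
As a sketch of that join, the helpers below read the parent story ID from the story_<id> tag and group comment records by it. The function names are hypothetical; only the tag format comes from the description above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

PREFIX = "story_"


def parent_story_id(tags: Iterable[str]) -> Optional[str]:
    """Return the ID from the first story_<id> tag, or None if absent."""
    for tag in tags:
        if tag.startswith(PREFIX):
            return tag[len(PREFIX):]
    return None


def group_comments_by_story(comments: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group comment records under their parent story's objectId."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for comment in comments:
        story_id = parent_story_id(comment.get("tags") or [])
        if story_id is not None:
            grouped[story_id].append(comment)
    return dict(grouped)
```

The resulting keys match the objectId of records from a story run, so the two datasets can be joined on that value.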

Can I use this for AI / LLM training data?

Yes: flat structured records, clean attribution, ISO timestamps. Many users build HN corpora for fine-tuning code/tech assistants.

How is this different from the HN Who Is Hiring scraper?

This Actor is general-purpose HN search. Our hacker-news-who-is-hiring-scraper is purpose-built for the monthly "Who is hiring?" thread, parsed by company, role, location, remote and tech stack.

What output formats are supported?

JSON, CSV, Excel, HTML, XML and JSONL via the Apify Dataset, plus REST API and webhooks.

How fresh is the data?

Real-time. Algolia indexes new HN items within seconds of posting.


Looking for adjacent data sources? Check out the rest of the social/dev/content scraping suite:

| Scraper | Purpose |
| --- | --- |
| hacker-news-search-scraper | You are here. General HN search across stories, comments, polls, Show HN, Ask HN, front page. |
| hacker-news-who-is-hiring-scraper | Purpose-built parser for the monthly HN "Who is hiring?" thread, structured by company/role/stack. |
| reddit-subreddit-scraper | Posts from any subreddit: sort, time window, residential proxy. |
| reddit-historical-archive-scraper | Backfill years of subreddit history at scale. |
| stack-exchange-questions-scraper | Stack Overflow & 170+ Stack Exchange sites by tag/site/sort. |
| github-repository-scraper | GitHub repository metadata, stars, topics, languages, activity. |
| devto-articles-scraper | Dev.to developer articles by tag, author or feed. |
| product-hunt-daily-launches-scraper | Today's Product Hunt launches with votes, makers, hunters. |
| linkedin-top-content-scraper | Top-performing LinkedIn posts by keyword/author. |
| linkedin-ad-library-scraper | LinkedIn Ad Library: competitor ad creative & spend signals. |
| letterboxd-film-review-scraper | Film reviews from Letterboxd for culture/sentiment research. |
| instagram-media-downloader | Reels/Posts/Stories direct download URLs in HD. |

🔑 Keyword cloud

Core: hacker news scraper, hn scraper, hn search api, hn algolia api, hacker news data export, hacker news json api, hn stories scraper, hn comments scraper, ask hn scraper, show hn scraper, hn front page scraper, hn poll scraper.

Niche: show hn launch tracker, ask hn knowledge mining, hn comment dataset, hn comment text extraction, hn points filter, hn date pagination, hn algolia date window, hn historical backfill, hn comments to csv, hn dataset for llm, hacker news api alternative, export hacker news to csv, scrape hacker news without login, ask hn dataset json, show hn launch scraper.

Use case: tech trend research, sentiment analysis on hn, indie hacker product discovery, devtool launch tracker, ai launch tracker, founder sentiment scraping, brand mention monitoring on hn, competitor mention monitoring, journalism research dataset, newsletter automation, content aggregator feed, fine-tuning data for code assistants.

Audience: startup founders, vc analysts, product managers, indie hackers, dev relations teams, technical journalists, content creators, ml researchers, data engineers, growth marketers, technical recruiters, competitive intelligence teams.


📝 Changelog

2026-06-07

  • Docs: added coverage for using this as a Hacker News API alternative, exporting HN data to CSV/JSON, and scraping HN without login.

2026-06-05

  • 🛡️ Reliability fix: results are no longer dropped by strict output validation, so runs now complete cleanly even at high volume (thousands of results).
  • ⚡ Stability & performance hardening; fresh rebuild.

2026-06-04

  • Verified live & refreshed build; reliability/maintenance pass.

2026-06-01

  • Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.

2026-05-25

  • Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.

2026-05-20

  • Maintenance pass: reviewed the input schema and default values for a smooth one-click start, and rebuilt the Actor on the latest base image.

Last reviewed: 2026-06-01.