🟧 Hacker News Scraper — Stories, Comments & Search by Keyword

Pricing

from $0.01 / 1,000 results

Search and scrape Hacker News stories, comments, and polls by keyword: points, authors, comment counts, dates, and links. Powered by the Algolia HN Search API.

Developer

Is Koren

Maintained by Community

Scrape Hacker News at scale with a scraper built on the official Algolia HN Search API. Search by keyword to pull matching stories, comments, and polls, or fetch the current front page and the newest items, with no API key, no login, and no anti-bot headaches. Every result is emitted as one clean, structured JSON record ready for analysis, dashboards, alerting, or AI pipelines.

Whether you are tracking what Hacker News says about your product, monitoring keywords like "artificial intelligence", or building a dataset of top stories, this Hacker News scraper gets you there in seconds.

✨ Features

  • 🔎 Keyword search across Hacker News stories, comments, and polls.
  • 📰 Front page mode — scrape the current HN front page without a query.
  • 🏆 Sort by relevance or date (newest first).
  • 🎯 Numeric filters — only keep items above a minimum points or comment count.
  • 📄 Automatic pagination up to the Algolia ~1000-result cap.
  • 🧱 Flat, structured output — one record per result, ready for CSV/JSON/Excel export.
  • 🛡️ No anti-bot issues — uses the public Algolia HN API, so runs are cheap and stable.
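To make the search, sort, filter, and pagination features concrete, here is a minimal sketch of how such requests could be built against the public Algolia HN Search API. The endpoint names (`search`, `search_by_date`) and the `tags`, `numericFilters`, `page`, and `hitsPerPage` parameters come from that API; the function names and the mapping from this actor's inputs are assumptions for illustration, not the actor's actual source.

```python
from urllib.parse import urlencode

API_BASE = "https://hn.algolia.com/api/v1"
RESULT_CAP = 1000  # approximate per-query cap described in this README


def build_search_url(query, content_type="story", sort_by="relevance",
                     min_points=None, min_comments=None,
                     page=0, hits_per_page=100):
    """Build one paginated request URL (hypothetical helper)."""
    # "search" ranks by relevance; "search_by_date" returns newest first.
    endpoint = "search_by_date" if sort_by == "date" else "search"
    params = {"query": query, "tags": content_type,
              "page": page, "hitsPerPage": hits_per_page}
    filters = []
    if min_points is not None:
        filters.append(f"points>={min_points}")
    if min_comments is not None:
        filters.append(f"num_comments>={min_comments}")
    if filters:
        params["numericFilters"] = ",".join(filters)
    return f"{API_BASE}/{endpoint}?{urlencode(params)}"


def page_urls(max_items, **kwargs):
    """Yield the URLs needed to collect up to max_items results."""
    per_page = kwargs.pop("hits_per_page", 100)
    wanted = min(max_items, RESULT_CAP)
    pages = -(-wanted // per_page)  # ceiling division
    for page in range(pages):
        yield build_search_url(page=page, hits_per_page=per_page, **kwargs)
```

For example, `list(page_urls(250, query="rust programming", sort_by="date"))` produces three URLs, one per page of 100 hits.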

🚀 Quick start

Paste this input to scrape the 10 most relevant stories about artificial intelligence:

{
  "query": "artificial intelligence",
  "contentType": "story",
  "sortBy": "relevance",
  "maxItems": 10
}

Scrape the current front page (no keyword needed):

{
  "query": "",
  "contentType": "front_page",
  "sortBy": "relevance",
  "maxItems": 30
}

Find the newest stories about a topic that already have at least 50 points:

{
  "query": "rust programming",
  "contentType": "story",
  "sortBy": "date",
  "minPoints": 50,
  "maxItems": 100
}
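The same inputs can also be sent programmatically. The sketch below uses only the Python standard library and Apify's `run-sync-get-dataset-items` REST endpoint, which runs an actor and returns its dataset items in one call. The actor ID `username~actor-name` and the token value are placeholders, not this actor's real identifiers, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_run_request(actor_id, token, run_input):
    """Prepare a POST request that runs an actor and returns its items."""
    url = f"https://api.apify.com/v2/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_actor(actor_id, token, run_input, timeout=300):
    """Send the request and decode the JSON array of result records."""
    request = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder actor ID and token: replace with your own values.
    items = run_actor("username~actor-name", "YOUR_APIFY_TOKEN",
                      {"query": "rust programming", "contentType": "story",
                       "sortBy": "date", "minPoints": 50, "maxItems": 100})
    print(len(items), "records")
```

Building the request separately from sending it keeps the input easy to inspect before a run is started.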

⚙️ Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | "artificial intelligence" | Keyword or phrase to search for. Leave empty to fetch the latest items / front page. |
| contentType | select | story | What to scrape: story, comment, poll, or front_page. |
| sortBy | select | relevance | relevance (best match) or date (newest first). |
| maxItems | integer | 50 | Maximum total results (1–1000; Algolia caps near 1000). |
| minPoints | integer | | Only keep items with at least this many points. |
| minComments | integer | | Only keep items with at least this many comments. |
| proxyConfiguration | proxy | { "useApifyProxy": true } | Proxy settings. Datacenter proxies work fine here. |
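If you generate inputs from code, it can help to check them against the table above before starting a run. The following is a minimal sketch of such a check; the `ScraperInput` class is hypothetical, the non-negative rule for `minPoints` and `minComments` is an assumption, and the proxy field is omitted for brevity.

```python
from dataclasses import asdict, dataclass
from typing import Optional

CONTENT_TYPES = {"story", "comment", "poll", "front_page"}
SORT_OPTIONS = {"relevance", "date"}


@dataclass
class ScraperInput:
    """Client-side mirror of the input fields and defaults listed above."""
    query: str = "artificial intelligence"
    contentType: str = "story"
    sortBy: str = "relevance"
    maxItems: int = 50
    minPoints: Optional[int] = None
    minComments: Optional[int] = None

    def __post_init__(self):
        if self.contentType not in CONTENT_TYPES:
            raise ValueError(f"unknown contentType: {self.contentType!r}")
        if self.sortBy not in SORT_OPTIONS:
            raise ValueError(f"unknown sortBy: {self.sortBy!r}")
        if not 1 <= self.maxItems <= 1000:
            raise ValueError("maxItems must be between 1 and 1000")
        for name in ("minPoints", "minComments"):
            value = getattr(self, name)
            if value is not None and value < 0:
                raise ValueError(f"{name} must not be negative")

    def to_dict(self):
        """Return the input as a dict, dropping unset optional filters."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

`ScraperInput(query="rust programming", sortBy="date", minPoints=50).to_dict()` yields a dict that can be serialized as the actor input.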

📤 Output

Each result is pushed as one record to the dataset. Example story record:

{
  "query": "artificial intelligence",
  "objectID": "39038064",
  "title": "The rise of artificial intelligence agents",
  "url": "https://example.com/ai-agents",
  "author": "pg",
  "points": 412,
  "numComments": 187,
  "createdAt": "2026-01-12T09:33:00.000Z",
  "createdAtTimestamp": 1768210380,
  "hnUrl": "https://news.ycombinator.com/item?id=39038064",
  "storyText": null,
  "tags": ["story", "author_pg", "story_39038064"]
}

Comment records additionally include commentText, storyId, and parentId.

| Field | Description |
| --- | --- |
| query | The search query used for the run. |
| objectID | Unique Hacker News item ID. |
| title | Story/poll title (null for comments). |
| url | External link (null for Ask/Show HN and text posts). |
| author | Hacker News username of the author. |
| points | Score / upvotes. |
| numComments | Number of comments on the item. |
| createdAt | ISO 8601 creation timestamp. |
| createdAtTimestamp | Unix creation timestamp. |
| hnUrl | Canonical Hacker News discussion URL. |
| storyText | HTML-stripped self/Ask HN text (if any). |
| tags | Algolia _tags array. |
| commentText | (Comments only) HTML-stripped comment body. |
| storyId | (Comments only) ID of the parent story. |
| parentId | (Comments only) ID of the direct parent item. |
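To show how the output fields relate to the underlying API, here is a sketch of a mapping from one raw Algolia hit to a record shaped like the one above. The raw field names (`num_comments`, `created_at`, `created_at_i`, `story_text`, `comment_text`, `story_id`, `parent_id`, `_tags`) are those returned by the Algolia HN Search API; the helper names and the exact HTML-stripping rules are assumptions, not the actor's actual code.

```python
import html
import re

_TAG_RE = re.compile(r"<[^>]+>")


def strip_html(text):
    """Remove tags and decode entities; None passes through unchanged."""
    if text is None:
        return None
    text = text.replace("<p>", "\n\n")  # HN separates paragraphs with <p>
    return html.unescape(_TAG_RE.sub("", text)).strip()


def hit_to_record(query, hit):
    """Convert a raw API hit into a flat output record (hypothetical)."""
    object_id = hit["objectID"]
    record = {
        "query": query,
        "objectID": object_id,
        "title": hit.get("title"),
        "url": hit.get("url"),
        "author": hit.get("author"),
        "points": hit.get("points"),
        "numComments": hit.get("num_comments"),
        "createdAt": hit.get("created_at"),
        "createdAtTimestamp": hit.get("created_at_i"),
        "hnUrl": f"https://news.ycombinator.com/item?id={object_id}",
        "storyText": strip_html(hit.get("story_text")),
        "tags": hit.get("_tags", []),
    }
    # Comment hits carry three extra fields.
    if "comment" in record["tags"]:
        record["commentText"] = strip_html(hit.get("comment_text"))
        record["storyId"] = hit.get("story_id")
        record["parentId"] = hit.get("parent_id")
    return record
```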

❓ FAQ

Do I need a Hacker News API key? No. This Hacker News scraper uses the free, public Algolia HN Search API — no key or login.

How many results can I get? The Algolia HN API caps results at roughly 1000 per query. Set maxItems accordingly.

Why is url sometimes null? Ask HN, Show HN, and text posts have no external link, so url is null. Use hnUrl for the discussion page and storyText for the body.

Can I scrape only comments? Yes — set contentType to comment. Records will include commentText, storyId, and parentId.

Will I get rate-limited or blocked? Unlikely. The Algolia HN API tolerates the request volume of a typical run and has no anti-bot protection, so datacenter proxies are fine.

💡 Tips

  • Use sortBy: "date" with minPoints to build a feed of fresh, already-popular discussions.
  • Combine query with contentType: "comment" to mine sentiment and opinions on a topic.
  • Leave query empty and set contentType: "front_page" to snapshot the HN front page on a schedule.
  • Schedule this actor to run hourly to monitor a keyword and feed results into Slack or a webhook.
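For the scheduled-monitoring tip, a downstream consumer usually needs to skip items it has already reported. The sketch below assumes records shaped like the output example above and a Slack incoming webhook, which accepts a JSON body with a `text` field; the function names and message layout are hypothetical.

```python
import json


def new_items(records, seen_ids, min_points=0):
    """Return records not yet reported that meet the points threshold.

    seen_ids is updated in place; items below the threshold are not
    marked as seen, so they can still be reported on a later run.
    """
    fresh = []
    for record in records:
        object_id = record["objectID"]
        if object_id in seen_ids:
            continue
        if (record.get("points") or 0) >= min_points:
            seen_ids.add(object_id)
            fresh.append(record)
    return fresh


def slack_payload(records):
    """Format records as the JSON body for a Slack incoming webhook."""
    lines = []
    for record in records:
        link = record.get("url") or record["hnUrl"]  # url is null for text posts
        title = record.get("title") or "comment"
        points = record.get("points") or 0
        lines.append(f"• <{link}|{title}> ({points} points)")
    return json.dumps({"text": "\n".join(lines)})
```

Persist `seen_ids` between runs (for example in a file or key-value store) so that each hourly run only posts items that are new.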