Hacker News Scraper — Stories, Comments & Jobs

Pricing

$4.99/month + usage

Hacker News scraper 2026 — extract posts, comments, karma and user profiles without an API key. Pay-per-result pricing. Returns structured JSON. Suited to tech trend monitoring, brand mentions, and developer community research.

Rating

0.0

(0)

Developer

Web Data Labs

Maintained by Community

Actor stats

0

Bookmarked

6

Total users

2

Monthly active users

a day ago

Last modified

Hacker News Scraper — Extract Stories, Comments & User Profiles from HN

Scrape Hacker News stories, comments, and user profiles. Browse top, new, best, Ask HN, Show HN, and job stories. Search by keyword with date filtering. Get full comment trees. Fetch user profiles with karma and activity data.

Why Use This Hacker News Scraper?

Hacker News is the internet's most influential tech community — where startups launch, trends emerge, and technical discussions shape the industry. While HN has a public API, building a reliable scraper around it requires handling concurrency, pagination, rate limits, and data formatting.

This actor handles all of that for you:

  • All HN categories — Top, New, Best, Ask HN, Show HN, Jobs
  • Full-text search with date range filtering (via Algolia)
  • Comment trees — fetch full discussion threads with nested replies
  • User profiles — karma, account age, submission count
  • Filters — minimum score, minimum comments, keyword matching
  • Concurrent fetching — fast parallel requests for large datasets
  • Clean JSON output — structured, consistent, ready to use

Use Cases

1. Tech Trend Monitoring

Track emerging technologies, frameworks, and tools by monitoring what gets upvoted on HN. Spot trends weeks before they hit mainstream tech media.

2. Startup & Product Launch Tracking

Monitor "Show HN" and "Launch HN" posts to discover new startups and products the moment they're announced to the tech community.

3. Competitive Intelligence

Search for mentions of your competitors, their products, or your market category. Analyze how the tech community perceives them.

4. Content Research & Curation

Find the most discussed and upvoted content in your niche. Build newsletters, blog post ideas, or social media content based on what resonates with the HN audience.

5. Recruiting & Talent Intelligence

Monitor "Who is Hiring?" threads and job posts. Analyze hiring trends, popular technologies in job listings, and salary discussions.

6. Academic Research

Build datasets of tech community discussions for research on information diffusion, opinion formation, or technology adoption patterns.

7. Investment & Market Analysis

Track discussions about specific companies, funding rounds, or market segments. HN comments often contain insider perspectives and technical due diligence.

8. Developer Relations & DevTool Marketing

Understand how developers discuss tools and frameworks. Find pain points, feature requests, and sentiment about your developer product.

9. Technical Due Diligence

Research specific technologies or companies by searching HN discussions. Get candid technical opinions from experienced practitioners.

Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| category | string | No | top | Story category: top, new, best, ask, show, jobs, search |
| searchQuery | string | No | | Search query (only when category is search) |
| sortBy | string | No | relevance | Search sort: relevance or date |
| since | string | No | | Start date filter for search (YYYY-MM-DD) |
| until | string | No | | End date filter for search (YYYY-MM-DD) |
| storyIds | array | No | | Fetch specific stories by HN ID (overrides category) |
| maxItems | integer | No | 100 | Maximum stories to return (1–5000) |
| minScore | integer | No | 0 | Minimum upvote score filter |
| minComments | integer | No | 0 | Minimum comment count filter |
| keyword | string | No | | Filter stories by title keyword (case-insensitive) |
| includeComments | boolean | No | false | Fetch full comment trees for each story |
| maxCommentsPerStory | integer | No | 50 | Max comments per story (1–1000) |
| scrapeType | string | No | stories | Output type: stories, users, or both |
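
The allowed values and ranges in the table can be checked on the client side before a run is started. The sketch below is one way to do that; `build_run_input` is a hypothetical helper, not part of the actor or of the Apify client, and it encodes only the constraints listed in the table.

```python
ALLOWED_CATEGORIES = {"top", "new", "best", "ask", "show", "jobs", "search"}
ALLOWED_SCRAPE_TYPES = {"stories", "users", "both"}


def build_run_input(category="top", max_items=100, max_comments_per_story=50,
                    scrape_type="stories", **extra):
    """Build an actor input dict, rejecting values outside the documented ranges.

    Hypothetical helper: the checks mirror the parameter table only.
    """
    if category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if scrape_type not in ALLOWED_SCRAPE_TYPES:
        raise ValueError(f"unknown scrapeType: {scrape_type!r}")
    if not 1 <= max_items <= 5000:
        raise ValueError("maxItems must be between 1 and 5000")
    if not 1 <= max_comments_per_story <= 1000:
        raise ValueError("maxCommentsPerStory must be between 1 and 1000")

    run_input = {
        "category": category,
        "maxItems": max_items,
        "maxCommentsPerStory": max_comments_per_story,
        "scrapeType": scrape_type,
    }
    # Pass through the remaining documented fields, e.g. minScore, keyword, since.
    run_input.update(extra)
    return run_input
```

For example, `build_run_input(category="show", max_items=50, minScore=10)` produces the same dictionary as the Show HN example input, plus the default values made explicit.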

Example Input

Get Top Stories

{
    "category": "top",
    "maxItems": 30,
    "minScore": 100
}

Search with Date Range

{
    "category": "search",
    "searchQuery": "AI agents",
    "sortBy": "date",
    "since": "2026-03-01",
    "until": "2026-03-31",
    "maxItems": 200
}

Get Story with Comments

{
    "storyIds": ["39876543"],
    "includeComments": true,
    "maxCommentsPerStory": 100
}

Show HN Posts with a Minimum Score

{
    "category": "show",
    "maxItems": 50,
    "minScore": 10
}

Fetch User Profiles

{
    "category": "top",
    "maxItems": 100,
    "scrapeType": "users"
}

Sample Output

Story

{
    "id": 39876543,
    "title": "Show HN: I built an open-source alternative to Notion",
    "url": "https://example.com/my-project",
    "domain": "example.com",
    "text": null,
    "author": "builder_dev",
    "score": 847,
    "commentCount": 234,
    "createdAt": "2026-03-15T10:30:00.000Z",
    "createdAtUnix": 1773570600,
    "storyType": "show",
    "hnUrl": "https://news.ycombinator.com/item?id=39876543",
    "dead": false,
    "kids": [39876600, 39876601, 39876602]
}

Comment

{
    "id": 39876600,
    "text": "This looks great. How does it handle real-time collaboration?",
    "author": "curious_dev",
    "parentId": 39876543,
    "storyId": 39876543,
    "createdAt": "2026-03-15T10:45:00.000Z",
    "depth": 0,
    "kids": [39876610, 39876611]
}

User Profile

{
    "username": "builder_dev",
    "karma": 12847,
    "about": "Building tools for developers. Previously at BigCorp.",
    "createdAt": "2019-06-15T00:00:00.000Z",
    "createdAtUnix": 1560556800,
    "submittedCount": 342,
    "profileUrl": "https://news.ycombinator.com/user?id=builder_dev"
}

Integration Examples

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Search for "machine learning" stories since 2026-03-01 with at least 50 points
run_input = {
    "category": "search",
    "searchQuery": "machine learning",
    "sortBy": "date",
    "since": "2026-03-01",
    "maxItems": 100,
    "minScore": 50
}

# Start the actor and wait for the run to finish
run = client.actor("cryptosignals/hackernews-scraper").call(run_input=run_input)

# Print each story as "[score] title (domain)"; text posts have no domain
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"[{item['score']}] {item['title']} ({item.get('domain') or 'self'})")

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const input = {
    category: "top",
    maxItems: 50,
    minScore: 100,
    includeComments: true,
    maxCommentsPerStory: 20
};

const run = await client.actor("cryptosignals/hackernews-scraper").call(input);
const { items } = await client.dataset(run.defaultDatasetId).listItems();

items.forEach(item => {
    if (item.title) {
        console.log(`[${item.score}] ${item.title} (${item.commentCount} comments)`);
    }
});

Automated Daily Digest

Build a daily HN digest delivered to your inbox or Slack:

  1. Go to the actor page and click Schedule
  2. Set it to run daily at your preferred time
  3. Configure: category: "top", maxItems: 20, minScore: 200
  4. Add a webhook to send results to Slack, email, or your CMS
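
If the webhook target expects plain text, the dataset items can be turned into a digest message first. The following is a minimal sketch, assuming items shaped like the sample Story output (`title`, `score`, `commentCount`, `hnUrl`); `format_digest` is a hypothetical helper, not something the actor provides.

```python
def format_digest(items, limit=20):
    """Render the highest-scoring stories as a numbered plain-text digest."""
    # Keep only story records; comments and user profiles have no title.
    stories = [item for item in items if item.get("title")]
    stories.sort(key=lambda s: s.get("score", 0), reverse=True)

    lines = []
    for rank, story in enumerate(stories[:limit], start=1):
        lines.append(
            f"{rank}. {story['title']} "
            f"({story.get('score', 0)} points, {story.get('commentCount', 0)} comments)"
        )
        lines.append(f"   {story.get('hnUrl', '')}")
    return "\n".join(lines)
```

The returned string can be posted as the message body to Slack or pasted into an email template.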

Pricing & Cost Estimates

This actor uses Apify's pay-per-event (PPE) pricing — pay only for what you scrape.

| Use Case | Items | Estimated Cost |
|---|---|---|
| Daily top stories | 30 stories | ~$0.30 |
| Weekly trend search | 200 stories | ~$2.00 |
| Deep research with comments | 50 stories + comments | ~$3.00 |
| User profile batch | 100 users | ~$1.00 |

Free tier: Apify provides $5 in free monthly credits. At the ~$0.30 estimate above for a 30-story run, that covers about 16 runs per month, so running every day would exceed the free credits.

Tips to minimize costs:

  • Use minScore and minComments filters to skip low-value stories
  • Set includeComments to false if you only need story metadata
  • Limit maxCommentsPerStory to reduce data volume
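
For budgeting, the story and user rows in the estimate table work out to about $0.01 per item (30 stories for ~$0.30, 200 for ~$2.00, 100 users for ~$1.00). The sketch below uses that inferred figure; it is an assumption derived from the table, not a published rate, and it does not cover comment pricing, which the table does not break down.

```python
# Assumption: ~$0.01 per story or user, inferred from the estimate table.
PRICE_PER_ITEM_USD = 0.01


def estimate_cost(item_count, price_per_item=PRICE_PER_ITEM_USD):
    """Approximate cost in USD of a run returning item_count stories or users."""
    return round(item_count * price_per_item, 2)


def runs_within_budget(budget_usd, items_per_run, price_per_item=PRICE_PER_ITEM_USD):
    """Number of whole runs of items_per_run items that fit in budget_usd."""
    return int(budget_usd // estimate_cost(items_per_run, price_per_item))
```

Replace `PRICE_PER_ITEM_USD` with the rate shown on the actor's pricing tab before relying on the numbers.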

Frequently Asked Questions

Does this scraper require a Hacker News account?

No. All data is fetched from HN's public Firebase API and Algolia search API. No authentication needed.

How is search different from category browsing?

Category browsing (top, new, best, etc.) fetches HN's official ranked lists. Search mode uses Algolia's full-text index, which supports keyword queries and date range filtering.
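
To make the search mode concrete, here is a rough sketch of how `searchQuery`, `sortBy`, `since`, and `until` could map onto a request to the public HN Algolia API. The endpoints and the `numericFilters` parameter belong to that API; how this actor builds its requests internally is not documented here, so the mapping, the inclusive treatment of `until`, and the `build_search_url` name are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def to_unix(day):
    """Convert a YYYY-MM-DD string to a Unix timestamp at 00:00 UTC."""
    parsed = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def build_search_url(query, sort_by="relevance", since=None, until=None):
    """Build an HN Algolia search URL for stories, optionally bounded by date."""
    # search ranks by relevance, search_by_date returns newest first.
    endpoint = "search_by_date" if sort_by == "date" else "search"
    params = {"query": query, "tags": "story"}

    filters = []
    if since:
        filters.append(f"created_at_i>={to_unix(since)}")
    if until:
        # Assumption: until is inclusive, so the bound is the start of the next day.
        next_day = datetime.strptime(until, "%Y-%m-%d") + timedelta(days=1)
        filters.append(f"created_at_i<{to_unix(next_day.strftime('%Y-%m-%d'))}")
    if filters:
        params["numericFilters"] = ",".join(filters)

    return f"https://hn.algolia.com/api/v1/{endpoint}?{urlencode(params)}"
```

Category browsing has no equivalent of these filters, which is why date ranges are only available in search mode.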

Can I get historical data?

Yes. Use search mode with since and until date parameters to fetch stories from any time period. Algolia indexes HN stories going back years.

How do I fetch comments for a specific story?

Provide the story ID in the storyIds array and set includeComments to true. Comments are returned as a flat list with depth and parentId fields to reconstruct the tree.
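
Rebuilding the tree from the flat list takes one pass over the comments. The sketch below assumes comment records shaped like the sample Comment output (`id` and `parentId`); `build_comment_tree` and the `replies` key are illustrative names, not part of the actor's output.

```python
def build_comment_tree(comments):
    """Nest a flat comment list under each comment's parent via parentId."""
    # Copy each comment and give it an empty list for its replies.
    nodes = {c["id"]: {**c, "replies": []} for c in comments}

    roots = []
    for comment in comments:
        node = nodes[comment["id"]]
        parent = nodes.get(comment["parentId"])
        if parent is not None:
            parent["replies"].append(node)
        else:
            # The parent is the story itself, or a comment that was not
            # fetched (for example, cut off by maxCommentsPerStory).
            roots.append(node)
    return roots
```

Each returned root is a top-level comment, and its `replies` list holds the nested children in the order they appeared in the flat list.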

What's the difference between stories, users, and both scrape types?

  • stories returns story/post data only (default)
  • users fetches profile data for the authors of matched stories
  • both returns stories AND their authors' profiles in the same dataset
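
When a dataset mixes record types, they can be separated by the fields each type carries. This is a small sketch assuming the shapes in the Sample Output section (users have `username`, stories have `title`, comments have neither); `split_items` is a hypothetical helper.

```python
def split_items(items):
    """Separate mixed dataset items into stories, comments, and user profiles."""
    stories, comments, users = [], [], []
    for item in items:
        if "username" in item:
            users.append(item)
        elif item.get("title"):
            stories.append(item)
        else:
            comments.append(item)
    return stories, comments, users
```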

Can I filter stories by domain or source?

Not directly as an input parameter, but you can use the keyword filter on titles, or post-process results by filtering on the domain field in the output.
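
Post-processing on the `domain` field can look like the sketch below. It assumes story items shaped like the sample Story output, where `domain` may be missing or null for text posts; `filter_by_domain` is an illustrative name.

```python
def filter_by_domain(items, domains):
    """Keep only items whose domain field matches one of the given domains."""
    wanted = {d.lower() for d in domains}
    # Text posts (Ask HN and similar) may have no domain, so default to "".
    return [item for item in items if (item.get("domain") or "").lower() in wanted]
```

For example, `filter_by_domain(items, ["github.com", "arxiv.org"])` keeps only stories that link to those two sites.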

Is there a maximum number of stories I can fetch?

Up to 5,000 stories per run. For category browsing, HN's API caps the ID lists: up to 500 story IDs for top, new, and best, and up to 200 for Ask HN, Show HN, and jobs. For search, Algolia supports deep pagination.

Can I export data as CSV?

Yes. All results are stored in Apify datasets, which support one-click export to JSON, CSV, Excel, XML, and RSS formats.

How fast is this scraper?

It uses concurrent fetching (20 parallel requests) for story details. A typical run of 100 stories completes in under 30 seconds. Adding comments increases runtime proportionally.
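
As an illustration of what fetching with 20 parallel requests can look like, the sketch below uses a thread pool against the public HN Firebase item endpoint. It is not the actor's actual implementation; `fetch_item` and `fetch_many` are hypothetical names, and retries and rate-limit handling are left out.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{id}.json"


def fetch_item(item_id):
    """Fetch one story or comment from the public HN Firebase API."""
    with urlopen(ITEM_URL.format(id=item_id), timeout=10) as resp:
        return json.load(resp)


def fetch_many(item_ids, fetch=fetch_item, workers=20):
    """Fetch several items in parallel; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, item_ids))
```

Passing a different `fetch` callable makes the function easy to exercise without network access.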

Can I use this for real-time monitoring?

Yes. Schedule the actor to run every hour (or more frequently) with specific search terms or categories. Combine with webhooks to get instant notifications.
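
Frequent scheduled runs will return many of the same stories each time, so a notification step usually needs to skip items it has already seen. A minimal sketch, assuming each item carries the numeric `id` shown in the sample output; `new_items` is a hypothetical helper, and storing `seen_ids` between runs (a file, a key-value store) is left to the caller.

```python
def new_items(items, seen_ids):
    """Return items whose id is not in seen_ids, and record them as seen."""
    fresh = []
    for item in items:
        if item["id"] not in seen_ids:
            seen_ids.add(item["id"])  # also guards against duplicates within one run
            fresh.append(item)
    return fresh
```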

Support & Updates

This actor is actively maintained and updated. Need help?

  • Report issues via the Issues tab on this actor's page
  • Request features by leaving a comment
  • Star this actor to help others find it

Built and maintained by cryptosignals on Apify.


⭐ Found this useful?

If this actor saved you time, please leave a review on the Apify Store — it takes 30 seconds and helps other developers find it.

Questions or issues? Drop a comment below and I'll respond within 24 hours.