Hacker News Data Scraper

Hacker News scraper that pulls stories, jobs, Ask HN and Show HN posts from news.ycombinator.com, so developers and SEO teams can track tech trends and job listings without manual browsing.

Pricing: from $2.99 / 1,000 results
Rating: 0.0 (0 reviews)
Developer: Kawsar
Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified a day ago
Hacker News Data Scraper: extract stories, jobs, and posts from news.ycombinator.com
Pulls structured data from news.ycombinator.com. Covers all six feeds (top, new, best, ask, show, and jobs), returns post titles, URLs, points, authors, comment counts, and post types, and pages through automatically until you hit your item limit. Works with any HN feed URL — paste a URL like https://news.ycombinator.com/show?p=5 into Start URLs and it will paginate forward from that page.
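If you run actors programmatically, a minimal sketch of calling this one with the apify-client Python package might look like the following. The actor ID string is a placeholder (substitute the real ID from this store page), and the input keys match the Input table below.

```python
run_input = {
    "feedType": "show",  # one of: top, new, best, ask, show, jobs
    "maxItems": 200,     # up to 1000
}

def fetch_items(token: str, actor_id: str = "<username>/hacker-news-data-scraper"):
    """Run the actor and return its dataset items as a list of dicts.

    The actor_id default is a placeholder, not the actor's real ID.
    """
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Each returned dict has the fields listed in the table below (itemId, storyTitle, points, and so on).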
What data does this actor return?
| Field | Type | Description | Example |
|---|---|---|---|
| itemId | integer | Hacker News item ID | 48031684 |
| rank | integer | Position in the feed | 1 |
| storyTitle | string | Post title | "Agents can now create Cloudflare accounts" |
| url | string | Linked URL (internal HN link for Ask/Show) | https://blog.cloudflare.com/... |
| domain | string | Domain extracted from the linked URL | cloudflare.com |
| points | integer\|null | Upvote score (null for job posts) | 200 |
| author | string\|null | Submitter username (null for job posts) | rolph |
| commentCount | integer\|null | Number of comments (null for job posts) | 108 |
| commentsUrl | string\|null | HN discussion thread URL | https://news.ycombinator.com/item?id=... |
| age | string | Post age as displayed on HN | 3 hours ago |
| postType | string | One of: story, job, ask, show, launch | story |
| scrapedAt | string | ISO 8601 UTC timestamp | 2026-05-06T10:00:00+00:00 |
How to use
Option 1: Scrape a feed
- Open the input tab
- Pick a feed type: top, new, best, ask, show, or jobs
- Set your item limit (up to 1000)
- Click Run
The actor pages through HN automatically (30 items per page) until it hits your limit.
Option 2: Start from a specific page
Add any HN feed URL to the Start URLs field. The actor detects the page number from the URL and paginates forward from there.
Examples:
- https://news.ycombinator.com/show?p=3 — starts at Show HN page 3 and pages forward
- https://news.ycombinator.com/newest — scrapes the New feed from page 1
- https://news.ycombinator.com/ask?p=10 — starts at Ask HN page 10
Multiple URLs are supported. The actor processes each in order and stops when it hits your item limit.
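The page-number detection described above can be sketched with Python's standard urllib.parse — this is an illustration of the rule, not the actor's actual code:

```python
from urllib.parse import urlparse, parse_qs

def start_page(url: str) -> int:
    """Return the page number encoded in an HN feed URL's ?p= parameter.

    URLs without ?p= (e.g. /newest) start from page 1.
    """
    query = parse_qs(urlparse(url).query)
    return int(query.get("p", ["1"])[0])

start_page("https://news.ycombinator.com/show?p=3")   # → 3
start_page("https://news.ycombinator.com/newest")     # → 1
```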
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| feedType | string (select) | top | Feed to scrape when no Start URLs are set |
| startUrls | array of strings | [] | HN URLs to start paginating from. Overrides Feed type. |
| maxItems | integer | 100 | Max items to collect per run (up to 1000) |
| requestTimeoutSecs | integer | 30 | Per-request timeout in seconds |
Feed type options
| Value | URL | Description |
|---|---|---|
| top | news.ycombinator.com/ | Front page — highest-voted recent stories |
| new | news.ycombinator.com/newest | Newest submissions, unfiltered |
| best | news.ycombinator.com/best | Highest-voted of all time |
| ask | news.ycombinator.com/ask | Ask HN posts only |
| show | news.ycombinator.com/show | Show HN and Launch HN posts only |
| jobs | news.ycombinator.com/jobs | YC startup job listings |
Example output
```json
[
  {
    "itemId": 48031684,
    "rank": 1,
    "storyTitle": "Agents can now create Cloudflare accounts, buy domains, and deploy products",
    "url": "https://blog.cloudflare.com/agents-stripe-projects/",
    "domain": "cloudflare.com",
    "points": 200,
    "author": "rolph",
    "commentCount": 108,
    "commentsUrl": "https://news.ycombinator.com/item?id=48031684",
    "age": "3 hours ago",
    "postType": "story",
    "scrapedAt": "2026-05-06T10:00:00.000000+00:00"
  },
  {
    "itemId": 48025244,
    "rank": 1,
    "storyTitle": "Proliferate (YC S25) Is Hiring",
    "url": "https://www.ycombinator.com/companies/proliferate/jobs/...",
    "domain": "ycombinator.com",
    "points": null,
    "author": null,
    "commentCount": null,
    "commentsUrl": "https://news.ycombinator.com/item?id=48025244",
    "age": "13 hours ago",
    "postType": "job",
    "scrapedAt": "2026-05-06T10:00:00.000000+00:00"
  }
]
```
How pagination works
Each HN feed page returns 30 items. The actor increments the ?p= query parameter and fetches the next page until either your maxItems limit is reached or there are no more items. If you set maxItems to 300, the actor fetches 10 pages automatically.
When you use Start URLs with a page number (e.g. ?p=5), the actor starts at that page and paginates forward — it does not go back to page 1.
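The page math above can be sketched as a small generator — the URL pattern (?p=N, 30 items per page) follows the HN pages described in this section, though the actor's internal code may differ:

```python
import math

def page_urls(base: str, start_page: int, max_items: int, per_page: int = 30):
    """Yield successive HN feed-page URLs covering up to max_items items.

    Starts at start_page (not page 1) to mirror the Start URLs behavior.
    """
    pages = math.ceil(max_items / per_page)
    for p in range(start_page, start_page + pages):
        yield f"{base}?p={p}"

# maxItems = 300 from page 5 → 10 pages, ?p=5 through ?p=14
urls = list(page_urls("https://news.ycombinator.com/show", 5, 300))
```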
Use cases
- SEO research: track which tech topics trend on HN and use that to shape your content calendar
- Job market monitoring: collect startup listings from the jobs feed and compare them week over week
- Show HN and Launch HN watching: see what new products the community pays attention to
- Content curation: pull top stories automatically for newsletters or internal feeds
- Dataset building: community engagement data (points, comment counts) across thousands of posts over time
- Competitive intelligence: monitor mentions of competitor products or technologies in trending discussions
Scheduling
To collect HN data on a recurring schedule, use Apify's built-in scheduler:
- Go to your actor page and click Schedules
- Set a cron expression (e.g. 0 9 * * * for 9am daily)
- Configure the input (feed type, item limit)
- Each run's results land in a separate dataset
This works well for building historical trend datasets over days or weeks.
Limitations
- Max 1000 items per run (HN has no API rate limit, but this keeps run costs predictable)
- Comment content is not extracted — post-level data only
- Job posts return null for points, author, and commentCount (HN does not display these for jobs)
- HN's "best" feed is relatively small — it may return fewer than 200 unique items before repeating
- The age field is a human-readable string from HN ("3 hours ago"), not a parsed timestamp
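If you need an absolute timestamp, one way to convert the relative age string yourself is to subtract it from scrapedAt. This sketch assumes the common "N minutes/hours/days ago" shape; other wordings HN might use would need extra handling:

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"minute": "minutes", "hour": "hours", "day": "days"}

def age_to_timestamp(age: str, scraped_at: datetime) -> datetime:
    """Approximate a post's submission time from an HN age string.

    Only handles "N minute(s)/hour(s)/day(s) ago"; raises on anything else.
    """
    m = re.fullmatch(r"(\d+)\s+(minute|hour|day)s?\s+ago", age.strip())
    if not m:
        raise ValueError(f"unrecognized age string: {age!r}")
    value, unit = int(m.group(1)), _UNITS[m.group(2)]
    return scraped_at - timedelta(**{unit: value})

scraped = datetime(2026, 5, 6, 10, 0, tzinfo=timezone.utc)
age_to_timestamp("3 hours ago", scraped)  # → 2026-05-06 07:00 UTC
```

The result is only as precise as HN's rounding — "3 hours ago" can mean anywhere in roughly a one-hour window.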
FAQ
What feeds are supported? Top, new, best, ask, show, and jobs.
How many items can I collect per run? Up to 1000. Each page has 30 items and the actor pages through automatically.
Can I start scraping from a specific page?
Yes. Add a URL like https://news.ycombinator.com/show?p=5 to Start URLs. The actor reads the page number from the URL and paginates forward from there.
Can I scrape multiple feeds in one run?
Yes. Add multiple feed URLs to Start URLs (e.g. both /show and /ask) and the actor will scrape each in sequence until maxItems is reached.
Does it scrape comments? No. Post-level only: title, URL, points, author, comment count. Comment text is not extracted.
Do job posts include points and author? No. HN job posts do not show vote counts or usernames. Those fields come back null.
How does post type detection work? Title prefix: "Ask HN:" becomes ask, "Show HN:" becomes show, "Launch HN:" becomes launch. Posts from the jobs feed are always tagged job. Everything else is story.
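The classification rules stated in that answer can be sketched as a small function (an illustration of the stated rules, not the actor's own code):

```python
def detect_post_type(title: str, from_jobs_feed: bool = False) -> str:
    """Classify an HN post by feed origin and title prefix.

    Jobs-feed posts are always "job"; otherwise the title prefix decides.
    """
    if from_jobs_feed:
        return "job"
    for prefix, kind in (("Ask HN:", "ask"), ("Show HN:", "show"), ("Launch HN:", "launch")):
        if title.startswith(prefix):
            return kind
    return "story"

detect_post_type("Ask HN: How do you take notes?")  # → "ask"
detect_post_type("Some linked article")             # → "story"
```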
Can I export results to CSV or Excel? Yes. In the Apify dataset view, click Export and choose CSV, Excel, JSON, or JSONL.