Hacker News Data Scraper

Pricing

from $2.99 / 1,000 results

Hacker News scraper that pulls stories, jobs, Ask HN and Show HN posts from news.ycombinator.com, so developers and SEO teams can track tech trends and job listings without manual browsing.


Rating

0.0

(0)

Developer

Kawsar

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: a day ago


Hacker News Data Scraper: extract stories, jobs, and posts from news.ycombinator.com

Pulls structured data from news.ycombinator.com. Covers all six feeds (top, new, best, ask, show, and jobs), returns post titles, URLs, points, authors, comment counts, and post types, and pages through automatically until you hit your item limit. Works with any HN feed URL — paste a URL like https://news.ycombinator.com/show?p=5 into Start URLs and it will paginate forward from that page.

What data does this actor return?

| Field | Type | Description | Example |
|---|---|---|---|
| itemId | integer | Hacker News item ID | 48031684 |
| rank | integer | Position in the feed | 1 |
| storyTitle | string | Post title | "Agents can now create Cloudflare accounts" |
| url | string | Linked URL (internal HN link for Ask/Show) | https://blog.cloudflare.com/... |
| domain | string | Domain extracted from the linked URL | cloudflare.com |
| points | integer or null | Upvote score (null for job posts) | 200 |
| author | string or null | Submitter username (null for job posts) | rolph |
| commentCount | integer or null | Number of comments (null for job posts) | 108 |
| commentsUrl | string or null | HN discussion thread URL | https://news.ycombinator.com/item?id=... |
| age | string | Post age as displayed on HN | 3 hours ago |
| postType | string | One of: story, job, ask, show, launch | story |
| scrapedAt | string | ISO 8601 UTC timestamp | 2026-05-06T10:00:00+00:00 |
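The `domain` field appears to keep only the registered domain of the linked URL (the table shows `cloudflare.com` for a `blog.cloudflare.com` link). A minimal sketch of that derivation, assuming a simple last-two-labels heuristic (the actor's actual logic is not published, and the function name is made up):

```python
from urllib.parse import urlparse

def extract_domain(url: str) -> str:
    """Reduce a link to its registered domain, e.g. blog.cloudflare.com -> cloudflare.com.

    Note: this naive heuristic mishandles multi-part TLDs like example.co.uk.
    """
    netloc = urlparse(url).netloc
    parts = netloc.split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else netloc
```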

How to use

Option 1: Scrape a feed

  1. Open the input tab
  2. Pick a feed type: top, new, best, ask, show, or jobs
  3. Set your item limit (up to 1000)
  4. Click Run

The actor pages through HN automatically (30 items per page) until it hits your limit.

Option 2: Start from a specific page

Add any HN feed URL to the Start URLs field. The actor detects the page number from the URL and paginates forward from there.

Examples:

  • https://news.ycombinator.com/show?p=3 — starts at Show HN page 3 and pages forward
  • https://news.ycombinator.com/newest — scrapes the New feed from page 1
  • https://news.ycombinator.com/ask?p=10 — starts at Ask HN page 10

Multiple URLs are supported. The actor processes each in order and stops when it hits your item limit.
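The page-number detection described above can be sketched as follows (a guess at equivalent logic, not the actor's actual code; the function name `detect_start_page` is hypothetical):

```python
from urllib.parse import urlparse, parse_qs

def detect_start_page(url: str) -> int:
    """Read HN's ?p= query parameter from a feed URL; default to page 1."""
    qs = parse_qs(urlparse(url).query)
    try:
        return max(1, int(qs.get("p", ["1"])[0]))
    except ValueError:
        return 1
```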

Input

| Parameter | Type | Default | Description |
|---|---|---|---|
| feedType | string (select) | top | Feed to scrape when no Start URLs are set |
| startUrls | array of strings | [] | HN URLs to start paginating from. Overrides Feed type. |
| maxItems | integer | 100 | Max items to collect per run (up to 1000) |
| requestTimeoutSecs | integer | 30 | Per-request timeout in seconds |
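Put together, a run input matching the parameters above might look like this (the values are illustrative, not defaults):

```python
# Example actor input: start from Show HN page 3 and collect up to 200 items.
run_input = {
    "feedType": "show",          # ignored when startUrls is non-empty
    "startUrls": ["https://news.ycombinator.com/show?p=3"],
    "maxItems": 200,             # capped at 1000 per run
    "requestTimeoutSecs": 30,
}
```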

Feed type options

| Value | URL | Description |
|---|---|---|
| top | news.ycombinator.com/ | Front page: highest-voted recent stories |
| new | news.ycombinator.com/newest | Newest submissions, unfiltered |
| best | news.ycombinator.com/best | Highest-voted of all time |
| ask | news.ycombinator.com/ask | Ask HN posts only |
| show | news.ycombinator.com/show | Show HN and Launch HN posts only |
| jobs | news.ycombinator.com/jobs | YC startup job listings |

Example output

```json
[
  {
    "itemId": 48031684,
    "rank": 1,
    "storyTitle": "Agents can now create Cloudflare accounts, buy domains, and deploy products",
    "url": "https://blog.cloudflare.com/agents-stripe-projects/",
    "domain": "cloudflare.com",
    "points": 200,
    "author": "rolph",
    "commentCount": 108,
    "commentsUrl": "https://news.ycombinator.com/item?id=48031684",
    "age": "3 hours ago",
    "postType": "story",
    "scrapedAt": "2026-05-06T10:00:00.000000+00:00"
  },
  {
    "itemId": 48025244,
    "rank": 1,
    "storyTitle": "Proliferate (YC S25) Is Hiring",
    "url": "https://www.ycombinator.com/companies/proliferate/jobs/...",
    "domain": "ycombinator.com",
    "points": null,
    "author": null,
    "commentCount": null,
    "commentsUrl": "https://news.ycombinator.com/item?id=48025244",
    "age": "13 hours ago",
    "postType": "job",
    "scrapedAt": "2026-05-06T10:00:00.000000+00:00"
  }
]
```
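Since `scrapedAt` is an ISO 8601 string, it parses directly with Python's standard library (a consumer-side example, not part of the actor):

```python
from datetime import datetime, timezone

# Parse the scrapedAt value from a dataset item into an aware datetime.
ts = datetime.fromisoformat("2026-05-06T10:00:00.000000+00:00")
ts_utc = ts.astimezone(timezone.utc)  # normalize to UTC for comparisons
```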

How pagination works

Each HN feed page returns 30 items. The actor increments the ?p= query parameter and fetches the next page until either your maxItems limit is reached or there are no more items. If you set maxItems to 300, the actor fetches 10 pages automatically.

When you use Start URLs with a page number (e.g. ?p=5), the actor starts at that page and paginates forward — it does not go back to page 1.
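The two paragraphs above amount to the following page-planning logic (a sketch under the stated 30-items-per-page assumption; the function name is made up):

```python
import math

PAGE_SIZE = 30  # each HN feed page returns 30 items

def pages_to_fetch(max_items: int, start_page: int = 1) -> list[int]:
    """Page numbers needed to collect max_items, paginating forward from start_page."""
    count = math.ceil(max_items / PAGE_SIZE)
    return list(range(start_page, start_page + count))
```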

Use cases

  • SEO research: track which tech topics trend on HN and use that to shape your content calendar
  • Job market monitoring: collect startup listings from the jobs feed and compare them week over week
  • Show HN and Launch HN watching: see what new products the community pays attention to
  • Content curation: pull top stories automatically for newsletters or internal feeds
  • Dataset building: collect community engagement data (points, comment counts) across thousands of posts over time
  • Competitive intelligence: monitor mentions of competitor products or technologies in trending discussions

Scheduling

To collect HN data on a recurring schedule, use Apify's built-in scheduler:

  1. Go to your actor page and click Schedules
  2. Set a cron expression (e.g. 0 9 * * * for 9am daily)
  3. Configure the input (feed type, item limit)
  4. Each run's results land in a separate dataset

This works well for building historical trend datasets over days or weeks.

Limitations

  • Max 1000 items per run (HN has no API rate limit, but this keeps run costs predictable)
  • Comment content is not extracted — post-level data only
  • Job posts return null for points, author, and commentCount (HN does not display these for jobs)
  • HN's "best" feed is relatively small — it may return fewer than 200 unique items before repeating
  • The age field is a human-readable string from HN ("3 hours ago"), not a parsed timestamp

FAQ

What feeds are supported? Top, new, best, ask, show, and jobs.

How many items can I collect per run? Up to 1000. Each page has 30 items and the actor pages through automatically.

Can I start scraping from a specific page? Yes. Add a URL like https://news.ycombinator.com/show?p=5 to Start URLs. The actor reads the page number from the URL and paginates forward from there.

Can I scrape multiple feeds in one run? Yes. Add multiple feed URLs to Start URLs (e.g. both /show and /ask) and the actor will scrape each in sequence until maxItems is reached.

Does it scrape comments? No. Post-level only: title, URL, points, author, comment count. Comment text is not extracted.

Do job posts include points and author? No. HN job posts do not show vote counts or usernames. Those fields come back null.

How does post type detection work? Title prefix: "Ask HN:" becomes ask, "Show HN:" becomes show, "Launch HN:" becomes launch. Posts from the jobs feed are always tagged job. Everything else is story.
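The prefix rules from the answer above, written out as a sketch (the function name is hypothetical):

```python
def detect_post_type(title: str, from_jobs_feed: bool = False) -> str:
    """Classify an HN post: jobs feed wins, then title prefix, else story."""
    if from_jobs_feed:
        return "job"
    if title.startswith("Ask HN:"):
        return "ask"
    if title.startswith("Show HN:"):
        return "show"
    if title.startswith("Launch HN:"):
        return "launch"
    return "story"
```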

Can I export results to CSV or Excel? Yes. In the Apify dataset view, click Export and choose CSV, Excel, JSON, or JSONL.