Hacker News Stories & Search Scraper

Pricing: $5.99/month + usage

A powerful and flexible scraper for Hacker News that extracts stories, job posts, polls, and comments. Get real-time data from the front page, newest submissions, Ask HN, Show HN, hiring threads, or search results, all in clean, structured JSON.

Rating: 0.0 (0)

Developer: Scrape Pilot (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 2 days ago

🚀 Features

  • Multiple Scraping Modes – Front page, newest, Ask HN, Show HN, Jobs, or custom search
  • Rich Story Data – Title, URL, Hacker News link, score, author, comment count, creation time, and story text
  • Flexible Filtering – Filter by item type (story, comment, poll, job) and sort by popularity or date
  • Pagination Control – Cap the number of items returned with max_results
  • Search Functionality – Search Hacker News by keyword with full-text support
  • Proxy Integration – Built-in Apify Proxy support, including residential groups, to reduce the risk of rate limiting
  • Clean JSON Output – Ready for analysis, dashboards, or integration with other tools

📥 Input Schema

The actor accepts the following JSON input:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| proxyConfiguration | object | No | No proxy | Proxy settings. Use Apify Proxy for high-volume scraping. |
| mode | string | Yes | – | Which Hacker News section to scrape. Options: "front_page", "newest", "ask", "show", "jobs", or "search". |
| sort | string | No | "popular" | Sort order: "popular" (by score) or "date" (by submission time). |
| tags | string | No | "story" | Filter by item type: "story", "comment", "poll", "job". Separate multiple tags with commas (e.g., "story,comment"). |
| max_results | integer | No | 30 | Maximum number of items to return. |
| query | string | Only when mode is "search" | – | The search term. |
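You can check an input object against the table above before starting a run. The Python sketch below applies the documented defaults and rules:

- mode is required.
- query is required only for search.
- tags is a single comma-separated string.

The helper name `build_input` is hypothetical and is not part of the actor. The actor does its own validation, which may be stricter or looser.

```python
import json

# Allowed values and defaults, copied from the input table above.
VALID_MODES = {"front_page", "newest", "ask", "show", "jobs", "search"}
VALID_SORTS = {"popular", "date"}
VALID_TAGS = {"story", "comment", "poll", "job"}
DEFAULTS = {"sort": "popular", "tags": "story", "max_results": 30}


def build_input(**fields) -> dict:
    """Hypothetical helper: fill in documented defaults and check the rules."""
    data = {**DEFAULTS, **fields}

    if data.get("mode") not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if data["mode"] == "search" and not data.get("query"):
        raise ValueError('query is required when mode is "search"')
    if data["sort"] not in VALID_SORTS:
        raise ValueError('sort must be "popular" or "date"')

    # tags is one comma-separated string, e.g. "story,comment".
    if not isinstance(data["tags"], str):
        raise ValueError("tags must be a comma-separated string")
    tags = [t.strip() for t in data["tags"].split(",") if t.strip()]
    if not tags or any(t not in VALID_TAGS for t in tags):
        raise ValueError(f"unsupported tags: {data['tags']!r}")
    data["tags"] = ",".join(tags)

    # bool is a subclass of int, so reject it explicitly.
    n = data["max_results"]
    if isinstance(n, bool) or not isinstance(n, int) or n < 1:
        raise ValueError("max_results must be a positive integer")
    return data


if __name__ == "__main__":
    payload = build_input(mode="search", query="artificial intelligence", max_results=20)
    print(json.dumps(payload, indent=2))
```

The returned dictionary serializes straight to a JSON input object like the examples that follow.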

Example Input (Front Page)

```json
{
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "mode": "front_page",
  "sort": "popular",
  "tags": "story",
  "max_results": 30
}
```

Example Input (Search)

```json
{
  "mode": "search",
  "query": "artificial intelligence",
  "max_results": 20
}
```

📤 Output Format

The output is an array of item objects. Each object contains the following fields (some may be null if not applicable):

| Field | Type | Description |
|---|---|---|
| story_id | string | Unique Hacker News item ID. |
| title | string | Title of the story or job post. |
| url | string | External URL linked from the post (if any). |
| hn_url | string | Direct link to the item on Hacker News. |
| score | integer | Number of points (upvotes). |
| author | string | Username of the submitter. |
| num_comments | integer | Number of comments on the item. |
| created_at | string | Unix timestamp (seconds) of submission. |
| story_text | string | Text content for self-posts, Ask HN, or comments. |
| type | string | Item type: "story", "comment", "poll", "job". |
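When you consume the dataset in Python, a typed record makes the nullable fields explicit. The sketch below mirrors the output table with a `TypedDict`.

- The names `HNItem`, `check_item`, and `to_item` are illustrative.
- Treating every field as possibly null follows the note above the table.

```python
from typing import Optional, TypedDict


class HNItem(TypedDict):
    """One dataset record, mirroring the output table; any value may be None."""
    story_id: Optional[str]
    title: Optional[str]
    url: Optional[str]
    hn_url: Optional[str]
    score: Optional[int]
    author: Optional[str]
    num_comments: Optional[int]
    created_at: Optional[str]
    story_text: Optional[str]
    type: Optional[str]


# Expected Python type of each field when its value is not None.
FIELD_TYPES = {
    "story_id": str, "title": str, "url": str, "hn_url": str,
    "score": int, "author": str, "num_comments": int,
    "created_at": str, "story_text": str, "type": str,
}


def check_item(raw: dict) -> list[str]:
    """List the ways a raw record differs from the table (empty list = matches)."""
    problems = []
    for name, expected in FIELD_TYPES.items():
        if name not in raw:
            problems.append(f"missing field: {name}")
        elif raw[name] is not None and not isinstance(raw[name], expected):
            problems.append(f"{name}: expected {expected.__name__}, got {type(raw[name]).__name__}")
    return problems


def to_item(raw: dict) -> HNItem:
    """Validate a raw record and keep only the documented fields."""
    problems = check_item(raw)
    if problems:
        raise ValueError("; ".join(problems))
    return HNItem(**{name: raw[name] for name in FIELD_TYPES})
```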

Example Output (Truncated)

```json
[
  {
    "story_id": "47375682",
    "title": "Claude Code's binary reveals silent A/B tests on core features",
    "url": "https://backnotprop.com/blog/do-not-ab-test-my-workflow/",
    "hn_url": "https://news.ycombinator.com/item?id=47375682",
    "score": 63,
    "author": "ramoz",
    "num_comments": 41,
    "created_at": "1773488789",
    "story_text": null,
    "type": "story"
  },
  {
    "story_id": "47367129",
    "title": "1M context is now generally available for Opus 4.6 and Sonnet 4.6",
    "url": "https://claude.com/blog/1m-context-ga",
    "hn_url": "https://news.ycombinator.com/item?id=47367129",
    "score": 807,
    "author": "meetpateltech",
    "num_comments": 317,
    "created_at": "1773422341",
    "story_text": null,
    "type": "story"
  },
  {
    "story_id": "47336100",
    "title": "Show HN: Channel Surfer – Watch YouTube like it’s cable TV",
    "url": "https://channelsurfer.tv",
    "hn_url": "https://news.ycombinator.com/item?id=47336100",
    "score": 529,
    "author": "kilroy123",
    "num_comments": 156,
    "created_at": "1773239697",
    "story_text": "I know, it's a very first-world problem. But in my house, we have a hard time deciding what to watch. Too many options! So I made this to recreate Cable TV for YouTube...",
    "type": "story"
  }
  // ... up to max_results
]
```
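Because `story_id` uniquely identifies an item, you can merge results from repeated runs (for example, daily snapshots of the front page) without duplicates. This is a minimal sketch, assuming each run has been exported as a list of records like the ones above.

- `merge_runs` is a hypothetical helper.
- Keeping the most recent run's copy of each item is a design choice of this sketch, not actor behavior.

```python
def merge_runs(*runs: list[dict]) -> list[dict]:
    """Merge exported runs, keeping the latest copy of each story_id.

    Runs are assumed to be passed oldest first, so a later snapshot's score
    and comment count replace an earlier snapshot of the same item.
    """
    merged: dict[str, dict] = {}
    for run in runs:
        for item in run:
            merged[item["story_id"]] = item
    # Highest score first; a missing or null score counts as 0.
    return sorted(merged.values(), key=lambda it: it.get("score") or 0, reverse=True)
```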

🛠 Usage on Apify

  1. Create a new task with this actor.
  2. Provide input as a JSON object (see examples above).
  3. Run the actor. Results are stored in the run's default Apify dataset.
  4. Download results as JSON, CSV, XML, or HTML.
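You can also script step 4. Apify serves dataset items over HTTP from `https://api.apify.com/v2/datasets/{datasetId}/items`, and a `format` query parameter selects the export type.

- The dataset ID is a placeholder; copy it from the run's details.
- The function names are hypothetical.
- The token is sent in an `Authorization` header so it stays out of the URL.

```python
import urllib.request
from urllib.parse import quote, urlencode

API_BASE = "https://api.apify.com/v2"
# The export formats listed in step 4 above.
EXPORT_FORMATS = {"json", "csv", "xml", "html"}


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the items-export URL for a dataset (dataset_id is a placeholder)."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{API_BASE}/datasets/{quote(dataset_id)}/items?{urlencode({'format': fmt})}"


def download_items(dataset_id: str, token: str, fmt: str = "json") -> bytes:
    """Fetch the export; the token goes in a header, not the URL."""
    req = urllib.request.Request(
        dataset_items_url(dataset_id, fmt),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

For example, `download_items("YOUR_DATASET_ID", "YOUR_API_TOKEN", "csv")` returns the CSV bytes, which you can then write to a file.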

Running via API

Trigger runs programmatically using the Apify API:

```bash
curl -X POST "https://api.apify.com/v2/acts/your-username~hacker-news-scraper/runs" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{
    "mode": "front_page",
    "max_results": 30
  }'
```
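You can make the same request from Python using only the standard library. This sketch mirrors the curl call above.

- `your-username~hacker-news-scraper` and the token are placeholders.
- `run_request` and `start_run` are hypothetical helper names.
- Starting a run returns the API's JSON response right away, while the run itself finishes asynchronously.

```python
import json
import urllib.request

ACTOR_ID = "your-username~hacker-news-scraper"  # placeholder, as in the curl example


def run_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build the POST request that starts a run with the given input."""
    return urllib.request.Request(
        f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def start_run(token: str, run_input: dict) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(run_request(token, run_input)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    req = run_request("YOUR_API_TOKEN", {"mode": "front_page", "max_results": 30})
    print(req.get_method(), req.full_url)
```

Splitting request construction from sending keeps the request easy to inspect or test without touching the network.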

📊 Use Cases

  • Trend Analysis – Track which topics are gaining traction over time
  • Job Market Research – Monitor "Who is hiring?" posts and required skills
  • Content Curation – Build custom feeds or newsletters based on Hacker News data
  • Social Listening – Analyze comments and sentiment around specific technologies
  • Academic Research – Study online communities and information diffusion patterns
  • Investor Intelligence – Identify emerging startups and technologies through Show HN posts
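For the trend-analysis and investor-intelligence use cases, a small amount of post-processing on the exported items is often enough. This sketch counts how many items mention each tracked term in their title, and it lists Show HN posts by score.

- The function names and the term list are illustrative.
- Plain case-insensitive substring matching is a deliberately simple choice. It will also match inside longer words, such as "rust" in "trust".

```python
from collections import Counter


def term_counts(items: list[dict], terms: list[str]) -> Counter:
    """Count items whose title mentions each term (case-insensitive substring)."""
    counts = Counter()
    for item in items:
        title = (item.get("title") or "").lower()
        for term in terms:
            if term.lower() in title:
                counts[term] += 1
    return counts


def top_show_hn(items: list[dict], limit: int = 10) -> list[dict]:
    """Show HN posts ordered by score, highest first."""
    shows = [it for it in items if (it.get("title") or "").startswith("Show HN:")]
    return sorted(shows, key=lambda it: it.get("score") or 0, reverse=True)[:limit]
```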

⚙️ Mode Details

| Mode | Description |
|---|---|
| front_page | Top stories currently on the Hacker News homepage |
| newest | Newest submissions (What's New) |
| ask | Ask HN: questions from the community |
| show | Show HN: projects and products built by community members |
| jobs | Who is hiring? – Job postings |
| search | Search Hacker News by keyword (requires the query parameter) |

⚠️ Notes & Limitations

  • Rate Limiting – Hacker News is generally permissive, but excessive requests may be throttled. Use proxies for large-scale scraping.
  • Item Types – The tags field accepts Hacker News-style tags (story, comment, poll, job). In search mode, tags can be combined.
  • Timestamp – created_at is a Unix timestamp in seconds, returned as a string. Convert it to a number before formatting it (e.g., new Date(Number(created_at) * 1000) in JavaScript).
  • Story Text – For Ask HN and some Show HN posts, the story_text field contains the self-post content. The text may contain HTML entities.
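In Python, the two conversions from the notes above might look like the sketch below.

- `created_at` arrives as a string, so it is parsed with `int()` before becoming a timezone-aware UTC `datetime`.
- `html.unescape` decodes entities such as `&#x27;` in `story_text`.
- The helper name `normalize` is hypothetical. Any HTML tags in the text are left as they are.

```python
import html
from datetime import datetime, timezone


def normalize(item: dict) -> dict:
    """Return a copy with created_at as a UTC datetime and story_text unescaped."""
    out = dict(item)
    if out.get("created_at") is not None:
        # created_at is a string of seconds since the Unix epoch.
        out["created_at"] = datetime.fromtimestamp(int(out["created_at"]), tz=timezone.utc)
    if out.get("story_text"):
        out["story_text"] = html.unescape(out["story_text"])
    return out
```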

📦 Changelog

v1.0.0 (2025-03-15)

  • Initial release
  • Support for front page, newest, ask, show, jobs, and search modes
  • Sorting by popularity or date
  • Tag filtering for item types
  • Proxy integration for reliable scraping