Hacker News Stories & Search Scraper
Pricing
$5.99/month + usage
A powerful and flexible scraper for Hacker News that extracts stories, job posts, polls, and comments. Get real-time data from the front page, newest submissions, Ask HN, Show HN, hiring threads, or search results, all in clean, structured JSON format.
Developer: Scrape Pilot
Features
- Multiple Scraping Modes: Front page, newest, Ask HN, Show HN, Jobs, or custom search
- Rich Story Data: Title, URL, Hacker News link, score, author, comment count, creation time, and story text
- Flexible Filtering: Filter by item type (story, comment, poll, job) and sort by popularity or date
- Pagination Control: Set the exact number of results you need with `max_results`
- Search Functionality: Search Hacker News by keyword with full-text support
- Proxy Integration: Built-in Apify proxy with residential groups to avoid rate limiting
- Clean JSON Output: Ready for analysis, dashboards, or integration with other tools
Input Schema
The actor accepts the following JSON input:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| proxyConfiguration | object | No | No proxy | Proxy settings. Use Apify proxy for high-volume scraping. |
| mode | string | Yes | - | Which Hacker News section to scrape. Options: "front_page", "newest", "ask", "show", "jobs", or "search". |
| sort | string | No | "popular" | Sorting order: "popular" (by score) or "date" (by submission time). |
| tags | string | No | "story" | Filter by item type: "story", "comment", "poll", "job". Multiple tags can be comma-separated (e.g., "story,comment"). |
| max_results | integer | No | 30 | Maximum number of items to return. |
| query | string | No* | - | The search term. *Required only when mode is "search". |
Example Input (Front Page)

    {
      "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
      },
      "mode": "front_page",
      "sort": "popular",
      "tags": "story",
      "max_results": 30
    }

Example Input (Search)

    {
      "mode": "search",
      "query": "artificial intelligence",
      "max_results": 20
    }

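The schema rules (a required `mode`, a `query` that is required only in search mode, comma-separated `tags`) can be checked on the client before a run is started. The following is a minimal sketch in Python. `build_input` is a hypothetical helper, not part of the actor, and its checks are inferred from the input table rather than from the actor's source.

```python
from typing import Any, Dict, Optional

# Allowed values as listed in the input schema table.
MODES = {"front_page", "newest", "ask", "show", "jobs", "search"}
SORTS = {"popular", "date"}
TAGS = {"story", "comment", "poll", "job"}


def build_input(
    mode: str,
    query: Optional[str] = None,
    sort: str = "popular",
    tags: str = "story",
    max_results: int = 30,
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments and return an actor input dict."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if sort not in SORTS:
        raise ValueError(f"unknown sort: {sort!r}")
    if mode == "search" and not query:
        raise ValueError("query is required when mode is 'search'")
    # Tags may be comma-separated, e.g. "story,comment".
    tag_list = [t.strip() for t in tags.split(",") if t.strip()]
    unknown = [t for t in tag_list if t not in TAGS]
    if not tag_list or unknown:
        raise ValueError(f"invalid tags: {tags!r}")
    if max_results < 1:
        raise ValueError("max_results must be a positive integer")

    run_input: Dict[str, Any] = {
        "mode": mode,
        "sort": sort,
        "tags": ",".join(tag_list),
        "max_results": max_results,
    }
    if query:
        run_input["query"] = query
    return run_input
```

Calling `build_input("search", query="artificial intelligence", max_results=20)` yields the search example with the documented defaults for `sort` and `tags` filled in. Proxy settings are left out of the sketch and can be added to the returned dict as needed.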
Output Format
The output is an array of item objects. Each object contains the following fields (some may be null if not applicable):
| Field | Type | Description |
|---|---|---|
| story_id | string | Unique Hacker News item ID. |
| title | string | Title of the story or job post. |
| url | string | External URL linked from the post (if any). |
| hn_url | string | Direct link to the item on Hacker News. |
| score | integer | Number of points (upvotes). |
| author | string | Username of the submitter. |
| num_comments | integer | Number of comments on the item. |
| created_at | string | Unix timestamp (seconds) of submission. |
| story_text | string | Text content for self-posts, Ask HN, or comments. |
| type | string | Item type: "story", "comment", "poll", "job". |
Example Output (Truncated)

    [
      {
        "story_id": "47375682",
        "title": "Claude Code's binary reveals silent A/B tests on core features",
        "url": "https://backnotprop.com/blog/do-not-ab-test-my-workflow/",
        "hn_url": "https://news.ycombinator.com/item?id=47375682",
        "score": 63,
        "author": "ramoz",
        "num_comments": 41,
        "created_at": "1773488789",
        "story_text": null,
        "type": "story"
      },
      {
        "story_id": "47367129",
        "title": "1M context is now generally available for Opus 4.6 and Sonnet 4.6",
        "url": "https://claude.com/blog/1m-context-ga",
        "hn_url": "https://news.ycombinator.com/item?id=47367129",
        "score": 807,
        "author": "meetpateltech",
        "num_comments": 317,
        "created_at": "1773422341",
        "story_text": null,
        "type": "story"
      },
      {
        "story_id": "47336100",
        "title": "Show HN: Channel Surfer – Watch YouTube like it’s cable TV",
        "url": "https://channelsurfer.tv",
        "hn_url": "https://news.ycombinator.com/item?id=47336100",
        "score": 529,
        "author": "kilroy123",
        "num_comments": 156,
        "created_at": "1773239697",
        "story_text": "I know, it's a very first-world problem. But in my house, we have a hard time deciding what to watch. Too many options! So I made this to recreate Cable TV for YouTube...",
        "type": "story"
      }
      // ... up to max_results
    ]

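For downstream analysis, the items can be loaded into typed records. The following is a minimal sketch in Python, assuming the dataset has been downloaded as a JSON array with the fields listed in the output table. `HNItem`, `parse_items`, and `top_by_score` are illustrative names, not part of the actor.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HNItem:
    story_id: str
    title: Optional[str]
    url: Optional[str]
    hn_url: str
    score: Optional[int]
    author: Optional[str]
    num_comments: Optional[int]
    created_at: Optional[str]
    story_text: Optional[str]
    type: str


def parse_items(raw_json: str) -> List[HNItem]:
    """Parse a JSON array of items; fields missing from an item become None."""
    fields = HNItem.__dataclass_fields__
    return [
        HNItem(**{name: obj.get(name) for name in fields})
        for obj in json.loads(raw_json)
    ]


def top_by_score(items: List[HNItem], n: int = 10) -> List[HNItem]:
    """Return the n highest-scoring items; a null score is treated as 0."""
    return sorted(items, key=lambda item: item.score or 0, reverse=True)[:n]
```

Any keys not named in the output table are ignored by `parse_items`, so the sketch keeps working if the actor adds fields later.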
Usage on Apify
- Create a new task with this actor.
- Provide input as a JSON object (see examples above).
- Run the actor. Results are stored in an Apify dataset.
- Download results as JSON, CSV, XML, or HTML.
Running via API
Trigger runs programmatically using the Apify API:

    curl -X POST "https://api.apify.com/v2/acts/your-username~hacker-news-scraper/runs" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer YOUR_API_TOKEN" \
      -d '{"mode": "front_page", "max_results": 30}'

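The same request can be issued from Python with only the standard library. This is a sketch under the same assumptions as the curl command: the actor ID `your-username~hacker-news-scraper` and the token are placeholders, and `build_run_request` and `start_run` are hypothetical helper names. The response body is returned as decoded JSON without assuming its shape.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an actor run."""
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def start_run(actor_id: str, token: str, run_input: dict) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Building the request separately from sending it makes the URL, headers, and payload easy to inspect or log before any network call is made.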
Use Cases
- Trend Analysis: Track which topics are gaining traction over time
- Job Market Research: Monitor "Who is hiring?" posts and required skills
- Content Curation: Build custom feeds or newsletters based on Hacker News data
- Social Listening: Analyze comments and sentiment around specific technologies
- Academic Research: Study online communities and information diffusion patterns
- Investor Intelligence: Identify emerging startups and technologies through Show HN posts
Mode Details
| Mode | Description |
|---|---|
| front_page | Top stories currently on the Hacker News homepage |
| newest | Newest submissions (What's New) |
| ask | Ask HN: Questions from the community |
| show | Show HN: Projects and products built by community members |
| jobs | Job postings (Who is hiring?) |
| search | Search Hacker News by keyword (requires query parameter) |
Notes & Limitations
- Rate Limiting: Hacker News is generally permissive, but excessive requests may be throttled. Use proxies for large-scale scraping.
- Item Types: The `tags` field accepts Hacker News-style tags (`story`, `comment`, `poll`, `job`). In search mode, tags can be combined.
- Timestamp: `created_at` is a Unix timestamp in seconds. Convert to a human-readable format as needed (e.g., `new Date(created_at * 1000)`).
- Story Text: For Ask HN and some Show HN posts, the `story_text` field contains the self-post content. It may contain HTML entities.
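The timestamp and story-text notes can be handled in Python as well. This is a minimal sketch, assuming `created_at` arrives as a string of Unix seconds (as in the example output) and that `story_text` uses standard HTML entities. Both function names are illustrative.

```python
import html
from datetime import datetime, timezone
from typing import Optional


def parse_created_at(created_at: str) -> datetime:
    """Convert a Unix timestamp in seconds (given as a string) to a UTC datetime."""
    return datetime.fromtimestamp(int(created_at), tz=timezone.utc)


def clean_story_text(story_text: Optional[str]) -> Optional[str]:
    """Decode HTML entities such as &#x27; or &amp;; pass None through unchanged."""
    return None if story_text is None else html.unescape(story_text)
```

`html.unescape` only decodes entities. If `story_text` also contains HTML tags, those would need separate handling.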
Changelog
v1.0.0 (2025-03-15)
- Initial release
- Support for front page, newest, ask, show, jobs, and search modes
- Sorting by popularity or date
- Tag filtering for item types
- Proxy integration for reliable scraping