Hacker News Scraper
Pricing
from $0.65 / run started
Reliable Apify actor for scraping public Hacker News sections with HTML-only crawling. Extract rank, title, URL, points, author, age, and comment count in a clean dataset for trend tracking, research, content discovery, and automation.
Developer
Techionik
3 days ago
Last modified
Hacker News Scraper
A fast, lightweight Apify actor that scrapes public Hacker News listings using plain HTML parsing.
This actor is built specifically for Hacker News and uses Crawlee's CheerioCrawler instead of a full browser, which keeps runs fast and inexpensive. It collects one clean dataset item per post and supports the main Hacker News sections used for trending stories, discovery, monitoring, and research workflows.
Features
- Scrapes public Hacker News listing pages
- Extracts one clean record per post
- Supports multiple Hacker News sections
- Automatically paginates until the requested result limit is reached
- Uses HTML-only crawling for faster and cheaper runs
- Keeps input simple and user-friendly
Supported Page Types
- front
- newest
- ask
- show
- jobs
- best
- active
- classic
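As a rough illustration of how these page types relate to the site, the sketch below maps each one to a public Hacker News listing path. The paths are real Hacker News URLs, but the mapping itself, `PAGE_TYPE_PATHS`, and the `listing_url` helper are assumptions made for illustration; the actor's internal mapping is not documented here.

```python
# Assumed mapping from pageType values to public Hacker News listing paths.
# The actor may resolve these differently; this is only a sketch.
HN_BASE_URL = "https://news.ycombinator.com"

PAGE_TYPE_PATHS = {
    "front": "/news",
    "newest": "/newest",
    "ask": "/ask",
    "show": "/show",
    "jobs": "/jobs",
    "best": "/best",
    "active": "/active",
    "classic": "/classic",
}


def listing_url(page_type: str) -> str:
    """Return the first-page URL for a page type (hypothetical helper)."""
    if page_type not in PAGE_TYPE_PATHS:
        raise ValueError(f"Unsupported pageType: {page_type!r}")
    return HN_BASE_URL + PAGE_TYPE_PATHS[page_type]
```

Later pages are reached by following the "More" link at the bottom of each listing, so only the first-page URL is built here.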
Extracted Fields
Each result may include the following fields:
- pageType
- rank
- title
- url
- points
- author
- age
- commentsCount
Input
The actor takes two input fields:
- pageType: The Hacker News section to scrape
- maxResults: The maximum number of posts to extract
Example input:
{
  "pageType": "best",
  "maxResults": 10
}
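If you generate the input programmatically, it can help to check it before starting a run. The following is a minimal sketch of such a check; `parse_input` is a hypothetical helper, and treating both fields as required with `maxResults` as a positive integer is an assumption, since defaults and limits are not documented here.

```python
import json

# Page types listed under "Supported Page Types".
SUPPORTED_PAGE_TYPES = {
    "front", "newest", "ask", "show", "jobs", "best", "active", "classic",
}


def parse_input(raw: str) -> dict:
    """Parse and validate an input JSON string (hypothetical helper)."""
    data = json.loads(raw)
    page_type = data.get("pageType")
    max_results = data.get("maxResults")

    if page_type not in SUPPORTED_PAGE_TYPES:
        raise ValueError(f"Unsupported pageType: {page_type!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        raise ValueError("maxResults must be an integer")
    if max_results < 1:
        raise ValueError("maxResults must be at least 1")

    return {"pageType": page_type, "maxResults": max_results}
```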
Output
The actor returns one dataset item per Hacker News post.
Example output:
{
  "pageType": "best",
  "rank": 1,
  "title": "Ghostty is leaving GitHub",
  "url": "https://mitchellh.com/writing/ghostty-leaving-github",
  "points": 3400,
  "author": "WadeGrimridge",
  "age": "1 day ago",
  "commentsCount": 1015
}
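When consuming the dataset downstream, a typed record can make the fields easier to work with. This sketch converts one dataset item into a dataclass; `HNPost` and `post_from_item` are hypothetical names, and marking every field except `pageType`, `rank`, and `title` as optional is an assumption based on the note that some sections omit metadata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HNPost:
    """One dataset item, using the field names from the example output."""
    page_type: str
    rank: int
    title: str
    url: Optional[str] = None
    points: Optional[int] = None
    author: Optional[str] = None
    age: Optional[str] = None
    comments_count: Optional[int] = None


def post_from_item(item: dict) -> HNPost:
    """Convert a raw dataset item (a dict) into an HNPost."""
    return HNPost(
        page_type=item["pageType"],
        rank=item["rank"],
        title=item["title"],
        # .get() tolerates fields that are absent or null in some sections.
        url=item.get("url"),
        points=item.get("points"),
        author=item.get("author"),
        age=item.get("age"),
        comments_count=item.get("commentsCount"),
    )
```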
Notes About the Data
Some Hacker News sections do not always expose the same metadata.
For example, on the jobs page, fields like points, author, or commentsCount may be missing on the page itself. In such cases, the actor returns default values such as 0 or null where appropriate.
This is expected behavior and reflects the actual structure of Hacker News.
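To make the default-value behavior concrete, here is a sketch of how a numeric field could be derived from the label text shown on a listing, falling back to a default when the label is missing or carries no number. `parse_count` is a hypothetical helper and is not taken from the actor's source; the sample labels mirror what Hacker News displays (for example "discuss" when a post has no comments yet).

```python
import re
from typing import Optional


def parse_count(text: Optional[str], default: int = 0) -> int:
    """Extract the first integer from a label, or return the default."""
    if not text:
        return default
    match = re.search(r"\d+", text)
    return int(match.group()) if match else default


# Labels as they appear on a listing; "\xa0" is a non-breaking space.
points = parse_count("3400 points")
comments = parse_count("1015\xa0comments")
no_comments = parse_count("discuss")
missing = parse_count(None)
```

Consumers who need to distinguish "zero" from "not shown" could pass a different `default` and treat that value as missing.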
Why Use This Actor
Hacker News is mostly server-rendered HTML, which makes it a strong fit for a Cheerio-based scraper.
Benefits of this actor include:
- Faster execution than browser-based scrapers
- Lower runtime and compute cost
- Clean and structured output
- Reliable extraction from major Hacker News sections
- Good fit for automation, trend tracking, research, and content workflows
Best Use Cases
This actor is useful for:
- Tracking trending Hacker News posts
- Monitoring top stories by section
- Startup and tech news aggregation
- Research and content discovery workflows
- Lightweight automation and data collection pipelines
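As one example of the trend-tracking use case, the sketch below filters dataset items by a points threshold and returns the highest-scoring posts. The `top_posts` helper and its default thresholds are hypothetical; the items are assumed to be plain dicts with the fields listed under Extracted Fields.

```python
from typing import Iterable, List


def top_posts(items: Iterable[dict], min_points: int = 100, limit: int = 5) -> List[dict]:
    """Return up to `limit` posts with at least `min_points`, highest first."""

    def score(item: dict) -> int:
        # Treat a missing or null points value as 0 (for example on jobs posts).
        return item.get("points") or 0

    qualifying = [item for item in items if score(item) >= min_points]
    return sorted(qualifying, key=score, reverse=True)[:limit]
```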
Technical Approach
This actor is built with:
- Apify
- Crawlee CheerioCrawler
- Plain HTML parsing
- Automatic pagination handling
Because it does not use Playwright or Puppeteer, it is more efficient for Hacker News than a full-browser solution.
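The pagination handling can be pictured as a loop that keeps requesting listing pages until enough posts have been collected or a page comes back empty. This is a language-neutral sketch written in Python rather than the actor's actual Crawlee code; `collect_posts` and the injected `fetch_page` callable are hypothetical.

```python
from typing import Callable, List


def collect_posts(fetch_page: Callable[[int], List[dict]], max_results: int) -> List[dict]:
    """Collect posts page by page until max_results is reached."""
    results: List[dict] = []
    page = 1
    while len(results) < max_results:
        posts = fetch_page(page)
        if not posts:
            # No more pages available; return what was found.
            break
        # Take only as many posts as are still needed.
        remaining = max_results - len(results)
        results.extend(posts[:remaining])
        page += 1
    return results
```

Passing the page fetcher in as an argument keeps the loop independent of any HTTP or parsing library, so it can be exercised with canned data.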
Scope
This actor is designed specifically for Hacker News.
It is not a universal news scraper and is not intended for arbitrary websites with different HTML structures. If you need broad website text extraction, a generic content scraper is a better fit. This actor is purpose-built for clean Hacker News post extraction.
Summary
If you need a simple, reliable, and cost-effective Hacker News scraper for Apify, this actor provides clean, structured output with minimal input and efficient HTML-only crawling.