Hacker News Scraper

Pricing

from $0.65 / run started

Reliable Apify actor for scraping public Hacker News sections with HTML-only crawling. Extract rank, title, URL, points, author, age, and comment count in a clean dataset for trend tracking, research, content discovery, and automation.

Rating: 0.0 (0)

Developer: Techionik (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 3 days ago

Hacker News Scraper

A fast, lightweight Apify actor that scrapes public Hacker News listings by parsing plain HTML.

This actor is built specifically for Hacker News and uses CheerioCrawler instead of a full browser, making it efficient, low-cost, and reliable for structured data extraction. It collects one clean dataset item per post and supports the main Hacker News sections used for trending stories, discovery, monitoring, and research workflows.

Features

  • Scrapes public Hacker News listing pages
  • Extracts one clean record per post
  • Supports multiple Hacker News sections
  • Automatically paginates until the requested result limit is reached
  • Uses HTML-only crawling for faster and cheaper runs
  • Keeps input simple and user-friendly
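
The pagination behavior listed above can be sketched roughly as follows. This is an illustration in Python, not the actor's own code. It assumes that a full Hacker News listing page shows 30 posts; the names `POSTS_PER_PAGE`, `pages_needed`, and `take_until_limit` are hypothetical.

```python
import math
from typing import Iterable, List

# Assumption: a full Hacker News listing page shows 30 posts.
POSTS_PER_PAGE = 30


def pages_needed(max_results: int, per_page: int = POSTS_PER_PAGE) -> int:
    """Estimate how many listing pages must be fetched to reach the limit."""
    if max_results < 1:
        return 0
    return math.ceil(max_results / per_page)


def take_until_limit(pages: Iterable[list], max_results: int) -> List:
    """Collect posts page by page and stop as soon as the limit is reached."""
    collected: List = []
    if max_results < 1:
        return collected
    for page in pages:
        for post in page:
            collected.append(post)
            if len(collected) >= max_results:
                # Returning here means later pages are never requested.
                return collected
    return collected
```

Because `take_until_limit` returns as soon as the limit is hit, a lazy page source (for example a generator that fetches each page on demand) would not request pages it does not need.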

Supported Page Types

  • front
  • newest
  • ask
  • show
  • jobs
  • best
  • active
  • classic
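
Each page type presumably corresponds to a public Hacker News listing path. The sketch below shows one plausible mapping in Python. The mapping itself is an assumption (in particular, that `front` points at `/news`), and `PAGE_PATHS` and `listing_url` are hypothetical names, not part of the actor.

```python
from urllib.parse import urljoin

BASE_URL = "https://news.ycombinator.com/"

# Assumed mapping from pageType values to Hacker News listing paths.
PAGE_PATHS = {
    "front": "news",
    "newest": "newest",
    "ask": "ask",
    "show": "show",
    "jobs": "jobs",
    "best": "best",
    "active": "active",
    "classic": "classic",
}


def listing_url(page_type: str) -> str:
    """Return the listing URL for a supported pageType, or raise ValueError."""
    try:
        path = PAGE_PATHS[page_type]
    except KeyError:
        raise ValueError(f"Unsupported pageType: {page_type!r}") from None
    return urljoin(BASE_URL, path)


print(listing_url("best"))
```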

Extracted Fields

Each result may include the following fields:

  • pageType
  • rank
  • title
  • url
  • points
  • author
  • age
  • commentsCount

Input

This actor takes two input fields:

  • pageType: The Hacker News section to scrape
  • maxResults: The maximum number of posts to extract

Example input:

{
  "pageType": "best",
  "maxResults": 10
}
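
Before starting a run, it can help to validate the input on the caller's side. The following is a minimal sketch in Python. The validation rules (a known `pageType`, a positive integer `maxResults`) are assumptions drawn from the field descriptions above rather than the actor's published schema, and `ActorInput` and `parse_input` are hypothetical names.

```python
import json
from dataclasses import dataclass

SUPPORTED_PAGE_TYPES = (
    "front", "newest", "ask", "show", "jobs", "best", "active", "classic",
)


@dataclass(frozen=True)
class ActorInput:
    page_type: str
    max_results: int


def parse_input(raw: dict) -> ActorInput:
    """Check the two input fields and return them as a typed record."""
    page_type = raw.get("pageType")
    max_results = raw.get("maxResults")
    if page_type not in SUPPORTED_PAGE_TYPES:
        raise ValueError(f"pageType must be one of {SUPPORTED_PAGE_TYPES}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        raise ValueError("maxResults must be an integer")
    if max_results < 1:
        raise ValueError("maxResults must be at least 1")
    return ActorInput(page_type=page_type, max_results=max_results)


example = json.loads('{ "pageType": "best", "maxResults": 10 }')
print(parse_input(example))
```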

Output

The actor returns one dataset item per Hacker News post.

Example output:

{
  "pageType": "best",
  "rank": 1,
  "title": "Ghostty is leaving GitHub",
  "url": "https://mitchellh.com/writing/ghostty-leaving-github",
  "points": 3400,
  "author": "WadeGrimridge",
  "age": "1 day ago",
  "commentsCount": 1015
}
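
Downstream code can work with these items directly. The sketch below, in Python, turns the relative `age` string into a `timedelta` and keeps only recent, high-scoring posts. It assumes `age` follows the "N minutes/hours/days ago" pattern seen in the example; any other format is treated as unknown. `parse_age` and `recent_and_popular` are hypothetical helper names, and the second sample item is made up for illustration.

```python
import re
from datetime import timedelta
from typing import Optional

# Matches strings such as "5 minutes ago", "3 hours ago", "1 day ago".
_AGE_RE = re.compile(r"^(\d+)\s+(minute|hour|day)s?\s+ago$")


def parse_age(age: str) -> Optional[timedelta]:
    """Convert a relative age string to a timedelta, or None if unrecognized."""
    match = _AGE_RE.match(age.strip())
    if match is None:
        return None
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{unit + "s": amount})


def recent_and_popular(items, max_age: timedelta, min_points: int):
    """Keep items no older than max_age with at least min_points points."""
    selected = []
    for item in items:
        age = parse_age(item.get("age") or "")
        points = item.get("points") or 0
        if age is not None and age <= max_age and points >= min_points:
            selected.append(item)
    return selected


items = [
    {"title": "Ghostty is leaving GitHub", "points": 3400, "age": "1 day ago"},
    {"title": "Hypothetical older post", "points": 50, "age": "5 days ago"},
]
hits = recent_and_popular(items, max_age=timedelta(days=2), min_points=100)
print([item["title"] for item in hits])
```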

Notes About the Data

Some Hacker News sections do not always expose the same metadata.

For example, on the jobs page, fields like points, author, or commentsCount may be missing on the page itself. In such cases, the actor returns default values such as 0 or null where appropriate.

This is expected behavior and reflects the actual structure of Hacker News.
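
One way such defaults can come about is when numeric fields are parsed from optional page text. The sketch below is a Python illustration of that idea only; it is not the actor's implementation, and `parse_count` and `text_or_none` are hypothetical helpers. It assumes metadata text looks like "3400 points" or "1015 comments", and that a post without comments shows a "discuss" link instead of a count.

```python
import re
from typing import Optional


def parse_count(text: Optional[str], default: int = 0) -> int:
    """Extract the first integer from text such as '3400 points'.

    Returns the default when the text is missing or contains no digits.
    """
    if not text:
        return default
    match = re.search(r"\d+", text)
    return int(match.group()) if match else default


def text_or_none(text: Optional[str]) -> Optional[str]:
    """Return stripped text, or None when the element is missing or empty."""
    if text is None:
        return None
    stripped = text.strip()
    return stripped or None


print(parse_count("3400 points"), parse_count(None), text_or_none("  "))
```

Consumers that need to tell "zero points" apart from "no score shown" may want to check `pageType` as well, since the two cases can look the same in the output.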

Why Use This Actor

Hacker News is mostly server-rendered HTML, which makes it a strong fit for a Cheerio-based scraper.

Benefits of this actor include:

  • Faster execution than browser-based scrapers
  • Lower runtime and compute cost
  • Clean and structured output
  • Reliable extraction from major Hacker News sections
  • Good fit for automation, trend tracking, research, and content workflows

Best Use Cases

This actor is useful for:

  • Tracking trending Hacker News posts
  • Monitoring top stories by section
  • Startup and tech news aggregation
  • Research and content discovery workflows
  • Lightweight automation and data collection pipelines

Technical Approach

This actor is built with:

  • Apify
  • Crawlee CheerioCrawler
  • Plain HTML parsing
  • Automatic pagination handling

Because it does not launch a browser through Playwright or Puppeteer, it needs less compute per run than a full-browser solution would for Hacker News.
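
To illustrate what plain HTML parsing means here, the sketch below extracts titles and URLs from a small HTML fragment using Python's standard `html.parser` module. It stands in for the idea only: the actor itself uses Crawlee's CheerioCrawler, not this code. The markup in `SAMPLE_HTML` and the `titleline` class are assumptions about how a listing row is structured, and `TitleLineParser` is a hypothetical name.

```python
from html.parser import HTMLParser


class TitleLineParser(HTMLParser):
    """Collect the first link inside each <span class="titleline">."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._in_titleline = False
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "titleline":
            self._in_titleline = True
        elif tag == "a" and self._in_titleline and self._current is None:
            self._current = {"url": attrs.get("href"), "title": ""}

    def handle_data(self, data):
        # Only text inside the title link is accumulated.
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.posts.append(self._current)
            self._current = None
            # Ignore any later links in the same row (e.g. the site label).
            self._in_titleline = False


# Assumed, simplified listing markup for illustration.
SAMPLE_HTML = """
<tr class="athing"><td class="title"><span class="rank">1.</span></td>
<td class="title"><span class="titleline"><a href="https://example.com/a">First post</a>
<span class="sitebit comhead"> (<a href="from?site=example.com">example.com</a>)</span>
</span></td></tr>
<tr class="athing"><td class="title"><span class="rank">2.</span></td>
<td class="title"><span class="titleline"><a href="https://example.com/b">Second post</a>
</span></td></tr>
"""

parser = TitleLineParser()
parser.feed(SAMPLE_HTML)
for rank, post in enumerate(parser.posts, start=1):
    print(rank, post["title"], post["url"])
```

No JavaScript is executed and no browser is started at any point; the parser only walks the HTML text it is given.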

Scope

This actor is designed specifically for Hacker News.

It is not a universal news scraper and is not intended for arbitrary websites with different HTML structures. If you need broad website text extraction, a generic content scraper is a better fit. This actor is purpose-built for clean Hacker News post extraction.

Summary

If you need a simple, reliable, and cost-effective Hacker News scraper for Apify, this actor provides clean, structured output from minimal input using efficient HTML-only crawling.