Hacker News Scraper & API - Export Stories, Comments, Data

Pricing

$8.00/month + usage

Extract top stories, trending posts, points, comments, and authors from the Hacker News front page. Export real-time data to JSON or CSV. Monitor tech trends, analyze viral content, and track HN activity with a fast Playwright scraper.

Developer

Brennan Crawford

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 3
  • Monthly active users: 1
  • Last modified: 4 days ago

Hacker News Scraper for Apify

A production-ready Apify actor that uses Playwright to scrape stories from the Hacker News front page.

🚀 Features

  • Scrapes Hacker News front page stories
  • Extracts comprehensive story data:
    • Title and URL
    • Points (upvotes)
    • Author username
    • Number of comments
    • Time posted
    • Story rank
    • Hacker News discussion URL
  • Configurable number of stories to scrape
  • Option to include/exclude job posts
  • Built with Playwright for reliable scraping
  • Production-ready for Apify platform
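
As a rough illustration of how the fields listed above could be collected with Playwright, here is a minimal sketch. It is not the actor's actual source: the function names (`parse_count`, `scrape_front_page`) are hypothetical, and the CSS selectors (`tr.athing`, `span.titleline`, `td.subtext`, `span.score`, `a.hnuser`, `span.age`) are assumptions about the current Hacker News markup that may change.

```python
import re


def parse_count(text: str) -> int:
    """Return the first integer in text such as '342 points', or 0 if none."""
    match = re.search(r"\d+", text or "")
    return int(match.group()) if match else 0


def scrape_front_page(max_stories: int = 30) -> list:
    """Load the front page once and return one dict per story row."""
    # Imported lazily so parse_count works without Playwright installed.
    from playwright.sync_api import sync_playwright

    stories = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://news.ycombinator.com/", wait_until="domcontentloaded")

        rows = page.locator("tr.athing")  # assumed: one row per story
        for i in range(min(rows.count(), max_stories)):
            row = rows.nth(i)
            story_id = row.get_attribute("id")
            title_link = row.locator("span.titleline > a").first
            # Assumed layout: the metadata row directly follows the title row.
            subtext = page.locator(f"tr.athing[id='{story_id}'] + tr td.subtext")
            score = subtext.locator("span.score")
            user = subtext.locator("a.hnuser")
            age = subtext.locator("span.age")
            comments = subtext.locator("a", has_text="comment")
            stories.append({
                "rank": parse_count(row.locator("span.rank").inner_text()),
                "title": title_link.inner_text(),
                "url": title_link.get_attribute("href"),
                # Job posts carry no score or author, so fall back to defaults.
                "points": parse_count(score.first.inner_text()) if score.count() else 0,
                "author": user.first.inner_text() if user.count() else None,
                "comments": parse_count(comments.first.inner_text()) if comments.count() else 0,
                "timeAgo": age.first.get_attribute("title") if age.count() else None,
                "hackerNewsUrl": f"https://news.ycombinator.com/item?id={story_id}",
            })
        browser.close()
    return stories
```

The count checks before each read keep a row with missing metadata (for example a job post) from blocking on a locator that never resolves.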

📁 Project Structure

hackernews-scraper/
├── .actor/
│   ├── actor.json            # Actor metadata and configuration
│   └── dataset_schema.json   # Output data schema
├── apify_actor.py            # Main actor entry point
├── hackernews_scraper.py     # Core scraper implementation
├── Dockerfile                # Docker configuration for Apify
├── requirements.txt          # Python dependencies
├── INPUT_SCHEMA.json         # Input configuration schema
└── README.md                 # This file

🔧 Local Testing

Prerequisites

  • Python 3.11+
  • pip

Installation

  1. Install dependencies:
     $ pip install -r requirements.txt
  2. Install Playwright browsers:
     $ playwright install chromium
  3. Test the scraper locally:
     $ python hackernews_scraper.py

🌐 Deploy to Apify

Prerequisites

  1. Create an Apify account
  2. Install the Apify CLI: npm install -g apify-cli
  3. Log in: apify login

Deployment Steps

  1. Navigate to the project directory:
     $ cd hackernews-scraper
  2. Deploy to Apify:
     $ apify push
  3. Access your actor in the Apify Console

Running on Apify

  1. Navigate to your actor in the Apify Console
  2. Click "Run"
  3. Configure input options (optional)
  4. Click "Start" to run the actor
  5. View results in the "Dataset" tab

⚙️ Input Configuration

Field            Type     Default  Description
maxStories       integer  30       Maximum number of stories to scrape (1-100)
includeJobPosts  boolean  false    Include "Who is hiring?" job posts

Example Input

{
  "maxStories": 30,
  "includeJobPosts": false
}
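
To make the defaults and the 1-100 range concrete, the following sketch shows one way an input object could be merged with defaults and validated before a run. The helper name `normalize_input` is hypothetical, and rejecting out-of-range values (rather than clamping them) is an assumption; the actor's real behavior is defined by its INPUT_SCHEMA.json.

```python
from typing import Any, Dict, Optional

# Defaults taken from the input table above.
DEFAULTS: Dict[str, Any] = {"maxStories": 30, "includeJobPosts": False}


def normalize_input(raw: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge user input with defaults and validate types and ranges."""
    merged = {**DEFAULTS, **(raw or {})}

    max_stories = merged["maxStories"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_stories, bool) or not isinstance(max_stories, int):
        raise ValueError("maxStories must be an integer")
    if not 1 <= max_stories <= 100:
        raise ValueError("maxStories must be between 1 and 100")

    include_jobs = merged["includeJobPosts"]
    if not isinstance(include_jobs, bool):
        raise ValueError("includeJobPosts must be a boolean")

    return {"maxStories": max_stories, "includeJobPosts": include_jobs}
```

Calling `normalize_input(None)` or `normalize_input({})` yields the defaults shown in the example input.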

📊 Output Format

Each story is returned as a JSON object with the following structure:

{
  "rank": 1,
  "title": "Show HN: I built a tool for...",
  "url": "https://example.com/article",
  "points": 342,
  "author": "username",
  "comments": 127,
  "timeAgo": "2024-01-15T10:30:00.000Z",
  "hackerNewsUrl": "https://news.ycombinator.com/item?id=12345678"
}

Output Fields

Field          Type    Description
rank           number  Story position on front page
title          string  Story title
url            string  Link to the story/article
points         number  Number of upvotes
author         string  Username who posted the story
comments       number  Number of comments
timeAgo        string  Timestamp when story was posted
hackerNewsUrl  string  URL to Hacker News discussion
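
For downstream processing, records with the fields above can be handled with the Python standard library alone. The sketch below is illustrative: the helper names `parse_posted_at` and `stories_to_csv` are hypothetical, and it assumes `timeAgo` holds an ISO 8601 timestamp as in the example output.

```python
import csv
import io
from datetime import datetime

# Column order follows the output fields table.
FIELDS = ["rank", "title", "url", "points", "author",
          "comments", "timeAgo", "hackerNewsUrl"]


def parse_posted_at(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2024-01-15T10:30:00.000Z'."""
    # Rewrite the trailing 'Z' so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def stories_to_csv(stories: list) -> str:
    """Serialize story dicts to CSV text, ignoring any unknown keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(stories)
    return buffer.getvalue()


example = {
    "rank": 1,
    "title": "Show HN: I built a tool for...",
    "url": "https://example.com/article",
    "points": 342,
    "author": "username",
    "comments": 127,
    "timeAgo": "2024-01-15T10:30:00.000Z",
    "hackerNewsUrl": "https://news.ycombinator.com/item?id=12345678",
}
csv_text = stories_to_csv([example])
posted_at = parse_posted_at(example["timeAgo"])
```

`csv.DictWriter` quotes titles that contain commas or quotes, so the result opens cleanly in spreadsheet tools.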

🛠️ Built With

  • Python 3.11 - Programming language
  • Playwright - Browser automation
  • Apify SDK - Actor framework
  • Apify best practices and patterns - Project conventions

📝 Use Cases

  • Monitor trending tech stories
  • Track specific topics on HN
  • Build custom HN readers/aggregators
  • Research what content performs well
  • Create HN analytics dashboards
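
As one example of the topic-tracking use case, scraped records can be filtered by keyword and ranked by points. This is a sketch over the output format described in this README; `filter_stories` is a hypothetical helper, not part of the actor.

```python
def filter_stories(stories: list, keywords: list, min_points: int = 0) -> list:
    """Keep stories whose title mentions any keyword, sorted by points (highest first)."""
    lowered = [k.lower() for k in keywords]
    matches = [
        s for s in stories
        if s.get("points", 0) >= min_points
        and any(k in s.get("title", "").lower() for k in lowered)
    ]
    return sorted(matches, key=lambda s: s.get("points", 0), reverse=True)


sample = [
    {"title": "Show HN: A Rust web framework", "points": 120},
    {"title": "Why Python packaging is hard", "points": 340},
    {"title": "Rust 2024 edition notes", "points": 15},
]
rust_hits = filter_stories(sample, ["rust"], min_points=50)
```

Matching is a case-insensitive substring test, so a keyword like "rust" also matches "trust"; a word-boundary regex would be stricter.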

🔒 Rate Limiting

The scraper is designed to be respectful of Hacker News:

  • Single page load per run
  • No aggressive pagination
  • Configurable limits on stories scraped

📄 License

This actor is provided as-is for use on the Apify platform.

🤝 Support

For issues or questions:


Ready to deploy in under 10 minutes! 🎉