Hacker News Scraper

Deprecated

Pricing

$1.00/month + usage

Developed by gmgn

Scrape Hacker News stories within specified date ranges using this Actor. It handles pagination, timezone adjustments, and delivers structured datasets with all relevant metadata.

Last modified: 7 months ago

Categories: Developer tools, Automation, Integrations

Hacker News Story Scraper

This actor scrapes Hacker News stories within a specified date range using the Algolia Hacker News Search API. It collects every story in the range together with its metadata and paginates through the results automatically. For performance, the date range is split into 8-hour intervals, and each interval is queried and paginated separately.
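The interval splitting can be sketched as follows. This is a minimal illustration, not the actor's source: `split_into_intervals` is a hypothetical name, and the sketch assumes the range has already been converted to Unix timestamps in seconds, which is the unit the `startTime`/`endTime` output fields use.

```python
from typing import List, Tuple

# Window length taken from this README: 8 hours, in seconds.
INTERVAL_SECONDS = 8 * 60 * 60


def split_into_intervals(start_ts: int, end_ts: int,
                         step: int = INTERVAL_SECONDS) -> List[Tuple[int, int]]:
    """Split the range [start_ts, end_ts] into consecutive windows of at most `step` seconds.

    Hypothetical helper: each window ends where the next one starts, and the
    last window is truncated so that it never extends past end_ts.
    """
    if end_ts <= start_ts:
        raise ValueError("end_ts must be later than start_ts")
    intervals = []
    current = start_ts
    while current < end_ts:
        nxt = min(current + step, end_ts)
        intervals.append((current, nxt))
        current = nxt
    return intervals


# One full day starting at 2025-01-01T00:00:00Z yields three 8-hour windows.
print(split_into_intervals(1_735_689_600, 1_735_689_600 + 86_400))
```

Because adjacent windows share a boundary, a real query should treat each window as half-open (start inclusive, end exclusive) so that a story posted exactly on a boundary is not fetched twice.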

Pricing

Monthly subscription: $1/month
Plus usage: the platform compute units consumed by each run are billed separately

Features

Scrapes Hacker News stories between specified dates
Supports precise datetime ranges with timezone handling
Handles pagination automatically
Stores results in a structured dataset
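Automatic pagination over one interval might look like the sketch below. It assumes the public Hacker News Search API endpoint `https://hn.algolia.com/api/v1/search_by_date` with its `tags`, `numericFilters`, `page`, and `hitsPerPage` query parameters. The actor's actual request format is not documented here, and `build_page_url`, `iter_pages`, and the `fetch` callback are hypothetical. The loop stops after reading the number of pages reported in `nbPages`, and each yielded dict mirrors the record fields listed under Output.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterator
from urllib.parse import urlencode

# Assumed endpoint: Algolia's public HN Search API, sorted by date.
SEARCH_URL = "https://hn.algolia.com/api/v1/search_by_date"


def build_page_url(start_ts: int, end_ts: int, page: int, hits_per_page: int = 100) -> str:
    """Build the request URL for one page of stories created in [start_ts, end_ts)."""
    params = {
        "tags": "story",
        "numericFilters": f"created_at_i>={start_ts},created_at_i<{end_ts}",
        "page": page,
        "hitsPerPage": hits_per_page,  # assumed page size
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def iter_pages(fetch: Callable[[str], Dict[str, Any]],
               start_ts: int, end_ts: int) -> Iterator[Dict[str, Any]]:
    """Yield one record per response page, up to the last page reported by nbPages."""
    page = 0
    while True:
        url = build_page_url(start_ts, end_ts, page)
        data = fetch(url)  # caller-supplied: performs the GET and returns parsed JSON
        yield {
            "url": url,
            "data": data,
            "scrapedAt": datetime.now(timezone.utc).isoformat(),
            "startTime": str(start_ts),
            "endTime": str(end_ts),
            "page": page,
        }
        page += 1
        if page >= data.get("nbPages", 0):
            break


def fake_fetch(url: str) -> Dict[str, Any]:
    # Offline stand-in for the HTTP call: every response reports 3 pages in total.
    return {"hits": [], "nbPages": 3}


records = list(iter_pages(fake_fetch, 1_735_689_600, 1_735_718_400))
print(len(records), [r["page"] for r in records])  # 3 [0, 1, 2]
```

In a real run, `fetch` could be `lambda u: json.load(urllib.request.urlopen(u))` from the standard library. Injecting the fetch function keeps the pagination logic testable without network access.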

Input

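The actor accepts the following input parameters:

startDate: start of the date range, in ISO 8601 format
endDate: end of the date range, in ISO 8601 format
timezone (optional): IANA timezone name used to interpret both dates; defaults to America/New_York

Example input:

{
    "startDate": "2024-12-31T15:30:00",
    "endDate": "2025-01-15T09:45:00",
    "timezone": "Europe/London"
}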

Date Format Options

You can specify dates in two formats:

Date only (the start date begins at 00:00:00 and the end date ends at 23:59:59, both in the specified timezone):

{
  "startDate": "2024-12-31",
  "endDate": "2025-01-15"
}

Date with time (the exact times are used, in the specified timezone):

{
  "startDate": "2024-12-31T15:30:00",
  "endDate": "2025-01-15T09:45:00"
}
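The date-only rule amounts to a simple expansion step that runs before the timezone is applied. A minimal sketch, assuming both dates arrive as plain strings. `expand_range` is a hypothetical helper and is not taken from the actor's code.

```python
import re
from typing import Tuple

# Matches a bare calendar date such as 2024-12-31 (no time part).
DATE_ONLY = re.compile(r"\d{4}-\d{2}-\d{2}")


def expand_range(start: str, end: str) -> Tuple[str, str]:
    """Expand date-only bounds to whole days; leave full datetimes unchanged."""
    if DATE_ONLY.fullmatch(start):
        start += "T00:00:00"  # first second of the start day
    if DATE_ONLY.fullmatch(end):
        end += "T23:59:59"    # last second of the end day
    return start, end


print(expand_range("2024-12-31", "2025-01-15"))
# ('2024-12-31T00:00:00', '2025-01-15T23:59:59')
print(expand_range("2024-12-31T15:30:00", "2025-01-15T09:45:00"))
# ('2024-12-31T15:30:00', '2025-01-15T09:45:00')
```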

All times are interpreted in the America/New_York timezone by default. You can specify a different timezone using the optional timezone parameter with any valid IANA timezone name (e.g., 'Europe/London', 'Asia/Tokyo').
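Converting a local start or end time to a Unix timestamp, the format of the `created_at_i` output field, can be done with Python's standard `zoneinfo` module. The following is an illustration, not the actor's implementation. `local_to_unix` is a hypothetical name, and the input is assumed to be a full ISO 8601 datetime with no UTC offset.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

DEFAULT_TZ = "America/New_York"  # default timezone stated in this README


def local_to_unix(value: str, tz_name: str = DEFAULT_TZ) -> int:
    """Interpret a naive ISO 8601 datetime in the given IANA timezone and return Unix seconds."""
    naive = datetime.fromisoformat(value)
    if naive.tzinfo is not None:
        raise ValueError("expected a datetime without a UTC offset")
    aware = naive.replace(tzinfo=ZoneInfo(tz_name))  # ZoneInfo applies the DST rules for that date
    return int(aware.timestamp())


print(local_to_unix("2024-12-31T15:30:00", "Europe/London"))  # 1735659000
print(local_to_unix("2024-12-31T15:30:00"))                   # 1735677000
```

The same wall-clock time maps to instants five hours apart in London and New York in winter. An unknown timezone name makes `ZoneInfo` raise `zoneinfo.ZoneInfoNotFoundError`.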

Output

The actor stores the results in a dataset. Each record holds one page of API results for one 8-hour interval and has the following structure:

{
    "url": "string",            // Original URL used for scraping
    "data": {                   // Raw data from Algolia API
        "hits": [               // Array of story items
            {
                "title": "string",          // Story title
                "url": "string",            // Story URL
                "author": "string",         // Author username
                "points": number,           // Number of upvotes
                "num_comments": number,     // Number of comments
                "story_id": number,         // Unique story ID
                "created_at_i": number,     // Unix timestamp of creation
                "created_at": "string",     // ISO timestamp of creation (e.g., "2024-01-01T16:24:53Z")
                "updated_at": "string",     // ISO timestamp of last update
                "_tags": string[],          // Array of tags (e.g., ["story", "author_username", "story_id"])
                "children": number[],       // Array of child comment IDs
                "objectID": "string",       // Unique object ID
                "story_text": "string",     // Optional: Text content for self posts
                "_highlightResult": {       // Search highlighting information
                    "title": {
                        "value": "string",
                        "matchLevel": "string",
                        "matchedWords": string[]
                    },
                    "url": {
                        "value": "string",
                        "matchLevel": "string",
                        "matchedWords": string[]
                    },
                    "author": {
                        "value": "string",
                        "matchLevel": "string",
                        "matchedWords": string[]
                    }
                }
            }
        ],
        "nbHits": number,       // Total number of hits
        "page": number,         // Current page number
        "nbPages": number,      // Total number of pages
        "hitsPerPage": number,  // Number of hits per page
        "processingTimeMS": number  // API processing time
    },
    "scrapedAt": "string",     // ISO timestamp of when the data was collected
    "startTime": "string",     // Unix timestamp of interval start
    "endTime": "string",       // Unix timestamp of interval end
    "page": number            // Page number in results
}
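Each record holds a whole page of hits, so analysis usually starts by flattening the dataset into one row per story. A minimal sketch, assuming the dataset has been exported as JSON and loaded into a list of dicts. `flatten_stories` is a hypothetical helper. De-duplicating on `objectID` is a defensive choice in case a story appears in more than one page or interval; it is not documented actor behavior.

```python
from typing import Any, Dict, Iterable, List


def flatten_stories(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn page-level records into one row per story, keeping the first copy of each objectID."""
    seen = set()
    rows = []
    for record in records:
        for hit in record.get("data", {}).get("hits", []):
            object_id = hit.get("objectID")
            if object_id in seen:
                continue  # skip a story already taken from an earlier page
            seen.add(object_id)
            rows.append({
                "objectID": object_id,
                "title": hit.get("title"),
                "url": hit.get("url"),  # may be absent, e.g. for self posts
                "author": hit.get("author"),
                "points": hit.get("points"),
                "num_comments": hit.get("num_comments"),
                "created_at": hit.get("created_at"),
            })
    return rows


sample = [
    {"data": {"hits": [{"objectID": "1", "title": "A"}, {"objectID": "2", "title": "B"}]}},
    {"data": {"hits": [{"objectID": "2", "title": "B"}, {"objectID": "3", "title": "C"}]}},
]
rows = flatten_stories(sample)
print([r["objectID"] for r in rows])  # ['1', '2', '3']
```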

Usage

Subscribe to the actor in the Apify Store
Input the desired date range using any supported format
Optionally specify a timezone
Run the actor
Access results in the "Dataset" tab

Example Use Cases

Content Analysis: Track trending topics and discussions over time
Research: Analyze historical Hacker News data for patterns
Monitoring: Keep track of specific topics or companies
Data Mining: Build datasets for machine learning or analysis
Time-Sensitive Analysis: Analyze posts during specific time windows (e.g., business hours)
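For time-sensitive analysis, each story's `created_at_i` Unix timestamp can be converted to local time and filtered. The sketch below defines "business hours" as 9:00 to 17:00 on weekdays, which is an arbitrary assumption. `posted_during_business_hours` is a hypothetical helper.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def posted_during_business_hours(created_at_i: int, tz_name: str = "America/New_York",
                                 start_hour: int = 9, end_hour: int = 17) -> bool:
    """Return True if the timestamp falls on a weekday between start_hour and end_hour, local time."""
    local = datetime.fromtimestamp(created_at_i, tz=ZoneInfo(tz_name))
    # weekday(): Monday is 0, Sunday is 6
    return local.weekday() < 5 and start_hour <= local.hour < end_hour


stories = [
    {"title": "Tuesday afternoon post", "created_at_i": 1_735_677_000},  # 2024-12-31 15:30 EST
    {"title": "Tuesday evening post", "created_at_i": 1_735_689_600},    # 2024-12-31 19:00 EST
    {"title": "Saturday post", "created_at_i": 1_736_020_800},           # 2025-01-04 15:00 EST
]
in_hours = [s["title"] for s in stories if posted_during_business_hours(s["created_at_i"])]
print(in_hours)  # ['Tuesday afternoon post']
```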

Resource Requirements

Memory: 2048 MB
Compute Units: Based on date range and number of results

Alternative Actors

Hacker News Data Scraper

epctex/hackernews-scraper

Extract Y Combinator's Hacker News based on any search criteria. Crawl the front page, Show HN, Ask HN, news, job listings, and historical data. Get links, titles, comments, ratings, and more!

epctex

Hackernews Scraper Pro

red.cars/hackernews-scraper-pro

NO PROXY REQUIRED | Enterprise-Grade Data Extraction | Tech Trend Analysis. Extract comprehensive data from Hacker News without API keys or proxy configurations. Perfect for developer relations, competitive intelligence, startup research, and tech trend analysis.

AutomateLab

Hacker News Data Scraper & Activity Monitoring

lucen_data/hacker-news-data-scraper-activity-monitoring

Unofficial Hacker News API to extract data from all Hacker News categories. You can filter posts based on specific keywords, track trends using the unique activity monitoring feature, and receive updates directly in your Slack channel.

Lucen

Hacker News Top Sites Scraper

fearless_sharpener/hacker-news-top-sites-scraper

Unlock even more valuable insights from the popular news website, Hacker News. Our updated Top Sites Scraper extracts titles, scores, links and more, all in one easy-to-use Apify actor. With this powerful tool, you can gather comprehensive data from Hacker News and gain a competitive edge.

Tom

Y Combinator Extractor

jupri/ycombinator

💫 All-In-One YCombinator.com Scraper

cat

PR⭕DUCT HUNT Scraper HD ⭐💯

jupri/producthunt

💫 All-in-one Producthunt.com Scraper

cat

PR⭕DUCT HUNT Leaderboard Scraper

jupri/producthunt-leaderboard

💫 Scrape ProductHunt Leaderboard

cat

🔥 Y Combinator Scraper (API)

clearpath/ycombinator-api-scraper

Extract complete Y Combinator ecosystem data - 5000+ companies, 8000+ founders, 3500+ jobs. Perfect for VCs, recruiters, and researchers. Get startup intelligence, funding trends, team data, and job listings. Reliable Python scraper with proxy support. Start at $3.50.

ClearPath

AI Web Scraper - Powered by Crawl4AI

raizen/ai-web-scraper

A blazing-fast AI web scraper powered by Crawl4AI. Perfect for LLMs, AI agents, AI automation, model training, sentiment analysis, and content generation. Supports deep crawling, multiple extraction strategies and flexible output (Markdown/JSON). Seamlessly integrates with Make.com, n8n, and Zapier.

Raizen Technology

Producthunt Scraper

red.cars/producthunt-scraper

🚀 No URLs needed! Extract today's trending products, makers, and data from ProductHunt automatically. Perfect for market research, startup analysis, and tracking product launches.

AutomateLab

Medium Publication Scraper

red.cars/medium-publication-scraper

The ONLY Publication-Focused Medium Analytics Platform - Extract comprehensive data from Medium's top publications and authors. No API key required, instant access to the world's largest professional publishing platform!

AutomateLab