Hacker News Scraper

1 day trial then $1.00/month - No credit card required now

This Actor is under maintenance and may be unreliable.

gmgn/hacker-news-scraper

Scrape Hacker News stories within specified date ranges using this Actor. It handles pagination, timezone adjustments, and delivers structured datasets with all relevant metadata.

Hacker News Story Scraper

This actor scrapes Hacker News stories within a specified date range using the Algolia API. It collects all stories and their metadata, paginating through results efficiently. Data is collected in 8-hour intervals for optimal performance.
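
The interval splitting and query building described above can be sketched as follows. This is a minimal illustration, not the Actor's actual code: the helper names `split_into_intervals` and `build_url` are hypothetical, and the use of the public HN Search `search_by_date` endpoint with `tags` and `numericFilters` is an assumption about how the Actor queries Algolia.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Public HN Search endpoint; that the Actor calls exactly this URL is an assumption.
API_URL = "https://hn.algolia.com/api/v1/search_by_date"


def split_into_intervals(start, end, hours=8):
    """Yield (interval_start, interval_end) pairs that together cover [start, end)."""
    step = timedelta(hours=hours)
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)  # the last interval may be shorter
        yield cursor, upper
        cursor = upper


def build_url(interval_start, interval_end, page=0):
    """Build a story query for one interval and one result page."""
    params = {
        "tags": "story",
        "numericFilters": (
            f"created_at_i>={int(interval_start.timestamp())},"
            f"created_at_i<{int(interval_end.timestamp())}"
        ),
        "page": page,
    }
    return f"{API_URL}?{urlencode(params)}"


start = datetime(2024, 12, 31, tzinfo=timezone.utc)
end = datetime(2025, 1, 1, tzinfo=timezone.utc)
intervals = list(split_into_intervals(start, end))
urls = [build_url(lower, upper) for lower, upper in intervals]
```

A one-day range yields three 8-hour intervals. For each interval, a caller would request page 0, read `nbPages` from the response, and repeat with increasing `page` values until every page has been fetched.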

Pricing

  • Monthly subscription: $1/month
  • Pay as you go: Based on compute units used

Features

  • Scrapes Hacker News stories between specified dates
  • Supports precise datetime ranges with timezone handling
  • Handles pagination automatically
  • Stores results in a structured dataset

Input

The actor accepts the following input parameters:

{
    "startDate": "2024-12-31T15:30:00",  // Date in ISO 8601 format
    "endDate": "2025-01-15T09:45:00",    // Date in ISO 8601 format
    "timezone": "Europe/London"           // Optional: defaults to America/New_York
}

Date Format Options

You can specify dates in two formats:

  1. Date only:

    {
      "startDate": "2024-12-31",  // Will use 2024-12-31T00:00:00 in specified timezone
      "endDate": "2025-01-15"     // Will use 2025-01-15T23:59:59 in specified timezone
    }

  2. Date with time:

    {
      "startDate": "2024-12-31T15:30:00",  // Will use exact time in specified timezone
      "endDate": "2025-01-15T09:45:00"
    }

All times are interpreted in the America/New_York timezone by default. You can specify a different timezone using the optional timezone parameter with any valid IANA timezone name (e.g., 'Europe/London', 'Asia/Tokyo').
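
To make the date rules above concrete, here is a small sketch of how a date-only or date-with-time value could be resolved into a timezone-aware datetime. The function name `resolve_bound` and its `is_end` flag are hypothetical and only mirror the behaviour described in this section. They are not taken from the Actor's source.

```python
from datetime import date, datetime, time
from zoneinfo import ZoneInfo  # standard library; needs system tz data (or the tzdata package)

DEFAULT_TIMEZONE = "America/New_York"


def resolve_bound(value, tz_name=DEFAULT_TIMEZONE, is_end=False):
    """Turn a date-only or date-with-time string into a timezone-aware datetime."""
    if "T" in value:
        # Date with time: keep the exact wall-clock time.
        naive = datetime.fromisoformat(value)
    else:
        # Date only: 00:00:00 for a start date, 23:59:59 for an end date.
        boundary = time(23, 59, 59) if is_end else time(0, 0, 0)
        naive = datetime.combine(date.fromisoformat(value), boundary)
    return naive.replace(tzinfo=ZoneInfo(tz_name))


start = resolve_bound("2024-12-31")
end = resolve_bound("2025-01-15", is_end=True)
london = resolve_bound("2024-12-31T15:30:00", tz_name="Europe/London")

print(start.isoformat())   # 2024-12-31T00:00:00-05:00
print(end.isoformat())     # 2025-01-15T23:59:59-05:00
print(london.isoformat())  # 2024-12-31T15:30:00+00:00
```

Calling `int(start.timestamp())` on the result gives the Unix timestamp that can be compared with the `created_at_i` field of a story.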

Output

The actor stores the results in a dataset with the following structure for each record:

{
    "url": "string",            // Original URL used for scraping
    "data": {                   // Raw data from Algolia API
        "hits": [               // Array of story items
            {
                "title": "string",          // Story title
                "url": "string",            // Story URL
                "author": "string",         // Author username
                "points": number,           // Number of upvotes
                "num_comments": number,     // Number of comments
                "story_id": number,         // Unique story ID
                "created_at_i": number,     // Unix timestamp of creation
                "created_at": "string",     // ISO timestamp of creation (e.g., "2024-01-01T16:24:53Z")
                "updated_at": "string",     // ISO timestamp of last update
                "_tags": string[],          // Array of tags (e.g., ["story", "author_username", "story_id"])
                "children": number[],       // Array of child comment IDs
                "objectID": "string",       // Unique object ID
                "story_text": "string",     // Optional: Text content for self posts
                "_highlightResult": {       // Search highlighting information
                    "title": {
                        "value": "string",
                        "matchLevel": "string",
                        "matchedWords": string[]
                    },
                    "url": {
                        "value": "string",
                        "matchLevel": "string",
                        "matchedWords": string[]
                    },
                    "author": {
                        "value": "string",
                        "matchLevel": "string",
                        "matchedWords": string[]
                    }
                }
            }
        ],
        "nbHits": number,       // Total number of hits
        "page": number,         // Current page number
        "nbPages": number,      // Total number of pages
        "hitsPerPage": number,  // Number of hits per page
        "processingTimeMS": number  // API processing time
    },
    "scrapedAt": "string",     // ISO timestamp of when the data was collected
    "startTime": "string",     // Unix timestamp of interval start
    "endTime": "string",       // Unix timestamp of interval end
    "page": number            // Page number in results
}
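
Because each dataset record wraps one page of API results, downstream code usually needs to flatten the nested `hits` arrays into a single list of stories. The sketch below shows one way to do that. The function name `iter_stories` and the sample records are made up for illustration, and only field names from the structure above are used.

```python
def iter_stories(records):
    """Yield one flat dict per story from the Actor's dataset records."""
    for record in records:
        for hit in record.get("data", {}).get("hits", []):
            # Fields that are absent from a hit come back as None.
            yield {
                "id": hit.get("story_id"),
                "title": hit.get("title"),
                "url": hit.get("url"),
                "author": hit.get("author"),
                "points": hit.get("points"),
                "num_comments": hit.get("num_comments"),
                "created_at": hit.get("created_at"),
            }


# Hand-written sample records in the documented shape (not real scraped data).
sample_records = [
    {
        "page": 0,
        "data": {
            "hits": [
                {"story_id": 1, "title": "First story", "author": "alice",
                 "points": 10, "num_comments": 2,
                 "created_at": "2024-12-31T16:24:53Z"},
            ],
        },
    },
    {
        "page": 1,
        "data": {
            "hits": [
                {"story_id": 2, "title": "Second story", "author": "bob",
                 "url": "https://example.com", "points": 5, "num_comments": 0,
                 "created_at": "2024-12-31T18:00:00Z"},
            ],
        },
    },
]

stories = list(iter_stories(sample_records))
```

The resulting list holds one dict per story, which is a convenient shape for loading into a table or data frame.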

Usage

  1. Subscribe to the actor in the Apify Store
  2. Input the desired date range using any supported format
  3. Optionally specify a timezone
  4. Run the actor
  5. Access results in the "Dataset" tab
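
The same steps can also be scripted instead of clicked through. The following is a minimal sketch, assuming the third-party `apify-client` Python package and a valid Apify API token. The helper names `build_run_input` and `run_scraper` are hypothetical, and the token placeholder must be replaced with your own.

```python
ACTOR_ID = "gmgn/hacker-news-scraper"


def build_run_input(start_date, end_date, timezone=None):
    """Assemble the Actor input; timezone is left out when not given."""
    run_input = {"startDate": start_date, "endDate": end_date}
    if timezone is not None:
        run_input["timezone"] = timezone
    return run_input


def run_scraper(token, run_input):
    """Start the Actor, wait for it to finish, and return its dataset records."""
    # Third-party dependency, imported lazily: pip install apify-client
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input("2024-12-31", "2025-01-15", timezone="Europe/London")
# Needs network access and an active subscription to the Actor:
# records = run_scraper("<YOUR_APIFY_TOKEN>", run_input)
```

Keeping the input construction separate from the network call makes it possible to check the input locally before spending compute units on a run.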

Example Use Cases

  1. Content Analysis: Track trending topics and discussions over time
  2. Research: Analyze historical Hacker News data for patterns
  3. Monitoring: Keep track of specific topics or companies
  4. Data Mining: Build datasets for machine learning or analysis
  5. Time-Sensitive Analysis: Analyze posts during specific time windows (e.g., business hours)
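
As an illustration of the time-window use case, the sketch below keeps only stories posted on weekdays between 09:00 and 17:00 local time, based on the `created_at_i` field. The function name `in_business_hours`, the chosen hours, and the sample hits are assumptions made for this example.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library; needs system tz data (or the tzdata package)


def in_business_hours(hit, tz_name="America/New_York", start_hour=9, end_hour=17):
    """Return True if the story was created Mon-Fri within [start_hour, end_hour) local time."""
    local = datetime.fromtimestamp(hit["created_at_i"], tz=ZoneInfo(tz_name))
    return local.weekday() < 5 and start_hour <= local.hour < end_hour


# Hand-written sample hits (not real scraped data).
hits = [
    {"title": "Tuesday morning", "created_at_i": 1735657200},   # 2024-12-31 10:00 New York
    {"title": "Tuesday evening", "created_at_i": 1735693200},   # 2024-12-31 20:00 New York
    {"title": "Saturday morning", "created_at_i": 1736002800},  # 2025-01-04 10:00 New York
]

business_hits = [hit for hit in hits if in_business_hours(hit)]
```

Here only the first sample hit passes the filter. The other two fall outside the hour range or on a weekend.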

Resource Requirements

  • Memory: 2048 MB
  • Compute Units: Based on date range and number of results

Developer

Maintained by Community

Actor Metrics

  • 1 monthly user
  • No stars yet
  • Created in Jan 2025
  • Modified 3 days ago