Hacker News Scraper

Deprecated

Pricing

$1.00/month + usage

Developed by

gmgn

Maintained by Community

Scrape Hacker News stories within specified date ranges using this Actor. It handles pagination, timezone adjustments, and delivers structured datasets with all relevant metadata.

Hacker News Story Scraper

This actor scrapes Hacker News stories within a specified date range using the Algolia API. It collects every story in the range along with its metadata, paging through the results of each query. To keep individual queries small, the range is split into 8-hour intervals that are queried separately.
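The interval-based collection described above can be sketched as follows. This is a minimal illustration, not the actor's source. It assumes the public HN Search API endpoint `https://hn.algolia.com/api/v1/search_by_date` with a `tags=story` filter and a `numericFilters` range on `created_at_i`. The helper names `split_range` and `window_url` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple
from urllib.parse import urlencode

# Assumption: the public HN Search API hosted by Algolia and its
# search_by_date endpoint; the actor's real query may differ.
SEARCH_URL = "https://hn.algolia.com/api/v1/search_by_date"
INTERVAL = timedelta(hours=8)  # interval length stated in the description


def split_range(start: datetime, end: datetime,
                step: timedelta = INTERVAL) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive half-open windows [lo, hi) that together cover [start, end)."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)  # the final window may be shorter than `step`
        yield lo, hi
        lo = hi


def window_url(lo: datetime, hi: datetime, page: int = 0) -> str:
    """Build a query URL for stories created in [lo, hi), for one result page."""
    params = {
        "tags": "story",  # restrict results to stories
        "numericFilters": (f"created_at_i>={int(lo.timestamp())},"
                           f"created_at_i<{int(hi.timestamp())}"),
        "page": page,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    start = datetime(2025, 1, 1, tzinfo=timezone.utc)
    end = datetime(2025, 1, 2, 4, tzinfo=timezone.utc)  # 28 hours later
    for lo, hi in split_range(start, end):
        print(lo.isoformat(), "->", hi.isoformat(), window_url(lo, hi))
```

A 28-hour range yields three full 8-hour windows and one 4-hour window. Because the windows are half-open, an inclusive end time such as 23:59:59 would need one extra second to be covered. Whether the actor adjusts for this is not documented.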

Pricing

  • Monthly subscription: $1/month
  • Usage: platform compute units consumed by each run, charged in addition to the subscription

Features

  • Scrapes Hacker News stories between specified dates
  • Supports precise datetime ranges with timezone handling
  • Handles pagination automatically
  • Stores results in a structured dataset
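Automatic pagination, as listed in the features, amounts to requesting pages until the page count reported by the API is exhausted. The sketch below assumes each response carries an Algolia-style `nbPages` field, as in the output structure. The HTTP call is abstracted behind a hypothetical `fetch_page` callback so the loop can run without network access.

```python
from typing import Any, Callable, Dict, Iterator

# One decoded API response, e.g. {"hits": [...], "page": 0, "nbPages": 3, ...}
Page = Dict[str, Any]


def iterate_pages(fetch_page: Callable[[int], Page]) -> Iterator[Page]:
    """Yield every page of a single query, starting at page 0.

    `fetch_page` is a hypothetical callback that requests page n and returns
    the decoded JSON; the loop stops once the page counter reaches `nbPages`.
    """
    page = 0
    while True:
        response = fetch_page(page)
        yield response
        page += 1
        if page >= response.get("nbPages", 0):
            break


if __name__ == "__main__":
    # Three canned pages stand in for real HTTP requests.
    canned = {n: {"hits": [{"objectID": str(n)}], "page": n, "nbPages": 3}
              for n in range(3)}
    print([p["page"] for p in iterate_pages(canned.__getitem__)])  # [0, 1, 2]
```

The first page is always fetched, so an empty result (`nbPages` of 0) still produces one response.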

Input

The actor accepts the following input parameters:

{
  "startDate": "2024-12-31T15:30:00", // Date in ISO 8601 format
  "endDate": "2025-01-15T09:45:00", // Date in ISO 8601 format
  "timezone": "Europe/London" // Optional: defaults to America/New_York
}
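A client preparing this input might check it before starting a run. The following is a minimal sketch, assuming only the three fields shown above. The function `normalize_input` is hypothetical and not part of the actor. It checks that both dates parse and fills in the documented default timezone, but does not validate the timezone name.

```python
from datetime import datetime
from typing import Any, Dict

DEFAULT_TIMEZONE = "America/New_York"  # default stated in the input description


def normalize_input(raw: Dict[str, Any]) -> Dict[str, str]:
    """Check the two required dates and fill in the default timezone.

    datetime.fromisoformat accepts both documented forms, "2024-12-31" and
    "2024-12-31T15:30:00"; anything it cannot parse raises ValueError.
    The timezone name itself is not validated here.
    """
    for key in ("startDate", "endDate"):
        value = raw.get(key)
        if not isinstance(value, str):
            raise ValueError(f"{key} is required and must be a string")
        datetime.fromisoformat(value)  # parse only to validate the format
    return {
        "startDate": raw["startDate"],
        "endDate": raw["endDate"],
        "timezone": raw.get("timezone") or DEFAULT_TIMEZONE,
    }
```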

Date Format Options

You can specify dates in two formats:

  1. Date only:

    {
      "startDate": "2024-12-31", // Will use 2024-12-31T00:00:00 in specified timezone
      "endDate": "2025-01-15" // Will use 2025-01-15T23:59:59 in specified timezone
    }
  2. Date with time:

    {
      "startDate": "2024-12-31T15:30:00", // Will use exact time in specified timezone
      "endDate": "2025-01-15T09:45:00"
    }

All times are interpreted in the America/New_York timezone by default. You can specify a different timezone using the optional timezone parameter with any valid IANA timezone name (e.g., 'Europe/London', 'Asia/Tokyo').
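The date rules above (date-only start becomes 00:00:00, date-only end becomes 23:59:59, date-times are taken as given, all in the chosen timezone) can be expressed as below. This is a sketch, not the actor's code. The helpers `resolve_bound` and `resolve_range` are hypothetical, and letting an explicit UTC offset in the string override the timezone is an assumption. `zoneinfo` requires Python 3.9+ and a time zone database (the system one or the `tzdata` package).

```python
import re
from datetime import date, datetime, time, timezone, tzinfo
from typing import Tuple
from zoneinfo import ZoneInfo  # Python 3.9+; needs the system tz database or the tzdata package

DATE_ONLY = re.compile(r"\d{4}-\d{2}-\d{2}")


def resolve_bound(value: str, tz: tzinfo, is_end: bool) -> datetime:
    """Turn a startDate/endDate string into a timezone-aware datetime.

    Date-only values expand to 00:00:00 (start) or 23:59:59 (end) in `tz`;
    date-time values keep their exact clock time and are placed in `tz`.
    """
    if DATE_ONLY.fullmatch(value):
        clock = time(23, 59, 59) if is_end else time(0, 0, 0)
        return datetime.combine(date.fromisoformat(value), clock, tzinfo=tz)
    parsed = datetime.fromisoformat(value)
    # Assumption: an explicit UTC offset in the string wins over `tz`.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=tz)


def resolve_range(start: str, end: str,
                  tz_name: str = "America/New_York") -> Tuple[datetime, datetime]:
    """Resolve both bounds in an IANA timezone (unknown names raise an error)."""
    tz = ZoneInfo(tz_name)
    return resolve_bound(start, tz, is_end=False), resolve_bound(end, tz, is_end=True)


if __name__ == "__main__":
    # A fixed UTC offset keeps this demo independent of the tz database.
    lo = resolve_bound("2024-12-31", timezone.utc, is_end=False)
    hi = resolve_bound("2025-01-15", timezone.utc, is_end=True)
    print(lo.isoformat(), hi.isoformat())  # 2024-12-31T00:00:00+00:00 2025-01-15T23:59:59+00:00
```

With a named zone, `resolve_range("2024-12-31", "2025-01-15", "Europe/London")` returns London-local bounds. Calling `.timestamp()` on each bound gives the Unix times that a `created_at_i` filter would compare against.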

Output

The actor stores its results in a dataset. Each record holds one page of Algolia API results for a single time interval and has the following structure:

{
  "url": "string", // Original URL used for scraping
  "data": { // Raw data from Algolia API
    "hits": [ // Array of story items
      {
        "title": "string", // Story title
        "url": "string", // Story URL
        "author": "string", // Author username
        "points": number, // Number of upvotes
        "num_comments": number, // Number of comments
        "story_id": number, // Unique story ID
        "created_at_i": number, // Unix timestamp of creation
        "created_at": "string", // ISO timestamp of creation (e.g., "2024-01-01T16:24:53Z")
        "updated_at": "string", // ISO timestamp of last update
        "_tags": string[], // Array of tags (e.g., ["story", "author_<username>", "story_<id>"])
        "children": number[], // Array of child comment IDs
        "objectID": "string", // Unique object ID
        "story_text": "string", // Optional: Text content for self posts
        "_highlightResult": { // Search highlighting information
          "title": {
            "value": "string",
            "matchLevel": "string",
            "matchedWords": string[]
          },
          "url": {
            "value": "string",
            "matchLevel": "string",
            "matchedWords": string[]
          },
          "author": {
            "value": "string",
            "matchLevel": "string",
            "matchedWords": string[]
          }
        }
      }
    ],
    "nbHits": number, // Total number of hits
    "page": number, // Current page number (0-based)
    "nbPages": number, // Total number of pages
    "hitsPerPage": number, // Number of hits per page
    "processingTimeMS": number // API processing time
  },
  "scrapedAt": "string", // ISO timestamp of when the data was collected
  "startTime": "string", // Unix timestamp of interval start
  "endTime": "string", // Unix timestamp of interval end
  "page": number // Page number in results
}
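Since each record wraps a whole page of hits, analysis usually starts by flattening records into one row per story. The sketch below assumes only the record structure documented here. The field subset in `FIELDS` and the function `flatten_records` are illustrative choices, not part of the actor. De-duplicating by `objectID` is a defensive assumption in case the same story appears on more than one page or interval.

```python
from typing import Any, Dict, Iterable, List

# Illustrative subset of hit fields; any other field from the record structure can be added.
FIELDS = ("objectID", "title", "url", "author", "points", "num_comments", "created_at")


def flatten_records(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn dataset records (one API page each) into one row per story.

    Hits are de-duplicated by objectID, in case the same story is returned
    by more than one page or interval.
    """
    seen = set()
    rows: List[Dict[str, Any]] = []
    for record in records:
        for hit in record.get("data", {}).get("hits", []):
            key = hit.get("objectID")
            if key in seen:
                continue
            seen.add(key)
            rows.append({field: hit.get(field) for field in FIELDS})
    return rows
```

Missing optional fields come back as `None`, which keeps every row the same shape for tabular tools.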

Usage

  1. Subscribe to the actor in the Apify Store
  2. Input the desired date range using any supported format
  3. Optionally specify a timezone
  4. Run the actor
  5. Access results in the "Dataset" tab

Example Use Cases

  1. Content Analysis: Track trending topics and discussions over time
  2. Research: Analyze historical Hacker News data for patterns
  3. Monitoring: Keep track of specific topics or companies
  4. Data Mining: Build datasets for machine learning or analysis
  5. Time-Sensitive Analysis: Analyze posts during specific time windows (e.g., business hours)

Resource Requirements

  • Memory: 2048 MB
  • Compute units: memory multiplied by run time, so usage grows with the length of the date range and the number of results
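Apify counts one compute unit as 1 GB of memory used for 1 hour. At the fixed 2048 MB setting, a run's compute units therefore scale linearly with its duration. The helper below is a small, hypothetical estimator for that arithmetic. The run time in the example is illustrative, not a measured figure for this actor.

```python
def compute_units(memory_mb: float, runtime_seconds: float) -> float:
    """Compute units as Apify defines them: memory in GB multiplied by run time in hours."""
    return (memory_mb / 1024) * (runtime_seconds / 3600)


if __name__ == "__main__":
    # A hypothetical 6-minute run at the 2048 MB listed above.
    print(round(compute_units(2048, 6 * 60), 3))  # 0.2
```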