
Hacker News Scraper
Deprecated
Pricing
$1.00/month + usage
Scrape Hacker News stories within specified date ranges using this Actor. It handles pagination and timezone adjustments, and it delivers structured datasets with all relevant metadata.
Last modified
5 months ago
Hacker News Story Scraper
This actor scrapes Hacker News stories within a specified date range using the Algolia API. It collects all stories and their metadata, paginating through the results automatically. Data is collected in 8-hour intervals: the date range is split into consecutive 8-hour windows, and each window is queried and paginated separately.
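The interval-based collection described above can be sketched as follows. This is a minimal illustration, not the actor's actual code: the `search_by_date` endpoint and the `tags`, `numericFilters`, and `page` parameters belong to the public Hacker News Search API, but whether the actor builds its requests exactly this way is an assumption, and the names `split_into_intervals` and `build_url` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Public Hacker News Search API endpoint (results sorted by date).
API_URL = "https://hn.algolia.com/api/v1/search_by_date"
INTERVAL = timedelta(hours=8)


def split_into_intervals(start, end, step=INTERVAL):
    """Yield consecutive (interval_start, interval_end) pairs covering start..end."""
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        yield cursor, upper
        cursor = upper


def build_url(interval_start, interval_end, page=0):
    """Build a query URL for stories created inside one interval (assumed shape)."""
    filters = (
        f"created_at_i>={int(interval_start.timestamp())},"
        f"created_at_i<{int(interval_end.timestamp())}"
    )
    query = urlencode({"tags": "story", "numericFilters": filters, "page": page})
    return f"{API_URL}?{query}"


# One day of data becomes three 8-hour intervals.
start = datetime(2024, 12, 31, 0, 0, tzinfo=timezone.utc)
end = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
intervals = list(split_into_intervals(start, end))
urls = [build_url(lo, hi) for lo, hi in intervals]
```

For each interval, a scraper of this kind would presumably request page 0 first and then continue until the `nbPages` value reported in the response is reached.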
Pricing
- Monthly subscription: $1/month
- Pay as you go: Based on compute units used
Features
- Scrapes Hacker News stories between specified dates
- Supports precise datetime ranges with timezone handling
- Handles pagination automatically
- Stores results in a structured dataset
Input
The actor accepts the following input parameters:
{
  "startDate": "2024-12-31T15:30:00", // Date in ISO 8601 format
  "endDate": "2025-01-15T09:45:00",   // Date in ISO 8601 format
  "timezone": "Europe/London"         // Optional: defaults to America/New_York
}
Date Format Options
You can specify dates in two formats:
- Date only:
  {
    "startDate": "2024-12-31", // Will use 2024-12-31T00:00:00 in specified timezone
    "endDate": "2025-01-15"    // Will use 2025-01-15T23:59:59 in specified timezone
  }
- Date with time:
  {
    "startDate": "2024-12-31T15:30:00", // Will use exact time in specified timezone
    "endDate": "2025-01-15T09:45:00"
  }
All times are interpreted in the America/New_York timezone by default. You can specify a different timezone using the optional timezone parameter with any valid IANA timezone name (e.g., 'Europe/London', 'Asia/Tokyo').
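The date handling described above can be illustrated with a short sketch. It assumes that inputs carry no UTC offset of their own and that a date-only end value expands to 23:59:59, as stated in the examples. The helper name `parse_bound` is hypothetical and is not taken from the actor's source.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

DEFAULT_TZ = "America/New_York"  # default timezone stated in the documentation


def parse_bound(value, tz_name=DEFAULT_TZ, is_end=False):
    """Interpret a date or datetime string as local time in the given timezone."""
    tz = ZoneInfo(tz_name)
    if "T" in value:
        # Date with time: use the exact time given.
        parsed = datetime.fromisoformat(value)
    else:
        # Date only: start of day for startDate, end of day for endDate.
        day = datetime.fromisoformat(value).date()
        parsed = datetime.combine(day, time(23, 59, 59) if is_end else time(0, 0, 0))
    # Attach the timezone (assumes the input string had no offset).
    return parsed.replace(tzinfo=tz)


start = parse_bound("2024-12-31", "Europe/London")
end = parse_bound("2025-01-15", "Europe/London", is_end=True)

# Unix timestamps, suitable for comparison with created_at_i.
start_ts = int(start.timestamp())
end_ts = int(end.timestamp())
```

Converting both bounds to Unix timestamps makes them directly comparable with the `created_at_i` field in the output, regardless of the timezone chosen for input.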
Output
The actor stores the results in a dataset with the following structure for each record:
{
  "url": "string",                  // Original URL used for scraping
  "data": {                         // Raw data from Algolia API
    "hits": [                       // Array of story items
      {
        "title": "string",          // Story title
        "url": "string",            // Story URL
        "author": "string",         // Author username
        "points": number,           // Number of upvotes
        "num_comments": number,     // Number of comments
        "story_id": number,         // Unique story ID
        "created_at_i": number,     // Unix timestamp of creation
        "created_at": "string",     // ISO timestamp of creation (e.g., "2024-01-01T16:24:53Z")
        "updated_at": "string",     // ISO timestamp of last update
        "_tags": string[],          // Array of tags (e.g., ["story", "author_username", "story_id"])
        "children": number[],       // Array of child comment IDs
        "objectID": "string",       // Unique object ID
        "story_text": "string",     // Optional: Text content for self posts
        "_highlightResult": {       // Search highlighting information
          "title": {
            "value": "string",
            "matchLevel": "string",
            "matchedWords": string[]
          },
          "url": {
            "value": "string",
            "matchLevel": "string",
            "matchedWords": string[]
          },
          "author": {
            "value": "string",
            "matchLevel": "string",
            "matchedWords": string[]
          }
        }
      }
    ],
    "nbHits": number,               // Total number of hits
    "page": number,                 // Current page number
    "nbPages": number,              // Total number of pages
    "hitsPerPage": number,          // Number of hits per page
    "processingTimeMS": number      // API processing time
  },
  "scrapedAt": "string",            // ISO timestamp of when the data was collected
  "startTime": "string",            // Unix timestamp of interval start
  "endTime": "string",              // Unix timestamp of interval end
  "page": number                    // Page number in results
}
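Because each dataset record wraps one page of API results, downstream analysis usually needs the stories pulled out of `data.hits`. The following sketch shows one way to flatten the records into one row per story. The function name `flatten_records` is hypothetical, the sample records are invented for illustration, and de-duplicating by `objectID` is a precaution rather than documented behaviour.

```python
def flatten_records(records):
    """Return one row per story, de-duplicated by objectID."""
    seen = set()
    rows = []
    for record in records:
        for hit in record.get("data", {}).get("hits", []):
            object_id = hit.get("objectID")
            if object_id in seen:
                continue  # skip stories already seen in another record
            seen.add(object_id)
            rows.append({
                "objectID": object_id,
                "title": hit.get("title"),
                "url": hit.get("url"),
                "author": hit.get("author"),
                "points": hit.get("points"),
                "num_comments": hit.get("num_comments"),
                "created_at": hit.get("created_at"),
            })
    return rows


# Invented sample data that follows the documented record structure.
sample = [
    {"page": 0, "data": {"hits": [
        {"objectID": "1", "title": "First story", "author": "alice", "points": 10},
        {"objectID": "2", "title": "Second story", "author": "bob", "points": 3},
    ]}},
    {"page": 1, "data": {"hits": [
        {"objectID": "2", "title": "Second story", "author": "bob", "points": 3},
    ]}},
]
rows = flatten_records(sample)
```

Fields missing from a hit, such as `url` on a self post, come back as `None` instead of raising an error.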
Usage
- Subscribe to the actor in the Apify Store
- Input the desired date range using any supported format
- Optionally specify a timezone
- Run the actor
- Access results in the "Dataset" tab
Example Use Cases
- Content Analysis: Track trending topics and discussions over time
- Research: Analyze historical Hacker News data for patterns
- Monitoring: Keep track of specific topics or companies
- Data Mining: Build datasets for machine learning or analysis
- Time-Sensitive Analysis: Analyze posts during specific time windows (e.g., business hours)
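As an illustration of the time-sensitive use case, the sketch below checks whether a story's `created_at_i` timestamp falls within business hours in a chosen timezone. The function name `in_business_hours` and the 09:00 to 17:00, Monday to Friday window are assumptions made for this example.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def in_business_hours(created_at_i, tz_name="America/New_York",
                      start_hour=9, end_hour=17):
    """Return True if the Unix timestamp is on a weekday within the hour window."""
    local = datetime.fromtimestamp(created_at_i, tz=ZoneInfo(tz_name))
    is_weekday = local.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and start_hour <= local.hour < end_hour


# 2024-12-31 15:00 UTC is 10:00 on a Tuesday in New York.
tuesday_morning = in_business_hours(1735657200)
# 2025-01-04 15:00 UTC is 10:00 on a Saturday in New York.
saturday_morning = in_business_hours(1736002800)
```

Applied to the `hits` in each dataset record, a filter like this keeps only the stories submitted during the chosen window.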
Resource Requirements
- Memory: 2048 MB
- Compute Units: Based on date range and number of results