Hacker News Scraper
Pricing
$1.00/month + usage
Scrape Hacker News stories within specified date ranges using this Actor. It handles pagination and timezone adjustments and delivers structured datasets with all relevant metadata.
Developer: gmgn
Last modified: 10 months ago
Hacker News Story Scraper
This actor scrapes Hacker News stories within a specified date range using the Algolia API. It collects all stories and their metadata, paginating through the results automatically. The requested range is processed in consecutive 8-hour intervals, each queried and paginated separately, which keeps individual queries small.
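The interval splitting described above can be sketched in Python as follows. This is a minimal illustration, assuming half-open windows `[start, end)` expressed as Unix seconds; the helper name `split_into_intervals` is hypothetical and not taken from the actor's source.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple

# Hypothetical helper: the 8-hour window size comes from the description above;
# the half-open [start, end) convention is an illustrative assumption.
INTERVAL = timedelta(hours=8)


def split_into_intervals(
    start: datetime, end: datetime, step: timedelta = INTERVAL
) -> Iterator[Tuple[int, int]]:
    """Yield consecutive (start_ts, end_ts) Unix-second pairs covering [start, end)."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)  # the last window may be shorter
        yield int(cursor.timestamp()), int(window_end.timestamp())
        cursor = window_end


# Example: a 28-hour range yields three full 8-hour windows and one 4-hour remainder.
windows = list(
    split_into_intervals(
        datetime(2025, 1, 1, tzinfo=timezone.utc),
        datetime(2025, 1, 2, 4, tzinfo=timezone.utc),
    )
)
print(len(windows))  # 4
```

Using aware datetimes keeps the conversion to Unix seconds unambiguous regardless of the machine's local timezone.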
Pricing
- Monthly subscription: $1/month
- Pay as you go: Based on compute units used
Features
- Scrapes Hacker News stories between specified dates
- Supports precise datetime ranges with timezone handling
- Handles pagination automatically
- Stores results in a structured dataset
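A minimal sketch of how pagination over one interval could work, assuming the public HN Search API endpoint `https://hn.algolia.com/api/v1/search_by_date` with its `tags`, `numericFilters`, `page`, and `hitsPerPage` query parameters. The helper names, the `hitsPerPage` value of 100, and the injected `fetch_json` callable are assumptions made so the loop can run offline; the actor's actual request parameters are not documented here. The yielded record fields follow the Output structure in this README.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Callable, Iterator
from urllib.parse import urlencode

# Hypothetical helpers; hitsPerPage=100 is an assumed value, not the actor's setting.
BASE_URL = "https://hn.algolia.com/api/v1/search_by_date"


def build_url(start_ts: int, end_ts: int, page: int, hits_per_page: int = 100) -> str:
    """Build a search_by_date URL for stories created in [start_ts, end_ts)."""
    params = {
        "tags": "story",
        "numericFilters": f"created_at_i>={start_ts},created_at_i<{end_ts}",
        "page": page,
        "hitsPerPage": hits_per_page,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def http_fetch_json(url: str) -> dict:
    """Real network fetch (not called in the offline example below)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def fetch_all_pages(
    start_ts: int, end_ts: int, fetch_json: Callable[[str], dict]
) -> Iterator[dict]:
    """Request page 0, 1, 2, ... until the response's nbPages says there are no more."""
    page = 0
    while True:
        url = build_url(start_ts, end_ts, page)
        data = fetch_json(url)
        yield {
            "url": url,
            "data": data,
            "scrapedAt": datetime.now(timezone.utc).isoformat(),
            "startTime": str(start_ts),
            "endTime": str(end_ts),
            "page": page,
        }
        page += 1
        if page >= data.get("nbPages", 0):
            break


# Offline example with a stubbed fetcher that reports three pages.
requested = []


def fake_fetch(url: str) -> dict:
    requested.append(url)
    return {"hits": [], "nbPages": 3, "page": len(requested) - 1}


records = list(fetch_all_pages(1735689600, 1735718400, fake_fetch))
print(len(records))  # 3
```

Passing `http_fetch_json` instead of `fake_fetch` would issue real requests; injecting the fetcher keeps the paging logic testable without network access.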
Input
The actor accepts the following input parameters:
```jsonc
{
  "startDate": "2024-12-31T15:30:00", // Date in ISO 8601 format
  "endDate": "2025-01-15T09:45:00",   // Date in ISO 8601 format
  "timezone": "Europe/London"         // Optional: defaults to America/New_York
}
```
Date Format Options
You can specify dates in two formats:
- Date only:
  ```jsonc
  {
    "startDate": "2024-12-31", // Will use 2024-12-31T00:00:00 in the specified timezone
    "endDate": "2025-01-15"    // Will use 2025-01-15T23:59:59 in the specified timezone
  }
  ```
- Date with time:
  ```jsonc
  {
    "startDate": "2024-12-31T15:30:00", // Will use the exact time in the specified timezone
    "endDate": "2025-01-15T09:45:00"
  }
  ```
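The two date formats can be told apart and expanded with a short helper. This sketch assumes a date-only value is written exactly as `YYYY-MM-DD`; the function name `expand_input_date` is hypothetical and does not come from the actor.

```python
from datetime import date, datetime, time


def expand_input_date(value: str, is_end: bool) -> datetime:
    """Return a naive local datetime for a startDate/endDate input string.

    Hypothetical helper. Assumes a date-only value is exactly YYYY-MM-DD
    (10 characters); anything longer is parsed as an ISO 8601 date-time.
    """
    value = value.strip()
    if len(value) == 10:
        day = date.fromisoformat(value)
        # Date-only: start of day for startDate, last second of day for endDate.
        return datetime.combine(day, time(23, 59, 59) if is_end else time(0, 0, 0))
    return datetime.fromisoformat(value)


print(expand_input_date("2024-12-31", is_end=False))  # 2024-12-31 00:00:00
print(expand_input_date("2025-01-15", is_end=True))   # 2025-01-15 23:59:59
```

The result is deliberately naive: the timezone is applied in a separate step, so the same wall-clock value can be interpreted in any IANA zone.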
All times are interpreted in the America/New_York timezone by default. You can specify a different timezone using the optional timezone parameter with any valid IANA timezone name (e.g., 'Europe/London', 'Asia/Tokyo').
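Once a date has been expanded to a wall-clock time, the timezone rule amounts to attaching an IANA zone and converting to a Unix timestamp, the unit of the `created_at_i` field in the output. A minimal sketch using Python's standard `zoneinfo` module (Python 3.9+, requires system tz data or the `tzdata` package); `to_unix` is a hypothetical helper name, not part of the actor.

```python
from datetime import datetime
from typing import Optional
from zoneinfo import ZoneInfo  # Python 3.9+; needs IANA tz data on the system

# Hypothetical helper; the default mirrors the documented America/New_York default.
DEFAULT_TZ = "America/New_York"


def to_unix(local_dt: datetime, tz_name: Optional[str] = None) -> int:
    """Interpret a naive datetime as wall-clock time in tz_name and return Unix seconds."""
    tz = ZoneInfo(tz_name or DEFAULT_TZ)
    # replace() attaches the zone without shifting the clock time; for ambiguous
    # DST wall times, the default fold=0 picks the first occurrence.
    return int(local_dt.replace(tzinfo=tz).timestamp())


print(to_unix(datetime(2024, 12, 31, 15, 30), "Europe/London"))  # 1735659000
print(to_unix(datetime(2025, 1, 15, 9, 45)))                      # 1736952300
```

In winter, London is at UTC+0 and New York at UTC-5, so the same wall-clock time maps to different Unix timestamps depending on the `timezone` input.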
Output
The actor stores the results in a dataset with the following structure for each record:
```jsonc
{
  "url": "string",              // Original URL used for scraping
  "data": {                     // Raw data from Algolia API
    "hits": [                   // Array of story items
      {
        "title": "string",      // Story title
        "url": "string",        // Story URL
        "author": "string",     // Author username
        "points": number,       // Number of upvotes
        "num_comments": number, // Number of comments
        "story_id": number,     // Unique story ID
        "created_at_i": number, // Unix timestamp of creation
        "created_at": "string", // ISO timestamp of creation (e.g., "2024-01-01T16:24:53Z")
        "updated_at": "string", // ISO timestamp of last update
        "_tags": string[],      // Array of tags (e.g., ["story", "author_<username>", "story_<story_id>"])
        "children": number[],   // Array of child comment IDs
        "objectID": "string",   // Unique object ID
        "story_text": "string", // Optional: Text content for self posts
        "_highlightResult": {   // Search highlighting information
          "title": { "value": "string", "matchLevel": "string", "matchedWords": string[] },
          "url": { "value": "string", "matchLevel": "string", "matchedWords": string[] },
          "author": { "value": "string", "matchLevel": "string", "matchedWords": string[] }
        }
      }
    ],
    "nbHits": number,           // Total number of hits
    "page": number,             // Current page number
    "nbPages": number,          // Total number of pages
    "hitsPerPage": number,      // Number of hits per page
    "processingTimeMS": number  // API processing time
  },
  "scrapedAt": "string",        // ISO timestamp of when the data was collected
  "startTime": "string",        // Unix timestamp of interval start
  "endTime": "string",          // Unix timestamp of interval end
  "page": number                // Page number in results
}
```
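Because each dataset record holds one page of raw API results, analysis usually starts by flattening the nested `data.hits` arrays into one row per story. A minimal sketch, assuming records shaped like the structure above; skipping repeated `objectID` values is a defensive choice (for example, when combining datasets from overlapping runs), not documented actor behavior, and `flatten_stories` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def flatten_stories(records: Iterable[dict]) -> List[Dict]:
    """Hypothetical helper: one row per story, skipping repeated objectIDs."""
    seen = set()
    rows = []
    for record in records:
        for hit in record.get("data", {}).get("hits", []):
            object_id = hit.get("objectID")
            if object_id is not None:
                if object_id in seen:
                    continue  # already emitted this story
                seen.add(object_id)
            rows.append(
                {
                    "story_id": hit.get("story_id"),
                    "title": hit.get("title"),
                    "url": hit.get("url"),  # None when the hit has no URL
                    "author": hit.get("author"),
                    "points": hit.get("points"),
                    "num_comments": hit.get("num_comments"),
                    "created_at": hit.get("created_at"),
                }
            )
    return rows


sample = [
    {"data": {"hits": [{"objectID": "1", "story_id": 1, "title": "A"},
                       {"objectID": "2", "story_id": 2, "title": "B"}]}},
    {"data": {"hits": [{"objectID": "2", "story_id": 2, "title": "B"}]}},
]
rows = flatten_stories(sample)
print([r["title"] for r in rows])  # ['A', 'B']
```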
Usage
- Subscribe to the actor in the Apify Store
- Input the desired date range using any supported format
- Optionally specify a timezone
- Run the actor
- Access results in the "Dataset" tab
Example Use Cases
- Content Analysis: Track trending topics and discussions over time
- Research: Analyze historical Hacker News data for patterns
- Monitoring: Keep track of specific topics or companies
- Data Mining: Build datasets for machine learning or analysis
- Time-Sensitive Analysis: Analyze posts during specific time windows (e.g., business hours)
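For the time-sensitive use case, the `created_at_i` field can be converted back into local time and filtered. This sketch assumes "business hours" means Monday to Friday, 09:00 to 17:00 in a chosen IANA timezone; both that definition and the `in_business_hours` helper are illustrative assumptions.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def in_business_hours(
    created_at_i: int,
    tz_name: str = "America/New_York",
    start_hour: int = 9,
    end_hour: int = 17,
) -> bool:
    """Hypothetical helper: True if the timestamp falls Mon-Fri, start_hour <= hour < end_hour."""
    local = datetime.fromtimestamp(created_at_i, tz=ZoneInfo(tz_name))
    return local.weekday() < 5 and start_hour <= local.hour < end_hour


stories = [
    {"title": "Wednesday 09:45 ET", "created_at_i": 1736952300},
    {"title": "Saturday 12:00 ET", "created_at_i": 1736010000},
    {"title": "Tuesday 22:00 ET", "created_at_i": 1736910000},
]
business = [s["title"] for s in stories if in_business_hours(s["created_at_i"])]
print(business)  # ['Wednesday 09:45 ET']
```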
Resource Requirements
- Memory: 2048 MB
- Compute Units: Based on date range and number of results