Hacker News Scraper
1 day trial then $1.00/month - No credit card required now
Scrape Hacker News stories within specified date ranges using this Actor. It handles pagination, timezone adjustments, and delivers structured datasets with all relevant metadata.
Hacker News Story Scraper
This actor scrapes Hacker News stories within a specified date range using the Algolia API. It collects all stories and their metadata, paginating through results efficiently. Data is collected in 8-hour intervals for optimal performance.
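The interval approach described above can be sketched as follows. This is a minimal illustration, not the Actor's actual code: it assumes the public Algolia `search_by_date` endpoint for Hacker News with `tags` and `numericFilters` query parameters, and the helper names `split_into_intervals` and `build_url` are hypothetical. One plausible reason for windowing is that Algolia limits how many hits a single query can page through, but the Actor does not state this.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Public Algolia endpoint for Hacker News, newest first (assumed to be
# the endpoint the Actor queries; the source only says "Algolia API").
SEARCH_URL = "https://hn.algolia.com/api/v1/search_by_date"


def split_into_intervals(start, end, hours=8):
    """Yield (interval_start, interval_end) pairs covering [start, end)."""
    step = timedelta(hours=hours)
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)  # the last window may be shorter
        yield cursor, upper
        cursor = upper


def build_url(interval_start, interval_end, page=0):
    """Build a story query for one interval and one result page."""
    filters = (
        f"created_at_i>={int(interval_start.timestamp())},"
        f"created_at_i<{int(interval_end.timestamp())}"
    )
    query = urlencode({"tags": "story", "numericFilters": filters, "page": page})
    return f"{SEARCH_URL}?{query}"


start = datetime(2024, 12, 31, 0, 0, tzinfo=timezone.utc)
end = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
intervals = list(split_into_intervals(start, end))
urls = [build_url(lo, hi) for lo, hi in intervals]
print(len(intervals))  # 3
```

Using a half-open window (`>=` start, `<` end) keeps a story created exactly on a boundary from appearing in two adjacent intervals.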
Pricing
- Monthly subscription: $1/month
- Pay as you go: Based on compute units used
Features
- Scrapes Hacker News stories between specified dates
- Supports precise datetime ranges with timezone handling
- Handles pagination automatically
- Stores results in a structured dataset
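As a rough sketch of what automatic pagination involves, the loop below keeps requesting pages until the `nbPages` value reported by the API is reached. The `fetch_page` callable and the `FAKE_PAGES` data are hypothetical stand-ins so the example runs offline; the Actor's real fetching code is not shown in this document.

```python
def fetch_all_pages(fetch_page):
    """Collect hits from every page reported by the API.

    `fetch_page(page)` is a hypothetical callable that returns one decoded
    response dict containing "hits" and "nbPages" keys.
    """
    hits = []
    page = 0
    while True:
        response = fetch_page(page)
        hits.extend(response["hits"])
        page += 1
        # Stop once the page count reported by the API has been consumed.
        if page >= response["nbPages"]:
            break
    return hits


# Made-up responses standing in for real HTTP calls.
FAKE_PAGES = [
    {"hits": [{"objectID": "1"}, {"objectID": "2"}], "nbPages": 2},
    {"hits": [{"objectID": "3"}], "nbPages": 2},
]

all_hits = fetch_all_pages(lambda page: FAKE_PAGES[page])
print([hit["objectID"] for hit in all_hits])  # ['1', '2', '3']
```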
Input
The actor accepts the following input parameters:
```json
{
  "startDate": "2024-12-31T15:30:00", // Date in ISO 8601 format
  "endDate": "2025-01-15T09:45:00",   // Date in ISO 8601 format
  "timezone": "Europe/London"         // Optional: defaults to America/New_York
}
```
Date Format Options
You can specify dates in two formats:
- Date only:

```json
{
  "startDate": "2024-12-31", // Will use 2024-12-31T00:00:00 in specified timezone
  "endDate": "2025-01-15"    // Will use 2025-01-15T23:59:59 in specified timezone
}
```

- Date with time:

```json
{
  "startDate": "2024-12-31T15:30:00", // Will use exact time in specified timezone
  "endDate": "2025-01-15T09:45:00"
}
```
All times are interpreted in the America/New_York timezone by default. You can specify a different timezone using the optional timezone parameter with any valid IANA timezone name (e.g., 'Europe/London', 'Asia/Tokyo').
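The date rules above can be illustrated with a short sketch using Python's standard `zoneinfo` module. The function name `parse_bound` and its `is_end` flag are hypothetical; the sketch only mirrors the documented behavior (date-only start at 00:00:00, date-only end at 23:59:59, America/New_York by default) and assumes inputs carry no UTC offset of their own.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

DEFAULT_TIMEZONE = "America/New_York"


def parse_bound(value, tz_name=DEFAULT_TIMEZONE, is_end=False):
    """Turn a date or datetime string into a Unix timestamp (seconds)."""
    tz = ZoneInfo(tz_name)
    if "T" in value:
        # Date with time: use the exact wall-clock time given.
        local = datetime.fromisoformat(value)
    else:
        # Date only: start of day for startDate, end of day for endDate.
        day = datetime.strptime(value, "%Y-%m-%d").date()
        local = datetime.combine(day, time(23, 59, 59) if is_end else time(0, 0, 0))
    # Attach the timezone to the naive wall-clock time, then convert.
    return int(local.replace(tzinfo=tz).timestamp())


print(parse_bound("2024-12-31", "Europe/London"))               # 1735603200
print(parse_bound("2025-01-15", "Europe/London", is_end=True))  # 1736985599
print(parse_bound("2024-12-31"))                                # 1735621200
```

The same calendar date maps to different timestamps depending on the timezone, which is why the default matters: the third call is five hours later than the first because New York is on UTC-5 in December.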
Output
The actor stores the results in a dataset with the following structure for each record:
```json
{
  "url": "string",                  // Original URL used for scraping
  "data": {                         // Raw data from Algolia API
    "hits": [                       // Array of story items
      {
        "title": "string",          // Story title
        "url": "string",            // Story URL
        "author": "string",         // Author username
        "points": number,           // Number of upvotes
        "num_comments": number,     // Number of comments
        "story_id": number,         // Unique story ID
        "created_at_i": number,     // Unix timestamp of creation
        "created_at": "string",     // ISO timestamp of creation (e.g., "2024-01-01T16:24:53Z")
        "updated_at": "string",     // ISO timestamp of last update
        "_tags": string[],          // Array of tags (e.g., ["story", "author_username", "story_id"])
        "children": number[],       // Array of child comment IDs
        "objectID": "string",       // Unique object ID
        "story_text": "string",     // Optional: Text content for self posts
        "_highlightResult": {       // Search highlighting information
          "title": {
            "value": "string",
            "matchLevel": "string",
            "matchedWords": string[]
          },
          "url": {
            "value": "string",
            "matchLevel": "string",
            "matchedWords": string[]
          },
          "author": {
            "value": "string",
            "matchLevel": "string",
            "matchedWords": string[]
          }
        }
      }
    ],
    "nbHits": number,               // Total number of hits
    "page": number,                 // Current page number
    "nbPages": number,              // Total number of pages
    "hitsPerPage": number,          // Number of hits per page
    "processingTimeMS": number      // API processing time
  },
  "scrapedAt": "string",            // ISO timestamp of when the data was collected
  "startTime": "string",            // Unix timestamp of interval start
  "endTime": "string",              // Unix timestamp of interval end
  "page": number                    // Page number in results
}
```
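Because each dataset record wraps one page of API results, most analyses will first want to flatten records into individual stories. The sketch below shows one way to do that, using the field names from the structure above. The function name `iter_stories` and the sample record values are made up for illustration only.

```python
def iter_stories(records):
    """Yield one flat dict per story from page-level dataset records."""
    for record in records:
        for hit in record["data"]["hits"]:
            yield {
                "id": hit["objectID"],
                "title": hit["title"],
                "url": hit.get("url"),  # treated as optional in this sketch
                "author": hit["author"],
                "points": hit["points"],
                "num_comments": hit["num_comments"],
                "created_at": hit["created_at"],
            }


# Made-up sample shaped like the documented output (not real data).
sample_records = [
    {
        "data": {
            "hits": [
                {"objectID": "101", "title": "Example A", "url": "https://example.com/a",
                 "author": "alice", "points": 10, "num_comments": 2,
                 "created_at": "2024-12-31T16:24:53Z"},
                {"objectID": "102", "title": "Example B (text post)",
                 "author": "bob", "points": 3, "num_comments": 0,
                 "created_at": "2024-12-31T17:00:00Z"},
            ]
        },
        "page": 0,
    }
]

stories = list(iter_stories(sample_records))
print([story["id"] for story in stories])  # ['101', '102']
```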
Usage
- Subscribe to the actor in the Apify Store
- Input the desired date range using any supported format
- Optionally specify a timezone
- Run the actor
- Access results in the "Dataset" tab
Example Use Cases
- Content Analysis: Track trending topics and discussions over time
- Research: Analyze historical Hacker News data for patterns
- Monitoring: Keep track of specific topics or companies
- Data Mining: Build datasets for machine learning or analysis
- Time-Sensitive Analysis: Analyze posts during specific time windows (e.g., business hours)
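For the time-window use case, a filter along these lines could be applied to the scraped stories. It is a sketch under stated assumptions: "business hours" is taken to mean Monday to Friday, 09:00 to 17:00 local time, and the function name `in_business_hours` is hypothetical. It relies only on the `created_at_i` field from the output.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def in_business_hours(hit, tz_name="America/New_York", start_hour=9, end_hour=17):
    """Return True if the story was created on a weekday within the window."""
    created = datetime.fromtimestamp(hit["created_at_i"], tz=timezone.utc)
    local = created.astimezone(ZoneInfo(tz_name))
    # weekday(): Monday is 0, Sunday is 6.
    return local.weekday() < 5 and start_hour <= local.hour < end_hour


# Made-up timestamps for illustration.
hits = [
    {"created_at_i": 1736953200},  # 2025-01-15 15:00 UTC, Wednesday 10:00 in New York
    {"created_at_i": 1736910000},  # 2025-01-15 03:00 UTC, Tuesday 22:00 in New York
    {"created_at_i": 1737212400},  # 2025-01-18 15:00 UTC, Saturday 10:00 in New York
]

print([in_business_hours(hit) for hit in hits])  # [True, False, False]
```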
Resource Requirements
- Memory: 2048 MB
- Compute Units: Based on date range and number of results
Actor Metrics
- 1 monthly user
- No stars yet
- Created in Jan 2025
- Modified 3 days ago