Hacker News Scraper

Pricing

from $2.00 / 1,000 results

Scrape Hacker News stories, comments, and user profiles via the official Firebase API. Get top, new, best, Ask HN, and Show HN stories with scores, comments, and author data.

Developer

cloud9

Maintained by Community

Hacker News Story Scraper

Extract tech stories, jobs, and discussions from Hacker News using the official Firebase API. Get top/best/new stories, Ask HN, Show HN, and job postings with full metadata.

Features

  • Official API-based - Uses hacker-news.firebaseio.com, so there is no HTML scraping and blocking is unlikely
  • 6 story types - Top, Best, New, Ask HN, Show HN, Jobs
  • Score filtering - Filter by minimum upvotes
  • Keyword search - Search in titles (case-insensitive)
  • Full metadata - Title, URL, score, author, comment count, timestamp, HN link

Use Cases

  • Tech trend monitoring - Track trending technologies and discussions
  • Startup/product research - Discover new products and startup launches
  • Competitive intelligence - Monitor competitor mentions and discussions
  • Content curation - Find quality content for newsletters/social media
  • Recruitment - Browse job postings from tech companies
  • Market research - Analyze what the tech community is interested in

Input Parameters

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| storyType | string | Yes | "top" | Type of stories: top, best, new, ask, show, job |
| maxResults | number | No | 50 | Maximum stories to extract (1-500) |
| minScore | number | No | - | Only include stories with at least this many upvotes |
| keyword | string | No | - | Only include stories whose title contains this keyword |
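The following hypothetical sketch shows how the documented defaults and limits fit together. The actor's own validation code is not published here, so `normalize_input` is an illustrative name. The sketch also assumes that omitting storyType falls back to its listed default of "top".

```python
from typing import Any

# Allowed values and limits as listed in the input table (assumed to be enforced this way).
STORY_TYPES = {"top", "best", "new", "ask", "show", "job"}
MAX_RESULTS_LIMIT = 500


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Apply the documented defaults and reject out-of-range values (hypothetical helper)."""
    story_type = raw.get("storyType", "top")
    if story_type not in STORY_TYPES:
        raise ValueError(f"unknown storyType: {story_type!r}")

    max_results = int(raw.get("maxResults", 50))
    if not 1 <= max_results <= MAX_RESULTS_LIMIT:
        raise ValueError(f"maxResults must be 1-{MAX_RESULTS_LIMIT}, got {max_results}")

    return {
        "storyType": story_type,
        "maxResults": max_results,
        # Optional filters: None means "do not filter on this field".
        "minScore": raw.get("minScore"),
        "keyword": raw.get("keyword"),
    }
```

For example, `normalize_input({"storyType": "show", "minScore": 10})` fills in `maxResults: 50` and leaves `keyword` as `None`.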

Story Types

  • Top Stories - Currently trending stories
  • Best Stories - Best stories based on HN algorithm
  • New Stories - Most recently submitted stories
  • Ask HN - Questions and discussions
  • Show HN - Project/product showcases
  • Jobs - Job postings

Output Format

Each story includes:

{
  "id": 39631123,
  "title": "Show HN: I built a tool to analyze Hacker News trends",
  "url": "https://example.com/hn-analyzer",
  "score": 342,
  "author": "techfounder",
  "commentCount": 87,
  "postedAt": "2024-02-12T10:30:00.000Z",
  "type": "story",
  "hnUrl": "https://news.ycombinator.com/item?id=39631123",
  "scrapedAt": "2024-02-12T15:45:00.000Z"
}

Field Descriptions

  • id - Unique HN story ID
  • title - Story title
  • url - External URL (null for Ask HN/text posts)
  • score - Number of upvotes
  • author - HN username of submitter
  • commentCount - Total number of comments in the discussion
  • postedAt - Submission timestamp (ISO 8601)
  • type - Story type (story, job, poll, etc.)
  • hnUrl - Direct link to HN discussion page
  • scrapedAt - Timestamp when data was extracted
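The sketch below shows how a raw HN API item could be turned into the record above. The API's own field names (`by`, `time` in Unix seconds, `descendants`, `url`, `score`, `type`) come from the official item format. The mapping itself is an assumption inferred from the output field names, and `to_output_record` is a hypothetical helper, not the actor's code.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def to_iso_z(dt: datetime) -> str:
    """Format a datetime as UTC in the style 2024-02-12T10:30:00.000Z."""
    return dt.astimezone(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")


def to_output_record(item: dict[str, Any], scraped_at: Optional[datetime] = None) -> dict[str, Any]:
    """Map an HN API item to the documented output fields (assumed mapping)."""
    scraped_at = scraped_at or datetime.now(timezone.utc)
    return {
        "id": item["id"],
        "title": item.get("title"),
        "url": item.get("url"),                      # missing for Ask HN / text posts -> None
        "score": item.get("score", 0),
        "author": item.get("by"),                    # the API calls the submitter "by"
        "commentCount": item.get("descendants", 0),  # total comments in the thread
        "postedAt": to_iso_z(datetime.fromtimestamp(item["time"], tz=timezone.utc)),
        "type": item.get("type"),
        "hnUrl": f"https://news.ycombinator.com/item?id={item['id']}",
        "scrapedAt": to_iso_z(scraped_at),
    }
```

Passing `scraped_at` explicitly keeps the function deterministic, which makes it easier to test.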

Example Usage

Top stories mentioning AI with a score of at least 50

{
  "storyType": "top",
  "maxResults": 30,
  "minScore": 50,
  "keyword": "AI"
}

Recent Show HN projects

{
  "storyType": "show",
  "maxResults": 100,
  "minScore": 10
}

Job postings from YC companies

{
  "storyType": "job",
  "maxResults": 50,
  "keyword": "YC"
}

Highly upvoted Ask HN questions

{
  "storyType": "ask",
  "maxResults": 25,
  "minScore": 100
}

Pricing

Approximately $2.50 per 1,000 stories (based on compute units)

Cost Estimation

| Stories | Approx. cost | Approx. duration |
|---|---|---|
| 50 | $0.12 | ~30 seconds |
| 100 | $0.25 | ~1 minute |
| 500 | $1.25 | ~5 minutes |

Estimates cover the run's compute time, which is dominated by the 0.5 s delay between item requests. The HN API itself is free to call.
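The table's arithmetic can be checked with a short calculation. This sketch assumes that cost scales linearly at the stated ~$2.50 per 1,000 stories and that the fixed 0.5 s delay sets a lower bound on run time. `estimate_run` is an illustrative name.

```python
# Rates from the pricing section above; both are approximations.
USD_PER_1000_STORIES = 2.50
DELAY_SECONDS = 0.5  # pause between item requests


def estimate_run(stories: int) -> tuple[float, float]:
    """Return (approx. cost in USD, lower-bound duration in seconds) for a run."""
    cost = stories * USD_PER_1000_STORIES / 1000
    # One item request per story, each separated by the fixed delay;
    # network latency and actor startup come on top of this.
    duration = stories * DELAY_SECONDS
    return cost, duration
```

For example, `estimate_run(500)` returns `(1.25, 250.0)`. That is 250 s of delay alone, which is why the table rounds the total up to about 5 minutes once request latency and startup are added. When filters trigger the 2x candidate fetch, expect proportionally longer runs.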

Tips & Best Practices

Filtering Strategy

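Filters (minScore/keyword) are applied after stories are fetched, so a filtered run can return fewer than maxResults stories:

  • The actor fetches up to 2x maxResults candidate stories to find enough matches
  • If you need, say, 50 filtered stories and still get too few, set maxResults higher (100-150) and trim the output afterwards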
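The candidate cap described above can be sketched as follows. This is a hypothetical illustration, not the actor's source: `collect_stories` and the injected `fetch_item` callable are assumptions. The loop stops after 2 × max_results fetches or as soon as enough stories match, whichever comes first.

```python
from typing import Any, Callable, Iterable, Optional


def collect_stories(
    story_ids: Iterable[int],
    fetch_item: Callable[[int], Optional[dict[str, Any]]],
    max_results: int,
    min_score: Optional[int] = None,
    keyword: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Fetch up to 2x max_results candidates and keep those that pass the filters."""
    budget = 2 * max_results  # documented cap on candidate fetches
    needle = keyword.lower() if keyword else None
    matches: list[dict[str, Any]] = []
    for fetched, story_id in enumerate(story_ids):
        if fetched >= budget or len(matches) >= max_results:
            break
        item = fetch_item(story_id)
        if item is None:
            continue
        if min_score is not None and item.get("score", 0) < min_score:
            continue
        if needle and needle not in (item.get("title") or "").lower():
            continue
        matches.append(item)
    return matches
```

When matching stories are sparse, the budget runs out before `max_results` matches are found. That is the shortfall the tip above works around by raising maxResults.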

Story Type Selection

  • Top - Most balanced view of current trending content
  • Best - Highest quality stories (better signal-to-noise)
  • New - Real-time monitoring, catch stories early
  • Ask - Community discussions, Q&A, career advice
  • Show - New product launches, side projects
  • Job - Tech job opportunities, mostly from startups

Rate Limiting

  • The actor waits 0.5 s between requests to the HN API
  • 50 stories = ~30 seconds
  • 500 stories = ~5 minutes
  • Blocking is unlikely: the official API currently documents no rate limit

Data Freshness

  • Stories are fetched in real-time from HN API
  • Top/Best/New lists update frequently (every few minutes)
  • Job postings update less frequently

Keyword Matching

  • Case-insensitive search
  • Matches anywhere in title
  • Examples: "AI", "LLM", "YC", "startup", "open source"
  • For multiple keywords, run the actor once per keyword and merge the results (deduplicating by story id)
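Here is a minimal sketch of the matching rule and the merge step. It assumes a plain case-insensitive substring test. `matches_keyword` and `merge_runs` are hypothetical helpers, not part of the actor.

```python
from typing import Any, Iterable


def matches_keyword(title: str, keyword: str) -> bool:
    """Case-insensitive substring match anywhere in the title."""
    return keyword.casefold() in title.casefold()


def merge_runs(*runs: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Combine several runs' outputs, keeping the first copy of each story id."""
    seen: set[int] = set()
    merged: list[dict[str, Any]] = []
    for run in runs:
        for story in run:
            if story["id"] not in seen:
                seen.add(story["id"])
                merged.append(story)
    return merged
```

Substring matching can produce false positives. `matches_keyword("Maintaining a fork", "AI")` is `True` because "maintaining" contains "ai", so short keywords may need extra filtering on your side.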

Technical Details

API Endpoints Used

  • Story IDs: https://hacker-news.firebaseio.com/v0/{type}stories.json
  • Story details: https://hacker-news.firebaseio.com/v0/item/{id}.json
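Below is a small sketch of how the storyType values could map onto these endpoints. The list names (`topstories`, `beststories`, `newstories`, `askstories`, `showstories`, `jobstories`) are the official HN API endpoints. The helper names are hypothetical, and `fetch_json` is a plain standard-library GET that the actor may not use.

```python
import json
import urllib.request
from typing import Any

BASE = "https://hacker-news.firebaseio.com/v0"

# Map the actor's storyType values to HN API list names.
LIST_ENDPOINTS = {
    "top": "topstories",
    "best": "beststories",
    "new": "newstories",
    "ask": "askstories",
    "show": "showstories",
    "job": "jobstories",
}


def list_url(story_type: str) -> str:
    """URL returning a JSON array of story ids for the given storyType."""
    return f"{BASE}/{LIST_ENDPOINTS[story_type]}.json"


def item_url(item_id: int) -> str:
    """URL returning one item's JSON (or null if it does not exist)."""
    return f"{BASE}/item/{item_id}.json"


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """GET a URL and decode its JSON body (no authentication needed)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

A run would call `fetch_json(list_url("top"))` once and then `fetch_json(item_url(i))` for each id it keeps.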

Rate Limiting

  • 0.5 second delay between story detail requests
  • Public API, no authentication required
  • No rate limit is currently enforced by the API

Error Handling

  • Continues on individual story fetch failures
  • Logs warnings for failed requests
  • Returns all successfully fetched stories
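The rate-limiting and error-handling behaviour above can be sketched as a simple loop. This assumes a per-item fetch function is injected. `fetch_all` and `fetch_item` are hypothetical names; in practice `fetch_item` would GET the item URL listed under API Endpoints Used. Requests are sequential, with a fixed pause between them. Failures are logged and skipped, and missing items (which the API can return as null) are dropped.

```python
import logging
import time
from typing import Any, Callable, Iterable, Optional

log = logging.getLogger("hn-scraper")


def fetch_all(
    item_ids: Iterable[int],
    fetch_item: Callable[[int], Optional[dict[str, Any]]],
    delay: float = 0.5,
) -> list[dict[str, Any]]:
    """Fetch items one at a time, pausing between requests and skipping failures."""
    results: list[dict[str, Any]] = []
    for index, item_id in enumerate(item_ids):
        if index > 0:
            time.sleep(delay)  # fixed pause between item requests
        try:
            item = fetch_item(item_id)
        except Exception as exc:  # network error, bad JSON, etc.
            log.warning("failed to fetch item %s: %s", item_id, exc)
            continue
        if item is not None:  # deleted or missing items come back as null
            results.append(item)
    return results
```

Injecting `fetch_item` keeps the loop testable without network access (pass `delay=0` in tests).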

Data Quality

  • All data comes directly from the official HN API
  • No HTML scraping, so no page-parsing errors
  • Scores and comment counts are snapshots taken at scrapedAt and keep changing afterwards

Common Use Cases

1. Startup Trend Analysis

Track what startups are launching and getting traction:

{
  "storyType": "show",
  "maxResults": 200,
  "minScore": 20
}

2. AI/ML News Monitoring

Stay updated on AI developments:

{
  "storyType": "best",
  "maxResults": 100,
  "keyword": "AI"
}

3. Job Board Scraping

Build a job aggregator:

{
  "storyType": "job",
  "maxResults": 500
}

4. Content Curation

Find high-quality content for newsletters:

{
  "storyType": "best",
  "maxResults": 50,
  "minScore": 100
}

Limitations

  • Maximum 500 stories per run (API limitation: the top/new/best lists hold at most 500 ids, and the Ask HN, Show HN, and Job lists at most 200)
  • Keyword search is simple substring match (not full-text search)
  • Rate limited to ~120 stories/minute (to respect HN API)
  • No access to comment content (only comment counts)

Support

For issues or feature requests, please contact the actor maintainer.

License

This actor is provided as-is for use on the Apify platform.