Hacker News Data Scraper

Pricing

Pay per usage

Unlock the pulse of the tech world by scraping Hacker News. Extract top stories, Ask HN and Show HN posts, and job listings, along with their comment IDs, from Y Combinator's platform. Well suited to market research, sentiment analysis, and tracking startup trends with fast, structured data.

Developer

Shahid Irfan

Maintained by Community

Extract comprehensive data from Hacker News using the official API. Collect stories and job postings from different categories, including top stories, new stories, best stories, Ask HN, Show HN, and job listings. Each record includes the IDs of its direct comments, so discussions can be fetched afterwards. Useful for monitoring trends, analyzing community engagement, and building datasets for research.

Features

  • Complete Story Data — Extract titles, scores, comments, and metadata
  • Multiple Categories — Collect from top, new, best, ask, show, and job stories
  • Fast API Extraction — Direct access to official Hacker News data
  • Structured JSON Output — Consistent format for all data types
  • Rate Limit Respect — Built-in delays for responsible data collection
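As a rough illustration of what direct API extraction involves, the Python sketch below fetches a category's ID list and then each item from the public Hacker News API at hacker-news.firebaseio.com, pausing between requests. It is not the actor's source code: the function names, the 0.1-second delay, and the injectable `fetch` and `sleep` parameters are assumptions chosen so the logic can be tested offline.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Iterator

# Base URL of the official Hacker News API (hosted on Firebase).
HN_API = "https://hacker-news.firebaseio.com/v0"


def fetch_json(url: str) -> Any:
    """Download and decode one JSON document from the HN API."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_stories(
    story_type: str,
    limit: int,
    fetch: Callable[[str], Any] = fetch_json,
    delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield up to `limit` items from one story list, pausing after each request."""
    # One request returns the ranked list of item IDs for the category.
    ids = fetch(f"{HN_API}/{story_type}.json") or []
    for item_id in ids[:limit]:
        # One request per item returns its full record.
        item = fetch(f"{HN_API}/item/{item_id}.json")
        if item is not None:  # the API returns null for unknown IDs
            yield item
        sleep(delay)  # assumed fixed pause; the actor's real delay is not documented
```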

Use Cases

Community Research

Analyze trending topics and user engagement patterns on Hacker News. Understand what content resonates with the tech community and track discussion trends over time.

Job Market Intelligence

Monitor startup job postings and career opportunities. Track hiring trends across different tech companies and identify emerging roles in the industry.

Content Analysis

Build comprehensive datasets for machine learning and natural language processing. Study user behavior, content patterns, and community dynamics.

News Monitoring

Stay updated on the latest tech news and discussions. Automatically collect and analyze stories that matter to your research or business.


Input Parameters

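| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| storyType | String | No | topstories | Type of stories to collect: topstories, newstories, beststories, askstories, showstories, jobstories |
| results_wanted | Integer | No | 20 | Maximum number of stories to collect (1-500). The askstories, showstories, and jobstories lists contain at most 200 items. |
| proxyConfiguration | Object | No | {"useApifyProxy": false} | Proxy settings (optional for the HN API) |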
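If you generate inputs from code, a small validator helps catch typos before a run starts. This sketch checks values against the parameter table above. The function name and the `use_proxy` flag are hypothetical conveniences, and the actor itself may validate differently.

```python
from typing import Any

# Allowed storyType values, taken from the parameter table.
STORY_TYPES = {
    "topstories", "newstories", "beststories",
    "askstories", "showstories", "jobstories",
}


def build_run_input(
    story_type: str = "topstories",
    results_wanted: int = 20,
    use_proxy: bool = False,
) -> dict[str, Any]:
    """Return an actor input object, rejecting values outside the documented ranges."""
    if story_type not in STORY_TYPES:
        raise ValueError(f"unknown storyType: {story_type!r}")
    if not 1 <= results_wanted <= 500:
        raise ValueError("results_wanted must be between 1 and 500")
    return {
        "storyType": story_type,
        "results_wanted": results_wanted,
        "proxyConfiguration": {"useApifyProxy": use_proxy},
    }
```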

Output Data

Each item in the dataset contains:

| Field | Type | Description |
|---|---|---|
| id | Integer | Unique item ID |
| type | String | Item type: story, job, comment, poll, or pollopt |
| title | String | Story title |
| by | String | Author username |
| score | Integer | Story score (points) |
| descendants | Integer | Total number of comments |
| time | Integer | Creation time as a Unix timestamp (seconds) |
| timestamp | String | Creation time in ISO 8601 format |
| url | String | Original story URL (if external) |
| text | String | Story text content (HTML) |
| text_clean | String | Story text content (plain text) |
| hn_url | String | Hacker News discussion URL |
| kids | Array | IDs of direct child comments, in ranked display order |
| deleted | Boolean | Whether the item is deleted |
| dead | Boolean | Whether the item is dead |
| parent | Integer | Parent item ID (for comments) |
| poll | Integer | Associated poll ID (for poll options) |
| parts | Array | Related poll option IDs (for polls) |
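The derived fields (timestamp, text_clean, hn_url) can be reproduced from a raw API item. The sketch below shows one plausible mapping. The actor's exact rules are not documented here, so treat the tag-stripping approach and the false defaults for deleted and dead as assumptions that happen to match the sample output.

```python
import html
import re
from datetime import datetime, timezone
from typing import Any, Optional

# Fields copied straight from the API item; absent ones become None.
RAW_FIELDS = ("id", "type", "title", "by", "score", "descendants", "time",
              "url", "text", "kids", "parent", "poll", "parts")


def html_to_text(markup: Optional[str]) -> Optional[str]:
    """Strip tags, decode entities such as &#x27;, and collapse whitespace."""
    if markup is None:
        return None
    without_tags = re.sub(r"<[^>]+>", " ", markup)
    return " ".join(html.unescape(without_tags).split())


def normalize_item(raw: dict[str, Any]) -> dict[str, Any]:
    """Map one raw API item onto the output fields listed in the table."""
    item = {field: raw.get(field) for field in RAW_FIELDS}
    # The API omits deleted/dead unless they are true.
    item["deleted"] = raw.get("deleted", False)
    item["dead"] = raw.get("dead", False)
    ts = raw.get("time")
    item["timestamp"] = (
        datetime.fromtimestamp(ts, tz=timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z")
        if ts is not None
        else None
    )
    item["text_clean"] = html_to_text(raw.get("text"))
    item["hn_url"] = f"https://news.ycombinator.com/item?id={raw['id']}"
    return item
```

Because every tag becomes a space, multi-paragraph text ends up on a single line. Keep `text` if you need the paragraph breaks.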

Usage Examples

Collect Top Stories

Extract the most popular stories from Hacker News:

{
  "storyType": "topstories",
  "results_wanted": 50
}

Get New Stories

Collect the latest submissions to Hacker News:

{
  "storyType": "newstories",
  "results_wanted": 30
}

Collect Job Postings

Gather startup job listings from the community:

{
  "storyType": "jobstories",
  "results_wanted": 100
}
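To start a run from Python with one of the inputs above, something like the following should work with the apify-client package. The function takes the client as a parameter so it can be exercised without network access. Confirm the method names against the apify-client documentation for your installed version.

```python
from typing import Any


def run_and_collect(client: Any, actor_id: str, run_input: dict) -> list[dict]:
    """Start the actor, wait for it to finish, and return its dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    # call() starts the run and blocks until it finishes, returning run details.
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return run details")
    # Each run writes its results to a default dataset.
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())
```

In practice you would pass `ApifyClient(token)` as `client` and the actor's real ID as `actor_id`. The value `"<username>/hacker-news-data-scraper"` is a hypothetical placeholder.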

Sample Output

{
  "id": 45006801,
  "type": "story",
  "title": "Show HN: I built a tool to help developers write better commit messages",
  "by": "developer123",
  "score": 245,
  "descendants": 67,
  "time": 1735689600,
  "timestamp": "2025-01-01T00:00:00.000Z",
  "url": "https://github.com/developer123/commit-helper",
  "text": "<p>A simple tool that analyzes your commit messages and suggests improvements based on conventional commit standards.</p>",
  "text_clean": "A simple tool that analyzes your commit messages and suggests improvements based on conventional commit standards.",
  "hn_url": "https://news.ycombinator.com/item?id=45006801",
  "kids": [45006802, 45006803, 45006804],
  "deleted": false,
  "dead": false,
  "parent": null,
  "poll": null,
  "parts": null
}

Tips for Best Results

Choose Story Types Wisely

  • Use topstories for trending content and popular discussions
  • Select newstories for the latest submissions and fresh content
  • Pick jobstories for career opportunities and hiring trends

Optimize Collection Size

  • Start with small numbers (20-50) for testing and exploration
  • Increase to 100-200 for comprehensive data collection
  • Balance between data volume and processing time

Handle Large Datasets

  • Export results to JSON or CSV for analysis
  • Use filtering and sorting in your analysis tools
  • Consider pagination for very large collections
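For quick filtering and sorting without a spreadsheet, the sketch below keeps stories at or above a score threshold, orders them by score, and writes a CSV string using only the standard library. The threshold of 100 points and the chosen columns are arbitrary examples. Adjust them to your analysis.

```python
import csv
import io
from typing import Iterable


def top_stories_csv(
    items: Iterable[dict],
    min_score: int = 100,
    fields: tuple[str, ...] = ("id", "title", "score", "descendants", "hn_url"),
) -> str:
    """Keep items with score >= min_score, sort by score (highest first), render CSV."""
    keep = [it for it in items if (it.get("score") or 0) >= min_score]
    keep.sort(key=lambda it: it["score"], reverse=True)
    buf = io.StringIO()
    # extrasaction="ignore" drops unlisted fields such as kids; missing ones become "".
    writer = csv.DictWriter(buf, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(keep)
    return buf.getvalue()
```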

Integrations

Connect your Hacker News data with:

  • Google Sheets — Export for collaborative analysis
  • Airtable — Build searchable story databases
  • Slack — Get notifications for trending stories
  • Make — Create automated content workflows
  • Zapier — Trigger actions based on story data

Export Formats

Download data in multiple formats:

  • JSON — For developers and API integrations
  • CSV — For spreadsheet analysis and reporting
  • Excel — For business intelligence dashboards
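Apify can also serve a finished run's dataset directly in these formats through its dataset items endpoint. This helper builds such a download URL. The dataset ID is a placeholder you would take from the run, and datasets that are not public also need your API token (as a `token` query parameter or an `Authorization: Bearer` header).

```python
from urllib.parse import urlencode

# Formats listed above, mapped to the `format` values of Apify's dataset items endpoint.
EXPORT_FORMATS = {"JSON": "json", "CSV": "csv", "Excel": "xlsx"}


def dataset_export_url(dataset_id: str, fmt: str) -> str:
    """Build a download URL for a run's dataset in one of the listed formats."""
    query = urlencode({"format": EXPORT_FORMATS[fmt]})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"
```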

Frequently Asked Questions

What's the difference between story types?

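topstories follows the current front-page ranking (and also includes job posts), newstories lists the latest submissions by time, and beststories lists highly voted recent stories. askstories, showstories, and jobstories are limited to Ask HN, Show HN, and job posts respectively.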

Can I collect comments along with stories?

The current version collects story metadata only. Each story's kids array lists the IDs of its direct replies, which you can fetch separately from the Hacker News API, repeating the process for nested replies.
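Collecting a discussion means walking the `kids` IDs with one item request per comment. The sketch below does this depth-first with an explicit stack so comments stay in display order. The `fetch` parameter, the `depth` key it adds, and the default depth limit of 10 are illustrative assumptions, not part of the actor. Pass a function that downloads and decodes JSON from the URL it receives.

```python
from typing import Any, Callable, Optional

HN_ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{}.json"


def fetch_comments(
    story: dict[str, Any],
    fetch: Callable[[str], Optional[dict]],
    max_depth: int = 10,
) -> list[dict]:
    """Return the comments under `story` as a flat, depth-first list.

    Deleted and dead comments are left out, but their replies are still visited.
    """
    comments: list[dict] = []
    # Push children in reverse so the first kid is popped (and listed) first.
    stack = [(kid, 1) for kid in reversed(story.get("kids") or [])]
    while stack:
        item_id, depth = stack.pop()
        item = fetch(HN_ITEM_URL.format(item_id))
        if item is None:
            continue
        if not (item.get("deleted") or item.get("dead")):
            comments.append({**item, "depth": depth})
        if depth < max_depth:
            stack.extend((kid, depth + 1) for kid in reversed(item.get("kids") or []))
    return comments
```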

Is this using the official API?

Yes. This scraper uses the official Hacker News API, which Y Combinator makes publicly available in partnership with Firebase.

How many stories can I collect?

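You can collect up to 500 stories per run. The API's topstories and newstories lists hold up to 500 IDs each, while askstories, showstories, and jobstories hold up to 200, so runs on those categories may return fewer items than results_wanted.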

What if some fields are empty?

Some fields may be empty depending on the story type. For example, job postings may not have external URLs, and some stories may not have text content.
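When consuming the dataset, it helps to guard against the missing fields described above. This sketch formats a one-line summary that falls back to the discussion page when `url` is empty and treats a missing comment count as zero. The function name and output format are only examples.

```python
from typing import Any


def summarize(item: dict[str, Any]) -> str:
    """One-line summary that tolerates missing title, url, and descendants."""
    title = item.get("title") or "(untitled)"
    # Job posts and text-only stories may have no external URL.
    link = (item.get("url") or item.get("hn_url")
            or f"https://news.ycombinator.com/item?id={item['id']}")
    # Job posts carry no comment count.
    comments = item.get("descendants") or 0
    return f"{title} | {link} | {comments} comments"
```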


Support

For issues or feature requests, contact support through the Apify Console.

This scraper uses the official Hacker News API and complies with their terms of service. The API is provided by Y Combinator for public use. Users are responsible for ensuring compliance with applicable laws and using data responsibly.