Hacker News Scraper — Stories, Search & Front Page Data

Pricing

$5.99/month + usage

Scrape Hacker News stories from the front page or by keyword search. Get title, URL, score, author, comments, date, and tags. Sort by popular or newest. Filter by date range. No login, no proxy needed. $5.99/month. 2-hour free trial.

Rating

0.0

(0)

Developer

Scrape Pilot

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 5 days ago

🟠 Hacker News Scraper — Stories, Search & Front Page Data

The most complete Hacker News Scraper on Apify. Extract Hacker News stories from the front page, keyword search, or any topic — title, URL, score, author, comment count, post date, story text, and tags. Sort by popularity or date. Filter by date range. No login. No API key. No proxy needed. Instant structured output.



🔍 What Is This Actor?

Hacker News Scraper is a production-ready Apify actor that extracts structured data from Hacker News — including front page stories, keyword search results, and date-filtered story archives.

Choose between two modes: front page to get the current top Hacker News stories in real time, or search to find stories by any keyword or topic. Every record includes the story title, external URL, Hacker News discussion URL, upvote score, author username, comment count, post date, story text, and content tags — giving you the most complete Hacker News stories dataset available on Apify.

No login. No proxy needed. No API key. Just clean, structured Hacker News data on demand.


🚀 Why Use This Hacker News Scraper?

Feature | This Actor | Manual Browsing | HN Algolia UI | Other Scrapers
Front page stories (real-time) | ✅ Slow | ⚠️
Keyword search across all HN | ⚠️ | ⚠️
Sort by popular or newest | ✅ Built-in
Date range filter | ✅ Built-in
Score + comment count | ⚠️
Story text included | ⚠️
Tags per story | ⚠️
Bulk results (up to 1000/run) | ⚠️
No proxy required | ⚠️
Structured JSON (export-ready) | ⚠️

Bottom line: This Hacker News scraper is the only actor that combines real-time front page extraction, keyword search with date filtering, and sort options — delivering complete Hacker News stories data in a single, export-ready structured run.


🎬 Supported Modes

This Hacker News scraper operates in two modes — selected via the mode input:

| Mode | What It Does | Best For |
| --- | --- | --- |
| front_page | Extracts current top Hacker News stories in real time | Monitoring trending tech topics, daily briefings |
| search | Searches all Hacker News stories by keyword with sort and date filters | Research, topic tracking, archival |

Default mode: front_page. Switch to search and provide a query to find stories on any topic across all of Hacker News history.


🎯 Use Cases

📰 Tech News Monitoring & Daily Briefings

  • Scrape Hacker News stories from the front page daily to monitor what the tech community is discussing
  • Build automated tech news digest pipelines using scheduled front page scraper runs
  • Track trending topics and emerging technologies by monitoring front page story titles over time

🔍 Research & Topic Archiving

  • Search all Hacker News stories by keyword to build topic-specific archives — "LLM", "Rust", "startups"
  • Use date range filters to study how discussion around a technology evolved on Hacker News over time
  • Collect story URLs and scores for academic citation analysis or technology trend research

🤖 AI & NLP Datasets

  • Build training datasets from Hacker News story titles and text for tech-focused NLP models
  • Collect story titles with upvote scores as engagement-labeled data for headline quality models
  • Scrape story text and comment counts for content popularity prediction research

📊 Community & Engagement Analytics

  • Analyze upvote score distributions across topics and time periods on Hacker News
  • Track author activity by scraping stories filtered by username
  • Study correlation between story type, tags, and engagement metrics

🛠️ Developer & Content Integrations

  • Feed Hacker News stories into your own tech dashboard, Slack bot, or newsletter tool
  • Build a Hacker News trend tracker that monitors score velocity for specific keywords
  • Integrate structured HN story data into knowledge bases, wikis, or content curation platforms
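
The trend-tracker idea above comes down to comparing two runs of the same query. A minimal sketch, assuming each run's records were saved as a list of dicts carrying the `story_id` and `score` output fields. The function name `score_velocity` and the snapshot variables are hypothetical and not part of the actor:

```python
from typing import Dict, List


def score_velocity(earlier: List[dict], later: List[dict], hours_between: float) -> Dict[str, float]:
    """Return points gained per hour for stories present in both snapshots."""
    if hours_between <= 0:
        raise ValueError("hours_between must be positive")
    earlier_scores = {rec["story_id"]: rec["score"] for rec in earlier}
    velocity: Dict[str, float] = {}
    for rec in later:
        story_id = rec["story_id"]
        if story_id in earlier_scores:
            velocity[story_id] = (rec["score"] - earlier_scores[story_id]) / hours_between
    return velocity


# Two hypothetical snapshots taken two hours apart.
snapshot_9am = [{"story_id": "1", "score": 100}, {"story_id": "2", "score": 40}]
snapshot_11am = [{"story_id": "1", "score": 160}, {"story_id": "3", "score": 12}]
print(score_velocity(snapshot_9am, snapshot_11am, hours_between=2.0))  # {'1': 30.0}
```

Stories that appear in only one snapshot are skipped, since a velocity needs two observations.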

🎓 Academic & Social Computing Research

  • Study information diffusion and community curation on Hacker News over multi-year archives
  • Analyze posting patterns, peak hours, and seasonal trends in tech community discussion
  • Build datasets of tech-topic stories for computational social science research

⚙️ Input Parameters

{
  "mode": "front_page",
  "query": "",
  "sort": "popular",
  "tags": "story",
  "max_results": 30,
  "date_from": null,
  "date_to": null,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| mode | string | "front_page" | Extraction mode: "front_page" for current top stories, "search" for keyword search |
| query | string | "" | Search keyword, required when mode is "search". E.g. "AI", "open source", "YC" |
| sort | string | "popular" | Sort order for search results: "popular" for highest score first, "date" for newest first |
| tags | string | "story" | Content type filter: "story" for articles, "comment" for comments, "ask_hn" for Ask HN posts, "show_hn" for Show HN posts |
| max_results | integer | 30 | Maximum number of stories to return (up to 1,000 per run) |
| date_from | string | null | Filter stories posted after this date: Unix timestamp or ISO date string |
| date_to | string | null | Filter stories posted before this date: Unix timestamp or ISO date string |
| proxyConfiguration | object | Off | Optional proxy config; not required for Hacker News |

Tip: For front_page mode, query, sort, and date_from/date_to are ignored — the actor fetches the current live top stories directly. Switch to search mode to use all filters.
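
The mode rules above can be checked before a run is started. The following is a sketch only, assuming the parameter names and defaults from the table. The helper `build_run_input` is hypothetical and not part of the actor. It drops the fields that front_page mode ignores and passes ISO date strings through after validating them:

```python
from datetime import datetime
from typing import Any, Dict, Optional


def _validated_iso(value: str) -> str:
    """Return the ISO date string unchanged, raising ValueError if it is malformed."""
    datetime.fromisoformat(value)
    return value


def build_run_input(
    mode: str = "front_page",
    query: str = "",
    sort: str = "popular",
    tags: str = "story",
    max_results: int = 30,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an input object, keeping only the fields the chosen mode uses."""
    if mode not in ("front_page", "search"):
        raise ValueError(f"unknown mode: {mode!r}")
    if not 1 <= max_results <= 1000:
        raise ValueError("max_results must be between 1 and 1000")

    run_input: Dict[str, Any] = {"mode": mode, "tags": tags, "max_results": max_results}
    if mode == "front_page":
        return run_input  # query, sort, and date filters are ignored in this mode

    if not query.strip():
        raise ValueError("query is required when mode is 'search'")
    if sort not in ("popular", "date"):
        raise ValueError(f"unknown sort: {sort!r}")
    run_input.update({"query": query, "sort": sort})
    if date_from is not None:
        run_input["date_from"] = _validated_iso(date_from)
    if date_to is not None:
        run_input["date_to"] = _validated_iso(date_to)
    return run_input


print(build_run_input(mode="search", query="open source AI", date_from="2024-01-01"))
```

The resulting dict can be pasted into the actor's JSON input or passed to whichever client starts the run.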


📋 Output Fields

Every record from this Hacker News scraper includes:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| story_id | string | Hacker News internal story ID | "39847293" |
| title | string | Story headline | "Show HN: I built an open-source LLM router" |
| url | string | External article URL | "https://github.com/..." |
| hn_url | string | Direct HN discussion thread URL | "https://news.ycombinator.com/item?id=..." |
| score | integer | Total upvote score | 847 |
| author | string | HN username of the poster | "pg" |
| num_comments | integer | Total number of comments | 312 |
| created_at | string | Post date and time | "2024-03-15T09:30:00.000Z" |
| story_text | string | Story body text for Ask HN / text posts (max 3000 chars) | "I spent 6 months building..." |
| tags | array | HN content tags | ["story", "author_pg", "front_page"] |
| type | string | Content type | "story" |
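
The `tags` array mixes content labels with an author tag. A small sketch of separating the two, assuming the author tag always carries the `author_` prefix seen in the example value. The helper name `split_tags` is hypothetical:

```python
from typing import List, Optional, Tuple

AUTHOR_PREFIX = "author_"


def split_tags(tags: List[str]) -> Tuple[Optional[str], List[str]]:
    """Separate the author tag (if any) from the remaining content labels."""
    author = None
    labels = []
    for tag in tags:
        if tag.startswith(AUTHOR_PREFIX):
            author = tag[len(AUTHOR_PREFIX):]
        else:
            labels.append(tag)
    return author, labels


print(split_tags(["story", "author_pg", "front_page"]))  # ('pg', ['story', 'front_page'])
```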

📦 Example Input & Output

Input — front page:

{
  "mode": "front_page",
  "max_results": 5
}

Input — keyword search:

{
  "mode": "search",
  "query": "open source AI",
  "sort": "popular",
  "max_results": 20
}

Output (one record):

{
  "story_id": "39847293",
  "title": "Show HN: I built an open-source LLM router in Go",
  "url": "https://github.com/user/llm-router",
  "hn_url": "https://news.ycombinator.com/item?id=39847293",
  "score": 847,
  "author": "techfounder",
  "num_comments": 312,
  "created_at": "2024-03-15T09:30:00.000Z",
  "story_text": null,
  "tags": ["story", "author_techfounder", "front_page"],
  "type": "story"
}
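
As an illustration of working with a record like the one above, the sketch below parses `created_at` and derives two simple values. The derived names (`comments_per_point`, `is_text_post`) are illustrative choices, not fields the actor returns:

```python
from datetime import datetime, timezone

record = {
    "story_id": "39847293",
    "score": 847,
    "num_comments": 312,
    "created_at": "2024-03-15T09:30:00.000Z",
    "story_text": None,
}


def parse_created_at(value: str) -> datetime:
    """Parse the ISO 8601 timestamp; the trailing 'Z' is rewritten for Python < 3.11."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


created = parse_created_at(record["created_at"])
# Guard against division by zero for stories with no points.
comments_per_point = record["num_comments"] / record["score"] if record["score"] else 0.0
# In the example record, a link post has story_text set to null.
is_text_post = record["story_text"] is not None

print(created.date().isoformat(), round(comments_per_point, 3), is_text_post)
# 2024-03-15 0.368 False
```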

💰 Pricing & Free Trial

| Plan | Price | Includes |
| --- | --- | --- |
| Free Trial | $0 | 2 hours full access, no credit card required |
| Monthly | $5.99 / month | Unlimited runs, front page + search, all filters |

Everything included in every plan:

  • ✅ Real-time front page Hacker News stories
  • ✅ Keyword search across all HN history
  • ✅ Sort by popularity or newest
  • ✅ Date range filter
  • ✅ Content type filter (stories, Ask HN, comments)
  • ✅ Up to 1,000 results per run
  • ✅ No proxy required — works out of the box
  • ✅ JSON + CSV + Excel export from Apify dataset
  • ✅ Scheduled runs for automated daily briefings

Start your 2-hour free trial now — no credit card needed. Click Try for free at the top of this page.


⚡ Performance & Limits

| Mode | Count | Estimated Time |
| --- | --- | --- |
| Front page | 30 stories | ~30–60 seconds |
| Front page | 100 stories | ~1–2 minutes |
| Search | 50 stories | ~20–40 seconds |
| Search | 500 stories | ~2–4 minutes |

  • Results pushed to the Apify dataset in real time as each page is processed
  • Automatic pagination — fetches as many pages as needed to reach max_results
  • No proxy required — Hacker News data is publicly accessible without any IP restrictions
  • Lightweight and fast — no browser, no JavaScript rendering overhead
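
The pagination behaviour described above follows a common pattern: request pages in order until enough records are collected or the source runs dry. The sketch below shows that pattern in general terms. It is not the actor's internal code, and `fetch_page`, the page size of 20, and the fake data are all assumptions made for illustration:

```python
from typing import Callable, List


def collect(fetch_page: Callable[[int], List[dict]], max_results: int) -> List[dict]:
    """Request pages in order until max_results records are gathered or a page is empty."""
    results: List[dict] = []
    page = 0
    while len(results) < max_results:
        batch = fetch_page(page)
        if not batch:
            break  # the source has no more data
        # Trim the final batch so the total never exceeds max_results.
        results.extend(batch[: max_results - len(results)])
        page += 1
    return results


# Stand-in data source: 45 fake records served 20 per page.
FAKE_STORIES = [{"story_id": str(i)} for i in range(45)]


def fake_fetch(page: int) -> List[dict]:
    return FAKE_STORIES[page * 20 : (page + 1) * 20]


print(len(collect(fake_fetch, 30)), len(collect(fake_fetch, 100)))  # 30 45
```

The second call returns 45 rather than 100 because the loop stops at the first empty page.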

❓ FAQ

Q: Do I need a proxy to use this Hacker News scraper?
A: No. Hacker News data is publicly accessible, so no proxy, login, or API key is required. The actor works out of the box for runs of up to 1,000 results.

Q: What is the difference between front_page and search mode?
A: front_page mode fetches the current live top stories from Hacker News in real time, exactly what you see on the HN homepage right now. search mode queries the full HN story archive by keyword, with optional sort order and date range filters.

Q: Can I filter Hacker News stories by date range?
A: Yes, in search mode. Set date_from and date_to to Unix timestamps or ISO date strings to restrict results to a specific time window. Date filters have no effect in front_page mode.

Q: What does the tags field contain?
A: The tags array contains HN content classification tags, typically the content type ("story", "ask_hn", "show_hn"), the author tag ("author_username"), and placement tags like "front_page".

Q: Can I scrape Ask HN or Show HN posts specifically?
A: Yes. Set tags to "ask_hn" or "show_hn" in search mode to filter results to only those post types.

Q: What is the maximum number of stories I can get per run?
A: Up to 1,000 stories per run. Set max_results to any value up to 1,000 and the actor paginates automatically.

Q: Can I schedule this to run daily for a tech news digest?
A: Yes. Set up an Apify scheduled task with mode: "front_page" to automatically collect the current top Hacker News stories every day, or every hour.

Q: Can I export to Excel or CSV?
A: Yes. All results are pushed to the Apify dataset, which can be exported to JSON, CSV, Excel, and more directly from the Apify Console after each run.
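
The Console export covers most needs. For post-processing a downloaded JSON export locally, a minimal sketch using only the standard library might look like this. The column order follows the output fields table; the helper name `records_to_csv` and the choice to join `tags` with semicolons are assumptions:

```python
import csv
import io
from typing import Iterable

FIELDS = [
    "story_id", "title", "url", "hn_url", "score", "author",
    "num_comments", "created_at", "story_text", "tags", "type",
]


def records_to_csv(records: Iterable[dict]) -> str:
    """Serialize records to CSV text, flattening the tags array into one cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for rec in records:
        row = dict(rec)
        row["tags"] = ";".join(rec.get("tags") or [])
        writer.writerow(row)
    return buffer.getvalue()


sample = [{"story_id": "1", "title": "A, B", "score": 10, "tags": ["story", "front_page"]}]
print(records_to_csv(sample))
```

Missing fields are written as empty cells, and values containing commas are quoted by the `csv` module.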


📜 Changelog

v1.0.0 (Current)

  • ✅ Front page mode — real-time top Hacker News stories
  • ✅ Search mode — full HN archive keyword search
  • ✅ Sort by popularity or newest
  • ✅ Date range filter for search mode
  • ✅ Content type filter — stories, Ask HN, Show HN, comments
  • ✅ Full output: title, URL, HN URL, score, author, comments, date, text, tags
  • ✅ Automatic pagination up to max_results
  • ✅ No proxy required
  • ✅ Real-time dataset push as each page is processed

🏷️ Tags

hacker news scraper, hacker news stories, hacker news, hn scraper, tech news scraper, ycombinator scraper, hacker news data, hn stories, hacker news search, tech community data, hn front page, show hn scraper


This actor accesses publicly visible story data on Hacker News in the same way a regular user browses the platform.

Please note:

  • Use extracted Hacker News stories only for lawful purposes — research, news monitoring, NLP datasets, content curation, and academic study are common legitimate uses
  • Hacker News story content belongs to the original authors — do not republish scraped content without appropriate attribution
  • Respect Hacker News community norms — do not use this tool to spam, manipulate rankings, or harvest user data for targeting
  • The actor developer is not responsible for how extracted data is used

🤝 Support & Feedback

  • Bug report? Contact us via the Apify actor page
  • Feature request? Post in the Apify Community forum
  • Loving it? Please leave a ⭐ review — it helps other users find this actor!

Built with ❤️ on Apify