Y Combinator News Scraper
Get the latest news from the Y Combinator Hacker News page. Output fields: title, score, author, timestamp, discussion link, and body. Rows are saved only when the article text is successfully extracted. Pick 20–200 stories (default 100). Export as CSV or JSON.
Pricing
from $8.00 / 1,000 results
Rating
0.0
(0)
Developer
Marco Rodrigues
Actor stats
Bookmarked
0
Total users
2
Monthly active users
1
Last modified
9 days ago
🚀 Y Combinator News Scraper
Want the latest submissions from Y Combinator Hacker News together with the actual article text from each linked source? This actor does both in one run.
It always starts from Y Combinator’s HN “newest” feed at news.ycombinator.com/newest. There it reads titles, scores, authors, and each story’s outbound URL (GitHub, blogs, newspapers, another HN item page, whatever the submitter linked). It then opens those destination sites in a browser and extracts the main body text. So the corpus is not “only HN HTML”: metadata comes from the YC-run listing, while content depends on each submission’s source.
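To make that two-stage flow concrete, here is a minimal sketch using requests, BeautifulSoup, and trafilatura instead of the actor’s actual Playwright-based stack. The CSS selectors mirror public HN markup; everything else is illustrative, not the actor’s code.

```python
from urllib.parse import urljoin

import requests
import trafilatura
from bs4 import BeautifulSoup

# Stage 1: read story metadata from the fixed /newest listing.
listing = requests.get("https://news.ycombinator.com/newest", timeout=30)
soup = BeautifulSoup(listing.text, "html.parser")

stories = []
for row in soup.select("tr.athing"):              # one row per story
    link = row.select_one("span.titleline > a")   # title + outbound URL
    meta = row.find_next_sibling("tr")            # score/author sit in the next row
    score = meta.select_one("span.score")
    author = meta.select_one("a.hnuser")
    stories.append({
        "id": row["id"],
        "title": link.get_text(),
        # Ask HN posts use relative links (item?id=...), so resolve them.
        "news_link": urljoin("https://news.ycombinator.com/", link["href"]),
        "points": int(score.get_text().split()[0]) if score else None,
        "author": author.get_text() if author else None,
        "hn_discuss_url": f"https://news.ycombinator.com/item?id={row['id']}",
    })

# Stage 2: open each destination page and extract the main body text.
for story in stories[:5]:
    html = requests.get(story["news_link"], timeout=30).text
    story["content"] = trafilatura.extract(html)  # None if nothing extractable
```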

💡 Perfect for...
- Researchers & analysts: Track what’s being submitted and pull readable text from the original publisher or repo page.
- Newsletters & dashboards: Combine HN metadata (
points,author,hn_discuss_url) with article excerpts for digests. - 📚 RAG systems: Index
title,news_link, andcontentso answers can cite both the HN context and what the linked page actually says.
✨ Why you'll love this scraper
- 🧹 Clean saves: Rows are pushed to the dataset only when non-empty body text is extracted; blocked or empty pages are skipped rather than stored as hollow rows (see the sketch after this list).
- 👤 Structured HN fields: Every saved item includes ids, title, outbound link, site label, score, author, timestamps, discussion URL, plus `content`.
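As a rough illustration of that clean-save guard, assuming the Apify Python SDK’s `Actor.push_data` and a page’s raw HTML as input (the surrounding crawler wiring is omitted):

```python
import trafilatura
from apify import Actor

async def save_if_extracted(item: dict, html: str) -> None:
    """Push a dataset row only when real body text came through."""
    content = trafilatura.extract(html)  # returns None on failure
    if content and content.strip():
        item["content"] = content
        await Actor.push_data(item)      # saved: non-empty body text
    # otherwise: skip silently, no hollow row is stored
```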
📦 What's inside the data?
For every story that yields extractable text, you will get:
- HN listing: `id`, `title`, `news_link`, `site_domain`, `points`, `author`, `posted_at_iso`, `posted_at_human`, `hn_discuss_url`
- From the linked source: `content` (plain article text from the destination page when extraction succeeds)
🚀 Quick start
- Decide how many stories you want (`max_news`). The actor collects that many unique items from /newest (using More if needed).
- Start the actor on Apify; there is no listing URL to paste, since the feed URL is fixed.
- Export the default dataset as CSV, Excel, or JSON when the run finishes, or fetch it programmatically as sketched below.
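To run the actor and fetch results programmatically, a minimal sketch with the official `apify-client` package; the actor ID below is a placeholder, so copy the real one from this actor’s page:

```python
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Placeholder actor ID: substitute the real one from the actor's Apify page.
run = client.actor("marco-rodrigues/y-combinator-news-scraper").call(
    run_input={"max_news": 50},
)

# Iterate the default dataset produced by the run.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["news_link"])
```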
Tech details for developers 🧑‍💻
Input Example:
```json
{ "max_news": 100 }
```
Output Example:
```json
{
  "id": "47824343",
  "title": "HTTP11Probe – Probe web frameworks for compliance",
  "news_link": "https://www.http-probe.com/",
  "site_domain": "http-probe.com",
  "points": 1,
  "author": "MDA2AV",
  "posted_at_iso": "2026-04-19T13:55:47",
  "posted_at_human": "1 minute ago",
  "hn_discuss_url": "https://news.ycombinator.com/item?id=47824343",
  "content": "An open testing platform that probes HTTP/1.1 servers against RFC 9110/9112 requirements, smuggling vectors, and malformed input handling. Add your framework, get compliance results automatically.\nHttp11Probe sends a suite of crafted HTTP requests to each server and checks whether the response matches the exact expected behavior from the RFCs. Every server is tested identically, producing a side-by-side compliance comparison.\nHttp11Probe is open source and built for contributions. Add your HTTP server to the leaderboard, or write new test cases to expand coverage.\nEvery new framework added makes the comparison more useful for the entire community, and every new test strengthens the compliance bar for all servers on the platform. If you’ve found an edge case that isn’t covered, or you maintain a framework that isn’t listed yet, your contribution directly improves HTTP security and interoperability for everyone."
}
```
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `max_news` | integer | No | Target number of unique stories from /newest (via More), then opened for content. Default 100, min 20, max 200 (see .actor/input_schema.json). |
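As an illustration only (the real bounds are enforced by `.actor/input_schema.json`), equivalent clamping in an actor’s entry point could look like this; the variable names are illustrative:

```python
from apify import Actor

async def main() -> None:
    async with Actor:
        actor_input = await Actor.get_input() or {}
        # Mirror the schema bounds: default 100, clamped to [20, 200].
        max_news = int(actor_input.get("max_news", 100))
        max_news = max(20, min(200, max_news))
        Actor.log.info(f"Collecting up to {max_news} stories from /newest")
```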
Stack: Python, Apify SDK, Crawlee PlaywrightCrawler, Playwright, trafilatura. Article requests do not treat 401/403/429 responses as session-killers, so the handler can still attempt extraction; extraction that yields empty text still produces no dataset row. A sketch of this tolerance follows.
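A rough sketch of that tolerance using plain Playwright and trafilatura rather than the actor’s Crawlee wiring; the helper name, status set, and flow are illustrative:

```python
import trafilatura
from playwright.async_api import async_playwright

SOFT_ERRORS = {401, 403, 429}  # tolerated, not treated as session-killers

async def fetch_article_text(url: str) -> str | None:
    """Load a page and try to extract body text even on soft HTTP errors."""
    async with async_playwright() as pw:
        browser = await pw.chromium.launch()
        page = await browser.new_page()
        response = await page.goto(url, timeout=30_000)
        status = response.status if response else None
        if status and status >= 400 and status not in SOFT_ERRORS:
            await browser.close()
            return None                # hard failure: give up on this URL
        html = await page.content()    # soft errors often still carry HTML
        await browser.close()
    return trafilatura.extract(html)   # None means the caller skips the row
```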
Local run: from this actor directory, install dependencies and run `playwright install`, then use `apify run` so `input.json` is applied, or wire up input the way your environment expects for `python -m src`.