Crunchbase News Scraper

Pricing

from $1.00 / 1,000 results

Extract startup, funding, M&A, and tech news articles from news.crunchbase.com, including title, content, author, date, categories, tags, and featured image. Uses the public WordPress REST API. No proxy required.

Rating

5.0 (4)

Developer

Crawler Bros

Maintained by Community

Actor stats

  • Bookmarked: 7
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 4 days ago

Extract startup, funding, M&A, and tech news articles from news.crunchbase.com — Crunchbase's editorial news site with 8,500+ articles covering venture capital, AI, startups, IPOs, and more. Returns full content, author, date, categories, tags, and featured image.

Features

  • 16 output fields per article — flat schema with typed defaults (zero nulls)
  • 8,500+ articles indexed — going back years
  • Filter by keyword, category, or date
  • Public WordPress REST API — no authentication, no proxy, no cookies
  • Pagination — walks pages until maxItems reached (100 per page)
  • Pre-resolved category and tag names (not just IDs)
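
The pagination behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: `collect_posts` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for an HTTP request to the posts endpoint so the example runs offline.

```python
from typing import Callable, Dict, List

PER_PAGE = 100  # WordPress REST API maximum for per_page


def collect_posts(fetch_page: Callable[[int, int], List[Dict]], max_items: int) -> List[Dict]:
    """Walk pages 1, 2, 3, ... until max_items posts are collected or a page comes back short."""
    posts: List[Dict] = []
    page = 1
    while len(posts) < max_items:
        batch = fetch_page(page, PER_PAGE)
        posts.extend(batch)
        if len(batch) < PER_PAGE:  # a short or empty page means there are no more results
            break
        page += 1
    return posts[:max_items]


def fake_fetch(page: int, per_page: int) -> List[Dict]:
    """Hypothetical stand-in for GET /wp-json/wp/v2/posts?page=N&per_page=100."""
    total = 250  # pretend 250 posts match the query
    start = (page - 1) * per_page
    return [{"id": i} for i in range(start, min(start + per_page, total))]


print(len(collect_posts(fake_fetch, 120)))
```

Stopping on a short page avoids one extra request past the last page; a real client could also read the `X-WP-TotalPages` response header that WordPress sends.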

Note on Crunchbase Data

The main crunchbase.com site (organization profiles, funding rounds, search) is gated behind login + Cloudflare and cannot be scraped without authenticated session cookies. This scraper instead targets news.crunchbase.com, which runs WordPress and exposes its content via the public REST API. The news site covers most of the same topics (funding rounds, M&A, IPOs, market analysis) as the gated database, just in editorial article form.

Input

| Field | Type | Description |
| --- | --- | --- |
| search | String | Free-text keyword (searches title + content) |
| category | String | Category slug to filter by (e.g., venture, startups, ai, crypto, cybersecurity, ipo, ma, web3, saas). Friendly aliases like artificial-intelligence → ai and fintech → fintech-ecommerce are also accepted. |
| after | String | Only articles published after this date (YYYY-MM-DD or ISO 8601) |
| maxItems | Integer | Maximum articles to return (default 50, max 1000) |

Example Input

{
    "search": "Series A",
    "category": "artificial-intelligence",
    "after": "2026-01-01",
    "maxItems": 100
}
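
For readers curious how input like this could translate into a WordPress REST API request, here is a hedged sketch. The function name `build_posts_url` and the category ID `42` are made up for illustration; the actor's real request-building logic is not documented here. The query parameters themselves (`search`, `categories`, `after`, `per_page`, `page`) are standard WordPress REST API parameters.

```python
import re
from typing import Optional
from urllib.parse import urlencode

API_URL = "https://news.crunchbase.com/wp-json/wp/v2/posts"
DATE_ONLY = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def build_posts_url(actor_input: dict, category_id: Optional[int] = None, page: int = 1) -> str:
    """Translate actor-style input into a WordPress REST API posts query (illustrative only)."""
    # Clamp maxItems to the documented range: default 50, max 1000
    max_items = min(max(int(actor_input.get("maxItems", 50)), 1), 1000)
    params = {"per_page": min(max_items, 100), "page": page}
    if actor_input.get("search"):
        params["search"] = actor_input["search"]
    if category_id is not None:
        # The posts endpoint filters by numeric category ID, so a slug must be resolved first
        params["categories"] = category_id
    after = actor_input.get("after", "")
    if after:
        # WordPress expects an ISO 8601 date-time, so pad a bare YYYY-MM-DD date
        params["after"] = after + "T00:00:00" if DATE_ONLY.match(after) else after
    return API_URL + "?" + urlencode(params)


# 42 is a placeholder; a real ID would come from /wp-json/wp/v2/categories?slug=ai
print(build_posts_url({"search": "Series A", "after": "2026-01-01", "maxItems": 100}, category_id=42))
```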

Output

Each article has 16 fields. Every field is always present: a missing value falls back to a typed default (empty string, zero, or empty array), never null.

Identity

| Field | Type | Description |
| --- | --- | --- |
| id | Integer | Article ID |
| url | String | Full article URL |
| slug | String | URL slug |
| title | String | Article title (HTML stripped) |

Content

| Field | Type | Description |
| --- | --- | --- |
| excerpt | String | Short summary (HTML stripped, ~500 chars) |
| content | String | Full article content (HTML stripped, truncated to 5,000 chars) |
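
As a rough idea of what "HTML stripped, truncated" can mean in practice, the sketch below removes tags with a regular expression, decodes entities, collapses whitespace, and cuts the result to a character limit. The helper name `strip_html` and the regex approach are assumptions made for illustration; the actor may clean content differently.

```python
import html
import re


def strip_html(raw: str, limit: int) -> str:
    """Remove tags, decode entities, collapse whitespace, then truncate to `limit` characters."""
    text = re.sub(r"<[^>]+>", " ", raw)       # replace each tag with a space
    text = html.unescape(text)                # &amp; -> &, and so on
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    return text[:limit]


print(strip_html("<p>Seed &amp; <b>Series A</b></p>", 5000))
```

A regex is enough for a sketch; an HTML parser is the safer choice when the markup may contain scripts, comments, or stray angle brackets.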

Dates

| Field | Type | Description |
| --- | --- | --- |
| publishedDate | String | Publication date (ISO 8601) |
| modifiedDate | String | Last modified date (ISO 8601) |

Author & Taxonomy

| Field | Type | Description |
| --- | --- | --- |
| authorId | Integer | Primary author ID |
| authorName | String | Primary author name |
| categoryNames | Array | Category names (e.g., ["AI", "Business"]) |
| categoryIds | Array | Category IDs |
| tagNames | Array | Tag names |
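
One way names like these can be resolved without extra requests is the WordPress `_embed` query parameter, which attaches related authors and terms to each post under `_embedded`. Whether this actor uses `_embed` is an assumption; the helpers `term_names` and `author_name` and the sample post below are invented for illustration.

```python
from typing import Dict, List


def term_names(post: Dict, taxonomy: str) -> List[str]:
    """Collect term names of one taxonomy from a post fetched with ?_embed."""
    groups = post.get("_embedded", {}).get("wp:term", [])
    return [term["name"] for group in groups for term in group
            if term.get("taxonomy") == taxonomy]


def author_name(post: Dict) -> str:
    """Return the first embedded author's name, or an empty string if none is embedded."""
    authors = post.get("_embedded", {}).get("author", [])
    return authors[0].get("name", "") if authors else ""


# Made-up fragment shaped like a WordPress REST response with _embed
sample = {
    "categories": [11, 12],
    "_embedded": {
        "author": [{"id": 5, "name": "Jane Doe"}],
        "wp:term": [
            [{"id": 11, "name": "AI", "taxonomy": "category"},
             {"id": 12, "name": "Business", "taxonomy": "category"}],
            [{"id": 90, "name": "seed funding", "taxonomy": "post_tag"}],
        ],
    },
}

print(term_names(sample, "category"), term_names(sample, "post_tag"), author_name(sample))
```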

Media

| Field | Type | Description |
| --- | --- | --- |
| featuredImageUrl | String | Featured image URL |
| featuredImageId | Integer | Featured image ID |

Metadata

| Field | Type | Description |
| --- | --- | --- |
| scrapedAt | String | ISO 8601 scrape timestamp |
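
The "always present, never null" guarantee can be pictured as a defaults table applied to every record. The sketch below is an assumption about how such normalization might look, not the actor's code; `DEFAULTS` simply mirrors the 16 fields documented above and `with_typed_defaults` is a hypothetical helper.

```python
from datetime import datetime, timezone

# One typed default per documented output field
DEFAULTS = {
    "id": 0, "url": "", "slug": "", "title": "",
    "excerpt": "", "content": "",
    "publishedDate": "", "modifiedDate": "",
    "authorId": 0, "authorName": "",
    "categoryNames": [], "categoryIds": [], "tagNames": [],
    "featuredImageUrl": "", "featuredImageId": 0,
    "scrapedAt": "",
}


def with_typed_defaults(record: dict) -> dict:
    """Return a record with all 16 fields, replacing missing or None values with defaults."""
    out = {}
    for field, default in DEFAULTS.items():
        value = record.get(field)
        if value is None:
            # Copy list defaults so records never share one mutable list
            value = list(default) if isinstance(default, list) else default
        out[field] = value
    if not out["scrapedAt"]:
        out["scrapedAt"] = datetime.now(timezone.utc).isoformat()
    return out


row = with_typed_defaults({"id": 7, "title": None})
print(sorted(row))
```

Downstream consumers can then index any field without null checks, which is the practical benefit of a flat schema with typed defaults.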

FAQ

Q: Do I need a proxy? No. news.crunchbase.com runs WordPress and exposes its REST API publicly at /wp-json/wp/v2/posts. Works directly from datacenter IPs.

Q: How many articles per page? 100 (the WordPress API maximum). The scraper paginates automatically.

Q: How do I find category slugs? Browse to news.crunchbase.com, click any category, and the URL is /category/{slug}/. Verified working slugs (by article volume): venture, startups, business, ai, public, fintech-ecommerce, health-wellness-biotech, cybersecurity, transportation, clean-tech-and-energy, crypto, ma, data, liquidity, ipo, media-entertainment, seed, enterprise, agtech-foodtech, web3, real-estate-property-tech, semiconductors-and-5g, robotics, retail, saas. Friendly aliases (artificial-intelligence, fintech, health, m&a, etc.) are auto-mapped to the canonical slug.
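
The alias handling described in this answer amounts to a small lookup table. The sketch below includes only the two alias targets this page states explicitly (artificial-intelligence to ai, fintech to fintech-ecommerce); the other aliases are not spelled out here, so they are left out rather than guessed. `ALIASES` and `canonical_slug` are hypothetical names.

```python
# Alias -> canonical slug; only mappings stated on this page are included
ALIASES = {
    "artificial-intelligence": "ai",
    "fintech": "fintech-ecommerce",
}


def canonical_slug(category: str) -> str:
    """Normalize case and whitespace, then map a friendly alias to its canonical slug."""
    key = category.strip().lower()
    return ALIASES.get(key, key)  # unknown values pass through unchanged


print(canonical_slug("Artificial-Intelligence"), canonical_slug("venture"))
```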

Q: Can I get organization data (logos, funding rounds, etc.)? Not from this scraper. The main crunchbase.com database is gated. This actor scrapes the news site only. Use it when you need editorial commentary, market analysis, and funding-round summaries.

Q: How fresh is the data? news.crunchbase.com publishes new articles daily. The WordPress API serves the live database — new articles appear within minutes of publication.
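
Because new articles show up quickly, a common pattern is to run the scraper on a schedule and feed the newest publishedDate from one run into the `after` input of the next. The helper below is a sketch of that bookkeeping under two assumptions: `next_after` is a hypothetical name, and the timestamps are timezone-naive ISO 8601 strings.

```python
from datetime import datetime
from typing import Dict, List


def next_after(articles: List[Dict], previous_after: str) -> str:
    """Return the newest publishedDate seen, or the previous cutoff if nothing is newer."""
    newest = datetime.fromisoformat(previous_after)
    for article in articles:
        published = article.get("publishedDate", "")
        if published:  # skip the empty-string default
            newest = max(newest, datetime.fromisoformat(published))
    return newest.isoformat()


# Made-up results from a previous run
batch = [
    {"publishedDate": "2026-01-05T10:00:00"},
    {"publishedDate": "2026-01-03T08:30:00"},
    {"publishedDate": ""},
]
print(next_after(batch, "2026-01-01"))
```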

Use Cases

  • Funding round monitoring — track new VC deals via news coverage
  • Market analysis — extract content from sector reports (AI, fintech, climate)
  • Competitive intelligence — search for mentions of competitors / partners
  • Content syndication — feed Crunchbase News into your own newsroom
  • Trend tracking — aggregate by tag/category to identify rising topics
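
As an illustration of the trend-tracking use case, the sketch below counts how often each tag or category appears across scraped articles. The function name `top_terms` and the sample records are made up; real input would be the actor's dataset items.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_terms(articles: List[Dict], field: str = "tagNames", n: int = 3) -> List[Tuple[str, int]]:
    """Count occurrences of each name in an array field and return the n most common."""
    counts = Counter(name for article in articles for name in article.get(field, []))
    return counts.most_common(n)


# Made-up records using the documented output fields
articles = [
    {"categoryNames": ["AI", "Venture"], "tagNames": ["seed funding"]},
    {"categoryNames": ["AI"], "tagNames": ["seed funding", "robotics"]},
    {"categoryNames": ["AI", "Venture", "Fintech"], "tagNames": ["seed funding"]},
]

print(top_terms(articles, "categoryNames"))
```

Running the same count over successive date windows (using the `after` input) would show which topics are rising rather than merely frequent.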