G2 & Capterra Review Scraper

Pricing

from $10.00 / 1,000 results

Scrape B2B software reviews, pros and cons, ratings, and reviewer metadata from G2 and Capterra pages for SaaS competitor research.

Rating: 0.0 (0)

Developer: 太郎 山田 (Maintained by Community)

Actor stats: 0 bookmarked, 7 total users, 3 monthly active users, last modified 4 days ago

G2 & Capterra Review Intelligence API | B2B Review Analytics

Extract normalized user feedback from G2 and Capterra review pages to build datasets for machine learning models and LLM applications. The scraper is aimed at data teams, AI researchers, and developers who need to turn large volumes of unstructured B2B software reviews into one consistent, structured schema. Each run fetches the review pages you supply and extracts ratings, review text, and reviewer metadata, so sentiment analysis pipelines work from current data.

Easily scrape overall product ratings, individual review text, granular pros and cons, and detailed reviewer backgrounds (such as industry, role, and company size). Data engineers use this tool to schedule recurring extraction jobs, automatically ingesting fresh customer sentiment into internal databases, vector stores, or business intelligence tools.

To keep automated workflows running, the scraper is built to tolerate layout changes and blocked requests. If a target page changes its structure, the tool emits explicit warnings and degrades gracefully, returning whatever metadata it can still extract. Whether you need scraped data to train custom AI models, analyze enterprise vendor reputation, or model market trends, this API delivers structured results from two major B2B software review platforms.

Store Quickstart

  • Start with 1–5 review URLs and a modest reviewLimit (25–50).
  • Mix G2 and Capterra in one run only after you validate the schema on a small sample.
  • Use dryRun: true for validation-only checks before larger monitoring runs.

Features

  • Dual-source support: Accepts both G2 (g2.com/products/*/reviews) and Capterra (capterra.com/p/*/reviews/) URLs
  • Automatic source detection: Classifies each URL and routes to the correct parser
  • Normalized output: Unified review schema across both platforms — product metadata, ratings, individual reviews with pros/cons, rating breakdowns
  • Structural change warnings: Emits explicit warnings when page shapes change or access is blocked, rather than silently returning bad data
  • Graceful degradation: Returns partial results (metadata without reviews) when possible, with clear status indicators
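
As a rough illustration of the automatic source detection described above, the following Python sketch classifies a URL by host and path. The URL shapes come from the Features list; the function name `detect_source` and the exact matching rules are assumptions, not the actor's actual implementation.

```python
import re
from urllib.parse import urlparse

# Path shapes taken from the Features list above. The actor's real rules
# are not documented, so these patterns are an assumption.
G2_PATH = re.compile(r"^/products/[^/]+/reviews/?$")
CAPTERRA_PATH = re.compile(r"^/p/.+/reviews/?$")


def detect_source(url: str) -> str:
    """Return "g2", "capterra", or "unknown" for a review page URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "g2.com" and G2_PATH.match(parsed.path):
        return "g2"
    if host == "capterra.com" and CAPTERRA_PATH.match(parsed.path):
        return "capterra"
    return "unknown"
```

A pre-check like this lets you drop unsupported URLs before starting a run, rather than discovering them in the output.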

Use Cases

| Who | Why |
| --- | --- |
| B2B product marketers | Benchmark category leaders and review themes across two major software-review sites |
| RevOps / sales enablement | Track competitor proof points, objections, and complaint patterns |
| Analysts | Build one normalized review dataset across G2 and Capterra |
| Agencies | Monitor multiple software products with a reusable schema |

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| reviewPageUrls | string[] | (required) | G2 or Capterra review page URLs |
| reviewLimit | integer | 200 | Max reviews to collect per product (1–5000) |
| delivery | string | "dataset" | "dataset" or "webhook" |
| webhookUrl | string | | Webhook URL for delivery mode |
| dryRun | boolean | false | Validate input without fetching |
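
The field rules in the table above can be checked on the client side before a run is started. The sketch below is one way to do that in Python. The helper name `validate_input` is hypothetical, and the rule that `webhookUrl` must be set when `delivery` is "webhook" is an assumption inferred from the field description, not documented behavior.

```python
from urllib.parse import urlparse

ALLOWED_DELIVERY = {"dataset", "webhook"}


def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []

    urls = run_input.get("reviewPageUrls")
    if not isinstance(urls, list) or not urls:
        problems.append("reviewPageUrls must be a non-empty list of URLs")
    else:
        for url in urls:
            if not isinstance(url, str) or urlparse(url).scheme not in ("http", "https"):
                problems.append(f"not an http(s) URL: {url!r}")

    # bool is a subclass of int in Python, so reject it explicitly.
    limit = run_input.get("reviewLimit", 200)
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 5000:
        problems.append("reviewLimit must be an integer from 1 to 5000")

    delivery = run_input.get("delivery", "dataset")
    if not isinstance(delivery, str) or delivery not in ALLOWED_DELIVERY:
        problems.append('delivery must be "dataset" or "webhook"')
    elif delivery == "webhook" and not run_input.get("webhookUrl"):
        # Assumption: webhook delivery needs a target URL.
        problems.append("webhookUrl is required when delivery is webhook")

    if not isinstance(run_input.get("dryRun", False), bool):
        problems.append("dryRun must be a boolean")

    return problems
```

This complements `dryRun: true`, which performs the validation inside the actor itself.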

Input Examples

Example: Single product on G2

{
  "products": [
    {
      "source": "g2",
      "slug": "salesforce-sales-cloud"
    }
  ],
  "maxReviewsPerProduct": 50
}

Example: Capterra alternative comparison

{
  "products": [
    {
      "source": "capterra",
      "slug": "hubspot-crm"
    },
    {
      "source": "capterra",
      "slug": "pipedrive"
    }
  ],
  "maxReviewsPerProduct": 30
}

Example: Multi-source rating delta

{
  "products": [
    {
      "source": "g2",
      "slug": "asana"
    },
    {
      "source": "capterra",
      "slug": "asana"
    }
  ],
  "emitDeltas": true
}

Output

Each product in the products array contains:

  • source: "g2" or "capterra"
  • productName, overallRating, totalReviewCount, ratingBreakdown
  • vendorName, categoryName (when available)
  • reviews[]: normalized reviews with title, rating, date, author, body, pros, cons, verified
  • status: "ok", "partial", "blocked", or "error"
  • fetchError: error details if fetch failed

The meta section reports implementationStatus: "live", "partial", "degraded", or "no_valid_sources".
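
Because every product carries a status, downstream code can separate usable entries from blocked or failed ones before analysis. The following Python sketch shows one possible approach. The function name `summarize_products` and the choice of flattened columns are illustrative assumptions; only the field names come from the list above.

```python
from collections import Counter


def summarize_products(products: list[dict]) -> dict:
    """Count products per status and flatten reviews from usable entries."""
    status_counts = Counter(p.get("status", "error") for p in products)
    rows = []
    for product in products:
        if product.get("status") not in ("ok", "partial"):
            continue  # skip "blocked" and "error" entries
        for review in product.get("reviews") or []:
            rows.append({
                "source": product.get("source"),
                "productName": product.get("productName"),
                "rating": review.get("rating"),
                "pros": review.get("pros"),
                "cons": review.get("cons"),
            })
    return {"statusCounts": dict(status_counts), "rows": rows}
```

The flattened rows are convenient for loading into a dataframe or a database table, while the status counts give a quick signal of how many sources were blocked in a run.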

Output Example

{
  "source": "g2",
  "status": "ok",
  "sourceUrl": "https://www.g2.com/products/notion/reviews",
  "productName": "Notion",
  "vendorName": "Notion Labs",
  "overallRating": 4.5,
  "totalReviewCount": 4321,
  "ratingBreakdown": { "5": 1200, "4": 800, "3": 300, "2": 50, "1": 10 },
  "reviews": [
    {
      "title": "Great tool",
      "rating": 5,
      "author": "Alice",
      "pros": "Fast setup",
      "cons": "Needs offline mode",
      "verified": true
    }
  ],
  "warnings": []
}
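
A ratingBreakdown object can be used to sanity-check the headline numbers. The Python sketch below computes the number of ratings and the mean star rating from a breakdown. The helper name `breakdown_stats` is hypothetical. In the sample above, the breakdown sums to 2,360 ratings rather than the 4,321 in totalReviewCount, so the sample values should be read as illustrative rather than internally consistent.

```python
def breakdown_stats(rating_breakdown: dict) -> tuple[int, float]:
    """Return (number of ratings, mean star rating) for a ratingBreakdown object."""
    total = sum(rating_breakdown.values())
    if total == 0:
        return 0, 0.0
    # Keys are star values stored as strings ("5", "4", ...).
    weighted = sum(int(stars) * count for stars, count in rating_breakdown.items())
    return total, weighted / total


count, mean = breakdown_stats({"5": 1200, "4": 800, "3": 300, "2": 50, "1": 10})
print(count, round(mean, 2))  # 2360 4.33
```

On real data, a large gap between these derived values and overallRating or totalReviewCount may indicate a partial extraction.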

Parsing Strategy

  1. JSON-LD (structured data in <script type="application/ld+json">) — most stable
  2. Embedded app data (e.g., __NEXT_DATA__ for Capterra) — second layer
  3. Meta tags (og:title) — product name fallback
  4. HTML pattern matching — last resort for reviews

This layered approach means the actor degrades gracefully as sites change markup.
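
To make the layering concrete, here is a minimal Python sketch of layers 1 and 3 only: it looks for a product name in JSON-LD first and falls back to the og:title meta tag, recording a warning at each step. It is not the actor's code. The function name `extract_product_name`, the warning texts, and the regex-based matching are all assumptions, and a production parser would more likely use a real HTML parser.

```python
import json
import re
from typing import Optional

JSON_LD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)
# Simplification: assumes property comes before content and the title
# itself contains no quote characters.
OG_TITLE_RE = re.compile(
    r'<meta[^>]*property=["\']og:title["\'][^>]*content=["\']([^"\']*)["\']',
    re.IGNORECASE,
)


def extract_product_name(html: str) -> tuple[Optional[str], list[str]]:
    """Try JSON-LD first, then og:title; return (name, warnings)."""
    warnings = []
    for raw in JSON_LD_RE.findall(html):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            warnings.append("JSON-LD block could not be parsed")
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and isinstance(item.get("name"), str):
                return item["name"], warnings
    warnings.append("no usable JSON-LD; falling back to og:title")
    match = OG_TITLE_RE.search(html)
    if match:
        return match.group(1), warnings
    warnings.append("og:title missing; product name not found")
    return None, warnings
```

Returning the warnings alongside the value mirrors the behavior described above: a fallback still produces a result, but the caller can see that a less stable layer was used.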

Local Run

# Edit input.json with your URLs, then:
npm start
# Dry run (validation only, no network):
# Set "dryRun": true in input.json
npm start

Tests

$ npm test

Limitations

  • Reviews are extracted from server-rendered HTML; sites that render reviews only via client-side JavaScript may return partial results
  • G2 and Capterra actively protect against automated access — blocked requests are reported honestly
  • HTML parsing is inherently fragile; the actor emits warnings when expected patterns are missing

Pricing & Cost Control

Apify Store pricing is usage-based, so total cost mainly follows how many reviewPageUrls you process and how deep you sample reviews. Check the Store pricing card for the current per-event rates.

  • Keep reviewPageUrls batches small while validating source mix.
  • Lower reviewLimit for faster exploratory runs.
  • Use dataset delivery first to inspect blocked or partial results.
  • Use dryRun: true before scaling to scheduled monitoring.
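
For budgeting, a rough upper bound can be computed from the batch size and reviewLimit. The Python sketch below assumes that each collected review is billed as one result and uses the "from $10.00 / 1,000 results" figure shown on the Store page as the default rate. Both are assumptions, so check the pricing card for what is actually billed. The function name `estimate_max_cost` is hypothetical.

```python
def estimate_max_cost(url_count: int, review_limit: int,
                      usd_per_1000_results: float = 10.0) -> float:
    """Upper-bound cost in USD, assuming every review is billed as one result."""
    max_results = url_count * review_limit
    return max_results * usd_per_1000_results / 1000


# Five URLs at reviewLimit 50 gives at most 250 results.
print(estimate_max_cost(5, 50))  # 2.5
```

Actual cost will be lower whenever a product has fewer reviews than reviewLimit or a source returns a blocked or partial result.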

⭐ Was this helpful?

If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.

Bug report or feature request? Open an issue on the Issues tab of this actor.