G2 & Capterra Review Scraper
Pricing
from $10.00 / 1,000 results
Scrape B2B software reviews, pros and cons, ratings, and reviewer metadata from G2 and Capterra pages for SaaS competitor research.
Rating: 0.0 (0)
Developer: 太郎 山田
Actor stats
Bookmarked: 0
Total users: 7
Monthly active users: 3
Last modified: 4 days ago
G2 & Capterra Review Intelligence API | B2B Review Analytics
Build datasets for machine learning models and LLM applications by extracting normalized user feedback from G2 and Capterra review pages. This scraper is built for data teams, AI researchers, and developers who need to turn large volumes of unstructured B2B software reviews into a consistent, structured schema. Each run fetches the review pages you supply and extracts product-level and review-level fields, so sentiment analysis pipelines work from current data.
Easily scrape overall product ratings, individual review text, granular pros and cons, and detailed reviewer backgrounds (such as industry, role, and company size). Data engineers use this tool to schedule recurring extraction jobs, automatically ingesting fresh customer sentiment into internal databases, vector stores, or business intelligence tools.
To keep automated workflows predictable, the scraper reports layout changes and blocked requests instead of failing silently. If a target page changes its structure, it emits explicit warnings and degrades gracefully, returning whatever metadata it can still extract. Whether you need scraped data to train custom AI models, analyze enterprise vendor reputation, or track market trends, this API returns normalized results from two widely used B2B software review platforms.
Store Quickstart
- Start with 1–5 review URLs and a modest `reviewLimit` (25–50).
- Mix G2 and Capterra in one run only after you validate the schema on a small sample.
- Use `dryRun: true` for validation-only checks before larger monitoring runs.
Features
- Dual-source support: Accepts both G2 (`g2.com/products/*/reviews`) and Capterra (`capterra.com/p/*/reviews/`) URLs
- Automatic source detection: Classifies each URL and routes to the correct parser
- Normalized output: Unified review schema across both platforms — product metadata, ratings, individual reviews with pros/cons, rating breakdowns
- Structural change warnings: Emits explicit warnings when page shapes change or access is blocked, rather than silently returning bad data
- Graceful degradation: Returns partial results (metadata without reviews) when possible, with clear status indicators
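The automatic source detection described above could look roughly like the following. This is a minimal sketch, not the actor's actual code: `detect_source` is a hypothetical helper, and the matching rules are inferred only from the two URL patterns listed in this section.

```python
from typing import Optional
from urllib.parse import urlparse


def detect_source(url: str) -> Optional[str]:
    """Return "g2", "capterra", or None for an unrecognized URL.

    Hypothetical helper: the rules mirror the documented patterns
    g2.com/products/*/reviews and capterra.com/p/*/reviews/.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Non-empty path segments, e.g. ["products", "notion", "reviews"].
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) < 3 or segments[-1] != "reviews":
        return None
    if host == "g2.com" and segments[0] == "products":
        return "g2"
    if host == "capterra.com" and segments[0] == "p":
        return "capterra"
    return None


print(detect_source("https://www.g2.com/products/notion/reviews"))
print(detect_source("https://example.com/products/notion/reviews"))
```

URLs without a scheme have no hostname under `urlparse`, so this sketch treats them as unrecognized rather than guessing.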
Use Cases
| Who | Why |
|---|---|
| B2B product marketers | Benchmark category leaders and review themes across two major software-review sites |
| RevOps / sales enablement | Track competitor proof points, objections, and complaint patterns |
| Analysts | Build one normalized review dataset across G2 and Capterra |
| Agencies | Monitor multiple software products with a reusable schema |
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `reviewPageUrls` | string[] | (required) | G2 or Capterra review page URLs |
| `reviewLimit` | integer | `200` | Max reviews to collect per product (1–5000) |
| `delivery` | string | `"dataset"` | `"dataset"` or `"webhook"` |
| `webhookUrl` | string | | Webhook URL for delivery mode |
| `dryRun` | boolean | `false` | Validate input without fetching |
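Before starting a run, it can help to check an input object against the table above on your own side. The sketch below applies the documented defaults and ranges. `validate_input` is a hypothetical helper, and the rule that `webhookUrl` is required for webhook delivery is an assumption the table does not state.

```python
from typing import Any, Dict, List, Tuple

# Defaults taken from the Input table.
DEFAULTS = {"reviewLimit": 200, "delivery": "dataset", "dryRun": False}


def validate_input(raw: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Apply table defaults and return (normalized_input, errors)."""
    cfg = {**DEFAULTS, **raw}
    errors: List[str] = []

    urls = cfg.get("reviewPageUrls")
    if not isinstance(urls, list) or not urls or not all(
        isinstance(u, str) and u for u in urls
    ):
        errors.append("reviewPageUrls must be a non-empty array of strings")

    limit = cfg["reviewLimit"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 5000:
        errors.append("reviewLimit must be an integer between 1 and 5000")

    if cfg["delivery"] not in ("dataset", "webhook"):
        errors.append('delivery must be "dataset" or "webhook"')
    elif cfg["delivery"] == "webhook" and not cfg.get("webhookUrl"):
        # Assumption: webhook delivery needs a URL (not stated in the table).
        errors.append("webhookUrl is required when delivery is webhook")

    if not isinstance(cfg["dryRun"], bool):
        errors.append("dryRun must be a boolean")

    return cfg, errors


cfg, errors = validate_input(
    {"reviewPageUrls": ["https://www.g2.com/products/notion/reviews"], "reviewLimit": 25}
)
print(cfg["delivery"], errors)
```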
Input Examples
The examples below use `products`, `maxReviewsPerProduct`, and `emitDeltas`, which are not listed in the Input table. Check the actor's input schema to confirm which field names the current build accepts.
Example: Single product on G2
{"products": [{"source": "g2","slug": "salesforce-sales-cloud"}],"maxReviewsPerProduct": 50}
Example: Capterra alternative comparison
{"products": [{"source": "capterra","slug": "hubspot-crm"},{"source": "capterra","slug": "pipedrive"}],"maxReviewsPerProduct": 30}
Example: Multi-source rating delta
{"products": [{"source": "g2","slug": "asana"},{"source": "capterra","slug": "asana"}],"emitDeltas": true}
Output
Each product in the products array contains:
- `source`: `"g2"` or `"capterra"`
- `productName`, `overallRating`, `totalReviewCount`, `ratingBreakdown`
- `vendorName`, `categoryName` (when available)
- `reviews[]`: normalized reviews with `title`, `rating`, `date`, `author`, `body`, `pros`, `cons`, `verified`
- `status`: `"ok"`, `"partial"`, `"blocked"`, or `"error"`
- `fetchError`: error details if fetch failed
The meta section reports implementationStatus: "live", "partial", "degraded", or "no_valid_sources".
Output Example
    {
      "source": "g2",
      "status": "ok",
      "sourceUrl": "https://www.g2.com/products/notion/reviews",
      "productName": "Notion",
      "vendorName": "Notion Labs",
      "overallRating": 4.5,
      "totalReviewCount": 4321,
      "ratingBreakdown": { "5": 1200, "4": 800, "3": 300, "2": 50, "1": 10 },
      "reviews": [
        {
          "title": "Great tool",
          "rating": 5,
          "author": "Alice",
          "pros": "Fast setup",
          "cons": "Needs offline mode",
          "verified": true
        }
      ],
      "warnings": []
    }
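For downstream analysis it is often convenient to flatten this nested shape into one row per review. The following is a rough sketch: `flatten_reviews` is a hypothetical helper, and it assumes that `"blocked"` and `"error"` products carry no usable reviews, which this page implies but does not state outright.

```python
from typing import Any, Dict, List


def flatten_reviews(products: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return one flat dict per review, skipping products without usable reviews."""
    rows = []
    for product in products:
        # Assumption: only "ok" and "partial" products have reviews worth keeping.
        if product.get("status") not in ("ok", "partial"):
            continue
        for review in product.get("reviews", []):
            rows.append({
                "source": product.get("source"),
                "productName": product.get("productName"),
                "rating": review.get("rating"),
                "title": review.get("title"),
                "pros": review.get("pros"),
                "cons": review.get("cons"),
                "verified": review.get("verified"),
            })
    return rows


sample = [
    {
        "source": "g2",
        "status": "ok",
        "productName": "Notion",
        "reviews": [{"title": "Great tool", "rating": 5, "author": "Alice",
                     "pros": "Fast setup", "cons": "Needs offline mode",
                     "verified": True}],
    },
    {"source": "capterra", "status": "blocked", "reviews": []},
]
rows = flatten_reviews(sample)
print(len(rows), rows[0]["productName"])
```

Rows in this form load directly into a dataframe or a database table, with `source` available as a grouping key for cross-platform comparison.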
Parsing Strategy
- JSON-LD (structured data in `<script type="application/ld+json">`): most stable
- Embedded app data (e.g., `__NEXT_DATA__` for Capterra): second layer
- Meta tags (`og:title`): product name fallback
- HTML pattern matching: last resort for reviews
This layered approach means the actor degrades gracefully as sites change markup.
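To make the layered idea concrete, here is a small sketch that tries JSON-LD first and falls back to `og:title` for the product name, recording a warning each time a layer comes up empty. It uses only Python's standard library and is not the actor's implementation: `extract_product_name` and `_LayerCollector` are hypothetical names, and the embedded app data and HTML pattern layers are left out for brevity.

```python
import json
from html.parser import HTMLParser


class _LayerCollector(HTMLParser):
    """Collect JSON-LD blocks and the og:title meta tag from an HTML document."""

    def __init__(self):
        super().__init__()
        self.json_ld = []
        self.og_title = None
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "script" and attr_map.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta" and attr_map.get("property") == "og:title":
            self.og_title = attr_map.get("content")

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed block: skip it and let a lower layer answer


def extract_product_name(html):
    """Return (name, layer, warnings), trying JSON-LD first and og:title second."""
    collector = _LayerCollector()
    collector.feed(html)
    warnings = []
    for block in collector.json_ld:
        if isinstance(block, dict) and block.get("name"):
            return block["name"], "json-ld", warnings
    warnings.append("no usable JSON-LD block found")
    if collector.og_title:
        return collector.og_title, "meta", warnings
    warnings.append("og:title meta tag missing")
    return None, "none", warnings


page = '<head><meta property="og:title" content="Notion Reviews"></head>'
print(extract_product_name(page))
```

Returning the layer name alongside the value lets a caller decide how much to trust the result, which is one way a status such as `"partial"` could be derived.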
Local Run
    # Edit input.json with your URLs, then:
    npm start

    # Dry run (validation only, no network):
    # Set "dryRun": true in input.json
    npm start
Tests
    npm test
Limitations
- Reviews are extracted from server-rendered HTML; sites that render reviews only via client-side JavaScript may return partial results
- G2 and Capterra actively protect against automated access — blocked requests are reported honestly
- HTML parsing is inherently fragile; the actor emits warnings when expected patterns are missing
Related Actors
Pair this actor with other flagship intelligence APIs in the same portfolio:
- Trustpilot Review Intelligence API — add broader company reputation and reply-rate signals.
- Google Maps Review Intelligence API — compare software-brand review data with location sentiment when relevant.
- Shopify Store Intelligence API — enrich vendor or competitor research with public storefront and catalog signals.
- YouTube Channel Analytics API — combine review themes with public content cadence and audience-facing proof.
Pricing & Cost Control
Apify Store pricing is usage-based, so total cost mainly follows how many reviewPageUrls you process and how deep you sample reviews. Check the Store pricing card for the current per-event rates.
- Keep `reviewPageUrls` batches small while validating source mix.
- Lower `reviewLimit` for faster exploratory runs.
- Use dataset delivery first to inspect `blocked` or `partial` results.
- Use `dryRun: true` before scaling to scheduled monitoring.
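A back-of-the-envelope ceiling can help when sizing a run. The sketch below assumes that each collected review bills as one result at the listed starting rate of $10.00 per 1,000 results. That billing unit is an assumption, so treat the figures as rough upper bounds and rely on the Store pricing card for actual rates. `estimate_max_cost` is a hypothetical helper.

```python
def estimate_max_cost(url_count, review_limit, usd_per_1000_results=10.00):
    """Upper-bound cost in USD, assuming each collected review bills as one result."""
    max_results = url_count * review_limit
    return max_results * usd_per_1000_results / 1000


print(estimate_max_cost(5, 50))     # quickstart-sized run: 5 URLs, reviewLimit 50
print(estimate_max_cost(20, 200))   # larger batch at the default reviewLimit
```

Under these assumptions the quickstart-sized run tops out at $2.50 and the larger batch at $40.00, which shows how strongly `reviewLimit` drives the total.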
⭐ Was this helpful?
If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.
Bug report or feature request? Open an issue on the Issues tab of this actor.