Google Review Scraper

Developed by Shahzeb Naveed · Maintained by Community

Pricing: $15.00/month + usage
High-performance Google Reviews scraper built with Playwright & Node.js. Extract reviews, ratings, authors & dates from any Google business using place IDs. Optimized for speed with stealth mode, Docker-ready, Apify-compatible. Batch processing, JSON output, minimal delays.


A highly optimized Node.js application designed to scrape Google reviews with Playwright, built for compatibility with the Apify platform and Docker deployment. Input-driven with no database dependencies.

Features

  • 🚀 High Performance: Batch processing with a single browser instance
  • ⚡ Speed Optimized: Minimal delays (1-2s navigation, 0.3-0.8s scrolling)
  • 🕸️ Stealth Mode: Advanced anti-detection techniques
  • 🐳 Docker Ready: Full containerization support
  • 🔄 Apify Compatible: Ready for Apify platform deployment
  • 📥 Input Driven: Simply provide place IDs as input
  • 📄 JSON Output: Results saved as structured JSON files
  • 🎯 Precise Parsing: Extracts author, rating, date, and review text
  • 🔀 Human-like Behavior: Randomized actions with minimal delays

Performance Optimizations

This scraper is highly optimized for speed while maintaining reliability:

  • Reduced Delays: Navigation delays reduced to 1-2 seconds, scroll delays to 0.3-0.8 seconds
  • Configurable Scrolling: Default 5 scrolls (vs 8-15), customizable via maxScrolls input
  • Single Browser Instance: Reuses one browser context across multiple place IDs (sketched after this list)
  • Minimal Anti-Detection: Just enough randomization to avoid detection without sacrificing speed
  • Fast Parsing: Uses Cheerio for rapid HTML parsing
  • Batch Processing: Processes multiple place IDs in a single session
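
The single-browser pattern is the core of the batch optimization. Below is a minimal sketch of the idea, assuming Playwright's chromium launcher and the public place_id Maps URL; the function name and inline steps are illustrative, not the actor's actual internals:

const { chromium } = require('playwright');

// Reuse one browser and one context for the whole batch of place IDs.
async function scrapeBatch(placeIds) {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext();
  const results = [];
  try {
    for (const placeId of placeIds) {
      const page = await context.newPage();
      await page.goto(`https://www.google.com/maps/place/?q=place_id:${placeId}`);
      // ... scroll, capture review HTML, parse (see later sections) ...
      results.push({ placeId, html: await page.content() });
      await page.close(); // keep at most one page open at a time
    }
  } finally {
    await browser.close(); // a single launch/teardown for the whole batch
  }
  return results;
}

Paying the Chromium startup cost once per batch, instead of once per place ID, is what keeps per-ID overhead low.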

Prerequisites

  • Node.js (version 18.0.0 or higher)
  • Chrome browser (for local development)
  • Docker & Docker Compose (for containerized deployment)

Quick Start

Local Development

  1. Clone and setup:

    git clone https://github.com/MuhammadShahzeb123/google-review-scraper-2.git
    cd google-review-scraper-2
    npm install
  2. Install Playwright browsers:

    npx playwright install chromium
  3. Run with place IDs:

    # Method 1: Command line arguments
    node src/main.js ChIJN1t_tDeuEmsRUsoyG83frY4 ChIJGVtI4by3t4kRr51d_Qm_x58
    # Method 2: Environment variable
    PLACE_IDS="ChIJN1t_tDeuEmsRUsoyG83frY4,ChIJGVtI4by3t4kRr51d_Qm_x58" npm start
    # Method 3: Run example
    node example.js

Docker Deployment

  1. Using Docker Compose:

    PLACE_IDS="ChIJN1t_tDeuEmsRUsoyG83frY4,ChIJGVtI4by3t4kRr51d_Qm_x58" docker-compose up
  2. Using Docker only:

    docker build -t google-review-scraper .
    docker run --rm \
      -e PLACE_IDS="ChIJN1t_tDeuEmsRUsoyG83frY4,ChIJGVtI4by3t4kRr51d_Qm_x58" \
      -v $(pwd)/output:/app/output \
      google-review-scraper

Apify Platform

  1. Configure via Apify Input UI:

    All settings are now configurable through the Apify input interface:

    • Place IDs: List of Google Place IDs to scrape
    • Max Scrolls: Number of scroll actions (1-20, default: 5)
    • Headless Mode: Run browser in background (default: true)
    • Browser Timeout: Page load timeout in ms (default: 60000)
    • Cleanup HTML: Delete temp files after processing (default: true)
    • Log Level: Logging verbosity (error/warn/info/debug)
  2. Push to Apify:

    apify login
    apify push
  3. Run on Apify:

    apify run
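
The options listed above are declared in INPUT_SCHEMA.json. As a rough sketch of what such a schema could look like (property names inferred from this README's examples, not copied from the actual file):

{
  "title": "Google Review Scraper Input",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "placeIds": {
      "title": "Place IDs",
      "type": "array",
      "editor": "stringList",
      "description": "Google Place IDs to scrape"
    },
    "maxScrolls": {
      "title": "Max Scrolls",
      "type": "integer",
      "minimum": 1,
      "maximum": 20,
      "default": 5
    },
    "headlessMode": {
      "title": "Headless Mode",
      "type": "boolean",
      "default": true
    }
  },
  "required": ["placeIds"]
}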

Configuration

On the Apify platform, all configuration is done through the Input UI; no environment variables are needed.

Local/Standalone Environment Variables (Legacy)

Variable           Description                  Default
PLACE_IDS          Comma-separated place IDs    Example IDs
HEADLESS_MODE      Browser headless mode        false
BROWSER_TIMEOUT    Page timeout (ms)            60000
SCROLL_COUNT_MIN   Min scroll actions           8
SCROLL_COUNT_MAX   Max scroll actions           15

Input Methods

1. Command Line Arguments:

node src/main.js PLACE_ID_1 PLACE_ID_2 PLACE_ID_3

2. Environment Variable:

export PLACE_IDS="ChIJN1t_tDeuEmsRUsoyG83frY4,ChIJGVtI4by3t4kRr51d_Qm_x58"
npm start

3. Programmatic Usage:

const { scrapeReviews } = require('./src/main');

(async () => {
  const placeIds = ['ChIJN1t_tDeuEmsRUsoyG83frY4'];
  const reviews = await scrapeReviews(placeIds);
})();

4. Apify Input:

{
  "placeIds": [
    "ChIJN1t_tDeuEmsRUsoyG83frY4",
    "ChIJGVtI4by3t4kRr51d_Qm_x58"
  ],
  "headlessMode": true,
  "maxScrolls": 10
}
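
However the place IDs arrive, they end up as a plain array of strings. A sketch of how the four sources above might be merged (the precedence order here is an assumption for illustration; src/main.js may order them differently):

// Resolve place IDs from Apify input, CLI arguments, or PLACE_IDS.
function resolvePlaceIds(apifyInput) {
  if (apifyInput && Array.isArray(apifyInput.placeIds)) {
    return apifyInput.placeIds;                 // 4. Apify input
  }
  const fromArgs = process.argv.slice(2);       // 1. command line arguments
  if (fromArgs.length > 0) return fromArgs;
  return (process.env.PLACE_IDS || '')          // 2. environment variable
    .split(',')
    .map((id) => id.trim())
    .filter(Boolean);
}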

Output Format

For Apify Platform:

Each review is stored as a separate dataset item for optimal Apify integration:

{
  "reviewerName": "John Doe",
  "rating": 5,
  "reviewText": "Great place, highly recommended!",
  "reviewDate": "2 months ago",
  "placeId": "ChIJN1t_tDeuEmsRUsoyG83frY4",
  "scrapedAt": "2025-06-16T12:00:00.000Z",
  "success": true
}

Summary statistics are saved in the OUTPUT key-value store:

{
  "totalPlaceIds": 2,
  "totalReviews": 45,
  "successfulScrapes": 2,
  "failedScrapes": 0,
  "maxScrollsUsed": 5,
  "processedAt": "2025-06-16T12:00:00.000Z",
  "placeIds": ["ChIJN1t_tDeuEmsRUsoyG83frY4"],
  "scrapingResults": {"ChIJN1t_tDeuEmsRUsoyG83frY4": true}
}
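
A minimal sketch of this output pattern with the Apify SDK (Actor.pushData and Actor.setValue are real SDK calls; the placeholder data stands in for actual scrape results):

const { Actor } = require('apify');

Actor.main(async () => {
  // Placeholder results standing in for a real scrape.
  const reviews = [
    { reviewerName: 'John Doe', rating: 5, placeId: 'ChIJN1t_tDeuEmsRUsoyG83frY4', success: true },
  ];
  const summary = { totalReviews: reviews.length, failedScrapes: 0 };

  for (const review of reviews) {
    await Actor.pushData(review);            // one dataset item per review
  }
  await Actor.setValue('OUTPUT', summary);   // summary into the key-value store
});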

For Local/Standalone Usage:

Results are saved as JSON files with nested structure:

{
  "summary": {
    "totalPlaceIds": 2,
    "totalReviews": 45,
    "successfulScrapes": 2,
    "failedScrapes": 0,
    "processedAt": "2025-06-16T12:00:00.000Z",
    "placeIds": ["ChIJN1t_tDeuEmsRUsoyG83frY4"]
  },
  "reviews": [
    {
      "author": "John Doe",
      "stars": 5,
      "date": "2 months ago",
      "text": "Great place, highly recommended!",
      "placeId": "ChIJN1t_tDeuEmsRUsoyG83frY4",
      "scrapedAt": "2025-06-16T12:00:00.000Z",
      "success": true
    }
  ]
}

Project Structure

├── src/
│   ├── main.js                    # Main application logic
│   ├── scraper.js                 # Playwright browser automation
│   ├── google_reviews_parser.js   # HTML parsing with Cheerio
│   ├── config.js                  # Configuration management
│   └── apify-main.js              # Apify-compatible version
├── example.js                     # Usage example
├── Dockerfile                     # Docker container configuration
├── docker-compose.yml             # Container orchestration
├── package.json                   # Node.js dependencies
├── INPUT_SCHEMA.json              # Apify input schema
└── .env.example                   # Environment template

How It Works

  1. Input Processing: Accepts place IDs from various sources
  2. Browser Automation: Launches Chrome with stealth settings
  3. Navigation: Visits Google Maps place pages for each place ID
  4. Review Loading: Clicks Reviews tab, sorts by newest, scrolls to load more
  5. HTML Extraction: Saves review container HTML to temporary files
  6. Parsing: Uses Cheerio to extract structured review data
  7. Output: Saves results as JSON and cleans up temporary files
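
Step 6 is where Cheerio comes in. The sketch below shows the shape of that parsing pass; the selectors are placeholders, since Google's markup and class names change often and google_reviews_parser.js maintains its own:

const cheerio = require('cheerio');

// Parse one saved HTML snapshot into structured review objects.
function parseReviews(html, placeId) {
  const $ = cheerio.load(html);
  const reviews = [];
  $('.review-container').each((_, el) => {             // placeholder selector
    reviews.push({
      author: $(el).find('.author-name').text().trim(),
      stars: $(el).find('.rating').attr('aria-label'), // e.g. "5 stars"
      date: $(el).find('.review-date').text().trim(),
      text: $(el).find('.review-text').text().trim(),
      placeId,
    });
  });
  return reviews;
}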

Advanced Features

Stealth Mode

  • Custom user agent and viewport
  • Disabled automation flags
  • Random delays between actions
  • Human-like scrolling patterns
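
As a rough illustration of these four points in Playwright (the specific flag, user agent, and viewport below are assumptions, not necessarily what this actor ships with):

const { chromium } = require('playwright');

// Launch Chromium with basic anti-detection settings.
async function launchStealthy() {
  const browser = await chromium.launch({
    headless: true,
    args: ['--disable-blink-features=AutomationControlled'], // hide the automation flag
  });
  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36',
    viewport: { width: 1366, height: 768 }, // a common desktop viewport
  });
  return { browser, context };
}

Random delays and human-like scrolling are then layered on top of this context during navigation.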

Error Handling

  • Graceful failure handling per place ID
  • Screenshot capture on errors
  • Comprehensive logging
  • Automatic retry mechanisms
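
Per-place-ID isolation means one bad ID never aborts the batch. A sketch of that pattern, with a screenshot captured on failure (the error file naming is illustrative):

// Scrape one place ID; on error, save evidence and keep going.
async function scrapeOne(page, placeId) {
  try {
    await page.goto(`https://www.google.com/maps/place/?q=place_id:${placeId}`);
    // ... load and extract reviews ...
    return { placeId, success: true };
  } catch (err) {
    await page.screenshot({ path: `error-${placeId}.png`, fullPage: true });
    console.error(`Scrape failed for ${placeId}:`, err.message);
    return { placeId, success: false, error: err.message };
  }
}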

Performance Optimization

  • Single browser instance for all place IDs
  • Parallel file processing
  • Efficient memory management
  • Automatic cleanup of temporary files
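
For the parallel file processing point, a plausible sketch using fs/promises and Promise.all (the snapshot directory layout is assumed for illustration):

const fs = require('fs/promises');
const path = require('path');

// Parse every saved HTML snapshot in a directory concurrently.
async function parseAllSnapshots(dir, parseReviews) {
  const files = (await fs.readdir(dir)).filter((f) => f.endsWith('.html'));
  const batches = await Promise.all(
    files.map(async (file) => {
      const html = await fs.readFile(path.join(dir, file), 'utf8');
      return parseReviews(html, path.basename(file, '.html')); // place ID from filename
    })
  );
  return batches.flat();
}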

API Reference

scrapeReviews(placeIds)

Programmatically scrape reviews for given place IDs.

Parameters:

  • placeIds (Array<string>): Google Place IDs to scrape

Returns:

  • Promise<Array>: Array of review objects with metadata

Example:

const { scrapeReviews } = require('./src/main');

(async () => {
  const reviews = await scrapeReviews(['ChIJN1t_tDeuEmsRUsoyG83frY4']);
  console.log(`Found ${reviews.length} reviews`);
})();

Troubleshooting

Common Issues

  1. Browser not found:

    npx playwright install chromium
  2. No place IDs provided:

    • Ensure place IDs are valid Google Place IDs
    • Check input format (comma-separated for env vars)
  3. Permission errors in Docker:

    • Application runs as non-root user for security
    • Check file permissions when mounting volumes

Finding Google Place IDs

  1. Search for a business on Google Maps
  2. Look at the URL: ...place/.../@... or use the share link
  3. Extract the place ID from the URL or use Google Places API
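
If you prefer to look up place IDs programmatically, the Google Places "Find Place" endpoint returns them directly. This uses the public Places API and requires your own API key; it is independent of this scraper:

// Look up a place ID from a free-text query (Node 18+ has global fetch).
async function findPlaceId(query, apiKey) {
  const url = 'https://maps.googleapis.com/maps/api/place/findplacefromtext/json'
    + `?input=${encodeURIComponent(query)}&inputtype=textquery&fields=place_id&key=${apiKey}`;
  const res = await fetch(url);
  const data = await res.json();
  return data.candidates?.[0]?.place_id; // e.g. "ChIJN1t_tDeuEmsRUsoyG83frY4"
}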

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

License

ISC License - see LICENSE file for details.