
Google Review Scraper
High-performance Google Reviews scraper built with Playwright & Node.js. Extract reviews, ratings, authors & dates from any Google business using place IDs. Optimized for speed with stealth mode, Docker-ready, Apify-compatible. Batch processing, JSON output, minimal delays.
Pricing: $15.00/month + usage
Rating: 0.0 (0) · Total users: 2 · Monthly users: 2 · Runs succeeded: >99% · Last modified: 2 days ago
A highly optimized Node.js application designed to scrape Google reviews with Playwright, built for compatibility with the Apify platform and Docker deployment. Input-driven with no database dependencies.
Features
- High Performance: Batch processing with a single browser instance
- Speed Optimized: Minimal delays (1-2s navigation, 0.3-0.8s scrolling)
- Stealth Mode: Advanced anti-detection techniques
- Docker Ready: Full containerization support
- Apify Compatible: Ready for Apify platform deployment
- Input Driven: Simply provide place IDs as input
- JSON Output: Results saved as structured JSON files
- Precise Parsing: Extracts author, rating, date, and review text
- Human-like Behavior: Randomized actions with minimal delays
Performance Optimizations
This scraper is highly optimized for speed while maintaining reliability:
- Reduced Delays: Navigation delays reduced to 1-2 seconds, scroll delays to 0.3-0.8 seconds
- Configurable Scrolling: Default 5 scrolls (vs. 8-15), customizable via the maxScrolls input
- Single Browser Instance: Reuses the browser context across multiple place IDs
- Minimal Anti-Detection: Just enough randomization to avoid detection without sacrificing speed
- Fast Parsing: Uses Cheerio for rapid HTML parsing
- Batch Processing: Processes multiple place IDs in a single session (see the sketch below)
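A minimal sketch of that batch loop, assuming Playwright's chromium API and a commonly used place-ID URL pattern; the function below is illustrative, not the actual src/scraper.js implementation:

```js
// Illustrative batch loop: one browser instance shared across all place IDs.
const { chromium } = require('playwright');

async function scrapeAll(placeIds, maxScrolls = 5) {
  const browser = await chromium.launch({ headless: true }); // launched once per batch
  const context = await browser.newContext();                // reused for every place ID
  const htmlByPlaceId = {};
  try {
    for (const placeId of placeIds) {
      const page = await context.newPage();
      // Assumed URL pattern for opening a place by its ID.
      await page.goto(`https://www.google.com/maps/place/?q=place_id:${placeId}`, {
        waitUntil: 'domcontentloaded',
      });
      for (let i = 0; i < maxScrolls; i++) {
        await page.mouse.wheel(0, 1500);                      // trigger lazy-loaded reviews
        await page.waitForTimeout(300 + Math.random() * 500); // 0.3-0.8 s scroll delay
      }
      htmlByPlaceId[placeId] = await page.content();          // raw HTML for the parser
      await page.close();
    }
  } finally {
    await browser.close(); // single teardown instead of one per place ID
  }
  return htmlByPlaceId;
}
```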
Prerequisites
- Node.js (version 18.0.0 or higher)
- Chrome browser (for local development)
- Docker & Docker Compose (for containerized deployment)
Quick Start
Local Development
1. Clone and set up:

```bash
git clone https://github.com/MuhammadShahzeb123/google-review-scraper-2.git
cd google-review-scraper-2
npm install
```

2. Install Playwright browsers:

```bash
npx playwright install chromium
```

3. Run with place IDs:

```bash
# Method 1: Command line arguments
node src/main.js ChIJN1t_tDeuEmsRUsoyG83frY4 ChIJGVtI4by3t4kRr51d_Qm_x58

# Method 2: Environment variable
PLACE_IDS="ChIJN1t_tDeuEmsRUsoyG83frY4,ChIJGVtI4by3t4kRr51d_Qm_x58" npm start

# Method 3: Run the example
node example.js
```
Docker Deployment
1. Using Docker Compose:

```bash
PLACE_IDS="ChIJN1t_tDeuEmsRUsoyG83frY4,ChIJGVtI4by3t4kRr51d_Qm_x58" docker-compose up
```

2. Using Docker only:

```bash
docker build -t google-review-scraper .
docker run --rm \
  -e PLACE_IDS="ChIJN1t_tDeuEmsRUsoyG83frY4,ChIJGVtI4by3t4kRr51d_Qm_x58" \
  -v $(pwd)/output:/app/output \
  google-review-scraper
```
Apify Platform
1. Configure via the Apify Input UI:

All settings are configurable through the Apify input interface:
- Place IDs: List of Google Place IDs to scrape
- Max Scrolls: Number of scroll actions (1-20, default: 5)
- Headless Mode: Run browser in background (default: true)
- Browser Timeout: Page load timeout in ms (default: 60000)
- Cleanup HTML: Delete temp files after processing (default: true)
- Log Level: Logging verbosity (error/warn/info/debug)

2. Push to Apify:

```bash
apify login
apify push
```

3. Run on Apify:

```bash
apify run
```
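For orientation, a minimal sketch of how an entry point like src/apify-main.js can read these inputs, assuming the Apify SDK v3 Actor API; the defaults mirror the Input UI values above:

```js
// Illustrative input handling with the Apify SDK v3 (not the actual apify-main.js).
const { Actor } = require('apify');

async function main() {
  await Actor.init();
  const {
    placeIds = [],
    maxScrolls = 5,
    headlessMode = true,
    browserTimeout = 60000,
    cleanupHtml = true,
  } = (await Actor.getInput()) ?? {};

  if (placeIds.length === 0) throw new Error('No place IDs provided');
  // ... launch the scraper with these settings ...
  await Actor.exit();
}

main();
```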
Configuration
Apify Platform (Recommended)
All configuration is done through the Apify Input UI - no environment variables needed!
Local/Standalone Environment Variables (Legacy)
| Variable | Description | Default |
|---|---|---|
| PLACE_IDS | Comma-separated place IDs | Example IDs |
| HEADLESS_MODE | Browser headless mode | false |
| BROWSER_TIMEOUT | Page timeout (ms) | 60000 |
| SCROLL_COUNT_MIN | Min scroll actions | 8 |
| SCROLL_COUNT_MAX | Max scroll actions | 15 |
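For standalone runs, these variables can be consumed along the following lines (an illustrative sketch, not the actual config.js):

```js
// Illustrative env-var parsing for legacy standalone runs.
const placeIds = (process.env.PLACE_IDS || '')
  .split(',')
  .map((id) => id.trim())
  .filter(Boolean); // drop empty entries left by trailing commas

const config = {
  headless: process.env.HEADLESS_MODE === 'true',        // default false, per the table
  timeout: Number(process.env.BROWSER_TIMEOUT || 60000), // page timeout in ms
  scrollMin: Number(process.env.SCROLL_COUNT_MIN || 8),  // min scroll actions
  scrollMax: Number(process.env.SCROLL_COUNT_MAX || 15), // max scroll actions
};
```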
Input Methods
1. Command Line Arguments:

```bash
node src/main.js PLACE_ID_1 PLACE_ID_2 PLACE_ID_3
```

2. Environment Variable:

```bash
export PLACE_IDS="ChIJN1t_tDeuEmsRUsoyG83frY4,ChIJGVtI4by3t4kRr51d_Qm_x58"
npm start
```

3. Programmatic Usage:

```js
const { scrapeReviews } = require('./src/main');

// Wrapped in an async function: top-level await is not available in CommonJS.
(async () => {
  const placeIds = ['ChIJN1t_tDeuEmsRUsoyG83frY4'];
  const reviews = await scrapeReviews(placeIds);
})();
```

4. Apify Input:

```json
{
  "placeIds": [
    "ChIJN1t_tDeuEmsRUsoyG83frY4",
    "ChIJGVtI4by3t4kRr51d_Qm_x58"
  ],
  "headlessMode": true,
  "maxScrolls": 10
}
```
Output Format
For Apify Platform:
Each review is stored as a separate dataset item for optimal Apify integration:
{"reviewerName": "John Doe","rating": 5,"reviewText": "Great place, highly recommended!","reviewDate": "2 months ago","placeId": "ChIJN1t_tDeuEmsRUsoyG83frY4","scrapedAt": "2025-06-16T12:00:00.000Z","success": true}
Summary statistics are saved in the OUTPUT key-value store:
{"totalPlaceIds": 2,"totalReviews": 45,"successfulScrapes": 2,"failedScrapes": 0,"maxScrollsUsed": 5,"processedAt": "2025-06-16T12:00:00.000Z","placeIds": ["ChIJN1t_tDeuEmsRUsoyG83frY4"],"scrapingResults": {"ChIJN1t_tDeuEmsRUsoyG83frY4": true}}
For Local/Standalone Usage:
Results are saved as JSON files with nested structure:
{"summary": {"totalPlaceIds": 2,"totalReviews": 45,"successfulScrapes": 2,"failedScrapes": 0,"processedAt": "2025-06-16T12:00:00.000Z","placeIds": ["ChIJN1t_tDeuEmsRUsoyG83frY4"]},"reviews": [{"author": "John Doe","stars": 5,"date": "2 months ago","text": "Great place, highly recommended!","placeId": "ChIJN1t_tDeuEmsRUsoyG83frY4","scrapedAt": "2025-06-16T12:00:00.000Z","success": true}]}
Project Structure
```
├── src/
│   ├── main.js                   # Main application logic
│   ├── scraper.js                # Playwright browser automation
│   ├── google_reviews_parser.js  # HTML parsing with Cheerio
│   ├── config.js                 # Configuration management
│   └── apify-main.js             # Apify-compatible version
├── example.js                    # Usage example
├── Dockerfile                    # Docker container configuration
├── docker-compose.yml            # Container orchestration
├── package.json                  # Node.js dependencies
├── INPUT_SCHEMA.json             # Apify input schema
└── .env.example                  # Environment template
```
How It Works
- Input Processing: Accepts place IDs from various sources
- Browser Automation: Launches Chrome with stealth settings
- Navigation: Visits Google Maps place pages for each place ID
- Review Loading: Clicks Reviews tab, sorts by newest, scrolls to load more
- HTML Extraction: Saves review container HTML to temporary files
- Parsing: Uses Cheerio to extract structured review data (sketched below)
- Output: Saves results as JSON and cleans up temporary files
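To make the parsing step concrete, here is a hedged Cheerio sketch; the selectors are placeholders, since Google's obfuscated class names change frequently and the real ones live in src/google_reviews_parser.js:

```js
// Illustrative Cheerio pass over saved review HTML (placeholder selectors).
const cheerio = require('cheerio');

function parseReviews(html, placeId) {
  const $ = cheerio.load(html);
  const reviews = [];
  $('.review-container').each((_, el) => {                // placeholder selector
    const stars = $(el).find('[aria-label*="star"]').attr('aria-label');
    reviews.push({
      author: $(el).find('.author-name').text().trim(),   // placeholder selector
      stars: Number(stars?.match(/\d/)?.[0] ?? NaN),      // e.g. "5 stars" -> 5
      date: $(el).find('.review-date').text().trim(),     // placeholder selector
      text: $(el).find('.review-text').text().trim(),     // placeholder selector
      placeId,
      scrapedAt: new Date().toISOString(),
    });
  });
  return reviews;
}
```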
Advanced Features
Stealth Mode
- Custom user agent and viewport
- Disabled automation flags
- Random delays between actions
- Human-like scrolling patterns
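A minimal sketch of the kind of launch options involved, assuming Playwright (illustrative values; the scraper's actual flags and user agent live in src/scraper.js):

```js
// Illustrative stealth-leaning launch setup in Playwright.
const { chromium } = require('playwright');

async function launchStealthy() {
  const browser = await chromium.launch({
    headless: true,
    args: ['--disable-blink-features=AutomationControlled'], // mask the automation flag
  });
  const context = await browser.newContext({
    userAgent:
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
      '(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36', // example desktop UA
    viewport: { width: 1366, height: 768 },                 // common desktop size
  });
  return { browser, context };
}

// Random pause between actions for human-like pacing.
const randomDelay = (min, max) =>
  new Promise((resolve) => setTimeout(resolve, min + Math.random() * (max - min)));
```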
Error Handling
- Graceful failure handling per place ID
- Screenshot capture on errors
- Comprehensive logging
- Automatic retry mechanisms
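In code, per-place failure isolation might look like this (an illustrative sketch mirroring the behaviors above, not the actual implementation):

```js
// Illustrative per-place error handling: one failure never aborts the batch.
async function scrapePlaceSafely(page, placeId) {
  try {
    await page.goto(`https://www.google.com/maps/place/?q=place_id:${placeId}`);
    // ... extraction ...
    return { placeId, success: true };
  } catch (err) {
    await page.screenshot({ path: `error-${placeId}.png` }); // capture state on failure
    console.error(`Scrape failed for ${placeId}:`, err.message);
    return { placeId, success: false, error: err.message };
  }
}
```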
Performance Optimization
- Single browser instance for all place IDs
- Parallel file processing
- Efficient memory management
- Automatic cleanup of temporary files
API Reference
scrapeReviews(placeIds)
Programmatically scrape reviews for given place IDs.
Parameters:
- placeIds (Array<string>): Google Place IDs to scrape

Returns:
- Promise<Array>: resolves to an array of review objects with metadata
Example:
```js
const { scrapeReviews } = require('./src/main');

(async () => {
  const reviews = await scrapeReviews(['ChIJN1t_tDeuEmsRUsoyG83frY4']);
  console.log(`Found ${reviews.length} reviews`);
})();
```
Troubleshooting
Common Issues
- Browser not found:

  ```bash
  npx playwright install chromium
  ```

- No place IDs provided:
  - Ensure place IDs are valid Google Place IDs
  - Check the input format (comma-separated for environment variables)

- Permission errors in Docker:
  - The application runs as a non-root user for security
  - Check file permissions when mounting volumes
Finding Google Place IDs
- Search for a business on Google Maps
- Look at the URL (...place/.../@...) or use the share link
- Extract the place ID from the URL or use the Google Places API
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
License
ISC License - see LICENSE file for details.