
Tripadvisor Review Scraper
Pricing
Pay per event

A specialized Apify actor that extracts detailed reviews from TripAdvisor hotels, restaurants, and attractions. Features advanced anti-bot measures, residential proxy support, and comprehensive review data extraction.
Last modified
4 days ago
What It Does
This scraper extracts structured review data from TripAdvisor pages including:
Field | Description |
---|---|
reviewId | Unique review identifier |
title | Review title/headline |
text | Full review text content |
rating | Star rating (1-5 scale) |
date | Review publication date |
author.name | Reviewer's name |
author.location | Reviewer's location |
helpfulCount | Number of helpful votes |
photos | Array of review photo URLs (optional) |
url | Source TripAdvisor page URL |
scrapedAt | Timestamp of data extraction |
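A record can be checked against the field table above before it is used downstream. The following is a minimal sketch, not the actor's own validation code: `validateReview` is a hypothetical helper, and the type checks are assumptions inferred from the field descriptions.

```javascript
// Hypothetical helper: returns a list of problems found in one review record.
// Field names come from the table above; the type checks are assumptions.
function validateReview(review) {
  const problems = [];
  const requireString = (value, name) => {
    if (typeof value !== 'string' || value.trim() === '') {
      problems.push(`${name} must be a non-empty string`);
    }
  };

  requireString(review.reviewId, 'reviewId');
  requireString(review.text, 'text');
  requireString(review.url, 'url');

  // The table documents a 1-5 star scale; NaN also fails this check.
  if (typeof review.rating !== 'number' || !(review.rating >= 1 && review.rating <= 5)) {
    problems.push('rating must be a number between 1 and 5');
  }

  // photos is optional, but when present it should be an array.
  if (review.photos !== undefined && !Array.isArray(review.photos)) {
    problems.push('photos must be an array when present');
  }

  // scrapedAt should parse as a date.
  if (Number.isNaN(Date.parse(review.scrapedAt))) {
    problems.push('scrapedAt must be a parseable timestamp');
  }

  return problems;
}
```

An empty array means the record passed every check; otherwise each entry names one failing field.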
Use Cases
- Sentiment Analysis: Analyze customer sentiment and satisfaction trends
- Competitor Research: Monitor reviews for competing hotels/restaurants
- Reputation Management: Track review patterns and identify improvement areas
- Market Research: Understand customer preferences and pain points
- Review Monitoring: Get alerts for new reviews and rating changes
Input
The actor accepts the following input format:
{
  "startUrls": [
    { "url": "https://www.tripadvisor.com/Hotel_Review-g60763-d93452-Reviews-The_Plaza_Hotel-New_York_City_New_York.html" }
  ],
  "maxReviews": 100,
  "includePhotos": false
}
Input Parameters
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
startUrls | Array | Yes | - | Array of objects with url property pointing to TripAdvisor review pages |
maxReviews | Number | No | 100 | Maximum number of reviews to extract per start URL, collected across paginated review pages |
includePhotos | Boolean | No | false | Whether to extract photo URLs from reviews |
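The defaults in the table can be applied in a few lines before a run starts. This is a sketch only: `normalizeInput` is a hypothetical helper, and treating `maxReviews` as a positive integer is an assumption rather than documented behavior.

```javascript
// Hypothetical helper: applies the documented defaults and rejects bad input.
function normalizeInput(input) {
  const { startUrls, maxReviews = 100, includePhotos = false } = input ?? {};

  if (!Array.isArray(startUrls) || startUrls.length === 0) {
    throw new Error('startUrls is required and must be a non-empty array');
  }
  for (const entry of startUrls) {
    if (!entry || typeof entry.url !== 'string') {
      throw new Error('each startUrls entry must be an object with a url string');
    }
  }
  // Assumption: maxReviews must be a positive integer.
  if (!Number.isInteger(maxReviews) || maxReviews < 1) {
    throw new Error('maxReviews must be a positive integer');
  }
  if (typeof includePhotos !== 'boolean') {
    throw new Error('includePhotos must be a boolean');
  }

  return { startUrls, maxReviews, includePhotos };
}
```

Calling it with only `startUrls` set returns an object with `maxReviews` of 100 and `includePhotos` of false, matching the Default column.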
Supported TripAdvisor URLs
The scraper works with TripAdvisor review pages for:
- Hotels:
https://www.tripadvisor.com/Hotel_Review-g60763-d93452-Reviews-The_Plaza_Hotel-New_York_City_New_York.html
- Restaurants:
https://www.tripadvisor.com/Restaurant_Review-g60763-d1751194-Reviews-Eleven_Madison_Park-New_York_City_New_York.html
- Attractions:
https://www.tripadvisor.com/Attraction_Review-g60763-d104365-Reviews-Statue_of_Liberty-New_York_City_New_York.html
- Activities:
https://www.tripadvisor.com/AttractionProductReview-g60763-d11966990-Reviews-Central_Park_Walking_Tour-New_York_City_New_York.html
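The four URL shapes above differ only in their path prefix, so a start URL can be classified before it is queued. The sketch below is an illustration, not the actor's routing code: `classifyTripadvisorUrl` and its labels are hypothetical, and restricting the host to tripadvisor.com is an assumption (regional domains would need to be added).

```javascript
// Path prefixes are taken from the example URLs above; the labels are illustrative.
const PAGE_TYPES = [
  ['AttractionProductReview-', 'activity'],
  ['Attraction_Review-', 'attraction'],
  ['Hotel_Review-', 'hotel'],
  ['Restaurant_Review-', 'restaurant'],
];

// Hypothetical helper: returns a page-type label, or null for unsupported URLs.
function classifyTripadvisorUrl(rawUrl) {
  let parsed;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return null; // not a valid absolute URL
  }
  // Assumption: only tripadvisor.com and its subdomains are accepted here.
  if (!/(^|\.)tripadvisor\.com$/.test(parsed.hostname)) return null;

  const page = parsed.pathname.slice(1); // drop the leading "/"
  const match = PAGE_TYPES.find(([prefix]) => page.startsWith(prefix));
  return match ? match[1] : null;
}
```

Rejecting unknown URLs up front avoids spending proxy traffic on pages the scraper cannot parse.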
Output
The actor outputs structured data for each review found:
{
  "reviewId": "review_123456",
  "title": "Amazing stay at The Plaza!",
  "text": "We had a wonderful time at this hotel. The service was exceptional and the location perfect for exploring NYC. The rooms were clean and comfortable with beautiful views of Central Park.",
  "rating": 5.0,
  "date": "December 2024",
  "author": {
    "name": "John D",
    "location": "Los Angeles, CA"
  },
  "helpfulCount": 15,
  "photos": ["https://media-cdn.tripadvisor.com/media/photo-s/..."],
  "url": "https://www.tripadvisor.com/Hotel_Review-g60763-d93452-Reviews-The_Plaza_Hotel-New_York_City_New_York.html",
  "scrapedAt": "2024-01-15T10:30:00.000Z"
}
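Records in this shape are straightforward to aggregate once downloaded. As a rough sketch of a first analysis step, the hypothetical `summarizeReviews` helper below computes an average rating and a per-star histogram; it assumes the reviews have already been loaded into an array.

```javascript
// Hypothetical helper: summarizes an array of review records shaped like the output above.
function summarizeReviews(reviews) {
  // Records without a numeric rating are counted but excluded from the average.
  const rated = reviews.filter((r) => typeof r.rating === 'number');
  const histogram = { 1: 0, 2: 0, 3: 0, 4: 0, 5: 0 };
  let total = 0;

  for (const review of rated) {
    total += review.rating;
    const bucket = Math.round(review.rating);
    if (bucket in histogram) histogram[bucket] += 1;
  }

  return {
    count: reviews.length,
    averageRating: rated.length > 0 ? total / rated.length : null,
    histogram,
  };
}
```

The average is `null` rather than 0 when no review carries a rating, so an empty dataset is not mistaken for a poorly rated one.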
Example Usage
Single Hotel Reviews
{
  "startUrls": [
    { "url": "https://www.tripadvisor.com/Hotel_Review-g60763-d93452-Reviews-The_Plaza_Hotel-New_York_City_New_York.html" }
  ],
  "maxReviews": 100,
  "includePhotos": false
}
Multiple Properties
{
  "startUrls": [
    { "url": "https://www.tripadvisor.com/Hotel_Review-g60763-d93452-Reviews-The_Plaza_Hotel-New_York_City_New_York.html" },
    { "url": "https://www.tripadvisor.com/Restaurant_Review-g60763-d1751194-Reviews-Eleven_Madison_Park-New_York_City_New_York.html" }
  ],
  "maxReviews": 50,
  "includePhotos": false
}
Reviews with Photos
{
  "startUrls": [
    { "url": "https://www.tripadvisor.com/Hotel_Review-g60763-d93452-Reviews-The_Plaza_Hotel-New_York_City_New_York.html" }
  ],
  "maxReviews": 25,
  "includePhotos": true
}
How It Works
- Page Navigation: Uses Puppeteer with stealth mode and residential proxies to load TripAdvisor review pages
- Anti-Bot Bypass: Enhanced headers, realistic timing, and residential IP rotation to avoid detection
- Review Detection: Identifies review containers using multiple CSS selectors for maximum compatibility
- Data Extraction: Extracts review text, ratings, author info, and optional photos
- Pagination: Automatically navigates through successive review pages until the maxReviews limit is reached or no further pages remain
- Structured Output: Returns clean, structured review data ready for analysis
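The pagination step in the list above boils down to a loop that stops at the `maxReviews` cap or when pages run out. The sketch below shows that control flow only; `collectReviews` and the `fetchPage` callback are hypothetical names, and how a page is actually loaded and parsed is left to the caller.

```javascript
// fetchPage(pageIndex) is a hypothetical async callback that resolves to the
// array of reviews found on that page (an empty array when there are no more).
async function collectReviews(fetchPage, maxReviews) {
  const collected = [];
  for (let pageIndex = 0; collected.length < maxReviews; pageIndex += 1) {
    const pageReviews = await fetchPage(pageIndex);
    if (pageReviews.length === 0) break; // ran out of pages
    // Take only as many reviews as are still needed to hit the cap.
    collected.push(...pageReviews.slice(0, maxReviews - collected.length));
  }
  return collected;
}
```

Slicing the last page keeps the result at exactly `maxReviews` items even when the cap falls in the middle of a page.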
Features
- Residential Proxy Support: Uses Apify's residential proxy network for better anti-bot bypass
- Advanced Anti-Detection: Realistic browser headers, timing, and stealth mode
- Robust Extraction: Multiple fallback selectors to handle TripAdvisor's changing page structure
- Review Pagination: Automatically navigates through multiple pages of reviews
- Photo Extraction: Optional extraction of review photos and media
- Error Handling: Graceful error handling with detailed logging and blocking detection
- Data Validation: Ensures data quality with validation checks
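The fallback-selector idea from the feature list can be expressed as a helper that tries selectors in order and returns the first non-empty match. This is a sketch under stated assumptions: `findReviewNodes` is a hypothetical name, and the selector strings are placeholders, since the actor's real selectors are not documented here.

```javascript
// Placeholder selectors for illustration only; the actor's real list is not documented here.
const REVIEW_SELECTORS = [
  '[data-test-target="review"]',
  '[data-reviewid]',
  '.review-container',
];

// root is anything exposing querySelectorAll, such as a browser document.
function findReviewNodes(root, selectors = REVIEW_SELECTORS) {
  for (const selector of selectors) {
    const nodes = Array.from(root.querySelectorAll(selector));
    if (nodes.length > 0) {
      return { selector, nodes }; // first selector that matches wins
    }
  }
  return { selector: null, nodes: [] };
}
```

Returning the winning selector alongside the nodes makes it easy to log which fallback fired, which helps spot page-structure changes early. In Puppeteer, the same pattern can be run inside `page.evaluate` or rewritten around `page.$$(selector)`.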
Installation
- Clone this repository
- Install dependencies:
npm install
- Run the actor:
npm start
Development
- npm start: Run the actor
- npm run format: Format code with Prettier
- npm run lint: Run ESLint
- npm run lint:fix: Fix ESLint issues
- node test-tripadvisor.js: Test TripAdvisor functionality
- node test-with-proxy.js: Test with Apify residential proxies
Architecture
- src/main.js: Main entry point and input validation for TripAdvisor URLs
- src/routes.js: Request routing with TripAdvisor URL validation
- src/handlers/tripadvisorReviews.js: TripAdvisor review scraping logic with anti-bot measures
- src/puppeteerLauncher.js: Puppeteer browser configuration with stealth mode
Deployment
Deploy to Apify Platform (Recommended)
apify push
Local Testing with Apify Token
export APIFY_TOKEN=your_apify_token_here
node src/main.js
Notes
- Residential Proxies Required: TripAdvisor actively blocks datacenter IPs. Deploy to the Apify platform or use a valid Apify token for residential proxy access
- Anti-Bot Measures: The scraper includes advanced anti-detection measures, but TripAdvisor's blocking is sophisticated, so some requests may still be blocked
- Success Rate: Results are best when the actor is deployed to the Apify platform with residential proxy rotation
- Pagination Support: Automatically navigates through successive review pages until the maxReviews limit is reached
- All extracted data is timestamped for tracking purposes
- The scraper is designed to be respectful of TripAdvisor's servers with realistic delays
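The "realistic delays" mentioned in the notes usually mean a randomized pause between requests rather than a fixed interval. A minimal sketch follows; `randomDelayMs` and `politePause` are hypothetical names, and the 2 to 5 second default range is an assumption, not the actor's configured timing.

```javascript
// Returns an integer delay in milliseconds drawn uniformly from [minMs, maxMs]
// (for integer bounds). random is injectable so the function can be tested.
function randomDelayMs(minMs, maxMs, random = Math.random) {
  if (minMs > maxMs) throw new RangeError('minMs must not exceed maxMs');
  return Math.floor(minMs + random() * (maxMs - minMs + 1));
}

// Resolves after a randomized pause; the 2-5 second default is an assumption.
function politePause(minMs = 2000, maxMs = 5000) {
  return new Promise((resolve) => setTimeout(resolve, randomDelayMs(minMs, maxMs)));
}
```

Awaiting `politePause()` between page loads spreads requests out unevenly, which is gentler on the target site than a tight loop.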