Trustpilot Reviews Scraper

2 hours trial then $10.00/month - No credit card required now

memo23/apify-trustpilot-cheerio

Enterprise-grade Trustpilot scraper that extracts 40+ data points per review with built-in analytics and 100% verified reviews coverage. Perfect for business intelligence and market research.

Overview

This scraper allows you to extract reviews from Trustpilot.com for a specific business or domain. You can gather detailed information about each review, including review text, rating, verification details, consumer information, and any replies from the business. This data is particularly useful for sentiment analysis, competitive analysis, or understanding customer experiences.

Features

  • Scrape Reviews: Extract individual reviews, including their content, rating, and consumer details.
  • Include Company Details: Optionally extract company details to provide context to the reviews.
  • Include Statistics: Collect statistical information about the reviews and company.
  • Flexible Input: Use multiple Trustpilot.com review page URLs for targeted scraping.
  • Filtering: Filter reviews by date.
  • Limits: Set a maximum number of reviews to scrape per crawl.
  • Proxies: Configure proxies for the crawl.
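
The date filter and the review limit are applied by the actor itself, and their input field names are not listed on this page. As a rough client-side equivalent, here is a minimal sketch that assumes records shaped like the sample under Output Structure. The helper names `parse_ts` and `filter_newer_than` and the three short records are illustrative, not part of the actor.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as 2024-10-26T21:17:56.000Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_newer_than(reviews, cutoff, max_reviews=None):
    """Keep reviews published after cutoff, newest first, up to max_reviews."""
    kept = [r for r in reviews if parse_ts(r["dates"]["publishedDate"]) > cutoff]
    kept.sort(key=lambda r: parse_ts(r["dates"]["publishedDate"]), reverse=True)
    return kept if max_reviews is None else kept[:max_reviews]


# Illustrative records; only the "dates" object is needed for filtering.
reviews = [
    {"id": "a", "dates": {"publishedDate": "2024-10-26T21:17:56.000Z"}},
    {"id": "b", "dates": {"publishedDate": "2024-09-01T08:00:00.000Z"}},
    {"id": "c", "dates": {"publishedDate": "2024-11-02T10:30:00.000Z"}},
]
cutoff = datetime(2024, 10, 1, tzinfo=timezone.utc)
recent = filter_newer_than(reviews, cutoff, max_reviews=10)
print([r["id"] for r in recent])
```

Filtering after the run costs more than filtering inside the actor, so this is mainly useful for re-slicing a dataset that has already been collected.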

How to Use

  1. Set Up: Ensure you have an Apify account and access to the Apify platform.
  2. Configure Input: Add the Trustpilot review URLs that you wish to scrape. Each URL must follow this format:
    • https://trustpilot.com/review/www.dugood.org
  3. Advanced Configuration:
    • Enable includeCompanyDetails to get business information
    • Enable includeStatistics to get statistical data
    • Restrict results by date with the "only Newer Than" filter
    • Configure concurrency settings for performance optimization
  4. Run the Scraper: Start the scraper on the Apify platform. You can monitor its progress and adjust settings if needed.
  5. Data Collection: Extracted data will be available in various formats supported by Apify, including JSON, CSV, and Excel.
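
The same steps can also be driven from code. The sketch below assumes the `apify-client` Python package is installed and that an API token is stored in an `APIFY_TOKEN` environment variable; neither is stated on this page. The actor ID is the one shown above, while `build_run_input` and `run_scraper` are hypothetical helper names.

```python
import json
import os

ACTOR_ID = "memo23/apify-trustpilot-cheerio"


def build_run_input(urls):
    """Build a minimal actor input from Trustpilot review page URLs."""
    return {
        "startUrls": [{"url": u} for u in urls],
        "includeCompanyDetails": True,
        "includeStatistics": True,
    }


def run_scraper(urls):
    """Start the actor, wait for it to finish, and return the dataset items."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=build_run_input(urls))
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Preview the input that would be sent; no network access happens here.
preview = build_run_input(["https://trustpilot.com/review/www.dugood.org"])
print(json.dumps(preview, indent=4))
```

Calling `run_scraper(["https://trustpilot.com/review/www.dugood.org"])` would start a real run and consume platform usage, so it is left out of the example itself.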

Input Data

Here's an example of how to set up Trustpilot reviews scraping:

{
    "startUrls": [
        {
            "url": "https://trustpilot.com/review/www.dugood.org"
        }
    ],
    "maxConcurrency": 10,
    "minConcurrency": 1,
    "maxRequestRetries": 100,
    "includeCompanyDetails": true,
    "includeStatistics": true
}

Input Fields Explanation

  • startUrls: Array of Trustpilot review URLs to scrape.
  • maxConcurrency: Maximum number of pages processed simultaneously (default: 10).
  • minConcurrency: Minimum number of pages processed simultaneously (default: 1).
  • maxRequestRetries: Number of retries for failed requests (default: 100).
  • includeCompanyDetails: Boolean to determine whether to scrape company details (default: false).
  • includeStatistics: Boolean to determine whether to scrape review statistics (default: false).
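
To catch malformed input before starting a run, the fields above can be assembled and checked locally. This is a minimal sketch: the defaults are the ones documented in the list, but the URL pattern and the `make_input` helper are assumptions, and the actor may well accept URL variants that this pattern rejects.

```python
import json
import re

# Defaults as documented in the field list above.
DEFAULTS = {
    "maxConcurrency": 10,
    "minConcurrency": 1,
    "maxRequestRetries": 100,
    "includeCompanyDetails": False,
    "includeStatistics": False,
}

# Assumed pattern: https, optional "www.", then /review/<domain>.
REVIEW_URL = re.compile(r"^https://(www\.)?trustpilot\.com/review/[A-Za-z0-9.-]+/?$")


def make_input(urls, **overrides):
    """Return an input dict, rejecting URLs that do not look like review pages."""
    bad = [u for u in urls if not REVIEW_URL.match(u)]
    if bad:
        raise ValueError(f"Not a Trustpilot review URL: {bad}")
    config = {"startUrls": [{"url": u} for u in urls], **DEFAULTS, **overrides}
    if config["minConcurrency"] > config["maxConcurrency"]:
        raise ValueError("minConcurrency must not exceed maxConcurrency")
    return config


run_input = make_input(
    ["https://trustpilot.com/review/www.dugood.org"],
    includeCompanyDetails=True,
    includeStatistics=True,
)
print(json.dumps(run_input, indent=4))
```

With these overrides the printed JSON has the same fields and values as the example input shown earlier.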

Output Structure

The output data includes detailed information about each review. Here's a sample of the structure:

{
    "id": "671d4064ff600a8a63bcea11",
    "filtered": false,
    "pending": false,
    "text": "The manager was awesome in helping my brother & me in the loss of our younger brother regarding his accounts. DuGood also came through & helped us with another issue when no one else would. We truly appreciate your knowledge & caring toward your customers. It did not go unnoticed & you now have several family members who are new customers because of it. Thank you again!",
    "rating": 5,
    "labels": {
        "merged": null,
        "verification": {
            "isVerified": true,
            "createdDateTime": "2024-10-26T21:17:56.000Z",
            "reviewSourceName": "AFSv2",
            "verificationSource": "invitation",
            "verificationLevel": "verified",
            "hasDachExclusion": false
        }
    },
    "title": "The manager was awesome in helping my…",
    "likes": 8,
    "dates": {
        "experiencedDate": "2024-10-23T00:00:00.000Z",
        "publishedDate": "2024-10-26T21:17:56.000Z",
        "updatedDate": null
    },
    "report": null,
    "hasUnhandledReports": false,
    "consumer": {
        "id": "671d4063422e94eeb1028118",
        "displayName": "Cheryl",
        "imageUrl": "",
        "numberOfReviews": 1,
        "countryCode": "US",
        "hasImage": false,
        "isVerified": false
    },
    "reply": {
        "message": "Hi Cheryl,\n\nIt's heartwarming to hear about the support you received during such a difficult time. We are dedicated to our members and we're glad we could help you and your family. Thank you for sharing your touching experience with us.",
        "publishedDate": "2024-10-30T17:25:46.000Z",
        "updatedDate": null
    },
    "consumersReviewCountOnSameDomain": 1,
    "consumersReviewCountOnSameLocation": null,
    "productReviews": [],
    "language": "en",
    "location": null
}
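
As one illustration of working with a single record, the sketch below measures how long the business took to reply, using the two timestamps from the sample above. The helper names are hypothetical, and treating a missing or null `reply` as "no reply yet" is an assumption about how unanswered reviews are represented.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2024-10-26T21:17:56.000Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def reply_delay(review: dict) -> Optional[timedelta]:
    """Time between the review and the business reply, or None without a reply."""
    reply = review.get("reply")
    if not reply or not reply.get("publishedDate"):
        return None
    published = parse_ts(review["dates"]["publishedDate"])
    return parse_ts(reply["publishedDate"]) - published


# Trimmed copy of the sample record: only the fields used here.
review = {
    "id": "671d4064ff600a8a63bcea11",
    "dates": {"publishedDate": "2024-10-26T21:17:56.000Z"},
    "reply": {"publishedDate": "2024-10-30T17:25:46.000Z"},
}
delay = reply_delay(review)
print(delay)  # 3 days, 20:07:50
```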

Output Fields Explanation

  • id: Unique identifier for the review.
  • filtered: Boolean indicating if the review has been filtered by Trustpilot.
  • pending: Boolean indicating if the review is still pending approval.
  • text: Content of the review provided by the consumer.
  • rating: Rating given by the reviewer (e.g., 1-5 stars).
  • labels: Object containing additional labels and verification details.
    • merged: Indicates if the review has been merged (usually null).
    • verification: Object containing verification details of the review.
      • isVerified: Boolean indicating if the review is verified.
      • createdDateTime: Timestamp of when the verification was created.
      • reviewSourceName: Source name of the review.
      • verificationSource: Source of verification (e.g., "invitation").
      • verificationLevel: Level of verification (e.g., "verified").
      • hasDachExclusion: Boolean indicating if there is a DACH exclusion.
  • title: Short title or summary of the review.
  • likes: Number of likes the review has received.
  • dates: Object containing various dates related to the review.
    • experiencedDate: Date when the experience took place.
    • publishedDate: Date when the review was published.
    • updatedDate: Date when the review was last updated (if applicable).
  • report: Object containing report details if the review has been reported (usually null).
  • hasUnhandledReports: Boolean indicating if there are any unhandled reports for this review.
  • consumer: Object containing details about the consumer who left the review.
    • id: Unique identifier for the consumer.
    • displayName: Display name of the consumer.
    • imageUrl: URL of the consumer's profile image (if available).
    • numberOfReviews: Total number of reviews the consumer has written.
    • countryCode: Country code of the consumer.
    • hasImage: Boolean indicating if the consumer has an image.
    • isVerified: Boolean indicating if the consumer is verified.
  • reply: Object containing details of the business's reply to the review.
    • message: Content of the reply.
    • publishedDate: Date when the reply was published.
    • updatedDate: Date when the reply was last updated (if applicable).
  • consumersReviewCountOnSameDomain: Number of reviews the consumer has on the same domain.
  • consumersReviewCountOnSameLocation: Number of reviews the consumer has at the same location (if applicable).
  • productReviews: Array containing any product-specific reviews (if applicable).
  • language: Language of the review.
  • location: Location associated with the review (if available).

Company Details Output (If includeCompanyDetails is set to true)

"company": {
    "id": "592855190000ff0005a33f85",
    "displayName": "DuGood Credit Union",
    "identifyingName": "www.dugood.org",
    "numberOfReviews": 4105,
    "trustScore": 4.8,
    "websiteUrl": "https://www.dugood.org",
    "websiteTitle": "www.dugood.org",
    "profileImageUrl": "//s3-eu-west-1.amazonaws.com/tpd/logos/592855190000ff0005a33f85/0x0.png",
    "customHeaderUrl": "",
    "promotion": null,
    "hideCompetitorModule": false,
    "stars": 5,
    "categories": [
        {
            "id": "federal_credit_union",
            "name": "Federal Credit Union",
            "rank": "1",
            "cardinality": "3",
            "isPrimary": false
        },
        {
            "id": "credit_union",
            "name": "Credit Union",
            "rank": "1",
            "cardinality": "7",
            "isPrimary": true
        }
    ],
    "breadcrumb": {
        "topLevelId": "money_insurance",
        "topLevelDisplayName": "Money & Insurance",
        "midLevelId": "credit_debt_services",
        "midLevelDisplayName": "Credit & Debt Services",
        "bottomLevelId": "credit_union",
        "bottomLevelDisplayName": "Credit Union"
    },
    "isClaimed": true,
    "isClosed": false,
    "isTemporarilyClosed": false,
    "locationsCount": 0,
    "isCollectingReviews": true,
    "verification": {
        "verifiedByGoogle": true,
        "verifiedBusiness": true,
        "verifiedPaymentMethod": true,
        "verifiedUserIdentity": false
    },
    "contactInfo": {
        "email": "marketing@dugood.org",
        "address": "7505 Eastex Frwy",
        "city": "Beaumont",
        "country": "US",
        "phone": "(409) 899-3430",
        "zipCode": "77708"
    }
}

Company Fields Explanation

  • id: Unique identifier for the company.
  • displayName: Display name of the company.
  • identifyingName: The identifying name of the company, typically the website URL.
  • numberOfReviews: Total number of reviews for the company.
  • trustScore: Trust score of the company on Trustpilot.
  • websiteUrl: Official website URL of the company.
  • websiteTitle: Title of the company’s website.
  • profileImageUrl: URL of the company’s profile image.
  • customHeaderUrl: URL for any custom header image (if applicable).
  • categories: Array of categories the company belongs to, including rank and primary status.
  • breadcrumb: Hierarchical classification of the company's industry or sector.
  • isClaimed: Boolean indicating if the company profile is claimed on Trustpilot.
  • verification: Object containing verification details about the company.
    • verifiedByGoogle, verifiedBusiness, verifiedPaymentMethod, verifiedUserIdentity: Booleans, one per verification type.
  • contactInfo: Contact details of the company, including email, address, city, country, phone, and zip code.
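
Two details of the company object are easy to trip over: `profileImageUrl` is protocol-relative (it starts with `//`), and `rank` and `cardinality` arrive as strings in the sample. The sketch below shows one way to handle both, together with a readable breadcrumb. All helper names are hypothetical, and the data is a trimmed copy of the sample above.

```python
def primary_category(company):
    """Return the category flagged isPrimary, or None if there is none."""
    for category in company.get("categories", []):
        if category.get("isPrimary"):
            return category
    return None


def absolute_url(url, scheme="https"):
    """Turn a protocol-relative URL (//host/path) into an absolute one."""
    return f"{scheme}:{url}" if url.startswith("//") else url


def breadcrumb_path(company):
    """Join the breadcrumb display names from top level to bottom level."""
    crumb = company.get("breadcrumb") or {}
    levels = ("topLevelDisplayName", "midLevelDisplayName", "bottomLevelDisplayName")
    return " > ".join(crumb[key] for key in levels if crumb.get(key))


company = {
    "profileImageUrl": "//s3-eu-west-1.amazonaws.com/tpd/logos/592855190000ff0005a33f85/0x0.png",
    "categories": [
        {"id": "federal_credit_union", "name": "Federal Credit Union",
         "rank": "1", "cardinality": "3", "isPrimary": False},
        {"id": "credit_union", "name": "Credit Union",
         "rank": "1", "cardinality": "7", "isPrimary": True},
    ],
    "breadcrumb": {
        "topLevelDisplayName": "Money & Insurance",
        "midLevelDisplayName": "Credit & Debt Services",
        "bottomLevelDisplayName": "Credit Union",
    },
}

primary = primary_category(company)
# rank and cardinality are strings in the sample, so convert before comparing.
print(primary["name"], int(primary["rank"]), int(primary["cardinality"]))
print(absolute_url(company["profileImageUrl"]))
print(breadcrumb_path(company))
```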

Statistics Output (If includeStatistics is set to true)

"stats": {
    "total": 4105,
    "one": 43,
    "two": 15,
    "three": 46,
    "four": 168,
    "five": 3833,
    "totalNumberOfReviews": 3450
}

Statistics Fields Explanation

  • total: Total number of reviews collected.
  • one: Number of 1-star reviews.
  • two: Number of 2-star reviews.
  • three: Number of 3-star reviews.
  • four: Number of 4-star reviews.
  • five: Number of 5-star reviews.
  • totalNumberOfReviews: Total number of reviews that were actually processed.
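
The per-star counts are enough to derive a rating distribution and a plain average. The sketch below uses the sample values above; the helper names are hypothetical. Note that this unweighted mean need not equal the company's `trustScore` (4.8 in the sample), which Trustpilot calculates with its own method.

```python
STAR_KEYS = ("one", "two", "three", "four", "five")


def star_counts(stats):
    """Map star value (1-5) to the number of reviews with that rating."""
    return {stars: stats[key] for stars, key in enumerate(STAR_KEYS, start=1)}


def average_rating(stats):
    """Unweighted mean star rating across all counted reviews."""
    counts = star_counts(stats)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(stars * n for stars, n in counts.items()) / total


def star_shares(stats):
    """Fraction of reviews at each star value."""
    counts = star_counts(stats)
    total = sum(counts.values())
    return {stars: n / total for stars, n in counts.items()} if total else {}


stats = {
    "total": 4105,
    "one": 43,
    "two": 15,
    "three": 46,
    "four": 168,
    "five": 3833,
    "totalNumberOfReviews": 3450,
}

# In the sample, the five star counts add up to "total".
print(sum(star_counts(stats).values()) == stats["total"])
print(round(average_rating(stats), 2))
for stars, share in star_shares(stats).items():
    print(f"{stars} stars: {share:.1%}")
```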

Usage Tips

  1. For bulk scraping, use the "Bulk edit" feature to input multiple Trustpilot review page URLs.
  2. Adjust concurrency settings based on your network capabilities and Trustpilot's rate limits.
  3. Use proxy configuration for large-scale scraping to avoid IP blocks.
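
For bulk runs, the list of start URLs can be generated from a plain list of domains instead of being typed by hand. This is a minimal sketch that assumes the review page path uses the domain exactly as Trustpilot lists it (for example `www.dugood.org`, including the `www.` prefix); `review_url` and `bulk_start_urls` are hypothetical helpers.

```python
def review_url(domain):
    """Build a review page URL from a bare domain or a full website URL."""
    cleaned = domain.strip().lower()
    for prefix in ("https://", "http://"):
        if cleaned.startswith(prefix):
            cleaned = cleaned[len(prefix):]
    # Drop any path after the host name.
    cleaned = cleaned.split("/", 1)[0]
    return f"https://trustpilot.com/review/{cleaned}"


def bulk_start_urls(domains):
    """Return de-duplicated startUrls entries, preserving input order."""
    seen = set()
    entries = []
    for domain in domains:
        if not domain.strip():
            continue  # skip blank lines from pasted lists
        url = review_url(domain)
        if url not in seen:
            seen.add(url)
            entries.append({"url": url})
    return entries


# Three spellings of the same site collapse into one entry.
domains = ["www.dugood.org", "https://www.dugood.org/", "  WWW.DUGOOD.ORG ", ""]
start_urls = bulk_start_urls(domains)
print(start_urls)
```

De-duplicating before the run avoids scraping the same review pages more than once.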

Explore More Scrapers

If you found this Trustpilot Reviews Scraper useful, be sure to check out our other scrapers and actors at memo23's Apify profile. We offer a wide range of tools for web scraping and automation across various platforms and use cases.

Developer
Maintained by Community
Actor metrics
  • 2 monthly users
  • 0 stars
  • 100.0% runs succeeded
  • Created in Nov 2024
  • Modified 6 days ago