Website ESG + Certifications Scraper
Under maintenance

Pricing: $3.00 per 1,000 websites


Developed by: Carlo Sant

Maintained by: Community

Detects sustainability certifications and ESG practices on websites using multi-page analysis. Although it was built with hotels in mind, it can be used on any website.



Last modified: 7 days ago

Hotel Sustainability Scraper

An Apify Actor that analyzes hotel websites to detect sustainability certifications and ESG (Environmental, Social, and Governance) practices.

Requirements

  • Python 3.10 or higher (required by Apify SDK)
  • Docker (for building the Actor)

Features

  • Certification Detection: Identifies 30+ sustainability certifications including GSTC, EarthCheck, Green Globe, Green Key, Travelife, EU Ecolabel, B Corporation, LEED, and more
  • ESG Practice Analysis: Detects environmental, social, and governance practices across hotel websites
  • Multi-page Crawling: Discovers and analyzes relevant subpages, such as sustainability, about, and CSR pages
  • Parallel Processing: Configurable worker threads for fast batch processing
  • Robust Scraping: Supports both standard HTTP requests and Playwright browser automation for JavaScript-heavy sites
  • Proxy Support: Optional proxy configuration for avoiding rate limits
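The Actor's page-selection logic is not documented. As a minimal sketch of how multi-page discovery can work, the Python below collects same-domain links from a homepage and keeps those whose path or link text contains a keyword. The keyword list, the `LinkCollector` and `find_relevant_pages` helpers, and the five-page limit are all hypothetical; only the standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical keyword list; the Actor's real page-selection rules are undocumented.
RELEVANT_KEYWORDS = ("sustainab", "about", "csr", "environment", "green", "responsib")


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # [href, text] of the <a> tag currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._current = [href, ""]

    def handle_data(self, data):
        if self._current is not None:
            self._current[1] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append((self._current[0], self._current[1].strip()))
            self._current = None


def find_relevant_pages(base_url, html, limit=5):
    """Return up to `limit` same-domain URLs whose path or link text has a keyword."""
    parser = LinkCollector()
    parser.feed(html)
    base_host = urlparse(base_url).netloc
    found = []
    for href, text in parser.links:
        url = urljoin(base_url, href).split("#")[0]  # drop fragments
        parsed = urlparse(url)
        if parsed.netloc != base_host:
            continue  # stay on the same site (also skips mailto: links)
        haystack = f"{parsed.path} {text}".lower()
        if any(keyword in haystack for keyword in RELEVANT_KEYWORDS) and url not in found:
            found.append(url)
    return found[:limit]
```

For a homepage linking to `/sustainability`, `/rooms`, and an external `/about` page, only the same-site `/sustainability` link would be kept.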

Input

The Actor accepts the following input parameters:

{
  "hotels": [
    {
      "hotel_name": "Example Eco Hotel",
      "website": "https://example.com",
      "place_id": "12345"
    }
  ],
  "workers": 3,
  "usePlaywright": true,
  "useProxy": false,
  "proxyHost": "",
  "proxyPort": "22225",
  "proxyUsername": "",
  "proxyPassword": ""
}

Input Parameters

  • hotels (required): Array of hotel objects. Each hotel must have a website field. Optional fields: hotel_name, place_id
  • workers (optional, default: 3): Number of parallel workers (1-20). Higher values increase speed but use more resources
  • usePlaywright (optional, default: true): Enable Playwright browser automation for JavaScript-rendered pages
  • useProxy (optional, default: false): Enable proxy for scraping
  • proxyHost (optional): Proxy server hostname (e.g., brd.superproxy.io)
  • proxyPort (optional, default: "22225"): Proxy server port
  • proxyUsername (optional): Proxy authentication username
  • proxyPassword (optional): Proxy authentication password
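A minimal sketch of assembling an input object in Python before starting a run: it applies the defaults listed above and checks the documented constraints (`website` required, `workers` between 1 and 20). The `build_input` helper is hypothetical, and the checks that `hotels` is non-empty, that no unknown fields are passed, and that `proxyHost` is set when `useProxy` is true are assumptions, not documented Actor behavior.

```python
# Defaults as documented in the Input Parameters list.
DEFAULTS = {
    "workers": 3,
    "usePlaywright": True,
    "useProxy": False,
    "proxyHost": "",
    "proxyPort": "22225",
    "proxyUsername": "",
    "proxyPassword": "",
}


def build_input(hotels, **options):
    """Merge options with the documented defaults and validate the result."""
    if not hotels:
        raise ValueError("hotels must be a non-empty array")  # assumption
    for i, hotel in enumerate(hotels):
        if not hotel.get("website"):
            raise ValueError(f"hotels[{i}] is missing the required 'website' field")
    unknown = set(options) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")  # assumption
    run_input = {"hotels": hotels, **DEFAULTS, **options}
    if not 1 <= run_input["workers"] <= 20:
        raise ValueError("workers must be between 1 and 20")
    # Assumption: a proxy host is needed whenever useProxy is enabled.
    if run_input["useProxy"] and not run_input["proxyHost"]:
        raise ValueError("proxyHost is required when useProxy is true")
    return run_input
```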

Output

The Actor outputs a dataset where each item represents one hotel with the following structure:

{
  "hotel_name": "Example Eco Hotel",
  "website": "https://example.com",
  "place_id": "12345",
  "status": "success",
  "pages_crawled": 5,
  "pages_attempted": 5,
  "certifications": [
    {
      "name": "Green Globe",
      "found_on_page": "https://example.com/sustainability",
      "context": "We are proud to be Green Globe certified..."
    }
  ],
  "esg_practices": {
    "environment": [
      {
        "name": "Renewable Energy",
        "found_on_page": "https://example.com/sustainability",
        "context": "100% of our energy comes from renewable sources..."
      }
    ],
    "social": [...],
    "governance": [...]
  },
  "summary": {
    "total_certifications": 2,
    "total_environment_practices": 5,
    "total_social_practices": 3,
    "total_governance_practices": 1
  },
  "error_message": null
}
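Dataset items nest findings by type, which is awkward for spreadsheets. The sketch below flattens one item into one row per certification or ESG practice and writes CSV with the standard library. It assumes every entry in `social` and `governance` has the same `name` and `found_on_page` fields as the `environment` example; the helper names are hypothetical and the `context` field is dropped for brevity.

```python
import csv
import io

FIELDS = ["website", "category", "name", "found_on_page"]


def findings_to_rows(item):
    """Flatten one dataset item into one row per certification or ESG finding."""
    rows = []
    for cert in item.get("certifications", []):
        rows.append({
            "website": item["website"],
            "category": "certification",
            "name": cert["name"],
            "found_on_page": cert["found_on_page"],
        })
    for category, practices in item.get("esg_practices", {}).items():
        for practice in practices:
            rows.append({
                "website": item["website"],
                "category": category,  # "environment", "social" or "governance"
                "name": practice["name"],
                "found_on_page": practice["found_on_page"],
            })
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```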

Status Values

  • success: Hotel website was successfully scraped (3+ pages crawled)
  • partial: Some pages were scraped but not all (1-2 pages crawled)
  • failed: Unable to scrape the website (0 pages crawled)
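The documented thresholds translate directly into code, which can help when re-checking results or picking websites to retry. This sketch mirrors the rules above rather than the Actor's own source; `classify_status` and `status_report` are hypothetical helper names.

```python
from collections import Counter


def classify_status(pages_crawled):
    """Map a crawled-page count to success / partial / failed as documented."""
    if pages_crawled >= 3:
        return "success"
    if pages_crawled >= 1:
        return "partial"
    return "failed"


def status_report(items):
    """Count statuses and list websites that did not reach 'success'."""
    counts = Counter(item["status"] for item in items)
    to_retry = [item["website"] for item in items if item["status"] != "success"]
    return counts, to_retry
```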

Detected Certifications

The Actor detects 30+ sustainability certifications including:

  • GSTC (Global Sustainable Tourism Council)
  • EarthCheck
  • Green Globe
  • Green Key
  • Travelife
  • EU Ecolabel
  • Green Seal
  • B Corporation
  • LEED (Leadership in Energy and Environmental Design)
  • ISO 14001
  • ISO 50001
  • Carbon Neutral
  • Climate Neutral
  • And many more...
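How the Actor matches certifications is not described. One plausible approach, sketched below under that assumption, is one regular expression per certification, with a window of surrounding text kept as the `context` field seen in the output. `CERT_PATTERNS`, `detect_certifications`, and the 60-character window are hypothetical, and only six of the listed certifications are covered.

```python
import re

# Hypothetical patterns for six of the listed certifications.
CERT_PATTERNS = {
    "GSTC": re.compile(r"\bGSTC\b|Global Sustainable Tourism Council", re.IGNORECASE),
    "EarthCheck": re.compile(r"\bEarth\s?Check\b", re.IGNORECASE),
    "Green Globe": re.compile(r"\bGreen\s+Globe\b", re.IGNORECASE),
    "Green Key": re.compile(r"\bGreen\s+Key\b", re.IGNORECASE),
    "ISO 14001": re.compile(r"\bISO\s?14001\b", re.IGNORECASE),
    "LEED": re.compile(r"\bLEED\b"),  # case-sensitive: match the acronym only
}


def detect_certifications(text, page_url, window=60):
    """Return one finding per matched certification, with nearby text as context."""
    findings = []
    for name, pattern in CERT_PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            continue
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        context = " ".join(text[start:end].split())  # collapse whitespace and newlines
        findings.append({"name": name, "found_on_page": page_url, "context": context})
    return findings
```

Regular expressions keep the sketch dependency-free; a real detector would also need to handle logos and badges that only appear as images, which text matching cannot see.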

Detected ESG Practices

Environmental

  • Renewable energy usage
  • Water conservation
  • Waste management & recycling
  • Plastic reduction
  • Carbon offsetting
  • Energy efficiency
  • Sustainable sourcing
  • Biodiversity protection

Social

  • Community engagement
  • Fair labor practices
  • Diversity & inclusion
  • Employee welfare
  • Charitable giving
  • Local partnerships

Governance

  • Transparency reporting
  • Ethical sourcing
  • Supply chain management
  • Compliance & certifications
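A keyword map is one simple way to sort matched practices into the three categories and produce the `esg_practices` and `summary` shapes from the output example. This is a sketch, not the Actor's vocabulary: the phrases in `ESG_KEYWORDS` are illustrative, context snippets are omitted, and treating each summary total as the length of the matching list is an assumption.

```python
# Hypothetical phrase lists; the Actor's real vocabulary is not documented.
ESG_KEYWORDS = {
    "environment": {
        "Renewable Energy": ("renewable energy", "solar panels", "wind power"),
        "Water Conservation": ("water conservation", "water-saving", "low-flow"),
        "Plastic Reduction": ("single-use plastic", "plastic-free"),
    },
    "social": {
        "Community Engagement": ("local community", "community projects"),
        "Diversity & Inclusion": ("diversity", "inclusion"),
    },
    "governance": {
        "Transparency Reporting": ("sustainability report", "esg report"),
        "Ethical Sourcing": ("ethically sourced", "ethical sourcing"),
    },
}


def detect_practices(text, page_url):
    """Return matched practices grouped by ESG category (context omitted)."""
    lowered = text.lower()
    result = {category: [] for category in ESG_KEYWORDS}
    for category, practices in ESG_KEYWORDS.items():
        for name, phrases in practices.items():
            if any(phrase in lowered for phrase in phrases):
                result[category].append({"name": name, "found_on_page": page_url})
    return result


def summarize(certifications, esg_practices):
    """Build the summary block, assuming each total is a list length."""
    return {
        "total_certifications": len(certifications),
        "total_environment_practices": len(esg_practices["environment"]),
        "total_social_practices": len(esg_practices["social"]),
        "total_governance_practices": len(esg_practices["governance"]),
    }
```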

Usage Tips

  1. Batch Size: For large batches (100+ hotels), consider splitting into smaller runs to manage costs and timeout risks
  2. Workers: Start with 3 workers. Increase to 5-10 for faster processing if your Apify plan allows
  3. Playwright: Disable it if you only need basic HTML scraping (faster and cheaper); keep it enabled for JavaScript-rendered sites and more complete coverage
  4. Proxy: Enable if you encounter rate limiting or IP blocks from hotel websites
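The batch-size tip can be scripted. The sketch below removes duplicate websites (ignoring case, surrounding whitespace, and a trailing slash) and splits the rest into one input object per run. The batch size of 50 is an arbitrary example value, since the tips only suggest splitting batches of 100+ hotels, and both helper names are hypothetical.

```python
def split_into_runs(hotels, batch_size=50):
    """Split a list into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [hotels[i:i + batch_size] for i in range(0, len(hotels), batch_size)]


def make_run_inputs(hotels, batch_size=50, workers=3, use_playwright=True):
    """Deduplicate websites, then build one Actor input object per batch."""
    seen = set()
    unique = []
    for hotel in hotels:
        # Treat "https://a.com" and "https://A.com/" as the same site.
        key = hotel["website"].strip().rstrip("/").lower()
        if key not in seen:
            seen.add(key)
            unique.append(hotel)
    return [
        {"hotels": batch, "workers": workers, "usePlaywright": use_playwright}
        for batch in split_into_runs(unique, batch_size)
    ]
```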

Example Run

# Using Apify CLI
apify run --input '{
  "hotels": [
    {"hotel_name": "Eco Lodge", "website": "https://example.com"}
  ],
  "workers": 3,
  "usePlaywright": true
}'
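As an alternative to passing JSON on the command line, input for a local run can be stored as a file. The sketch assumes the Apify CLI's default local storage layout, where `apify run` reads input from `storage/key_value_stores/default/INPUT.json` in the project directory; `write_local_input` is a hypothetical helper.

```python
import json
from pathlib import Path


def write_local_input(project_dir, run_input):
    """Save run_input as the default local INPUT record and return its path."""
    # Assumption: the project uses the CLI's default ./storage directory.
    path = Path(project_dir) / "storage" / "key_value_stores" / "default" / "INPUT.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(run_input, indent=2), encoding="utf-8")
    return path
```

After writing the file, running `apify run` from the project directory would pick up this input.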

Performance

  • Speed: ~0.5-2 hotels per second (depending on workers and website complexity)
  • Typical Run: 100 hotels in 2-5 minutes with 3 workers
  • Resource Usage: Standard Apify compute unit consumption
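The performance and pricing figures support a quick back-of-the-envelope estimate. At 0.5 hotels per second, 100 hotels take about 200 seconds (roughly 3.3 minutes), inside the 2-5 minute range quoted above, and cost (100 / 1,000) × $3.00 = $0.30 at the listed price. The sketch below is a rough estimate only: `estimate_run` is a hypothetical helper that assumes constant throughput and ignores any platform charges beyond the per-website price.

```python
def estimate_run(num_websites, hotels_per_second, price_per_1000=3.00):
    """Return (minutes, USD) for a run, assuming constant throughput."""
    if hotels_per_second <= 0:
        raise ValueError("hotels_per_second must be positive")
    minutes = num_websites / hotels_per_second / 60
    cost = num_websites / 1000 * price_per_1000  # per-website price only
    return round(minutes, 1), round(cost, 2)
```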

Support

For issues, questions, or feature requests, please contact the Actor maintainer.