Website ESG + Certifications Scraper
Pricing
$3.00 / 1,000 websites
Under maintenance
Detect sustainability certifications and ESG practices on websites with comprehensive, robust multi-page analysis. This tool can be used on any website (not just hotel websites).
Last modified
7 days ago
Hotel Sustainability Scraper
A powerful Apify Actor that analyzes hotel websites to detect sustainability certifications and ESG (Environmental, Social, Governance) practices.
Requirements
- Python 3.10 or higher (required by Apify SDK)
- Docker (for building the Actor)
Features
- Certification Detection: Identifies 30+ sustainability certifications including GSTC, EarthCheck, Green Globe, Green Key, Travelife, EU Ecolabel, B Corporation, LEED, and more
- ESG Practice Analysis: Detects environmental, social, and governance practices across hotel websites
- Multi-page Crawling: Intelligently discovers and analyzes relevant pages (sustainability, about, CSR pages)
- Parallel Processing: Configurable worker threads for fast batch processing
- Robust Scraping: Supports both standard HTTP requests and Playwright browser automation for JavaScript-heavy sites
- Proxy Support: Optional proxy configuration for avoiding rate limits
Input
The Actor accepts the following input parameters:
{
  "hotels": [
    {
      "hotel_name": "Example Eco Hotel",
      "website": "https://example.com",
      "place_id": "12345"
    }
  ],
  "workers": 3,
  "usePlaywright": true,
  "useProxy": false,
  "proxyHost": "",
  "proxyPort": "22225",
  "proxyUsername": "",
  "proxyPassword": ""
}
Input Parameters
- hotels (required): Array of hotel objects. Each hotel must have a website field. Optional fields: hotel_name, place_id
- workers (optional, default: 3): Number of parallel workers (1-20). Higher values increase speed but use more resources
- usePlaywright (optional, default: true): Enable Playwright browser automation for JavaScript-rendered pages
- useProxy (optional, default: false): Enable proxy for scraping
- proxyHost (optional): Proxy server hostname (e.g., brd.superproxy.io)
- proxyPort (optional, default: "22225"): Proxy server port
- proxyUsername (optional): Proxy authentication username
- proxyPassword (optional): Proxy authentication password
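The constraints listed above can be checked before a run is started. The following is a minimal sketch, not part of the Actor itself: the helper name build_actor_input is hypothetical, and rejecting an empty hotels list is an assumption, since the parameter list only says the field is required.

```python
from typing import Any


def build_actor_input(
    hotels: list[dict[str, Any]],
    workers: int = 3,
    use_playwright: bool = True,
) -> dict[str, Any]:
    """Validate the documented constraints and return an input dict."""
    # Assumption: an empty list is treated as invalid input.
    if not hotels:
        raise ValueError("hotels must contain at least one entry")
    for index, hotel in enumerate(hotels):
        # 'website' is the only field each hotel object must have.
        if not hotel.get("website"):
            raise ValueError(f"hotels[{index}] is missing 'website'")
    # The documented range for workers is 1-20.
    if not 1 <= workers <= 20:
        raise ValueError("workers must be between 1 and 20")
    return {
        "hotels": hotels,
        "workers": workers,
        "usePlaywright": use_playwright,
        "useProxy": False,
    }


actor_input = build_actor_input(
    [{"hotel_name": "Example Eco Hotel", "website": "https://example.com"}]
)
print(actor_input)
```

Validating locally keeps malformed entries from being discovered only after a run has already started.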
Output
The Actor outputs a dataset where each item represents one hotel with the following structure:
{
  "hotel_name": "Example Eco Hotel",
  "website": "https://example.com",
  "place_id": "12345",
  "status": "success",
  "pages_crawled": 5,
  "pages_attempted": 5,
  "certifications": [
    {
      "name": "Green Globe",
      "found_on_page": "https://example.com/sustainability",
      "context": "We are proud to be Green Globe certified..."
    }
  ],
  "esg_practices": {
    "environment": [
      {
        "name": "Renewable Energy",
        "found_on_page": "https://example.com/sustainability",
        "context": "100% of our energy comes from renewable sources..."
      }
    ],
    "social": [...],
    "governance": [...]
  },
  "summary": {
    "total_certifications": 2,
    "total_environment_practices": 5,
    "total_social_practices": 3,
    "total_governance_practices": 1
  },
  "error_message": null
}
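Because each dataset item nests its findings, it is often convenient to flatten them for a spreadsheet or database. The sketch below assumes items shaped like the example above; the function names certification_rows and practice_counts are hypothetical and the sample item is abbreviated.

```python
from typing import Any


def certification_rows(item: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one flat row per certification found for a hotel."""
    rows = []
    for cert in item.get("certifications") or []:
        rows.append({
            "hotel_name": item.get("hotel_name"),
            "website": item.get("website"),
            "certification": cert.get("name"),
            "found_on_page": cert.get("found_on_page"),
        })
    return rows


def practice_counts(item: dict[str, Any]) -> dict[str, int]:
    """Count detected practices per ESG pillar, tolerating missing keys."""
    practices = item.get("esg_practices") or {}
    pillars = ("environment", "social", "governance")
    return {pillar: len(practices.get(pillar) or []) for pillar in pillars}


# Abbreviated sample item using the field names from the output structure.
sample_item = {
    "hotel_name": "Example Eco Hotel",
    "website": "https://example.com",
    "certifications": [
        {
            "name": "Green Globe",
            "found_on_page": "https://example.com/sustainability",
            "context": "We are proud to be Green Globe certified...",
        }
    ],
    "esg_practices": {"environment": [{"name": "Renewable Energy"}]},
}

print(certification_rows(sample_item))
print(practice_counts(sample_item))
```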
Status Values
- success: Hotel website was successfully scraped (3+ pages crawled)
- partial: Some pages were scraped but not all (1-2 pages crawled)
- failed: Unable to scrape the website (0 pages crawled)
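The thresholds above amount to a simple mapping from pages_crawled to a status value. A minimal sketch of that mapping follows; the function names are hypothetical, and the Actor may apply additional conditions that are not documented here.

```python
def status_from_pages(pages_crawled: int) -> str:
    """Map a crawled-page count to the documented status values."""
    if pages_crawled < 0:
        raise ValueError("pages_crawled cannot be negative")
    if pages_crawled >= 3:
        return "success"
    if pages_crawled >= 1:
        return "partial"
    return "failed"


def needs_retry(items: list[dict]) -> list[dict]:
    """Select items that were not fully scraped, e.g. for a follow-up run."""
    return [item for item in items if item.get("status") != "success"]


for count in (0, 2, 5):
    print(count, status_from_pages(count))
```

Filtering on status this way makes it straightforward to re-submit only the partial and failed hotels.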
Detected Certifications
The Actor detects 30+ sustainability certifications including:
- GSTC (Global Sustainable Tourism Council)
- EarthCheck
- Green Globe
- Green Key
- Travelife
- EU Ecolabel
- Green Seal
- B Corporation
- LEED (Leadership in Energy and Environmental Design)
- ISO 14001
- ISO 50001
- Carbon Neutral
- Climate Neutral
- And many more...
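To illustrate how a certification name, page URL, and context snippet could be produced from page text, here is a rough sketch based on regular-expression matching. This is an assumption for illustration only: the Actor's actual matching rules are not documented here, and the patterns, the function name find_certifications, and the context_chars parameter are all hypothetical.

```python
import re

# Hypothetical patterns for three of the certifications listed above.
CERT_PATTERNS = {
    "Green Globe": re.compile(r"\bgreen\s+globe\b", re.IGNORECASE),
    "EarthCheck": re.compile(r"\bearth\s?check\b", re.IGNORECASE),
    "ISO 14001": re.compile(r"\bISO\s*14001\b", re.IGNORECASE),
}


def find_certifications(
    text: str, page_url: str, context_chars: int = 40
) -> list[dict[str, str]]:
    """Return the first match per certification with surrounding context."""
    found = []
    for name, pattern in CERT_PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            continue
        # Keep a window of text around the match as the context snippet.
        start = max(match.start() - context_chars, 0)
        end = min(match.end() + context_chars, len(text))
        found.append({
            "name": name,
            "found_on_page": page_url,
            "context": text[start:end].strip(),
        })
    return found


page_text = "We are proud to be Green Globe certified and hold ISO 14001."
for hit in find_certifications(page_text, "https://example.com/sustainability"):
    print(hit["name"], "|", hit["context"])
```

Plain keyword matching like this can produce false positives (for example, a page that merely mentions a certification), so results are best treated as leads to verify.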
Detected ESG Practices
Environmental
- Renewable energy usage
- Water conservation
- Waste management & recycling
- Plastic reduction
- Carbon offsetting
- Energy efficiency
- Sustainable sourcing
- Biodiversity protection
Social
- Community engagement
- Fair labor practices
- Diversity & inclusion
- Employee welfare
- Charitable giving
- Local partnerships
Governance
- Transparency reporting
- Ethical sourcing
- Supply chain management
- Compliance & certifications
Usage Tips
- Batch Size: For large batches (100+ hotels), consider splitting into smaller runs to manage costs and timeout risks
- Workers: Start with 3 workers. Increase to 5-10 for faster processing if your Apify plan allows
- Playwright: Disable if you only need basic HTML scraping (faster, cheaper). Enable for comprehensive coverage
- Proxy: Enable if you encounter rate limiting or IP blocks from hotel websites
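The batch-size tip can be applied with a small helper that splits a hotel list into separate runs and estimates the cost of each. This is a sketch under stated assumptions: the batch size of 50 is an arbitrary illustrative choice, the function names are hypothetical, and the price constant is taken from the $3.00 / 1,000 websites pricing listed for this Actor.

```python
from typing import Iterator

# USD per 1,000 websites, from the pricing listed for this Actor.
PRICE_PER_1000_WEBSITES = 3.00


def split_batches(hotels: list[dict], batch_size: int = 50) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most batch_size hotels."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(hotels), batch_size):
        yield hotels[start:start + batch_size]


def estimated_cost(website_count: int) -> float:
    """Estimate the per-result charge for a number of websites."""
    return website_count * PRICE_PER_1000_WEBSITES / 1000


# Placeholder hotel list for demonstration.
hotels = [{"website": f"https://example.com/{i}"} for i in range(120)]
for batch in split_batches(hotels):
    print(len(batch), "hotels, about $", round(estimated_cost(len(batch)), 2))
```

The estimate covers only the per-website price; platform compute usage, if billed separately on your plan, is not included.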
Example Run
# Using Apify CLI
apify run --input '{"hotels": [{"hotel_name": "Eco Lodge", "website": "https://example.com"}], "workers": 3, "usePlaywright": true}'
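Hand-writing JSON inside shell quotes is error-prone once the hotel list grows. As a hedged alternative, the command can be generated from a Python dict using only the standard library; the helper name build_run_command is hypothetical, and the sketch assumes the same apify run --input form shown above.

```python
import json
import shlex


def build_run_command(actor_input: dict) -> str:
    """Serialize the input to JSON and quote it safely for a POSIX shell."""
    payload = json.dumps(actor_input, separators=(",", ":"))
    return shlex.join(["apify", "run", "--input", payload])


run_input = {
    "hotels": [{"hotel_name": "Eco Lodge", "website": "https://example.com"}],
    "workers": 3,
    "usePlaywright": True,
}
print(build_run_command(run_input))
```

Using json.dumps guarantees valid JSON (for example, true instead of Python's True), and shlex.join handles quoting for hotel names that contain spaces or apostrophes.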
Performance
- Speed: ~0.5-2 hotels per second (depending on workers and website complexity)
- Typical Run: 100 hotels in 2-5 minutes with 3 workers
- Resource Usage: Standard Apify compute unit consumption
Support
For issues, questions, or feature requests, please contact the Actor maintainer.

