ESG Report Scraper

Pricing

Pay per usage

Scrape ESG and sustainability reports from corporate websites. Extract company name, ESG scores, emissions data, report URL, year, and framework used. Track corporate sustainability commitments. Export to JSON/CSV, run via API, schedule and monitor runs.

Developer

Vhub Systems

Maintained by Community

ESG Report Scraper: Corporate Sustainability Data Aggregator

Automatically collect and extract ESG metrics, sustainability reports, and corporate climate disclosures from multiple sources in one unified workflow.

What is ESG Report Scraper?

ESG Report Scraper is an automated data collection tool for ESG analysts, investors, and sustainability professionals who need to monitor corporate environmental disclosures at scale. Instead of manually downloading dozens of PDF reports and searching through news sites, you can use this actor to aggregate sustainability data from corporate websites, the CDP database, and specialized ESG news sources. It extracts structured metrics such as GHG emissions, energy consumption, and waste generation.

The actor searches for sustainability reports, annual reports, and CSRD-compliant disclosures using DuckDuckGo, then processes each result page to extract key environmental metrics. PDF text is not parsed; PDF reports are returned as links. The actor detects the report type, identifies the reporting year, and structures the data into a standardized JSON format ready for analysis, dashboards, or integration with ESG rating systems.
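The report-type and year detection described above could be approximated as follows. This is a minimal sketch, not the actor's actual logic: the keyword rules, the `detect_report_type` and `detect_year` names, and the choice to take the first year found in the title are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical pattern: any four-digit year from 2000 to 2099.
YEAR_PATTERN = re.compile(r"\b(20\d{2})\b")


def detect_report_type(title: str) -> str:
    """Classify a title as "CSRD", "annual", or "sustainability" (assumed rules)."""
    lowered = title.lower()
    if "csrd" in lowered:
        return "CSRD"
    if "annual report" in lowered:
        return "annual"
    # Fall back to the most common report type in the documented output.
    return "sustainability"


def detect_year(title: str) -> Optional[int]:
    """Return the first 20xx year found in the title, or None if there is none."""
    match = YEAR_PATTERN.search(title)
    return int(match.group(1)) if match else None
```

A title that mentions a target year before the reporting year (for example "Net zero by 2030") would be misread by this heuristic, so results are worth spot-checking.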

Whether you are tracking climate commitments across a portfolio of companies, conducting competitive ESG benchmarking, or researching corporate sustainability trends for academic purposes, this scraper eliminates manual data collection and delivers consistent, structured ESG data in minutes instead of hours.

Output Data Fields

| Field | Type | Description |
|---|---|---|
| companyName | string | Name of the company or keyword searched |
| reportTitle | string | Full title of the sustainability report or webpage |
| reportUrl | string | Direct URL to the report (PDF or webpage) |
| reportType | string | Type of report: "sustainability", "CSRD", or "annual" |
| year | integer | Reporting year extracted from document (e.g., 2024) |
| keyMetrics | object | Extracted environmental metrics (emissions, energy, waste) |
| keyMetrics.emissions | string | GHG emissions data (e.g., "16.2 million metric tons CO2e") |
| keyMetrics.energy | string | Energy consumption data (e.g., "2,950 GWh") |
| keyMetrics.waste | string | Waste generation data (e.g., "38,000 tons") |
| summary | string | Short summary or meta description from the source page |
| sourceUrl | string | Original source URL where the data was found |
| scrapedAt | string | ISO timestamp when the data was collected |
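When consuming these records in Python, it can help to map them onto a typed structure. The sketch below mirrors the field table; the `EsgRecord` class and its `from_item` helper are hypothetical names, and treating every field except `companyName` as optional is an assumption rather than documented actor behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class EsgRecord:
    """One dataset item, with field names converted to snake_case."""
    company_name: str
    report_title: str = ""
    report_url: str = ""
    report_type: str = ""
    year: Optional[int] = None
    # Metric values stay as strings, exactly as the actor reports them.
    key_metrics: Dict[str, str] = field(default_factory=dict)
    summary: str = ""
    source_url: str = ""
    scraped_at: str = ""

    @classmethod
    def from_item(cls, item: Dict[str, Any]) -> "EsgRecord":
        """Build a record from a raw dataset item, tolerating missing keys."""
        return cls(
            company_name=item["companyName"],
            report_title=item.get("reportTitle", ""),
            report_url=item.get("reportUrl", ""),
            report_type=item.get("reportType", ""),
            year=item.get("year"),
            key_metrics=dict(item.get("keyMetrics") or {}),
            summary=item.get("summary", ""),
            source_url=item.get("sourceUrl", ""),
            scraped_at=item.get("scrapedAt", ""),
        )
```

Individual metrics may be absent from `keyMetrics`, so `record.key_metrics.get("waste")` is safer than direct indexing.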

Tutorial: How to Extract ESG Data in 7 Steps

1. Open the Actor. Navigate to the ESG Report Scraper actor page in Apify Console and click "Try for free".

2. Prepare Your Keywords. Enter the company names or ESG topics you want to track. Examples: "Tesla", "Unilever", "renewable energy", "carbon neutrality".

3. Configure Data Sources. Choose which sources to scrape: corporate websites (direct reports), CDP database (climate disclosures), or ESG news sites (latest updates).

4. Set Result Limits. Specify the maximum number of reports to process. For initial testing, start with 20-30 results. For comprehensive data collection, increase to 50-100.

5. Run the Actor. Click "Start" and the actor will begin searching for sustainability reports across all selected sources. Processing typically takes 3-10 minutes depending on the number of keywords.

6. Review Extracted Metrics. Once the run completes, view the dataset in JSON or Excel format. Each record includes structured metrics such as GHG emissions and energy consumption, plus direct links to source documents.

7. Export or Integrate. Download the data as CSV, JSON, or Excel for manual analysis, or connect the dataset to your BI tools, ESG platforms, or investment research systems via the Apify API.
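Steps 5 and 7 can also be driven from code through the Apify API. The sketch below only builds the HTTP request that starts a run, using the standard library. The actor ID `your-username~esg-report-scraper` is a placeholder, since the real ID is not given on this page, and `build_start_run_request` is a hypothetical helper name.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

API_BASE = "https://api.apify.com/v2"


def build_start_run_request(
    actor_id: str, token: str, actor_input: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts an actor run."""
    # Actor IDs take the form "username~actor-name"; "~" is left unescaped.
    url = f"{API_BASE}/acts/{urllib.parse.quote(actor_id, safe='~')}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The API token is sent as a bearer token rather than in the URL.
            "Authorization": f"Bearer {token}",
        },
    )


# Placeholder actor ID and token; replace both before sending.
request = build_start_run_request(
    "your-username~esg-report-scraper",
    "YOUR_APIFY_TOKEN",
    {"keywords": ["Tesla", "Unilever"], "maxResults": 20},
)
# To actually start the run:
# with urllib.request.urlopen(request, timeout=30) as response:
#     run = json.load(response)["data"]
```

Because runs typically take several minutes, this uses the asynchronous run endpoint. The response describes the run, including the ID of the default dataset that will hold the results once the run finishes.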

Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| keywords | array of strings | Yes | - | List of company names or ESG topics to search for (e.g., ["Apple", "Microsoft", "Tesla"]) |
| maxResults | integer | No | 20 | Maximum number of reports or pages to process per run (range: 1-200) |
| sources | array of strings | No | ["corporate", "cdp", "news"] | Data sources to scrape: "corporate" (company websites), "cdp" (CDP database), "news" (ESG news sites) |
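If you generate inputs programmatically, checking them against the table above before starting a run avoids wasted runs. The following is a minimal sketch: `normalize_input` is a hypothetical helper, and rejecting an empty `sources` list is an assumption, since the page does not say how the actor treats it.

```python
from typing import Any, Dict

ALLOWED_SOURCES = ("corporate", "cdp", "news")


def normalize_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Validate an input object and fill in the documented defaults."""
    keywords = raw.get("keywords")
    if not isinstance(keywords, list) or not keywords:
        raise ValueError("keywords is required and must be a non-empty list")
    if not all(isinstance(k, str) and k.strip() for k in keywords):
        raise ValueError("every keyword must be a non-empty string")

    max_results = raw.get("maxResults", 20)  # documented default
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        raise ValueError("maxResults must be an integer")
    if not 1 <= max_results <= 200:  # documented range
        raise ValueError("maxResults must be between 1 and 200")

    sources = raw.get("sources", list(ALLOWED_SOURCES))
    if not isinstance(sources, list) or not sources:
        raise ValueError("sources must be a non-empty list")  # assumed rule
    unknown = [s for s in sources if s not in ALLOWED_SOURCES]
    if unknown:
        raise ValueError(f"unknown sources: {unknown}")

    return {
        "keywords": [k.strip() for k in keywords],
        "maxResults": max_results,
        "sources": sources,
    }
```

Calling `normalize_input({"keywords": ["Apple"]})` returns the same input with `maxResults` set to 20 and all three sources enabled.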

Example Input

{
  "keywords": [
    "Apple",
    "Microsoft",
    "Unilever",
    "BP",
    "Tesla"
  ],
  "maxResults": 50,
  "sources": [
    "corporate",
    "cdp",
    "news"
  ]
}

Example Output

[
  {
    "companyName": "Apple",
    "reportTitle": "Apple Environmental Progress Report 2024",
    "reportUrl": "https://www.apple.com/environment/pdf/Apple_Environmental_Progress_Report_2024.pdf",
    "reportType": "sustainability",
    "year": 2024,
    "keyMetrics": {
      "emissions": "16.2 million metric tons CO2e",
      "energy": "2,950 GWh",
      "waste": "38,000 tons"
    },
    "summary": "Apple's comprehensive environmental report covering carbon neutrality goals, renewable energy progress, circular economy initiatives, and supply chain emissions reduction strategies for 2024.",
    "sourceUrl": "https://www.apple.com/environment/",
    "scrapedAt": "2024-02-15T14:22:30.000Z"
  },
  {
    "companyName": "Microsoft",
    "reportTitle": "Microsoft 2024 Environmental Sustainability Report - CSRD Disclosure",
    "reportUrl": "https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RW1lMjE",
    "reportType": "CSRD",
    "year": 2024,
    "keyMetrics": {
      "emissions": "13.5 million metric tons CO2e",
      "energy": "5,100 MWh"
    },
    "summary": "Microsoft's progress towards carbon negative commitment by 2030, including Scope 1, 2, and 3 emissions data aligned with CSRD requirements, renewable energy procurement, and water stewardship initiatives.",
    "sourceUrl": "https://www.microsoft.com/en-us/sustainability",
    "scrapedAt": "2024-02-15T14:25:18.000Z"
  },
  {
    "companyName": "Unilever",
    "reportTitle": "Unilever Sustainability Report 2024: Climate Action and Regenerative Agriculture",
    "reportUrl": "https://www.unilever.com/files/sustainability/2024-sustainability-report.pdf",
    "reportType": "sustainability",
    "year": 2024,
    "keyMetrics": {
      "emissions": "8.7 million metric tons CO2e",
      "energy": "3,200 GWh",
      "waste": "125,000 tons"
    },
    "summary": "Unilever's annual sustainability disclosure covering climate commitments, regenerative agriculture programs, packaging waste reduction, and social impact initiatives across global operations.",
    "sourceUrl": "https://www.unilever.com/sustainability/",
    "scrapedAt": "2024-02-15T14:28:45.000Z"
  },
  {
    "companyName": "Tesla",
    "reportTitle": "Tesla Impact Report 2024",
    "reportUrl": "https://www.tesla.com/ns_videos/2024-tesla-impact-report.pdf",
    "reportType": "sustainability",
    "year": 2024,
    "keyMetrics": {
      "emissions": "2.1 million metric tons CO2e avoided",
      "energy": "6,800 GWh renewable"
    },
    "summary": "Tesla's 2024 impact report detailing environmental benefits of electric vehicles, Gigafactory renewable energy integration, battery recycling programs, and global emissions avoidance from vehicle fleet.",
    "sourceUrl": "https://www.tesla.com/impact",
    "scrapedAt": "2024-02-15T14:31:12.000Z"
  }
]
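Metric values arrive as free-text strings, and units can differ between records: in the example output, Microsoft's energy figure is in MWh while the others are in GWh. Before comparing companies, you may want to normalize units. The sketch below converts energy strings to GWh; the `energy_in_gwh` helper and the set of recognized units are assumptions for illustration.

```python
import re
from typing import Optional

# A number with optional thousands separators and decimals, then a unit.
ENERGY_PATTERN = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(MWh|GWh|TWh)", re.IGNORECASE)

# Conversion to GWh as (multiplier, divisor) pairs to keep the arithmetic exact
# for whole-number inputs.
TO_GWH = {"mwh": (1, 1000), "gwh": (1, 1), "twh": (1000, 1)}


def energy_in_gwh(text: str) -> Optional[float]:
    """Parse strings like "2,950 GWh" and return the value in GWh, or None."""
    match = ENERGY_PATTERN.search(text)
    if match is None:
        return None
    value = float(match.group(1).replace(",", ""))
    multiplier, divisor = TO_GWH[match.group(2).lower()]
    return value * multiplier / divisor
```

Qualifiers such as "renewable" in "6,800 GWh renewable" are ignored by this parser, so keep the original string alongside the numeric value if the distinction matters.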

This actor collects publicly available ESG data from corporate websites, CDP disclosures, and news sites that are intended for public consumption. All scraped data consists of sustainability reports and environmental disclosures voluntarily published by companies for investor relations, regulatory compliance, and stakeholder transparency. Users are responsible for ensuring their use of scraped data complies with applicable laws, including copyright, data protection regulations, and the terms of service of source websites.

When using this actor, you should verify that your data collection activities comply with relevant regulations such as GDPR in the European Union or CCPA in California if processing personal data. The actor is designed for business intelligence, investment research, and academic purposes. It is not intended for unauthorized data harvesting, competitive harm, or violation of website terms of service. Users should implement appropriate rate limiting and respect robots.txt directives when deploying this tool at scale.

Pricing

This actor uses Apify platform resources based on compute time and proxy usage. Typical costs range from $0.10 to $0.50 per run depending on the number of keywords, maximum results configured, and data sources selected. Runs with 5-10 keywords and 20-50 max results usually complete in 3-10 minutes and consume 0.02-0.10 compute units.
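For budgeting scheduled runs, the per-run range above can be scaled by the number of runs. This is simple arithmetic on the figures quoted on this page, not an official pricing formula, and `estimate_cost_range` is a hypothetical helper.

```python
from typing import Tuple


def estimate_cost_range(
    runs: int, low_per_run: float = 0.10, high_per_run: float = 0.50
) -> Tuple[float, float]:
    """Return the (low, high) estimated cost in USD for a number of runs."""
    if runs < 0:
        raise ValueError("runs must not be negative")
    # Round to cents to hide floating-point noise.
    return (round(runs * low_per_run, 2), round(runs * high_per_run, 2))


# Twelve monthly runs in a year, using the typical per-run range quoted above.
yearly_low, yearly_high = estimate_cost_range(12)
```

Actual costs depend on keywords, result limits, and proxy usage, so treat the result as a rough planning range.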

Pricing is pay-per-use with no subscription required. You only pay for actual compute time consumed during actor runs. For high-volume or scheduled data collection, consider upgrading to Apify paid plans for better rates on compute units and increased concurrency limits. Detailed pricing information is available on the Pricing tab of this actor's page.

Frequently Asked Questions

Q: Can this actor extract data from password-protected sustainability reports?
A: No, the actor only scrapes publicly accessible reports and webpages. It cannot bypass authentication or paywalls. Ensure the reports you are targeting are publicly available or published on corporate websites without login requirements.

Q: How accurate is the metric extraction for emissions and energy data?
A: The actor uses regex patterns to identify and extract common ESG metrics from report text. Accuracy depends on how consistently companies format their disclosures. Metrics in standard formats (e.g., "16.2 million metric tons CO2e") are typically extracted correctly, while non-standard formats may require manual review. PDF text extraction is not included: metrics are extracted from HTML pages only, and PDF reports are returned as links for manual processing.
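To illustrate why formatting matters, here is a sketch of a regex that matches the standard emissions format mentioned above. It is not the actor's actual pattern, which is not published on this page; `EMISSIONS_PATTERN` and `extract_emissions` are hypothetical names.

```python
import re
from typing import Optional

# Matches e.g. "16.2 million metric tons CO2e" or "8,700 tonnes of CO2".
EMISSIONS_PATTERN = re.compile(
    r"(\d[\d,]*(?:\.\d+)?)\s*"          # number, e.g. 16.2 or 8,700
    r"(million|billion|thousand)?\s*"   # optional magnitude word
    r"(?:metric\s+)?(?:tons?|tonnes?)"  # unit, with optional "metric"
    r"\s*(?:of\s+)?CO2e?",              # CO2 or CO2e
    re.IGNORECASE,
)


def extract_emissions(text: str) -> Optional[str]:
    """Return the first emissions figure found in the text, or None."""
    match = EMISSIONS_PATTERN.search(text)
    return match.group(0) if match else None
```

A disclosure written as "5 MtCO2e" is not matched by this pattern, which is the kind of non-standard format that would need manual review or an additional rule.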

Q: Can I schedule this actor to run automatically every month to track new reports?
A: Yes, you can use Apify Schedules to run this actor automatically at specified intervals (daily, weekly, monthly). This is ideal for monitoring new sustainability report publications or tracking ESG news updates. Configure a schedule from the Schedules tab in the Apify Console.

Q: What is the difference between corporate, CDP, and news data sources?
A: The "corporate" source searches company websites for sustainability reports and annual reports using DuckDuckGo. The "cdp" source targets the CDP (Carbon Disclosure Project) database for climate-related disclosures submitted by companies. The "news" source scrapes ESG news sites like ESG Today for the latest sustainability news and announcements. You can enable all three sources for comprehensive coverage or select specific sources based on your needs.

Q: How do I export the scraped data to Excel or integrate it with my ESG platform?
A: After a run completes, you can download the dataset in multiple formats including JSON, CSV, Excel, and HTML from the dataset view in Apify Console. For integration with ESG platforms, BI tools, or custom applications, use the Apify API to programmatically fetch the dataset. The API provides endpoints to retrieve data in JSON format for seamless integration with Tableau, Power BI, or custom ESG rating systems.
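As a rough sketch of the programmatic route, the code below builds the dataset items URL and flattens fetched records into a CSV with one column per metric, which is often easier for BI tools to ingest than nested JSON. The column selection and the `dataset_items_url` and `records_to_csv` helper names are assumptions; the dataset ID comes from your own run.

```python
import csv
import io
import urllib.parse
from typing import Any, Dict, Iterable

API_BASE = "https://api.apify.com/v2"
COLUMNS = ["companyName", "reportType", "year", "emissions", "energy", "waste", "reportUrl"]


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """URL for downloading a dataset's items in the given format."""
    query = urllib.parse.urlencode({"format": fmt})
    return f"{API_BASE}/datasets/{urllib.parse.quote(dataset_id)}/items?{query}"


def records_to_csv(records: Iterable[Dict[str, Any]]) -> str:
    """Flatten records (with nested keyMetrics) into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        metrics = record.get("keyMetrics") or {}
        writer.writerow({
            "companyName": record.get("companyName", ""),
            "reportType": record.get("reportType", ""),
            "year": record.get("year", ""),
            # Missing metrics become empty cells rather than errors.
            "emissions": metrics.get("emissions", ""),
            "energy": metrics.get("energy", ""),
            "waste": metrics.get("waste", ""),
            "reportUrl": record.get("reportUrl", ""),
        })
    return buffer.getvalue()
```

Values containing commas, such as "5,100 MWh", are quoted automatically by the `csv` module, so the file stays valid when opened in Excel.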

Explore other data collection actors by lanky_quantifier for comprehensive web scraping solutions:

  • Reddit Thread Scraper - Extract Reddit discussions, sentiment, and community insights for brand monitoring and market research
  • Google Maps Scraper - Collect business listings, reviews, and location data from Google Maps for competitive intelligence
  • Contact Info Scraper - Automatically extract email addresses, phone numbers, and social media profiles from websites for lead generation
  • Amazon Product Scraper - Scrape Amazon product listings, prices, reviews, and seller information for e-commerce market analysis
  • LinkedIn Company Scraper - Extract company profiles, employee counts, and industry data from LinkedIn for B2B prospecting

Built by lanky_quantifier | Automating ESG data collection for investors and sustainability teams worldwide.