# Company ESG & Sustainability Data Extractor (`technicaldost/company-esg-sustainability-extractor`) Actor

Extract ESG and sustainability metrics, carbon commitments, and net-zero targets from public company sustainability pages. Structured JSON output for finance, research, and procurement teams.

- **URL**: https://apify.com/technicaldost/company-esg-sustainability-extractor.md
- **Developed by:** [Technical Dost Solutions](https://apify.com/technicaldost) (community)
- **Categories:** Business
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

From $10.00 / 1,000 ESG extractions

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools that run on the Apify platform and cover all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Company ESG & Sustainability Data Extractor

### What this Actor does

Extract ESG and sustainability metrics, carbon commitments, and net-zero targets from public company sustainability and ESG report web pages that you supply.

It processes user-provided public URLs, reads schema.org `Organization` JSON-LD for the company name, scans visible page text for ESG keywords grouped by metric category (carbon, energy, water, waste, diversity, governance), pairs those keywords with nearby numeric values and units, and optionally captures net-zero and reduction-target commitment sentences. It normalizes useful fields, deduplicates rows, and saves structured records to the Apify dataset.
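
To make the keyword-and-value pairing more concrete, here is a minimal Python sketch of that kind of heuristic. It is not the Actor's source code: the `KEYWORDS` map, the unit list, the regular expressions, and the `extract_metrics` helper are hypothetical and chosen only for illustration, and the real keyword library and matching rules may differ.

```python
import re

# Hypothetical keyword map; the Actor's built-in keyword library is not published in this README.
KEYWORDS = {
    "scope 1 emissions": "carbon",
    "renewable energy": "energy",
    "water withdrawal": "water",
}

# A number with optional thousands separators or decimals, followed by an optional unit.
VALUE_RE = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(%|tCO2e|MWh|m3)?")
# A four-digit year starting with 19 or 20.
YEAR_RE = re.compile(r"\b((?:19|20)\d{2})\b")


def extract_metrics(text):
    """Pair each known keyword with the first number that follows it in the same sentence."""
    rows = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        for keyword, category in KEYWORDS.items():
            start = lowered.find(keyword)
            if start == -1:
                continue
            # Only look at text after the keyword so digits inside it ("Scope 1") are skipped.
            value = VALUE_RE.search(sentence, start + len(keyword))
            if value is None:
                continue
            # Look for a reporting year after the value, still within the same sentence.
            year = YEAR_RE.search(sentence, value.end())
            rows.append({
                "metricCategory": category,
                "metricName": keyword,
                "metricValue": float(value.group(1).replace(",", "")),
                "unit": value.group(2),
                "reportingYear": int(year.group(1)) if year else None,
                "extractionMethod": "text_extraction",
            })
    return rows


sample = "Scope 1 emissions were 125,000 tCO2e in 2024. Renewable energy reached 62% of total use."
for row in extract_metrics(sample):
    print(row)
```

A heuristic like this picks up whichever number sits closest after the keyword, which is one reason text-derived rows are best treated as candidates rather than verified figures.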

### Why this Actor is useful

Sustainability analysts, investors, and procurement teams pay for this kind of extraction because it converts unstructured ESG narrative reports into clean, comparable datasets. It saves manual reading, creates repeatable monitoring, feeds spreadsheets, dashboards, or scoring models, and turns public ESG pages into API-ready data.

### Who this is for

- ESG and sustainability analysts
- Investment and ESG research teams
- Corporate sustainability and procurement teams
- Data providers and ESG rating builders
- Journalists and NGOs tracking corporate climate claims
- B2B teams enriching company sustainability profiles

### Common use cases

- Build comparable ESG metric datasets across many companies
- Track net-zero and carbon-neutral commitments and target years
- Monitor reported Scope 1/2/3 emissions over time
- Enrich company profiles with sustainability data points
- Feed ESG scoring or screening models

### Input

- `startUrls`: Public URLs to extract from. Use only pages you are allowed to access without login or bypassing access controls.
- `keywords`: Optional additional ESG or sustainability terms to match on top of the built-in keyword library.
- `includeCommitments`: Capture net-zero, carbon-neutral, and reduction-target sentences as commitment rows with an extracted target year.
- `maxItems`: Maximum number of rows to save.
- `maxConcurrency`: Number of pages processed in parallel. The default is intentionally conservative.
- `requestTimeoutSecs`: Maximum time to spend on a single page.
- `proxyConfiguration`: Optional Apify proxy configuration where permitted by your source review.
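
The input can be checked locally before a run is started. The sketch below assumes the numeric bounds listed in the OpenAPI specification of this Actor (`maxItems` 1 to 10000, `maxConcurrency` 1 to 20, `requestTimeoutSecs` 5 to 180); `build_input` and `LIMITS` are hypothetical helper names, not part of any Apify library.

```python
from urllib.parse import urlparse

# Bounds copied from the input schema in this Actor's OpenAPI specification.
LIMITS = {
    "maxItems": (1, 10000),
    "maxConcurrency": (1, 20),
    "requestTimeoutSecs": (5, 180),
}


def build_input(urls, keywords=(), include_commitments=True,
                max_items=50, max_concurrency=3, request_timeout_secs=30):
    """Return a run input dict, raising ValueError on malformed URLs or out-of-range numbers."""
    if not urls:
        raise ValueError("startUrls is required and must not be empty")
    for url in urls:
        parts = urlparse(url)
        # Require an explicit http(s) scheme and a host name.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"Malformed URL: {url!r}")
    run_input = {
        "startUrls": [{"url": url} for url in urls],
        "keywords": list(keywords),
        "includeCommitments": include_commitments,
        "maxItems": max_items,
        "maxConcurrency": max_concurrency,
        "requestTimeoutSecs": request_timeout_secs,
    }
    for name, (low, high) in LIMITS.items():
        if not low <= run_input[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    return run_input


print(build_input(["https://example.com/"]))
```

Validating up front catches malformed URLs and out-of-range settings before any run is started.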

### Output

- `companyName`: Company name when exposed in `Organization` structured data.
- `sourceUrl`: URL of the page the data was extracted from.
- `metricCategory`: Category such as carbon, energy, water, waste, diversity, governance, commitment, or other.
- `metricName`: The matched metric label (for example, Scope 1 emissions).
- `metricValue`: The numeric value found near the metric keyword.
- `unit`: Detected unit such as `%`, `tCO2e`, `MWh`, or similar.
- `reportingYear`: Reporting year detected in the same sentence when available.
- `targetYear`: Target year detected for commitment rows.
- `commitmentText`: The captured net-zero or reduction-target sentence.
- `framework`: Reporting frameworks referenced on the page (GRI, SASB, TCFD, CDP, SDG).
- `extractedAt`: Timestamp when this Actor extracted the row.
- `extractionMethod`: `structured_data`, `text_extraction`, or `commitment_text`.
- `confidenceScore`: Heuristic confidence score (structured 0.9, text-derived 0.6-0.8).
- `missingFields`: Required fields (`companyName`, `metricName`, `metricValue`, `reportingYear`) not available from the source page.
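
Downstream code can recompute or double-check `missingFields` from a row. The following sketch assumes that an unavailable field is stored as `null` (`None` in Python), as in the sample output; `missing_fields` is a hypothetical helper, not the Actor's own function.

```python
# Required fields as listed in the description of `missingFields`.
REQUIRED_FIELDS = ("companyName", "metricName", "metricValue", "reportingYear")


def missing_fields(row):
    """Return the required field names that are absent or null in a dataset row."""
    return [name for name in REQUIRED_FIELDS if row.get(name) is None]


row = {
    "companyName": None,
    "metricName": "Scope 1 emissions",
    "metricValue": 125000,
    "reportingYear": 2024,
}
print(missing_fields(row))
```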

### Sample input

```json
{
  "startUrls": [
    {
      "url": "https://example.com/"
    }
  ],
  "keywords": [],
  "includeCommitments": true,
  "maxItems": 50,
  "maxConcurrency": 3,
  "requestTimeoutSecs": 30
}
````

### Sample output

```json
{
  "companyName": "Example Manufacturing Group",
  "sourceUrl": "https://example.com/",
  "metricCategory": "carbon",
  "metricName": "Scope 1 emissions",
  "metricValue": 125000,
  "unit": "tCO2e",
  "reportingYear": 2024,
  "targetYear": null,
  "commitmentText": null,
  "framework": "GRI",
  "extractedAt": "2026-06-12T00:00:00.000Z",
  "extractionMethod": "structured_data",
  "confidenceScore": 0.9,
  "missingFields": []
}
```

### How to use

Run this Actor on Apify with public URLs, export the dataset as JSON, CSV, Excel, or through the Apify API, then connect the output to Google Sheets, Make, Zapier, a webhook, your CRM, or an internal dashboard. For monitoring, save the input as an Apify task and schedule recurring runs.
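
If you fetch rows through the API and want a local CSV file rather than a platform export, a short standard-library sketch like the one below may be enough. The column order follows the Output section; `rows_to_csv` is a hypothetical helper, and joining `missingFields` with semicolons is an arbitrary choice made for this example.

```python
import csv
import io

# Column order taken from the Output section of this README.
COLUMNS = [
    "companyName", "sourceUrl", "metricCategory", "metricName", "metricValue",
    "unit", "reportingYear", "targetYear", "commitmentText", "framework",
    "extractedAt", "extractionMethod", "confidenceScore", "missingFields",
]


def rows_to_csv(rows):
    """Serialize dataset rows to CSV text; null values become empty cells."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        flat = {name: row.get(name) for name in COLUMNS}
        # Flatten the list of missing field names into a single cell.
        flat["missingFields"] = ";".join(row.get("missingFields") or [])
        writer.writerow(flat)
    return buffer.getvalue()


sample_row = {
    "companyName": "Example Manufacturing Group",
    "sourceUrl": "https://example.com/",
    "metricCategory": "carbon",
    "metricName": "Scope 1 emissions",
    "metricValue": 125000,
    "unit": "tCO2e",
    "reportingYear": 2024,
    "missingFields": [],
}
print(rows_to_csv([sample_row]))
```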

### Pricing

This Actor uses a pay-per-event model: $0.01 per extraction. You pay only for the structured rows the Actor produces, which keeps costs predictable and tied directly to delivered data.
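
Because `maxItems` caps the number of saved rows, it also caps the extraction charge for a run. The sketch below works through that arithmetic at the $0.01 rate quoted above; `estimate_max_cost` is a hypothetical helper, and the actual price on your account may differ, so treat the result as an estimate.

```python
from decimal import Decimal

# USD per extracted row, as quoted in this README (assumed to apply to your plan).
PRICE_PER_EXTRACTION = Decimal("0.01")


def estimate_max_cost(max_items, price=PRICE_PER_EXTRACTION):
    """Upper bound on extraction charges for one run, rounded to cents."""
    return (Decimal(max_items) * price).quantize(Decimal("0.01"))


print(estimate_max_cost(50))    # default maxItems
print(estimate_max_cost(1000))  # 1,000 extractions
```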

### Best practices

- Start with a small set of reviewed public ESG and sustainability report URLs.
- Prefer the main sustainability or ESG data pages rather than PDF download links.
- Add domain-specific terms via `keywords` when a company uses non-standard metric names.
- Keep `includeCommitments` enabled to capture net-zero and target language.
- Keep `maxConcurrency` low for smaller websites.
- Review source website rules before scheduling recurring runs.
- Treat text-derived values as candidates for human review before downstream scoring.
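
One way to apply the last point is to split rows into an accepted set and a review queue before they reach a scoring model. This is a minimal sketch that assumes rows carry the `extractionMethod` and `confidenceScore` fields described in the Output section; `split_for_review` and the 0.9 threshold are hypothetical choices, not Actor settings.

```python
def split_for_review(rows, min_confidence=0.9):
    """Return (accepted, review): rows that are text-derived or low-confidence go to review."""
    accepted, review = [], []
    for row in rows:
        text_derived = row.get("extractionMethod") != "structured_data"
        low_confidence = row.get("confidenceScore", 0.0) < min_confidence
        if text_derived or low_confidence:
            review.append(row)
        else:
            accepted.append(row)
    return accepted, review


rows = [
    {"metricName": "Scope 1 emissions", "extractionMethod": "structured_data", "confidenceScore": 0.9},
    {"metricName": "Renewable energy", "extractionMethod": "text_extraction", "confidenceScore": 0.7},
    {"metricName": None, "extractionMethod": "commitment_text", "confidenceScore": 0.8},
]
accepted, review = split_for_review(rows)
print(len(accepted), "accepted,", len(review), "for review")
```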

### Compliance and responsible use

This Actor is for public data only. It must not be used to bypass logins, paywalls, CAPTCHAs, or security systems, collect private data, gather sensitive personal data, or support spam or abuse. You are responsible for following applicable laws and source website rules.

### Limitations

- Output quality depends on the public ESG content available on the source pages.
- Text-derived extraction is heuristic. Numeric values and units are matched near keywords and may need human verification before use in scoring.
- The Actor reads HTML pages and does not parse PDF reports.
- Some fields may be empty when the source does not publish them, and they are reported in `missingFields` rather than inferred.
- The Actor does not claim support for any specific third-party ESG platform.
- Website markup and access policies can change.

### Troubleshooting

- Empty output usually means the page has no recognizable ESG keywords paired with numeric values.
- Invalid URL errors mean one or more input URLs are malformed.
- Slow runs can usually be improved by lowering `maxConcurrency`.
- Missing fields are source-data limitations, not inferred values.

### Changelog

- v0.2.0: Production-readiness pass with improved positioning, samples, schema descriptions, and responsible-use notes.
- v0.1.0: Initial dry-run, factory-generated MVP.

# Actor input Schema

## `startUrls` (type: `array`):

Public company sustainability or ESG report pages to extract from. Use only URLs you are allowed to access without login, paywall bypass, CAPTCHA bypass, or security circumvention.

## `keywords` (type: `array`):

Optional additional ESG or sustainability terms to match in visible page text, on top of the built-in keyword library.

## `includeCommitments` (type: `boolean`):

Capture net-zero, carbon-neutral, and reduction-target sentences as commitment rows with an extracted target year.
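
As a rough illustration of what a commitment row might look like, the sketch below flags sentences that mention net-zero, carbon-neutral, or reduction-target language and reads a target year from a "by YYYY" phrase. The patterns and the `extract_commitments` helper are hypothetical; the Actor's actual matching rules are not documented here.

```python
import re

# Hypothetical commitment phrases; the Actor's real patterns may be broader.
COMMITMENT_RE = re.compile(r"net[- ]zero|carbon[- ]neutral|reduction target", re.IGNORECASE)
# Assumes the target year is introduced by "by", as in "by 2040".
TARGET_YEAR_RE = re.compile(r"\bby\s+((?:19|20)\d{2})\b", re.IGNORECASE)


def extract_commitments(text):
    """Return one commitment row per sentence that contains commitment language."""
    rows = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not COMMITMENT_RE.search(sentence):
            continue
        year = TARGET_YEAR_RE.search(sentence)
        rows.append({
            "metricCategory": "commitment",
            "commitmentText": sentence,
            "targetYear": int(year.group(1)) if year else None,
            "extractionMethod": "commitment_text",
        })
    return rows


sample = (
    "We aim to reach net-zero emissions by 2040. "
    "Our offices are located in Berlin. "
    "We will be carbon neutral across operations."
)
for row in extract_commitments(sample):
    print(row["targetYear"], "|", row["commitmentText"])
```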

## `maxItems` (type: `integer`):

Maximum number of normalized dataset rows to save. The Actor stops pushing new rows when this limit is reached.

## `maxConcurrency` (type: `integer`):

How many pages to process at once. Lower this for small websites or cautious monitoring.

## `requestTimeoutSecs` (type: `integer`):

Maximum time to spend processing a single page before it is treated as failed.

## `proxyConfiguration` (type: `object`):

Optional Apify proxy configuration. Use only where permitted by the source website and your compliance process.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://example.com/"
    }
  ],
  "keywords": [],
  "includeCommitments": true,
  "maxItems": 50,
  "maxConcurrency": 3,
  "requestTimeoutSecs": 30
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://example.com/"
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("technicaldost/company-esg-sustainability-extractor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "startUrls": [{ "url": "https://example.com/" }] }

# Run the Actor and wait for it to finish
run = client.actor("technicaldost/company-esg-sustainability-extractor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://example.com/"
    }
  ]
}' |
apify call technicaldost/company-esg-sustainability-extractor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=technicaldost/company-esg-sustainability-extractor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Company ESG & Sustainability Data Extractor",
        "description": "Extract ESG and sustainability metrics, carbon commitments, and net-zero targets from public company sustainability pages. Structured JSON output for finance, research, and procurement teams.",
        "version": "0.1",
        "x-build-id": "yst649VfghGD596hr"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/technicaldost~company-esg-sustainability-extractor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-technicaldost-company-esg-sustainability-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/technicaldost~company-esg-sustainability-extractor/runs": {
            "post": {
                "operationId": "runs-sync-technicaldost-company-esg-sustainability-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/technicaldost~company-esg-sustainability-extractor/run-sync": {
            "post": {
                "operationId": "run-sync-technicaldost-company-esg-sustainability-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Public page URLs",
                        "type": "array",
                        "description": "Public company sustainability or ESG report pages to extract from. Use only URLs you are allowed to access without login, paywall bypass, CAPTCHA bypass, or security circumvention.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "keywords": {
                        "title": "Extra ESG keywords",
                        "type": "array",
                        "description": "Optional additional ESG or sustainability terms to match in visible page text, on top of the built-in keyword library.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "includeCommitments": {
                        "title": "Capture commitments and targets",
                        "type": "boolean",
                        "description": "Capture net-zero, carbon-neutral, and reduction-target sentences as commitment rows with an extracted target year.",
                        "default": true
                    },
                    "maxItems": {
                        "title": "Maximum results",
                        "minimum": 1,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Maximum number of normalized dataset rows to save. The Actor stops pushing new rows when this limit is reached.",
                        "default": 50
                    },
                    "maxConcurrency": {
                        "title": "Maximum concurrency",
                        "minimum": 1,
                        "maximum": 20,
                        "type": "integer",
                        "description": "How many pages to process at once. Lower this for small websites or cautious monitoring.",
                        "default": 3
                    },
                    "requestTimeoutSecs": {
                        "title": "Request timeout seconds",
                        "minimum": 5,
                        "maximum": 180,
                        "type": "integer",
                        "description": "Maximum time to spend processing a single page before it is treated as failed.",
                        "default": 30
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Optional Apify proxy configuration. Use only where permitted by the source website and your compliance process."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
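
For projects that cannot use a client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. The sketch below uses only the Python standard library and passes the token as a query parameter, as the specification describes. `build_request` and `run_actor` are hypothetical helper names, the 300-second timeout is an arbitrary choice, and the assumption that the response body is a JSON array of dataset items is based on the endpoint summary.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/technicaldost~company-esg-sustainability-extractor/run-sync-get-dataset-items"


def build_request(token, run_input):
    """Build the POST request without sending it."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(token, run_input, timeout=300):
    """Send the request and decode the response body as JSON (assumed to be the dataset items)."""
    with urllib.request.urlopen(build_request(token, run_input), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request("<YOUR_API_TOKEN>", {"startUrls": [{"url": "https://example.com/"}]})
print(request.get_method(), request.full_url)
```

Calling `run_actor` starts a real run and is billed like any other run, so it is left out of the demonstration lines above.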
