ESG Scraper: Sustainability Reports & PDF Disclosures
Pricing
from $2.00 / 1,000 results
A powerful ESG (Environmental, Social, and Governance) scraper that automatically extracts sustainability reports, PDF disclosures, articles, and other content from any website. Get clean, AI-ready datasets with keyword filtering, metadata extraction, images, links, and full PDF support.
Developer

PrimeParse
ESG Scraper: Sustainability Reports, Articles & PDF Disclosures Extractor
Enterprise-grade ESG web scraper that automatically extracts sustainability articles, corporate reports, climate news, and PDF disclosures as clean, structured data ready for investors, compliance teams, or AI training.
High-quality ESG & Sustainability Web Scraper for Investors, Analysts, and AI Teams
Automatically collects ESG articles, sustainability reports, corporate disclosures, climate news, and PDF reports from any website, delivering clean, structured output ready for analysis or AI.
Built for:
- Sustainable investors & analysts
- Compliance and risk teams
- AI/ML engineers building ESG models
- Researchers and NGOs tracking climate & governance trends
Features:
- Smart ESG keyword filtering
- Full clean article text extraction
- PDF sustainability report parsing
- Rich metadata (date, author, description)
- ESG-relevant images and related links
- AI-ready dataset splitting (overview / full-text / images)
Runs on Apify • No code required • Pay only for compute used
Why This Scraper
- Purpose-Built for ESG Data: intelligently filters pages using custom ESG keywords (climate, emissions, governance, CSR, net zero, etc.).
- Excellent PDF Handling: full text extraction from sustainability and ESG reports (PDF), with metadata where available.
- Clean & Noise-Free Output: removes ads, navigation, and scripts so that only meaningful content remains.
- Rich Structured Data: title, publication date, author, description, ESG keywords, internal links, relevant images.
- AI & ML Ready: optional splitting into specialized datasets for RAG, LLM fine-tuning, or training.
- Fast & Efficient: powered by Crawlee + Cheerio, which works well for static and content-heavy sites (news, corporate pages, PDFs). Cheerio parses the HTML it receives and does not execute JavaScript, so results may vary on heavily JavaScript-rendered sites.
- Safe & Controlled Crawling: automatic domain restriction, depth limit (max 3 levels), request limits.
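The keyword filtering described above can be pictured with a small sketch. The Actor's actual matching rules are not documented here, so this assumes case-insensitive, whole-word (or whole-phrase) matching; the function name `match_esg_keywords` is hypothetical and not part of the Actor.

```python
import re


def match_esg_keywords(text: str, keywords: list[str]) -> list[str]:
    """Return the keywords found in text, in the order they were supplied.

    Assumption: matching is case-insensitive and respects word boundaries,
    so "ESG" does not match inside a longer token.
    """
    matched = []
    for keyword in keywords:
        # re.escape keeps multi-word phrases such as "net zero" literal.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            matched.append(keyword)
    return matched


keywords = ["ESG", "sustainability", "climate", "emissions", "net zero", "governance"]
page_text = "The board approved a net zero roadmap and new climate targets."
print(match_esg_keywords(page_text, keywords))  # ['climate', 'net zero']
```

A page with an empty match list would be treated as not ESG-relevant under this assumption; a real filter might also weigh how often each keyword appears.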
Use Cases
- ESG portfolio screening and risk monitoring
- Training ESG-focused LLMs or RAG systems
- Regulatory compliance and disclosure tracking
- Competitive intelligence on corporate sustainability
- Academic research on climate and governance trends
Supported Sources
- ESG news sections (Reuters, Bloomberg, FT, Guardian, etc.)
- Corporate sustainability / ESG pages
- Annual sustainability reports (PDF)
- Climate, emissions, governance disclosures
How It Works
- Provide start URLs (news sections, corporate pages, PDF links)
- Set custom ESG keywords and limits
- Run the Actor
- Download clean, structured ESG datasets
Input Configuration
Example JSON Input
{
  "startUrls": [
    { "url": "https://www.reuters.com/sustainability/" },
    { "url": "https://www.weforum.org/stories/technological-innovation/" }
  ],
  "allowedDomains": ["reuters.com", "weforum.org"],
  "useApifyProxy": false,
  "maxRequestsPerCrawl": 500,
  "esgKeywords": ["ESG", "sustainability", "climate", "emissions", "net zero", "governance"],
  "extractContent": true,
  "extractMetadata": true,
  "followLinks": true,
  "useSeparateDatasets": true,
  "cleanDefaultDataset": true,
  "proxyUrls": [
    { "url": "http://user:pass@host:port" }
  ]
}
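Before starting a run, it can help to sanity-check the input locally. The sketch below is an assumption-based pre-flight check, not the Actor's own validation: `validate_input` is a hypothetical helper, and its rules are inferred only from the example input above.

```python
from urllib.parse import urlparse


def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks usable."""
    problems = []

    start_urls = run_input.get("startUrls") or []
    if not start_urls:
        problems.append("startUrls is required and must not be empty")
    for entry in start_urls:
        url = entry.get("url", "") if isinstance(entry, dict) else ""
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"not an absolute http(s) URL: {url!r}")

    limit = run_input.get("maxRequestsPerCrawl")
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if limit is not None and (isinstance(limit, bool) or not isinstance(limit, int) or limit < 1):
        problems.append("maxRequestsPerCrawl must be a positive integer")

    keywords = run_input.get("esgKeywords")
    if keywords is not None and not (
        isinstance(keywords, list) and all(isinstance(k, str) and k.strip() for k in keywords)
    ):
        problems.append("esgKeywords must be a list of non-empty strings")

    return problems


run_input = {
    "startUrls": [{"url": "https://www.reuters.com/sustainability/"}],
    "maxRequestsPerCrawl": 500,
    "esgKeywords": ["ESG", "climate"],
}
print(validate_input(run_input))  # []
```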
Key Options
- startUrls: one or more starting pages or direct PDF links (required)
- allowedDomains: restrict crawling to specific domains. If empty, crawling is automatically limited to the domains from startUrls
- maxRequestsPerCrawl: control cost and crawl size
- esgKeywords: custom list for relevance filtering (default includes common ESG terms)
- extractContent / extractMetadata: toggle full text or metadata extraction
- followLinks: enable internal crawling (limited to depth 3 for safety)
- useSeparateDatasets: recommended for large runs and AI workflows
- cleanDefaultDataset: clear previous run data
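The interaction between allowedDomains, startUrls, and the depth limit can be sketched as follows. This is an illustration under stated assumptions, not the Actor's code: the helper names are hypothetical, and it assumes that a leading "www." is ignored, that subdomains of an allowed domain are accepted, and that start URLs count as depth 0.

```python
from urllib.parse import urlparse

MAX_DEPTH = 3  # depth limit stated in this README


def effective_domains(start_urls: list[str], allowed_domains: list[str]) -> set[str]:
    """Use allowedDomains when given, otherwise fall back to the start URL hosts."""
    if allowed_domains:
        return {domain.lower() for domain in allowed_domains}
    hosts = (urlparse(url).hostname or "" for url in start_urls)
    # Assumption: "www.reuters.com" and "reuters.com" are the same site.
    return {host.removeprefix("www.") for host in hosts if host}


def should_crawl(url: str, depth: int, domains: set[str]) -> bool:
    """Accept a link only if it stays on an allowed domain and within MAX_DEPTH."""
    host = (urlparse(url).hostname or "").lower()
    in_domain = any(host == d or host.endswith("." + d) for d in domains)
    return in_domain and depth <= MAX_DEPTH


domains = effective_domains(["https://www.reuters.com/sustainability/"], [])
print(domains)                                                       # {'reuters.com'}
print(should_crawl("https://www.reuters.com/markets/", 2, domains))  # True
print(should_crawl("https://www.reuters.com/markets/", 4, domains))  # False
```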
Output Datasets
When useSeparateDatasets: true (recommended):
- esg-overview (primary): lightweight metadata for fast analysis
- esg-full-content: long articles (>5000 characters)
- esg-images: ESG-relevant images with context
- Default dataset: minimal preview records (for Apify UI visibility)
When useSeparateDatasets: false:
- Single dataset with full detailed records
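To make the split concrete, here is a rough sketch of how one full record might be routed into the three named datasets. Only the dataset names and the 5000-character threshold come from this README; the function `split_record`, the choice of overview fields, and the `pageUrl` key are assumptions for illustration.

```python
FULL_CONTENT_MIN_CHARS = 5000  # threshold quoted for esg-full-content

# Assumption: the overview keeps the lightweight metadata fields only.
OVERVIEW_FIELDS = (
    "url", "title", "scrapedAt", "publishedDate", "author", "description", "esgKeywords",
)


def split_record(record: dict) -> dict[str, list[dict]]:
    """Route one full record into overview / full-content / images buckets."""
    routed = {"esg-overview": [], "esg-full-content": [], "esg-images": []}

    routed["esg-overview"].append(
        {field: record[field] for field in OVERVIEW_FIELDS if field in record}
    )

    content = record.get("content") or ""
    if len(content) > FULL_CONTENT_MIN_CHARS:  # strictly greater, per ">5000 characters"
        routed["esg-full-content"].append({"url": record.get("url"), "content": content})

    for image in record.get("images") or []:
        # pageUrl (hypothetical key) keeps the link back to the source page.
        routed["esg-images"].append({"pageUrl": record.get("url"), **image})

    return routed


record = {
    "url": "https://www.reuters.com/sustainability/example",
    "title": "Companies strengthen climate commitments",
    "content": "x" * 6000,
    "images": [{"url": "https://reuters.com/chart-netzero.jpg", "alt": "Net zero emissions progress"}],
}
routed = split_record(record)
print({name: len(items) for name, items in routed.items()})
# {'esg-overview': 1, 'esg-full-content': 1, 'esg-images': 1}
```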
Example Output Record (Full Mode)
{
  "url": "https://www.reuters.com/sustainability/example",
  "title": "Companies strengthen climate commitments",
  "scrapedAt": "2025-12-15T10:30:45Z",
  "publishedDate": "2025-12-10",
  "author": "Jane Doe",
  "description": "Major firms enhance ESG targets...",
  "content": "Full clean article text...\n\nParagraphs preserved...",
  "esgKeywords": ["climate", "emissions", "sustainability"],
  "relatedLinks": [
    { "url": "https://www.reuters.com/sustainability/esg-guide", "text": "ESG Explained" }
  ],
  "images": [
    { "url": "https://reuters.com/chart-netzero.jpg", "alt": "Net zero emissions progress" }
  ]
}
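For RAG or fine-tuning workflows, the content field usually has to be cut into smaller passages first. Because the example record separates paragraphs with blank lines, one simple option is to pack whole paragraphs into chunks up to a size limit. This is a downstream sketch, not something the Actor does; `chunk_paragraphs`, `record_to_chunks`, `chunkIndex`, and the 1000-character default are all illustrative choices.

```python
def chunk_paragraphs(content: str, max_chars: int = 1000) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut mid-sentence.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in content.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if not current or len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def record_to_chunks(record: dict, max_chars: int = 1000) -> list[dict]:
    """Attach source metadata to each chunk so retrieved text can be traced back."""
    return [
        {"url": record.get("url"), "title": record.get("title"), "chunkIndex": i, "text": text}
        for i, text in enumerate(chunk_paragraphs(record.get("content") or "", max_chars))
    ]


record = {
    "url": "https://www.reuters.com/sustainability/example",
    "title": "Companies strengthen climate commitments",
    "content": "Full clean article text...\n\nParagraphs preserved...",
}
for chunk in record_to_chunks(record, max_chars=30):
    print(chunk["chunkIndex"], repr(chunk["text"]))
# 0 'Full clean article text...'
# 1 'Paragraphs preserved...'
```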
PDF Example
{
  "url": "https://company.com/sustainability-2024.pdf",
  "title": "Annual Sustainability Report 2024",
  "content": "Full extracted report text...",
  "esgKeywords": ["sustainability", "carbon", "governance"],
  "type": "PDF",
  "author": "Corporate Sustainability Team",
  "publishedDate": "2024-03-15"
}
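When a run mixes articles and reports, the type field shown in the PDF example offers a way to pull out the reports. The sketch below assumes that PDF records carry "type": "PDF" as in the example (with a ".pdf" URL suffix as a fallback) and that publishedDate is an ISO date when present; the helper names are hypothetical.

```python
from datetime import date
from typing import Optional


def is_pdf_record(record: dict) -> bool:
    """Treat a record as a PDF if it is typed as one or its URL path ends in .pdf."""
    url = (record.get("url") or "").lower()
    return record.get("type") == "PDF" or url.split("?", 1)[0].endswith(".pdf")


def published_on(record: dict) -> Optional[date]:
    """Parse publishedDate, returning None when it is missing or not YYYY-MM-DD."""
    try:
        return date.fromisoformat(record["publishedDate"])
    except (KeyError, TypeError, ValueError):
        return None


def latest_pdf_reports(records: list[dict]) -> list[dict]:
    """Return PDF records, newest first; undated reports sort last."""
    pdfs = [r for r in records if is_pdf_record(r)]
    return sorted(pdfs, key=lambda r: published_on(r) or date.min, reverse=True)


records = [
    {"url": "https://company.com/sustainability-2024.pdf", "type": "PDF", "publishedDate": "2024-03-15"},
    {"url": "https://www.reuters.com/sustainability/example", "publishedDate": "2025-12-10"},
    {"url": "https://company.com/sustainability-2023.pdf", "type": "PDF"},
]
for report in latest_pdf_reports(records):
    print(report["url"], published_on(report))
# https://company.com/sustainability-2024.pdf 2024-03-15
# https://company.com/sustainability-2023.pdf None
```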
Getting Started
- Click "Try for free" on Apify
- Paste ESG/sustainability URLs or direct PDF links
- Customize keywords and limits
- Run and download your dataset
Support
- Email: kidaxxb@gmail.com
- Response within 24 hours
- Issues: Use Apify Issues tab
Tags: ESG, sustainability, web scraping, PDF extraction, climate data, corporate governance, RAG, LLM training, sustainable investing, compliance monitoring
Built with ❤️ on Apify