Structured Business Data Extractor
Pricing
from $0.01 / 1,000 results
Structured Business Data Extractor
Extracts structured business information from company websites for research and data enrichment. Converts public website content into clean, machine-readable data such as company name, contact details, and metadata. Intended for research, CRM enrichment, and company profiling. No outreach.
Pricing
from $0.01 / 1,000 results
Rating
0.0
(0)
Developer

Leoncio Jr Coronado
Actor stats
0
Bookmarked
4
Total users
2
Monthly active users
18 days ago
Last modified
Categories
Share
📊 Structured Business Data Extractor
Extract business contact information (emails & phone numbers) from public websites using a fast, reliable Python actor built on Apify + Playwright.
This actor is designed for lead generation, business research, and data enrichment, with a guarantee that the output dataset is never empty (auto-test safe).
🚀 What this Actor Does
Given one or more website URLs, the actor:
Extracts email addresses
Extracts phone numbers
Detects a company / organization name
Outputs clean, structured data to an Apify dataset
Works on public websites only (no login required)
If no contact data is found, the actor still outputs a fallback record to ensure reliability.
✅ Key Features
🟢 Python-based (stable & maintainable)
🟢 Playwright-ready (handles modern websites)
🟢 No login required
🟢 No social media scraping
🟢 Dataset is never empty
🟢 Safe for Apify Store auto-tests
✅ Key Features
🟢 Python-based (stable & maintainable)
🟢 Playwright-ready (handles modern websites)
🟢 No login required
🟢 No social media scraping
🟢 Dataset is never empty
🟢 Safe for Apify Store auto-tests
📥 Input Required
Start URLs List of public website URLs to scan.
Optional
Use Playwright – Enable browser rendering (default: false)
Max pages – Maximum number of pages to process (default: 1)
Example Input { "start_urls": [ { "url": "https://www.iana.org/contact" } ], "use_playwright": false, "max_pages": 1 }
{ "start_urls": [ { "url": "https://www.iana.org/contact" } ], "use_playwright": false, "max_pages": 1 }
📤 Output
The actor outputs data to the default dataset with the following structure: { "url": "https://www.iana.org/contact", "company_name": "Contact Us", "email": "iana@iana.org", "phone": "+1-424-254-5300", "source_page": "https://www.iana.org/contact", "extracted_at": "2025-12-16T11:55:46.267170+00:00" }
Output Fields
Field Description url Website that was processed company_name Company or page title email Extracted email address (if found) phone Extracted phone number (if found) source_page Page where the data was found extracted_at UTC timestamp
🧠 How It Works (Simple)
Loads the website
Scans visible page content
Extracts emails and phone numbers using robust patterns
Saves structured results to a dataset
Ensures at least one dataset item exists
⚠️ Limitations & Notes
This actor does NOT scrape social media platforms
Only works on publicly accessible websites
No CAPTCHA bypassing
Accuracy depends on how contact data is displayed on the site
🛡️ Legal & Ethical Use
You are responsible for complying with:
Website terms of service
Local data privacy laws (e.g. GDPR)
Ethical data usage practices
Use this actor only for legitimate business purposes.
👤 Author
Leonmcio Coronado Jr. Python Web Scraping & Data Automation Specialist Apify Developer
FINAL VERSION (recommended) ⭐ Recommended Use Cases
B2B lead generation
CRM enrichment
Market research
Business directory building
Contact data validation
🔧 Customization
If you need additional features such as multi-page crawling, CSV export, or custom filtering, you can fork or extend this actor to fit your workflow.
✅ Status
Production-ready · Store-safe · Auto-test compliant