Structured Business Data Extractor

Pricing

from $0.01 / 1,000 results

Extracts structured business information from company websites for research and data enrichment. Converts public website content into clean, machine-readable data such as company name, contact details, and metadata. Intended for research, CRM enrichment, and company profiling. No outreach.

Rating

0.0 (0)

Developer

Leoncio Jr Coronado

Maintained by Community

Actor stats

Bookmarked: 0

Total users: 4

Monthly active users: 2

Last modified: 18 days ago

📊 Structured Business Data Extractor

Extract business contact information (emails & phone numbers) from public websites using a fast, reliable Python actor built on Apify + Playwright.

This actor is designed for lead generation, business research, and data enrichment. It always writes at least one record to the output dataset, so Apify Store auto-tests never see an empty result.

🚀 What this Actor Does

Given one or more website URLs, the actor:

Extracts email addresses

Extracts phone numbers

Detects a company / organization name

Outputs clean, structured data to an Apify dataset

Works on public websites only (no login required)

If no contact data is found, the actor still outputs a fallback record, so the dataset is never empty.

✅ Key Features

🟢 Python-based (stable & maintainable)

🟢 Playwright-ready (handles modern websites)

🟢 No login required

🟢 No social media scraping

🟢 Dataset is never empty

🟢 Safe for Apify Store auto-tests

📥 Input

Required

Start URLs – List of public website URLs to scan.

Optional

Use Playwright – Enable browser rendering (default: false)

Max pages – Maximum number of pages to process (default: 1)

Example Input

{
  "start_urls": [
    { "url": "https://www.iana.org/contact" }
  ],
  "use_playwright": false,
  "max_pages": 1
}
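If you build the input programmatically, a small helper can keep it in the shape shown in the example. The sketch below is only an illustration: `build_run_input` is a hypothetical name, and the validation rules (http/https only, `max_pages` of at least 1) are assumptions, not checks the actor is documented to perform.

```python
import json
from urllib.parse import urlparse


def build_run_input(urls, use_playwright=False, max_pages=1):
    """Build an input dict with the three fields from the example input."""
    if not urls:
        raise ValueError("at least one start URL is required")
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")

    start_urls = []
    for url in urls:
        parsed = urlparse(url)
        # Assumption: only absolute http(s) URLs are useful as start URLs.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not a public http(s) URL: {url!r}")
        start_urls.append({"url": url})

    return {
        "start_urls": start_urls,
        "use_playwright": bool(use_playwright),
        "max_pages": int(max_pages),
    }


run_input = build_run_input(["https://www.iana.org/contact"])
print(json.dumps(run_input, indent=2))
```

The defaults mirror the documented ones (`use_playwright`: false, `max_pages`: 1), so passing only a URL list reproduces the example input.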

📤 Output

The actor outputs data to the default dataset with the following structure:

{
  "url": "https://www.iana.org/contact",
  "company_name": "Contact Us",
  "email": "iana@iana.org",
  "phone": "+1-424-254-5300",
  "source_page": "https://www.iana.org/contact",
  "extracted_at": "2025-12-16T11:55:46.267170+00:00"
}

Output Fields

Field – Description

url – Website that was processed

company_name – Company or page title

email – Extracted email address (if found)

phone – Extracted phone number (if found)

source_page – Page where the data was found

extracted_at – UTC timestamp
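When consuming the dataset downstream, it can help to load each item into a typed record. The following is a minimal sketch, assuming items have the six fields listed above. `ContactRecord`, `parse_record`, and `has_contact` are hypothetical names, and treating a record with neither email nor phone as the fallback record is an assumption, since the exact shape of the fallback record is not documented here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ContactRecord:
    url: str
    company_name: Optional[str]
    email: Optional[str]
    phone: Optional[str]
    source_page: Optional[str]
    extracted_at: Optional[datetime]

    @property
    def has_contact(self) -> bool:
        # Assumption: a fallback record carries neither an email nor a phone.
        return bool(self.email or self.phone)


def parse_record(item: dict) -> ContactRecord:
    """Convert one dataset item (a plain dict) into a ContactRecord."""
    raw_ts = item.get("extracted_at")
    return ContactRecord(
        url=item["url"],
        company_name=item.get("company_name") or None,
        email=item.get("email") or None,
        phone=item.get("phone") or None,
        source_page=item.get("source_page") or None,
        # The sample timestamp is ISO 8601 with a UTC offset.
        extracted_at=datetime.fromisoformat(raw_ts) if raw_ts else None,
    )


sample = {
    "url": "https://www.iana.org/contact",
    "company_name": "Contact Us",
    "email": "iana@iana.org",
    "phone": "+1-424-254-5300",
    "source_page": "https://www.iana.org/contact",
    "extracted_at": "2025-12-16T11:55:46.267170+00:00",
}
record = parse_record(sample)
print(record.has_contact, record.extracted_at)
```

Filtering on `has_contact` is one way to drop fallback records before loading results into a CRM.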

🧠 How It Works (Simple)

Loads the website

Scans visible page content

Extracts emails and phone numbers using robust patterns

Saves structured results to a dataset

Ensures at least one dataset item exists
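The steps above can be sketched roughly as follows. This is not the actor's source code: it uses only the Python standard library on an HTML string instead of Playwright, and `EMAIL_RE`, `PHONE_RE`, `_VisibleText`, and `extract_record` are hypothetical names with deliberately simple patterns. It only illustrates the idea of scanning page text and always returning a record.

```python
import re
from datetime import datetime, timezone
from html.parser import HTMLParser

# Simplified patterns for illustration; real-world patterns are usually stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional "+", then digits mixed with common separators, ending in a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


class _VisibleText(HTMLParser):
    """Collects page text and the <title>, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.title = ""
        self._skip = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip:
            return
        if self._in_title:
            self.title += data
        else:
            self.parts.append(data)


def extract_record(url, html):
    """Return one record for the page, even when no contact data is found."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "url": url,
        "company_name": parser.title.strip() or None,
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
        "source_page": url,
        "extracted_at": datetime.now(timezone.utc).isoformat(),
    }


page = (
    "<html><head><title>Contact Us</title>"
    "<script>var hidden = 'spam@bad.example';</script></head>"
    "<body><p>Email: iana@iana.org</p><p>Phone: +1-424-254-5300</p></body></html>"
)
print(extract_record("https://www.iana.org/contact", page))
```

Because `extract_record` returns a dict with empty contact fields instead of nothing, pushing its result always leaves at least one item in the dataset, which is the fallback behaviour described above.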

⚠️ Limitations & Notes

This actor does NOT scrape social media platforms

Only works on publicly accessible websites

No CAPTCHA bypassing

Accuracy depends on how contact data is displayed on the site

🛡️ Legal & Ethical Use

You are responsible for complying with:

Website terms of service

Local data privacy laws (e.g. GDPR)

Ethical data usage practices

Use this actor only for legitimate business purposes.

👤 Author

Leoncio Coronado Jr.

Python Web Scraping & Data Automation Specialist

Apify Developer

⭐ Recommended Use Cases

B2B lead generation

CRM enrichment

Market research

Business directory building

Contact data validation

🔧 Customization

If you need additional features such as multi-page crawling, CSV export, or custom filtering, you can fork or extend this actor to fit your workflow.
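As one example of such an extension, CSV export can be added as a post-processing step on the output records. The sketch below assumes records are dicts with the documented output fields; `FIELDS` and `records_to_csv` are hypothetical names, not part of the actor.

```python
import csv
import io

# Column order follows the documented output fields.
FIELDS = ["url", "company_name", "email", "phone", "source_page", "extracted_at"]


def records_to_csv(records):
    """Serialize output records to CSV text, one row per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        # Missing or null fields become empty cells.
        writer.writerow({name: record.get(name) or "" for name in FIELDS})
    return buffer.getvalue()


rows = [
    {
        "url": "https://www.iana.org/contact",
        "company_name": "Contact Us",
        "email": "iana@iana.org",
        "phone": "+1-424-254-5300",
        "source_page": "https://www.iana.org/contact",
        "extracted_at": "2025-12-16T11:55:46.267170+00:00",
    }
]
csv_text = records_to_csv(rows)
print(csv_text)
```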

✅ Status

Production-ready · Store-safe · Auto-test compliant