LeadScraper

Pricing: $2.30 / 1,000 runs

Developed by Claire Dubiel

Maintained by Community

Scrape a list of URLs and receive business contact information, social media links, and a description of the services offered. The actor crawls multiple pages listed in each site's sitemap and assigns a confidence score to every phone number and email address it finds.

Total users: 19
Monthly users: 8
Runs succeeded: >99%
Last modified: 2 months ago

Service Company Website Scraper

An Apify actor that scrapes service company websites and extracts structured information about each business: contact information, services offered, hours of operation, social media links, reviews, pricing, and FAQs.

Features

  • Extracts company name, description, and contact information
  • Identifies services offered by the company
  • Extracts business hours, social media links, and reviews
  • Finds pricing information and FAQs
  • Handles multiple URLs in a single run
  • Lets you turn SSL certificate verification on or off
  • Attempts to bypass Cloudflare protection (enabled by default, can be disabled)

Input

The actor accepts the following input parameters:

  • urls - An array of service company website URLs to scrape (required)
  • verifySSL - Whether to verify SSL certificates (default: true)
  • bypassCloudflare - Whether to attempt to bypass Cloudflare protection (default: true)
  • metadata - Optional custom metadata to include with each result

Example input:

{
  "urls": [
    "https://www.example1.com/",
    "https://www.example2.com/"
  ],
  "verifySSL": true,
  "bypassCloudflare": true,
  "metadata": {
    "project_id": "example-project",
    "source": "manual",
    "category": "roofing"
  }
}
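
If you build the input programmatically, it can help to apply the documented defaults and catch obvious mistakes before starting a run. The sketch below is a hypothetical client-side helper: the field names and the `true` defaults come from the Input section above, but the function name `normalizeInput` and its validation rules (non-empty `urls`, http/https only) are assumptions, not the actor's own schema logic.

```javascript
// Hypothetical helper (not part of the actor): applies the documented
// defaults and performs basic checks on an input object before a run.
// The validation rules here are assumptions for illustration.
function normalizeInput(raw) {
  const input = raw ?? {};

  // `urls` is the only required field; require a non-empty array
  if (!Array.isArray(input.urls) || input.urls.length === 0) {
    throw new Error('urls must be a non-empty array');
  }

  for (const u of input.urls) {
    // The WHATWG URL constructor throws a TypeError on malformed URLs
    const parsed = new URL(u);
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
      throw new Error(`unsupported protocol in ${u}`);
    }
  }

  return {
    urls: input.urls,
    // Both flags default to true when omitted, per the Input section
    verifySSL: input.verifySSL ?? true,
    bypassCloudflare: input.bypassCloudflare ?? true,
    // metadata is optional; include it only when the caller provided it
    ...(input.metadata !== undefined ? { metadata: input.metadata } : {}),
  };
}
```

For example, `normalizeInput({ urls: ["https://www.example1.com/"] })` returns the same URL list with `verifySSL` and `bypassCloudflare` both set to `true`, while an empty `urls` array throws before any run is started.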

Output

The actor stores one JSON object per input URL in its default dataset, with the following fields:

  • url - The URL of the scraped website
  • title - The title of the website
  • meta_description - The meta description of the website
  • main_content - The main content of the website
  • contact_information - Contact information extracted from the website
    • phones - List of phone numbers found, each with a confidence score
    • main_phone - The phone number with the highest confidence score
    • emails - List of email addresses found, each with a confidence score
    • main_email - The email address with the highest confidence score
    • address - The physical address of the business
  • services - List of services offered by the company
  • hours_of_operation - Business hours by day of the week
  • social_media_links - Links to social media profiles
  • reviews - Customer reviews found on the website
  • pricing - Pricing information for services
  • faqs - Frequently asked questions
  • success - Whether the scraping was successful
  • error - Error message if scraping failed
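
Because every phone number and email comes with a confidence score, you can apply your own cutoff on top of main_phone and main_email. The sketch below re-ranks a phones or emails list on the client side. It assumes each entry has the shape `{ value, confidence }` with a numeric confidence; that shape and the helper names `pickContacts` and `bestContact` are hypothetical, so check a real dataset item for the actual field names first.

```javascript
// Hypothetical helpers. ASSUMPTION: each entry in `phones` / `emails` looks
// like { value: string, confidence: number }; verify against real output.

// Keep entries at or above the threshold, sorted by confidence (highest first).
// filter() returns a new array, so the caller's list is not mutated by sort().
function pickContacts(entries, minConfidence = 0) {
  return (entries ?? [])
    .filter((e) => typeof e.confidence === 'number' && e.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence);
}

// Return the single best entry (analogous to main_phone / main_email),
// or null when no entry clears the threshold
function bestContact(entries, minConfidence = 0) {
  const ranked = pickContacts(entries, minConfidence);
  return ranked.length > 0 ? ranked[0] : null;
}
```

Raising `minConfidence` trades recall for precision: with a high threshold, `bestContact` returns `null` for sites where the scraper only found low-confidence matches, instead of passing a doubtful number into your lead list.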

Example Usage

const { Actor } = require('apify');

// Actor.main() initializes the SDK, runs the callback, then exits the process
Actor.main(async () => {
    const input = {
        urls: [
            "https://www.example1.com/",
            "https://www.example2.com/"
        ],
        verifySSL: true,
        bypassCloudflare: true,
        metadata: {
            project_id: "example-project",
            source: "manual",
            category: "roofing"
        }
    };

    // Start the actor and wait for the run to finish
    const run = await Actor.call('your-username/service-company-scraper', input);

    // Open the run's default dataset on the Apify platform (forceCloud makes
    // this work when the script runs locally) and print its items
    const dataset = await Actor.openDataset(run.defaultDatasetId, { forceCloud: true });
    const { items } = await dataset.getData();
    console.log('Results:', items);
});
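
A common next step for lead generation is exporting the results to a spreadsheet. The sketch below turns the `items` array read from the dataset in the example above into CSV text, skipping URLs where `success` is false. It assumes main_phone, main_email, and address are plain strings (or missing) nested under contact_information, as the Output section suggests; the helper names `csvCell` and `itemsToCsv` and the column selection are hypothetical.

```javascript
// Hypothetical post-processing step: convert dataset items to CSV text.
// ASSUMPTION: contact_information.main_phone / main_email / address are
// strings or absent; adjust the accessors if real items differ.
const COLUMNS = ['url', 'title', 'main_phone', 'main_email', 'address'];

// Quote a cell only when it contains a comma, quote, or line break,
// doubling any embedded quotes (RFC 4180-style quoting)
function csvCell(value) {
  const s = value == null ? '' : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function itemsToCsv(items) {
  const rows = [COLUMNS.join(',')];
  for (const item of items) {
    // Skip URLs the actor failed to scrape; their `error` field explains why
    if (!item.success) continue;
    const contact = item.contact_information ?? {};
    const row = [
      item.url,
      item.title,
      contact.main_phone,
      contact.main_email,
      contact.address,
    ];
    rows.push(row.map(csvCell).join(','));
  }
  return rows.join('\n');
}
```

Inside the `Actor.main` callback, something like `require('fs').writeFileSync('leads.csv', itemsToCsv(items))` would save the file; failed URLs are left out of the CSV, so log their `error` fields separately if you need to retry them.
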

Development

Project Structure

  • `main.py` - Entry point for the Apify actor
  • `scraper.py` - Contains the `ServiceCompanyScraper` class
  • `requirements.txt` - Python dependencies
  • `INPUT_SCHEMA.json` - Input schema for the Apify actor
  • `OUTPUT_SCHEMA.json` - Output schema for the Apify actor
  • `Dockerfile` - Docker configuration for the Apify actor

Adding New Features

To add a new extraction capability:

  1. Add a new method to the `ServiceCompanyScraper` class in `scraper.py`.
  2. Call the new method from the class's `scrape` method.
  3. Update `OUTPUT_SCHEMA.json` if the method adds new output fields.