Universal Web Scraper — Custom Data Extraction Starter Template
Customizable web scraping template with proxy support, rate limiting, and error handling. Fork it and make it yours.

Pricing: Pay per usage

Rating: 0.0 (0 reviews)

Developer: Creator Fusion (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 1
  • Monthly active users: 0
  • Last modified: 7 hours ago


Don't want to build a scraper from scratch? This is a fully featured template with everything production scraping needs: proxy rotation, smart retry logic, rate limiting, JavaScript rendering, structured output, and error handling. Fork it, customize the CSS selectors, set your target URL, and deploy. It works for e-commerce, real estate, job boards, directories, or any other site with structured data.

⚡ What You Get

Universal Web Scraping Template
├── Features Included (Production-Ready)
│   ├── Configuration
│   │   ├── Target URL Input: Customizable ✓
│   │   ├── Proxy Support: Enabled ✓
│   │   ├── Rate Limiting: Configurable ✓
│   │   ├── Timeout Handling: Included ✓
│   │   └── Retry Logic: Intelligent ✓
│   ├── Extraction
│   │   ├── CSS Selectors: Fully customizable
│   │   ├── XPath Support: Yes
│   │   ├── Regex Patterns: Yes
│   │   ├── JavaScript Rendering: Full Chromium
│   │   └── Dynamic Content: Handled
│   ├── Data Processing
│   │   ├── Text Cleaning: Automatic
│   │   ├── Data Normalization: Included
│   │   ├── Type Conversion: Automatic
│   │   ├── Duplicate Removal: Optional
│   │   └── JSON Validation: Built-in
│   └── Error Handling
│       ├── Network Failures: Auto-retry with backoff
│       ├── Missing Elements: Graceful degradation
│       ├── Timeout Recovery: Configurable
│       ├── Proxy Rotation on Block: Automatic
│       └── Error Logging: Detailed
├── Example Use Cases 👈 All possible with minimal customization
│   ├── E-commerce: Product titles, prices, availability
│   ├── Real Estate: Listings, prices, property details
│   ├── Job Boards: Job titles, companies, salaries
│   ├── Directories: Names, contacts, addresses
│   ├── News: Articles, dates, authors
│   ├── Reviews: Ratings, review text, reviewer names
│   └── Custom Sites: Any structured HTML data
├── Customization Process (30 minutes)
│   ├── Step 1: Set target URL
│   ├── Step 2: Inspect element, grab CSS selectors
│   ├── Step 3: Update selector variables in code
│   ├── Step 4: Test extraction locally
│   ├── Step 5: Deploy and run
│   └── Result: Fully working scraper
├── Output Format
│   ├── Format: Clean JSON
│   ├── Schema: Validates automatically
│   ├── Fields: Customizable
│   ├── Pagination: Auto-handled
│   └── Ready for: Databases, APIs, downstream processing
└── Built-in Defaults
    ├── Proxy Type: Residential (included)
    ├── Rate Limit: 1 req/2s (respectful)
    ├── Timeout: 30s per request
    ├── Retries: 3 with exponential backoff
    ├── JavaScript Rendering: Enabled by default
    └── Output: Validated JSON
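The default retry behavior listed above (3 retries with exponential backoff) can be sketched in a few lines of Python. This is a minimal illustration under assumed names, not the template's actual code; `fetch_with_retries` and the `flaky` stub are hypothetical:

```python
import time

def fetch_with_retries(fetch, url, max_retries=3, base_backoff=1.0):
    """Retry a flaky fetch with exponential backoff (1s, 2s, 4s by default)."""
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except ConnectionError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_backoff * 2 ** attempt)

# Offline demo: a stub that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network failure")
    return "<html>ok</html>"

result = fetch_with_retries(flaky, "https://example.com/products", base_backoff=0.01)
print(result, calls["n"])  # <html>ok</html> 3
```

A real run would additionally sleep between successful requests to honor the 1 req/2s rate-limit default.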

🎯 Use Cases

  • E-commerce Data: Scrape product catalogs, prices, availability. Feed into price comparison apps or inventory tracking.
  • Real Estate: Extract listings, prices, property features. Build your own MLS alternative.
  • Job Boards: Scrape listings from boards that don't offer public APIs. Aggregate and resell.
  • Business Directories: Compile lists of companies, contacts, and addresses from scattered sources.
  • Market Research: Gather pricing, features, reviews from competitor sites systematically.
  • Lead Generation: Scrape B2B directories, compile prospect lists, extract contact info.
  • Content Aggregation: Pull articles, news, research from multiple sources into one feed.

📊 Sample Output

{
  "actor_configuration": {
    "target_url": "https://example-ecommerce.com/products",
    "proxy_enabled": true,
    "proxy_type": "residential",
    "rate_limit_requests_per_second": 0.5,
    "timeout_seconds": 30,
    "max_retries": 3,
    "javascript_rendering": true
  },
  "extraction_config": {
    "selectors": {
      "product_container": ".product-item",
      "product_title": "h2.product-name",
      "product_price": ".price-current",
      "product_rating": ".rating-value",
      "product_url": "a.product-link"
    },
    "pagination": {
      "next_page_selector": "a.next-page",
      "max_pages": 10
    }
  },
  "scrape_results": {
    "total_items_scraped": 487,
    "successful_requests": 487,
    "failed_requests": 0,
    "execution_time_seconds": 284,
    "items_per_minute": 103
  },
  "extracted_data": [
    {
      "title": "Wireless Headphones Pro",
      "price": 199.99,
      "currency": "USD",
      "rating": 4.5,
      "reviews_count": 234,
      "in_stock": true,
      "url": "https://example-ecommerce.com/products/headphones-pro"
    },
    {
      "title": "USB-C Charging Cable",
      "price": 24.99,
      "currency": "USD",
      "rating": 4.7,
      "reviews_count": 1245,
      "in_stock": true,
      "url": "https://example-ecommerce.com/products/usb-c-cable"
    }
  ],
  "data_quality": {
    "validation_passed": true,
    "missing_fields_count": 0,
    "type_errors": 0,
    "duplicate_count": 0,
    "data_integrity": "excellent"
  },
  "performance": {
    "average_request_time_ms": 587,
    "average_extraction_time_ms": 245,
    "proxy_rotation_count": 12,
    "blocks_encountered": 0,
    "ban_risk": "very_low"
  },
  "next_steps": [
    "Customize selectors for your target site",
    "Test extraction on live site",
    "Configure proxy and rate limiting",
    "Deploy and schedule regular runs"
  ]
}

Field Descriptions:

  • selectors: CSS selectors for each data field you want to extract
  • pagination: Configuration for multi-page scraping
  • extracted_data: Array of clean, structured objects
  • data_quality: Validation results (missing fields, errors)
  • performance: Speed metrics and proxy rotation statistics
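The pagination settings above (next_page_selector, max_pages) boil down to a follow-the-next-link loop. The sketch below shows the idea with a fake three-page site; `crawl` and `scrape_page` are illustrative names, not the template's real API:

```python
def crawl(start_url, scrape_page, max_pages=10):
    """Follow 'next page' links until exhausted or max_pages is reached."""
    url, results, pages = start_url, [], 0
    while url and pages < max_pages:
        items, url = scrape_page(url)  # returns (items, next_page_url or None)
        results.extend(items)
        pages += 1
    return results

# Offline demo: three fake pages, where the last page has no next link.
fake_site = {
    "/products?page=1": (["item1", "item2"], "/products?page=2"),
    "/products?page=2": (["item3", "item4"], "/products?page=3"),
    "/products?page=3": (["item5"], None),
}
all_items = crawl("/products?page=1", lambda u: fake_site[u])
print(all_items)  # ['item1', 'item2', 'item3', 'item4', 'item5']
```

In a live scraper, `scrape_page` would fetch the URL and resolve next_page_selector against the page to find the next link.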

🔗 Integrations & Automation

Email Results: Daily email with scraped data, summaries, and status.

Webhook to API: Push results directly to your backend database.

Schedule Recurring: Run daily, weekly, or monthly. Always-fresh data.

Slack Updates: Get notifications when scrapes complete or hit errors.

REST API: Trigger custom scraping jobs on-demand.

MCP Compatible: AI agents can run custom scraping tasks.

See integration docs →
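As a rough sketch of the webhook integration, the following builds (but does not send) a JSON POST from scraped items. The endpoint URL and payload shape here are assumptions for illustration, not the actual integration contract:

```python
import json
import urllib.request

# Hypothetical endpoint; replace with your backend's ingest URL.
WEBHOOK_URL = "https://api.example.com/scrapes/ingest"

def build_webhook_request(items):
    """Package scraped items as a JSON POST request (constructed, not sent)."""
    body = json.dumps({"source": "universal-web-scraper", "items": items}).encode()
    return urllib.request.Request(
        WEBHOOK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_webhook_request([{"title": "Wireless Headphones Pro", "price": 199.99}])
print(req.get_method(), req.full_url)  # POST https://api.example.com/scrapes/ingest
```

Sending is then a single `urllib.request.urlopen(req)` call, typically wrapped in the same retry logic used for scraping.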


💰 Cost & Performance

Typical run: Scrape 500 items from one site in 5 minutes for ~$1.80 (includes proxies).

That's $0.0036 per item — cheaper than one person copy-pasting data for 10 minutes.

Compare to manual: One person manually scraping 500 items = 3+ hours. At $25/hour, that's $75+. We do it for $1.80. Plus our data is always fresh if you schedule daily.

🛡️ Built Right

  • Proxy rotation sharply reduces the risk of IP bans
  • Smart retries with exponential backoff
  • JavaScript rendering for dynamic content
  • Rate limiting respects target server
  • Error handling degrades gracefully on missing fields instead of failing the run
  • Data validation ensures clean output
  • Duplicate detection optional, configurable
  • Timeout protection prevents hanging requests
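The cleaning, type-conversion, and duplicate-removal steps listed above might look like the following in Python. This is a sketch; `clean_item` and `dedupe` are illustrative names, not the template's internals:

```python
import re

def clean_item(raw):
    """Normalize one scraped record: trim text, convert '$199.99' to a float."""
    item = {k: v.strip() if isinstance(v, str) else v for k, v in raw.items()}
    match = re.search(r"[\d,]+\.?\d*", item.get("price", "") or "")
    item["price"] = float(match.group().replace(",", "")) if match else None
    return item

def dedupe(items, key="url"):
    """Drop records whose key value was already seen, preserving order."""
    seen, out = set(), []
    for item in items:
        if item.get(key) not in seen:
            seen.add(item.get(key))
            out.append(item)
    return out

raw = [
    {"title": "  Wireless Headphones Pro ", "price": "$199.99", "url": "/p/1"},
    {"title": "USB-C Charging Cable", "price": "$24.99", "url": "/p/2"},
    {"title": "USB-C Charging Cable", "price": "$24.99", "url": "/p/2"},  # duplicate
]
cleaned = dedupe([clean_item(r) for r in raw])
print(len(cleaned), cleaned[0]["price"])  # 2 199.99
```

Keying the deduplication on the item URL is one reasonable choice; a content hash works when URLs aren't stable.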

Getting Started

  1. Fork this actor to your Apify account
  2. Update CSS selectors to match your target site
  3. Test locally (we provide test scripts)
  4. Configure proxy (residential proxies included)
  5. Set rate limit (default: 1 req/2s, respectful)
  6. Deploy and run your first scrape
  7. Schedule recurring if you need fresh data daily

Fresh data. Zero guesswork. Be the first to know.

📧 Email alerts · 🔗 Webhook triggers · 🤖 MCP compatible · 📡 API access

Built by Creator Fusion — OSINT tools that actually work.