Apify Scrappey

Pricing

Pay per usage

dormic/apify-scrappey

Developed by

Pim

Maintained by Community

A template for scraping web pages with the Scrappey.com API from within an Apify Actor. The Actor handles complex scraping scenarios, including sites behind anti-bot protection such as Cloudflare, Datadome, PerimeterX, and similar systems.

5.0 (1)

Monthly users

2

Last modified

2 days ago

Apify Scrappey Actor

A powerful web scraping solution that combines Apify's actor infrastructure with Scrappey's advanced anti-detection capabilities. This actor helps you scrape any website while bypassing common anti-bot protections like Cloudflare, Datadome, and PerimeterX.

🚀 Key Features

  • Advanced Protection Bypass - Handles Cloudflare, Datadome, PerimeterX, and other anti-bot systems
  • Session Management - Maintains persistent browser sessions for efficient scraping
  • Smart Proxy Rotation - Automatic proxy management with country-specific options
  • Browser Fingerprint Randomization - Prevents detection through browser fingerprinting
  • Comprehensive Data Extraction - Captures HTML, cookies, headers, and more
  • Error Handling - Robust error handling with detailed error codes and messages

📋 Input Options

{
    "scrappeyApiKey": "your-api-key",
    "url": "https://example.com",
    "requestType": "browser",  // "browser" or "request"
    "customHeaders": {},       // Custom HTTP headers
    "browserActions": [],      // Automated browser actions
    "session": null,           // Session ID for persistent browsing
    "proxyCountry": null,      // Specific country for proxy
    "cookiejar": null,         // Pre-set cookies
    "includeImages": false,    // Include image URLs in response
    "includeLinks": false      // Include link URLs in response
}
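The input options above can also be assembled in code. The following is a minimal Python sketch, not part of the Actor itself: the helper name `build_input` is hypothetical, the defaults simply mirror the values shown above, and the URL and `requestType` checks are assumptions about what counts as sensible input.

```python
from typing import Any, Dict

# The two request types named in the input options above.
REQUEST_TYPES = {"browser", "request"}


def build_input(api_key: str, url: str, request_type: str = "browser",
                **options: Any) -> Dict[str, Any]:
    """Build an Actor input dict (hypothetical helper, defaults as listed above)."""
    if request_type not in REQUEST_TYPES:
        raise ValueError(f"requestType must be one of {sorted(REQUEST_TYPES)}")
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")

    actor_input: Dict[str, Any] = {
        "scrappeyApiKey": api_key,
        "url": url,
        "requestType": request_type,
        "customHeaders": {},
        "browserActions": [],
        "session": None,
        "proxyCountry": None,
        "cookiejar": None,
        "includeImages": False,
        "includeLinks": False,
    }

    # Reject keys that are not in the documented option list (catches typos).
    unknown = set(options) - set(actor_input)
    if unknown:
        raise KeyError(f"unknown input options: {sorted(unknown)}")

    actor_input.update(options)
    return actor_input
```

Calling `build_input("your-api-key", "https://example.com", proxyCountry="UnitedStates")` returns the full option set with only the proxy country overridden.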

📦 Output Data Structure

The actor stores the following data in the Apify dataset:

{
    "url": "scraped-url",
    "verified": true,                 // Request verification status (true or false)
    "cookieString": "cookie-string",  // Formatted cookie string
    "responseHeaders": {},            // Response HTTP headers
    "requestHeaders": {},             // Request HTTP headers
    "html": "page-html",              // Raw HTML content
    "innerText": "page-text",         // Page text content
    "cookies": [],                    // Array of cookies
    "ipInfo": {},                     // IP information
    "status": 200,                    // HTTP status code
    "timeElapsed": "1.2s",            // Request duration
    "session": "session-id",          // Session identifier
    "localStorage": {},               // Browser localStorage data
    "timestamp": "ISO-date"           // Timestamp of scrape
}
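When consuming dataset items with the shape above, it can help to filter out results that are unlikely to be useful. The sketch below is a hypothetical helper (`is_usable`); treating "verified, 2xx status, non-empty HTML" as success is an assumption for illustration, not documented Actor behavior.

```python
from typing import Any, Dict


def is_usable(item: Dict[str, Any]) -> bool:
    """Return True for a verified item with a 2xx status and non-empty HTML.

    The success criteria here are an assumption, not part of the Actor.
    """
    status = item.get("status")
    return (
        item.get("verified") is True
        and isinstance(status, int)
        and 200 <= status < 300
        and bool(item.get("html"))
    )
```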

🛠️ Common Use Cases

  1. E-commerce Scraping

    • Product details from protected stores
    • Price monitoring
    • Inventory tracking
  2. Login-Protected Content

    • Session management for authenticated scraping
    • Cookie handling for maintaining login state
  3. Anti-Bot Protected Sites

    • Cloudflare challenge bypass
    • Datadome protection handling
    • PerimeterX mitigation

💡 Usage Examples

Basic Scraping

{
    "scrappeyApiKey": "your-api-key",
    "url": "https://example.com",
    "requestType": "browser"
}
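The same basic input can be sent from Python. This is a sketch under assumptions: it expects a client object shaped like `ApifyClient` from the `apify-client` package (with `actor(...).call(run_input=...)` and `dataset(...).iterate_items()`), and the helper name `run_scrape` is hypothetical. The Actor ID is the one shown on this page.

```python
from typing import Any, Dict, List

ACTOR_ID = "dormic/apify-scrappey"  # Actor ID as listed on this page


def run_scrape(client: Any, run_input: Dict[str, Any],
               actor_id: str = ACTOR_ID) -> List[Dict[str, Any]]:
    """Run the Actor once and return all items from its default dataset.

    `client` is assumed to behave like apify_client.ApifyClient.
    """
    # Start the Actor and wait for the run to finish.
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    # Read every item the run stored in its default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

With the real package installed, `client` would typically be created as `ApifyClient("<your Apify token>")` and `run_input` would be the JSON object above expressed as a dict.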

Session-Based Scraping

{
    "scrappeyApiKey": "your-api-key",
    "url": "https://example.com",
    "requestType": "browser",
    "session": "my-session-id",
    "cookiejar": [
        {
            "name": "sessionId",
            "value": "abc123",
            "domain": "example.com",
            "path": "/"
        }
    ]
}
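For a multi-step flow, the `session` and `cookies` fields of one result can be fed into the next request. The sketch below uses a hypothetical helper, `follow_up_input`, and assumes that entries in the output `cookies` array carry the same `name`, `value`, `domain`, and `path` fields as the `cookiejar` entries shown above; that field compatibility is not confirmed by this page.

```python
from typing import Any, Dict

# Fields used by the cookiejar entries in the example above.
COOKIE_FIELDS = ("name", "value", "domain", "path")


def follow_up_input(item: Dict[str, Any], api_key: str,
                    next_url: str) -> Dict[str, Any]:
    """Build the next input, reusing the session and cookies of a prior result."""
    cookiejar = [
        {field: cookie[field] for field in COOKIE_FIELDS if field in cookie}
        for cookie in (item.get("cookies") or [])
    ]
    return {
        "scrappeyApiKey": api_key,
        "url": next_url,
        "requestType": "browser",
        "session": item.get("session"),
        # Fall back to null when the previous result carried no cookies.
        "cookiejar": cookiejar or None,
    }
```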

Geo-Targeted Scraping

{
    "scrappeyApiKey": "your-api-key",
    "url": "https://example.com",
    "proxyCountry": "UnitedStates",
    "includeImages": true,
    "includeLinks": true
}

⚠️ Error Handling

The actor handles common error scenarios:

Code       Description         Solution
CODE-0001  Server overload     Retry with backoff
CODE-0002  Cloudflare blocked  Try different proxy
CODE-0010  Datadome blocked    Change proxy country
CODE-0029  Too many sessions   Wait for session cleanup
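The table above can be turned into a small lookup for client code. This sketch assumes the code appears verbatim (for example `CODE-0010`) somewhere in the error message; the exact error format is not specified here, and `advice_for` is a hypothetical helper name.

```python
import re
from typing import Optional

# Error codes and suggested actions, copied from the table above.
ERROR_ADVICE = {
    "CODE-0001": "Retry with backoff",
    "CODE-0002": "Try different proxy",
    "CODE-0010": "Change proxy country",
    "CODE-0029": "Wait for session cleanup",
}

_CODE_PATTERN = re.compile(r"CODE-\d{4}")


def advice_for(error_message: str) -> Optional[str]:
    """Return the suggested action for the first known code in the message."""
    match = _CODE_PATTERN.search(error_message)
    if match is None:
        return None
    return ERROR_ADVICE.get(match.group(0))
```

Unknown codes and messages without a code both return `None`, so callers can fall back to a generic retry or log the raw message.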

🚦 Best Practices

  1. Session Management

    • Use persistent sessions for related requests
    • Destroy sessions with sessions.destroy once you are done with them
  2. Proxy Usage

    • Rotate proxies for high-volume scraping
    • Use country-specific proxies for geo-restricted content
  3. Error Handling

    • Implement exponential backoff for retries
    • Monitor error rates by URL
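The exponential backoff recommendation above can be sketched as follows. This is a generic illustration rather than the Actor's own retry logic: `retry_with_backoff` and its parameters are hypothetical, and the 10% jitter is an arbitrary choice.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(operation: Callable[[], T], max_attempts: int = 5,
                       base_delay: float = 1.0, max_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation`, doubling the wait after each failure, up to a cap."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            # Out of attempts: re-raise the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            # 1x, 2x, 4x, ... the base delay, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")
```

The `sleep` parameter is injectable so the function can be exercised in tests without real waiting. In practice, `operation` would wrap a single Actor run and raise on retryable errors such as server overload.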

📚 Getting Started

  1. Setup

    git clone https://github.com/yourusername/apify-scrappey
    cd apify-scrappey
    npm install
  2. Configuration

    • Get your Scrappey API key from scrappey.com
    • Set up your input.json in the Apify console or locally
  3. Running Locally

    apify run
  4. Deployment

    apify login
    apify push

📄 License

ISC License - Feel free to use this actor for your scraping needs!

Pricing

Pricing model

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage.