ScraperCodeGenerator

Developed by Ondřej Hlava

Maintained by Community

An intelligent web scraping tool that automatically generates custom scraping code for any website.


🧠 AI-Powered Web Scraper & Code Generator

Stop writing scraping code manually! This intelligent actor doesn't just scrape websites: it automatically generates custom Python scraping code tailored to your specific needs.

You get both the extracted data AND the code to replicate it anytime.

🚀 What This Actor Does

The actor will automatically:

  • Test multiple scraping methods: Runs multiple scraping strategies (Cheerio, Web Scraper, Website Content Crawler, Playwright, etc.) in parallel for faster results
  • Evaluate which works best using AI: Claude AI analyzes each result and selects the best extraction
  • Extract your requested data: Automatically structures the extracted data based on your requirements
  • 🔥 Generate custom Python code that scrapes YOUR website: Creates personalized Python scraping code that you can run independently
  • Provide the code as a downloadable script you can run anywhere: Complete, ready-to-use BeautifulSoup script saved to key-value store

✨ Key Benefits

  • No Technical Knowledge Required: Just describe what data you want in plain English
  • Resilient Scraping: Multiple strategies ensure success even if one method fails
  • AI-Powered: Uses Claude AI to understand content context and select optimal results
  • 🎯 Custom Code Generation: Get personalized Python code that scrapes YOUR specific website
  • Production Ready: Generated code is clean, documented, and ready to run independently
  • Reusable: Use the generated code in your own projects, scripts, or applications

📊 Output Data

The actor saves comprehensive results to your default dataset and also saves the generated script to the key-value store.

💡 How to Access: After the actor finishes, open the "Key-value store" tab in your run details, download the GENERATED_SCRIPT file, and rename it with a .py extension.

🎯 What You Get

  • Extracted Data: The actual data from the website, structured according to your goal
  • 🔥 Generated Python Code: Ready-to-use BeautifulSoup script that you can run on your own computer
  • 💾 Separate Script File: The Python code is also saved as a downloadable file in the key-value store
  • Quality Scores: Performance ratings for each scraping method (0-10 scale)
  • Best Method: Which scraping approach worked best for your website

💡 Pro Tip: The generated Python code is completely standalone: you can copy it, modify it, and use it in your own projects without needing this actor again!
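
To give a feel for what a generated script looks like, here is a hypothetical sketch in the same style (BeautifulSoup extraction). The selectors and HTML below are illustrative only; a real generated script targets your specific website and fetches the live page, e.g. with requests:

```python
from bs4 import BeautifulSoup

# Illustrative stand-in for a fetched page; a real generated script
# would download the target URL instead of using an inline snippet.
html = """
<ol class="products">
  <li class="product"><h3>Book A</h3><p class="price">£12.99</p></li>
  <li class="product"><h3>Book B</h3><p class="price">£7.50</p></li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")
results = []
for item in soup.select("li.product"):
    results.append({
        "title": item.select_one("h3").get_text(strip=True),
        "price": item.select_one("p.price").get_text(strip=True),
    })

print(results)
```

The generated code follows this pattern: parse the page, select the elements matching your goal, and emit structured records you can save or post-process.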

🎯 Usage Examples

E-commerce Product Scraping

{
  "targetUrl": "https://books.toscrape.com/",
  "userGoal": "Get me a list of all the books on the first page. For each book, I want its title, price, star rating, and whether it is in stock.",
  "claudeApiKey": "sk-ant-..."
}

News Website Scraping

{
  "targetUrl": "https://www.theverge.com/",
  "userGoal": "I want to scrape the main articles from The Verge homepage. For each article, get me the headline, the author's name, and the link to the full article.",
  "claudeApiKey": "sk-ant-..."
}

Job Listings Scraping

{
  "targetUrl": "https://www.python.org/jobs/",
  "userGoal": "List all the jobs posted. For each job, I want the job title, the company name, the location, and the date it was posted.",
  "claudeApiKey": "sk-ant-..."
}

Quote Collection

{
  "targetUrl": "https://quotes.toscrape.com/",
  "userGoal": "I want a list of all quotes on this page. For each one, get the quote text itself, the name of the author, and a list of the tags associated with it.",
  "claudeApiKey": "sk-ant-..."
}

Business Directory Scraping

{
  "targetUrl": "https://directory.com/restaurants",
  "userGoal": "Get restaurant names, addresses, phone numbers, and ratings",
  "claudeApiKey": "sk-ant-..."
}

🔧 How to Use

  1. Enter Target URL: Paste the website URL you want to scrape
  2. Describe Your Goal: Be specific about what data you need (e.g., "product names and prices" not just "products")
  3. Add Claude API Key: Your Anthropic API key for AI analysis
  4. Configure Advanced Settings (optional): Customize Claude model, HTML processing, and actor selection
  5. Run the Actor: Click "Start" and watch the magic happen!

⚙️ Advanced Configuration

🤖 Claude Model Selection

Choose the AI model that best fits your needs:

  • Claude 4 Sonnet (Default): Latest and most capable model
  • Claude 4 Opus: Maximum quality for the most complex tasks
  • Claude 3.7 Sonnet: Enhanced capabilities over 3.5
  • Claude 3.5 Sonnet: Reliable and well-tested
  • Claude 3.5 Haiku: Fastest and most cost-effective
  • Claude 3 Sonnet: Good balance for most tasks
  • Claude 3 Haiku: Basic tasks with minimal cost

🔧 HTML Processing Settings

Fine-tune how HTML content is processed:

  • Enable HTML Pruning: Reduces processing time by removing unnecessary content
  • Max List Items: Controls how many items to keep in lists/tables (1-20)
  • Max Text Length: Maximum text length in any element (100-2000 chars)
  • Prune Percentage: How much content to keep (10%-100%)
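
The list-item and text-length settings above can be sketched roughly like this. This is a simplified illustration of what pruning does, not the actor's actual implementation:

```python
from bs4 import BeautifulSoup

def prune_html(html, max_list_items=3, max_text_length=200):
    """Simplified sketch of HTML pruning: cap list length and text size."""
    soup = BeautifulSoup(html, "html.parser")
    # Keep only the first max_list_items entries in each list
    for lst in soup.find_all(["ul", "ol"]):
        for extra in lst.find_all("li")[max_list_items:]:
            extra.decompose()
    # Truncate overly long text nodes
    for node in soup.find_all(string=True):
        if len(node) > max_text_length:
            node.replace_with(node[:max_text_length] + "…")
    return str(soup)

html = "<ul>" + "".join(f"<li>item {i}</li>" for i in range(10)) + "</ul>"
print(prune_html(html, max_list_items=3))
```

Shrinking the HTML this way is what reduces the amount of content the AI has to evaluate, which cuts processing time and token cost.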

🎯 Actor Selection

Choose which scraping methods to use:

  • Cheerio Scraper: Fast jQuery-like scraping (enabled by default)
  • Web Scraper: Versatile with JavaScript support (enabled by default)
  • Website Content Crawler: Advanced Playwright crawler (enabled by default)
  • Playwright Scraper: Modern browser automation (disabled by default)
  • Puppeteer Scraper: Chrome-based scraping (disabled by default)

💡 Pro Tip: Enable 2-3 actors for the best balance of speed and reliability. More actors = better chances of success but slower execution.

🚀 Performance Settings

  • Concurrent Actors: Run multiple actors simultaneously for faster results
  • Test Generated Script: Validate the generated code before saving
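
The benefit of running actors concurrently can be sketched with dummy stand-ins for the real actor calls (the `run_actor` function below is hypothetical; it just simulates work with a short sleep):

```python
import concurrent.futures
import time

def run_actor(name):
    """Hypothetical stand-in for launching one scraping actor."""
    time.sleep(0.2)  # simulate the actor's runtime
    return {"actor": name, "status": "SUCCEEDED"}

actors = ["cheerio-scraper", "web-scraper", "playwright-scraper"]

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor() as pool:
    results = list(pool.map(run_actor, actors))
elapsed = time.perf_counter() - start

print(results)
# With concurrent execution, total wall time is roughly one actor's
# runtime rather than the sum of all three.
```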

Common Use Cases

  • Market Research: Track competitor pricing and products + get code to monitor them daily
  • Content Aggregation: Collect news articles or blog posts + get code to update your database
  • Lead Generation: Extract business contact information + get code to scrape new listings
  • Data Analysis: Gather data for research projects + get code to repeat the process
  • Price Monitoring: Track product prices over time + get code to check prices automatically

🔍 Troubleshooting

"No content found" errors

  • Try different goal descriptions
  • Some websites block automated scraping
  • Check if the URL is accessible

Poor quality scores

  • Be more specific in your goal description
  • The website might have a complex structure
  • Try simpler pages first

Claude API errors

  • Verify your API key is correct
  • Check your Claude API usage limits
  • Ensure you have sufficient API credits

🔑 Getting Your Claude API Key

  1. Go to the Anthropic Console
  2. Sign up or log in
  3. Navigate to the API Keys section
  4. Create a new API key
  5. Copy and paste it into the "Claude API Key" field

📋 Input Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Target URL | String | Yes | The website URL you want to scrape |
| User Goal | String | Yes | Describe what data you want (e.g., "Extract all product names, prices, and ratings") |
| Claude API Key | String | Yes | Your Anthropic Claude API key |
| Test Generated Script | Boolean | No | Whether to test the generated script (default: true) |
| Claude Model | String | No | AI model to use (default: Claude 4 Sonnet) |
| Max Retries | Number | No | Maximum retry attempts (default: 3) |
| Timeout | Number | No | Timeout per attempt in seconds (default: 60) |
| HTML Pruning Enabled | Boolean | No | Enable HTML content processing (default: true) |
| HTML Max List Items | Number | No | Maximum items in lists to keep (1-20, default: 3) |
| HTML Max Text Length | Number | No | Maximum text length in elements (50-2000, default: 200) |
| HTML Prune Before Evaluation | Boolean | No | Apply pruning before AI evaluation (default: true) |
| HTML Prune Percentage | Number | No | Percentage of content to keep (0-100, default: 80) |
| Actors | Array | No | Detailed actor configurations with custom inputs |
| Concurrent Actors | Boolean | No | Run actors simultaneously (default: true) |

Advanced Configuration Examples

Custom Claude Model

{
  "targetUrl": "https://example.com",
  "userGoal": "Extract product data",
  "claudeApiKey": "sk-ant-...",
  "claudeModel": "claude-sonnet-4-20250514"
}

Custom HTML Processing

{
  "targetUrl": "https://example.com",
  "userGoal": "Extract product data",
  "claudeApiKey": "sk-ant-...",
  "htmlPruningEnabled": true,
  "htmlMaxListItems": 10,
  "htmlMaxTextLength": 1000,
  "htmlPrunePercentage": 90
}

Custom Actor Selection

{
  "targetUrl": "https://example.com",
  "userGoal": "Extract product data",
  "claudeApiKey": "sk-ant-...",
  "actors": [
    {
      "name": "cheerio-scraper",
      "enabled": true,
      "input": {
        "maxRequestRetries": 5,
        "requestTimeoutSecs": 60,
        "maxPagesPerCrawl": 1,
        "proxyConfiguration": { "useApifyProxy": true }
      }
    },
    {
      "name": "web-scraper",
      "enabled": false,
      "input": {}
    },
    {
      "name": "playwright-scraper",
      "enabled": true,
      "input": {
        "maxRequestRetries": 3,
        "requestTimeoutSecs": 90,
        "maxPagesPerCrawl": 1
      }
    }
  ],
  "concurrentActors": true
}

Full Configuration Example

{
  "targetUrl": "https://books.toscrape.com/",
  "userGoal": "Get me a list of all the books on the first page. For each book, I want its title, price, star rating, and whether it is in stock.",
  "claudeApiKey": "sk-ant-...",
  "claudeModel": "claude-sonnet-4-20250514",
  "testScript": true,
  "maxRetries": 3,
  "timeout": 60,
  "htmlPruningEnabled": true,
  "htmlMaxListItems": 5,
  "htmlMaxTextLength": 500,
  "htmlPruneBeforeEvaluation": true,
  "htmlPrunePercentage": 80,
  "concurrentActors": true,
  "actors": [
    {
      "name": "cheerio-scraper",
      "enabled": true,
      "input": {
        "maxRequestRetries": 3,
        "requestTimeoutSecs": 30,
        "maxPagesPerCrawl": 1,
        "proxyConfiguration": { "useApifyProxy": true }
      }
    },
    {
      "name": "web-scraper",
      "enabled": true,
      "input": {
        "maxRequestRetries": 3,
        "requestTimeoutSecs": 30,
        "maxPagesPerCrawl": 1,
        "proxyConfiguration": { "useApifyProxy": true }
      }
    },
    {
      "name": "playwright-scraper",
      "enabled": true,
      "input": {
        "maxRequestRetries": 2,
        "requestTimeoutSecs": 45,
        "maxPagesPerCrawl": 1
      }
    }
  ]
}