
ScraperCodeGenerator
Pricing
Pay per usage

An intelligent web scraping tool that automatically generates custom scraping code for any website.
🧠 AI-Powered Web Scraper & Code Generator
Stop writing scraping code manually! This intelligent actor doesn't just scrape websites - it automatically generates custom Python scraping code tailored to your specific needs.
You get both the extracted data AND the code to replicate it anytime.
🚀 What This Actor Does
The actor will automatically:
- Test multiple scraping methods: Runs multiple scraping strategies (Cheerio, Web Scraper, Website Content Crawler, Playwright, etc.) in parallel for faster results
- Evaluate which works best using AI: Claude AI analyzes each result and selects the best extraction
- Extract your requested data: Automatically structures the extracted data based on your requirements
- 🔥 Generate custom Python code that scrapes YOUR website: Creates personalized Python scraping code that you can run independently
- Provide the code as a downloadable script you can run anywhere: Complete, ready-to-use BeautifulSoup script saved to key-value store
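The "run several strategies in parallel, then keep the best one" flow described above can be sketched in a few lines. This is only an illustration, not the actor's implementation: the three `fake_*` functions are hypothetical stand-ins for real scraping actors, and `score_result` is a crude keyword-based placeholder for the Claude evaluation step.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for real scraping actors: each takes a URL and
# returns raw HTML, or an empty string when nothing was extracted.
def fake_cheerio(url: str) -> str:
    return "<html><h3>Book A</h3><p>10.00</p></html>"


def fake_web_scraper(url: str) -> str:
    return ""  # simulates a strategy that found no content


def fake_playwright(url: str) -> str:
    return "<html><h3>Book A</h3></html>"


STRATEGIES: Dict[str, Callable[[str], str]] = {
    "cheerio-scraper": fake_cheerio,
    "web-scraper": fake_web_scraper,
    "playwright-scraper": fake_playwright,
}


def score_result(html: str, keywords: List[str]) -> int:
    """Placeholder for the AI evaluation: 0-10 by share of keywords present."""
    if not html:
        return 0
    hits = sum(1 for word in keywords if word.lower() in html.lower())
    return round(10 * hits / len(keywords))


def pick_best(
    url: str,
    strategies: Dict[str, Callable[[str], str]],
    keywords: List[str],
) -> Tuple[str, Dict[str, int]]:
    """Run every strategy concurrently and return (best name, all scores)."""
    with ThreadPoolExecutor(max_workers=len(strategies)) as pool:
        futures = {name: pool.submit(fn, url) for name, fn in strategies.items()}
        scores = {
            name: score_result(future.result(), keywords)
            for name, future in futures.items()
        }
    best = max(scores, key=scores.get)
    return best, scores
```

Calling `pick_best("https://books.toscrape.com/", STRATEGIES, ["Book A", "10.00"])` with these stubs selects `cheerio-scraper`, because it is the only stub whose output contains both keywords.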
✨ Key Benefits
- No Technical Knowledge Required: Just describe what data you want in plain English
- Resilient Scraping: Multiple strategies ensure success even if one method fails
- AI-Powered: Uses Claude AI to understand content context and select optimal results
- 🎯 Custom Code Generation: Get personalized Python code that scrapes YOUR specific website
- Production Ready: Generated code is clean, documented, and ready to run independently
- Reusable: Use the generated code in your own projects, scripts, or applications
📊 Output Data
The actor saves comprehensive results to your default dataset AND saves the generated script to the key-value store.
💡 How to Access: After the actor finishes, go to the "Key-value store" tab in your run details and download the GENERATED_SCRIPT file. Then rename it so that it has the .py extension.
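If you download scripts regularly, the rename step can be automated. The helper below is a small sketch using only the standard library; the function name `add_py_extension` is made up for this example, and it assumes the file was saved under its key name, GENERATED_SCRIPT.

```python
from pathlib import Path


def add_py_extension(downloaded: Path) -> Path:
    """Rename a downloaded script so it ends in .py and return the new path."""
    if downloaded.suffix == ".py":
        return downloaded  # already has the right extension
    target = downloaded.with_name(downloaded.name + ".py")
    return downloaded.rename(target)


if __name__ == "__main__":
    script = Path("GENERATED_SCRIPT")
    if script.exists():
        print(f"Saved as {add_py_extension(script)}")
```

After renaming, the script can be run with `python GENERATED_SCRIPT.py`, provided the packages it imports (presumably BeautifulSoup, among others) are installed.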
🎯 What You Get
- Extracted Data: The actual data from the website, structured according to your goal
- 🔥 Generated Python Code: Ready-to-use BeautifulSoup script that you can run on your own computer
- 💾 Separate Script File: The Python code is also saved as a downloadable file in the key-value store
- Quality Scores: Performance ratings for each scraping method (0-10 scale)
- Best Method: Which scraping approach worked best for your website
💡 Pro Tip: The generated Python code is completely standalone - you can copy it, modify it, and use it in your own projects without needing this actor again!
🎯 Usage Examples
E-commerce Product Scraping
{
  "targetUrl": "https://books.toscrape.com/",
  "userGoal": "Get me a list of all the books on the first page. For each book, I want its title, price, star rating, and whether it is in stock.",
  "claudeApiKey": "sk-ant-..."
}
News Website Scraping
{
  "targetUrl": "https://www.theverge.com/",
  "userGoal": "I want to scrape the main articles from The Verge homepage. For each article, get me the headline, the author's name, and the link to the full article.",
  "claudeApiKey": "sk-ant-..."
}
Job Listings Scraping
{
  "targetUrl": "https://www.python.org/jobs/",
  "userGoal": "List all the jobs posted. For each job, I want the job title, the company name, the location, and the date it was posted.",
  "claudeApiKey": "sk-ant-..."
}
Quote Collection
{
  "targetUrl": "https://quotes.toscrape.com/",
  "userGoal": "I want a list of all quotes on this page. For each one, get the quote text itself, the name of the author, and a list of the tags associated with it.",
  "claudeApiKey": "sk-ant-..."
}
Business Directory Scraping
{
  "targetUrl": "https://directory.com/restaurants",
  "userGoal": "Get restaurant names, addresses, phone numbers, and ratings",
  "claudeApiKey": "sk-ant-..."
}
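All of the examples above share the same three required fields. If you prepare inputs from a script, a small builder keeps the API key out of your source files. This is a sketch under two assumptions: the function name `build_input` is invented for this example, and the `ANTHROPIC_API_KEY` environment variable is just one possible place to keep the key.

```python
import json
import os


def build_input(target_url: str, user_goal: str, api_key: str) -> dict:
    """Return the minimal actor input, rejecting obviously invalid values."""
    if not target_url.startswith(("http://", "https://")):
        raise ValueError("targetUrl must be an absolute http(s) URL")
    if not user_goal.strip():
        raise ValueError("userGoal must not be empty")
    if not api_key:
        raise ValueError("claudeApiKey is required")
    return {
        "targetUrl": target_url,
        "userGoal": user_goal.strip(),
        "claudeApiKey": api_key,
    }


if __name__ == "__main__":
    # Assumption: the key is stored in this environment variable.
    key = os.environ.get("ANTHROPIC_API_KEY", "sk-ant-...")
    actor_input = build_input(
        "https://quotes.toscrape.com/",
        "Get the quote text, the author, and the tags for every quote.",
        key,
    )
    print(json.dumps(actor_input, indent=2))
```

The printed JSON can be pasted into the actor's input editor.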
🔧 How to Use
- Enter Target URL: Paste the website URL you want to scrape
- Describe Your Goal: Be specific about what data you need (e.g., "product names and prices" not just "products")
- Add Claude API Key: Your Anthropic API key for AI analysis
- Configure Advanced Settings (optional): Customize Claude model, HTML processing, and actor selection
- Run the Actor: Click "Start" and watch the magic happen!
⚙️ Advanced Configuration
🤖 Claude Model Selection
Choose the AI model that best fits your needs:
- Claude 4 Sonnet (Default): Latest and most capable model
- Claude 4 Opus: Maximum quality for the most complex tasks
- Claude 3.7 Sonnet: Enhanced capabilities over 3.5
- Claude 3.5 Sonnet: Reliable and well-tested
- Claude 3.5 Haiku: Fastest and most cost-effective
- Claude 3 Sonnet: Good balance for most tasks
- Claude 3 Haiku: Basic tasks with minimal cost
🔧 HTML Processing Settings
Fine-tune how HTML content is processed:
- Enable HTML Pruning: Reduces processing time by removing unnecessary content
- Max List Items: Controls how many items to keep in lists/tables (1-20)
- Max Text Length: Maximum text length in any element (100-2000 chars)
- Prune Percentage: How much content to keep (10%-100%)
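To make the Max List Items and Max Text Length settings concrete, here is a rough sketch of what this kind of pruning could look like. It is not the actor's actual algorithm. It only handles `<li>` items (not table rows), ignores the Prune Percentage setting, and assumes well-formed markup with explicit closing tags. The names `ListPruner` and `prune_html` are invented for this example.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ListPruner(HTMLParser):
    """Keep the first `max_items` <li> per list and truncate long text nodes."""

    def __init__(self, max_items: int = 3, max_text: int = 200):
        super().__init__(convert_charrefs=True)
        self.max_items = max_items
        self.max_text = max_text
        self.out = []        # output fragments, joined at the end
        self.counts = []     # number of <li> seen in each open <ul>/<ol>
        self.skip_depth = 0  # > 0 while inside a dropped <li>

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
            return
        if tag in ("ul", "ol"):
            self.counts.append(0)
        elif tag == "li" and self.counts:
            self.counts[-1] += 1
            if self.counts[-1] > self.max_items:
                self.skip_depth = 1  # drop this item and everything inside it
                return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the nesting depth.
        if not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1
            return
        if tag in ("ul", "ol") and self.counts:
            self.counts.pop()
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data[: self.max_text])


def prune_html(html: str, max_items: int = 3, max_text: int = 200) -> str:
    parser = ListPruner(max_items, max_text)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `prune_html("<ul><li>a</li><li>b</li><li>c</li></ul>", max_items=2)` keeps only the first two list items. Smaller limits mean less HTML is sent for evaluation, at the cost of hiding some of the page's content from the model.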
🎯 Actor Selection
Choose which scraping methods to use:
- Cheerio Scraper: Fast jQuery-like scraping (enabled by default)
- Web Scraper: Versatile with JavaScript support (enabled by default)
- Website Content Crawler: Advanced Playwright crawler (enabled by default)
- Playwright Scraper: Modern browser automation (disabled by default)
- Puppeteer Scraper: Chrome-based scraping (disabled by default)
💡 Pro Tip: Enable 2-3 actors for the best balance of speed and reliability. More actors = better chances of success but slower execution.
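If you set the selection through JSON input rather than the form, the `actors` array needs one entry per scraper. The helper below is a sketch of how to build that array from a list of names. `build_actors` is a hypothetical helper, and the three names are the ones used in this README's configuration examples.

```python
from typing import Dict, Iterable, List, Optional

# Actor names as they appear in the configuration examples in this README.
KNOWN_ACTORS = ["cheerio-scraper", "web-scraper", "playwright-scraper"]


def build_actors(
    enabled: Iterable[str],
    all_names: Iterable[str] = KNOWN_ACTORS,
    inputs: Optional[Dict[str, dict]] = None,
) -> List[dict]:
    """Build the `actors` array, enabling only the requested names."""
    names = list(all_names)
    enabled_set = set(enabled)
    unknown = enabled_set - set(names)
    if unknown:
        raise ValueError(f"unknown actors: {sorted(unknown)}")
    inputs = inputs or {}
    return [
        {"name": name, "enabled": name in enabled_set, "input": inputs.get(name, {})}
        for name in names
    ]
```

For instance, `build_actors(["cheerio-scraper", "playwright-scraper"])` enables two scrapers and marks `web-scraper` as disabled, which matches the 2-3 actor recommendation above.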
🚀 Performance Settings
- Concurrent Actors: Run multiple actors simultaneously for faster results
- Test Generated Script: Validate the generated code before saving
Common Use Cases
- Market Research: Track competitor pricing and products + get code to monitor them daily
- Content Aggregation: Collect news articles or blog posts + get code to update your database
- Lead Generation: Extract business contact information + get code to scrape new listings
- Data Analysis: Gather data for research projects + get code to repeat the process
- Price Monitoring: Track product prices over time + get code to check prices automatically
🔍 Troubleshooting
"No content found" errors
- Try different goal descriptions
- Some websites block automated scraping
- Check if the URL is accessible
Poor quality scores
- Be more specific in your goal description
- The website might have complex structure
- Try simpler pages first
Claude API errors
- Verify your API key is correct
- Check your Claude API usage limits
- Ensure you have sufficient API credits
🔑 Getting Your Claude API Key
- Go to Anthropic Console
- Sign up or log in
- Navigate to API Keys section
- Create a new API key
- Copy and paste it into the "Claude API Key" field
📋 Input Parameters
Parameter | Type | Required | Description |
---|---|---|---|
Target URL | String | Yes | The website URL you want to scrape |
User Goal | String | Yes | Describe what data you want (e.g., "Extract all product names, prices, and ratings") |
Claude API Key | String | Yes | Your Anthropic Claude API key (create one in the Anthropic Console) |
Test Generated Script | Boolean | No | Whether to test the generated script (default: true) |
Claude Model | String | No | AI model to use (default: Claude 4 Sonnet) |
Max Retries | Number | No | Maximum retry attempts (default: 3) |
Timeout | Number | No | Timeout per attempt in seconds (default: 60) |
HTML Pruning Enabled | Boolean | No | Enable HTML content processing (default: true) |
HTML Max List Items | Number | No | Maximum items in lists to keep (1-20, default: 3) |
HTML Max Text Length | Number | No | Maximum text length in elements (50-2000, default: 200) |
HTML Prune Before Evaluation | Boolean | No | Apply pruning before AI evaluation (default: true) |
HTML Prune Percentage | Number | No | Percentage of content to keep (0-100, default: 80) |
Actors | Array | No | Detailed actor configurations with custom inputs |
Concurrent Actors | Boolean | No | Run actors simultaneously (default: true) |
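The defaults and ranges in the table can be checked locally before starting a run. The sketch below merges user input with the documented defaults and rejects out-of-range values. The JSON keys are the ones used in this README's configuration examples, the ranges are the ones listed in the table above, and `with_defaults` is a hypothetical helper, not part of the actor.

```python
REQUIRED = ("targetUrl", "userGoal", "claudeApiKey")

# Defaults as documented in the parameter table.
DEFAULTS = {
    "testScript": True,
    "maxRetries": 3,
    "timeout": 60,
    "htmlPruningEnabled": True,
    "htmlMaxListItems": 3,
    "htmlMaxTextLength": 200,
    "htmlPruneBeforeEvaluation": True,
    "htmlPrunePercentage": 80,
    "concurrentActors": True,
}

# Inclusive ranges as documented in the parameter table.
RANGES = {
    "htmlMaxListItems": (1, 20),
    "htmlMaxTextLength": (50, 2000),
    "htmlPrunePercentage": (0, 100),
}


def with_defaults(user_input: dict) -> dict:
    """Merge input with the documented defaults and validate numeric ranges."""
    missing = [key for key in REQUIRED if not user_input.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    merged = {**DEFAULTS, **user_input}
    for key, (low, high) in RANGES.items():
        if not low <= merged[key] <= high:
            raise ValueError(f"{key} must be between {low} and {high}")
    return merged
```

Passing only the three required fields returns them together with every documented default, while a value such as `"htmlMaxListItems": 25` raises a `ValueError` before any run is started.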
Advanced Configuration Examples
Custom Claude Model
{
  "targetUrl": "https://example.com",
  "userGoal": "Extract product data",
  "claudeApiKey": "sk-ant-...",
  "claudeModel": "claude-sonnet-4-20250514"
}
Custom HTML Processing
{
  "targetUrl": "https://example.com",
  "userGoal": "Extract product data",
  "claudeApiKey": "sk-ant-...",
  "htmlPruningEnabled": true,
  "htmlMaxListItems": 10,
  "htmlMaxTextLength": 1000,
  "htmlPrunePercentage": 90
}
Custom Actor Selection
{
  "targetUrl": "https://example.com",
  "userGoal": "Extract product data",
  "claudeApiKey": "sk-ant-...",
  "actors": [
    {
      "name": "cheerio-scraper",
      "enabled": true,
      "input": {
        "maxRequestRetries": 5,
        "requestTimeoutSecs": 60,
        "maxPagesPerCrawl": 1,
        "proxyConfiguration": {"useApifyProxy": true}
      }
    },
    {
      "name": "web-scraper",
      "enabled": false,
      "input": {}
    },
    {
      "name": "playwright-scraper",
      "enabled": true,
      "input": {
        "maxRequestRetries": 3,
        "requestTimeoutSecs": 90,
        "maxPagesPerCrawl": 1
      }
    }
  ],
  "concurrentActors": true
}
Full Configuration Example
{
  "targetUrl": "https://books.toscrape.com/",
  "userGoal": "Get me a list of all the books on the first page. For each book, I want its title, price, star rating, and whether it is in stock.",
  "claudeApiKey": "sk-ant-...",
  "claudeModel": "claude-sonnet-4-20250514",
  "testScript": true,
  "maxRetries": 3,
  "timeout": 60,
  "htmlPruningEnabled": true,
  "htmlMaxListItems": 5,
  "htmlMaxTextLength": 500,
  "htmlPruneBeforeEvaluation": true,
  "htmlPrunePercentage": 80,
  "concurrentActors": true,
  "actors": [
    {
      "name": "cheerio-scraper",
      "enabled": true,
      "input": {
        "maxRequestRetries": 3,
        "requestTimeoutSecs": 30,
        "maxPagesPerCrawl": 1,
        "proxyConfiguration": {"useApifyProxy": true}
      }
    },
    {
      "name": "web-scraper",
      "enabled": true,
      "input": {
        "maxRequestRetries": 3,
        "requestTimeoutSecs": 30,
        "maxPagesPerCrawl": 1,
        "proxyConfiguration": {"useApifyProxy": true}
      }
    },
    {
      "name": "playwright-scraper",
      "enabled": true,
      "input": {
        "maxRequestRetries": 2,
        "requestTimeoutSecs": 45,
        "maxPagesPerCrawl": 1
      }
    }
  ]
}