ScraperCodeGenerator

Developed by Ondřej Hlava

Maintained by Community

An intelligent web scraping tool that automatically generates custom scraping code for any website.


🧠 AI-Powered Web Scraper & Code Generator

Stop writing scraping code manually! This intelligent actor doesn't just scrape websites: it automatically generates custom Python scraping code tailored to your specific needs.

You get both the extracted data AND the code to replicate it anytime.

🚀 What This Actor Does

The actor will automatically:

  • Test multiple scraping methods: Runs multiple scraping strategies (Cheerio, Web Scraper, Website Content Crawler, Playwright, etc.) in parallel for faster results
  • Evaluate which works best using AI: Claude AI analyzes each result and selects the best extraction
  • Extract your requested data: Automatically structures the extracted data based on your requirements
  • 🔥 Generate custom Python code that scrapes YOUR website: Creates personalized Python scraping code that you can run independently
  • Provide the code as a downloadable script you can run anywhere: Complete, ready-to-use BeautifulSoup script saved to key-value store

✨ Key Benefits

  • No Technical Knowledge Required: Just describe what data you want in plain English
  • Resilient Scraping: Multiple strategies ensure success even if one method fails
  • AI-Powered: Uses Claude AI to understand content context and select optimal results
  • 🎯 Custom Code Generation: Get personalized Python code that scrapes YOUR specific website
  • Production Ready: Generated code is clean, documented, and ready to run independently
  • Reusable: Use the generated code in your own projects, scripts, or applications

📊 Output Data

The actor saves comprehensive results to your default dataset and also saves the generated script to the key-value store.

💡 How to Access: After the actor finishes, open the "Key-value store" tab in your run details, download the GENERATED_SCRIPT file, and rename it with a .py extension.

🎯 What You Get

  • Extracted Data: The actual data from the website, structured according to your goal
  • 🔥 Generated Python Code: Ready-to-use BeautifulSoup script that you can run on your own computer
  • 💾 Separate Script File: The Python code is also saved as a downloadable file in the key-value store
  • Quality Scores: Performance ratings for each scraping method (0-10 scale)
  • Best Method: Which scraping approach worked best for your website

💡 Pro Tip: The generated Python code is completely standalone: you can copy it, modify it, and use it in your own projects without needing this actor again!
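
To give a feel for what a generated script looks like, here is a hypothetical sketch in the same style (BeautifulSoup extraction). The selectors and HTML below are illustrative only; a real generated script targets your specific website and fetches the live page, e.g. with requests:

```python
from bs4 import BeautifulSoup

# Illustrative stand-in for a fetched page; a real generated script
# would download the target URL instead of using an inline snippet.
html = """
<ol class="products">
  <li class="product"><h3>Book A</h3><p class="price">£12.99</p></li>
  <li class="product"><h3>Book B</h3><p class="price">£7.50</p></li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")
results = []
for item in soup.select("li.product"):
    results.append({
        "title": item.select_one("h3").get_text(strip=True),
        "price": item.select_one("p.price").get_text(strip=True),
    })

print(results)
```

The generated code follows this pattern: parse the page, select the elements matching your goal, and emit structured records you can save or post-process.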

🎯 Usage Examples

E-commerce Product Scraping

{
  "targetUrl": "https://books.toscrape.com/",
  "userGoal": "Get me a list of all the books on the first page. For each book, I want its title, price, star rating, and whether it is in stock.",
  "claudeApiKey": "sk-ant-..."
}

News Website Scraping

{
  "targetUrl": "https://www.theverge.com/",
  "userGoal": "I want to scrape the main articles from The Verge homepage. For each article, get me the headline, the author's name, and the link to the full article.",
  "claudeApiKey": "sk-ant-..."
}

Job Listings Scraping

{
  "targetUrl": "https://www.python.org/jobs/",
  "userGoal": "List all the jobs posted. For each job, I want the job title, the company name, the location, and the date it was posted.",
  "claudeApiKey": "sk-ant-..."
}

Quote Collection

{
  "targetUrl": "https://quotes.toscrape.com/",
  "userGoal": "I want a list of all quotes on this page. For each one, get the quote text itself, the name of the author, and a list of the tags associated with it.",
  "claudeApiKey": "sk-ant-..."
}

Business Directory Scraping

{
  "targetUrl": "https://directory.com/restaurants",
  "userGoal": "Get restaurant names, addresses, phone numbers, and ratings",
  "claudeApiKey": "sk-ant-..."
}

🔧 How to Use

  1. Enter Target URL: Paste the website URL you want to scrape
  2. Describe Your Goal: Be specific about what data you need (e.g., "product names and prices" not just "products")
  3. Add Claude API Key: Your Anthropic API key for AI analysis
  4. Configure Advanced Settings (optional): Customize Claude model, HTML processing, and actor selection
  5. Run the Actor: Click "Start" and watch the magic happen!

⚙️ Advanced Configuration

🤖 Claude Model Selection

Choose the AI model that best fits your needs:

  • Claude 4 Sonnet (Default): Latest and most capable model
  • Claude 4 Opus: Maximum quality for the most complex tasks
  • Claude 3.7 Sonnet: Enhanced capabilities over 3.5
  • Claude 3.5 Sonnet: Reliable and well-tested
  • Claude 3.5 Haiku: Fastest and most cost-effective
  • Claude 3 Sonnet: Good balance for most tasks
  • Claude 3 Haiku: Basic tasks with minimal cost

🔧 HTML Processing Settings

Fine-tune how HTML content is processed:

  • Enable HTML Pruning: Reduces processing time by removing unnecessary content
  • Max List Items: Controls how many items to keep in lists/tables (1-20)
  • Max Text Length: Maximum text length in any element (100-2000 chars)
  • Prune Percentage: How much content to keep (10%-100%)
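
The list-item and text-length settings above can be sketched roughly like this. This is a simplified illustration of what pruning does, not the actor's actual implementation:

```python
from bs4 import BeautifulSoup

def prune_html(html, max_list_items=3, max_text_length=200):
    """Simplified sketch of HTML pruning: cap list length and text size."""
    soup = BeautifulSoup(html, "html.parser")
    # Keep only the first max_list_items entries in each list
    for lst in soup.find_all(["ul", "ol"]):
        for extra in lst.find_all("li")[max_list_items:]:
            extra.decompose()
    # Truncate overly long text nodes
    for node in soup.find_all(string=True):
        if len(node) > max_text_length:
            node.replace_with(node[:max_text_length] + "…")
    return str(soup)

html = "<ul>" + "".join(f"<li>item {i}</li>" for i in range(10)) + "</ul>"
print(prune_html(html, max_list_items=3))
```

Shrinking the HTML this way is what reduces the amount of content the AI has to evaluate, which cuts processing time and token cost.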

🎯 Actor Selection

Choose which scraping methods to use:

  • Cheerio Scraper: Fast jQuery-like scraping (enabled by default)
  • Web Scraper: Versatile with JavaScript support (enabled by default)
  • Website Content Crawler: Advanced Playwright crawler (enabled by default)
  • Playwright Scraper: Modern browser automation (disabled by default)
  • Puppeteer Scraper: Chrome-based scraping (disabled by default)

💡 Pro Tip: Enable 2-3 actors for the best balance of speed and reliability. More actors = better chances of success but slower execution.

🚀 Performance Settings

  • Concurrent Actors: Run multiple actors simultaneously for faster results
  • Test Generated Script: Validate the generated code before saving
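
The benefit of running actors concurrently can be sketched with dummy stand-ins for the real actor calls (the `run_actor` function below is hypothetical; it just simulates work with a short sleep):

```python
import concurrent.futures
import time

def run_actor(name):
    """Hypothetical stand-in for launching one scraping actor."""
    time.sleep(0.2)  # simulate the actor's runtime
    return {"actor": name, "status": "SUCCEEDED"}

actors = ["cheerio-scraper", "web-scraper", "playwright-scraper"]

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor() as pool:
    results = list(pool.map(run_actor, actors))
elapsed = time.perf_counter() - start

print(results)
# With concurrent execution, total wall time is roughly one actor's
# runtime rather than the sum of all three.
```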

Common Use Cases

  • Market Research: Track competitor pricing and products + get code to monitor them daily
  • Content Aggregation: Collect news articles or blog posts + get code to update your database
  • Lead Generation: Extract business contact information + get code to scrape new listings
  • Data Analysis: Gather data for research projects + get code to repeat the process
  • Price Monitoring: Track product prices over time + get code to check prices automatically

🔍 Troubleshooting

"No content found" errors

  • Try different goal descriptions
  • Some websites block automated scraping
  • Check if the URL is accessible

Poor quality scores

  • Be more specific in your goal description
  • The website might have a complex structure
  • Try simpler pages first

Claude API errors

  • Verify your API key is correct
  • Check your Claude API usage limits
  • Ensure you have sufficient API credits

🔑 Getting Your Claude API Key

  1. Go to the Anthropic Console
  2. Sign up or log in
  3. Navigate to the API Keys section
  4. Create a new API key
  5. Copy and paste it into the "Claude API Key" field

📋 Input Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Target URL | String | Yes | The website URL you want to scrape |
| User Goal | String | Yes | Describe what data you want (e.g., "Extract all product names, prices, and ratings") |
| Claude API Key | String | Yes | Your Anthropic Claude API key |
| Test Generated Script | Boolean | No | Whether to test the generated script (default: true) |
| Claude Model | String | No | AI model to use (default: Claude 4 Sonnet) |
| Max Retries | Number | No | Maximum retry attempts (default: 3) |
| Timeout | Number | No | Timeout per attempt in seconds (default: 60) |
| HTML Pruning Enabled | Boolean | No | Enable HTML content processing (default: true) |
| HTML Max List Items | Number | No | Maximum items in lists to keep (1-20, default: 3) |
| HTML Max Text Length | Number | No | Maximum text length in elements (50-2000, default: 200) |
| HTML Prune Before Evaluation | Boolean | No | Apply pruning before AI evaluation (default: true) |
| HTML Prune Percentage | Number | No | Percentage of content to keep (0-100, default: 80) |
| Actors | Array | No | Detailed actor configurations with custom inputs |
| Concurrent Actors | Boolean | No | Run actors simultaneously (default: true) |

Advanced Configuration Examples

Custom Claude Model

{
  "targetUrl": "https://example.com",
  "userGoal": "Extract product data",
  "claudeApiKey": "sk-ant-...",
  "claudeModel": "claude-sonnet-4-20250514"
}

Custom HTML Processing

{
  "targetUrl": "https://example.com",
  "userGoal": "Extract product data",
  "claudeApiKey": "sk-ant-...",
  "htmlPruningEnabled": true,
  "htmlMaxListItems": 10,
  "htmlMaxTextLength": 1000,
  "htmlPrunePercentage": 90
}

Custom Actor Selection

{
  "targetUrl": "https://example.com",
  "userGoal": "Extract product data",
  "claudeApiKey": "sk-ant-...",
  "actors": [
    {
      "name": "cheerio-scraper",
      "enabled": true,
      "input": {
        "maxRequestRetries": 5,
        "requestTimeoutSecs": 60,
        "maxPagesPerCrawl": 1,
        "proxyConfiguration": { "useApifyProxy": true }
      }
    },
    {
      "name": "web-scraper",
      "enabled": false,
      "input": {}
    },
    {
      "name": "playwright-scraper",
      "enabled": true,
      "input": {
        "maxRequestRetries": 3,
        "requestTimeoutSecs": 90,
        "maxPagesPerCrawl": 1
      }
    }
  ],
  "concurrentActors": true
}

Full Configuration Example

{
  "targetUrl": "https://books.toscrape.com/",
  "userGoal": "Get me a list of all the books on the first page. For each book, I want its title, price, star rating, and whether it is in stock.",
  "claudeApiKey": "sk-ant-...",
  "claudeModel": "claude-sonnet-4-20250514",
  "testScript": true,
  "maxRetries": 3,
  "timeout": 60,
  "htmlPruningEnabled": true,
  "htmlMaxListItems": 5,
  "htmlMaxTextLength": 500,
  "htmlPruneBeforeEvaluation": true,
  "htmlPrunePercentage": 80,
  "concurrentActors": true,
  "actors": [
    {
      "name": "cheerio-scraper",
      "enabled": true,
      "input": {
        "maxRequestRetries": 3,
        "requestTimeoutSecs": 30,
        "maxPagesPerCrawl": 1,
        "proxyConfiguration": { "useApifyProxy": true }
      }
    },
    {
      "name": "web-scraper",
      "enabled": true,
      "input": {
        "maxRequestRetries": 3,
        "requestTimeoutSecs": 30,
        "maxPagesPerCrawl": 1,
        "proxyConfiguration": { "useApifyProxy": true }
      }
    },
    {
      "name": "playwright-scraper",
      "enabled": true,
      "input": {
        "maxRequestRetries": 2,
        "requestTimeoutSecs": 45,
        "maxPagesPerCrawl": 1
      }
    }
  ]
}