AI Extraction Agent - Smart Scraper
Pricing
from $0.01 / 1,000 results
AI-powered data extraction using natural language prompts. Describe what you need & let AI extract structured data from any webpage automatically.
Developer

John Rippy
AI Extraction Agent - Autonomous Web Data Extraction
AI-powered web agent that autonomously extracts data from websites using natural language objectives. Uses Claude AI for intelligent extraction - NO Firecrawl dependency required. Built by John Rippy (https://www.linkedin.com/in/johnrippy/ | https://johnrippy.link/).
Features
- Automated data collection
- Structured output format
- Error handling
- Pay-per-event billing
Quick Start
{"url": "https://example.com", "objective": "Find pricing plans", "anthropicApiKey": "<your Anthropic API key>"}
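For readers who start runs from code rather than the Apify Console, the following is a minimal Python sketch. It assumes the `apify-client` package is used and that the actor writes its results to the run's default dataset; this page confirms neither. `build_run_input` and `run_extraction` are hypothetical helper names, and the actor ID must be copied from the actor's own page.

```python
from typing import Any


def build_run_input(url: str, objective: str, anthropic_api_key: str) -> dict[str, Any]:
    """Assemble the three fields marked as required in the Input Parameters table."""
    return {"url": url, "objective": objective, "anthropicApiKey": anthropic_api_key}


def run_extraction(client: Any, actor_id: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start the actor, wait for it to finish, and return its dataset items.

    `client` is expected to behave like apify_client.ApifyClient. Reading the
    results from the default dataset is an assumption about this actor.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

With the real client, this would be called as `run_extraction(ApifyClient("<YOUR_APIFY_TOKEN>"), "<ACTOR_ID>", build_run_input(...))`, where both placeholders are values you supply.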
Demo Mode
Set demoMode: true to test with sample data (no charges). When you're ready for real results, set demoMode: false or omit it.
{"demoMode": true,...}
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes* | - | Starting URL to begin extraction |
| objective | string | Yes* | - | Natural language description of what to extract |
| schema | object | No | - | JSON schema to structure the output |
| maxPages | integer | No | 5 | Maximum pages to crawl (1-50) |
| followLinks | boolean | No | true | Whether to follow links to discover content |
| anthropicApiKey | string | Yes* | - | Your Anthropic API key for Claude AI |
| demoMode | boolean | No | false | Run with sample data (no API key required) |
*Required when not in demo mode
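The rules in the table can be checked locally before a run is started. The sketch below is one way to do that in Python; `validate_input` is a hypothetical helper, not part of the actor, and the actor's own validation may be stricter or looser.

```python
from typing import Any


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    errors: list[str] = []
    demo = run_input.get("demoMode", False)

    # url, objective and anthropicApiKey are required only outside demo mode.
    if not demo:
        for field in ("url", "objective", "anthropicApiKey"):
            value = run_input.get(field)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"{field} is required when demoMode is false")

    # maxPages defaults to 5 and must stay within 1-50 (bool is rejected explicitly).
    max_pages = run_input.get("maxPages", 5)
    if isinstance(max_pages, bool) or not isinstance(max_pages, int) or not 1 <= max_pages <= 50:
        errors.append("maxPages must be an integer between 1 and 50")

    # followLinks defaults to true and must be a boolean when given.
    if not isinstance(run_input.get("followLinks", True), bool):
        errors.append("followLinks must be a boolean")

    # schema is optional but must be an object when given.
    if "schema" in run_input and not isinstance(run_input["schema"], dict):
        errors.append("schema must be an object")

    return errors
```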
Output Format
{"success": true,"url": "https://example.com","objective": "Find pricing plans","data": {"plans": [{"name": "Starter","price": 29,"features": ["Feature 1", "Feature 2"]}]},"pagesScraped": 3,"pagesVisited": ["https://example.com","https://example.com/pricing"],"extractedAt": "2024-12-23T10:30:00.000Z"}
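As an illustration of consuming this output, the Python sketch below parses the sample above, checks the `success` flag, and converts `extractedAt` to a timezone-aware datetime. The shape of `data` depends on the objective and schema, so the `plans` key is specific to this sample; `parse_result` is a hypothetical helper name.

```python
import json
from datetime import datetime, timezone

# Sample output copied from the Output Format section.
SAMPLE_OUTPUT = '''
{"success": true, "url": "https://example.com", "objective": "Find pricing plans",
 "data": {"plans": [{"name": "Starter", "price": 29, "features": ["Feature 1", "Feature 2"]}]},
 "pagesScraped": 3,
 "pagesVisited": ["https://example.com", "https://example.com/pricing"],
 "extractedAt": "2024-12-23T10:30:00.000Z"}
'''


def parse_result(raw: str) -> dict:
    """Parse one result record and convert extractedAt to a UTC datetime."""
    result = json.loads(raw)
    if not result.get("success"):
        raise ValueError("extraction did not succeed")
    # The trailing "Z" marks UTC; strptime matches it literally, so tzinfo is set explicitly.
    result["extractedAt"] = datetime.strptime(
        result["extractedAt"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    return result


result = parse_result(SAMPLE_OUTPUT)
plan_prices = [(plan["name"], plan["price"]) for plan in result["data"]["plans"]]
print(plan_prices)
print(result["extractedAt"].isoformat())
```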
Pricing
This actor uses pay-per-event billing:
Apify Compute
- Standard Playwright actor pricing
- ~$0.25-0.50 per run (depends on pages scraped)
Anthropic API (BYOK)
- Claude API usage: ~$0.003-0.015 per extraction
- Depends on page content size
- Uses claude-sonnet-4-20250514 for best results
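To turn these ranges into a rough budget, the two components can be added. The Python sketch below uses only the approximate figures quoted above. It assumes one Claude extraction per run unless told otherwise, which this page does not specify, and `estimate_cost` is a hypothetical helper.

```python
# Approximate ranges quoted in the Pricing section (USD).
COMPUTE_PER_RUN = (0.25, 0.50)          # Apify compute, per run
CLAUDE_PER_EXTRACTION = (0.003, 0.015)  # Anthropic API, per extraction


def estimate_cost(runs: int, extractions_per_run: int = 1) -> tuple[float, float]:
    """Return a (low, high) cost estimate in USD for the given number of runs.

    Assumption: each run makes `extractions_per_run` Claude calls. The real
    count depends on how the actor batches pages.
    """
    if runs < 0 or extractions_per_run < 0:
        raise ValueError("runs and extractions_per_run must be non-negative")
    low = runs * (COMPUTE_PER_RUN[0] + extractions_per_run * CLAUDE_PER_EXTRACTION[0])
    high = runs * (COMPUTE_PER_RUN[1] + extractions_per_run * CLAUDE_PER_EXTRACTION[1])
    return round(low, 3), round(high, 3)


low, high = estimate_cost(runs=100)
print(f"100 runs: ${low:.2f} to ${high:.2f}")
```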
Cost Comparison
| Task | This Actor | Firecrawl | Diffbot |
|---|---|---|---|
| Extract pricing from 1 site | ~$0.30 | $0.001/page + $16/mo | $299/mo |
| Extract 100 product listings | ~$2.50 | $0.10 + $16/mo | $299/mo |
| Monthly cost (100 extractions) | ~$30 | ~$26 | $299 |
No monthly subscription. Pay per use.
Use Cases
1. Competitive Pricing Intelligence
{"url": "https://competitor.com","objective": "Find all pricing plans and list their names, monthly costs, annual discounts, and included features"}
2. Lead Enrichment
{"url": "https://company.com","objective": "Extract the leadership team with their names, titles, and LinkedIn profiles"}
3. Product Research
{"url": "https://store.com/products","objective": "Get all products with name, price, description, SKU, and availability status"}
4. Content Aggregation
{"url": "https://blog.company.com","objective": "Extract all blog posts with title, author, date, and summary"}
5. Job Listings
{"url": "https://company.com/careers","objective": "Find all open positions with title, department, location, and requirements"}
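Any of these use cases can be combined with the optional `schema` parameter to pin down the output shape. The Python sketch below builds an input for the Product Research case. The schema is written in ordinary JSON Schema style; this page does not document which schema features the actor honors, so the property names and the `required` lists are assumptions, as is the `maxPages` value.

```python
import json

# Illustrative JSON Schema for the Product Research use case (assumed shape).
product_schema = {
    "type": "object",
    "properties": {
        "products": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"},
                    "description": {"type": "string"},
                    "sku": {"type": "string"},
                    "availability": {"type": "string"},
                },
                "required": ["name", "price"],
            },
        }
    },
    "required": ["products"],
}

run_input = {
    "url": "https://store.com/products",
    "objective": "Get all products with name, price, description, SKU, and availability status",
    "schema": product_schema,
    "maxPages": 10,       # within the documented 1-50 range
    "followLinks": True,  # documented default
    "anthropicApiKey": "<YOUR_ANTHROPIC_API_KEY>",  # placeholder
}

# Serialize to the JSON form used for actor input.
print(json.dumps(run_input, indent=2))
```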
Common Problems & Solutions
"Invalid API key" error
Cause: Your Anthropic API key (anthropicApiKey) is wrong, expired, or lacks the required permissions.
Fix: Double-check the key and make sure you copied it exactly, without extra spaces.
"Rate limit exceeded" error
Cause: You've hit the API's rate limits.
Fix: Wait a few minutes, then try again. Consider reducing the number of concurrent requests.
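The "wait, then try again" advice can be automated with exponential backoff. The Python sketch below shows the general pattern. `RateLimitError` is a hypothetical exception, because this page does not specify how the actor reports a rate-limit failure, so the caller has to map the actor's actual error onto it.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by the caller's code when a run hits a rate limit."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()`, retrying on RateLimitError with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt; up to 1 s of jitter spreads out retries.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the delay can be replaced in tests; in normal use the default `time.sleep` applies.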
Empty or incomplete results
Cause: The target site may have anti-scraping protection, or the data may not exist on the pages visited.
Fix:
- Check that url is correct and reachable
- Make objective more specific, or raise maxPages so more pages are crawled
- Some sites block automated access
Demo data showing instead of real results
Cause: demoMode is still set to true.
Fix: Set demoMode: false (or omit it) and provide your anthropicApiKey.
Built by John Rippy | Actor Arsenal