AI Extraction Agent - Smart Scraper

Pricing

from $0.01 / 1,000 results

AI-powered data extraction using natural language prompts. Describe what you need & let AI extract structured data from any webpage automatically.

Rating: 0.0 (0)

Developer: John Rippy (maintained by Community)

Actor stats

  • Bookmarked: 2
  • Total users: 7
  • Monthly active users: 2
  • Issues response: 5.4 hours
  • Last modified: 6 days ago

AI Extraction Agent - Autonomous Web Data Extraction

An AI-powered web agent that autonomously extracts data from websites, guided by natural language objectives. It uses Claude for extraction and does not require Firecrawl. Built by John Rippy (https://www.linkedin.com/in/johnrippy/ | https://johnrippy.link/).

Features

  • Automated data collection
  • Structured output format
  • Error handling
  • Pay-per-event billing

Quick Start

{
"url": "https://example.com",
"objective": "Find pricing plans",
"anthropicApiKey": "YOUR_ANTHROPIC_API_KEY"
}
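
For readers who want to start a run from code rather than the Apify Console, here is a minimal sketch that builds the HTTP request using only the Python standard library. It assumes Apify's public "run actor synchronously and get dataset items" endpoint; ACTOR_ID is a hypothetical placeholder (this page does not state the actor's ID) and YOUR_APIFY_TOKEN stands in for your own token. The request is built but not sent, so nothing is charged until you uncomment the last lines.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with the actor's real ID from its Apify Store page.
ACTOR_ID = "username~actor-name"


def build_run_request(apify_token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that runs the actor and returns its dataset items."""
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {apify_token}",
        },
        method="POST",
    )


request = build_run_request(
    "YOUR_APIFY_TOKEN",
    {"url": "https://example.com", "objective": "Find pricing plans", "demoMode": True},
)

# To actually run the actor (network call; may incur charges outside demo mode):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the payload before spending any credits.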

Demo Mode

Set demoMode: true to test with sample data (no charges). When you're ready for real results, set demoMode: false or omit it.

{
"demoMode": true,
...
}

Input Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | Yes* | - | Starting URL to begin extraction |
| objective | string | Yes* | - | Natural language description of what to extract |
| schema | object | No | - | JSON schema to structure the output |
| maxPages | integer | No | 5 | Maximum pages to crawl (1-50) |
| followLinks | boolean | No | true | Whether to follow links to discover content |
| anthropicApiKey | string | Yes* | - | Your Anthropic API key for Claude AI |
| demoMode | boolean | No | false | Run with sample data (no API key required) |

*Required when not in demo mode
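
The rules in the parameter table (three fields required outside demo mode, maxPages limited to 1-50, and the listed defaults) can be checked locally before a run is started. The sketch below is one way to do that; build_input is a hypothetical helper written for this illustration, not part of the actor, and the actor may apply additional validation of its own.

```python
from typing import Any, Dict, Optional


def build_input(
    url: Optional[str] = None,
    objective: Optional[str] = None,
    anthropic_api_key: Optional[str] = None,
    schema: Optional[Dict[str, Any]] = None,
    max_pages: int = 5,
    follow_links: bool = True,
    demo_mode: bool = False,
) -> Dict[str, Any]:
    """Return an input dict, enforcing the rules stated in the parameter table."""
    required = {"url": url, "objective": objective, "anthropicApiKey": anthropic_api_key}
    if not demo_mode:
        # url, objective, and anthropicApiKey are only mandatory outside demo mode.
        missing = [name for name, value in required.items() if not value]
        if missing:
            raise ValueError(f"Missing required field(s): {', '.join(missing)}")
    if not 1 <= max_pages <= 50:
        raise ValueError("maxPages must be between 1 and 50")

    actor_input: Dict[str, Any] = {
        "maxPages": max_pages,
        "followLinks": follow_links,
        "demoMode": demo_mode,
    }
    # Include the remaining fields only when they were actually provided.
    for name, value in {**required, "schema": schema}.items():
        if value is not None:
            actor_input[name] = value
    return actor_input


demo_input = build_input(demo_mode=True)
```

Failing fast on a missing key or an out-of-range maxPages avoids paying for a run that would be rejected anyway.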


Output Format

{
  "success": true,
  "url": "https://example.com",
  "objective": "Find pricing plans",
  "data": {
    "plans": [
      {
        "name": "Starter",
        "price": 29,
        "features": ["Feature 1", "Feature 2"]
      }
    ]
  },
  "pagesScraped": 3,
  "pagesVisited": [
    "https://example.com",
    "https://example.com/pricing"
  ],
  "extractedAt": "2024-12-23T10:30:00.000Z"
}
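
As an illustration of consuming a record in this format, the sketch below flattens the plans list into rows and parses the timestamp. It embeds a compact copy of the sample record so it runs on its own. The plans key is specific to this sample: the shape of data depends on your objective and schema, so plan_rows and extracted_at are hypothetical helpers to adapt, not a fixed contract.

```python
import json
from datetime import datetime
from typing import Any, Dict, List, Tuple

SAMPLE = json.loads("""
{
  "success": true,
  "url": "https://example.com",
  "objective": "Find pricing plans",
  "data": {"plans": [{"name": "Starter", "price": 29,
                      "features": ["Feature 1", "Feature 2"]}]},
  "pagesScraped": 3,
  "pagesVisited": ["https://example.com", "https://example.com/pricing"],
  "extractedAt": "2024-12-23T10:30:00.000Z"
}
""")


def plan_rows(record: Dict[str, Any]) -> List[Tuple[str, float, int]]:
    """Return (name, price, feature count) per plan, or [] for a failed record."""
    if not record.get("success"):
        return []
    plans = record.get("data", {}).get("plans", [])
    return [(p["name"], p["price"], len(p.get("features", []))) for p in plans]


def extracted_at(record: Dict[str, Any]) -> datetime:
    """Parse the ISO 8601 timestamp into a timezone-aware datetime."""
    # Swap the trailing "Z" for an explicit offset so older Python versions accept it.
    return datetime.fromisoformat(record["extractedAt"].replace("Z", "+00:00"))


rows = plan_rows(SAMPLE)
timestamp = extracted_at(SAMPLE)
```

Checking success first means a failed run yields an empty list instead of a KeyError further down the pipeline.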

Pricing

This actor uses pay-per-event billing:

Apify Compute

  • Standard Playwright actor pricing
  • ~$0.25-0.50 per run (depends on pages scraped)

Anthropic API (BYOK)

  • Claude API usage: ~$0.003-0.015 per extraction
  • Depends on page content size
  • Uses claude-sonnet-4-20250514 for best results

Cost Comparison

| Task | This Actor | Firecrawl | Diffbot |
| --- | --- | --- | --- |
| Extract pricing from 1 site | ~$0.30 | $0.001/page + $16/mo | $299/mo |
| Extract 100 product listings | ~$2.50 | $0.10 + $16/mo | $299/mo |
| Monthly cost (100 extractions) | ~$30 | ~$26 | $299 |

No monthly subscription. Pay per use.
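
To make the arithmetic behind these figures explicit, here is a rough estimator that combines the two per-run ranges quoted in this section (Apify compute at ~$0.25-0.50 and Claude usage at ~$0.003-0.015). It assumes exactly one Claude extraction per run, which is a simplification: multi-page runs may make more calls. estimate_cost is a hypothetical helper, and the numbers are the listing's estimates rather than measured prices.

```python
from typing import Tuple

# Per-run ranges in USD, taken from the estimates quoted in this section.
COMPUTE_PER_RUN = (0.25, 0.50)
CLAUDE_PER_EXTRACTION = (0.003, 0.015)


def estimate_cost(runs: int) -> Tuple[float, float]:
    """Return a (low, high) USD estimate, assuming one Claude extraction per run."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    low = runs * (COMPUTE_PER_RUN[0] + CLAUDE_PER_EXTRACTION[0])
    high = runs * (COMPUTE_PER_RUN[1] + CLAUDE_PER_EXTRACTION[1])
    return round(low, 2), round(high, 2)


low, high = estimate_cost(100)
print(f"100 runs: ${low:.2f} to ${high:.2f}")
```

Under these assumptions, the ~$30 monthly figure for 100 extractions sits near the low end of the estimated range.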


Use Cases

1. Competitive Pricing Intelligence

{
"url": "https://competitor.com",
"objective": "Find all pricing plans and list their names, monthly costs, annual discounts, and included features"
}

2. Lead Enrichment

{
"url": "https://company.com",
"objective": "Extract the leadership team with their names, titles, and LinkedIn profiles"
}

3. Product Research

{
"url": "https://store.com/products",
"objective": "Get all products with name, price, description, SKU, and availability status"
}

4. Content Aggregation

{
"url": "https://blog.company.com",
"objective": "Extract all blog posts with title, author, date, and summary"
}

5. Job Listings

{
"url": "https://company.com/careers",
"objective": "Find all open positions with title, department, location, and requirements"
}
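
Any of these inputs can be combined with the optional schema parameter to pin down the output shape. The sketch below does this for the pricing use case, using a JSON Schema that mirrors the plans structure from the Output Format sample. How strictly the actor enforces a schema is not documented here, so treat the schema as a hint about the desired structure; YOUR_ANTHROPIC_API_KEY is a placeholder.

```python
import json

# JSON Schema mirroring the "plans" structure shown in the Output Format sample.
pricing_schema = {
    "type": "object",
    "properties": {
        "plans": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"},
                    "features": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["name", "price"],
            },
        }
    },
    "required": ["plans"],
}

actor_input = {
    "url": "https://competitor.com",
    "objective": "Find all pricing plans and list their names, monthly costs, and included features",
    "schema": pricing_schema,
    "anthropicApiKey": "YOUR_ANTHROPIC_API_KEY",  # placeholder, not a real key
}

# Serialize to the JSON you would paste into the actor's input editor.
input_json = json.dumps(actor_input, indent=2)
```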


Common Problems & Solutions

"Invalid API key" error

Cause: Your Anthropic API key is wrong, expired, or lacks the required permissions.
Fix: Double-check the anthropicApiKey value and make sure you copied it exactly, with no extra spaces.

"Rate limit exceeded" error

Cause: You've hit the API's rate limits.
Fix: Wait a few minutes, then try again. Consider reducing the number of concurrent requests.
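
The wait-and-retry advice can be automated with exponential backoff. The sketch below is a generic pattern, not something the actor provides: RateLimitError is a hypothetical stand-in for whatever exception your client raises on a rate-limit response, and the delays are illustrative defaults you would likely lengthen in practice.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the error your client raises when rate limited."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimitError, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # waits 2s, 4s, 8s, ... by default
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Passing sleep as a parameter lets you test the retry logic without real delays.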

Empty or incomplete results

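Cause: The target site may have anti-scraping protection, or the data may not exist on the pages visited.
Fix:

  • Check that the url is correct
  • Try different parameters, such as a more specific objective, a higher maxPages, or followLinks set to true
  • Keep in mind that some sites block automated access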

Demo data showing instead of real results

Cause: demoMode is still set to true.
Fix: Set demoMode: false (or omit it) and provide your anthropicApiKey.


Built by John Rippy | Actor Arsenal