XavvyNess AI Web Extractor

Pricing: from $25.00 / 1,000 AI-extracted pages
Developer: XavvyNess (Maintained by Community)
Support: hello@xavvyness.ai

🤖 XavvyNess Smart Extractor — Natural Language Web Scraping

Extract structured data from any website using plain English. No code, no XPath, no CSS selectors. Just describe what you want and get clean JSON, CSV, or Markdown back. Works even when websites change their HTML.

Same price as Apify's official AI Web Scraper ($25/1,000 pages) — but with JSON, CSV, and Markdown output, plus specific error messages instead of generic failures.

Demo

🎬 Video demo coming soon.


🚀 What It Does

  1. Crawls any webpage and extracts clean text content
  2. Sends the content + your extraction prompt to AI
  3. Returns structured data in the format you choose (JSON, CSV, Markdown)

Perfect for: lead generation, price monitoring, content aggregation, data pipelines, research automation
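The three-step pipeline above can be sketched as follows. This is an illustrative sketch only; `extractTextFromHtml` and `buildPrompt` are hypothetical helpers, not the actor's actual internals:

```javascript
// 1. Reduce raw HTML to clean text (here: drop scripts/styles,
//    strip tags, collapse whitespace).
function extractTextFromHtml(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, ' ')
    .replace(/<style[\s\S]*?<\/style>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// 2. Combine the page text with the user's extraction prompt
//    before sending it to the AI model.
function buildPrompt(pageText, extractionPrompt, maxItems) {
  return [
    `Extract the following from the page text: ${extractionPrompt}`,
    `Return at most ${maxItems} items as a JSON array.`,
    `PAGE TEXT:\n${pageText}`,
  ].join('\n\n');
}
```

Step 3 is then just parsing the model's response and serializing it into the requested output format.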


📥 Input

| Field | Required | Default | Description |
|---|---|---|---|
| urls | Yes | | URLs to extract data from |
| extractionPrompt | Yes | | Plain English description of what to extract |
| outputFormat | No | json | json / csv / markdown |
| maxItems | No | 50 | Maximum items to extract per page (1-500) |

Example inputs:

```json
{
  "urls": ["https://news.ycombinator.com/"],
  "extractionPrompt": "Extract all post titles, point scores, and comment counts. Return as a list.",
  "outputFormat": "json",
  "maxItems": 30
}
```

```json
{
  "urls": ["https://www.g2.com/products/hubspot/reviews"],
  "extractionPrompt": "Extract reviewer name, star rating, review title, and the pros and cons mentioned in each review.",
  "outputFormat": "json"
}
```
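Before paying for a run, it can be worth sanity-checking the input shape locally. A minimal sketch whose rules mirror the input table above (the validation logic is an assumption, not the actor's own schema):

```javascript
// Validate an actor input object client-side; returns an array of
// human-readable errors (empty array means the input looks valid).
function validateInput(input) {
  const errors = [];
  if (!Array.isArray(input.urls) || input.urls.length === 0) {
    errors.push('urls must be a non-empty array');
  }
  if (typeof input.extractionPrompt !== 'string' || !input.extractionPrompt.trim()) {
    errors.push('extractionPrompt must be a non-empty string');
  }
  const format = input.outputFormat ?? 'json';
  if (!['json', 'csv', 'markdown'].includes(format)) {
    errors.push('outputFormat must be json, csv, or markdown');
  }
  const maxItems = input.maxItems ?? 50;
  if (!Number.isInteger(maxItems) || maxItems < 1 || maxItems > 500) {
    errors.push('maxItems must be an integer between 1 and 500');
  }
  return errors;
}
```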

📤 Output (JSON format)

Real output from a live run on Hacker News:

```json
{
  "sourceUrl": "https://news.ycombinator.com",
  "extractionPrompt": "Extract top 10 story titles with their point scores and comment counts",
  "items": [
    { "title": "I ported Mac OS X to the Nintendo Wii", "points": 1032, "comments": 194 },
    { "title": "Git commands I run before reading any code", "points": 1653, "comments": 355 },
    { "title": "Veracrypt project update", "points": 1077, "comments": 404 },
    { "title": "They're made out of meat (1991)", "points": 348, "comments": 99 },
    { "title": "ML promises to be profoundly weird", "points": 314, "comments": 359 },
    { "title": "Muse Spark: Scaling towards personal superintelligence", "points": 214, "comments": 257 },
    { "title": "Understanding the Kalman filter with a simple radar example", "points": 156, "comments": 25 },
    { "title": "USB for Software Developers", "points": 104, "comments": 15 },
    { "title": "Expanding Swift's IDE Support", "points": 55, "comments": 30 },
    { "title": "Pgit: I Imported the Linux Kernel into PostgreSQL", "points": 47, "comments": 4 }
  ],
  "itemCount": 10,
  "totalFound": 10,
  "outputFormat": "json",
  "extractedAt": "2026-04-08T22:22:20.139Z",
  "agent": "XavvyNess Smart Extractor"
}
```
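If you request JSON output but later want a spreadsheet, the items array flattens easily. A minimal sketch, assuming flat item objects like the Hacker News sample above:

```javascript
// Flatten an array of flat result objects into CSV text.
// Headers come from the keys of the first item.
function itemsToCsv(items) {
  if (items.length === 0) return '';
  const headers = Object.keys(items[0]);
  // Quote values containing commas, quotes, or newlines (RFC 4180 style).
  const escape = (v) => {
    const s = String(v);
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const rows = items.map((item) => headers.map((h) => escape(item[h])).join(','));
  return [headers.join(','), ...rows].join('\n');
}
```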

💡 Writing Good Extraction Prompts

Be specific about what fields you want and their types:

| ❌ Vague | ✅ Specific |
|---|---|
| "Get the jobs" | "Extract job title, company name, location, and salary range for each listing" |
| "Scrape reviews" | "Extract reviewer name, star rating (1-5), and the main complaint from each review" |
| "Get prices" | "Extract product name, original price, discounted price, and stock status" |

⚙️ Setup — API Keys

| Variable | Required | Where to Get |
|---|---|---|
| GROQ_API_KEY | Recommended (free) | console.groq.com |
| GOOGLE_API_KEY | Optional fallback | aistudio.google.com |
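The two keys work together: the actor advertises a Groq → Gemini fallback, which can be pictured like this (a sketch only; callGroq and callGemini are hypothetical placeholders for the real API calls):

```javascript
// Try the primary provider first; on any failure (rate limit,
// outage, missing key), fall back to the secondary provider.
async function extractWithFallback(prompt, callGroq, callGemini) {
  try {
    return await callGroq(prompt);
  } catch (err) {
    return await callGemini(prompt);
  }
}
```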

❓ FAQ

Q: What if the site uses JavaScript rendering (React/Vue/Angular)?
A: The actor uses CheerioCrawler which handles static HTML. For JS-heavy SPAs, the extracted text may be limited. For React apps, try URLs that serve server-side rendered content.

Q: What if the site blocks the crawler (403)?
A: You'll get a clear error message: "Access denied (403) — site blocks automated requests". Try again with a different URL from the same site, or contact us about proxy options.

Q: Can I extract from multiple pages at once?
A: Yes — add multiple URLs to the urls array. Each page is processed independently with the same extraction prompt.

Q: How is this different from a normal scraper?
A: A normal scraper needs hard-coded CSS selectors that break when the site updates. This actor uses AI to understand the content structure — it adapts automatically.


🔗 Use Cases

  1. Lead generation — Extract company names, emails, and phone numbers from directories
  2. Price monitoring — Track competitor pricing across e-commerce sites
  3. Review aggregation — Collect G2, Trustpilot, or Amazon reviews for sentiment analysis
  4. Job board scraping — Extract job listings with titles, requirements, and salaries
  5. News monitoring — Pull headlines and summaries from any news site
  6. Research automation — Extract structured data from academic or government pages

📊 Performance

  • ✅ Most pages: under 15 seconds
  • ✅ Handles dynamic prompt structures — no hardcoding required
  • ✅ Clear error messages for every failure mode
  • ✅ Groq → Gemini fallback — resilient to API outages
  • ✅ Failed runs are not charged — you only pay for successful extractions

📊 vs. Competitors

| | XavvyNess Smart Extractor | Apify AI Web Scraper |
|---|---|---|
| Price | $25/1,000 pages | $25/1,000 pages |
| AI provider | Groq/Gemini (free tier) | OpenAI (paid) |
| Natural language prompts | ✅ | ✅ |
| Output formats | JSON, CSV, Markdown | JSON |
| Error messages | Specific, actionable | Generic |

Integration

Via Apify JavaScript client

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('IN4O5pGUjye34xW0O').call({
    urls: ['https://news.ycombinator.com/', 'https://producthunt.com/'],
    extractionPrompt: 'Extract all post titles, upvote counts, and URLs.',
    outputFormat: 'json',
    maxItems: 50,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((result) => {
    console.log(result.sourceUrl);  // URL scraped
    console.log(result.items);      // extracted data array
    console.log(result.itemCount);  // how many items found
});
```

Via HTTP API

```bash
curl -X POST \
  "https://api.apify.com/v2/acts/IN4O5pGUjye34xW0O/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://news.ycombinator.com/"],
    "extractionPrompt": "Extract all post titles and scores."
  }'
```

Via Make.com / Zapier

Use the Apify module → Run Actor action. Actor ID: IN4O5pGUjye34xW0O. Describe what to extract in plain English in the extractionPrompt field — no code required.


Built by XavvyNess — AI agent services that do real work.