XavvyNess AI Web Extractor
Pricing
from $25.00 / 1,000 ai-extracted pages
Extract data from any website using plain English — no CSS selectors, no code. Describe what you want, get JSON, CSV, or Markdown back. Works even when site layouts change. Example: 'Extract job titles, company names, and salaries'. Support email: hello@xavvyness.ai
Rating: 0.0 (0 reviews)
Developer: XavvyNess
Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 10 hours ago
🤖 XavvyNess Smart Extractor — Natural Language Web Scraping
Extract structured data from any website using plain English. No code, no XPath, no CSS selectors. Just describe what you want and get clean JSON, CSV, or Markdown back. Works even when websites change their HTML.
Same price as Apify's official AI Web Scraper ($25/1,000 pages) — but with JSON, CSV, and Markdown output, plus specific error messages instead of generic failures.
Demo
🎬 Video demo coming soon. Upload `smart-extractor.mp4` to YouTube, then run `python3 scripts/actor-video-gen.py --embed-readmes` to embed it here automatically.
🚀 What It Does
- Crawls any webpage and extracts clean text content
- Sends the content + your extraction prompt to AI
- Returns structured data in the format you choose (JSON, CSV, Markdown)
Perfect for: lead generation, price monitoring, content aggregation, data pipelines, research automation
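The first step above (turning a crawled page into clean text) can be sketched as follows. This is only an illustration of the idea, not the actor's implementation: per the FAQ, the actor uses CheerioCrawler, whereas the naive regex tag-stripper below is just a minimal stand-in.

```javascript
// Naive HTML-to-text reduction, illustrating the "extract clean text" step.
// The real actor uses CheerioCrawler; this regex-based version is only a sketch.
function htmlToText(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, ' ') // drop scripts
    .replace(/<style[\s\S]*?<\/style>/gi, ' ')   // drop stylesheets
    .replace(/<[^>]+>/g, ' ')                    // strip remaining tags
    .replace(/\s+/g, ' ')                        // collapse whitespace
    .trim();
}

// Example:
htmlToText('<html><body><h1>Jobs</h1><p>Engineer: $120k</p></body></html>');
// → 'Jobs Engineer: $120k'
```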
📥 Input
| Field | Required | Default | Description |
|---|---|---|---|
| `urls` | ✅ | — | URLs to extract data from |
| `extractionPrompt` | ✅ | — | Plain-English description of what to extract |
| `outputFormat` | — | `json` | `json` / `csv` / `markdown` |
| `maxItems` | — | `50` | Maximum items to extract per page (1–500) |
Example inputs:

```json
{
  "urls": ["https://news.ycombinator.com/"],
  "extractionPrompt": "Extract all post titles, point scores, and comment counts. Return as a list.",
  "outputFormat": "json",
  "maxItems": 30
}
```

```json
{
  "urls": ["https://www.g2.com/products/hubspot/reviews"],
  "extractionPrompt": "Extract reviewer name, star rating, review title, and the pros and cons mentioned in each review.",
  "outputFormat": "json"
}
```
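If you build inputs programmatically, a small client-side check can catch schema mistakes before you start a run. The helper below is hypothetical (it is not part of the actor); the field names, defaults, and limits simply mirror the input table above.

```javascript
// Hypothetical client-side check mirroring the actor's input table:
// urls and extractionPrompt are required; outputFormat defaults to "json";
// maxItems defaults to 50 and must be an integer between 1 and 500.
function normalizeInput(input) {
  const { urls, extractionPrompt, outputFormat = 'json', maxItems = 50 } = input;
  if (!Array.isArray(urls) || urls.length === 0) {
    throw new Error('urls must be a non-empty array');
  }
  if (typeof extractionPrompt !== 'string' || extractionPrompt.trim() === '') {
    throw new Error('extractionPrompt is required');
  }
  if (!['json', 'csv', 'markdown'].includes(outputFormat)) {
    throw new Error('outputFormat must be json, csv, or markdown');
  }
  if (!Number.isInteger(maxItems) || maxItems < 1 || maxItems > 500) {
    throw new Error('maxItems must be an integer between 1 and 500');
  }
  return { urls, extractionPrompt, outputFormat, maxItems };
}
```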
📤 Output (JSON format)
Real output from a live run on Hacker News:
```json
{
  "sourceUrl": "https://news.ycombinator.com",
  "extractionPrompt": "Extract top 10 story titles with their point scores and comment counts",
  "items": [
    { "title": "I ported Mac OS X to the Nintendo Wii", "points": 1032, "comments": 194 },
    { "title": "Git commands I run before reading any code", "points": 1653, "comments": 355 },
    { "title": "Veracrypt project update", "points": 1077, "comments": 404 },
    { "title": "They're made out of meat (1991)", "points": 348, "comments": 99 },
    { "title": "ML promises to be profoundly weird", "points": 314, "comments": 359 },
    { "title": "Muse Spark: Scaling towards personal superintelligence", "points": 214, "comments": 257 },
    { "title": "Understanding the Kalman filter with a simple radar example", "points": 156, "comments": 25 },
    { "title": "USB for Software Developers", "points": 104, "comments": 15 },
    { "title": "Expanding Swift's IDE Support", "points": 55, "comments": 30 },
    { "title": "Pgit: I Imported the Linux Kernel into PostgreSQL", "points": 47, "comments": 4 }
  ],
  "itemCount": 10,
  "totalFound": 10,
  "outputFormat": "json",
  "extractedAt": "2026-04-08T22:22:20.139Z",
  "agent": "XavvyNess Smart Extractor"
}
```
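The actor can emit CSV directly via `outputFormat: "csv"`, but if you already have JSON output like the above, converting the `items` array yourself is straightforward. A minimal sketch, assuming flat items that all share the same keys:

```javascript
// Convert a flat array of extracted items (like `items` above) to CSV.
// Assumes flat objects with shared keys; quotes fields that need escaping.
function itemsToCsv(items) {
  if (items.length === 0) return '';
  const headers = Object.keys(items[0]);
  const escape = (v) => {
    const s = String(v);
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const rows = items.map((item) => headers.map((h) => escape(item[h])).join(','));
  return [headers.join(','), ...rows].join('\n');
}

// itemsToCsv([{ title: 'USB for Software Developers', points: 104, comments: 15 }])
// → 'title,points,comments\nUSB for Software Developers,104,15'
```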
💡 Writing Good Extraction Prompts
Be specific about what fields you want and their types:
| ❌ Vague | ✅ Specific |
|---|---|
| "Get the jobs" | "Extract job title, company name, location, and salary range for each listing" |
| "Scrape reviews" | "Extract reviewer name, star rating (1-5), and the main complaint from each review" |
| "Get prices" | "Extract product name, original price, discounted price, and stock status" |
⚙️ Setup — API Keys
| Variable | Required | Where to Get |
|---|---|---|
| `GROQ_API_KEY` | Recommended (free) | console.groq.com |
| `GOOGLE_API_KEY` | Optional fallback | aistudio.google.com |
❓ FAQ
Q: What if the site uses JavaScript rendering (React/Vue/Angular)?
A: The actor uses CheerioCrawler which handles static HTML. For JS-heavy SPAs, the extracted text may be limited. For React apps, try URLs that serve server-side rendered content.
Q: What if the site blocks the crawler (403)?
A: You'll get a clear error message: "Access denied (403) — site blocks automated requests". Try again with a different URL from the same site, or contact us about proxy options.
Q: Can I extract from multiple pages at once?
A: Yes — add multiple URLs to the urls array. Each page is processed independently with the same extraction prompt.
Q: How is this different from a normal scraper?
A: A normal scraper needs hard-coded CSS selectors that break when the site updates. This actor uses AI to understand the content structure — it adapts automatically.
🔗 Use Cases
- Lead generation — Extract company names, emails, and phone numbers from directories
- Price monitoring — Track competitor pricing across e-commerce sites
- Review aggregation — Collect G2, Trustpilot, or Amazon reviews for sentiment analysis
- Job board scraping — Extract job listings with titles, requirements, and salaries
- News monitoring — Pull headlines and summaries from any news site
- Research automation — Extract structured data from academic or government pages
📊 Performance
- ✅ Most pages: under 15 seconds
- ✅ Handles dynamic prompt structures — no hardcoding required
- ✅ Clear error messages for every failure mode
- ✅ Groq → Gemini fallback — resilient to API outages
- ✅ Failed runs are not charged — you only pay for successful extractions
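The Groq → Gemini fallback above follows the standard primary/fallback pattern: try each provider in order and return the first success. A generic sketch, where `callGroq` and `callGemini` are illustrative stand-ins for the real provider calls, not the actor's internals:

```javascript
// Generic primary/fallback pattern, as used for the Groq → Gemini failover.
// The provider functions passed in are stand-ins for real API calls.
async function extractWithFallback(prompt, providers) {
  let lastError;
  for (const provider of providers) {
    try {
      return await provider(prompt); // first success wins
    } catch (err) {
      lastError = err; // remember the failure, then try the next provider
    }
  }
  throw new Error(`All providers failed: ${lastError}`);
}

// Usage (callGroq / callGemini are hypothetical):
// const result = await extractWithFallback(prompt, [callGroq, callGemini]);
```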
📊 vs. Competitors
| | XavvyNess Smart Extractor | Apify AI Web Scraper |
|---|---|---|
| Price | $25/1,000 pages | $25/1,000 pages |
| AI provider | Groq/Gemini (free tier) | OpenAI (paid) |
| Natural language prompts | ✅ | ✅ |
| Output formats | JSON, CSV, Markdown | JSON |
| Error messages | Specific, actionable | Generic |
Integration
Via Apify JavaScript client
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('IN4O5pGUjye34xW0O').call({
  urls: ['https://news.ycombinator.com/', 'https://producthunt.com/'],
  extractionPrompt: 'Extract all post titles, upvote counts, and URLs.',
  outputFormat: 'json',
  maxItems: 50,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((result) => {
  console.log(result.sourceUrl); // URL scraped
  console.log(result.items);     // extracted data array
  console.log(result.itemCount); // how many items found
});
```
Via HTTP API
```bash
curl -X POST \
  "https://api.apify.com/v2/acts/IN4O5pGUjye34xW0O/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://news.ycombinator.com/"],
    "extractionPrompt": "Extract all post titles and scores."
  }'
```
Via Make.com / Zapier
Use the Apify module → Run Actor action. Actor ID: `IN4O5pGUjye34xW0O`. Describe what to extract in plain English in the `extractionPrompt` field — no code required.
Built by XavvyNess — AI agent services that do real work.