XavvyNess AI Web Extractor

Pricing: from $25.00 / 1,000 AI-extracted pages
Developer: XavvyNess (Maintained by Community)
Support: hello@xavvyness.ai

🤖 XavvyNess Smart Extractor — Natural Language Web Scraping

Extract structured data from any website using plain English. No code, no XPath, no CSS selectors. Just describe what you want and get clean JSON, CSV, or Markdown back. Works even when websites change their HTML.

Same price as Apify's official AI Web Scraper ($25/1,000 pages) — but with JSON, CSV, and Markdown output, plus specific error messages instead of generic failures.

Demo

🎬 Video demo coming soon.


🚀 What It Does

  1. Crawls any webpage and extracts clean text content
  2. Sends the content + your extraction prompt to AI
  3. Returns structured data in the format you choose (JSON, CSV, Markdown)

Perfect for: lead generation, price monitoring, content aggregation, data pipelines, research automation
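The three-step pipeline above can be sketched as follows. This is an illustrative sketch only; `extractTextFromHtml` and `buildPrompt` are hypothetical helpers, not the actor's actual internals:

```javascript
// 1. Reduce raw HTML to clean text (here: drop scripts/styles,
//    strip tags, collapse whitespace).
function extractTextFromHtml(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, ' ')
    .replace(/<style[\s\S]*?<\/style>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// 2. Combine the page text with the user's extraction prompt
//    before sending it to the AI model.
function buildPrompt(pageText, extractionPrompt, maxItems) {
  return [
    `Extract the following from the page text: ${extractionPrompt}`,
    `Return at most ${maxItems} items as a JSON array.`,
    `PAGE TEXT:\n${pageText}`,
  ].join('\n\n');
}
```

Step 3 is then just parsing the model's response and serializing it into the requested output format.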


📥 Input

| Field | Required | Default | Description |
|---|---|---|---|
| urls | Yes | | URLs to extract data from |
| extractionPrompt | Yes | | Plain English description of what to extract |
| outputFormat | No | json | json / csv / markdown |
| maxItems | No | 50 | Maximum items to extract per page (1-500) |

Example inputs:

```json
{
  "urls": ["https://news.ycombinator.com/"],
  "extractionPrompt": "Extract all post titles, point scores, and comment counts. Return as a list.",
  "outputFormat": "json",
  "maxItems": 30
}
```

```json
{
  "urls": ["https://www.g2.com/products/hubspot/reviews"],
  "extractionPrompt": "Extract reviewer name, star rating, review title, and the pros and cons mentioned in each review.",
  "outputFormat": "json"
}
```
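Before paying for a run, it can be worth sanity-checking the input shape locally. A minimal sketch whose rules mirror the input table above (the validation logic is an assumption, not the actor's own schema):

```javascript
// Validate an actor input object client-side; returns an array of
// human-readable errors (empty array means the input looks valid).
function validateInput(input) {
  const errors = [];
  if (!Array.isArray(input.urls) || input.urls.length === 0) {
    errors.push('urls must be a non-empty array');
  }
  if (typeof input.extractionPrompt !== 'string' || !input.extractionPrompt.trim()) {
    errors.push('extractionPrompt must be a non-empty string');
  }
  const format = input.outputFormat ?? 'json';
  if (!['json', 'csv', 'markdown'].includes(format)) {
    errors.push('outputFormat must be json, csv, or markdown');
  }
  const maxItems = input.maxItems ?? 50;
  if (!Number.isInteger(maxItems) || maxItems < 1 || maxItems > 500) {
    errors.push('maxItems must be an integer between 1 and 500');
  }
  return errors;
}
```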

📤 Output (JSON format)

Real output from a live run on Hacker News:

```json
{
  "sourceUrl": "https://news.ycombinator.com",
  "extractionPrompt": "Extract top 10 story titles with their point scores and comment counts",
  "items": [
    { "title": "I ported Mac OS X to the Nintendo Wii", "points": 1032, "comments": 194 },
    { "title": "Git commands I run before reading any code", "points": 1653, "comments": 355 },
    { "title": "Veracrypt project update", "points": 1077, "comments": 404 },
    { "title": "They're made out of meat (1991)", "points": 348, "comments": 99 },
    { "title": "ML promises to be profoundly weird", "points": 314, "comments": 359 },
    { "title": "Muse Spark: Scaling towards personal superintelligence", "points": 214, "comments": 257 },
    { "title": "Understanding the Kalman filter with a simple radar example", "points": 156, "comments": 25 },
    { "title": "USB for Software Developers", "points": 104, "comments": 15 },
    { "title": "Expanding Swift's IDE Support", "points": 55, "comments": 30 },
    { "title": "Pgit: I Imported the Linux Kernel into PostgreSQL", "points": 47, "comments": 4 }
  ],
  "itemCount": 10,
  "totalFound": 10,
  "outputFormat": "json",
  "extractedAt": "2026-04-08T22:22:20.139Z",
  "agent": "XavvyNess Smart Extractor"
}
```
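If you request JSON output but later want a spreadsheet, the items array flattens easily. A minimal sketch, assuming flat item objects like the Hacker News sample above:

```javascript
// Flatten an array of flat result objects into CSV text.
// Headers come from the keys of the first item.
function itemsToCsv(items) {
  if (items.length === 0) return '';
  const headers = Object.keys(items[0]);
  // Quote values containing commas, quotes, or newlines (RFC 4180 style).
  const escape = (v) => {
    const s = String(v);
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const rows = items.map((item) => headers.map((h) => escape(item[h])).join(','));
  return [headers.join(','), ...rows].join('\n');
}
```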

💡 Writing Good Extraction Prompts

Be specific about what fields you want and their types:

| ❌ Vague | ✅ Specific |
|---|---|
| "Get the jobs" | "Extract job title, company name, location, and salary range for each listing" |
| "Scrape reviews" | "Extract reviewer name, star rating (1-5), and the main complaint from each review" |
| "Get prices" | "Extract product name, original price, discounted price, and stock status" |

⚙️ Setup — API Keys

| Variable | Required | Where to Get |
|---|---|---|
| GROQ_API_KEY | Recommended (free) | console.groq.com |
| GOOGLE_API_KEY | Optional fallback | aistudio.google.com |
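The two keys work together: the actor advertises a Groq → Gemini fallback, which can be pictured like this (a sketch only; callGroq and callGemini are hypothetical placeholders for the real API calls):

```javascript
// Try the primary provider first; on any failure (rate limit,
// outage, missing key), fall back to the secondary provider.
async function extractWithFallback(prompt, callGroq, callGemini) {
  try {
    return await callGroq(prompt);
  } catch (err) {
    return await callGemini(prompt);
  }
}
```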

❓ FAQ

Q: What if the site uses JavaScript rendering (React/Vue/Angular)?
A: The actor uses CheerioCrawler which handles static HTML. For JS-heavy SPAs, the extracted text may be limited. For React apps, try URLs that serve server-side rendered content.

Q: What if the site blocks the crawler (403)?
A: You'll get a clear error message: "Access denied (403) — site blocks automated requests". Try again with a different URL from the same site, or contact us about proxy options.

Q: Can I extract from multiple pages at once?
A: Yes — add multiple URLs to the urls array. Each page is processed independently with the same extraction prompt.

Q: How is this different from a normal scraper?
A: A normal scraper needs hard-coded CSS selectors that break when the site updates. This actor uses AI to understand the content structure — it adapts automatically.


🔗 Use Cases

  1. Lead generation — Extract company names, emails, and phone numbers from directories
  2. Price monitoring — Track competitor pricing across e-commerce sites
  3. Review aggregation — Collect G2, Trustpilot, or Amazon reviews for sentiment analysis
  4. Job board scraping — Extract job listings with titles, requirements, and salaries
  5. News monitoring — Pull headlines and summaries from any news site
  6. Research automation — Extract structured data from academic or government pages

📊 Performance

  • ✅ Most pages: under 15 seconds
  • ✅ Handles dynamic prompt structures — no hardcoding required
  • ✅ Clear error messages for every failure mode
  • ✅ Groq → Gemini fallback — resilient to API outages
  • ✅ Failed runs are not charged — you only pay for successful extractions

📊 vs. Competitors

| | XavvyNess Smart Extractor | Apify AI Web Scraper |
|---|---|---|
| Price | $25/1,000 pages | $25/1,000 pages |
| AI provider | Groq/Gemini (free tier) | OpenAI (paid) |
| Natural language prompts | ✅ | ✅ |
| Output formats | JSON, CSV, Markdown | JSON |
| Error messages | Specific, actionable | Generic |

Integration

Via Apify JavaScript client

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('IN4O5pGUjye34xW0O').call({
    urls: ['https://news.ycombinator.com/', 'https://producthunt.com/'],
    extractionPrompt: 'Extract all post titles, upvote counts, and URLs.',
    outputFormat: 'json',
    maxItems: 50,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((result) => {
    console.log(result.sourceUrl);  // URL scraped
    console.log(result.items);      // extracted data array
    console.log(result.itemCount);  // how many items found
});
```

Via HTTP API

```bash
curl -X POST \
  "https://api.apify.com/v2/acts/IN4O5pGUjye34xW0O/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://news.ycombinator.com/"],
    "extractionPrompt": "Extract all post titles and scores."
  }'
```

Via Make.com / Zapier

Use the Apify module → Run Actor action. Actor ID: IN4O5pGUjye34xW0O. Describe what to extract in plain English in the extractionPrompt field — no code required.


Built by XavvyNess — AI agent services that do real work.