Web Scraper For Llms

Pricing

from $3.00 / 1,000 results

Rating

0.0 (0)

Developer

AbotAPI

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 3 days ago

Stealth web scraping engine built for LLMs. Converts any web page to clean markdown or HTML, ready for RAG pipelines, AI knowledge bases, and content analysis. Automatically bypasses Cloudflare and anti-bot protection using a stealth browser with undetectable fingerprints.

Quick Start

Scrape a list of URLs:

{
  "urls": ["https://example.com", "https://medium.com/"]
}

Crawl a website and scrape all discovered pages:

{
  "urls": ["https://docs.example.com"],
  "crawl": true,
  "crawlDepth": 2,
  "crawlMaxPages": 50
}
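
The inputs above can also be built and submitted from code. The following is a minimal sketch, assuming the third-party `apify-client` Python package is installed and that you supply your own API token. The actor ID is a placeholder (the listing does not state it), and `build_crawl_input` and `run_scraper` are hypothetical helper names.

```python
import json


def build_crawl_input(seed_urls, depth=2, max_pages=50):
    """Build a crawl input using the field names from the Quick Start."""
    if not seed_urls:
        raise ValueError("urls is required and must not be empty")
    return {
        "urls": list(seed_urls),
        "crawl": True,
        "crawlDepth": depth,
        "crawlMaxPages": max_pages,
    }


def run_scraper(token, actor_id, run_input):
    """Start the actor, wait for it to finish, and return its dataset items.

    Assumption: `actor_id` is a placeholder that must be replaced with the
    actor's real ID from the Apify Store.
    """
    from apify_client import ApifyClient  # third-party, imported lazily

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    # Print the payload only; calling run_scraper needs a token and actor ID.
    print(json.dumps(build_crawl_input(["https://docs.example.com"]), indent=2))
```

Keeping the input builder separate from the network call makes it easy to inspect the payload before spending any results on a run.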

Input Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| urls | Array | required | URLs to scrape or crawl from |
| crawl | Boolean | false | Follow links to discover additional pages |
| crawlDepth | Integer | 1 | Link hops from seed URL (crawl only) |
| crawlMaxPages | Integer | 20 | Max pages to discover per seed (crawl only) |
| formats | Array | ["markdown"] | Output formats: markdown, html, or both |
| concurrency | Integer | 3 | Parallel URL processing |
| maxRetries | Integer | 2 | Retry attempts for failed URLs (scrape only) |
| timeoutMs | Integer | 30000 | Timeout per URL in milliseconds |
| onlyMainContent | Boolean | true | Strip nav/header/footer/sidebar (scrape only) |
| removeAds | Boolean | true | Remove ads and tracking elements |
| removeBase64Images | Boolean | true | Remove inline base64 images |
| includeTags | Array | - | CSS selectors to keep (scrape only) |
| excludeTags | Array | - | CSS selectors to remove (scrape only) |
| includePatterns | Array | - | Regex URL filters (include only matching) |
| excludePatterns | Array | - | Regex URL filters (skip matching) |
| waitForSelector | String | - | Wait for CSS selector before extraction (scrape only) |
| proxyConfiguration | Object | - | Apify proxy settings |
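
Before starting a crawl, it can help to preview which URLs a given pair of `includePatterns` and `excludePatterns` would keep. The sketch below is a local approximation only: it assumes that exclusions take precedence, that an empty include list means "keep everything", and that patterns match anywhere in the URL. None of these details are confirmed by the listing, and `url_passes_filters` is a hypothetical helper.

```python
import re


def url_passes_filters(url, include_patterns=None, exclude_patterns=None):
    """Return True if `url` would be kept under the assumed filter rules.

    Assumed semantics: any exclude match drops the URL; otherwise the URL
    must match at least one include pattern, if any are given.
    """
    if any(re.search(p, url) for p in (exclude_patterns or [])):
        return False
    if include_patterns:
        return any(re.search(p, url) for p in include_patterns)
    return True


urls = [
    "https://docs.example.com/guide/intro",
    "https://docs.example.com/blog/2024/news",
    "https://docs.example.com/guide/changelog",
]
kept = [
    u for u in urls
    if url_passes_filters(
        u,
        include_patterns=[r"/guide/"],
        exclude_patterns=[r"changelog$"],
    )
]
print(kept)  # ['https://docs.example.com/guide/intro']
```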

Output

{
  "url": "https://medium.com/",
  "title": "Medium: Read and write stories.",
  "description": null,
  "markdown": "## Human stories & ideas\n\nA place to read, write, and deepen your understanding...",
  "html": null,
  "metadata": {
    "title": "Medium: Read and write stories.",
    "language": "en",
    "favicon": "https://miro.medium.com/...",
    "canonical": "https://medium.com/",
    "openGraph": null,
    "twitter": null
  },
  "duration": 5725,
  "scrapedAt": "2026-02-24T03:36:28.990Z",
  "success": true,
  "error": null
}
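
Downstream code will usually want to separate usable pages from failed ones before indexing. The following sketch works on result items shaped like the example above; `to_documents` is a hypothetical helper, and the failed item (including its `"timeout"` error string) is invented for illustration, since the listing only shows `error` as `null`.

```python
def to_documents(items):
    """Split scraper results into indexable documents and failures."""
    documents, failures = [], []
    for item in items:
        if item.get("success") and item.get("markdown"):
            documents.append({
                "source": item["url"],
                "title": item.get("title") or "",
                "text": item["markdown"],
            })
        else:
            # Keep the URL and error so failed pages can be retried later.
            failures.append((item.get("url"), item.get("error")))
    return documents, failures


sample_items = [
    {
        "url": "https://medium.com/",
        "title": "Medium: Read and write stories.",
        "markdown": "## Human stories & ideas",
        "success": True,
        "error": None,
    },
    {
        # Illustrative failure; the real error format is an assumption.
        "url": "https://example.com/slow",
        "title": None,
        "markdown": None,
        "success": False,
        "error": "timeout",
    },
]

documents, failures = to_documents(sample_items)
print(len(documents), failures)
```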

Use Cases

  • RAG pipelines - Feed clean markdown into LLM knowledge bases
  • Content monitoring - Track changes across a set of pages
  • Research - Bulk extract articles, documentation, or product pages
  • Site migration - Crawl and export an entire site as markdown
  • Data extraction - Scrape structured content from specific CSS selectors
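
For the content monitoring use case, one simple approach is to hash each page's markdown and compare it with the hash stored from the previous run. This is a sketch of that idea rather than a feature of the actor; `fingerprint` and `changed_urls` are hypothetical helpers, and how the previous hashes are stored is left to the caller.

```python
import hashlib


def fingerprint(markdown):
    """Return a stable SHA-256 hex digest of a page's markdown."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def changed_urls(previous, items):
    """List URLs whose markdown differs from the last run.

    `previous` maps URL -> fingerprint from the last run. URLs not seen
    before are reported as changed; failed items are skipped.
    """
    changed = []
    for item in items:
        if not item.get("success") or item.get("markdown") is None:
            continue
        if previous.get(item["url"]) != fingerprint(item["markdown"]):
            changed.append(item["url"])
    return changed


previous = {"https://example.com/a": fingerprint("# A\n\nold text")}
items = [
    {"url": "https://example.com/a", "markdown": "# A\n\nnew text", "success": True},
    {"url": "https://example.com/b", "markdown": "# B", "success": True},
    {"url": "https://example.com/c", "markdown": None, "success": False},
]
print(changed_urls(previous, items))
```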