Web Scraper For Llms
Pricing
from $3.00 / 1,000 results
Developer: AbotAPI (maintained by community)
Actor stats: 0 bookmarks, 2 total users, 1 monthly active user, last modified 3 days ago
Stealth web scraping engine built for LLMs. Converts any web page to clean markdown or HTML, ready for RAG pipelines, AI knowledge bases, and content analysis. Automatically bypasses Cloudflare and other anti-bot protection using a stealth browser with undetectable fingerprints.
Quick Start
Scrape a list of URLs:
{"urls": ["https://example.com", "https://medium.com/"]}
Crawl a website and scrape all discovered pages:
{"urls": ["https://docs.example.com"], "crawl": true, "crawlDepth": 2, "crawlMaxPages": 50}
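The two inputs above differ only in the crawl fields, so they can be generated from one helper. The following is a minimal sketch, not part of the actor: `build_input` is a hypothetical function name, and only the JSON field names (`urls`, `crawl`, `crawlDepth`, `crawlMaxPages`) come from the examples above.

```python
import json


def build_input(urls, crawl=False, crawl_depth=1, crawl_max_pages=20):
    """Build an input dict using the field names from the Quick Start examples.

    build_input is a hypothetical helper; the actor itself only sees the JSON.
    """
    if not urls:
        raise ValueError("urls is required and must not be empty")
    run_input = {"urls": list(urls)}
    if crawl:
        # crawlDepth and crawlMaxPages are only meaningful when crawl is true.
        run_input.update(
            {
                "crawl": True,
                "crawlDepth": crawl_depth,
                "crawlMaxPages": crawl_max_pages,
            }
        )
    return run_input


# Same payloads as the two Quick Start examples.
scrape_input = build_input(["https://example.com", "https://medium.com/"])
crawl_input = build_input(
    ["https://docs.example.com"], crawl=True, crawl_depth=2, crawl_max_pages=50
)
print(json.dumps(crawl_input))
```

The resulting dictionaries serialize to the same JSON shown above and can be pasted into the actor's input editor or passed as run input by whichever client you use.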
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| urls | Array | required | URLs to scrape or crawl from |
| crawl | Boolean | false | Follow links to discover additional pages |
| crawlDepth | Integer | 1 | Link hops from seed URL (crawl only) |
| crawlMaxPages | Integer | 20 | Max pages to discover per seed (crawl only) |
| formats | Array | ["markdown"] | Output formats: markdown, html, or both |
| concurrency | Integer | 3 | Parallel URL processing |
| maxRetries | Integer | 2 | Retry attempts for failed URLs (scrape only) |
| timeoutMs | Integer | 30000 | Timeout per URL in milliseconds |
| onlyMainContent | Boolean | true | Strip nav/header/footer/sidebar (scrape only) |
| removeAds | Boolean | true | Remove ads and tracking elements |
| removeBase64Images | Boolean | true | Remove inline base64 images |
| includeTags | Array | - | CSS selectors to keep (scrape only) |
| excludeTags | Array | - | CSS selectors to remove (scrape only) |
| includePatterns | Array | - | Regex URL filters (include only matching) |
| excludePatterns | Array | - | Regex URL filters (skip matching) |
| waitForSelector | String | - | Wait for CSS selector before extraction (scrape only) |
| proxyConfiguration | Object | - | Apify proxy settings |
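The table does not say how `includePatterns` and `excludePatterns` interact, so it can help to test patterns locally before starting a run. The sketch below shows one plausible reading. It assumes patterns match anywhere in the URL and that an exclude match takes precedence over an include match; neither assumption is confirmed by the actor's documentation, and `url_allowed` is a hypothetical name.

```python
import re


def url_allowed(url, include_patterns=None, exclude_patterns=None):
    """Return True if a URL would pass the include/exclude regex filters.

    Assumptions (not confirmed by the actor): patterns are unanchored
    (re.search), and an exclude match wins over an include match.
    """
    if exclude_patterns and any(re.search(p, url) for p in exclude_patterns):
        return False
    if include_patterns:
        return any(re.search(p, url) for p in include_patterns)
    # With no include patterns, everything not excluded is allowed.
    return True


candidates = [
    "https://docs.example.com/guide/intro",
    "https://docs.example.com/guide/archive/old",
    "https://docs.example.com/blog/news",
]
kept = [
    u
    for u in candidates
    if url_allowed(u, include_patterns=[r"/guide/"], exclude_patterns=[r"/archive/"])
]
print(kept)
```

If the actor anchors patterns or evaluates them in a different order, the results of a real run may differ, so treat this only as a quick sanity check on the regular expressions themselves.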
Output
{
  "url": "https://medium.com/",
  "title": "Medium: Read and write stories.",
  "description": null,
  "markdown": "## Human stories & ideas\n\nA place to read, write, and deepen your understanding...",
  "html": null,
  "metadata": {
    "title": "Medium: Read and write stories.",
    "language": "en",
    "favicon": "https://miro.medium.com/...",
    "canonical": "https://medium.com/",
    "openGraph": null,
    "twitter": null
  },
  "duration": 5725,
  "scrapedAt": "2026-02-24T03:36:28.990Z",
  "success": true,
  "error": null
}
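Each record carries `success` and `error` fields, so a consumer will usually want to separate usable pages from failures before passing text downstream. Here is a minimal sketch of that step. The first record is abbreviated from the sample above; the second is a hypothetical failed record whose shape and error message are assumptions, and the helper names are illustrative.

```python
def split_results(items):
    """Partition records into (succeeded, failed) using the success flag."""
    ok = [item for item in items if item.get("success")]
    failed = [item for item in items if not item.get("success")]
    return ok, failed


def to_document(item):
    """Reduce a record to the fields a downstream index typically needs."""
    meta = item.get("metadata") or {}
    return {
        # Prefer the canonical URL when the page declares one.
        "source": meta.get("canonical") or item["url"],
        "title": item.get("title") or meta.get("title") or "",
        "text": item.get("markdown") or "",
    }


items = [
    {
        "url": "https://medium.com/",
        "title": "Medium: Read and write stories.",
        "markdown": "## Human stories & ideas\n\nA place to read, write...",
        "metadata": {"canonical": "https://medium.com/", "language": "en"},
        "success": True,
        "error": None,
    },
    {
        # Hypothetical failure: the real error text and shape may differ.
        "url": "https://example.com/missing",
        "title": None,
        "markdown": None,
        "metadata": None,
        "success": False,
        "error": "hypothetical timeout message",
    },
]

ok, failed = split_results(items)
documents = [to_document(item) for item in ok]
print(len(documents), "documents,", len(failed), "failures")
```

Failed URLs can then be collected into a new `urls` list and resubmitted in a follow-up run.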
Use Cases
- RAG pipelines - Feed clean markdown into LLM knowledge bases
- Content monitoring - Track changes across a set of pages
- Research - Bulk extract articles, documentation, or product pages
- Site migration - Crawl and export an entire site as markdown
- Data extraction - Scrape structured content from specific CSS selectors
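For the RAG use case, the `markdown` field usually needs to be split into chunks before embedding. The sketch below shows one simple approach, splitting at markdown headings and then hard-wrapping oversized sections. It is an illustration rather than anything the actor does itself; `chunk_by_heading` and the `max_chars` limit are assumptions.

```python
import re


def chunk_by_heading(markdown, max_chars=1000):
    """Split markdown into chunks at heading lines, capping each chunk's length.

    Sections longer than max_chars are cut into fixed-size pieces, which may
    break mid-sentence; a production pipeline would likely split more carefully.
    """
    # Zero-width split just before any line starting with 1-6 '#' and a space.
    sections = re.split(r"^(?=#{1,6} )", markdown, flags=re.MULTILINE)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks


sample = "## Human stories & ideas\n\nA place to read.\n\n## Second\n\nMore text."
chunks = chunk_by_heading(sample)
for chunk in chunks:
    print(repr(chunk))
```

Keeping each heading with its section gives every chunk some local context, which tends to help retrieval more than fixed-size windows alone.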