AI RAG Feeder V2
Pricing
$1.00 / 1,000 pages
Turn any website into AI-ready Markdown. Scrapes entire domains, removes ads/clutter, and formats text specifically for RAG pipelines and LLM training data.
Rating: 0.0 (0)
Developer: Mickey Moore (Maintained by Community)
Actor stats: 0 bookmarked, 4 total users, 2 monthly active users, last modified a month ago
AI RAG Feeder V2 is a specialized scraper designed to feed data into LLM (Large Language Model) and RAG (Retrieval-Augmented Generation) pipelines. It navigates websites and converts the HTML content into clean, token-efficient Markdown.
✨ Features
- Clean Markdown Extraction: Automatically removes ads, navbars, and footers to save tokens.
- Recursive Crawling: Can follow links to scrape entire documentation sites.
- Smart Formatting: Preserves headers, code blocks, and tables for better embedding quality.
- Proxy Support: Built-in rotation to avoid IP blocking.
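To make the clean-extraction idea concrete, here is a minimal sketch using only Python's standard library. It is not the actor's actual implementation: the set of tags treated as clutter, the `MarkdownExtractor` class, and the `html_to_markdown` helper are all assumptions made for illustration, and it only handles headings and paragraphs.

```python
from html.parser import HTMLParser

# Tags whose content is treated as clutter in this sketch (an assumption,
# not the actor's real rule set).
SKIP_TAGS = {"nav", "footer", "aside", "script", "style"}
HEADING_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### "}
BLOCK_TAGS = set(HEADING_PREFIX) | {"p"}


class MarkdownExtractor(HTMLParser):
    """Hypothetical helper: keeps headings and paragraphs, drops clutter."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a clutter element
        self.blocks = []      # finished Markdown blocks
        self.current = None   # text pieces of the block being built
        self.prefix = ""      # Markdown prefix for the current block

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.current = []
            self.prefix = HEADING_PREFIX.get(tag, "")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS and self.current is not None:
            # Collapse runs of whitespace so the output stays token-efficient.
            text = " ".join("".join(self.current).split())
            if text:
                self.blocks.append(self.prefix + text)
            self.current = None

    def handle_data(self, data):
        if self.skip_depth == 0 and self.current is not None:
            self.current.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


page = "<nav><p>Home</p></nav><h1>Docs</h1><p>Hello   world</p><footer><p>Legal</p></footer>"
print(html_to_markdown(page))
```

A real converter would also need to handle code blocks, tables, and lists, which the feature list mentions but this sketch leaves out.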
🚀 How to use
- Start URLs: Enter the list of URLs you want to scrape.
- Max Depth: Set how deep the crawler should go (e.g., `1` for direct links, `0` for just the page).
- Run: The actor will output a JSON dataset ready for vector databases.
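The Max Depth setting can be pictured as a depth-limited breadth-first crawl. The sketch below runs over an in-memory link graph rather than the network; the `crawl_order` function and the `links` mapping are hypothetical and only illustrate how depth `0` and depth `1` would differ, not how the actor is implemented.

```python
from collections import deque


def crawl_order(start_urls, links, max_depth):
    """Return (url, depth) pairs in breadth-first order, up to max_depth.

    `links` maps each URL to the URLs it links to (a stand-in for
    fetching a page and extracting its links).
    """
    start_urls = list(dict.fromkeys(start_urls))  # dedupe, keep order
    seen = set(start_urls)
    queue = deque((url, 0) for url in start_urls)
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited


# Hypothetical site structure used only for this example.
links = {
    "https://example.com/docs": ["https://example.com/docs/a", "https://example.com/docs/b"],
    "https://example.com/docs/a": ["https://example.com/docs/a/deep"],
}
print(crawl_order(["https://example.com/docs"], links, max_depth=0))
print(crawl_order(["https://example.com/docs"], links, max_depth=1))
```

With `max_depth=0` only the start page is visited; with `max_depth=1` its direct links are included but `.../a/deep` is not.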
📦 Output
The results are stored in the default Apify dataset. Each item contains:
{
  "url": "https://example.com/docs",
  "title": "Documentation",
  "markdown": "# Documentation\n\nThis is the clean text...",
  "metadata": { "depth": 1 }
}
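Before loading items into a vector database, one common step is splitting each item's Markdown into heading-level chunks. The following is a minimal sketch assuming the item shape shown above; the `chunk_item` function and the chunk field names are hypothetical, and the `## Setup` section in the sample is added purely for illustration.

```python
import json
import re


def chunk_item(item: dict) -> list[dict]:
    """Split one dataset item into chunks, one per Markdown heading section."""
    # Zero-width split right before every line that starts a heading.
    sections = re.split(r"(?m)^(?=#{1,6} )", item["markdown"])
    return [
        {"url": item["url"], "title": item["title"], "text": section.strip()}
        for section in sections
        if section.strip()
    ]


# Sample item in the documented shape; the "## Setup" part is made up.
raw = r'''{
  "url": "https://example.com/docs",
  "title": "Documentation",
  "markdown": "# Documentation\n\nThis is the clean text...\n\n## Setup\n\nInstall it.",
  "metadata": { "depth": 1 }
}'''

for chunk in chunk_item(json.loads(raw)):
    print(chunk["url"], "|", chunk["text"][:30])
```

Each chunk keeps the source `url` and `title`, so retrieved passages can be traced back to the page they came from.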