AI RAG Feeder V2

Pricing

$1.00 / 1,000 pages

Turn any website into AI-ready Markdown. Scrapes entire domains, removes ads/clutter, and formats text specifically for RAG pipelines and LLM training data.

Rating

0.0

(0)

Developer

Mickey Moore

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 3 days ago

🤖 AI RAG Feeder (HTML to Markdown)

Turn any website into clean, AI-ready training data.

This Actor is designed specifically for RAG (Retrieval-Augmented Generation) pipelines. It scrapes websites and converts their messy HTML into clean, structured Markdown, a format that LLMs such as GPT-4, Claude, and Llama 3 handle well.

🚀 Features

  • Smart Cleaning: Removes navigation bars, footers, ads, and scripts.
  • Markdown Output: Delivers clean Markdown text, ready to be chunked and embedded for vector databases.
  • Fast Crawling: Built on the Crawlee engine for high performance.
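
To make the Smart Cleaning idea concrete, here is a minimal sketch of tag-based boilerplate removal using only Python's standard library. It is an illustration, not the Actor's actual implementation: the `SKIPPED_TAGS` list, the `BoilerplateStripper` class, and the `strip_boilerplate` function are all hypothetical names, and the sketch returns plain text rather than Markdown.

```python
from html.parser import HTMLParser

# Elements whose entire contents are dropped (an assumed list, for illustration).
SKIPPED_TAGS = {"nav", "footer", "script", "style", "aside"}


class BoilerplateStripper(HTMLParser):
    """Collects text that sits outside of any SKIPPED_TAGS element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.parts.append(text)


def strip_boilerplate(html: str) -> str:
    """Return the visible text of `html`, one text node per line."""
    parser = BoilerplateStripper()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


page = (
    "<html><body><nav>Home</nav><h1>Title</h1><p>Body text</p>"
    "<script>var x = 1;</script><footer>Copyright</footer></body></html>"
)
print(strip_boilerplate(page))
```

A real cleaner would also need heuristics for ads and cookie banners, which usually cannot be identified by tag name alone.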

📦 Output Data

The actor stores results in a dataset with the following fields:

  • url: The source URL.
  • title: The page title.
  • markdown: The full cleaned page content, formatted as Markdown.
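
As a rough sketch of how these fields might be consumed, the snippet below splits one dataset item into heading-delimited chunks that keep their source `url` and `title`. The `chunk_item` function and its `max_chars` parameter are assumptions made for illustration; they are not part of the Actor.

```python
import re


def chunk_item(item: dict, max_chars: int = 1000) -> list[dict]:
    """Split one dataset item into chunks, breaking at Markdown headings."""
    # Zero-width split just before each heading line, so the heading stays
    # attached to the section it introduces.
    sections = re.split(r"(?m)^(?=#{1,6} )", item["markdown"])
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # Sections longer than max_chars fall back to fixed-size slices.
        for start in range(0, len(section), max_chars):
            chunks.append(
                {
                    "url": item["url"],
                    "title": item["title"],
                    "text": section[start:start + max_chars],
                }
            )
    return chunks


# Example item shaped like the dataset fields listed above (made-up content).
item = {
    "url": "https://example.com/docs",
    "title": "Docs",
    "markdown": "# Intro\nHello\n\n## Details\nMore text",
}
chunks = chunk_item(item)
```

Carrying `url` and `title` on every chunk lets a retrieval step cite the source page. Note that this simple split also treats a `#` line inside a fenced code block as a heading.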

💡 Use Cases

  • Custom GPTs: Ground a chatbot in your company documentation.
  • RAG Pipelines: Feed clean data into Pinecone/Weaviate.
  • Content Migration: Convert legacy CMS content to Markdown.
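
For the content migration case, one possible approach is to write each dataset item to its own `.md` file. The sketch below assumes items with the `url`, `title`, and `markdown` fields described above; the helper names (`slug_for`, `render_page`, `write_pages`) and the front-matter header are hypothetical choices, not something the Actor produces.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def slug_for(url: str) -> str:
    """Derive a file-system-safe name from a URL's host and path."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}"
    slug = re.sub(r"[^a-zA-Z0-9]+", "-", raw).strip("-").lower()
    return slug or "index"


def render_page(item: dict) -> str:
    """Prefix the Markdown body with a small front-matter header."""
    # Front matter is an assumption here; drop it if your target system
    # does not read it.
    title = item["title"].replace('"', '\\"')
    return f'---\ntitle: "{title}"\nsource: {item["url"]}\n---\n\n{item["markdown"]}\n'


def write_pages(items: list[dict], out_dir: str) -> list[Path]:
    """Write one .md file per item and return the created paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for item in items:
        path = out / f"{slug_for(item['url'])}.md"
        path.write_text(render_page(item), encoding="utf-8")
        paths.append(path)
    return paths
```

Two URLs that differ only in punctuation or query string map to the same slug, so a real migration would need a collision check before writing.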