Universal Knowledge Base Scraper (RAG Ready)
Pricing
$49.00/month + usage
Turn any Help Center into LLM-ready Markdown. Supports Zendesk, Intercom, Docusaurus, and generic sites. Perfect for RAG and AI Agents.
Developer: Actums
🧠 Universal Knowledge Base Scraper (RAG Ready)
Feed your AI Agents with clean, structured Markdown. Stop feeding them HTML garbage.
What is Universal RAG Scraper?
Universal RAG Scraper is an "ETL-in-a-Box" for AI Developers. It turns messy Help Centers (Zendesk, Intercom, Docusaurus, Notion) into pure, train-ready Markdown (.md) files.
If you are building RAG Pipelines (Retrieval-Augmented Generation) or AI Agents, you know that HTML noise (navbars, footers, cookie banners) ruins your vector embeddings. This Actor solves that problem instantly.
Why not just use a generic scraper?
Generic scrapers give you the page. We give you the content.
- Auto-Detect: We identify the platform (e.g., Zendesk) and apply surgical clean-up rules.
- Markdown Native: We don't just "strip tags"; we convert tables, lists, and code blocks into perfect Markdown.
- Metadata Rich: We extract the Title, URL, and Last Updated Date for your Vector DB.
⚡ Enterprise-Grade Features
Built for scale and reliability:
- 🛡️ Zero-Config Proxies: Scrape protected Help Centers without being blocked by 403 responses. Request rotation is built in.
- ⏰ Auto-Sync Scheduling: Set it to run every Friday night. Keep your RAG Knowledge Base in sync with your product docs automatically.
- 💾 Infinite Storage: Scrape 10,000 pages or 10 million. All data is stored, indexed, and ready for export (JSON, CSV, Excel).
- Native Integrations: Pipe the Markdown directly to Pinecone, LangChain, or Zapier. No glue code needed.
🎯 Supported Platforms (Auto-Detected)
| Platform | Capability |
|---|---|
| Zendesk | Full support. Strips "Related Articles" & sidebars. |
| Intercom | Full support. Handles dynamic loading. |
| Docusaurus | Perfect for V2/V3 docs. Preserves code block languages. |
| Notion | Scrapes public Notion Knowledge Bases. |
| Generic | Smart Fallback: If we don't recognize the platform, we use advanced readability algorithms to extract the main content. |
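To make the auto-detection idea in the table concrete, here is a minimal Python sketch of how a platform could be guessed from a URL and the raw HTML. It is not the Actor's actual rule set, which the listing does not publish. The function name `guess_platform` is hypothetical, and the markers used (an `/hc/` path for Zendesk, a Docusaurus `generator` meta tag, `intercom` and `notion` host names) are assumed heuristics that real sites may not follow.

```python
import re
from urllib.parse import urlparse


def guess_platform(url: str, html: str) -> str:
    """Guess the help-center platform (illustrative heuristics, not the Actor's rules)."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = parsed.path.lower()

    # Assumption: Docusaurus pages carry a <meta name="generator" content="Docusaurus ..."> tag.
    if re.search(r'<meta[^>]+name="generator"[^>]+content="Docusaurus', html, re.IGNORECASE):
        return "Docusaurus"
    # Assumption: Zendesk help centers live on *.zendesk.com or under an /hc/ path.
    if host.endswith(".zendesk.com") or path.startswith("/hc/"):
        return "Zendesk"
    # Assumption: Intercom help centers mention "intercom" in the host or page source.
    if "intercom" in host or "intercom" in html.lower():
        return "Intercom"
    # Assumption: public Notion pages are served from notion.site or notion.so.
    if host.endswith(".notion.site") or host.endswith("notion.so"):
        return "Notion"
    # Anything unrecognized falls through to the generic extractor.
    return "Generic"


print(guess_platform("https://support.zoom.us/hc/en-us", "<html></html>"))
```

The order of the checks matters in a sketch like this: the most specific marker is tested first, so a page is only labelled "Generic" when nothing else matches.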
How to scrape a Knowledge Base in 3 steps
- Paste the URL: Go to the input tab and enter the URL of the Help Center home page (e.g., https://support.zoom.us/hc/en-us).
- Set Depth: Choose how many levels of links to follow (default: 2 levels deep).
- Run: Click "Start". In minutes, you can download a JSON file containing all articles in Markdown.
💰 Pricing & Usage
This is a Rental Actor.
- Free Trial: You can test the scraper for a limited time to verify the Markdown quality.
- Rental Plan: Access unlimited scale, high-frequency scheduling, and priority support.
Cost Estimation:
- Scraping a typical Help Center (500 pages) takes ~5-10 minutes.
- The output is "Vector Ready" - no post-processing costs.
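For a rough sense of run time on larger sites, the 500-page figure above can be extrapolated. The sketch below assumes run time scales linearly with page count, which the listing does not claim; real runs depend on site speed, crawl depth, and rate limiting. The helper name `estimate_minutes` is hypothetical.

```python
def estimate_minutes(
    pages: int,
    ref_pages: int = 500,
    ref_minutes: tuple[float, float] = (5.0, 10.0),
) -> tuple[float, float]:
    """Scale the quoted 500-page / 5-10 minute figure linearly (an assumption)."""
    scale = pages / ref_pages
    return (ref_minutes[0] * scale, ref_minutes[1] * scale)


low, high = estimate_minutes(10_000)
print(f"10,000 pages: roughly {low:.0f}-{high:.0f} minutes")
# prints "10,000 pages: roughly 100-200 minutes"
```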
📤 Input & Output
Input Configuration
Simple, developer-friendly input:
{
  "startUrls": [
    { "url": "https://docs.apify.com" }
  ],
  "maxDepth": 10,
  "outputFormat": "markdown"
}
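If you generate the input programmatically, a small helper can assemble the same three fields. This is a sketch only: `build_run_input` is a hypothetical name, the default depth of 2 comes from the steps earlier on this page, and the validation rules are assumptions rather than documented constraints of the Actor.

```python
import json
from urllib.parse import urlparse


def build_run_input(start_urls: list[str], max_depth: int = 2) -> dict:
    """Build an input object with the startUrls, maxDepth, and outputFormat fields."""
    # Assumed validation: the listing does not say what the Actor rejects.
    if max_depth < 0:
        raise ValueError("max_depth must be non-negative")
    for url in start_urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return {
        "startUrls": [{"url": url} for url in start_urls],
        "maxDepth": max_depth,
        "outputFormat": "markdown",
    }


run_input = build_run_input(["https://support.zoom.us/hc/en-us"])
print(json.dumps(run_input, indent=2))
```

The printed JSON can be pasted into the input tab as-is.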
Output (JSON/Dataset)
Each item in the dataset is one article:
{
  "url": "https://docs.apify.com/academy/web-scraping",
  "title": "Web Scraping Academy",
  "platform": "Docusaurus",
  "scrapedAt": "2023-10-27T10:00:00Z",
  "markdown": "# Web Scraping Academy\n\nLearn how to scrape..."
}
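Once the dataset is downloaded, each item can be written out as its own .md file. The sketch below assumes every item has the five fields shown in the sample; the helper names (`slugify`, `render_article`, `export_items`) and the front-matter layout are illustrative choices, not something the Actor produces.

```python
import json
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated file name stem."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def render_article(item: dict) -> str:
    """Prefix the article's Markdown with a small front-matter block of metadata."""
    header = "\n".join(
        f"{key}: {json.dumps(item.get(key, ''))}"
        for key in ("title", "url", "platform", "scrapedAt")
    )
    return f"---\n{header}\n---\n\n{item['markdown']}\n"


def export_items(items: list[dict], out_dir: Path) -> list[Path]:
    """Write one .md file per dataset item and return the paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for item in items:
        # Note: two articles with the same title would overwrite each other here.
        path = out_dir / f"{slugify(item['title'])}.md"
        path.write_text(render_article(item), encoding="utf-8")
        written.append(path)
    return written
```

Calling `export_items(json.loads(Path("dataset.json").read_text()), Path("kb"))` would write the files into a `kb` directory, where `dataset.json` is a placeholder name for the downloaded export.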
FAQ
Can I scrape a custom-built Help Center?
Yes. The Actor uses a "Smart Fallback" (Readability algorithm). If it doesn't detect Zendesk/Intercom, it will still scan the page, identify the visual "main content" area, and extract it.
Does this handle dynamic JavaScript sites?
Yes. We use Playwright (headless browser) under the hood. We render the full page, execute JavaScript, and then scrape. This works even on React/Vue/Angular apps.
How do I feed this into my LLM?
- Run the Actor.
- Download the JSON output.
- Use the `markdown` field as the `content` in your LLM prompt or embedding request.
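Long articles usually need to be split before embedding. As a minimal sketch of that step, the function below splits one dataset item at Markdown headings and attaches the `url` and `title` to each chunk so retrieved passages can be traced back to their source. The name `chunk_markdown`, the 1500-character limit, and the chunk layout are assumptions for illustration; the Actor does not prescribe a chunking strategy.

```python
import re


def chunk_markdown(item: dict, max_chars: int = 1500) -> list[dict]:
    """Split one dataset item into heading-aligned chunks that carry source metadata."""
    # Split just before every heading line so each section stays together.
    # Caveat: a "# comment" line inside a code block is also treated as a heading.
    sections = re.split(r"(?m)^(?=#{1,6} )", item["markdown"])
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # Crude fallback: cut oversized sections into fixed-size pieces.
        for start in range(0, len(section), max_chars):
            chunks.append(
                {
                    "content": section[start : start + max_chars],
                    "metadata": {"url": item["url"], "title": item["title"]},
                }
            )
    return chunks
```

Each chunk's `content` is what you would send to an embedding model, with `metadata` stored alongside the vector.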
Support & Feedback
Found a site we can't scrape? Missing a platform?
- Report a Bug: Use the "Issues" tab.
- Request a Feature: We add new platforms (e.g., GitBook, Read the Docs) based on user votes!