1from apify_client import ApifyClient
2
3
4
def main() -> None:
    """Run the Apify LLM-ready documentation scraper and print every scraped item.

    Starts the actor with a fixed crawl configuration, waits for the run to
    finish, then streams the resulting dataset items to stdout.

    Raises:
        RuntimeError: if the actor run did not start/finish successfully
            (the client returns ``None`` in that case).
    """
    # Replace the placeholder with a real API token from the Apify console.
    client = ApifyClient("<YOUR_API_TOKEN>")

    # Crawl configuration passed to the actor. Empty includeGlobs means
    # "no extra include restriction"; excludeGlobs prunes non-doc sections.
    run_input = {
        "startUrl": "https://crawlee.dev/docs/introduction",
        "maxDepth": 10,
        "maxPages": 100,
        "includeGlobs": [],
        "excludeGlobs": [
            "**/blog/**",
            "**/changelog/**",
            "**/api-reference/**",
        ],
        # CSS selectors stripped from each page before extraction (chrome/nav noise).
        "excludeElements": "nav, footer, .sidebar, script, style, .ads, header, .navigation, .menu, .toc, .breadcrumb, .edit-page, .feedback, .newsletter, aside",
        # CSS selectors that contain the actual documentation content.
        "contentSelector": "main, article, .content, .documentation",
        "mergeOutput": True,
        "includeMetadata": True,
    }

    # Start the actor and block until the run finishes.
    run = client.actor("direct_duty/llm-ready-documentation-scraper").call(run_input=run_input)
    # call() may return None (e.g. when waiting for the run fails) — fail
    # loudly instead of crashing on the subscript below.
    if run is None:
        raise RuntimeError("Actor run did not finish successfully")

    dataset_id = run["defaultDatasetId"]
    print(f"💾 Check your data here: https://console.apify.com/storage/datasets/{dataset_id}")
    # Stream items lazily rather than loading the whole dataset into memory.
    for item in client.dataset(dataset_id).iterate_items():
        print(item)


if __name__ == "__main__":
    main()
31
32