Convert Documentation Pages to Markdown for RAG

Created by nezha

Actor: Website Content Extractor for RAG: Markdown, HTML, Text

Crawl a documentation site and extract clean Markdown, text, and metadata for RAG pipelines, vector databases, AI assistants, and internal search.

Website Content Extractor for RAG: Markdown, HTML, Text (nezha/website-content-crawler)

Input

Website, Docs, or Help Center URLs (required)

url: https://docs.apify.com/academy

Max Pages to Extract: 5

Page Discovery Method: auto

Link Depth: 1

Target Scope Only: true

Main Content Format: markdown
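
The input above could also be submitted programmatically. The following is a minimal sketch, not the Actor's documented interface: it builds (without sending) a request to Apify's run-Actor endpoint using only the Python standard library. The input keys (`startUrls`, `maxPages`, `discoveryMethod`, `linkDepth`, `targetScopeOnly`, `contentFormat`) are hypothetical names that mirror the form labels; the Actor's real input schema may use different keys, so check it before use.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# API form of the Actor name nezha/website-content-crawler ("/" becomes "~").
ACTOR_ID = "nezha~website-content-crawler"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts an Actor run."""
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{ACTOR_ID}/runs",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# ASSUMPTION: these keys are illustrative only and mirror the form labels;
# the Actor's actual input schema may name them differently.
example_input = {
    "startUrls": [{"url": "https://docs.apify.com/academy"}],
    "maxPages": 5,
    "discoveryMethod": "auto",
    "linkDepth": 1,
    "targetScopeOnly": True,
    "contentFormat": "markdown",
}

request = build_run_request("<YOUR_APIFY_TOKEN>", example_input)
# To actually start the run, send it: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the sketch free of network calls and makes the payload easy to inspect before spending any crawl budget.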

Output fields

URL

Title

Description

Content format

Word count

Language

Canonical URL

Depth

HTTP status code

Crawled at
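
For a RAG pipeline, each output record typically needs to be split into chunks that carry their source metadata. The sketch below shows one simple way to do that: it skips pages that did not return HTTP 200 and packs paragraphs into chunks of bounded word count. The JSON keys (`url`, `canonicalUrl`, `title`, `language`, `httpStatusCode`) are assumed spellings of the fields listed above, and `markdown` is an assumed name for the field holding the extracted content; the real dataset may use different keys.

```python
from typing import Iterable


def split_markdown(text: str, max_words: int) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_words.

    A single paragraph longer than max_words is kept whole as its own chunk.
    """
    chunks, current, count = [], [], 0
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        n = len(para.split())
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def to_chunks(items: Iterable[dict], max_words: int = 200) -> list[dict]:
    """Turn crawled records into chunk dicts ready for embedding."""
    out = []
    for item in items:
        # ASSUMPTION: key names are hypothetical spellings of the output fields.
        if item.get("httpStatusCode") != 200:
            continue  # skip error pages and redirects
        for text in split_markdown(item.get("markdown", ""), max_words):
            out.append({
                "text": text,
                "url": item.get("canonicalUrl") or item.get("url"),
                "title": item.get("title"),
                "language": item.get("language"),
            })
    return out


# Made-up sample records, for illustration only.
sample_items = [
    {
        "url": "https://example.com/docs/a",
        "canonicalUrl": "https://example.com/docs/a",
        "title": "Page A",
        "language": "en",
        "httpStatusCode": 200,
        "markdown": "# Title\n\nalpha beta gamma\n\ndelta epsilon",
    },
    {"url": "https://example.com/missing", "httpStatusCode": 404, "markdown": "Not found"},
]
sample_chunks = to_chunks(sample_items, max_words=4)
```

Splitting on paragraph boundaries rather than a fixed character count keeps headings and list items intact, which tends to give cleaner retrieval results; a production pipeline might split on Markdown headings instead.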

How it works

01 Sign up on Apify

Create an Apify account to access the Website Content Extractor for RAG: Markdown, HTML, Text.

02 Start the run

The Actor starts automatically with the input you provide.

03 Receive the output

Monitor progress in real time. You are notified as soon as your dataset is complete and ready for review.

04 Integrate into your workflow

The final output is delivered in JSON, CSV, or Excel format, ready to plug into your workflow.
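
The export formats mentioned above correspond to the `format` query parameter of Apify's dataset items endpoint. As a rough sketch, the helper below (a hypothetical function, not part of any Apify library) builds the download URL for a finished run's dataset; the dataset ID is a placeholder you would take from the run details, and a non-public dataset would also need an API token.

```python
from urllib.parse import urlencode

# Formats named on this page, mapped to the API's format values.
EXPORT_FORMATS = {"json": "json", "csv": "csv", "excel": "xlsx"}


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Return the URL that downloads a dataset's items in the given format."""
    try:
        api_format = EXPORT_FORMATS[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    query = urlencode({"format": api_format})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"


# "abc123" is a placeholder dataset ID, not a real one.
csv_url = dataset_items_url("abc123", "csv")
excel_url = dataset_items_url("abc123", "excel")
```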

Integrate the Actor directly into your workflow

Choose from 100+ integration options, or integrate via the API.

Webhook

n8n

Make

Zapier

Airbyte

Keboola

IFTTT

HubSpot

Google Drive

Gmail

Apify MCP

GitHub

Slack

LangChain

LlamaIndex

Flowise

Pinecone

OpenAI

Mastra

Clay