Docusaurus Docs Chunker
Created by
Stas Persiianenko
Actor
Docs-to-RAG Crawler
Chunk Docusaurus documentation pages into markdown records for embeddings, including headings, URLs, token estimates, and content.
Docs-to-RAG Crawler (automation-lab/docs-rag-crawler)
Input
Documentation URL (required): https://docusaurus.io/docs
Max pages: 80
Include code blocks: true
Chunk mode: heading
Max chunk size (words): 250
Exclude URL patterns: */blog/* (+1 more)
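As a rough illustration of how these settings might interact, the sketch below mirrors the form values in a dictionary, filters URLs with glob-style exclude patterns, and splits markdown at headings with a word cap. The key names (`startUrl`, `maxChunkWords`, and so on) and both helper functions are hypothetical; the Actor's real input schema and chunking logic are not shown on this page and may differ.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical input mirroring the form values above; key names are assumed,
# not taken from the Actor's actual input schema.
run_input = {
    "startUrl": "https://docusaurus.io/docs",
    "maxPages": 80,
    "includeCodeBlocks": True,
    "chunkMode": "heading",
    "maxChunkWords": 250,
    "excludePatterns": ["*/blog/*"],
}


def is_excluded(url, patterns):
    """Return True if the URL matches any glob-style exclude pattern."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)


def chunk_by_heading(markdown, max_words):
    """Split markdown at heading lines, then cap each chunk at max_words.

    Simplification: any line starting with 1-6 '#' characters and a space is
    treated as a heading, including such lines inside fenced code blocks.
    """
    sections = re.split(r"^(?=#{1,6} )", markdown, flags=re.MULTILINE)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        words = section.split()
        if len(words) <= max_words:
            # Small enough: keep the section's original line structure.
            chunks.append(section)
        else:
            # Too large: fall back to fixed-size word windows.
            for start in range(0, len(words), max_words):
                chunks.append(" ".join(words[start:start + max_words]))
    return chunks
```

Splitting on headings first keeps each chunk topically coherent; the word-window fallback only applies to sections that exceed the cap, at the cost of flattening their line breaks.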
Output fields
Chunk ID
Title
URL
Words
Characters
Est. tokens
Breadcrumb
Markdown
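The fields above could be modeled as a single record type. The sketch below is one possible shape, not the Actor's actual output schema: the attribute names, the breadcrumb being a list, the hash-based chunk ID, and the "about 4 characters per token" estimate are all assumptions made for illustration.

```python
import hashlib
import math
from dataclasses import dataclass, field


@dataclass
class ChunkRecord:
    """Hypothetical record with one attribute per listed output field."""

    title: str
    url: str
    breadcrumb: list
    markdown: str
    # Derived fields, computed from the values above.
    chunk_id: str = field(init=False)
    words: int = field(init=False)
    characters: int = field(init=False)
    est_tokens: int = field(init=False)

    def __post_init__(self):
        self.words = len(self.markdown.split())
        self.characters = len(self.markdown)
        # Assumed heuristic: roughly 4 characters per token.
        self.est_tokens = math.ceil(self.characters / 4)
        # Assumed ID scheme: short hash of URL plus content.
        payload = f"{self.url}\n{self.markdown}".encode("utf-8")
        self.chunk_id = hashlib.sha1(payload).hexdigest()[:12]


record = ChunkRecord(
    title="Installation",
    url="https://docusaurus.io/docs/installation",
    breadcrumb=["Docs", "Getting Started"],
    markdown="# Installation\nRun the installer.",
)
```

Deriving the counts from the markdown keeps them consistent with the content; a real tokenizer would give more accurate token estimates than the character-based heuristic.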
01. Sign up on Apify
Create your Apify account to access the Docs-to-RAG Crawler.
02. Start the run
The Actor starts running automatically with the configured input.
03. Receive the output
Monitor the progress in real time. You will be notified as soon as your dataset is complete and ready for review.
04. Integrate into your workflow
The final output is delivered in JSON, CSV, or Excel format, ready to be plugged into your workflow.
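Once a JSON export is downloaded, plugging it into an embedding pipeline might look like the sketch below. It assumes the export is a JSON array of objects and that the keys are named `chunkId`, `title`, `url`, `estTokens`, and `markdown`; those names are hypothetical and should be checked against a real export.

```python
import json


def load_chunks(json_text, max_tokens):
    """Parse an exported JSON array and keep chunks within a token budget.

    Returns embedding-ready items: an ID, the text to embed, and metadata.
    Key names in the export are assumed, not confirmed.
    """
    records = json.loads(json_text)
    return [
        {
            "id": r["chunkId"],
            "text": r["markdown"],
            "metadata": {"title": r["title"], "url": r["url"]},
        }
        for r in records
        if r["estTokens"] <= max_tokens
    ]


# Hand-written sample standing in for a downloaded export.
sample_export = json.dumps([
    {"chunkId": "a1", "title": "Intro", "url": "https://docusaurus.io/docs",
     "estTokens": 120, "markdown": "# Intro\nShort page."},
    {"chunkId": "b2", "title": "Big page", "url": "https://docusaurus.io/docs/big",
     "estTokens": 900, "markdown": "# Big page\nLong content."},
])

items = load_chunks(sample_export, max_tokens=512)
```

Filtering on the token estimate before embedding avoids sending chunks that exceed a model's input limit; the 512 budget here is an arbitrary example value.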
