Docs & Help Center to RAG JSONL
Pricing
$1.00 / 1,000 docs pages processed
Paste a docs or help center URL and get clean Markdown, breadcrumbs, page records, and JSONL chunks for RAG.
Developer
Orbiscribe Labs
Maintained by Community
Use this Actor when you need a chatbot-ready snapshot of public documentation, a help center, or a developer docs site.
It crawls from known docs URLs, extracts readable article content, keeps breadcrumb-style context, detects common documentation platforms, and writes Markdown plus JSONL chunks.
Why use this instead of a generic crawler?
Generic website crawlers are useful when you need every reachable page. This Actor is for docs and help center ingestion, where the useful output is clean article text, stable chunks, breadcrumbs, and a quick page inventory.
- paste a docs or help center URL
- keep the crawl focused with includeUrlPatterns
- start with a small live default run
- export DOCS_RAG_CHUNKS_JSONL directly to a vector pipeline
- pay a low flat page price instead of guessing compute-unit cost
What you get
- Dataset rows for docs pages and chunks.
- Clean Markdown, main text, headings, links, canonical URL, content hash, platform hint, and inferred breadcrumbs.
- Key-value outputs: RAG_CHUNKS_JSONL, DOCS_RAG_CHUNKS_JSONL, DOC_PAGES, BREADCRUMB_INDEX, DOCS_URL_INVENTORY, DOCS_MARKDOWN_BUNDLE, BUYER_BRIEF, and RUN_SUMMARY.
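Once a JSONL output such as DOCS_RAG_CHUNKS_JSONL has been downloaded as text, it can be read one record per line. The sketch below uses only the Python standard library. The chunk schema is not documented here, so the field names in the sample (url, breadcrumbs, text) are assumptions for illustration, not the Actor's confirmed output format.

```python
import json


def parse_jsonl(text):
    """Parse JSONL text into a list of records, skipping blank lines."""
    records = []
    # Split on "\n" only, so unusual Unicode line separators inside
    # JSON strings do not break a record in two.
    for line_no, line in enumerate(text.split("\n"), start=1):
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON") from exc
    return records


# Hypothetical sample; real chunk records may use different field names.
sample = "\n".join([
    json.dumps({"url": "https://example.com/docs/a",
                "breadcrumbs": ["Docs", "A"], "text": "First chunk."}),
    "",
    json.dumps({"url": "https://example.com/docs/b",
                "breadcrumbs": ["Docs", "B"], "text": "Second chunk."}),
])

chunks = parse_jsonl(sample)
```

Failing loudly on a malformed line, rather than skipping it, makes a truncated download easier to notice before the chunks reach a vector pipeline.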
Common workflows
- Build a support bot from public help center articles.
- Snapshot developer docs before loading them into a vector database.
- Export docs pages with breadcrumb metadata for citations.
- Schedule repeat docs crawls and compare artifacts downstream.
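For the repeat-crawl workflow, one possible downstream comparison is to key page records by URL and compare their content hashes between two runs. This is a minimal sketch, not a feature of the Actor. The field names url and contentHash are hypothetical and should be replaced with whatever the page records actually contain.

```python
def diff_snapshots(old_pages, new_pages, key="url", hash_field="contentHash"):
    """Compare two lists of page records and report added, removed, changed URLs.

    `key` and `hash_field` are assumed field names, not a documented schema.
    """
    old = {page[key]: page[hash_field] for page in old_pages}
    new = {page[key]: page[hash_field] for page in new_pages}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            url for url in new.keys() & old.keys() if new[url] != old[url]
        ),
    }


# Hypothetical records from two scheduled runs.
last_week = [
    {"url": "https://example.com/docs/a", "contentHash": "h1"},
    {"url": "https://example.com/docs/b", "contentHash": "h2"},
]
this_week = [
    {"url": "https://example.com/docs/a", "contentHash": "h1"},
    {"url": "https://example.com/docs/b", "contentHash": "h2-edited"},
    {"url": "https://example.com/docs/c", "contentHash": "h3"},
]

report = diff_snapshots(last_week, this_week)
```

Only the URLs under added and changed would need re-embedding, which keeps repeat ingestion cheap.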
Input
Provide one or more docsUrls. Use maxDepth and sameDomainOnly to control crawl breadth. Use platformHint when you know the docs platform and want that stored with the output.
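As a rough mental model of how the scope options interact, the sketch below filters candidate URLs. It is an assumption, not the Actor's actual matching logic: patterns are treated as plain substrings, and "same domain" is taken to mean the same hostname as the start URL. The function name url_in_scope is hypothetical.

```python
from urllib.parse import urlsplit


def url_in_scope(url, start_url, include_patterns=(), exclude_patterns=(),
                 same_domain_only=True):
    """Return True if `url` would be kept under the assumed scope rules."""
    # Assumed meaning of sameDomainOnly: hostname must match the start URL.
    if same_domain_only and urlsplit(url).hostname != urlsplit(start_url).hostname:
        return False
    # Assumed substring semantics: excludes win over includes.
    if any(pattern in url for pattern in exclude_patterns):
        return False
    # An empty include list is assumed to mean "no include restriction".
    if include_patterns and not any(pattern in url for pattern in include_patterns):
        return False
    return True


start = "https://docs.apify.com/academy/getting-started"
kept = url_in_scope("https://docs.apify.com/academy/web-scraping-for-beginners",
                    start, include_patterns=["/academy/"])
dropped = url_in_scope("https://docs.apify.com/platform",
                       start, include_patterns=["/academy/"])
```

If the Actor uses glob or regex patterns instead, the same structure applies with a different matching call.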
The default input runs a tiny live Apify docs sample:
{
  "docsUrls": [
    { "url": "https://docs.apify.com/academy/getting-started" },
    { "url": "https://docs.apify.com/academy/web-scraping-for-beginners" },
    { "url": "https://docs.apify.com/academy/actor-marketing-playbook/actor-basics/actor-description" }
  ],
  "includeUrlPatterns": ["/academy/"],
  "excludeUrlPatterns": [],
  "maxPages": 5,
  "maxDepth": 1,
  "sameDomainOnly": true,
  "dryRun": false
}
Use dryRun: true when you want bundled demo records without crawling live pages or incurring custom pay-per-event charges.
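When runs are started from a script, the input can be assembled programmatically. The helper below is a sketch: build_run_input is a hypothetical name, and only the field names and default values come from the default input shown above. platformHint is left out because its accepted values are not listed here.

```python
import json


def build_run_input(docs_urls, include_patterns=(), exclude_patterns=(),
                    max_pages=5, max_depth=1, same_domain_only=True,
                    dry_run=False):
    """Build a run input dict that mirrors the default input's field names."""
    if not docs_urls:
        raise ValueError("at least one docs URL is required")
    return {
        "docsUrls": [{"url": url} for url in docs_urls],
        "includeUrlPatterns": list(include_patterns),
        "excludeUrlPatterns": list(exclude_patterns),
        "maxPages": max_pages,
        "maxDepth": max_depth,
        "sameDomainOnly": same_domain_only,
        "dryRun": dry_run,
    }


# A dry run first, so no live pages are crawled while wiring things up.
run_input = build_run_input(
    ["https://docs.apify.com/academy/getting-started"],
    include_patterns=["/academy/"],
    dry_run=True,
)
payload = json.dumps(run_input, indent=2)
```

The resulting dict or JSON string can then be passed as the run input through whichever Apify client or API call you already use.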
Pricing
Recommended monetization: Pay per Event at $0.001 per docs-rag-page.
That is $1 per 1,000 processed docs pages, plus normal Apify platform usage. When pay-per-event pricing is enabled, dry runs are uncharged and free-plan callers get the first 25 processed sources without this Actor's custom event charge. Users should still set Apify spending limits before large crawls.
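The custom event charge can be estimated before a large crawl. The sketch below assumes that one "processed source" corresponds to one processed page, and it covers only this Actor's event charge, not normal Apify platform usage. The function name estimate_event_charge is hypothetical.

```python
from decimal import Decimal

PRICE_PER_PAGE = Decimal("0.001")  # $0.001 per docs-rag-page event
FREE_PLAN_ALLOWANCE = 25           # first 25 processed sources for free-plan callers


def estimate_event_charge(pages, free_plan=False, dry_run=False):
    """Estimate the pay-per-event charge in USD, excluding platform usage."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if dry_run:
        # Dry runs are uncharged.
        return Decimal("0")
    # Assumption: the free-plan allowance is counted in pages.
    billable = pages - FREE_PLAN_ALLOWANCE if free_plan else pages
    return PRICE_PER_PAGE * max(billable, 0)


full_crawl = estimate_event_charge(1000)
free_plan_crawl = estimate_event_charge(125, free_plan=True)
```

Decimal is used instead of float so that per-page prices add up exactly. A matching Apify spending limit is still the safer guard for large crawls.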
Limits and compliance
Public docs only. This Actor does not bypass logins, paywalls, robots policies, or access controls. Output quality depends on the site HTML structure and navigation.