URL List to RAG & Vector JSONL
Pricing
$1.00 / 1,000 URLs converted to vector JSONL
Paste a curated URL list and get clean Markdown, document JSONL, vector chunks, an ingest manifest, and a failed-URL report.
Developer: Orbiscribe Labs
Maintained by Community
Use this Actor when you already know the URLs you want to ingest and need a controlled conversion step for a vector database or RAG pipeline.
It fetches each public URL, extracts the readable content, splits it into stable chunks, and writes JSONL artifacts plus a manifest that makes failed URLs easy to inspect. It does not crawl the site; only the URLs you provide are fetched.
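This page does not document the chunking algorithm. As a rough mental model only, the sketch below splits text into fixed-size character windows with overlap, which is one way chunkSizeChars and chunkOverlapChars could behave. It also hashes each chunk so that identical text always gets the same identifier. The function name, the SHA-256 choice, and the record fields are assumptions, not the Actor's implementation.

```python
import hashlib


def chunk_text(text: str, size: int = 2500, overlap: int = 250) -> list[dict]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical sketch: the Actor may split on headings or sentences
    rather than raw character offsets.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap  # each window starts this many chars after the last
    chunks = []
    for index, start in enumerate(range(0, len(text), step)):
        piece = text[start:start + size]
        chunks.append({
            "index": index,
            "start": start,
            "text": piece,
            # Identical text always hashes the same, so re-runs give stable IDs.
            "content_hash": hashlib.sha256(piece.encode("utf-8")).hexdigest(),
        })
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With size=4 and overlap=1, the string "abcdefghij" yields "abcd", "defg", and "ghij". Each chunk repeats the last character of the previous one, so text near a boundary appears in both neighbours.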
What you get
- Dataset rows for document records and chunks.
- Clean Markdown, main text, headings, links, canonical URL, content hash, and output preset metadata.
- Key-value outputs: RAG_CHUNKS_JSONL, VECTOR_CHUNKS_JSONL, DOCUMENTS_JSONL, INGEST_MANIFEST, FAILED_URLS, MARKDOWN_BUNDLE, BUYER_BRIEF, and RUN_SUMMARY.
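The JSONL outputs hold one JSON object per line. The minimal parser below is a sketch that assumes nothing about the record fields. It skips blank lines and reports which line failed to parse, which helps when checking a downloaded RAG_CHUNKS_JSONL or DOCUMENTS_JSONL record before loading it into a vector store. The "id" and "text" fields in the test data are hypothetical.

```python
import json


def parse_jsonl(raw: str) -> list[dict]:
    """Parse JSONL text (one JSON object per line) into a list of dicts."""
    records = []
    for line_no, line in enumerate(raw.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            # Point at the offending line instead of failing silently.
            raise ValueError(f"line {line_no}: invalid JSON ({err.msg})") from err
    return records
```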
Common workflows
- Convert a curated URL export into vector-store JSONL.
- Reprocess a known page list without a crawler wandering through the site.
- Send failed URLs to a cleanup queue.
- Keep document and chunk records side by side for debugging retrieval.
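When a retrieval result looks wrong, it helps to see each chunk next to the document it came from. The minimal sketch below pairs the two record types and also collects chunks that have no matching document. It assumes both record types carry the source URL in a field named url. That field name is hypothetical, because this README does not document the record schema.

```python
def group_chunks_by_document(
    documents: list[dict], chunks: list[dict], key: str = "url"
) -> tuple[dict[str, dict], list[dict]]:
    """Pair each document record with its chunks through a shared field.

    The shared field ("url" by default) is an assumption; check the real
    DOCUMENTS_JSONL and chunk records for the field that links them.
    """
    grouped = {doc[key]: {"document": doc, "chunks": []} for doc in documents}
    orphans = []  # chunks whose parent document is missing
    for chunk in chunks:
        entry = grouped.get(chunk.get(key))
        if entry is None:
            orphans.append(chunk)
        else:
            entry["chunks"].append(chunk)
    return grouped, orphans
```

A non-empty orphans list usually means the document and chunk files come from different runs.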
Input
Provide urls and choose an outputPreset such as openai_vector_store, langchain, llamaindex, pinecone, qdrant, or generic_jsonl. The preset is included in chunk metadata so downstream jobs can route or transform the JSONL.
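Because the preset travels with each chunk, a downstream job can branch on it. The sketch below groups chunks by preset and rejects unknown values. It assumes the preset is stored at metadata.outputPreset inside each chunk record. That path is a guess, so adjust it to match the real JSONL.

```python
from collections import defaultdict

# Preset names as listed in this README.
KNOWN_PRESETS = {
    "openai_vector_store", "langchain", "llamaindex",
    "pinecone", "qdrant", "generic_jsonl",
}


def partition_by_preset(chunks: list[dict]) -> dict[str, list[dict]]:
    """Group chunk records by the preset stored in their metadata.

    Assumes the preset lives at chunk["metadata"]["outputPreset"]; that
    field path is hypothetical.
    """
    buckets: dict[str, list[dict]] = defaultdict(list)
    for chunk in chunks:
        preset = chunk.get("metadata", {}).get("outputPreset")
        if preset not in KNOWN_PRESETS:
            raise ValueError(f"unknown or missing preset: {preset!r}")
        buckets[preset].append(chunk)
    return dict(buckets)
```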
Use includeUrlPatterns, excludeUrlPatterns, maxUrls, chunkSizeChars, and chunkOverlapChars to control scope, cost, and chunk shape. The default run processes three public Apify docs URLs so a first Store run produces real Markdown and JSONL without extra setup.
{
  "urls": [
    { "url": "https://docs.apify.com/academy/getting-started" },
    { "url": "https://docs.apify.com/academy/web-scraping-for-beginners" },
    { "url": "https://docs.apify.com/academy/actor-marketing-playbook/actor-basics/actor-description" }
  ],
  "outputPreset": "openai_vector_store",
  "includeUrlPatterns": ["/academy/"],
  "excludeUrlPatterns": [],
  "maxUrls": 3,
  "chunkSizeChars": 2500,
  "chunkOverlapChars": 250,
  "dryRun": false
}
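To preview which URLs a given input would keep, here is a sketch of how includeUrlPatterns, excludeUrlPatterns, and maxUrls might combine. It assumes the patterns are plain substrings, that an empty include list keeps everything, and that maxUrls is applied after filtering. The Actor may use globs or regexes instead, so treat this as an approximation.

```python
def select_urls(url_items: list[dict], include: list[str],
                exclude: list[str], max_urls: int) -> list[str]:
    """Apply include/exclude patterns, then cap the list at max_urls.

    Assumes patterns are plain substrings and that filtering happens
    before the cap; the Actor may use globs or regexes instead.
    """
    selected = []
    for item in url_items:
        if len(selected) >= max_urls:
            break  # maxUrls reached
        url = item["url"]
        if include and not any(p in url for p in include):
            continue  # an include list exists and nothing in it matched
        if any(p in url for p in exclude):
            continue  # explicitly excluded
        selected.append(url)
    return selected
```

With the example input above, all three URLs contain /academy/, so the sketch keeps all three and maxUrls: 3 is not the limiting factor.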
Pricing
Pricing is pay per event: $0.001 per vector-jsonl-url event, or $1.00 per 1,000 URLs.
Under pay-per-event pricing, dry runs are not charged, and free-plan users get their first 25 processed sources without this Actor's custom event charge. Set Apify spending limits before running large batches.
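Before a large batch, you can estimate the event charge with the arithmetic behind the listed price. The sketch below encodes the rules stated here under two assumptions:

- Each processed URL triggers one vector-jsonl-url event. The page does not say whether failed URLs count.
- The 25-source free-plan allowance applies per run. The page does not say whether it is per run or per account.

The estimate covers only this Actor's event charge, not Apify platform usage.

```python
from decimal import Decimal

PRICE_PER_URL = Decimal("0.001")  # $0.001 per vector-jsonl-url event
FREE_PLAN_ALLOWANCE = 25          # assumed per run; scope not stated on the page


def estimate_event_charge(processed_urls: int, dry_run: bool = False,
                          free_plan: bool = False) -> Decimal:
    """Estimate this Actor's custom event charge in USD (sketch)."""
    if dry_run:
        return Decimal("0")  # dry runs are not charged
    billable = processed_urls
    if free_plan:
        # The first 25 processed sources carry no custom event charge.
        billable = max(0, processed_urls - FREE_PLAN_ALLOWANCE)
    return billable * PRICE_PER_URL
```

For 1,000 URLs the sketch returns $1.00, matching the listed price. On the free plan, under the per-run assumption, the same batch comes to $0.975.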
Limits and compliance
Public URLs only. This Actor does not bypass logins, paywalls, robots policies, or access controls. It is intentionally a URL-list converter, not a broad crawler.