Pricing

from $5.00 / 1,000 texts chunked

Text Splitter & Chunker for RAG / LLMs

Split text into clean, overlapping chunks ready for embeddings, vector databases, RAG and LLM context. Configurable size, overlap, and split strategy.
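To illustrate what "overlapping chunks" with a configurable size and overlap means, here is a minimal sketch of fixed-size character chunking. It is not the Actor's implementation: the function name `chunkText`, its parameters, and the default values are assumptions made for this example, and the Actor's other split strategies are not covered.

```javascript
// Hypothetical helper (not part of the Actor): fixed-size character
// chunking where each chunk repeats the last `overlap` characters of
// the previous one.
function chunkText(text, chunkSize = 1000, overlap = 100) {
    if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
        throw new RangeError('chunkSize must be a positive integer');
    }
    if (!Number.isInteger(overlap) || overlap < 0 || overlap >= chunkSize) {
        throw new RangeError('overlap must be an integer in [0, chunkSize)');
    }

    const step = chunkSize - overlap; // how far the window advances each time
    const chunks = [];
    for (let start = 0; start < text.length; start += step) {
        chunks.push({
            index: chunks.length,
            start,
            text: text.slice(start, start + chunkSize),
        });
        // Stop once the current window has reached the end of the text.
        if (start + chunkSize >= text.length) break;
    }
    return chunks;
}

// Small demonstration: 26 characters, chunks of 10 with an overlap of 3.
const sample = 'abcdefghijklmnopqrstuvwxyz';
const chunks = chunkText(sample, 10, 3);
console.log(chunks.map((c) => c.text));
// → [ 'abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz' ]
```

Each chunk starts `chunkSize - overlap` characters after the previous one, so text that straddles a boundary appears in both neighbouring chunks.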

Rating

0.0

(0)

Developer

Rosario Vitale

Actor stats

Last modified

2 months ago

Categories

Developer tools

Automation

You can call the Text Splitter & Chunker for RAG / LLMs programmatically from your own applications through the Apify API. To use the Apify API, you need an Apify account and your API token, which you can find in the Integrations settings in Apify Console. The example below uses JavaScript; the Actor can also be accessed from Python, the CLI, OpenAPI, HTTP, and MCP.

import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "text": `Retrieval-Augmented Generation (RAG) combines a language model with an external knowledge base. Instead of relying only on what the model memorized during training, RAG retrieves relevant chunks of text and feeds them to the model as context.

To build a RAG system you first split your documents into chunks, create embeddings for each chunk, and store them in a vector database. At query time you embed the user's question, find the most similar chunks, and pass them to the model alongside the prompt.

Chunking matters a lot. Chunks that are too large dilute relevance and waste tokens, while chunks that are too small lose context. A common starting point is around 1000 characters per chunk with a small overlap, so that ideas spanning a boundary are not lost between neighbouring chunks.`
};

// Run the Actor and wait for it to finish
const run = await client.actor("zenomastro/text-splitter-for-llm").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

Text Chunker for RAG, Embeddings & LLMs API in JavaScript

The Apify API client for JavaScript is the official library for using the Text Splitter & Chunker for RAG / LLMs API from JavaScript or TypeScript. It provides convenience functions and automatic retries on errors.

Install the apify-client

npm install apify-client

Other API clients include:

Text Splitter & Chunker for RAG / LLMs API in Python

Text Splitter & Chunker for RAG / LLMs API through CLI

Text Splitter & Chunker for RAG / LLMs OpenAPI definition

Text Splitter & Chunker for RAG / LLMs API
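For callers who prefer plain HTTP over a client library, the following is a rough sketch of how a request to the Apify API could be assembled. It assumes Node.js 18 or later (for the built-in `fetch`), the Apify API v2 `run-sync-get-dataset-items` endpoint, and Bearer-token authentication. The helper names `buildRunSyncRequest` and `runChunker` are made up for this example, and only the `text` input field shown in the sample above is used.

```javascript
const API_BASE = 'https://api.apify.com/v2';

// Hypothetical helper: assemble the URL and fetch options for running
// an Actor synchronously and returning its dataset items.
function buildRunSyncRequest(actorId, token, input) {
    // In REST paths the Actor ID is written as "username~actor-name".
    const pathId = actorId.replace('/', '~');
    return {
        url: `${API_BASE}/acts/${pathId}/run-sync-get-dataset-items`,
        options: {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                Authorization: `Bearer ${token}`,
            },
            body: JSON.stringify(input),
        },
    };
}

// Hypothetical wrapper: send the request and return the parsed items.
// Calling this starts a real (billable) Actor run.
async function runChunker(text, token) {
    const { url, options } = buildRunSyncRequest(
        'zenomastro/text-splitter-for-llm',
        token,
        { text },
    );
    const response = await fetch(url, options);
    if (!response.ok) {
        throw new Error(`Apify API request failed with HTTP ${response.status}`);
    }
    return response.json(); // array of dataset items
}
```

A call such as `runChunker(longText, process.env.APIFY_TOKEN)` would return the dataset items as an array; the `APIFY_TOKEN` variable name is an assumption here. Unlike the official client, this sketch does not retry failed requests.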

Text Chunker: Split Text & Documents into Chunks for RAG

raional/text-chunker

Split long text or documents into properly sized, sentence-aware chunks with overlap for embeddings, vector databases, and RAG pipelines. Choose recursive, sentence-boundary, or fixed-token chunking. Fetch from URLs or paste text directly. Powered by Chonkie.

Raion Al

RAG-Ready Markdown Converter & Chunker

foxpink/apify-rag-markdown-chunker

Convert raw HTML/text into clean Markdown and split into ready-to-ingest chunks for RAG pipelines, Vector DBs, and LLM fine-tuning workflows.

Nguyễn Anh Duy

4.7

Tender RAG Chunker — Text Chunks for LLM

adobeflex/tender-rag-chunker

Chunk tender text for RAG/embeddings with stable ids and metadata.

Yahor

PDF to RAG Markdown Chunks for Embeddings

awesome_highboy/docforge

Convert PDFs into token-bounded Markdown chunks for RAG, embeddings, and vector databases (Pinecone, Chroma, Weaviate, Qdrant). Set maxTokens + overlap; get clean chunks with page number, token count, and SHA-256 content hash for dedup. JSON dataset ready for any LLM pipeline.

Adam

Rag Content Chunker

labrat011/rag-content-chunker

Turn raw text, Markdown, or Apify datasets into token-perfect RAG chunks with deterministic IDs, source metadata, and a billing-ready summary—ready for embeddings or vector DBs without extra glue code.

mick_

RAG-Ready Web Scraper & Smart Chunker for AI Knowledge Bases

adinfosys-labs/rag-ready-web-scraper-smart-chunker-for-ai-knowledge-bases

RAG-ready web scraper that collects, cleans, deduplicates, filters, and chunks web content into structured datasets for AI pipelines. Generates high-quality knowledge-base data optimized for LLMs, embeddings, and vector databases.

Artashes Arakelyan

RAG Post Processor - Text Cleaner & Chunker for LLM Pipelines

jalicia/rag-post-processor

Clean and chunk scraped text for RAG and LLM pipelines. Strips HTML, collapses whitespace, splits into overlapping chunks ready for embedding. Works standalone or chained after any scraper. Per-row billing.

Jordan Wagner

PDF to Text API | Document Extraction for LLMs & RAG

andok/pdf-text-converter

Convert bulk PDF documents via URL into clean, raw text. The perfect document scraper for LLMs, vector databases, and RAG pipelines.

Andok

Rag Embedding Generator

labrat011/rag-embedding-generator

Generate vector embeddings from text or chunked datasets using OpenAI or Cohere. Chains with RAG Content Chunker for end-to-end RAG pipelines. Outputs raw vectors ready for any vector database.

mick_

RAG Text Chunker — heading & sentence aware, Japanese ready

shoebill-dev27/rag-text-chunker

Split Markdown or plain text into retrieval-ready chunks for RAG pipelines: cuts at headings, packs whole sentences up to a size limit with optional overlap, and tags every chunk with its heading breadcrumb. Handles Japanese sentence boundaries. No LLM cost.