Rag Pipeline

Pricing

from $5.00 / 1,000 results

One-click RAG pipeline: chunks text, generates embeddings, and stores vectors in Pinecone or Qdrant. Provide your content and API keys -- the orchestrator handles the rest.

Developer

Mick

Maintained by Community

4 days ago

Last modified

One-click RAG pipeline on Apify. Chunk text, generate embeddings, and store vectors in Pinecone or Qdrant -- all in a single actor run. MCP-ready for AI agent integration.

What It Does

This actor orchestrates three sub-actors in sequence to build a complete RAG (Retrieval-Augmented Generation) pipeline. Feed it your content and it handles chunking, embedding, and vector storage automatically. It returns a pipeline summary that an orchestrator, or an AI agent connected via MCP, can consume directly.

Your content
-> RAG Content Chunker (chunk by paragraphs, sentences, or Markdown headers)
-> RAG Embedding Generator (OpenAI or Cohere embeddings)
-> RAG Vector Store Writer (upsert to Pinecone or Qdrant)

You provide your content, API keys, and vector DB config. The pipeline handles dataset handoff between steps automatically.
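The dataset handoff described above can be pictured with a short Python sketch. It is only an illustration of the chaining idea, not the orchestrator's actual code: the `dataset_id` input field and the `fake_runner` stand-in are hypothetical, while the sub-actor IDs are the ones reported in the pipeline summary.

```python
from typing import Callable, Dict, List

# Sub-actor IDs as they appear in the pipeline summary output.
STEPS: List[str] = [
    "labrat011/rag-content-chunker",
    "labrat011/rag-embedding-generator",
    "labrat011/rag-vector-store-writer",
]

# A runner takes (actor_id, run_input) and returns the ID of the dataset
# that the run produced.
Runner = Callable[[str, Dict[str, str]], str]


def run_pipeline(source_dataset_id: str, run_actor: Runner) -> List[str]:
    """Run the steps in order, feeding each step's output dataset to the next."""
    dataset_id = source_dataset_id
    produced: List[str] = []
    for actor_id in STEPS:
        # "dataset_id" is a hypothetical input field name, for illustration only.
        dataset_id = run_actor(actor_id, {"dataset_id": dataset_id})
        produced.append(dataset_id)
    return produced


def fake_runner(actor_id: str, run_input: Dict[str, str]) -> str:
    """Stand-in for a real actor call: derives an output ID from the input ID."""
    return f"{run_input['dataset_id']}>{actor_id.split('/')[-1]}"
```

Swapping `fake_runner` for a function that starts a real actor run and returns its default dataset ID would give the same control flow against live actors.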

Input

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| text | One of text or source_dataset_id | - | Plain text or Markdown to process |
| source_dataset_id | One of text or source_dataset_id | - | Apify dataset ID from a crawler |
| source_dataset_field | No | text | Field to read from source dataset items |
| chunking_strategy | No | recursive | recursive, markdown, or sentence |
| chunk_size | No | 512 | Target chunk size in tokens (64-8192) |
| chunk_overlap | No | 64 | Overlap between chunks in tokens (0-2048) |
| embedding_api_key | Yes | - | OpenAI or Cohere API key |
| embedding_provider | No | openai | openai or cohere |
| embedding_model | No | text-embedding-3-small | Embedding model name |
| embedding_batch_size | No | 128 | Texts per API request |
| vector_db_api_key | Yes | - | Pinecone or Qdrant API key |
| vector_db_provider | No | pinecone | pinecone or qdrant |
| index_name | Yes | - | Index (Pinecone) or collection (Qdrant) name |
| qdrant_url | If Qdrant | - | Qdrant Cloud cluster URL |
| pinecone_namespace | No | "" | Pinecone namespace |
| qdrant_distance_metric | No | Cosine | Cosine, Dot, or Euclid |
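Checking an input locally before starting a run can catch mistakes early. The following Python sketch mirrors the constraints listed in the table (required fields, numeric ranges, and the Qdrant-only URL). The `validate_input` helper is hypothetical and is not part of the actor. How the actor treats an input that sets both `text` and `source_dataset_id` is not documented here, so the sketch only requires that at least one is present.

```python
from typing import Any, Dict, List


def validate_input(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    errors: List[str] = []

    # At least one content source must be present.
    if not cfg.get("text") and not cfg.get("source_dataset_id"):
        errors.append("provide text or source_dataset_id")

    # Fields marked "Yes" in the Required column.
    for key in ("embedding_api_key", "vector_db_api_key", "index_name"):
        if not cfg.get(key):
            errors.append(f"{key} is required")

    # Numeric ranges, falling back to the documented defaults.
    chunk_size = cfg.get("chunk_size", 512)
    if not 64 <= chunk_size <= 8192:
        errors.append("chunk_size must be between 64 and 8192")

    chunk_overlap = cfg.get("chunk_overlap", 64)
    if not 0 <= chunk_overlap <= 2048:
        errors.append("chunk_overlap must be between 0 and 2048")

    # qdrant_url is only required when Qdrant is the target.
    provider = cfg.get("vector_db_provider", "pinecone")
    if provider not in ("pinecone", "qdrant"):
        errors.append("vector_db_provider must be pinecone or qdrant")
    if provider == "qdrant" and not cfg.get("qdrant_url"):
        errors.append("qdrant_url is required when vector_db_provider is qdrant")

    return errors
```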

Output

A single summary item in the default dataset:

{
  "_summary": true,
  "pipeline": {
    "total_duration_seconds": 12.345,
    "steps": {
      "chunker": { "actor": "labrat011/rag-content-chunker", "status": "SUCCEEDED", "duration_seconds": 3.2 },
      "embedder": { "actor": "labrat011/rag-embedding-generator", "status": "SUCCEEDED", "duration_seconds": 5.1 },
      "writer": { "actor": "labrat011/rag-vector-store-writer", "status": "SUCCEEDED", "duration_seconds": 4.0 }
    }
  },
  "result": {
    "total_upserted": 42,
    "vector_db_provider": "qdrant",
    "index_name": "my-collection"
  }
}
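A caller will usually want to confirm that every step succeeded before trusting `total_upserted`. The Python sketch below reads a summary item shaped like the example above. The helper names are hypothetical, and the sketch assumes every step reports `status` and `duration_seconds` as shown.

```python
from typing import Any, Dict, List

# Sample summary item, trimmed from the documented output shape.
summary: Dict[str, Any] = {
    "_summary": True,
    "pipeline": {
        "total_duration_seconds": 12.345,
        "steps": {
            "chunker": {"status": "SUCCEEDED", "duration_seconds": 3.2},
            "embedder": {"status": "SUCCEEDED", "duration_seconds": 5.1},
            "writer": {"status": "SUCCEEDED", "duration_seconds": 4.0},
        },
    },
    "result": {"total_upserted": 42, "vector_db_provider": "qdrant"},
}


def failed_steps(item: Dict[str, Any]) -> List[str]:
    """Names of steps whose status is anything other than SUCCEEDED."""
    steps = item["pipeline"]["steps"]
    return [name for name, info in steps.items() if info["status"] != "SUCCEEDED"]


def slowest_step(item: Dict[str, Any]) -> str:
    """Name of the step with the largest reported duration."""
    steps = item["pipeline"]["steps"]
    return max(steps, key=lambda name: steps[name]["duration_seconds"])


if not failed_steps(summary):
    print(f"upserted {summary['result']['total_upserted']} vectors; "
          f"slowest step: {slowest_step(summary)}")
```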

Pricing

The orchestrator charges $0.005 per pipeline run ($5.00 per 1,000 runs). Sub-actors charge separately:

| Actor | Rate |
| --- | --- |
| RAG Content Chunker | $0.0005/chunk |
| RAG Embedding Generator | $0.0003/embedding |
| RAG Vector Store Writer | $0.0004/vector |

You also pay the embedding provider (OpenAI/Cohere) and vector DB provider (Pinecone/Qdrant) at their standard rates.
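To get a rough feel for the Apify-side charges, the rates above can be combined in a few lines of Python. This is an estimate under one stated assumption: each chunk yields exactly one embedding and one upserted vector. It excludes the embedding and vector DB provider fees, which are billed separately.

```python
from decimal import Decimal

# Rates taken from the pricing section above (USD).
ORCHESTRATOR_PER_RUN = Decimal("0.005")
CHUNKER_PER_CHUNK = Decimal("0.0005")
EMBEDDER_PER_EMBEDDING = Decimal("0.0003")
WRITER_PER_VECTOR = Decimal("0.0004")


def estimate_apify_cost(num_chunks: int) -> Decimal:
    """Estimated actor charges for one run, assuming one embedding
    and one vector per chunk (an assumption, not a documented guarantee)."""
    per_chunk = CHUNKER_PER_CHUNK + EMBEDDER_PER_EMBEDDING + WRITER_PER_VECTOR
    return ORCHESTRATOR_PER_RUN + per_chunk * num_chunks


# 42 chunks: 0.005 + 42 * 0.0012 = 0.0554
print(estimate_apify_cost(42))
```

Under that assumption the per-chunk charges dominate quickly: a 1,000-chunk run comes to about $1.205, of which only $0.005 is the orchestrator fee.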

Example: Quick Start with Qdrant

{
  "text": "Your document content goes here...",
  "chunking_strategy": "recursive",
  "chunk_size": 512,
  "embedding_api_key": "sk-...",
  "embedding_provider": "openai",
  "embedding_model": "text-embedding-3-small",
  "vector_db_api_key": "your-qdrant-key",
  "vector_db_provider": "qdrant",
  "index_name": "my-rag-collection",
  "qdrant_url": "https://your-cluster.us-west-1.aws.cloud.qdrant.io:6333"
}
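The same input can be sent from Python with the `apify-client` package. This is a sketch rather than an official example: the actor ID `labrat011/rag-pipeline` is taken from the MCP endpoint shown in this README, and the environment variable names (`APIFY_TOKEN`, `OPENAI_API_KEY`, `QDRANT_API_KEY`) are assumptions chosen for illustration.

```python
import os
from typing import Any, Dict, List


def build_run_input(text: str, index_name: str, qdrant_url: str) -> Dict[str, Any]:
    """Assemble the Quick Start input, reading secrets from the environment."""
    return {
        "text": text,
        "chunking_strategy": "recursive",
        "chunk_size": 512,
        "embedding_api_key": os.environ.get("OPENAI_API_KEY", ""),
        "embedding_provider": "openai",
        "embedding_model": "text-embedding-3-small",
        "vector_db_api_key": os.environ.get("QDRANT_API_KEY", ""),
        "vector_db_provider": "qdrant",
        "index_name": index_name,
        "qdrant_url": qdrant_url,
    }


def run_rag_pipeline(run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start the actor, wait for it to finish, and return its dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor("labrat011/rag-pipeline").call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return a result")
    return client.dataset(run["defaultDatasetId"]).list_items().items
```

Calling `run_rag_pipeline(build_run_input(...))` should return a list whose single item is the pipeline summary described in the Output section.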

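Sub-Actors

  • RAG Content Chunker (labrat011/rag-content-chunker)
  • RAG Embedding Generator (labrat011/rag-embedding-generator)
  • RAG Vector Store Writer (labrat011/rag-vector-store-writer)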

Security

  • API keys are validated for presence only and never logged
  • Qdrant URLs are validated against the cloud.qdrant.io pattern (SSRF prevention)
  • All string inputs are sanitized against control characters
  • Dataset IDs and field names are validated with strict regex patterns
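Two of the checks listed above can be approximated in Python as follows. This is an illustrative sketch, not the actor's actual validation code: the exact URL rule and the set of stripped characters are assumptions. It covers the Qdrant URL check and control-character handling only.

```python
import re
from urllib.parse import urlparse

# ASCII control characters, leaving tab, newline, and carriage return intact
# (an assumed policy for this sketch).
CONTROL_CHARS_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def is_qdrant_cloud_url(url: str) -> bool:
    """Accept only HTTPS URLs whose host is a subdomain of cloud.qdrant.io."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return parsed.scheme == "https" and host.endswith(".cloud.qdrant.io")


def strip_control_chars(value: str) -> str:
    """Remove control characters from a string input."""
    return CONTROL_CHARS_RE.sub("", value)
```

Matching on the parsed hostname rather than on the raw string matters for SSRF prevention: a URL such as `https://cloud.qdrant.io.evil.com` contains the expected text but resolves to a different host, and this check rejects it.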

License

MIT


MCP Integration

This actor works as an MCP tool through Apify's hosted MCP server. No custom server needed.

  • Endpoint: https://mcp.apify.com?tools=labrat011/rag-pipeline
  • Auth: Authorization: Bearer <APIFY_TOKEN>
  • Transport: Streamable HTTP
  • Works with: Claude Desktop, Cursor, VS Code, Windsurf, Warp, Gemini CLI

Example MCP config (Claude Desktop / Cursor):

{
  "mcpServers": {
    "rag-pipeline": {
      "url": "https://mcp.apify.com?tools=labrat011/rag-pipeline",
      "headers": {
        "Authorization": "Bearer <APIFY_TOKEN>"
      }
    }
  }
}
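To avoid pasting a token into the config by hand, the same JSON can be generated from an environment variable. This is a small Python sketch: the `mcp_config` helper and the `APIFY_TOKEN` variable name are assumptions, and where the output file belongs depends on the MCP client in use.

```python
import json
import os


def mcp_config(token: str) -> dict:
    """Build the MCP server entry shown above with the given Apify token."""
    return {
        "mcpServers": {
            "rag-pipeline": {
                "url": "https://mcp.apify.com?tools=labrat011/rag-pipeline",
                "headers": {"Authorization": f"Bearer {token}"},
            }
        }
    }


if __name__ == "__main__":
    # Falls back to the placeholder when APIFY_TOKEN is not set.
    token = os.environ.get("APIFY_TOKEN", "<APIFY_TOKEN>")
    print(json.dumps(mcp_config(token), indent=2))
```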

AI agents can use this actor to ingest text into a vector database, build RAG knowledge bases, and set up retrieval-augmented generation pipelines -- all as a single callable MCP tool.