RAG Pipeline
Pricing
from $5.00 / 1,000 results
One-click RAG pipeline: chunks text, generates embeddings, and stores vectors in Pinecone or Qdrant. Provide your content and API keys; the orchestrator handles the rest.
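A minimal sketch of starting a run with the Apify Python client is shown below. The actor ID and every input key in the sketch are assumptions for illustration only; the actual inputs are described in the fields that follow.

```python
# Hypothetical sketch: start a run of this actor with the Apify Python client.
# The actor ID and all run_input key names are assumptions for illustration;
# check the actor's input schema for the real field names.
from apify_client import ApifyClient

client = ApifyClient("<APIFY_TOKEN>")

run_input = {
    "text": "Your plain text or Markdown content...",  # or a source dataset ID for bulk runs
    "chunkSize": 512,                                   # target tokens per chunk
    "chunkOverlap": 64,                                 # tokens shared by neighbouring chunks
    "embeddingProvider": "openai",
    "embeddingModel": "text-embedding-3-small",
    "embeddingApiKey": "<OPENAI_API_KEY>",
    "vectorDb": "pinecone",
    "vectorDbApiKey": "<PINECONE_API_KEY>",
    "indexName": "my-rag-index",
}

# call() starts the run and blocks until it finishes.
run = client.actor("<username>/rag-pipeline").call(run_input=run_input)
print(run["status"])
```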
Plain text or Markdown content to chunk, embed, and store. For bulk processing from a crawler, use source_dataset_id instead.
Apify dataset ID from a previous actor run (e.g., Website Content Crawler). Each item's text field will be chunked, embedded, and stored. Takes priority over the text input.
Which field to read from source dataset items. Common values: 'text', 'markdown', 'content'.
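To illustrate the source-dataset inputs above, the sketch below fetches items from a previous run's dataset with the Apify Python client and pulls out the configured field; the field name 'text' is just the common default, not a guaranteed name.

```python
# Illustrative only: fetch items from a source dataset and extract the
# configured text field.
from apify_client import ApifyClient

client = ApifyClient("<APIFY_TOKEN>")
dataset_field = "text"  # the 'field to read' input

items = client.dataset("<SOURCE_DATASET_ID>").list_items().items
texts = [item[dataset_field] for item in items if item.get(dataset_field)]
print(f"{len(texts)} items ready for chunking")
```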
How to split content. 'recursive' (default): paragraphs, then sentences, then words. 'markdown': splits on headers. 'sentence': sentence boundaries only.
Target size for each chunk in tokens. Recommended: 256-1024 for most RAG use cases.
Overlapping tokens between consecutive chunks. Helps preserve context across boundaries.
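To make the chunk size and overlap parameters concrete, here is a minimal token-based splitter using tiktoken. It is an illustrative sketch only; the actor's 'recursive' strategy additionally respects paragraph and sentence boundaries.

```python
# Illustrative sketch of fixed-size token chunking with overlap.
# The actor's 'recursive' strategy also splits on paragraph/sentence boundaries.
import tiktoken

def chunk_by_tokens(text: str, chunk_size: int = 512, chunk_overlap: int = 64) -> list[str]:
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(enc.decode(window))
        if start + chunk_size >= len(tokens):
            break
    return chunks

# With chunk_size=512 and chunk_overlap=64, consecutive chunks share 64 tokens,
# which preserves context that would otherwise be cut at the boundary.
```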
Your OpenAI or Cohere API key for generating embeddings. Never logged or stored beyond this run.
Which embedding provider to use.
Which embedding model to use. OpenAI: 'text-embedding-3-small' (1536d), 'text-embedding-3-large' (3072d). Cohere: 'embed-english-v3.0' (1024d), 'embed-multilingual-v3.0' (1024d).
Texts per embedding API request. Larger batches are faster but use more memory. OpenAI max: 2048, Cohere max: 96.
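The embedding inputs above can be pictured with the sketch below, which batches chunks through the official OpenAI Python client; the actor's own batch sizing, provider handling, and retry behaviour may differ.

```python
# Illustrative sketch: embed chunks in batches with the OpenAI Python client.
# The actor's own batching and retry logic may differ.
from openai import OpenAI

client = OpenAI(api_key="<OPENAI_API_KEY>")

def embed_in_batches(chunks: list[str],
                     model: str = "text-embedding-3-small",
                     batch_size: int = 256) -> list[list[float]]:
    vectors: list[list[float]] = []
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]
        resp = client.embeddings.create(model=model, input=batch)
        # The response preserves input order, so extend in sequence.
        vectors.extend(item.embedding for item in resp.data)
    return vectors
```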
Your Pinecone or Qdrant API key. Never logged or stored beyond this run.
Which vector database to write to.
Pinecone: the index name. Qdrant: the collection name (auto-created if it doesn't exist).
Qdrant Cloud only: your full cluster URL (e.g., 'https://xyz.us-west-1.aws.cloud.qdrant.io:6333'). Not needed for Pinecone.
Pinecone namespace. Leave empty for the default namespace. Ignored for Qdrant.
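As a sketch of the final storage step, the snippet below upserts vectors into Qdrant, creating the collection if it does not exist. The cluster URL, collection name, and 1536-dimension size are placeholders; a Pinecone run would use the Pinecone client with an index name and optional namespace instead.

```python
# Illustrative sketch: upsert chunk embeddings into Qdrant. URL, collection
# name, and the 1536-dim vector size are placeholders, not fixed values.
import uuid
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(url="https://<your-cluster>.cloud.qdrant.io:6333",
                      api_key="<QDRANT_API_KEY>")
collection = "my-rag-collection"

# Create the collection on first use, mirroring the auto-create behaviour above.
if not client.collection_exists(collection):
    client.create_collection(
        collection_name=collection,
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )

chunks = ["example chunk of text"]   # output of the chunking step
vectors = [[0.0] * 1536]             # output of the embedding step

points = [
    PointStruct(id=str(uuid.uuid4()), vector=vec, payload={"text": chunk})
    for chunk, vec in zip(chunks, vectors)
]
client.upsert(collection_name=collection, points=points)
```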