I'm not sure why your chunk_size is set to 40,000 characters. In most cases, embeddings won't effectively capture the meaning of such long text. If you reduce it to 20,000 (which is still quite long), it should work.
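To illustrate how chunk_size affects the number and length of chunks, here is a minimal sketch of plain character-based splitting. It is not the integration's actual splitter: the function name `split_into_chunks` and its `chunk_overlap` parameter are hypothetical and used only for this example.

```python
def split_into_chunks(text: str, chunk_size: int = 20_000, chunk_overlap: int = 0) -> list[str]:
    """Split text into consecutive character windows of at most chunk_size.

    Hypothetical helper for illustration only; a real splitter would
    usually prefer paragraph or sentence boundaries over fixed offsets.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new window starts (chunk_size - chunk_overlap) characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


if __name__ == "__main__":
    document = "x" * 50_000  # stand-in for a long scraped page

    # Compare the chunk lengths produced by the two settings discussed above.
    for size in (40_000, 20_000):
        chunks = split_into_chunks(document, chunk_size=size)
        print(size, [len(chunk) for chunk in chunks])
```

A smaller chunk_size yields more, shorter chunks, so each embedding has to summarize less text.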

I can also adjust the batch size (if you want), but I suspect the chunk_size might have been an unintentional copy-paste mistake.

Hope this helps!
Jiri

Jiří Spilka (jiri.spilka)

I'll go ahead and close this issue. This is discussed in more detail in this issue.

Milvus Integration

apify/milvus-integration

This integration transfers data from Apify Actors to a Milvus/Zilliz database and is a good starting point for a question-answering, search, or RAG use case.

Apify

4.5

Chroma Integration

apify/chroma-integration

This integration transfers data from Apify Actors to a Chroma database and is a good starting point for a question-answering, search, or RAG use case.

Apify

4.8

Weaviate Integration

apify/weaviate-integration

This integration transfers data from Apify Actors to a Weaviate database and is a good starting point for a question-answering, search, or RAG use case.

Apify

4.7

OpenAI Vector Store Integration

jiri.spilka/openai-vector-store-integration

The Apify OpenAI Vector Store integration uploads data from Apify Actors to the OpenAI Vector Store linked to an OpenAI Assistant.

Jiří Spilka

180

4.8

Actors MCP Server

apify/actors-mcp-server

⚠️ Legacy: This Actor is outdated. For the latest features and full documentation, visit https://mcp.apify.com. Easily connect any Apify Actor to AI agents using Anthropic’s Model Context Protocol (MCP) with our actively maintained MCP server.

Apify

1.9K

4.9

OpenSearch Integration

apify/opensearch-integration

Transfer data from Apify Actors to Amazon OpenSearch Service. This Actor is a good starting point for building question-answering systems, search functionality, or Retrieval-Augmented Generation (RAG) use cases.

Apify

4.4

MCP Stress Tester

jakub.kopecky/mcp-stress-tester

A simple MCP Stress Tester client Actor for stress-testing your Model Context Protocol server. 💻⚡

Jakub Kopecký

RAG Web Browser

apify/rag-web-browser

Web browser for OpenAI Assistants, RAG pipelines, or AI agents, similar to a web browser in ChatGPT. It queries Google Search, scrapes the top N pages, and returns their content as Markdown for further processing by an LLM. It can also scrape individual URLs. Supports Model Context Protocol (MCP).

Apify

4.6K

4.4

WCC Pinecone Integration

tri_angle/wcc-pinecone-integration

Crawl any website and store its content in your Pinecone vector database. Enhance the accuracy and reliability of your own AI Assistant with facts fetched from external sources or connect this integration to our Pinecone GPT Chatbot assistant available in Apify Store.