Pinecone Integration
This integration transfers data from Apify Actors to a Pinecone index and is a good starting point for question-answering, search, or RAG use cases.
It just fails now, even with no change to the input. I tested 3 different setups that worked yesterday; today they all failed with the same error. Any idea why?
It even fails when I log in to another Apify account and use a new Pinecone index.
My mistake, Pinecone has a limit.
No, it's still a problem; I get error 429. It says reads over 2,000 per second. But it's just weird, because all my flows were working a few days ago.
I think it would help if it could read a bit slower, so I won't hit the limit. Sometimes it works, but rarely.
Hi, thank you for using the Pinecone integration!
First and foremost, I apologize for the delayed response; I missed the notification about the issue.
I'm currently reviewing your run but haven't identified any obvious issues so far. There is a rate limit of 2,000 query read units per second per index (set by Pinecone), which the integration should typically not exceed, yet that appears to be what is happening.
I'll work on reproducing the issue and aim to have it resolved today or by tomorrow at the latest.
Okay, great. All the Pinecone integrations I have made have started to fail in the last few days. I can send a lot more examples if you need them. It seems like the first push to Pinecone works if I make a new Pinecone account, but the second time and thereafter it fails.
I've tested your setup, and everything seems to be working on my end. I still find it quite unlikely that we would hit the Pinecone rate limit (2k queries per second).
To help us track the issue, I’ve introduced a debug log. I’ve built a beta release that includes the following log:
```python
for k, item_id in enumerate(items_ids):
    if k % 100 == 0:
        Actor.log.info(
            "Processing item_id %s (%s/%s) to compare crawled data with the database",
            item_id, k, len(items_ids),
        )
    crawled_db[item_id] = vector_store.get_by_item_id(item_id)
```
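Each pass through this loop issues one read request against Pinecone, so a 1,410-item dataset produces roughly 1,410 reads in quick succession. As a rough illustration of what a helper like `vector_store.get_by_item_id` might do under the hood (a minimal sketch assuming chunk vectors are stored under ids derived from the item id; the integration's actual implementation may differ):

```python
from pinecone import Pinecone

# Hypothetical helper, not the integration's actual code.
pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("your-index")

def get_by_item_id(item_id: str) -> dict:
    # One Pinecone read per crawled item: fetch any vectors stored under this id.
    # Returns an empty mapping when the item has not been indexed yet.
    response = index.fetch(ids=[item_id])
    return response.vectors
```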
If you could test it on your end, I’d greatly appreciate it. You can switch to the new build (tag) in the integration settings under Run Options. At the very least, this will give us more insight into what's happening.
Depending on what we find, I’m considering implementing a retry mechanism with an exponential back-off or a similar approach. I'll do that tomorrow.
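For reference, a minimal sketch of such a back-off wrapper (detecting the 429 via a `status` attribute on the exception is an assumption; the check should be adapted to whatever exception class the Pinecone client raises):

```python
import random
import time

def call_with_backoff(func, max_retries=5, base_delay=1.0, max_delay=120.0):
    """Retry func on 429 responses, doubling the delay on each attempt."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as exc:  # narrow this to your client's API exception class
            if getattr(exc, "status", None) != 429 or attempt == max_retries - 1:
                raise
            # Exponential back-off with jitter to avoid synchronized retries
            delay = min(max_delay, base_delay * 2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
```

Inside the comparison loop this would be used as, e.g., `call_with_backoff(lambda: vector_store.get_by_item_id(item_id))`.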
See this run id: 8Df5kbl3PBGP95lte
```
2024-09-29T17:32:47.650Z ACTOR: Pulling Docker image of build mQYdkV7ftThCjypJn from repository.
2024-09-29T17:33:01.384Z ACTOR: Creating Docker container.
2024-09-29T17:33:01.458Z ACTOR: Starting Docker container.
2024-09-29T17:33:04.611Z INFO Initializing actor...
2024-09-29T17:33:04.614Z INFO System info ({"apify_sdk_version": "1.7.2", "apify_client_version": "1.6.4", "python_version": "3.11.10", "os": "linux"})
2024-09-29T17:33:04.619Z INFO Starting the Vector Store Actor
2024-09-29T17:33:04.763Z INFO Received start argument (vector database name): pinecone
2024-09-29T17:33:04.764Z INFO Get embeddings class: OpenAI
2024-09-29T17:33:05.681Z INFO Load Dataset ID dBbM1MMyBsdGfGhQx and extract fields ['Product URL', 'Titel', 'Pris', 'Pris_før_rabat', 'Image URL', 'Lagerstatus', 'Tilbud', 'Kort_beskrivelse', 'Yderligere_information', 'Beskrivelse']
2024-09-29T17:33:07.369Z INFO Dataset loaded, number of documents: 1410
2024-09-29T17:33:07.601Z INFO Documents chunked to 1411 chunks
2024-09-29T17:33:08.467Z INFO Update database with crawled data. Delta updates enabled
2024-09-29T17:33:08.468Z INFO Comparing crawled data with the database ...
2024-09-29T17:33:10.175Z ERROR (429)
2024-09-29T17:33:10.176Z Reason: Too Many Requests
2024-09-29T17:33:10.178Z HTTP response headers: HTTPHeaderDict({'Date': 'Sun, 29 Sep 2024 17:33:10 GMT', 'Content-Type': 'application/json', 'Content-Length': '168', 'Connection': 'kee... [trimmed]
```
I also tried the new beta build.
```
2024-09-29T17:48:43.129Z ACTOR: Pulling Docker image of build eOAJxfCoUHOpZ5x6h from repository.
2024-09-29T17:48:54.173Z ACTOR: Creating Docker container.
2024-09-29T17:48:54.755Z ACTOR: Starting Docker container.
2024-09-29T17:48:57.989Z INFO Initializing actor...
2024-09-29T17:48:57.991Z INFO System info ({"apify_sdk_version": "1.7.2", "apify_client_version": "1.6.4", "python_version": "3.11.10", "os": "linux"})
2024-09-29T17:48:57.993Z INFO Starting the Vector Store Actor
2024-09-29T17:48:58.122Z INFO Received start argument (vector database name): pinecone
2024-09-29T17:48:58.125Z INFO Get embeddings class: OpenAI
2024-09-29T17:48:58.933Z INFO Load Dataset ID dBbM1MMyBsdGfGhQx and extract fields ['Product URL', 'Titel', 'Pris', 'Pris_før_rabat', 'Image URL', 'Lagerstatus', 'Tilbud', 'Kort_beskrivelse', 'Yderligere_information', 'Beskrivelse']
2024-09-29T17:48:59.600Z INFO Dataset loaded, number of documents: 1410
2024-09-29T17:48:59.677Z INFO Documents chunked to 1411 chunks
2024-09-29T17:49:00.481Z INFO Update database with crawled data. Delta updates enabled
2024-09-29T17:49:00.483Z INFO Comparing crawled data with the database ...
2024-09-29T17:49:00.485Z INFO Processing item_id 94499446dafd3b6d12856d79c456c93081c0b426430b33dcbbc61f183d6f23ba (0/1410) to compare crawled data with the database
2024-09-29T17:49:01.929Z ERROR (429)
2024-09-29T17:49:01.932Z Reason: Too Many Requests
2024-09-29T17:49:01... [trimmed]
```
So if I enable incremental updates, it fails; if I don't, it doesn't.
Yeah, that's definitely the easiest fix. However, I'm still trying to fully understand what's happening.
From your run: https://console.apify.com/admin/users/Zkd8wkb2TsUSw30xY/actors/runs/wNpsDIuBYerAQeU1i#log
This log shows that fewer than 100 requests were executed before we got a 429 Too Many Requests.
Can you please check the status of your Pinecone database? I've attached my run for reference.
But it is not really a fix, because I somehow need to delete the old vectors when I upsert again. And the metrics look fine. This is the run: xhJcKQ3fVsGS8NAZE
But now, if I create a new Pinecone index, it seems like the problem is gone, but none of my old Pinecone indexes work, even on different accounts. I will try more.
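For context, the "delete the old vectors when I upsert again" part of a delta update boils down to a set difference between the crawled item ids and the ids already stored. A rough sketch, assuming vector ids equal item ids and a serverless index whose `index.list()` pages over all stored ids (the integration's real id scheme may differ):

```python
def delta_update(index, crawled: dict[str, list[float]]) -> None:
    """Upsert crawled vectors and delete vectors missing from the crawl."""
    stored_ids: set[str] = set()
    for page in index.list():          # generator of id batches (serverless indexes)
        stored_ids.update(page)

    index.upsert(vectors=list(crawled.items()))  # (id, values) pairs
    stale = list(stored_ids - set(crawled))
    if stale:
        index.delete(ids=stale)        # drop vectors no longer present in the crawl
```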
I'm pleased to see that you're using the incremental update feature, but I'm also disappointed that it's not working for you right now.
At this point, I’m inclined to think that the issue might be related to Pinecone itself. Could you reach out to their support for clarification?
In the meantime, I'll implement a retry mechanism with a delay on errors so that you can test it more.
Yeah, I have reached out to them. Maybe it is Pinecone; I'm waiting for answers. But it now seems like if I create a new flow in Pinecone, the problem is gone, while all the old flows have the same problem.
I was able to reproduce the issue on my end and encountered the same "429 Too Many Requests" error for my index as well.
To address this, I implemented an exponential backoff feature and reran the integration successfully. It’s now published as a beta release.
The downside is that the integration takes longer to finish (13 minutes for me). 😕 Please make sure to set the memory limit to 512 MB; that should be sufficient (instead of 1024 MB, so as not to waste resources).
Could you please give it a try and let me know if it works for you? 🙏
It works now, and it does not take much longer for me: 3 minutes compared to 1.5 minutes before. Great, thanks man!
I’m glad to hear it’s working and that I could assist!
I’ve pushed the changes to the latest release, so there's no need to use the beta version anymore.
Thank you for your help in debugging the issue!
So now I am getting it again on one of my Pinecone indexes. Is there a setting where you can edit how many it reads at a time, like setting it to 50 or 75?
Hi, I’m sorry you’re facing these issues again.
As we observed previously, the load on the Pinecone index remains quite low. The error indicates 2k requests per second, but when we checked Pinecone’s monitoring, we were well below this limit.
Have you received any response from Pinecone on this issue?
In the latest build, I increased the retry timeout from 120 seconds to 300 seconds, which might help slightly. However, I’m afraid it’s not a definitive solution.
The integration currently makes up to 32 concurrent requests, though this configuration isn’t exposed yet. If possible, please try a longer timeout and see if it helps. Otherwise, I’ll try to expose this parameter so you can adjust it as needed.
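For the curious, capping in-flight requests like this is typically done with a semaphore. A minimal sketch of the pattern, with illustrative helper names (the figure 32 mirrors the current hard-coded limit mentioned above):

```python
import asyncio

async def read_all(item_ids, get_by_item_id, max_concurrency=32):
    """Issue reads with at most max_concurrency requests in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(item_id):
        async with semaphore:               # blocks once the limit is reached
            return await get_by_item_id(item_id)

    return await asyncio.gather(*(bounded(i) for i in item_ids))
```

Exposing `max_concurrency` as an Actor input would let users lower it, say to 8 or 16, when an index throttles aggressively.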
I’ll close this issue for now. Please reopen it if the problem persists. Thank you for using the Pinecone integration!