Pinecone Integration
This integration transfers data from Apify Actors to a Pinecone index and is a good starting point for question-answering, search, or RAG use cases.
It just fails now, even with no change to the input. I tested 3 different setups that worked yesterday; today they all failed with the same error. Any idea why?
It even fails when I log in to another Apify account and use a new Pinecone index.
My mistake, Pinecone has a limit.
No, it's still a problem; I get error 429. It says reads over 2,000 per second. But it's just weird, because all my flows were working a few days ago.
I think it would help if it could read a bit slower, so I won't hit the limit. Sometimes it works, but rarely.
Hi, thank you for using the Pinecone integration!
First and foremost, I apologize for the delayed response; I missed the notification about the issue.
I'm currently reviewing your run but haven't identified any obvious issues so far. There is a rate limit of 2,000 query read units per second per index (set by Pinecone), which the integration should typically not exceed, yet that appears to be what is happening.
I'll work on reproducing the issue and aim to have it resolved today or by tomorrow at the latest.
Okay, great. All the Pinecone integrations I have made have started to fail in the last few days. I can send a lot more examples if you need them. It seems like the first push to Pinecone works if I make a new Pinecone account, but the second time and thereafter it fails.
I've tested your setup, and everything seems to be working on my end. I still find it quite unlikely that we would hit the Pinecone rate limit (2k queries per second).
To help us track the issue, I’ve introduced a debug log. I’ve built a beta release that includes the following log:
```python
for k, item_id in enumerate(items_ids):
    if k % 100 == 0:
        Actor.log.info(
            "Processing item_id %s (%s/%s) to compare crawled data with the database",
            item_id, k, len(items_ids),
        )
    crawled_db[item_id] = vector_store.get_by_item_id(item_id)
```
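Each pass through this loop issues one read request against Pinecone, so a 1,410-item dataset produces roughly 1,410 reads in quick succession. As a rough illustration of what a helper like `vector_store.get_by_item_id` might do under the hood (a minimal sketch assuming chunk vectors are stored under ids derived from the item id; the integration's actual implementation may differ):

```python
from pinecone import Pinecone

# Hypothetical helper, not the integration's actual code.
pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("your-index")

def get_by_item_id(item_id: str) -> dict:
    # One Pinecone read per crawled item: fetch any vectors stored under this id.
    # Returns an empty mapping when the item has not been indexed yet.
    response = index.fetch(ids=[item_id])
    return response.vectors
```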
If you could test it on your end, I’d greatly appreciate it. You can switch to the new build (tag) in the integration settings under Run Options. At the very least, this will give us more insight into what's happening.
Depending on what we find, I’m considering implementing a retry mechanism with an exponential back-off or a similar approach. I'll do that tomorrow.
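For reference, a minimal sketch of such a back-off wrapper (detecting the 429 via a `status` attribute on the exception is an assumption; the check should be adapted to whatever exception class the Pinecone client raises):

```python
import random
import time

def call_with_backoff(func, max_retries=5, base_delay=1.0, max_delay=120.0):
    """Retry func on 429 responses, doubling the delay on each attempt."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as exc:  # narrow this to your client's API exception class
            if getattr(exc, "status", None) != 429 or attempt == max_retries - 1:
                raise
            # Exponential back-off with jitter to avoid synchronized retries
            delay = min(max_delay, base_delay * 2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
```

Inside the comparison loop this would be used as, e.g., `call_with_backoff(lambda: vector_store.get_by_item_id(item_id))`.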
See this run id: 8Df5kbl3PBGP95lte
```
2024-09-29T17:32:47.650Z ACTOR: Pulling Docker image of build mQYdkV7ftThCjypJn from repository.
2024-09-29T17:33:01.384Z ACTOR: Creating Docker container.
2024-09-29T17:33:01.458Z ACTOR: Starting Docker container.
2024-09-29T17:33:04.611Z INFO Initializing actor...
2024-09-29T17:33:04.614Z INFO System info ({"apify_sdk_version": "1.7.2", "apify_client_version": "1.6.4", "python_version": "3.11.10", "os": "linux"})
2024-09-29T17:33:04.619Z INFO Starting the Vector Store Actor
2024-09-29T17:33:04.763Z INFO Received start argument (vector database name): pinecone
2024-09-29T17:33:04.764Z INFO Get embeddings class: OpenAI
2024-09-29T17:33:05.681Z INFO Load Dataset ID dBbM1MMyBsdGfGhQx and extract fields ['Product URL', 'Titel', 'Pris', 'Pris_før_rabat', 'Image URL', 'Lagerstatus', 'Tilbud', 'Kort_beskrivelse', 'Yderligere_information', 'Beskrivelse']
2024-09-29T17:33:07.369Z INFO Dataset loaded, number of documents: 1410
2024-09-29T17:33:07.601Z INFO Documents chunked to 1411 chunks
2024-09-29T17:33:08.467Z INFO Update database with crawled data. Delta updates enabled
2024-09-29T17:33:08.468Z INFO Comparing crawled data with the database ...
2024-09-29T17:33:10.175Z ERROR (429)
2024-09-29T17:33:10.176Z Reason: Too Many Requests
2024-09-29T17:33:10.178Z HTTP response headers: HTTPHeaderDict({'Date': 'Sun, 29 Sep 2024 17:33:10 GMT', 'Content-Type': 'application/json', 'Content-Length': '168', 'Connection': 'kee... [trimmed]
```
I also tried the new beta build.
```
2024-09-29T17:48:43.129Z ACTOR: Pulling Docker image of build eOAJxfCoUHOpZ5x6h from repository.
2024-09-29T17:48:54.173Z ACTOR: Creating Docker container.
2024-09-29T17:48:54.755Z ACTOR: Starting Docker container.
2024-09-29T17:48:57.989Z INFO Initializing actor...
2024-09-29T17:48:57.991Z INFO System info ({"apify_sdk_version": "1.7.2", "apify_client_version": "1.6.4", "python_version": "3.11.10", "os": "linux"})
2024-09-29T17:48:57.993Z INFO Starting the Vector Store Actor
2024-09-29T17:48:58.122Z INFO Received start argument (vector database name): pinecone
2024-09-29T17:48:58.125Z INFO Get embeddings class: OpenAI
2024-09-29T17:48:58.933Z INFO Load Dataset ID dBbM1MMyBsdGfGhQx and extract fields ['Product URL', 'Titel', 'Pris', 'Pris_før_rabat', 'Image URL', 'Lagerstatus', 'Tilbud', 'Kort_beskrivelse', 'Yderligere_information', 'Beskrivelse']
2024-09-29T17:48:59.600Z INFO Dataset loaded, number of documents: 1410
2024-09-29T17:48:59.677Z INFO Documents chunked to 1411 chunks
2024-09-29T17:49:00.481Z INFO Update database with crawled data. Delta updates enabled
2024-09-29T17:49:00.483Z INFO Comparing crawled data with the database ...
2024-09-29T17:49:00.485Z INFO Processing item_id 94499446dafd3b6d12856d79c456c93081c0b426430b33dcbbc61f183d6f23ba (0/1410) to compare crawled data with the database
2024-09-29T17:49:01.929Z ERROR (429)
2024-09-29T17:49:01.932Z Reason: Too Many Requests
2024-09-29T17:49:01... [trimmed]
```
So if I enable incremental updates, it fails; if I don't, it doesn't.
Yeah, that's definitely the easiest fix. However, I'm still trying to fully understand what's happening.
From your run: https://console.apify.com/admin/users/Zkd8wkb2TsUSw30xY/actors/runs/wNpsDIuBYerAQeU1i#log
This log shows that fewer than 100 requests were executed before we got a 429 Too Many Requests.
Can you please check the status of your Pinecone database? I've attached my run for reference.
But it is not really a fix, because I somehow need to delete the old vectors when I upsert again. And the metrics look fine. This is the run: xhJcKQ3fVsGS8NAZE
But now, if I create a new Pinecone index, it seems like the problem is gone, but none of my old Pinecone indexes work, even on different accounts. I will try more.
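For context, the "delete the old vectors when I upsert again" part of a delta update boils down to a set difference between the crawled item ids and the ids already stored. A rough sketch, assuming vector ids equal item ids and a serverless index whose `index.list()` pages over all stored ids (the integration's real id scheme may differ):

```python
def delta_update(index, crawled: dict[str, list[float]]) -> None:
    """Upsert crawled vectors and delete vectors missing from the crawl."""
    stored_ids: set[str] = set()
    for page in index.list():          # generator of id batches (serverless indexes)
        stored_ids.update(page)

    index.upsert(vectors=list(crawled.items()))  # (id, values) pairs
    stale = list(stored_ids - set(crawled))
    if stale:
        index.delete(ids=stale)        # drop vectors no longer present in the crawl
```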
I'm pleased to see that you're using the incremental update feature, but I'm also disappointed that it's not working for you right now.
At this point, I’m inclined to think that the issue might be related to Pinecone itself. Could you reach out to their support for clarification?
In the meantime, I'll implement a retry mechanism with a delay on errors so that you can test it more.
Yeah, I have reached out to them. Maybe it is Pinecone; I'm waiting for answers. But it now seems like if I create a new flow in Pinecone, the problem is gone, while all the old flows have the same problem.
I was able to reproduce the issue on my end and encountered the same "429 Too Many Requests" error for my index as well.
To address this, I implemented an exponential backoff feature and reran the integration successfully. It’s now published as a beta release.
The downside is that the integration takes longer to finish (13 minutes for me). 😕 Please make sure to set the memory limit to 512 MB; that should be sufficient (instead of 1024 MB, so as not to waste resources).
Could you please give it a try and let me know if it works for you? 🙏
It works now, and it does not take much longer for me: 3 minutes compared to 1.5 minutes before. Great, thanks man!
I’m glad to hear it’s working and that I could assist!
I’ve pushed the changes to the latest release, so there's no need to use the beta version anymore.
Thank you for your help in debugging the issue!
So now I am getting it again on one of my Pinecone indexes. Is there a setting where you can edit how many it reads at a time, like setting it to 50 or 75?
Hi, I’m sorry you’re facing these issues again.
As we observed previously, the load on the Pinecone index remains quite low. The error indicates 2k requests per second, but when we checked Pinecone’s monitoring, we were well below this limit.
Have you received any response from Pinecone on this issue?
In the latest build, I increased the retry timeout from 120 seconds to 300 seconds, which might help slightly. However, I’m afraid it’s not a definitive solution.
The integration currently makes up to 32 concurrent requests, though this configuration isn’t exposed yet. If possible, please try a longer timeout and see if it helps. Otherwise, I’ll try to expose this parameter so you can adjust it as needed.
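For the curious, capping in-flight requests like this is typically done with a semaphore. A minimal sketch of the pattern, with illustrative helper names (the figure 32 mirrors the current hard-coded limit mentioned above):

```python
import asyncio

async def read_all(item_ids, get_by_item_id, max_concurrency=32):
    """Issue reads with at most max_concurrency requests in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(item_id):
        async with semaphore:               # blocks once the limit is reached
            return await get_by_item_id(item_id)

    return await asyncio.gather(*(bounded(i) for i in item_ids))
```

Exposing `max_concurrency` as an Actor input would let users lower it, say to 8 or 16, when an index throttles aggressively.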
I’ll close this issue for now. Please reopen it if the problem persists. Thank you for using the Pinecone integration!