
Pinecone Integration
This integration transfers data from Apify Actors to a Pinecone vector database and is a good starting point for question-answering, search, or RAG use cases.
Max token length...
I can't push big databases; it seems like it treats them as one long chunk instead of a lot of smaller chunks??? Hope there is a solution!
team2
For more information, I get this error:
Failed to update the database. Please ensure the following:
The database is configured properly. The vector dimension of your embedding model in the Actor input (Embedding settings → model) matches the one set up in the database. Error message: Error code: 400 - {'error': {'message': 'Requested 1,073,396 tokens, max 600,000 tokens per request', 'type': 'max_tokens_per_request', 'param': None, 'code': 'max_tokens_per_request'}}
So, when I push a database that is larger than 600,000 tokens, it fails. Not just one chunk—even if all chunks are 500 tokens long, if the total exceeds 600,000 tokens, it fails.
I wonder why? Who sets this limit? And is it possible to either:
- create a workaround so that after 600k tokens, it creates a new request, or
- increase the limit?
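For context, a minimal sketch of the kind of workaround being asked for here, assuming OpenAI-style token counting with tiktoken (illustrative only, not the Actor's actual code):

```python
import tiktoken

MAX_TOKENS_PER_REQUEST = 600_000  # limit reported in the error message above

def batch_by_token_count(chunks, encoding_name="cl100k_base"):
    """Group text chunks so each group stays under the per-request token limit."""
    encoding = tiktoken.get_encoding(encoding_name)
    batch, batch_tokens = [], 0
    for chunk in chunks:
        n_tokens = len(encoding.encode(chunk))
        # Close the current batch once adding this chunk would exceed the limit.
        if batch and batch_tokens + n_tokens > MAX_TOKENS_PER_REQUEST:
            yield batch
            batch, batch_tokens = [], 0
        batch.append(chunk)
        batch_tokens += n_tokens
    if batch:
        yield batch

# Each yielded batch can then be embedded and upserted as a separate request.
```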
And another thing: instead of updating chunks, why is it not possible to just retrieve all chunks with matching URLs, push new ones, and delete the old ones? Would this not minimize the loading cost on Apify and reduce costs on Pinecone, where requesting so many chunks is quite expensive? In this case, we would only request the URL.
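For context, a rough sketch of what this proposal could look like with the plain Pinecone client; the index name, the url metadata field, and the dummy-vector query are assumptions, not the integration's behaviour:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="<PINECONE_API_KEY>")
index = pc.Index("my-index")  # assumed index name

def replace_chunks_for_url(url, new_vectors):
    """Upsert the new chunks for one URL and delete whatever was stored before."""
    # Look up IDs of the existing chunks via a metadata filter; the query vector
    # is a dummy because only the matched IDs are needed (top_k caps at 1000).
    dim = index.describe_index_stats().dimension
    old = index.query(
        vector=[0.0] * dim,
        top_k=1000,
        filter={"url": {"$eq": url}},
        include_values=False,
    )
    old_ids = {match.id for match in old.matches}

    # Push the new chunks first, then remove the stale ones.
    index.upsert(vectors=new_vectors)  # e.g. [{"id": "...", "values": [...], "metadata": {"url": url}}]
    stale = list(old_ids - {v["id"] for v in new_vectors})
    if stale:
        index.delete(ids=stale)
```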
team2
????
responsible_box
:)
responsible_box
:(

Hi,
Sorry for the late response—I’ve been extremely busy with high-priority tasks over the past few days.
I looked into the issue, and it is partially related to this: Issue Link.
We are using the official Pinecone implementation.
For OpenAI embeddings, use pool_threads>4 when constructing the pinecone.Index, embedding_chunk_size>1000 and batch_size~64 for best performance.
Args:
  texts: Iterable of strings to add to the vectorstore.
  metadatas: Optional list of metadatas associated with the texts.
  ids: Optional list of ids to associate with the texts.
  namespace: Optional pinecone namespace to add the texts to.
  batch_size: Batch size to use when adding the texts to the vectorstore.
  embedding_chunk_size: Chunk size to use when embedding the texts.
  async_req: Whether runs asynchronously.
  id_prefix: Optional string to use as an ID prefix when upserting vectors.
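For illustration, this is roughly how those parameters are passed through the LangChain wrapper; the langchain-pinecone and langchain-openai packages and the index name are assumptions here, not the Actor's actual code:

```python
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone

pc = Pinecone(api_key="<PINECONE_API_KEY>")
index = pc.Index("my-index", pool_threads=8)  # pool_threads > 4, as recommended above

store = PineconeVectorStore(index=index, embedding=OpenAIEmbeddings())
store.add_texts(
    texts=["first chunk of text", "second chunk of text"],
    batch_size=64,             # how many vectors are upserted per request
    embedding_chunk_size=500,  # how many texts are embedded per request
)
```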
I can expose these parameters and will implement the fix on Monday.
Apologies for the inconvenience.
responsible_box
Thanks, my Pinecone reads and writes are getting quite expensive. Is there a way to request and push less? I am often only changing a few URLs out of 700 or so.
responsible_box
Looking forward to seeing this :)

I have it almost ready. I'm currently in the process of correcting the unit tests and expect to have it completed by tonight. I will keep you updated on the progress.

Hi,
Sorry for the delayed response.
It took a bit longer as I decided to implement this functionality:
And another thing: Instead of updating chunks, why is it not possible to just retrieve all chunks with matching URLs, push new ones, and delete the old ones? Would this not minimize the loading cost on Apify and reduce costs on Pinecone, where requesting so many chunks is quite expensive? In this case, we would only request the URL.
This is now available in Beta 0.0.59 with the following changes:
- `embeddingBatchSize` (Pinecone only) – Batch size for embedding texts. Default: 1000, minimum: 1.
- `usePineconeIdPrefix` (Pinecone only) – Optimizes delta updates using a Pinecone ID prefix (`item_id#chunk_id`) when `enableDeltaUpdates` is `true`. Works only when the database is empty (see the sketch below).
- New parameter `dataUpdatesStrategy`:
  - Replaces `enableDeltaUpdates`.
  - Automatically set to `deltaUpdates` if `enableDeltaUpdates = true`.
  - Options: `deltaUpdates`, `add`, or `upsert`.
- Renamed `deltaUpdatesPrimaryDatasetFields` → `dataUpdatesPrimaryDatasetFields`:
  - Automatically migrated if the old field is present.
- Backward compatibility:
  - Supports legacy `enableDeltaUpdates` mappings and `deltaUpdatesPrimaryDatasetFields`.
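To illustrate why the ID prefix helps, here is a minimal sketch with the plain Pinecone client (not the Actor's code); listing by prefix is available on serverless indexes, and the index name and item ID are made up:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="<PINECONE_API_KEY>")
index = pc.Index("my-index")

def delete_item_chunks(item_id):
    # With vector IDs shaped like "item_id#chunk_id", all chunks belonging to one
    # dataset item can be listed and deleted by prefix, without fetching their contents.
    for id_page in index.list(prefix=f"{item_id}#"):
        index.delete(ids=id_page)

delete_item_chunks("item_123")  # removes item_123#0, item_123#1, ...
```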
I have also updated the documentation:
Configure update strategy
To control how the integration updates data in the database, use the `dataUpdatesStrategy` parameter. This parameter allows you to choose between different update strategies based on your use case, such as adding new data, upserting records, or incrementally updating records based on changes (deltas). Below are the available strategies and explanations for when to use each:
- Add data (`add`):
  - Appends new data to the database without checking for duplicates or updating existing records.
  - Suitable for cases where deduplication or updates are unnecessary and the data simply needs to be added.
  - For example, you might use this strategy to continually append data from independent crawls without regard for overlaps.
- Upsert data (`upsert`):
  - Updates existing records in the database if they match a key or identifier and inserts new records if they don't already exist.
  - Ideal when you want to maintain accurate and up-to-date data while avoiding duplication.
  - For instance, this is useful in cases where unique items (such as user profiles or documents) need to be managed, ensuring the database reflects the latest changes.
  - Check the `dataUpdatesPrimaryDatasetFields` parameter to specify which fields are used to uniquely identify each dataset item.
- Delta updates (`deltaUpdates`):
  - Incrementally updates records by identifying differences (deltas) between the new dataset and the existing database records.
  - Ensures only new or modified records are processed, leaving unchanged records untouched. This minimizes unnecessary database operations and improves efficiency.
  - This is the most efficient strategy when integrating data that evolves over time, such as website content or recurring crawls.
  - Check the `dataUpdatesPrimaryDatasetFields` parameter to specify which fields are used to uniquely identify each dataset item.
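As a usage sketch, the new options can be set in the Actor input like this (shown with the Apify Python client; the Actor ID and the choice of primary fields are assumptions based on this thread, so check the Actor's input schema for the exact names):

```python
from apify_client import ApifyClient

client = ApifyClient("<APIFY_API_TOKEN>")

run_input = {
    # ... your existing Pinecone and embedding settings go here ...
    "embeddingBatchSize": 500,                   # keep each embedding request small
    "dataUpdatesStrategy": "deltaUpdates",       # or "add" / "upsert"
    "dataUpdatesPrimaryDatasetFields": ["url"],  # fields that uniquely identify a dataset item
    "usePineconeIdPrefix": True,                 # only when the index starts empty
}

# Actor ID is assumed here; replace it with the integration's actual ID.
client.actor("apify/pinecone-integration").call(run_input=run_input)
```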
I have tested this with my (small) Pinecone database and unit tests.
Please let me know if everything is working as expected, and I’ll proceed with the release.
Best,
Jiri

Sorry for the long version; here's the TL;DR:
- Set `embeddingBatchSize` to 500 or a smaller value.
- Set `dataUpdatesStrategy` to `upsert` to delete old entries and add new ones.
- Important: Ensure `dataUpdatesPrimaryDatasetFields` is set up correctly.

Hi, were you able to try this? Thank you. Jiri
Pricing
Pricing model: Pay per usage
This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage.