Pinecone Integration
apify/pinecone-integration

This integration transfers data from Apify Actors to a Pinecone database and is a good starting point for question-answering, search, or RAG use cases.


How do the Delta updates settings work?

Closed

responsible_box opened this issue
2 months ago

I want it to delete all the current vectors in Pinecone whenever there is a new upload. So if I run it every two weeks, it should delete my current vectors and add the new ones instead. How can I do this?

Thanks for the help. :)

jiri.spilka

Thank you for your interest in the Pinecone integration!

Currently, the integration does not support deleting all vectors in the Pinecone database during each new upload. This feature is not available due to the risk of accidental misconfiguration, which could result in the deletion of the entire database.

Instead, the integration offers the deltaUpdates functionality, which ensures efficient and safe updates to your Pinecone database. Here’s how it works:

  • Unchanged Content: Updates the last_seen_at metadata field.
  • Changed Content: Deletes the old data, computes new vectors, and adds them to the database.
  • New Content: Computes vectors and adds them to the database.
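The three cases above can be illustrated with a minimal sketch. This is not the integration's actual code: it assumes each object is identified by its url and that a change is detected by comparing a checksum of the content. The names content_checksum and classify_items, the checksum field, and the in-memory dict standing in for the Pinecone database are all hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def content_checksum(text: str) -> str:
    """Return a stable fingerprint of the content (assumed change detector)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def classify_items(stored: dict, crawled: list, now: datetime) -> dict:
    """Split crawled items into unchanged / changed / new.

    `stored` maps url -> {"checksum": str, "last_seen_at": datetime} and is a
    hypothetical in-memory stand-in for the metadata kept in Pinecone.
    """
    result = {"unchanged": [], "changed": [], "new": []}
    for item in crawled:
        url = item["url"]
        checksum = content_checksum(item["content"])
        existing = stored.get(url)
        if existing is None:
            # New content: vectors would be computed and added.
            result["new"].append(url)
        elif existing["checksum"] == checksum:
            # Unchanged content: only last_seen_at is refreshed.
            result["unchanged"].append(url)
        else:
            # Changed content: old data would be deleted and re-embedded.
            result["changed"].append(url)
        # In every case the object counts as seen in the current run.
        stored[url] = {"checksum": checksum, "last_seen_at": now}
    return result
```

Only the "changed" and "new" groups would need new embeddings, which is where the cost saving over a full delete-and-reupload would come from.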

Handling Removed Content: If a URL is removed from the website and is not present in the current crawl, you can delete objects in the Pinecone database that have not been seen in the past X days. This is managed using the expiredObjectDeletionPeriodDays setting.
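The expiry rule can be sketched in a few lines. This is an illustration under assumptions, not the integration's implementation: find_expired is a hypothetical helper, and the dict of url to last_seen_at timestamps stands in for the metadata stored in Pinecone. The period_days argument plays the role of expiredObjectDeletionPeriodDays.

```python
from datetime import datetime, timedelta, timezone


def find_expired(last_seen: dict, now: datetime, period_days: int) -> list:
    """Return urls whose last_seen_at is older than `period_days` days.

    `last_seen` maps url -> datetime of the last crawl that saw the object
    (a hypothetical stand-in for the last_seen_at metadata field).
    """
    cutoff = now - timedelta(days=period_days)
    # Objects seen exactly at the cutoff are kept; only older ones expire.
    return sorted(url for url, seen in last_seen.items() if seen < cutoff)
```

For example, with a 14-day period and a run on 15 July, an object last seen on 20 June would be selected for deletion, while one last seen on 14 July would be kept.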

Example Configuration: Here is an example of how you can set this up for your use case.

Input Data: When scraping a website, such as apify.com, using the Web Scraper, the output might look like this:

{
  "url": "https://apify.com",
  "title": "Apify",
  "content": "Apify is the platform where developers build, deploy, and publish web scraping, data extraction, and web automation tools."
}

Integration Settings: Configure the integration as follows:

{
  "datasetFields": ["content"],
  "enableDeltaUpdates": true,
  "deltaUpdatesPrimaryDatasetFields": ["url"],
  "expiredObjectDeletionPeriodDays": 14
}

The content field is stored in the Pinecone database. The url field is used as a unique identifier to manage content updates. Objects not seen for more than 14 days are deleted automatically, so your database stays up to date without keeping outdated data.

Please let me know if deltaUpdates works for you. I might consider adding a delete_all_vectors functionality in the future, but I believe that the deltaUpdates feature provides a better (and more cost-effective) alternative.

jiri.spilka

I'm going to close this issue now. Please let me know if you face any problems.

Developer
Maintained by Apify