Pinecone integration

jan.turon/pinecone-integration

Simplify your data operations with this Apify and Pinecone integration. Easily push selected fields from your Apify Actor directly into any Pinecone index. If the index doesn't exist, the integration will create it. A practical, straightforward solution for moving data between Apify and Pinecone.

Unable to upload documents to Pinecone

Closed

davidlance opened this issue
6 months ago

I believe I've entered all the configurations correctly to push my web scrape results to my Pinecone database, but the Pinecone vector count is 0 and I found an error in the run logs. I don't understand what I need to do to have the documents in Pinecone.

davidlance

6 months ago

Log files

2023-11-15T20:32:19.661Z ACTOR: Pulling Docker image of build JOzHG9AAnkWKYnSo4 from repository.
2023-11-15T20:32:24.285Z ACTOR: Creating Docker container.
2023-11-15T20:32:24.906Z ACTOR: Starting Docker container.
2023-11-15T20:32:27.974Z INFO Initializing actor...
2023-11-15T20:32:27.975Z INFO System info ({"apify_sdk_version": "1.1.1", "apify_client_version": "1.3.0", "python_version": "3.11.6", "os": "linux"})
2023-11-15T20:32:28.166Z Loading dataset
2023-11-15T20:32:28.168Z Metadata fields loaded {'source': None}
2023-11-15T20:32:28.181Z Dataset loaded for field text
2023-11-15T20:32:28.182Z Loading documents for field text
2023-11-15T20:32:31.096Z ERROR Actor failed with an exception
2023-11-15T20:32:31.098Z Traceback (most recent call last):
2023-11-15T20:32:31.099Z File "/usr/src/app/src/main.py", line 59, in main
2023-11-15T20:32:31.100Z documents = loader.load()
2023-11-15T20:32:31.101Z ^^^^^^^^^^^^^
2023-11-15T20:32:31.102Z File "/usr/local/lib/python3.11/site-packages/langchain/document_loaders/apify_dataset.py", line 54, in load
2023-11-15T20:32:31.103Z return list(map(self.dataset_mapping_function, dataset_items))
2023-11-15T20:32:31.104Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-11-15T20:32:31.105Z File "/usr/src/app/src/main.py", line 52, in

davidlance

6 months ago

Document format

[
  {
    "url": "https://www.domain.com/",
    "text": "lorem ipsum"
  },
  {
    "url": "https://www.domain.com/",
    "text": "lorem ipsum"
  }
]

User avatar

Try removing metadata_fields from the input.

davidlance

6 months ago

I removed the metadata config, but I still get an error.

2023-11-16T14:41:05.889Z ACTOR: Pulling Docker image of build JOzHG9AAnkWKYnSo4 from repository.
2023-11-16T14:41:10.573Z ACTOR: Creating Docker container.
2023-11-16T14:41:10.629Z ACTOR: Starting Docker container.
2023-11-16T14:41:13.489Z INFO Initializing actor...
2023-11-16T14:41:13.490Z INFO System info ({"apify_sdk_version": "1.1.1", "apify_client_version": "1.3.0", "python_version": "3.11.6", "os": "linux"})
2023-11-16T14:41:13.662Z Loading dataset
2023-11-16T14:41:13.664Z Metadata fields loaded {}
2023-11-16T14:41:13.665Z ERROR Actor failed with an exception
2023-11-16T14:41:13.666Z Traceback (most recent call last):
2023-11-16T14:41:13.667Z File "/usr/src/app/src/main.py", line 49, in main
2023-11-16T14:41:13.668Z dataset_id=actor_input.get('payload')['resource']['defaultDatasetId'],
2023-11-16T14:41:13.669Z ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
2023-11-16T14:41:13.669Z TypeError: 'NoneType' object is not subscriptable
2023-11-16T14:41:13.670Z INFO Exiting actor ({"exit_code": 91})

davidlance

6 months ago

This was run from the Pinecone actor directly. Will I need to rerun the web scraper actor? It doesn't seem like I should need to, but that's the only thing I can think of. I just didn't want to take the time or spend the money if I didn't have to.

davidlance

6 months ago

Also, without the metadata configured, how will the page URLs be stored in Pinecone? Traditionally I see them saved as metadata.

User avatar

Unfortunately, URLs are not in the result set, so this integration can't fetch them. That depends on the actor that produced the data, not on this integration. Also, yes, you need to run it again.

davidlance

6 months ago

Thanks for the quick response. Rerunning the webscraper now.

Feature requests:

  1. Reference stored datasets we've created in Apify in the storage section. I see they have IDs to uniquely identify them. We shouldn't have to start from scratch every time there's an error. We're rebuilding a dataset that already exists, which wastes time and money.
  2. Open your actor up to more embedding models than just OpenAI. It would be nice to use HuggingFace models.

davidlance

6 months ago

Also... the Website Content Crawler actor provides lots of good metadata. Would be nice to be able to save that into Pinecone.

{
  "url": "https://www.company.com/",
  "crawl": {
    "loadedUrl": "https://www.company.com/",
    "loadedTime": "2023-11-14T19:12:10.154Z",
    "referrerUrl": "https://www.company.com/",
    "depth": 0,
    "httpStatusCode": 200
  },
  "metadata": {
    "canonicalUrl": "https://www.company.com/",
    "title": "Digital Product Growth | Experience Experts",
    "description": "Company builds technology-enabled solutions that propel businesses and delight customers.",
    "author": null,
    "keywords": null,
    "languageCode": "en-US"
  },
  "screenshotUrl": null,
  "text": "we help companies grow...."
},
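
One way to carry this crawler metadata along is to flatten each dataset item before it is stored. The sketch below is not the integration's code: `flatten_metadata` and `to_record` are hypothetical helpers, the underscore-joined key names are an arbitrary choice, and dropping null values is based on the assumption that the vector store only accepts flat, non-null metadata values.

```python
from typing import Any, Dict, Mapping


def flatten_metadata(item: Mapping[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one level, e.g. crawl.depth -> crawl_depth.

    Null values are skipped (assumption: the target index rejects them).
    """
    flat: Dict[str, Any] = {}
    for key, value in item.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects such as "crawl" and "metadata".
            flat.update(flatten_metadata(value, prefix=f"{name}_"))
        elif value is not None:
            flat[name] = value
    return flat


def to_record(item: Mapping[str, Any], text_field: str = "text") -> Dict[str, Any]:
    """Split one dataset item into the text to embed and its flat metadata."""
    rest = {k: v for k, v in item.items() if k != text_field}
    return {
        "page_content": item.get(text_field) or "",
        "metadata": flatten_metadata(rest),
    }


if __name__ == "__main__":
    sample = {
        "url": "https://www.company.com/",
        "crawl": {"depth": 0, "httpStatusCode": 200},
        "metadata": {"title": "Digital Product Growth | Experience Experts", "author": None},
        "screenshotUrl": None,
        "text": "we help companies grow....",
    }
    print(to_record(sample))
```

A function shaped like `to_record` could serve as the basis for the `dataset_mapping_function` that appears in the tracebacks in this thread, with the result wrapped in whatever document type the loader expects.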

davidlance

6 months ago

I made another run with the web crawler. The Pinecone Integration kicked off but failed. The logs seem to indicate there is a metadata config, but that was removed before the run. Please advise.

2023-11-16T16:43:14.231Z ACTOR: Pulling Docker image of build JOzHG9AAnkWKYnSo4 from repository.
2023-11-16T16:43:18.786Z ACTOR: Creating Docker container.
2023-11-16T16:43:18.821Z ACTOR: Starting Docker container.
2023-11-16T16:43:21.319Z INFO Initializing actor...
2023-11-16T16:43:21.321Z INFO System info ({"apify_sdk_version": "1.1.1", "apify_client_version": "1.3.0", "python_version": "3.11.6", "os": "linux"})
2023-11-16T16:43:21.464Z Loading dataset
2023-11-16T16:43:21.467Z Metadata fields loaded {'url': None}
2023-11-16T16:43:21.474Z Dataset loaded for field text
2023-11-16T16:43:21.477Z Loading documents for field text
2023-11-16T16:43:23.632Z ERROR Actor failed with an exception
2023-11-16T16:43:23.634Z Traceback (most recent call last):
2023-11-16T16:43:23.636Z File "/usr/src/app/src/main.py", line 59, in main
2023-11-16T16:43:23.638Z documents = loader.load()
2023-11-16T16:43:23.640Z ^^^^^^^^^^^^^
2023-11-16T16:43:23.642Z File "/usr/local/lib/python3.11/site-packages/langchain/document_loaders/apify_dataset.py", line 54, in load
2023-11-16T16:43:23.644Z return list(map(self.dataset_mapping_function, dataset_items))
2023-11-16T16:43:23.646Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-11-16T16:43:23.648Z File "/usr/src/app/src/main.py", line 52, in

User avatar

I reviewed the actor, and the metadata_fields field seemed to be buggy. I released a new version that fixes it. You can also now optionally pass dataset_id in your input and run this actor standalone.
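
For readers hitting the earlier `TypeError: 'NoneType' object is not subscriptable`, that error arises when the actor is started directly and its input has no `payload` from a triggering run. The sketch below shows one way such a lookup could fall back to an explicit `dataset_id`. It is an illustration only, not the actor's actual source: `resolve_dataset_id` is a hypothetical helper, and the only input keys assumed are the ones visible in this thread (`dataset_id` and `payload.resource.defaultDatasetId`).

```python
from typing import Any, Mapping, Optional


def resolve_dataset_id(actor_input: Optional[Mapping[str, Any]]) -> str:
    """Return the dataset ID to load, or raise a readable error."""
    actor_input = actor_input or {}

    # Standalone run: an explicit dataset_id in the input takes precedence.
    explicit = actor_input.get("dataset_id")
    if explicit:
        return str(explicit)

    # Integration run: the triggering run's dataset arrives in the payload.
    # Each level falls back to {} so a missing or null level cannot raise TypeError.
    payload = actor_input.get("payload") or {}
    resource = payload.get("resource") or {}
    from_payload = resource.get("defaultDatasetId")
    if from_payload:
        return str(from_payload)

    raise ValueError(
        "No dataset to load: pass dataset_id in the input, or start this actor "
        "as an integration so the input contains payload.resource.defaultDatasetId."
    )


if __name__ == "__main__":
    print(resolve_dataset_id({"dataset_id": "my-dataset-id"}))
```

With a lookup like this, a direct run without a payload fails with a message that says what input is missing instead of a bare traceback.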

davidlance

6 months ago

Thanks for the fixes. I'll give them a try.

Developer
Maintained by Community
Actor metrics
  • 15 monthly users
  • 84.2% runs succeeded
  • 11 days response time
  • Created in May 2023
  • Modified 14 days ago