Pinecone All-in-One Integration

Pricing

from $0.01 / page crawled & stored

Crawl websites, store datasets, run semantic searches, delete vectors, and check index stats — all in one actor. Uses Pinecone Integrated Inference by default — no OpenAI key needed. Bring your own Pinecone API key and start in minutes.

Developer

Yuliia Kulakova

Maintained by Community

The only Pinecone actor you'll ever need. Crawl websites, store datasets, run semantic searches, delete vectors, and check index stats — all from a single tool. No OpenAI key required.


What You Get

Connect your Pinecone vector database to the web in minutes. Give us a URL (or a sitemap, or an existing dataset) and we handle everything: crawling, content extraction, chunking, embedding, and storage. Then search your data with natural language — right from the same actor.

5 modes. 1 actor. Zero unnecessary API keys.


Features

🔍 Crawl & Store

Point it at any website. The actor crawls pages with a real browser (Playwright), converts HTML to clean Markdown, splits content into smart chunks, and stores everything as vectors in your Pinecone index. Supports multiple start URLs and XML sitemaps.
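The page does not document the actor's chunk size or splitting rules ("smart chunks"). The sketch below is a hypothetical chunker that only illustrates the general idea. It packs blank-line-separated Markdown paragraphs into chunks of at most `max_chars` characters and hard-splits any paragraph that is longer than the limit. The function name and the 1000-character default are assumptions, not the actor's settings.

```python
def chunk_markdown(text: str, max_chars: int = 1000) -> list[str]:
    """HYPOTHETICAL chunker: greedily pack paragraphs into chunks <= max_chars.

    Paragraphs are separated by blank lines; an oversized paragraph is cut
    into max_chars slices so no chunk ever exceeds the limit.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue  # skip empty paragraphs
        # Hard-split paragraphs that are longer than the limit on their own.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # piece fits into the open chunk
            else:
                chunks.append(current)  # close the full chunk, start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Packing whole paragraphs keeps headings and the text under them together where possible. A real implementation would likely also add overlap between chunks, which this sketch omits.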

📦 Store Dataset

Already have data from another Apify scraper? Skip the crawl — load any dataset directly into Pinecone. Works with any actor output.

🔎 Semantic Search

Ask questions in plain English and get ranked results with scores, metadata, and full chunk text. Add metadata filters and optional reranking (BGE Reranker v2 M3 or Pinecone Rerank v0) for more precise results.

🗑️ Delete Vectors

Remove specific vectors by ID, wipe everything matching a metadata filter, or clear an entire namespace. Full control over your index.

📊 Index Stats

Check dimensions, record count, metric type, and per-namespace breakdown instantly. No Pinecone Console needed.


What You Need

Your Pinecone API Key (required)

This actor stores and retrieves data from your Pinecone index — you bring your own key.

  1. Sign up for free at app.pinecone.io — no credit card needed
  2. Go to API Keys in the left sidebar
  3. Copy your key and paste it into the pineconeApiKey field

💡 Pinecone's free plan includes 1 index and ~100,000 vectors — more than enough to get started.

OpenAI or Cohere Key (optional)

Only needed if you choose OpenAI or Cohere as your embedding provider. With Pinecone Integrated Inference (the default), no extra API key is required at all.


Quick Start

Crawl a website and store in Pinecone

{
  "mode": "crawl-and-store",
  "pineconeApiKey": "YOUR_PINECONE_API_KEY",
  "pineconeIndexName": "my-docs",
  "startUrls": [{ "url": "https://docs.example.com" }],
  "maxPages": 50,
  "embeddingProvider": "pinecone"
}

That's it. No OpenAI key. The index is created automatically.

Search your data

{
  "mode": "query",
  "pineconeApiKey": "YOUR_PINECONE_API_KEY",
  "pineconeIndexName": "my-docs",
  "queryText": "How do I configure authentication?",
  "topK": 5
}
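`topK` sets how many of the closest stored chunks come back for the query embedding. Pinecone performs this search server-side with its own index. The sketch below is an exact brute-force search that only illustrates what "top 5 by score" means. It assumes the cosine metric, and `top_k` is a hypothetical helper, not a Pinecone API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], records: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Score every stored vector against the query and return the k best, highest first."""
    scored = [(vid, cosine(query, vec)) for vid, vec in records.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Brute force compares the query against every vector, which is fine for a toy example. Vector databases use approximate indexes so that this step stays fast at millions of records.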

Store an existing Apify dataset

{
  "mode": "store-dataset",
  "pineconeApiKey": "YOUR_PINECONE_API_KEY",
  "pineconeIndexName": "my-docs",
  "datasetId": "YOUR_DATASET_ID",
  "datasetFields": ["text"],
  "metadataFields": { "pageUrl": "url", "pageTitle": "title" }
}
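In this example, `datasetFields` appears to name the item fields that are embedded as text. `metadataFields` appears to map a metadata key (such as `pageUrl`) to a source field in each dataset item (such as `url`). The sketch below shows that mapping under those assumptions. It also assumes that multiple text fields are joined with blank lines. `item_to_record` is a hypothetical helper, not part of the actor.

```python
from typing import Any


def item_to_record(
    item: dict[str, Any],
    dataset_fields: list[str],
    metadata_fields: dict[str, str],
) -> tuple[str, dict[str, Any]]:
    """HYPOTHETICAL mapping of one dataset item to (text to embed, metadata).

    ASSUMPTIONS: text fields are joined with blank lines, empty or missing
    text fields are skipped, and each metadata entry is {metadata_key: item_field}.
    """
    text = "\n\n".join(str(item[f]) for f in dataset_fields if item.get(f))
    metadata = {key: item[src] for key, src in metadata_fields.items() if src in item}
    return text, metadata
```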

Delete old vectors by URL

{
  "mode": "delete",
  "pineconeApiKey": "YOUR_PINECONE_API_KEY",
  "pineconeIndexName": "my-docs",
  "deleteMode": "by-filter",
  "deleteFilter": { "url": { "$eq": "https://docs.example.com/old-page" } }
}
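`deleteFilter` uses Pinecone's metadata-filter syntax, the same one used for query filters. The sketch below evaluates a subset of those operators against an in-memory map of record ID to metadata. It shows which records a delete-by-filter call would select. `matches` and `ids_matching` are hypothetical helpers. The handling of missing fields here is a simplification and may differ from Pinecone's.

```python
from typing import Any, Callable

# A subset of Pinecone's metadata-filter operators.
# SIMPLIFICATION: a missing field is treated as None.
_OPS: dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda v, t: v == t,
    "$ne": lambda v, t: v != t,
    "$in": lambda v, t: v in t,
    "$nin": lambda v, t: v not in t,
    "$gt": lambda v, t: v is not None and v > t,
    "$gte": lambda v, t: v is not None and v >= t,
    "$lt": lambda v, t: v is not None and v < t,
    "$lte": lambda v, t: v is not None and v <= t,
}


def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """True if a record's metadata satisfies every clause of the filter."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in cond)
        else:
            value = metadata.get(key)
            ok = all(_OPS[op](value, target) for op, target in cond.items())
        if not ok:
            return False
    return True


def ids_matching(records: dict[str, dict[str, Any]], flt: dict[str, Any]) -> list[str]:
    """IDs that a delete-by-filter would remove from an {id: metadata} map."""
    return [vid for vid, md in records.items() if matches(md, flt)]
```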

Embedding Providers

| Provider | Extra Key Needed | Default Model | Cost |
|---|---|---|---|
| Pinecone (recommended) | ❌ No | multilingual-e5-large | Included in Pinecone plan |
| OpenAI | ✅ Yes | text-embedding-3-small | ~$0.02 / 1M tokens |
| Cohere | ✅ Yes | embed-english-v3.0 | ~$0.10 / 1M tokens |
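A Pinecone index has a fixed dimension, and Pinecone rejects vectors of any other size. Switching providers on an existing index can therefore fail. The sketch below checks the default output sizes of the three models in the table above: 1024 for multilingual-e5-large, 1536 for text-embedding-3-small, and 1024 for embed-english-v3.0. OpenAI's text-embedding-3 models can also be shortened with a `dimensions` parameter, which this sketch ignores. `check_dimension` is a hypothetical helper, not the actor's validation code.

```python
# Default output dimensions of the models in the provider table.
MODEL_DIMENSIONS = {
    "multilingual-e5-large": 1024,
    "text-embedding-3-small": 1536,
    "embed-english-v3.0": 1024,
}


def check_dimension(model: str, index_dimension: int) -> None:
    """Raise ValueError if vectors from `model` do not fit an index of `index_dimension`."""
    expected = MODEL_DIMENSIONS[model]
    if expected != index_dimension:
        raise ValueError(
            f"{model} produces {expected}-dim vectors, "
            f"but the index expects {index_dimension}"
        )
```

A matching dimension is necessary but not sufficient. Vectors from multilingual-e5-large and embed-english-v3.0 are both 1024-dimensional but are not comparable with each other. An index should hold embeddings from one model only.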

Pricing

| Event | Price |
|---|---|
| Page crawled & stored in Pinecone | $0.01 |
| Dataset item stored in Pinecone | $0.005 |
| Semantic search query | $0.02 |

Delete and Index Stats modes are free — no charge for management operations.

💡 Example: Crawl 200 pages = $2.00 · Run 50 searches = $1.00
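The estimate above is the event count multiplied by the per-event price. The sketch below adds up the three charged events from the pricing table, using `Decimal` so that cent amounts stay exact. It covers only the actor's event charges. Fees for OpenAI or Cohere embeddings are billed separately by those providers. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Per-event prices from the pricing table, in USD. Delete and Index Stats are free.
PRICES = {
    "page": Decimal("0.01"),
    "dataset_item": Decimal("0.005"),
    "query": Decimal("0.02"),
}


def estimate_cost(pages: int = 0, dataset_items: int = 0, queries: int = 0) -> Decimal:
    """Sum of the actor's per-event charges for a planned workload."""
    return (
        pages * PRICES["page"]
        + dataset_items * PRICES["dataset_item"]
        + queries * PRICES["query"]
    )
```

For example, `estimate_cost(pages=200, queries=50)` reproduces the $2.00 + $1.00 example.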


How We Compare

FeatureOfficial Apify IntegrationOther ActorsThis Actor
Crawl + Store + Query + Delete + Stats5 modes
No OpenAI key needed
Semantic search built-in
Delete by filter or ID
Auto-create indexPartial
Dimension validation
Reranking support
Sitemap support

Output

Crawl & Store / Store Dataset

Every chunk pushed to the dataset:

| Field | Description |
|---|---|
| vectorId | Deterministic SHA-256 ID |
| url | Source page URL |
| title | Page title |
| chunkIndex | Chunk number |
| totalChunks | Total chunks from this page |
| chunkText | Full text content |
| contentHash | SHA-256 for change detection |
| storedAt | Timestamp |
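Deterministic IDs are what make re-runs safe. Pinecone's upsert replaces any existing record that has the same ID, so re-storing a chunk overwrites it instead of duplicating it. The page does not document the exact input to the ID hash. The sketch below hypothetically hashes the URL plus the chunk index. The `needs_upsert` helper shows how a `contentHash` could be used for change detection. That helper is also hypothetical, since the page does not say whether the actor skips unchanged chunks.

```python
import hashlib
from typing import Optional


def vector_id(url: str, chunk_index: int) -> str:
    """Deterministic ID: the same URL and chunk index always give the same ID.

    HYPOTHETICAL hash input; the actor's real ID recipe is not documented.
    """
    return hashlib.sha256(f"{url}#{chunk_index}".encode("utf-8")).hexdigest()


def content_hash(chunk_text: str) -> str:
    """SHA-256 hex digest of the chunk text."""
    return hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()


def needs_upsert(stored_hash: Optional[str], chunk_text: str) -> bool:
    """HYPOTHETICAL: re-embed only when the chunk is new or its text changed."""
    return stored_hash != content_hash(chunk_text)
```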

Query

| Field | Description |
|---|---|
| id | Vector ID |
| score | Relevance score (0–1) |
| metadata | Stored metadata + chunk text |

Tips

  • Same URL = same vector ID. Re-crawling overwrites the same vectors — safe to re-run anytime.
  • Large sites? Use sitemapUrl to crawl complete websites from one XML sitemap.
  • Dynamic content? renderJavaScript: true (default) handles React, Next.js, and SPAs.
  • Chaining scrapers? Run any scraper first, then use store-dataset mode with its dataset ID.
  • Cost-free embeddings? Keep embeddingProvider: "pinecone" — no extra API costs.
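An XML sitemap lists page URLs in `<loc>` elements under the sitemaps.org namespace. This is how a single `sitemapUrl` can expand into a full list of pages to crawl. The sketch below extracts those URLs with Python's standard library. For a sitemap index file, the `<loc>` entries are child sitemaps that would need fetching in turn. Whether the actor follows nested sitemap indexes is not documented. `sitemap_urls` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Standard namespace used by sitemap and sitemap-index documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return the trimmed <loc> URLs found anywhere in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]
```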