Pinecone All-in-One Integration
Pricing
from $0.01 / page crawled & stored
Crawl websites, store datasets, run semantic searches, delete vectors, and check index stats — all in one actor. Uses Pinecone Integrated Inference by default — no OpenAI key needed. Bring your own Pinecone API key and start in minutes.
Developer
Yuliia Kulakova

The only Pinecone actor you'll ever need. Crawl websites, store datasets, run semantic searches, delete vectors, and check index stats — all from a single tool. No OpenAI key required.
What You Get
Connect your Pinecone vector database to the web in minutes. Give us a URL (or a sitemap, or an existing dataset) and we handle everything: crawling, content extraction, chunking, embedding, and storage. Then search your data with natural language — right from the same actor.
5 modes. 1 actor. Zero unnecessary API keys.
Features
🔍 Crawl & Store
Point it at any website. The actor crawls pages with a real browser (Playwright), converts HTML to clean Markdown, splits content into smart chunks, and stores everything as vectors in your Pinecone index. Supports multiple start URLs and XML sitemaps.
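The chunking step described above can be pictured with a small sketch. The listing does not document the actor's actual splitting strategy, so the function below is only an assumption: it greedily packs paragraphs of the extracted Markdown into chunks of at most `max_chars` characters. Both `chunk_markdown` and `max_chars` are hypothetical names, not actor parameters.

```python
def chunk_markdown(text: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    Illustrative only: the actor's real chunking logic is not documented.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # Hard-split any single paragraph that is longer than the limit.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                # The piece still fits, so keep growing the current chunk.
                current = candidate
            else:
                # The piece does not fit: close the current chunk, start a new one.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Splitting on blank lines keeps paragraphs intact where possible, which tends to give each vector a self-contained piece of text. A production chunker would likely also respect headings and add overlap between chunks.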
📦 Store Dataset
Already have data from another Apify scraper? Skip the crawl — load any dataset directly into Pinecone. Works with any actor output.
🔎 Semantic Search
Ask questions in plain English. Get ranked results with scores, metadata, and full chunk text. Add metadata filters and optional reranking (BGE Reranker v2 M3 or Pinecone Rerank v0) for precision results.
🗑️ Delete Vectors
Remove specific vectors by ID, wipe everything matching a metadata filter, or clear an entire namespace. Full control over your index.
📊 Index Stats
Check dimensions, record count, metric type, and per-namespace breakdown instantly. No Pinecone Console needed.
What You Need
Your Pinecone API Key (required)
This actor stores and retrieves data from your Pinecone index — you bring your own key.
- Sign up for free at app.pinecone.io — no credit card needed
- Go to API Keys in the left sidebar
- Copy your key and paste it into the `pineconeApiKey` field
💡 Pinecone's free plan includes 1 index and ~100,000 vectors — more than enough to get started.
OpenAI or Cohere Key (optional)
Only needed if you choose OpenAI or Cohere as your embedding provider. With Pinecone Integrated Inference (the default), no extra API key is required at all.
Quick Start
Crawl a website and store in Pinecone
{
  "mode": "crawl-and-store",
  "pineconeApiKey": "YOUR_PINECONE_API_KEY",
  "pineconeIndexName": "my-docs",
  "startUrls": [{ "url": "https://docs.example.com" }],
  "maxPages": 50,
  "embeddingProvider": "pinecone"
}
That's it. No OpenAI key. The index is created automatically.
Search your data
{
  "mode": "query",
  "pineconeApiKey": "YOUR_PINECONE_API_KEY",
  "pineconeIndexName": "my-docs",
  "queryText": "How do I configure authentication?",
  "topK": 5
}
Store an existing Apify dataset
{
  "mode": "store-dataset",
  "pineconeApiKey": "YOUR_PINECONE_API_KEY",
  "pineconeIndexName": "my-docs",
  "datasetId": "YOUR_DATASET_ID",
  "datasetFields": ["text"],
  "metadataFields": { "pageUrl": "url", "pageTitle": "title" }
}
Delete old vectors by URL
{
  "mode": "delete",
  "pineconeApiKey": "YOUR_PINECONE_API_KEY",
  "pineconeIndexName": "my-docs",
  "deleteMode": "by-filter",
  "deleteFilter": { "url": { "$eq": "https://docs.example.com/old-page" } }
}
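The inputs above can also be sent programmatically. The following is a minimal sketch using the `apify-client` Python package. Several things here are assumptions, not part of this listing: the actor ID placeholder `<username>/<actor-name>` is hypothetical (copy the real one from the actor's page), the validation rules in `build_query_input` are illustrative, and the sketch assumes that results are written to the run's default dataset.

```python
import json


def build_query_input(api_key: str, index_name: str, query_text: str, top_k: int = 5) -> dict:
    """Assemble a query-mode run input using the field names from Quick Start."""
    # These checks are illustrative assumptions, not the actor's documented rules.
    if not query_text.strip():
        raise ValueError("queryText must not be empty")
    if top_k < 1:
        raise ValueError("topK must be at least 1")
    return {
        "mode": "query",
        "pineconeApiKey": api_key,
        "pineconeIndexName": index_name,
        "queryText": query_text,
        "topK": top_k,
    }


def run_actor(apify_token: str, actor_id: str, run_input: dict) -> list[dict]:
    """Run the actor and return the items from its default dataset."""
    # Imported lazily so the input builder works without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(apify_token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    run_input = build_query_input(
        "YOUR_PINECONE_API_KEY", "my-docs", "How do I configure authentication?"
    )
    print(json.dumps(run_input, indent=2))
    # Hypothetical actor ID; replace with the real one before uncommenting.
    # items = run_actor("YOUR_APIFY_TOKEN", "<username>/<actor-name>", run_input)
```

The other modes would follow the same pattern with a different input dictionary, so only the builder function needs to change.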
Embedding Providers
| Provider | Extra Key Needed | Default Model | Cost |
|---|---|---|---|
| Pinecone (recommended) | ❌ No | multilingual-e5-large | Included in Pinecone plan |
| OpenAI | ✅ Yes | text-embedding-3-small | ~$0.02 / 1M tokens |
| Cohere | ✅ Yes | embed-english-v3.0 | ~$0.10 / 1M tokens |
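Each default model in the table produces vectors of a fixed size, and a Pinecone index only accepts vectors that match its configured dimension. The sketch below shows one way a dimension check could look. It is not the actor's implementation; `MODEL_DIMENSIONS` and `validate_dimension` are hypothetical names, and the values are the default output dimensions of the three models.

```python
# Default output dimensions of the models listed in the table above.
MODEL_DIMENSIONS = {
    "multilingual-e5-large": 1024,
    "text-embedding-3-small": 1536,
    "embed-english-v3.0": 1024,
}


def validate_dimension(model: str, index_dimension: int) -> None:
    """Raise ValueError if the index cannot hold vectors from the given model."""
    expected = MODEL_DIMENSIONS.get(model)
    if expected is None:
        raise ValueError(f"unknown embedding model: {model}")
    if expected != index_dimension:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, "
            f"but the index expects {index_dimension}"
        )


validate_dimension("multilingual-e5-large", 1024)  # passes silently
```

Checking this before any embedding call avoids paying for embeddings that the index would reject at upsert time, for example after switching providers on an existing index.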
Pricing
| Event | Price |
|---|---|
| Page crawled & stored in Pinecone | $0.01 |
| Dataset item stored in Pinecone | $0.005 |
| Semantic search query | $0.02 |
Delete and Index Stats modes are free — no charge for management operations.
💡 Example: Crawl 200 pages = $2.00 · Run 50 searches = $1.00
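The arithmetic behind that example can be reproduced with a short estimator. This is a sketch built from the prices in the table above; `PRICES` and `estimate_cost` are illustrative names, and it covers only the actor's per-event charges, not Pinecone or embedding-provider costs.

```python
from decimal import Decimal

# Per-event prices from the pricing table, in US dollars.
PRICES = {
    "page": Decimal("0.01"),
    "dataset_item": Decimal("0.005"),
    "query": Decimal("0.02"),
}


def estimate_cost(pages: int = 0, dataset_items: int = 0, queries: int = 0) -> Decimal:
    """Return the estimated actor charge; delete and stats modes are free."""
    return (
        pages * PRICES["page"]
        + dataset_items * PRICES["dataset_item"]
        + queries * PRICES["query"]
    )


print(f"${estimate_cost(pages=200):.2f}")   # $2.00
print(f"${estimate_cost(queries=50):.2f}")  # $1.00
```

`Decimal` is used instead of floats so that half-cent prices such as $0.005 add up exactly.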
How We Compare
| Feature | Official Apify Integration | Other Actors | This Actor |
|---|---|---|---|
| Crawl + Store + Query + Delete + Stats | ❌ | ❌ | ✅ 5 modes |
| No OpenAI key needed | ❌ | ❌ | ✅ |
| Semantic search built-in | ❌ | ❌ | ✅ |
| Delete by filter or ID | ❌ | ❌ | ✅ |
| Auto-create index | ❌ | Partial | ✅ |
| Dimension validation | ❌ | ❌ | ✅ |
| Reranking support | ❌ | ❌ | ✅ |
| Sitemap support | ❌ | ❌ | ✅ |
Output
Crawl & Store / Store Dataset
Every chunk pushed to the dataset:
| Field | Description |
|---|---|
| `vectorId` | Deterministic SHA-256 ID |
| `url` | Source page URL |
| `title` | Page title |
| `chunkIndex` | Chunk number |
| `totalChunks` | Total chunks from this page |
| `chunkText` | Full text content |
| `contentHash` | SHA-256 for change detection |
| `storedAt` | Timestamp |
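To illustrate what deterministic IDs and content hashes make possible, here is a minimal sketch. The exact recipe the actor uses is not documented in this listing, so the choice of hashing the URL together with the chunk index is an assumption, and all three function names are hypothetical.

```python
import hashlib


def make_vector_id(url: str, chunk_index: int) -> str:
    """Derive a stable ID from the page URL and chunk position (assumed recipe)."""
    return hashlib.sha256(f"{url}#{chunk_index}".encode("utf-8")).hexdigest()


def make_content_hash(chunk_text: str) -> str:
    """Hash the chunk text so that changed content can be detected later."""
    return hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()


def needs_update(stored_hash: str, chunk_text: str) -> bool:
    """Return True when the freshly crawled text differs from what was stored."""
    return stored_hash != make_content_hash(chunk_text)
```

Because the ID depends only on stable inputs, upserting the same page twice targets the same vectors instead of creating duplicates, and comparing hashes lets a re-crawl skip chunks whose text has not changed.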
Query
| Field | Description |
|---|---|
| `id` | Vector ID |
| `score` | Relevance score (0–1) |
| `metadata` | Stored metadata + chunk text |
Tips
- Same URL = same vector ID. Re-crawling overwrites the same vectors — safe to re-run anytime.
- Large sites? Use `sitemapUrl` to crawl complete websites from one XML sitemap.
- Dynamic content? `renderJavaScript: true` (default) handles React, Next.js, and SPAs.
- Chaining scrapers? Run any scraper first, then use `store-dataset` mode with its dataset ID.
- Cost-free embeddings? Keep `embeddingProvider: "pinecone"` to avoid extra API costs.