
OpenAI Vector Store Integration
The Apify OpenAI Vector Store Integration uploads data from Apify Actors to an OpenAI Vector Store linked to an OpenAI Assistant.
My vector store ID is unique to each run
Is there some way to pass the vector store ID into the run at the start?

Hi, thank you for using the OpenAI Vector Store Integration!
I'm not sure I fully understand your question, but I'll try to propose a use case.
If you need to scrape data from different websites (e.g., using Website Content Crawler), you can set up a task. In this task, you can connect Website Content Crawler (with a specific startURL) and the OpenAI Vector Store Integration (with a specific vectorStoreId).
This way, you can update different vector stores with different data from the web.
Does this approach make sense? If you're working on a different use case, please let me know—I'd be happy to help!
Jiri
drippingfist
Hi Jiri
Thanks for your response! I start Website Content Crawler via an HTTPS request in which I set all the options for the run and the startURL. When I try to add the OpenAI Vector Store Integration, it requires that I add a vector store ID, and I don't want to do this manually each time. I'm now implementing a workaround where I first run Website Content Crawler, save the datasetId and keyValueStoreId to a database, and then, when the crawler is finished, I start the OpenAI Vector Store Integration with these values added to the debugging portion of the request body.
BUT, is there a way to pass the vectorStoreId into the Website Content Crawler request so that the OpenAI Vector Store Integration can use it if it's triggered as an integrated component of my task?
Hope that makes sense.
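For reference, the workaround looks roughly like the sketch below (written with the Apify Python client rather than raw HTTP requests; the integration's Actor ID and input field names are assumptions and should be checked against its input schema):

```python
# Sketch of the two-step workaround, assuming the Apify Python client.
# The integration's Actor ID and input field names below are placeholders /
# assumptions - check the Actor's input schema before relying on them.
from apify_client import ApifyClient

client = ApifyClient("APIFY_API_TOKEN")

# Step 1: run Website Content Crawler and wait for it to finish.
crawler_run = client.actor("apify/website-content-crawler").call(
    run_input={"startUrls": [{"url": "https://example.com"}]},
)

# Step 2: start the OpenAI Vector Store Integration, pointing it at the
# crawler's dataset and at the vector store chosen for this run.
client.actor("OPENAI_VECTOR_STORE_INTEGRATION_ACTOR_ID").call(
    run_input={
        "vectorStoreId": "vs_...",                     # per-run vector store ID
        "datasetId": crawler_run["defaultDatasetId"],  # assumed input field name
        "openaiApiKey": "OPENAI_API_KEY",              # assumed input field name
    },
)
```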

Hi, thank you for the detailed explanation.
Unfortunately, it's not possible to pass the vectorStoreId directly into the Website Content Crawler request. This feature is not supported, and there are no plans to add support for it in the near future.
That said, there are a few alternative approaches you can consider:
- Separate calls – As you mentioned, you can first run Website Content Crawler, retrieve the datasetId, and then call the OpenAI Vector Store Integration with both datasetId and vectorStoreId (much like the sketch in your post above). This works, but it somewhat undermines the goal of having seamless integrations.
- Using tasks – Another option is to create a Task that combines Website Content Crawler and the OpenAI Vector Store Integration with a specific vectorStoreId. If you have three unique vectorStoreIds, you would need to create a separate task for each one. You can then call the appropriate task with startURLs as usual, and it will save the data into the corresponding vector store. However, this approach is only feasible if the number of vector stores remains manageable (see the first sketch after this list).
- Using webhooks – A more advanced option is to leverage Webhooks, though this setup is even more complex (see the second sketch further below).
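Here is a rough sketch of the task-based option (the task IDs are hypothetical, and it assumes each task already has the OpenAI Vector Store Integration configured with its own vectorStoreId):

```python
# Sketch of the task-based approach, assuming the Apify Python client and that
# each task pairs Website Content Crawler with the OpenAI Vector Store
# Integration pre-configured for one specific vectorStoreId.
from apify_client import ApifyClient

client = ApifyClient("APIFY_API_TOKEN")

# Hypothetical task IDs, one per vector store.
TASK_BY_VECTOR_STORE = {
    "vs_docs": "username~crawl-into-docs-store",
    "vs_blog": "username~crawl-into-blog-store",
}

def crawl_into(vector_store: str, start_url: str) -> dict:
    """Run the task whose integration writes into the given vector store."""
    task_id = TASK_BY_VECTOR_STORE[vector_store]
    return client.task(task_id).call(task_input={"startUrls": [{"url": start_url}]})

run = crawl_into("vs_docs", "https://example.com/docs")
```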
Unfortunately, I don't have an immediate out-of-the-box solution for this. If possible, I’d recommend trying the second approach first, or the first one if that works for your needs.
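And, for completeness, a rough sketch of the webhook route. The webhook dictionary keys, the payload template variable, and the integration's input field names are assumptions on my side, so please verify them against the apify-client and Webhooks documentation:

```python
# Sketch of the webhook approach: attach an ad-hoc webhook to the crawler run so
# that, when the run succeeds, Apify starts the OpenAI Vector Store Integration
# with that run's dataset ID and the vector store ID chosen per request.
# Webhook keys, template variables, and Actor input fields are assumptions here.
from apify_client import ApifyClient

client = ApifyClient("APIFY_API_TOKEN")
vector_store_id = "vs_..."  # chosen at request time

client.actor("apify/website-content-crawler").start(
    run_input={"startUrls": [{"url": "https://example.com"}]},
    webhooks=[
        {
            "event_types": ["ACTOR.RUN.SUCCEEDED"],
            # Call the Apify API to start the integration Actor when the crawl finishes.
            "request_url": (
                "https://api.apify.com/v2/acts/"
                "OPENAI_VECTOR_STORE_INTEGRATION_ACTOR_ID/runs?token=APIFY_API_TOKEN"
            ),
            # The payload template becomes the integration's input.
            "payload_template": (
                '{"datasetId": "{{resource.defaultDatasetId}}",'
                ' "vectorStoreId": "' + vector_store_id + '"}'
            ),
        }
    ],
)
```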
If you'd like, I can provide a more complete snippet for calling it. However, it seems like you already figured it out—let me know if you need any help.
Jiri