
tsboi index
Pricing
Pay per event

Indexing for LLMs. This application crawls specified websites, processes their content into a searchable vector database, and enables users to ask natural language questions about the content.
LangChain.js template
LangChain is a framework for developing applications powered by language models.
This example template illustrates how to use LangChain.js with Apify to crawl web data, vectorize it, and prompt the OpenAI model, all within a single Apify Actor and slightly over a hundred lines of code.
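The crawl, vectorize, and prompt flow can be made concrete without network access. The following TypeScript sketch replaces OpenAI embeddings with a toy hashing-based embedding and keeps the index in memory. All names (toyEmbed, buildIndex, buildPrompt) are hypothetical and not taken from the template. The real Actor uses LangChain.js, the OpenAI API, and Website Content Crawler output instead.

```typescript
// Hypothetical toy pipeline: every name here is illustrative, not the template's code.
interface Doc {
  url: string;
  text: string;
}

type IndexEntry = { doc: Doc; vector: number[] };

// Toy stand-in for an embeddings API (the template uses OpenAI): hashes each
// lowercase word into one of `dims` buckets and L2-normalizes the counts.
function toyEmbed(text: string, dims = 64): number[] {
  const vec = new Array<number>(dims).fill(0);
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    let h = 2166136261; // FNV-1a 32-bit offset basis
    for (let i = 0; i < word.length; i++) {
      h ^= word.charCodeAt(i);
      h = Math.imul(h, 16777619) >>> 0; // multiply by the FNV prime, keep unsigned
    }
    vec[h % dims] += 1;
  }
  const norm = Math.hypot(...vec) || 1; // avoid dividing by zero for empty text
  return vec.map((x) => x / norm);
}

// Vectors are unit length, so the dot product equals cosine similarity.
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// "Vectorize" step: embed every crawled page once.
function buildIndex(docs: Doc[]): IndexEntry[] {
  return docs.map((doc) => ({ doc, vector: toyEmbed(doc.text) }));
}

// "Prompt" step: pick the k most similar pages and place them in the prompt.
function buildPrompt(index: IndexEntry[], question: string, k = 2): string {
  const q = toyEmbed(question);
  const top = [...index]
    .sort((a, b) => dot(b.vector, q) - dot(a.vector, q))
    .slice(0, k);
  const context = top.map((e) => `Source: ${e.doc.url}\n${e.doc.text}`).join("\n\n");
  return `Answer using only this context:\n\n${context}\n\nQuestion: ${question}`;
}

const pages: Doc[] = [
  { url: "https://example.com/pricing", text: "The Actor is priced per event." },
  { url: "https://example.com/crawler", text: "Website Content Crawler collects page text." },
];
const prompt = buildPrompt(buildIndex(pages), "How is the Actor priced?", 1);
// `prompt` would then be sent to the chat model.
```

The toy embedding only captures shared words, so it is far weaker than a real embeddings model. It shows the shape of the data flow, not retrieval quality.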
Included features
- Apify SDK - a toolkit for building Actors
- Input schema - define and easily validate a schema for your Actor's input
- LangChain.js - a framework for developing applications powered by language models
- OpenAI - language models and embeddings, used here both to vectorize the data and to answer the query
How it works
The code contains the following steps:
- Crawls the given website using the Website Content Crawler Actor.
- Vectorizes the data using the OpenAI API.
- Caches the vector index in the key-value store, so when you run the Actor for the same website again, the cached index is reused to speed up the run.
- Feeds the data to the OpenAI model using LangChain.js and asks it the given query.
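One way to implement the caching step is to derive a deterministic key from the start URL and check the store before crawling. In this sketch, a Map stands in for the Apify key-value store. The key format (vector-index-<hash>) and the helper names are assumptions, not what the template actually uses.

```typescript
// Hypothetical sketch: a Map stands in for the Apify key-value store, and the
// key naming scheme is an assumption, not the template's actual key.
type VectorIndex = { startUrl: string; vectors: number[][] };

// Derive a stable key from the start URL. A 32-bit FNV-1a hash keeps the key
// short and limited to [a-z0-9-]; any stable hash would do.
function cacheKeyFor(startUrl: string): string {
  let h = 2166136261;
  for (let i = 0; i < startUrl.length; i++) {
    h ^= startUrl.charCodeAt(i);
    h = Math.imul(h, 16777619) >>> 0;
  }
  return `vector-index-${h.toString(16).padStart(8, "0")}`;
}

// Return the cached index when this URL was indexed before; otherwise run the
// expensive build (crawl + embed) once and store the result.
function getOrBuildIndex(
  store: Map<string, VectorIndex>,
  startUrl: string,
  build: (url: string) => VectorIndex,
): { index: VectorIndex; fromCache: boolean } {
  const key = cacheKeyFor(startUrl);
  const cached = store.get(key);
  if (cached !== undefined) {
    return { index: cached, fromCache: true };
  }
  const index = build(startUrl);
  store.set(key, index);
  return { index, fromCache: false };
}

// Usage: the second call for the same URL skips the build.
const kv = new Map<string, VectorIndex>();
let buildCount = 0;
const fakeBuild = (url: string): VectorIndex => {
  buildCount += 1;
  return { startUrl: url, vectors: [[0.1, 0.2, 0.3]] };
};
const first = getOrBuildIndex(kv, "https://example.com", fakeBuild);
const second = getOrBuildIndex(kv, "https://example.com", fakeBuild);
// first.fromCache === false, second.fromCache === true, buildCount === 1
```

Keying only on the URL means a website that has changed keeps serving the old index. A production version would likely add an expiry or a way to force a rebuild.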
Before you start
To be able to run this template both locally and on the Apify platform, you need to:
- Have an Apify account and sign in to it using the apify login command in your terminal. Without this, you won't be able to run the required Website Content Crawler Actor to gather the data.
- Have an OpenAI account and an API key. The key is needed both for vectorizing the data and for prompting the OpenAI model.
  - When running locally, store the key as the OPENAI_API_KEY environment variable (https://docs.apify.com/cli/docs/vars#set-up-environment-variables-in-apify-console).
  - When running on the Apify platform, paste the key into the corresponding field in the input UI.
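The two ways of supplying the OpenAI key can be folded into a single lookup. This is a hypothetical helper. The input field name openAIApiKey and the precedence (input first, environment variable second) are assumptions, not the template's documented behavior.

```typescript
// Hypothetical shape of the Actor input; the field name is an assumption.
interface ActorInput {
  openAIApiKey?: string;
}

// Prefer the key pasted into the input UI (platform runs), fall back to the
// OPENAI_API_KEY environment variable (local runs), and fail loudly otherwise.
function resolveOpenAIKey(input: ActorInput, env: Record<string, string | undefined>): string {
  const key = input.openAIApiKey?.trim() || env["OPENAI_API_KEY"]?.trim();
  if (!key) {
    throw new Error(
      "Missing OpenAI API key: set OPENAI_API_KEY locally or fill in the input field on the Apify platform.",
    );
  }
  return key;
}
```

In a Node.js Actor, you would pass the parsed Actor input and process.env. Failing early with a clear message is cheaper than discovering the missing key after the crawl has finished.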
Production use
This serves purely as an example of the whole pipeline.
For production use, we recommend that you:
- Split crawling, data vectorization, and prompting into separate Actors. This way, you can run and scale them independently.
- Replace the local vector store with Pinecone or a similar database. See the LangChain.js docs for more information.
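Swapping the local store for Pinecone is easiest when the pipeline talks to a small interface instead of a concrete class. The VectorStore interface and InMemoryVectorStore class below are hypothetical and kept synchronous for brevity. A real Pinecone-backed implementation would be asynchronous and would use the Pinecone client or the LangChain.js integration, which is not shown here.

```typescript
type Match = { id: string; score: number };

// The rest of the pipeline depends only on this interface, so a hosted store
// (for example a hypothetical PineconeVectorStore class) can replace the local
// one without changes to the crawling or prompting code.
interface VectorStore {
  add(id: string, vector: number[]): void;
  query(vector: number[], k: number): Match[];
}

function cosine(a: number[], b: number[]): number {
  let dotAB = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dotAB += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA > 0 && normB > 0 ? dotAB / Math.sqrt(normA * normB) : 0;
}

// Local implementation: keeps every vector in memory and scores all of them on
// each query (brute force), which is fine for one small site but not at scale.
class InMemoryVectorStore implements VectorStore {
  private readonly entries: { id: string; vector: number[] }[] = [];

  add(id: string, vector: number[]): void {
    this.entries.push({ id, vector });
  }

  query(vector: number[], k: number): Match[] {
    return this.entries
      .map((e) => ({ id: e.id, score: cosine(e.vector, vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k);
  }
}

// Pipeline code written against the interface, not a concrete store.
function topIds(store: VectorStore, vector: number[], k: number): string[] {
  return store.query(vector, k).map((m) => m.id);
}

const local: VectorStore = new InMemoryVectorStore();
local.add("a", [1, 0]);
local.add("b", [0, 1]);
local.add("c", [1, 1]);
const nearest = topIds(local, [1, 0.1], 2); // ["a", "c"]
```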
Resources
- Pinecone integration Actor
- How to use Pinecone with LLMs
- How to use LangChain with OpenAI, Pinecone, and Apify
- Integration with Zapier, Make, Google Drive and others
- Video guide on getting data using Apify API
- A short guide on how to create web scrapers using code templates
Getting started
For complete information, see this article. In short, you will:
- Build the Actor
- Run the Actor
Pull the Actor for local development
If you would like to develop locally, you can pull the existing Actor from Apify Console using the Apify CLI:
- Install apify-cli
  Using Homebrew
  $ brew install apify-cli
  Using NPM
  $ npm -g install apify-cli
- Pull the Actor by its unique <ActorId>, which is one of the following:
  - the unique name of the Actor to pull (e.g. "apify/hello-world")
  - or the ID of the Actor to pull (e.g. "E2jjCZBezvAZnX8Rb")
  You can find both by clicking on the Actor title at the top of the page, which opens a modal containing both the Actor's unique name and its ID.
  This command will copy the Actor into the current directory on your local machine:
  $ apify pull <ActorId>
Documentation reference
To learn more about Apify and Actors, take a look at the following resources: