Pricing

Pay per event

tsboi index

Developed by

Ikenna Chidoka

Indexing for LLMs. This application crawls specified websites, processes their content into a searchable vector database, and enables users to ask natural language questions about the content.

Runs succeeded

>99%

Last modified

3 days ago

SEO tools

LangChain.js template

LangChain is a framework for developing applications powered by language models.

This example template illustrates how to use LangChain.js with Apify to crawl web data, vectorize it, and prompt the OpenAI model, all within a single Apify Actor in slightly over a hundred lines of code.

Included features

- Apify SDK - a toolkit for building Actors
- Input schema - define and easily validate a schema for your Actor's input
- LangChain.js - a framework for developing applications powered by language models
- OpenAI - the provider of the language model used for embedding and prompting

How it works

The code contains the following steps:

- Crawls the given website using the Website Content Crawler Actor.
- Vectorizes the data using the OpenAI API.
- Caches the vector index in the key-value store, so that when you run the Actor for the same website again, the cached data is reused to speed up the run.
- Feeds the data to the OpenAI model using LangChain.js and asks the given query.
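
The caching step above can be sketched as follows. This is a minimal illustration, not the template's actual code: the Map stands in for the Apify key-value store, and `cacheKeyFor`, `getOrBuildIndex`, and the `VectorIndex` shape are hypothetical names chosen for the example.

```typescript
// Hypothetical shape of a cached vector index (an assumption for this sketch).
type VectorIndex = { sourceUrl: string; vectors: number[][] };

// Derive a store key from the start URL. Only letters, digits, and dashes are
// kept, so distinct URLs can occasionally collide (e.g. "a.b" vs "a/b").
function cacheKeyFor(startUrl: string): string {
  const trimmed = startUrl.replace(/\/+$/, "");
  return "vector-index-" + trimmed.replace(/[^A-Za-z0-9]+/g, "-");
}

// Return the cached index when present; otherwise build it once and store it.
function getOrBuildIndex(
  store: Map<string, VectorIndex>, // stand-in for the key-value store
  startUrl: string,
  build: (url: string) => VectorIndex, // stand-in for crawling + vectorizing
): { index: VectorIndex; fromCache: boolean } {
  const key = cacheKeyFor(startUrl);
  const cached = store.get(key);
  if (cached !== undefined) {
    return { index: cached, fromCache: true };
  }
  const index = build(startUrl);
  store.set(key, index);
  return { index, fromCache: false };
}

// Usage: the second run for the same website skips the expensive build.
const indexStore = new Map<string, VectorIndex>();
let buildCount = 0;
const fakeBuild = (url: string): VectorIndex => {
  buildCount += 1;
  return { sourceUrl: url, vectors: [[0.1, 0.2]] };
};
const firstRun = getOrBuildIndex(indexStore, "https://example.com/docs/", fakeBuild);
const secondRun = getOrBuildIndex(indexStore, "https://example.com/docs", fakeBuild);
```

Because the trailing slash is stripped before the key is derived, both calls map to the same entry and only the first one triggers a build. A real implementation would use asynchronous key-value store calls instead of a Map.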

Before you start

To be able to run this template both locally and on the Apify platform, you need to:

- Have an Apify account and sign in to it using the apify login command in your terminal. Without this, you won't be able to run the required Website Content Crawler Actor to gather the data.
- Have an OpenAI account and an API key. The key is needed both for vectorizing the data and for prompting the OpenAI model.
  - When running locally, store the key in the OPENAI_API_KEY environment variable (https://docs.apify.com/cli/docs/vars#set-up-environment-variables-in-apify-console).
  - When running on the Apify platform, you can paste the key into the input field in the input UI.
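
One way to support both setups is to prefer the key from the Actor input and fall back to the environment variable. The sketch below is an assumption about how such a lookup could be written, not the template's actual code: the input field name `openaiApiKey` and the helper `resolveOpenAiKey` are hypothetical, and the environment is passed in as a plain object so the example stays self-contained (in Node.js you would pass `process.env`).

```typescript
// Hypothetical input shape; the real input schema field name may differ.
type ActorInput = { openaiApiKey?: string };
type Env = Record<string, string | undefined>;

// Prefer the key pasted into the input UI, then fall back to OPENAI_API_KEY.
function resolveOpenAiKey(input: ActorInput, env: Env): string {
  const fromInput = input.openaiApiKey?.trim();
  const fromEnv = env["OPENAI_API_KEY"]?.trim();
  const key = fromInput || fromEnv; // empty strings count as "not set"
  if (!key) {
    throw new Error("No OpenAI API key: fill in the input field or set OPENAI_API_KEY.");
  }
  return key;
}

// Usage with placeholder values (not real keys).
const keyOnPlatform = resolveOpenAiKey({ openaiApiKey: "sk-from-input" }, {});
const keyLocally = resolveOpenAiKey({}, { OPENAI_API_KEY: "sk-from-env" });
```

Failing early with a clear message avoids spending a crawl on a run that cannot vectorize or prompt afterwards.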

Production use

This serves purely as an example of the whole pipeline.

For production use, we recommend that you:

- Split crawling, data vectorization, and prompting into separate Actors. This way, you can run and scale each of them independently.
- Replace the local vector store with Pinecone or a similar database. See the LangChain.js docs for more information.
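
Swapping the vector store is easier when the rest of the code depends on a small interface rather than on a concrete store. The following is a minimal sketch under that assumption; `VectorStore`, `InMemoryVectorStore`, and `cosineSimilarity` are hypothetical names, not LangChain.js or Pinecone APIs, and the tiny hand-written vectors stand in for real embeddings.

```typescript
type StoredChunk = { text: string; vector: number[] };

// The rest of the pipeline would depend only on this interface.
interface VectorStore {
  add(chunks: StoredChunk[]): void;
  search(queryVector: number[], topK: number): StoredChunk[];
}

// Cosine similarity of two equal-length vectors; 0 if either has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector lengths differ.");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Local store: keeps everything in memory and scans all chunks on each query.
class InMemoryVectorStore implements VectorStore {
  private readonly chunks: StoredChunk[] = [];

  add(chunks: StoredChunk[]): void {
    this.chunks.push(...chunks);
  }

  search(queryVector: number[], topK: number): StoredChunk[] {
    return this.chunks
      .map((chunk) => ({ chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
      .sort((x, y) => y.score - x.score) // highest similarity first
      .slice(0, topK)
      .map((scored) => scored.chunk);
  }
}

// Usage: callers only see the VectorStore type.
const localStore: VectorStore = new InMemoryVectorStore();
localStore.add([
  { text: "Pricing page", vector: [1, 0, 0] },
  { text: "API reference", vector: [0, 1, 0] },
  { text: "Pricing FAQ", vector: [0.9, 0.1, 0] },
]);
const hits = localStore.search([1, 0, 0], 2);
```

A class backed by Pinecone or a similar database could implement the same interface and replace the in-memory one without changes to the calling code. In practice its methods would be asynchronous, since they involve network calls.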

Resources

- Pinecone integration Actor
- How to use Pinecone with LLMs
- How to use LangChain with OpenAI, Pinecone, and Apify
- Integration with Zapier, Make, Google Drive and others
- Video guide on getting data using Apify API
- A short guide on how to create web scrapers using code templates

Getting started

For complete information, see this article. In short, you will:

- Build the Actor
- Run the Actor

Pull the Actor for local development

If you would like to develop locally, you can pull the existing Actor from Apify Console using the Apify CLI:

Install apify-cli

Using Homebrew

$ brew install apify-cli

Using NPM

$ npm install -g apify-cli

Pull the Actor by its unique <ActorId>, which is one of the following:
- unique name of the Actor to pull (e.g. "apify/hello-world")
- or ID of the Actor to pull (e.g. "E2jjCZBezvAZnX8Rb")
You can find both by clicking the Actor title at the top of the page, which opens a modal showing the Actor's unique name and its ID.

The following command copies the Actor into the current directory on your local machine:
```
$ apify pull <ActorId>
```

Documentation reference

To learn more about Apify and Actors, take a look at the following resources:
