Pricing

$7.00 / 1,000 pages crawled

RAG Browser

This Actor provides essential web browsing and content extraction functionality for AI Agents, LLM applications, and Retrieval-Augmented Generation (RAG) pipelines. It functions similarly to the web search feature in popular LLM chatbots, providing fresh, contextualized data directly from the web.

Developer

Visita Intelligence

Last modified: 2 months ago

🌐 RAG Web Browser

Give your AI agent live web access. This Apify Actor searches Google, scrapes the top result pages, and returns clean Markdown (or plain text / HTML) ready for LLM consumption. Optional chunked output splits content into embedding-ready segments for direct ingestion into vector databases.

Built for OpenAI Assistants, custom GPTs, LangChain, CrewAI, LlamaIndex, and any RAG pipeline that needs real-time web data.

Quick Start

1. Run via Apify API (one-liner)

curl -X POST "https://api.apify.com/v2/acts/YOUR_USERNAME~rag-web-browser/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "latest AI news 2026", "maxResults": 3}'
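Note that the `/runs` endpoint starts a run and responds with the run object, not with the scraped pages; the results land in the run's default dataset. If you want the items back in a single call, the Apify API also exposes a `run-sync-get-dataset-items` endpoint. The sketch below only builds such a request with the standard library and does not send it. The helper name `build_sync_request` is hypothetical, and long runs may exceed the time limit of a synchronous request.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_sync_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that runs the Actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Sending the token as a header keeps it out of URLs and logs.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_sync_request(
    "YOUR_USERNAME~rag-web-browser",
    "YOUR_API_TOKEN",
    {"query": "latest AI news 2026", "maxResults": 3},
)
print(request.get_method(), request.full_url)
# To send it: items = json.load(urllib.request.urlopen(request))
```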

2. Run via Apify Client (Node.js)

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('YOUR_USERNAME/rag-web-browser').call({
    query: 'best practices for RAG pipelines',
    maxResults: 3,
    outputFormats: ['markdown'],
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].markdown);

3. Run via Apify Client (Python)

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("YOUR_USERNAME/rag-web-browser").call(
    run_input={"query": "best practices for RAG pipelines", "maxResults": 3}
)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["markdown"][:500])

Main Features

| Feature | Description |
| --- | --- |
| Real-Time Grounding | Queries Google Search for up-to-date information instead of relying on stale training data. |
| Clean Markdown Output | Strips navigation, ads, modals, and scripts. Returns LLM-ready Markdown. |
| Chunked Output for RAG | Optionally splits each page into overlapping chunks, suitable for embedding into vector DBs. |
| Hybrid Scraping | Fast `raw-http` mode by default; falls back to full Playwright browser for JS-heavy sites. |
| Standby / HTTP Mode | Run as a persistent HTTP service with a `/search` endpoint for real-time queries. |
| MCP Support | Built-in Model Context Protocol server for native AI tool integration. |
| OpenAPI Spec Included | Plug directly into OpenAI custom GPTs as an Action. |
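To illustrate what "overlapping chunks" means, here is a minimal client-side sketch that splits a Markdown string into fixed-size character windows with a shared overlap. It is not the Actor's own chunking logic, which this README does not describe; the function name `chunk_text` and the `chunk_size` and `overlap` defaults are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows where consecutive chunks share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.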

💰 Pay-per-Event (PPE) Pricing

You pay only for pages that are successfully crawled; the Actor run itself incurs no compute unit (CU) charges.

| Event Name | Title | Unit | Price | Description |
| --- | --- | --- | --- | --- |
| `apify-default-dataset-item` | Page crawled | Per page | $0.007 | Charged each time a web page is successfully crawled and its content is extracted. Failed or skipped pages are not charged. |

Example cost: A search with maxResults: 3 that successfully scrapes all 3 pages costs $0.021.
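The same arithmetic as a small helper, useful for budgeting before a batch of searches. It uses only the per-page price from the pricing table; the function name `estimate_cost` is hypothetical, and `Decimal` is used to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

# Price per "Page crawled" event, taken from the pricing table.
PRICE_PER_PAGE = Decimal("0.007")


def estimate_cost(pages_crawled: int) -> Decimal:
    """Return the estimated charge in USD for successfully crawled pages."""
    if pages_crawled < 0:
        raise ValueError("pages_crawled must be non-negative")
    return PRICE_PER_PAGE * pages_crawled


print(estimate_cost(3))     # 0.021
print(estimate_cost(1000))  # 7.000
```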

Cost comparison vs. alternatives:

| Service | Typical cost (3 results) | Clean Markdown | Chunking | Proxy included |
| --- | --- | --- | --- | --- |
| This Actor | ~$0.021 | Yes | Yes | Yes |
| Tavily Search API | ~$0.005 (snippets only) | Partial | No | N/A |
| SerpAPI | ~$0.01 (SERP only) | No | No | Yes |
| Brave Search API | ~$0.005 (snippets only) | No | No | N/A |

⚙️ Input Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | (required) | Google Search keywords or a specific URL to scrape. Supports advanced operators. |
| `maxResults` | integer | `3` | Number of top SERP results to scrape (1 to 100). Ignored when `query` is a URL. |
| `outputFormats` | array | `["markdown"]` | One or more of: `text`, `markdown`, `html`. |
| `scrapingTool` | string | `raw-http` | `raw-http` (fast) or `browser-playwright` (handles JS-heavy sites). |
| `requestTimeoutSecs` | integer | `40` | Max seconds for the entire request. |
| `maxRequestRetries` | integer | `1` | Retries per target page on failure. |
| `removeCookieWarnings` | boolean | `true` | Attempt to dismiss cookie consent dialogs. |
| `debugMode` | boolean | `false` | Include timing/debug info in output. |
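If you build the input programmatically, it can help to apply the documented defaults and catch invalid values before starting a paid run. The sketch below mirrors the parameter table; the helper `build_run_input` is hypothetical, and the Actor may enforce additional rules that are not listed here.

```python
import copy

# Defaults and allowed values copied from the Input Parameters table.
DEFAULTS = {
    "maxResults": 3,
    "outputFormats": ["markdown"],
    "scrapingTool": "raw-http",
    "requestTimeoutSecs": 40,
    "maxRequestRetries": 1,
    "removeCookieWarnings": True,
    "debugMode": False,
}
OUTPUT_FORMATS = {"text", "markdown", "html"}
SCRAPING_TOOLS = {"raw-http", "browser-playwright"}


def build_run_input(query: str, **overrides) -> dict:
    """Merge overrides into the documented defaults and validate the result."""
    if not query or not query.strip():
        raise ValueError("query is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    run_input = {"query": query.strip(), **copy.deepcopy(DEFAULTS), **overrides}

    if not 1 <= run_input["maxResults"] <= 100:
        raise ValueError("maxResults must be between 1 and 100")
    formats = run_input["outputFormats"]
    if not formats or not set(formats) <= OUTPUT_FORMATS:
        raise ValueError(f"outputFormats must be a non-empty subset of {sorted(OUTPUT_FORMATS)}")
    if run_input["scrapingTool"] not in SCRAPING_TOOLS:
        raise ValueError(f"scrapingTool must be one of {sorted(SCRAPING_TOOLS)}")
    return run_input


print(build_run_input("best practices for RAG pipelines", maxResults=5))
```

The returned dictionary can be passed as the run input to any of the client examples in Quick Start.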

📤 Output Format

Each result in the dataset is a JSON object:

{
  "metadata": {
    "url": "https://example.com/article",
    "title": "Example Article Title",
    "description": "Meta description of the page",
    "author": "Jane Doe",
    "languageCode": "en"
  },
  "searchResult": {
    "title": "Example Article Title",
    "description": "Google snippet for this result",
    "url": "https://example.com/article",
    "resultType": "ORGANIC",
    "rank": 1
  },
  "markdown": "# Example Article Title\n\nThe full content of the page in clean Markdown...",
  "text": null,
  "html": null,
  "query": "example search query"
}
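In the example above, `text` and `html` are `null`, presumably because only `markdown` was requested. A consumer that should work regardless of the requested formats can pick the first populated field. This is a minimal sketch; the helper `extract_content` and its preference order are assumptions, not part of the Actor.

```python
from typing import Optional


def extract_content(
    item: dict, preferred: tuple[str, ...] = ("markdown", "text", "html")
) -> Optional[tuple[str, str]]:
    """Return (format_name, content) for the first populated output field, or None."""
    for fmt in preferred:
        content = item.get(fmt)
        if content:  # skips missing keys, null values, and empty strings
            return fmt, content
    return None


# A trimmed-down item with the same shape as the example output.
item = {
    "metadata": {"url": "https://example.com/article", "title": "Example Article Title"},
    "markdown": "# Example Article Title\n\nThe full content of the page...",
    "text": None,
    "html": None,
}

found = extract_content(item)
if found is not None:
    fmt, content = found
    print(fmt, len(content), item["metadata"]["url"])
```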

🔗 Integration Examples

OpenAI Assistants / Custom GPTs

This Actor ships with an .actor/openapi.json you can import directly as a GPT Action:

1. In the GPT editor, go to Configure → Actions → Create new action.
2. Import the schema from .actor/openapi.json.
3. Set the server URL to your Standby endpoint or the Apify API.
4. Your GPT can now call searchWeb to get live search results.

LangChain (Python)

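from langchain_community.utilities import ApifyWrapper
from langchain_core.documents import Document

# ApifyWrapper reads the API token from the APIFY_API_TOKEN environment variable.
apify = ApifyWrapper()
loader = apify.call_actor(
    actor_id="YOUR_USERNAME/rag-web-browser",
    run_input={"query": "LangChain RAG tutorial", "maxResults": 3},
    # The mapping function must return a Document, not a bare string.
    dataset_mapping_function=lambda item: Document(
        page_content=item.get("markdown") or "",
        metadata={"source": item["metadata"]["url"]},
    ),
)
docs = loader.load()
# docs is a list of Document objects ready for your chain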

CrewAI

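from crewai_tools import ApifyActorsTool

# Requires the APIFY_API_TOKEN environment variable.
search_tool = ApifyActorsTool(actor_name="YOUR_USERNAME/rag-web-browser")

# Use search_tool in your CrewAI agent definition, or call it directly:
results = search_tool.run(run_input={"query": "CrewAI agents", "maxResults": 3})
for result in results:
    print(result.get("markdown", "")[:500])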

LlamaIndex

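from llama_index.core import Document
from llama_index.readers.apify import ApifyActor

# ApifyActor takes the API token; the Actor ID is passed to load_data().
reader = ApifyActor("YOUR_API_TOKEN")
documents = reader.load_data(
    actor_id="YOUR_USERNAME/rag-web-browser",
    run_input={"query": "vector database comparison 2026", "maxResults": 5},
    dataset_mapping_function=lambda item: Document(
        text=item.get("markdown") or "",
        metadata={"url": item["metadata"]["url"]},
    ),
)
# Feed documents into your LlamaIndex pipeline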

Direct HTTP (Standby Mode)

When the Actor runs in Standby mode, query it like any REST API:

curl "https://YOUR_STANDBY_URL/search?query=latest+AI+news&maxResults=3"
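When the query comes from user input, it needs to be URL-encoded before it goes into the `/search` query string. A minimal standard-library sketch, assuming the endpoint accepts the same `query` and `maxResults` parameter names as the curl example; the helper `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, max_results: int = 3) -> str:
    """Return a Standby /search URL with safely encoded query parameters."""
    params = urlencode({"query": query, "maxResults": max_results})
    return f"{base_url.rstrip('/')}/search?{params}"


print(build_search_url("https://YOUR_STANDBY_URL", "latest AI news"))
# https://YOUR_STANDBY_URL/search?query=latest+AI+news&maxResults=3
```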

🤖 Use Cases

Ground LLM responses in fresh web data to reduce hallucinations
Build research agents that autonomously gather and synthesize information
Power AI chatbots with real-time search (like ChatGPT's browse feature)
Feed RAG pipelines with up-to-date documents for question answering
Monitor topics by periodically searching and extracting content
Create datasets of clean web content for fine-tuning or evaluation
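As an illustration of the grounding use case, the sketch below turns dataset items into a numbered, source-attributed context block that can be prepended to an LLM prompt. The function `build_grounded_prompt`, the prompt wording, and the `max_chars` limit are assumptions for illustration; only the item fields (`markdown`, `text`, `metadata.url`) come from the Output Format section.

```python
def build_grounded_prompt(question: str, items: list[dict], max_chars: int = 2000) -> str:
    """Assemble a prompt that asks the model to answer from numbered web sources."""
    sources = []
    for item in items:
        content = item.get("markdown") or item.get("text") or ""
        if not content:
            continue  # skip items without usable page content
        url = item.get("metadata", {}).get("url", "unknown source")
        number = len(sources) + 1  # number only the sources actually included
        sources.append(f"[{number}] {url}\n{content[:max_chars]}")

    context = "\n\n".join(sources)
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their bracketed number.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


items = [
    {"metadata": {"url": "https://example.com/a"}, "markdown": "# A\n\nFirst page."},
    {"metadata": {"url": "https://example.com/b"}, "markdown": None, "text": None},
    {"metadata": {"url": "https://example.com/c"}, "markdown": "# C\n\nThird page."},
]
print(build_grounded_prompt("What do the pages say?", items))
```

Truncating each source keeps the prompt within a predictable size; a production pipeline would more likely select chunks by relevance.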

License

ISC
