Pricing

Pay per usage

RAG Web Browser

Developed by

Apify

Web browser for OpenAI Assistants, RAG pipelines, or AI agents, similar to a web browser in ChatGPT. It queries Google Search, scrapes the top N pages, and returns their content as Markdown for further processing by an LLM. It can also scrape individual URLs. Supports Model Context Protocol (MCP).

4.3 (10)

128

Total users: 4.6K

Monthly users: 1.3K

Runs succeeded: >99%

Issues response: 1.2 days

Last modified: 3 months ago

Open source

Search term or URL

query (string, required)

Enter Google Search keywords or the URL of a specific web page. The keywords can include advanced search operators. Examples:

san francisco weather
https://www.cnn.com
function calling site:openai.com

Maximum results

maxResults (integer, optional)

The maximum number of top organic Google Search results whose web pages will be extracted. If query is a URL, then this field is ignored and the Actor only fetches the specific web page.

Default value of this property is 3
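
The interaction between query and maxResults can be sketched on the client side as follows. This is a minimal illustration, not the Actor's own logic: the helper names is_url_query and build_run_input are hypothetical, and the rule "a query counts as a URL when it parses as http(s) with a host" is an assumption, since the page does not say how the Actor detects URLs.

```python
from urllib.parse import urlparse


def is_url_query(query: str) -> bool:
    """Return True if the query looks like an http(s) URL rather than search keywords.

    Assumption: the Actor's real detection rule is not documented on this page.
    """
    parsed = urlparse(query.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def build_run_input(query: str, max_results: int = 3) -> dict:
    """Build an input object, leaving out maxResults for URL queries (it is ignored there)."""
    run_input = {"query": query}
    if not is_url_query(query):
        run_input["maxResults"] = max_results
    return run_input
```

With the three example queries above, only https://www.cnn.com would be treated as a URL; the other two would carry maxResults.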

Output formats

outputFormats (array, optional)

Select one or more formats to which the target web pages will be extracted and saved in the resulting dataset.

Default value of this property is ["markdown"]

Request timeout

requestTimeoutSecs (integer, optional)

The maximum time in seconds available for the request, including querying Google Search and scraping the target web pages. For example, OpenAI allows only 45 seconds for custom actions. If loading and extracting a target page exceeds this timeout, that page is skipped so that at least some results are returned within the timeout. If no page is extracted within the timeout, the whole request fails.

Default value of this property is 40
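
The timeout rule described above can be modelled with a few lines of Python. This is only a sketch of the stated behaviour under simplifying assumptions: the function name pages_within_budget is hypothetical, the durations are made-up inputs, and pages are assumed to load in parallel once the search step has finished.

```python
def pages_within_budget(timeout_secs: float, serp_secs: float, page_secs: dict) -> list:
    """Return the URLs whose load-and-extract time fits in the remaining budget.

    page_secs maps each URL to the seconds it needs. Assumption: all pages
    start loading at the same moment, right after the search step completes.
    """
    remaining = timeout_secs - serp_secs
    kept = [url for url, secs in page_secs.items() if secs <= remaining]
    if not kept:
        # No page finished in time, so the whole request fails.
        raise TimeoutError("no page was extracted within the request timeout")
    return kept
```

For example, with the default timeout of 40 seconds and a 5 second search step, a page needing 10 seconds would be kept and a page needing 50 seconds would be skipped.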

SERP proxy group

serpProxyGroup (enum, optional)

Enables overriding the default Apify Proxy group used for fetching Google Search results.

Value options:

"GOOGLE_SERP": string
"SHADER": string

Default value of this property is "GOOGLE_SERP"

SERP max retries

serpMaxRetries (integer, optional)

The maximum number of times the Actor will retry fetching the Google Search results on error. If the last attempt fails, the entire request fails.

Default value of this property is 2

Proxy configuration

proxyConfiguration (object, optional)

Apify Proxy configuration used for scraping the target web pages.

Default value of this property is {"useApifyProxy":true}

Select a scraping tool

scrapingTool (enum, optional)

Select a scraping tool for extracting the target web pages. The Browser tool is more powerful and can handle JavaScript-heavy websites, while the Plain HTML tool can't handle JavaScript but is about twice as fast.

Value options:

"browser-playwright": string
"raw-http": string

Default value of this property is "raw-http"

Remove HTML elements (CSS selector)

removeElementsCssSelector (string, optional)

A CSS selector matching HTML elements that will be removed from the DOM before converting it to text, Markdown, or saving as HTML. This is useful to skip irrelevant page content. The value must be a valid CSS selector as accepted by the document.querySelectorAll() function.

By default, the Actor removes common navigation elements, headers, footers, modals, scripts, and inline images. You can disable the removal by setting this value to a non-existent CSS selector such as dummy_keep_everything.

Default value of this property is "nav, footer, script, style, noscript, svg, img[src^='data:'],\n[role=\"alert\"],\n[role=\"banner\"],\n[role=\"dialog\"],\n[role=\"alertdialog\"],\n[role=\"region\"][aria-label*=\"skip\" i],\n[aria-modal=\"true\"]"
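
Because setting removeElementsCssSelector replaces the default rather than adding to it, a caller who wants to remove extra elements would need to repeat the default list. A small sketch of how that might be done is shown here; the helper extend_remove_selector and the constant names are hypothetical, and the default string is copied from the value quoted above.

```python
# Default selector as quoted on this page (line breaks preserved).
DEFAULT_REMOVE_SELECTOR = (
    "nav, footer, script, style, noscript, svg, img[src^='data:'],\n"
    '[role="alert"],\n'
    '[role="banner"],\n'
    '[role="dialog"],\n'
    '[role="alertdialog"],\n'
    '[role="region"][aria-label*="skip" i],\n'
    '[aria-modal="true"]'
)

# A selector that matches nothing, which disables the removal entirely.
KEEP_EVERYTHING_SELECTOR = "dummy_keep_everything"


def extend_remove_selector(*extra_selectors: str) -> str:
    """Append extra CSS selectors to the default comma-separated selector list."""
    return ", ".join([DEFAULT_REMOVE_SELECTOR, *extra_selectors])
```

For instance, extend_remove_selector(".newsletter-signup") would keep the default removals and additionally drop elements with that (made-up) class.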

HTML transformer

htmlTransformer (string, optional)

Specify how to transform the HTML to extract meaningful content without any extra fluff, like navigation or modals. The HTML transformation happens after removing and clicking the DOM elements.

None (default) - Only removes the HTML elements specified via 'Remove HTML elements' option.
Readable text - Extracts the main contents of the webpage, without navigation and other fluff.

Default value of this property is "none"

Desired browsing concurrency

desiredConcurrency (integer, optional)

The desired number of web browsers running in parallel. The system automatically scales the number based on the CPU and memory usage. If the initial value is 0, the Actor picks the number automatically based on the available memory.

Default value of this property is 5

Target page max retries

maxRequestRetries (integer, optional)

The maximum number of times the Actor will retry loading the target web page on error. If the last attempt fails, the page will be skipped in the results.

Default value of this property is 1
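
The two retry settings, serpMaxRetries and maxRequestRetries, differ in what happens after the final failure: a failed search aborts the request, while a failed page is only skipped. The sketch below illustrates that difference. All function names are hypothetical, and it assumes that "retries" means extra attempts after the first one (so 2 retries allow up to 3 attempts), which the page does not state explicitly.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(action: Callable[[], T], max_retries: int) -> T:
    """Run action once, then retry up to max_retries more times on any error."""
    last_error: Optional[Exception] = None
    for _attempt in range(max_retries + 1):
        try:
            return action()
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
    raise RuntimeError(f"gave up after {max_retries + 1} attempts") from last_error


def fetch_serp(action: Callable[[], T], serp_max_retries: int = 2) -> T:
    """Search step: if the last attempt fails, the error propagates and the request fails."""
    return call_with_retries(action, serp_max_retries)


def load_page_or_skip(action: Callable[[], T], max_request_retries: int = 1) -> Optional[T]:
    """Target page step: if the last attempt fails, return None so the page is skipped."""
    try:
        return call_with_retries(action, max_request_retries)
    except RuntimeError:
        return None
```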

Target page dynamic content timeout

dynamicContentWaitSecs (integer, optional)

The maximum time in seconds to wait for dynamic page content to load. The Actor considers the web page fully loaded once this time elapses or the network becomes idle, whichever comes first.

Default value of this property is 10

Remove cookie warnings

removeCookieWarnings (boolean, optional)

If enabled, the Actor attempts to close or remove cookie consent dialogs to improve the quality of extracted text. Note that this setting increases latency.

Default value of this property is true

Enable debug mode

debugMode (boolean, optional)

If enabled, the Actor will store debugging information into the resulting dataset under the debug field.

Default value of this property is false
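
Putting the fields together, a complete input object could be assembled along these lines. This is a sketch rather than an official client: DEFAULT_INPUT simply mirrors the default values listed on this page, and build_input with its validation rules is a hypothetical helper. Only the two enum fields whose value options appear above are checked.

```python
import copy

# Defaults as listed on this page; removeElementsCssSelector is left out so the
# Actor applies its own default selector unless the caller overrides it.
DEFAULT_INPUT = {
    "maxResults": 3,
    "outputFormats": ["markdown"],
    "requestTimeoutSecs": 40,
    "serpProxyGroup": "GOOGLE_SERP",
    "serpMaxRetries": 2,
    "proxyConfiguration": {"useApifyProxy": True},
    "scrapingTool": "raw-http",
    "htmlTransformer": "none",
    "desiredConcurrency": 5,
    "maxRequestRetries": 1,
    "dynamicContentWaitSecs": 10,
    "removeCookieWarnings": True,
    "debugMode": False,
}

# Enum fields whose allowed values are spelled out on this page.
ALLOWED_VALUES = {
    "serpProxyGroup": {"GOOGLE_SERP", "SHADER"},
    "scrapingTool": {"browser-playwright", "raw-http"},
}


def build_input(query: str, **overrides) -> dict:
    """Merge overrides into the documented defaults and run basic checks."""
    if not query:
        raise ValueError("query is required")
    known = set(DEFAULT_INPUT) | {"removeElementsCssSelector"}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    for field, allowed in ALLOWED_VALUES.items():
        if field in overrides and overrides[field] not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}")
    # Deep copy so callers cannot mutate the shared defaults by accident.
    return {"query": query, **copy.deepcopy(DEFAULT_INPUT), **overrides}
```

Calling build_input("san francisco weather", scrapingTool="browser-playwright") would yield a dictionary that can be serialized to JSON and supplied as the Actor's input through whichever Apify client or API call you already use.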

Browser Use Apify

lexis-solutions/browser-use-apify

Open-source AI-powered browser automation based on Browser Use and hosted on Apify. Run any task using Apify's platform and LLMs like ChatGPT, Claude, etc. Easy, scalable, resilient, and hosted solution for web-enabled AI agents.

Lexis Solutions

167

5.0

Web Scraper

apify/web-scraper

Crawls arbitrary websites using a web browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.

Apify

90K

4.5

Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

Apify

63K

4.0

Dynamic Web Scraper

josejet/dynamic-web-scraper

Dynamic Web Scraper is an Apify Actor that gathers information online by simulating user browsing behavior on the web. It reduces the time and amount of scraped web pages by using a model (ChatGPT) to make decisions regarding browser navigation and results evaluation.

Pepa J W̚͠h̾̔̎̿͊͛̄͊e̢̦̲̰̦̋̇͗̾̑oi̟͈̯̝̊̉́̇͑̕ğ̆͘͡e͗͛o͊̔̇̄

147

Playwright MCP Server

jiri.spilka/playwright-mcp-server

A Model Context Protocol (MCP) server that provides browser automation capabilities using Playwright.

Jiří Spilka

Actors MCP Server

apify/actors-mcp-server

⚠️ Legacy: This Actor is outdated. For the latest features and full documentation, visit https://mcp.apify.com. Easily connect any Apify Actor to AI agents using Anthropic’s Model Context Protocol (MCP) with our actively maintained MCP server.

Apify

1.9K

4.7

AI Web Scraper - Powered by Crawl4AI

raizen/ai-web-scraper

A blazing-fast AI web scraper powered by Crawl4AI. Perfect for LLMs, AI agents, AI automation, model training, sentiment analysis, and content generation. Supports deep crawling, multiple extraction strategies and flexible output (Markdown/JSON). Seamlessly integrates with Make.com, n8n, and Zapier.

Raizen Technology

162

1.0

AI-Powered Web Content & Link Extractor

scrapercoder/ai-powered-web-content-link-extractor

Crawls websites to extract clean, structured content for AI/LLM use, ideal for training datasets, knowledge bases, and RAG systems. The JSON output includes text (normalized page content) and links (extracted sub-URLs).

wallnut.ai

🔥 FireScrape AI Website Content Markdown Scraper

mohamedgb00714/fireScraper-AI-Website-Content-Markdown-Scraper

Advanced web scraper powered by Crawlee and Puppeteer — extracts website content, converts it to Markdown, and structures it for LLM training datasets.

mohamed el hadi msaid

3.5

Tester MCP Client

jiri.spilka/tester-mcp-client

A Model Context Protocol (MCP) client that connects to any MCP server using Server-Sent Events (SSE) and displays the conversation in a chat-like UI. It is a standalone Actor server designed for testing MCP servers over SSE.

Jiří Spilka

606

4.9