RAG Web Browser

Pricing: Pay per usage

Developed by Apify
Maintained by Apify

Web browser for OpenAI Assistants, RAG pipelines, or AI agents, similar to a web browser in ChatGPT. It queries Google Search, scrapes the top N pages from the results, and returns their cleaned content as Markdown for further processing by an LLM. It can also scrape individual URLs. Supports MCP.

4.4 (11)


75
Monthly users: 847
Runs succeeded: >99%
Response time: 2.6 days
Last modified: 18 days ago


Doesn't seem to remove images from returned Markdown

Open

zognotadog opened this issue 3 months ago

The crawler seems to be outputting image data in the Markdown even though it's meant to strip it. We confirmed this with a few runs on our own site and on apify.com.

Has something gone wrong? The Website Content Crawler does not have this issue.

-------- RAG Web Browser Result ---------------

Apify: Full-stack web scraping and data extraction platformStar apify/crawlee on GitHubRib StichBack ButtonSearch IconFilter Icon Skip to content Contact sales Log in Get started # Your full‑stack platform for web scraping Apify is the largest ecosystem where developers build, deploy, and publish data extraction and web automation tools

-------- Website Content Crawler Result for apify.com ---------------

Full-stack web scraping and data extraction platform Apify is the largest ecosystem where developers build, deploy, and publish data extraction and web automation tools. We call them Actors. TikTok Data Extractor clockworks/free-tiktok-scraper Extract data about videos, users, and channels based on hashtags or...

jiri.spilka

Hi, thank you for using RAG Web Browser.
I appreciate your detailed explanation.

RAG Web Browser uses a slightly different configuration. To keep the settings simple, it outputs the raw page content without transformation, unlike Website Content Crawler, which uses the readableText option. That option can sometimes remove useful content and isn't 100% reliable, so in RAG Web Browser we instead set "htmlTransformer": "none" and let the LLM decide which content is useful.

When I run Website Content Crawler with "htmlTransformer": "none", I get output similar to RAG Web Browser's:

  • RAG Web Browser: run
    "Apify: Full-stack web scraping and data extraction platformStar apify/crawlee on GitHubRib StichBack ButtonSearch IconFilter Icon\n\nSkip to content
  • Website Content Crawler: run
    "Apify: Full-stack web scraping and data extraction platform\n\nSkip to content
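To reproduce this comparison yourself, here is a minimal Python sketch (standard library only) that calls both Actors through the Apify API's synchronous "run and get dataset items" endpoint. It is an illustration under assumptions, not the maintainers' test setup: the Actor IDs `apify~rag-web-browser` and `apify~website-content-crawler`, the Bearer-token header, and the input field names `query`/`maxResults` (RAG Web Browser) and `startUrls`/`htmlTransformer`/`maxCrawlPages` (Website Content Crawler) should all be checked against each Actor's input schema. The helper functions are hypothetical names.

```python
import json
import os
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_rag_input(url: str) -> dict:
    # Assumed RAG Web Browser input: a URL passed as "query" is scraped directly.
    return {"query": url, "maxResults": 1}


def build_wcc_input(url: str, html_transformer: str = "none") -> dict:
    # Website Content Crawler input. "none" skips the readable-text cleanup,
    # which is what RAG Web Browser does. "maxCrawlPages" is an assumed field
    # name used here to keep the crawl to the start page only.
    return {
        "startUrls": [{"url": url}],
        "htmlTransformer": html_transformer,
        "maxCrawlPages": 1,
    }


def run_sync_url(actor_id: str) -> str:
    # Actor IDs use "~" instead of "/" in API paths.
    return f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"


def run_actor(actor_id: str, token: str, actor_input: dict) -> list:
    # Starts the Actor, waits for it to finish, and returns its dataset items.
    request = urllib.request.Request(
        run_sync_url(actor_id),
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=330) as response:
        return json.load(response)


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")
    if token:  # only call the API when a token is configured
        url = "https://apify.com"
        rag_items = run_actor("apify~rag-web-browser", token, build_rag_input(url))
        wcc_items = run_actor("apify~website-content-crawler", token, build_wcc_input(url))
        print(json.dumps(rag_items[:1], indent=2)[:500])
        print(json.dumps(wcc_items[:1], indent=2)[:500])
```

Passing `html_transformer="readableText"` instead should, presumably, bring back the cleaner Website Content Crawler output from the original report, which makes it easy to see how much of the difference comes from that one setting.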

Interestingly, Website Content Crawler still does a bit more processing. Getting both Actors to produce identical output should be possible, but I ran into an issue when testing this and couldn't quickly figure out the cause.

Would you like to use RAG Web Browser with this configuration? If so, I can look into that further.

Apologies for any inconvenience.
Jiri

Pricing

Pricing model: Pay per usage

This Actor is free to use; you pay only for the Apify platform usage it consumes.