# RAG Web Browser (`parseforge/rag-web-browser`) Actor

Give your AI agents real-time web access! Search the web on any topic and get full page content as clean Markdown, ready for LLMs, RAG pipelines, or OpenAI Assistants. Includes titles, descriptions, links, authors, images, and metadata. Start grounding your AI with fresh data in minutes!

- **URL**: https://apify.com/parseforge/rag-web-browser.md
- **Developed by:** [ParseForge](https://apify.com/parseforge) (community)
- **Categories:** AI, Automation
- **Stats:** 4 total users, 3 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

![ParseForge Banner](https://github.com/ParseForge/apify-assets/blob/ad35ccc13ddd068b9d6cba33f323962e39aed5b2/banner.jpg?raw=true)

## 🤖 RAG Web Browser

> 🚀 **Give your LLM live web access in seconds.** Search the web or fetch specific URLs and return **clean Markdown with 17 metadata fields** per page. No API key, no registration, no manual content cleaning.

> 🕒 **Last updated:** 2026-04-24 · **📊 17 fields** per record · **⚡ 10 pages in ~6 seconds** · **🔎 Search + fetch** · **🧠 LLM-optimized output**

The **RAG Web Browser** is built for retrieval-augmented generation pipelines, autonomous agents, and any workflow where an LLM needs grounded, up-to-date web content. Send a search query to get the top N results, or pass a list of URLs to fetch them directly. Every page is stripped of navigation, ads, and boilerplate, then converted to clean Markdown that feeds directly into embedding pipelines and vector databases.

Each record ships with rich metadata including title, description, author, published time, modified time, site name, Open Graph image, language, word count, and estimated reading time. Search results include a `rankFromSearch` field so you can weight retrieval by original engine position. Concurrent fetching keeps 10 URLs flying in parallel, so research agents stay snappy and RAG refreshes finish while your coffee is still hot.

| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| AI engineers, RAG builders, research agent developers, LLM app teams, content researchers, data scientists | Live RAG context, agent web browsing, knowledge base refresh, competitive intelligence, fact-grounding |

---

### 📋 What the RAG Web Browser does

Five content workflows in a single run:

- 🔎 **Search mode.** Pass a text query and get the top N results from DuckDuckGo with clean content for each.
- 🎯 **URL mode.** Provide specific URLs and the scraper fetches them in parallel.
- 📝 **Clean Markdown.** Strips navigation, footers, sidebars, scripts, and ads. Preserves headings, lists, blockquotes, and code blocks.
- 📊 **Rich metadata.** Title, description, author, publishedTime, modifiedTime, siteName, og:image, language, word count, reading time.
- 🏆 **Search rank preserved.** When searching, every result keeps its rank position so you can weight retrieval accordingly.

Output comes as markdown, plain text, or raw HTML. You can also request an outbound-link dump when you need to follow references.
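
One way to use the preserved rank is to blend it with your own similarity score. The sketch below is an assumption, not part of the Actor: the `rank_weight` and `rerank` helpers and the reciprocal-rank formula are illustrative, and only the `rankFromSearch` and `url` field names come from the output.

```python
from typing import Optional


def rank_weight(rank: Optional[int], floor: float = 0.5) -> float:
    """Map a 1-based search rank to a weight in (floor, 1.0].

    Records fetched in URL mode carry rankFromSearch = None and get weight 1.0.
    The reciprocal-rank formula is an illustrative choice, not the Actor's.
    """
    if rank is None or rank < 1:
        return 1.0
    return floor + (1.0 - floor) / rank


def rerank(records: list[dict], similarity: dict[str, float]) -> list[dict]:
    """Sort records by similarity score multiplied by the rank weight."""

    def score(record: dict) -> float:
        base = similarity.get(record["url"], 0.0)
        return base * rank_weight(record.get("rankFromSearch"))

    return sorted(records, key=score, reverse=True)
```

Here `similarity` is a hypothetical mapping from URL to whatever relevance score your retriever produces. Raising `floor` reduces the influence of the search engine's ordering.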

> 💡 **Why it matters:** LLMs trained on data older than six months cannot answer questions about today's news, pricing, or product documentation. This Actor gives them a live window on the web without you having to build browser automation, proxies, or content cleaners.

---

### 🎬 Full Demo

_🚧 Coming soon: a 3-minute walkthrough showing how to wire the output into a RAG stack._

---

### ⚙️ Input

<table>
<thead>
<tr><th>Input</th><th>Type</th><th>Default</th><th>Behavior</th></tr>
</thead>
<tbody>
<tr><td><code>query</code></td><td>string</td><td><code>""</code></td><td>Search query (use this OR startUrls). Engine is DuckDuckGo.</td></tr>
<tr><td><code>startUrls</code></td><td>array of URLs</td><td><code>[]</code></td><td>Specific URLs to fetch (use this OR query).</td></tr>
<tr><td><code>maxResults</code></td><td>integer</td><td><code>10</code></td><td>Search results to fetch when using query mode.</td></tr>
<tr><td><code>maxItems</code></td><td>integer</td><td><code>10</code></td><td>Records returned. Free plan caps at 10, paid plan at 1,000,000.</td></tr>
<tr><td><code>outputFormats</code></td><td>array</td><td><code>["markdown","text"]</code></td><td>Subset of <code>markdown</code>, <code>text</code>, <code>html</code>.</td></tr>
<tr><td><code>includeLinks</code></td><td>boolean</td><td><code>false</code></td><td>Include every outbound link from each page.</td></tr>
</tbody>
</table>

**Example: search mode for Claude pricing research.**

```json
{
    "query": "Anthropic Claude API pricing 2026",
    "maxResults": 10,
    "maxItems": 10,
    "outputFormats": ["markdown", "text"]
}
````

**Example: fetch a list of known URLs for a RAG refresh.**

```json
{
    "startUrls": [
        { "url": "https://docs.apify.com/platform/actors" },
        { "url": "https://docs.apify.com/platform/schedules" },
        { "url": "https://docs.apify.com/api/v2" }
    ],
    "maxItems": 3,
    "outputFormats": ["markdown"],
    "includeLinks": true
}
```
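
If you assemble inputs in code, a small builder can catch mistakes before a run starts. This is a minimal sketch: `build_input` is a hypothetical helper, and treating `query` and `startUrls` as mutually exclusive is a conservative reading of the input table rather than documented Actor behavior.

```python
from typing import Optional, Sequence

ALLOWED_FORMATS = {"markdown", "text", "html"}


def build_input(
    query: Optional[str] = None,
    start_urls: Optional[Sequence[str]] = None,
    max_results: int = 10,
    max_items: int = 10,
    output_formats: Sequence[str] = ("markdown", "text"),
    include_links: bool = False,
) -> dict:
    """Return an input object that uses either `query` or `startUrls`."""
    # Assumption: exactly one of the two modes should be set per run.
    if bool(query) == bool(start_urls):
        raise ValueError("Provide exactly one of query or start_urls")
    unknown = set(output_formats) - ALLOWED_FORMATS
    if unknown:
        raise ValueError(f"Unsupported output formats: {sorted(unknown)}")
    run_input = {
        "maxItems": max_items,
        "outputFormats": list(output_formats),
        "includeLinks": include_links,
    }
    if query:
        run_input["query"] = query
        run_input["maxResults"] = max_results
    else:
        # The Actor expects each start URL wrapped in an object with a `url` key.
        run_input["startUrls"] = [{"url": url} for url in start_urls]
    return run_input
```

Calling `build_input(start_urls=["https://docs.apify.com/api/v2"], output_formats=["markdown"], include_links=True)` produces an object shaped like the URL-mode example above.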

> ⚠️ **Good to Know:** single-page apps with heavy client-side rendering sometimes return thin content because the scraper fetches server-rendered HTML. For JavaScript-heavy sites (Notion, GitBook, some app dashboards), pair this Actor with Website Content Crawler and its browser rendering mode.

***

### 📊 Output

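Each record contains **17 fields** by default. The schema below also lists `html`, `links`, and `error`, which are populated only when requested or when a fetch fails. Download the dataset as CSV, Excel, JSON, or XML.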

#### 🧾 Schema

| Field | Type | Example |
|---|---|---|
| 🔗 `url` | string | `"https://www.anthropic.com/pricing"` |
| 🏷️ `title` | string \| null | `"Claude Pricing \| Anthropic"` |
| 📝 `description` | string \| null | `"Explore pricing for Claude models."` |
| 📃 `markdown` | string | `"## Claude Pricing\n\n### Opus 4.6..."` |
| 💬 `text` | string | `"Claude Pricing Opus 4.6..."` |
| 🧾 `html` | string \| null | raw HTML if requested |
| 🔗 `links` | array \| null | outbound links if requested |
| 🔢 `wordCount` | number | `1240` |
| ⏱️ `readingTimeMinutes` | number | `7` |
| 🌍 `language` | string \| null | `"en"` |
| 🧑 `author` | string \| null | `"Anthropic Team"` |
| 📅 `publishedTime` | ISO 8601 \| null | `"2025-02-24T00:00:00Z"` |
| 🔁 `modifiedTime` | ISO 8601 \| null | `"2025-03-10T00:00:00Z"` |
| 🏢 `siteName` | string \| null | `"Anthropic"` |
| 🖼️ `imageUrl` | string \| null | `"https://.../og.png"` |
| 🏆 `rankFromSearch` | number \| null | `1` |
| 🕒 `fetchedAt` | ISO 8601 | `"2026-04-21T12:00:00.000Z"` |
| 🟢 `httpStatus` | number | `200` |
| ⏱️ `responseTimeMs` | number | `412` |
| ❗ `error` | string \| null | `"Timeout"` on failure |

#### 📦 Sample records

<details>
<summary><strong>📚 Typical documentation page</strong></summary>

```json
{
    "url": "https://docs.apify.com/platform/actors",
    "title": "Actors | Apify Documentation",
    "description": "Learn how Apify Actors package scrapers and automation into reusable tools.",
    "markdown": "## Actors\n\nAn **Actor** is a serverless program that runs on the Apify platform...",
    "text": "Actors An Actor is a serverless program that runs on the Apify platform...",
    "wordCount": 860,
    "readingTimeMinutes": 5,
    "language": "en",
    "author": "Apify",
    "publishedTime": "2024-08-15T00:00:00Z",
    "modifiedTime": null,
    "siteName": "Apify Documentation",
    "imageUrl": "https://docs.apify.com/og-image.png",
    "rankFromSearch": null,
    "httpStatus": 200,
    "responseTimeMs": 210,
    "fetchedAt": "2026-04-21T12:00:00.000Z"
}
```

</details>

<details>
<summary><strong>🎯 Top search result with rank</strong></summary>

```json
{
    "url": "https://lakefs.io/blog/llm-observability-tools/",
    "title": "LLM Observability Tools: 2026 Comparison",
    "description": "A deep comparison of the leading LLM observability platforms.",
    "markdown": "## LLM Observability Tools\n\n### Introduction\n...",
    "text": "LLM Observability Tools Introduction ...",
    "wordCount": 3120,
    "readingTimeMinutes": 16,
    "language": "en",
    "author": "lakeFS Team",
    "publishedTime": "2026-02-18T00:00:00Z",
    "siteName": "lakeFS",
    "imageUrl": "https://lakefs.io/wp-content/og.png",
    "rankFromSearch": 1,
    "httpStatus": 200,
    "responseTimeMs": 512,
    "fetchedAt": "2026-04-21T12:00:00.000Z"
}
```

</details>

<details>
<summary><strong>🚧 Sparse page with minimal metadata</strong></summary>

```json
{
    "url": "https://example-startup.com/",
    "title": "Example Startup",
    "description": null,
    "markdown": "## Example Startup\n\nWelcome.",
    "text": "Example Startup Welcome.",
    "wordCount": 4,
    "readingTimeMinutes": 1,
    "language": null,
    "author": null,
    "publishedTime": null,
    "modifiedTime": null,
    "siteName": null,
    "imageUrl": null,
    "rankFromSearch": 4,
    "httpStatus": 200,
    "responseTimeMs": 180,
    "fetchedAt": "2026-04-21T12:00:00.000Z"
}
```

</details>
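
Records like these usually need to be split before embedding. The following is a rough sketch of one approach, assuming heading-based splitting is acceptable for your corpus: `record_to_chunks` and the `max_chars` limit are hypothetical and not part of the Actor, while the field names come from the schema above.

```python
import re


def record_to_chunks(record: dict, max_chars: int = 1000) -> list[dict]:
    """Split a record's Markdown on headings and attach source metadata to each chunk."""
    markdown = record.get("markdown") or ""
    # Split before every line that starts with 1-6 '#' characters followed by a space.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    metadata = {
        "url": record["url"],
        "title": record.get("title"),
        "publishedTime": record.get("publishedTime"),
        "fetchedAt": record.get("fetchedAt"),
    }
    chunks = []
    for section in sections:
        section = section.strip()
        # Hard-wrap oversized sections so no chunk exceeds max_chars.
        for start in range(0, len(section), max_chars):
            chunks.append({"text": section[start:start + max_chars], **metadata})
    return chunks
```

Keeping `url` and `fetchedAt` on every chunk lets a downstream LLM cite its source and lets you expire stale chunks later.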

***

### ✨ Why choose this Actor

| | Capability |
|---|---|
| 🧠 | **LLM-ready output.** Markdown is clean, deterministic, and free of navigation noise. |
| 🔎 | **Search or fetch.** One input for search, another for direct URLs, same clean output. |
| 📊 | **17 metadata fields.** Enrich retrieval with author, publishedTime, reading time, and rank. |
| ⚡ | **Fast.** 10 pages in about 6 seconds with concurrency of 10. |
| 🔁 | **Repeatable.** The same URL or query always yields records with the same structure. |
| 🚫 | **No authentication.** Works with public URLs and the public DuckDuckGo HTML endpoint. |
| 🔌 | **Integrations.** Drop into LangChain, LlamaIndex, or any tool that can consume JSON records. |

> 📊 Clean markdown from live web context is the fastest way to extend an LLM beyond its training cutoff. This Actor delivers it without browser automation or custom cleaners.

***

### 📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| **⭐ RAG Web Browser** *(this Actor)* | $5 free credit, then pay-per-use | Any public URL | **Live per run** | search + URL list, format picker | ⚡ 2 min |
| Paid live search APIs | $99+/month | Search results only | Real-time | Query only | ⏳ Hours |
| DIY Playwright scrapers | Free | Your code | Your schedule | Whatever you build | 🐢 Days |
| Headless browser cloud | $$$ per hour | Any URL | Live | Custom scripts | 🕒 Variable |

Pick this Actor when you want LLM-ready web context in minutes without cloud browser billing or custom cleaner code.

***

### 🚀 How to use

1. 📝 **Sign up.** [Create a free account with $5 credit](https://console.apify.com/sign-up?fpr=vmoqkp) (takes 2 minutes).
2. 🌐 **Open the Actor.** Go to the RAG Web Browser page on the Apify Store.
3. 🎯 **Pick a mode.** Enter a search query OR a list of URLs, set `maxItems`, and choose output formats.
4. 🚀 **Run it.** Click **Start** and let the Actor collect your content.
5. 📥 **Download.** Grab your results in the **Dataset** tab as CSV, Excel, JSON, or XML.

> ⏱️ Total time from signup to downloaded dataset: **3-5 minutes.** No coding required.

***

### 💼 Business use cases

<table>
<tr>
<td width="50%" valign="top">

#### 🧠 AI Engineering & RAG

- Keep vector databases current with live content
- Build research agents that cite real sources
- Ground chatbots in documentation at run time
- Power context retrieval for customer support

</td>
<td width="50%" valign="top">

#### 📚 Knowledge Management

- Scheduled refresh of internal wikis
- Sync external docs into a searchable index
- Collect press releases and product announcements
- Back up external blog posts with attribution

</td>
</tr>
<tr>
<td width="50%" valign="top">

#### 📊 Competitive Intelligence

- Monitor competitor blogs and pricing pages
- Track announcements across an industry
- Watch for product changes on rival sites
- Feed curated sources into an analyst LLM

</td>
<td width="50%" valign="top">

#### 🧑‍💻 Developer Tooling

- Augment coding assistants with fresh docs
- Add current library release notes to prompts
- Keep CLI help text in sync with docs sites
- Pipe issue trackers into an LLM workflow

</td>
</tr>
</table>

***

### 🔌 Automating RAG Web Browser

Control the scraper programmatically for scheduled runs and pipeline integrations:

- 🟢 **Node.js.** Install the `apify-client` npm package.
- 🐍 **Python.** Use the `apify-client` PyPI package.
- 📚 See the [Apify API documentation](https://docs.apify.com/api/v2) for full details.

The [Apify Schedules feature](https://docs.apify.com/platform/schedules) lets you trigger this Actor on any cron interval. Hourly refreshes keep a RAG pipeline grounded in fresh content.
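
With frequent scheduled runs, most pages will not have changed since the previous run. A possible way to avoid re-embedding them is to fingerprint the Markdown body, as sketched below. The helper names and the in-memory `seen` mapping are assumptions for illustration; in practice you would persist the fingerprints in your own store.

```python
import hashlib


def content_fingerprint(record: dict) -> str:
    """Hash the Markdown body so unchanged pages can be skipped on refresh."""
    body = record.get("markdown") or ""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def changed_records(records: list[dict], seen: dict[str, str]) -> list[dict]:
    """Return records whose content differs from the fingerprint stored in `seen`.

    `seen` maps URL -> fingerprint from earlier runs and is updated in place.
    """
    changed = []
    for record in records:
        fingerprint = content_fingerprint(record)
        if seen.get(record["url"]) != fingerprint:
            seen[record["url"]] = fingerprint
            changed.append(record)
    return changed
```

Only the records returned by `changed_records` need to be re-chunked and re-embedded.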

### 🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

<table>
<tr>
<td width="50%">

#### 🎓 Research and academia

- Empirical datasets for papers, thesis work, and coursework
- Longitudinal studies tracking changes across snapshots
- Reproducible research with cited, versioned data pulls
- Classroom exercises on data analysis and ethical scraping

</td>
<td width="50%">

#### 🎨 Personal and creative

- Side projects, portfolio demos, and indie app launches
- Data visualizations, dashboards, and infographics
- Content research for bloggers, YouTubers, and podcasters
- Hobbyist collections and personal trackers

</td>
</tr>
<tr>
<td width="50%">

#### 🤝 Non-profit and civic

- Transparency reporting and accountability projects
- Advocacy campaigns backed by public-interest data
- Community-run databases for local issues
- Investigative journalism on public records

</td>
<td width="50%">

#### 🧪 Experimentation

- Prototype AI and machine-learning pipelines with real data
- Validate product-market hypotheses before engineering spend
- Train small domain-specific models on niche corpora
- Test dashboard concepts with live input

</td>
</tr>
</table>

***

### 🤖 Ask an AI assistant about this scraper

Open a ready-to-send prompt about this ParseForge Actor in the AI of your choice:

- 💬 [**ChatGPT**](https://chat.openai.com/?q=How%20do%20I%20use%20the%20RAG%20WEB%20BROWSER%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🧠 [**Claude**](https://claude.ai/new?q=How%20do%20I%20use%20the%20RAG%20WEB%20BROWSER%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🔍 [**Perplexity**](https://perplexity.ai/search?q=How%20do%20I%20use%20the%20RAG%20WEB%20BROWSER%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🅒 [**Copilot**](https://copilot.microsoft.com/?q=How%20do%20I%20use%20the%20RAG%20WEB%20BROWSER%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)

***


### ❓ Frequently Asked Questions

#### 🧩 How does it work?

Pass a search query or a list of URLs. For search mode, the Actor queries DuckDuckGo and fetches the top results in parallel. For URL mode, it fetches each URL directly. Every page is cleaned to remove navigation, ads, and boilerplate, then converted to Markdown with metadata.

#### 📏 How accurate is the content extraction?

Very accurate for article and documentation pages. Pages that rely entirely on client-side JavaScript to render content may return thin results; pair with Website Content Crawler in browser mode for those.

#### 🔁 Can I refresh a RAG index on a schedule?

Yes. Apify Schedules lets you run this Actor on any cron interval. Pipe the output into your vector database via webhooks or the Apify API.

#### 🎯 Which search engine does search mode use?

DuckDuckGo HTML, which is reliable and does not require authentication. For Google-specific SERPs, use the Google Search Scraper.

#### ⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval and keep your knowledge base in sync.

#### ⚖️ Is it legal to use for RAG?

Fetching publicly available content is generally fine. Check your target sites' terms of service and robots.txt. Some publishers require attribution or block commercial reuse.

#### 💼 Can I use this commercially?

Yes. Public web content is commonly used for RAG, research, and commercial AI products. Respect copyright and the licensing of each source.

#### 💳 Do I need a paid Apify plan to use this Actor?

No. The free plan covers testing (10 pages per run). A paid plan lifts the limit, speeds up concurrency, and gives you access to Apify residential proxy.

#### 🔁 What happens if a run fails or gets interrupted?

Apify retries transient errors automatically. Partial datasets from failed runs are preserved. Failed URLs include an `error` field so you can filter them downstream.
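
Downstream filtering on the `error` field might look like the sketch below. `split_by_outcome` is a hypothetical helper, and the extra check on `httpStatus` is an assumption layered on top of the documented `error` field.

```python
def split_by_outcome(records: list[dict]) -> tuple[list[dict], list[str]]:
    """Separate usable records from the URLs that should be retried."""
    usable, retry_urls = [], []
    for record in records:
        status = record.get("httpStatus")
        # Assumption: a non-2xx status is treated as a failure even without `error`.
        bad_status = status is not None and not 200 <= status < 300
        if record.get("error") or bad_status:
            retry_urls.append(record["url"])
        else:
            usable.append(record)
    return usable, retry_urls
```

The returned `retry_urls` list can be passed back as `startUrls` in a follow-up run.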

#### 🧾 Does it strip scripts and ads?

Yes. Script, style, noscript, iframe, nav, footer, header, and aside tags are removed before conversion to Markdown.
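
To illustrate the idea (not the Actor's actual implementation, which is not shown here), the sketch below drops the same tags using Python's standard-library HTML parser. It returns plain text rather than Markdown, and `BoilerplateStripper` and `visible_text` are hypothetical names.

```python
from html.parser import HTMLParser

STRIPPED_TAGS = {"script", "style", "noscript", "iframe", "nav", "footer", "header", "aside"}


class BoilerplateStripper(HTMLParser):
    """Collect text that sits outside the tags listed in STRIPPED_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a stripped element
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in STRIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in STRIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside stripped elements.
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data.strip())

    def text(self) -> str:
        return " ".join(self._parts)


def visible_text(html: str) -> str:
    """Return the text content of `html` with boilerplate elements removed."""
    parser = BoilerplateStripper()
    parser.feed(html)
    parser.close()
    return parser.text()
```

A depth counter is used instead of a boolean so that nested stripped elements, such as a `nav` inside a `header`, are handled correctly. An unclosed stripped tag would suppress the rest of the page in this simplified version.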

#### 🔗 Can I also get outbound links?

Yes. Set `includeLinks: true` and each record will include every `<a href>` found on the page with its text label.

#### 🆘 What if I need help?

Our team is available through the Apify platform and the Tally form below.

***

### 🔌 Integrate with any app

RAG Web Browser connects to any cloud service via [Apify integrations](https://apify.com/integrations):

- [**Make**](https://docs.apify.com/platform/integrations/make) - Automate multi-step workflows
- [**Zapier**](https://docs.apify.com/platform/integrations/zapier) - Connect with 5,000+ apps
- [**Slack**](https://docs.apify.com/platform/integrations/slack) - Get run notifications in your channels
- [**Airbyte**](https://docs.apify.com/platform/integrations/airbyte) - Pipe content into your warehouse
- [**GitHub**](https://docs.apify.com/platform/integrations/github) - Trigger runs from commits
- [**Google Drive**](https://docs.apify.com/platform/integrations/drive) - Export content to Sheets or Docs

You can also use webhooks to push freshly fetched Markdown into vector databases and any downstream RAG stack.
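
A webhook receiver typically needs to locate the run's dataset before it can load records. The sketch below assumes the default Apify webhook payload, in which the run object sits under `resource` and carries `defaultDatasetId`; if you use a custom payload template, adjust the lookup. `dataset_items_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(webhook_payload: dict, fmt: str = "json") -> str:
    """Build the dataset items URL for the run described in a webhook payload."""
    # Assumption: default payload template, with the run object under `resource`.
    run = webhook_payload.get("resource") or {}
    dataset_id = run.get("defaultDatasetId")
    if not dataset_id:
        raise ValueError("Payload has no resource.defaultDatasetId")
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode({'format': fmt})}"
```

Fetch the returned URL with your API token to retrieve the Markdown records, then hand them to your embedding step.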

***

### 🔗 Recommended Actors

- [**🕸️ Website Content Crawler**](https://apify.com/parseforge/website-content-crawler) - Deep-crawl a domain with depth and JS rendering
- [**📰 Smart Article Extractor**](https://apify.com/parseforge/article-extractor) - Extract clean article text from news sites
- [**🔍 Google Search Scraper**](https://apify.com/parseforge/google-search-scraper) - SERP results with rank and description
- [**📧 Contact Info Scraper**](https://apify.com/parseforge/contact-info-scraper) - Emails, phones, and socials from URLs
- [**📸 URL Screenshot Tool**](https://apify.com/parseforge/screenshot-url) - Full-page screenshots as PNG, JPEG, or PDF

> 💡 **Pro Tip:** browse the complete [ParseForge collection](https://apify.com/parseforge) for more AI-ready web tools.

***

**🆘 Need Help?** [**Open our contact form**](https://tally.so/r/BzdKgA) to request a new scraper, propose a custom data project, or report an issue.

***

> **⚠️ Disclaimer:** this Actor is an independent tool and is not affiliated with any search engine or website. Only publicly accessible web content is fetched. Respect the robots.txt and terms of service of every site you add to the input.

# Actor input Schema

## `query` (type: `string`):

Web search query. Alternative to Start URLs.

## `startUrls` (type: `array`):

Specific URLs to fetch. Alternative to query.

## `maxResults` (type: `integer`):

Number of search results to fetch (when using query).

## `maxItems` (type: `integer`):

Free users: Limited to 10 items (preview). Paid users: Optional, max 1,000,000

## `outputFormats` (type: `array`):

Which content formats to include. markdown is always included.

## `includeLinks` (type: `boolean`):

Include all outbound links from each page.

## Actor input object example

```json
{
  "query": "Anthropic Claude API pricing",
  "maxResults": 10,
  "maxItems": 10,
  "outputFormats": [
    "markdown",
    "text"
  ]
}
```

# Actor output Schema

## `results` (type: `string`):

Complete dataset

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "query": "Anthropic Claude API pricing",
    "maxResults": 10,
    "maxItems": 10,
    "outputFormats": [
        "markdown",
        "text"
    ],
    "includeLinks": false
};

// Run the Actor and wait for it to finish
const run = await client.actor("parseforge/rag-web-browser").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "query": "Anthropic Claude API pricing",
    "maxResults": 10,
    "maxItems": 10,
    "outputFormats": [
        "markdown",
        "text",
    ],
    "includeLinks": False,
}

# Run the Actor and wait for it to finish
run = client.actor("parseforge/rag-web-browser").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
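
If installing the client library is not an option, the `run-sync-get-dataset-items` endpoint listed in the OpenAPI specification can be called with the standard library alone. This is a minimal sketch under that assumption: `build_request` and `run_and_fetch` are hypothetical helpers, the 300-second timeout is arbitrary, and error handling is omitted.

```python
import json
import urllib.request
from urllib.parse import urlencode

ENDPOINT = "https://api.apify.com/v2/acts/parseforge~rag-web-browser/run-sync-get-dataset-items"


def build_request(token: str, run_input: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the synchronous run endpoint."""
    # The OpenAPI specification passes the token as a query parameter.
    url = f"{ENDPOINT}?{urlencode({'token': token})}"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch(token: str, run_input: dict, timeout: float = 300.0) -> list:
    """Send the request and decode the JSON array of dataset items."""
    request = build_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Because the call blocks until the run finishes, this suits short searches; for long runs the asynchronous client example above is the safer choice.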

## CLI example

```bash
echo '{
  "query": "Anthropic Claude API pricing",
  "maxResults": 10,
  "maxItems": 10,
  "outputFormats": [
    "markdown",
    "text"
  ],
  "includeLinks": false
}' |
apify call parseforge/rag-web-browser --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=parseforge/rag-web-browser",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "RAG Web Browser",
        "description": "Give your AI agents real-time web access! Search the web on any topic and get full page content as clean Markdown, ready for LLMs, RAG pipelines, or OpenAI Assistants. Includes titles, descriptions, links, authors, images, and metadata. Start grounding your AI with fresh data in minutes!",
        "version": "1.0",
        "x-build-id": "ZaPpbHYy92NCBWWY8"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/parseforge~rag-web-browser/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-parseforge-rag-web-browser",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/parseforge~rag-web-browser/runs": {
            "post": {
                "operationId": "runs-sync-parseforge-rag-web-browser",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/parseforge~rag-web-browser/run-sync": {
            "post": {
                "operationId": "run-sync-parseforge-rag-web-browser",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "query": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Web search query. Alternative to Start URLs."
                    },
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "Specific URLs to fetch. Alternative to query.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxResults": {
                        "title": "Max Search Results",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Number of search results to fetch (when using query)."
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Free users: Limited to 10 items (preview). Paid users: Optional, max 1,000,000"
                    },
                    "outputFormats": {
                        "title": "Output Formats",
                        "type": "array",
                        "description": "Which content formats to include. markdown is always included.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "markdown",
                                "text",
                                "html"
                            ]
                        }
                    },
                    "includeLinks": {
                        "title": "Include Links",
                        "type": "boolean",
                        "description": "Include all outbound links from each page."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
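
The `inputSchema` above constrains each input field (`maxResults` between 1 and 100, `maxItems` between 1 and 1,000,000, `outputFormats` limited to `markdown`, `text`, and `html`). As a minimal sketch, these limits could be checked locally before starting a run. The helper name `build_input` is hypothetical and not part of any Apify library. The rule that at least one of `query` or `startUrls` must be present is an assumption drawn from the field descriptions, which call them alternatives.

```python
from typing import Any, Dict, List, Optional

# Enum values copied from outputFormats.items.enum in the schema above.
ALLOWED_FORMATS = {"markdown", "text", "html"}


def build_input(
    query: Optional[str] = None,
    start_urls: Optional[List[str]] = None,
    max_results: Optional[int] = None,
    max_items: Optional[int] = None,
    output_formats: Optional[List[str]] = None,
    include_links: Optional[bool] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: return a dict shaped like inputSchema or raise ValueError."""
    # Assumption: the schema describes query and startUrls as alternatives,
    # so this sketch requires at least one of them.
    if not query and not start_urls:
        raise ValueError("provide either query or start_urls")

    run_input: Dict[str, Any] = {}
    if query:
        run_input["query"] = query
    if start_urls:
        # Each startUrls item is an object with a required "url" key.
        run_input["startUrls"] = [{"url": url} for url in start_urls]
    if max_results is not None:
        if not 1 <= max_results <= 100:
            raise ValueError("maxResults must be between 1 and 100")
        run_input["maxResults"] = max_results
    if max_items is not None:
        if not 1 <= max_items <= 1_000_000:
            raise ValueError("maxItems must be between 1 and 1,000,000")
        run_input["maxItems"] = max_items
    if output_formats is not None:
        unknown = set(output_formats) - ALLOWED_FORMATS
        if unknown:
            raise ValueError(f"unsupported output formats: {sorted(unknown)}")
        run_input["outputFormats"] = list(output_formats)
    if include_links is not None:
        run_input["includeLinks"] = include_links
    return run_input
```

The returned dictionary can then be passed as the run input to whichever client is used to call the Actor. Validating locally only catches the limits visible in the schema; the Actor may apply further checks of its own.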

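The `runsResponseSchema` wraps the run details in a top-level `data` object. The sketch below pulls out the fields a caller most often needs, such as `status` and `defaultDatasetId`. The function name `summarize_run` is hypothetical, and the set of terminal status names is an assumption based on the Apify platform's run lifecycle; the schema itself only shows `READY` as an example value.

```python
from typing import Any, Dict

# Assumption: terminal run statuses on the Apify platform.
# The schema above only lists "READY" as an example status.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def summarize_run(response: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: extract key fields from a runsResponseSchema-shaped dict."""
    data = response.get("data", {})
    status = data.get("status")
    return {
        "run_id": data.get("id"),
        "status": status,
        # True once the run can no longer change state.
        "finished": status in TERMINAL_STATUSES,
        # Results are stored in the run's default dataset.
        "dataset_id": data.get("defaultDatasetId"),
        "total_usd": data.get("usageTotalUsd"),
    }


# Usage with placeholder values (not real identifiers).
sample = {
    "data": {
        "id": "RUN_ID_PLACEHOLDER",
        "status": "READY",
        "defaultDatasetId": "DATASET_ID_PLACEHOLDER",
        "usageTotalUsd": 0.00005,
    }
}
summary = summarize_run(sample)
print(summary["status"], summary["finished"])
```

Using `.get()` throughout keeps the helper tolerant of missing fields, since the schema marks none of the `data` properties as required.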