Browser MCP Actor - RAG Web Browser

Browser automation bridge for AI agents using the Model Context Protocol (MCP) with RAG optimization

The Browser MCP Actor integrates the robust browser automation capabilities of Scrapling (powered by Camoufox) with the Model Context Protocol (MCP), enabling AI agents and language models to perform web scraping, testing, and automation tasks through a standardized interface. This Actor acts as a bridge between AI systems and web browsers, allowing models to navigate websites, extract data, fill forms, and perform complex browser interactions.

NEW: Includes RAG-optimized content extraction, Google Search integration, and intelligent content processing for LLM consumption.

🎯 Key Features

  • MCP Protocol Integration: Facilitates seamless communication between AI agents and web browsers
  • RAG-Optimized Content Extraction: Clean markdown, text, and HTML output perfect for LLM consumption
  • Google Search Integration: Query Google and scrape top results automatically
  • Intelligent Content Processing: Remove navigation, ads, cookie banners; extract readable content
  • Comprehensive Browser Automation: Supports multiple browsers with device emulation and stealth mode
  • Cloudflare Challenge Solving: Automatically bypasses Cloudflare protection when enabled
  • Intelligent Element Detection: Includes retry mechanisms and error handling for robust automation
  • Multiple Output Formats: Markdown, text, and HTML (["markdown", "text", "html"])
  • Session Management: Maintains persistent browser sessions for complex multi-step workflows
  • State Persistence: Automatically saves and restores state during server migrations
  • Proxy Support: Built-in support for Apify proxy and custom proxy configurations

👥 Target Audience

  • AI Developers: Building autonomous agents that need web interaction capabilities
  • QA Engineers: Implementing AI-assisted testing workflows
  • Data Scientists: Requiring intelligent web scraping solutions
  • Businesses: Looking to automate web-based processes through conversational AI interfaces

🚀 Benefits

  • Reduced Development Time: Eliminates the need for custom browser automation code
  • Enhanced Reliability: Features AI-driven error recovery and adaptive element selection
  • Improved Accessibility: Allows non-technical users to describe tasks in natural language
  • Scalable Automation: Handles dynamic websites and complex user workflows with minimal manual intervention

📋 Available Tools

RAG-Optimized Tools

1. RAG Web Browser

The all-in-one tool for RAG pipelines: search Google or scrape a URL, then automatically extract clean content optimized for LLM consumption.

Parameters:

  • query (required): Google Search keywords OR a direct URL to scrape
  • maxResults (optional): Maximum search results to scrape (default: 3, max: 10)
  • outputFormats (optional): Array of formats - text, markdown, html (default: ["markdown"])
  • htmlTransformer (optional): none or readability for main content extraction (default: none)
  • removeElements (optional): Array of CSS selectors for elements to remove
  • removeCookieWarnings (optional): Remove cookie consent dialogs (default: true)
  • solveCloudflare (optional): Solve Cloudflare challenges (default: true)
  • proxy (optional): Proxy URL
  • timeout (optional): Request timeout in milliseconds (default: 40000)

Example - Search and scrape:

{
  "command": "rag_web_browser",
  "arguments": {
    "query": "python async programming best practices",
    "maxResults": 3,
    "outputFormats": ["markdown", "text"],
    "htmlTransformer": "readability",
    "solveCloudflare": true
  }
}

Example - Direct URL:

{
  "command": "rag_web_browser",
  "arguments": {
    "query": "https://docs.python.org/3/library/asyncio.html",
    "outputFormats": ["markdown"]
  }
}

2. Google Search

Search Google and return the top organic results with URLs, titles, and descriptions.

Parameters:

  • query (required): Search query
  • maxResults (optional): Maximum results (default: 5, max: 20)
  • proxy (optional): Proxy URL

Example:

{
  "command": "google_search",
  "arguments": {
    "query": "machine learning tutorials site:medium.com",
    "maxResults": 10
  }
}

3. Extract Content

Extract and process content from an active browser session in RAG-optimized formats.

Parameters:

  • session_id (required): Active browser session ID
  • outputFormats (optional): Array of formats (default: ["markdown"])
  • htmlTransformer (optional): none or readability (default: none)
  • removeElements (optional): CSS selectors to remove
  • includeMetadata (optional): Include page metadata (default: true)

Example:

{
  "command": "extract_content",
  "arguments": {
    "session_id": "abc-123",
    "outputFormats": ["markdown", "text", "html"],
    "htmlTransformer": "readability",
    "includeMetadata": true
  }
}

Browser Automation Tools

4. Navigate

Navigate to a URL with comprehensive configuration options.

Parameters:

  • url (required): Target URL to navigate to
  • headless (optional): Run in headless mode (default: true)
  • solve_cloudflare (optional): Automatically solve Cloudflare challenges (default: false)
  • network_idle (optional): Wait for network to be idle (default: false)
  • wait_selector (optional): CSS selector to wait for before returning
  • timeout (optional): Timeout in milliseconds (default: 30000)
  • proxy (optional): Proxy URL or configuration
  • disable_resources (optional): Disable loading of images, fonts, etc. for speed
  • session_id (optional): Session ID to reuse existing browser session

Example:

{
  "command": "navigate",
  "arguments": {
    "url": "https://example.com",
    "solve_cloudflare": true,
    "headless": true,
    "network_idle": true
  }
}

5. Extract

Extract data from the current page using CSS selectors.

Parameters:

  • session_id (required): Session ID of the browser
  • selector (required): CSS selector to extract data from
  • attribute (optional): Attribute to extract (text, href, src, etc.) - default: text
  • multiple (optional): Extract multiple elements (default: false)
  • format (optional): Output format - json, text, or html (default: text)

Example:

{
  "command": "extract",
  "arguments": {
    "session_id": "abc-123",
    "selector": "h1.title",
    "attribute": "text",
    "format": "json"
  }
}

6. Screenshot

Take a screenshot of the current page or specific element.

Parameters:

  • session_id (required): Session ID of the browser
  • selector (optional): CSS selector of element to screenshot (full page if omitted)
  • format (optional): Image format - png or jpeg (default: png)
  • full_page (optional): Capture full scrollable page (default: false)

Example:

{
  "command": "screenshot",
  "arguments": {
    "session_id": "abc-123",
    "full_page": true,
    "format": "png"
  }
}

7. Click

Click on an element identified by CSS selector.

Parameters:

  • session_id (required): Session ID of the browser
  • selector (required): CSS selector of element to click
  • wait_navigation (optional): Wait for navigation after click (default: false)
  • timeout (optional): Timeout in milliseconds

Example:

{
  "command": "click",
  "arguments": {
    "session_id": "abc-123",
    "selector": "button.submit",
    "wait_navigation": true
  }
}

8. Fill Form

Fill form fields with provided data.

Parameters:

  • session_id (required): Session ID of the browser
  • fields (required): Map of CSS selectors to values to fill

Example:

{
  "command": "fill_form",
  "arguments": {
    "session_id": "abc-123",
    "fields": {
      "input[name='email']": "user@example.com",
      "input[name='password']": "secret123"
    }
  }
}

9. Execute Script

Execute JavaScript code on the current page.

Parameters:

  • session_id (required): Session ID of the browser
  • script (required): JavaScript code to execute
  • args (optional): Arguments to pass to the script

Example:

{
  "command": "execute_script",
  "arguments": {
    "session_id": "abc-123",
    "script": "return document.title"
  }
}

10. Wait

Wait for a specific condition on the page.

Parameters:

  • session_id (required): Session ID of the browser
  • selector (optional): CSS selector to wait for
  • state (optional): State to wait for - attached, detached, visible, hidden (default: attached)
  • timeout (optional): Timeout in milliseconds

Example:

{
  "command": "wait",
  "arguments": {
    "session_id": "abc-123",
    "selector": ".content-loaded",
    "state": "visible",
    "timeout": 5000
  }
}

11. Get Page Info

Get information about the current page (URL, title, cookies, etc.).

Parameters:

  • session_id (required): Session ID of the browser

Example:

{
  "command": "get_page_info",
  "arguments": {
    "session_id": "abc-123"
  }
}

12. Close Session

Close a browser session and free resources.

Parameters:

  • session_id (required): Session ID of the browser to close

Example:

{
  "command": "close_session",
  "arguments": {
    "session_id": "abc-123"
  }
}

🔧 Full Browser Configuration

The Actor supports the full set of browser configuration options when navigating:

| Argument | Description | Optional |
|---|---|---|
| url | Target URL | ❌ |
| headless | Run browser in headless (true) or headful (false) mode | ✔️ |
| disable_resources | Drop unnecessary resources (font, image, media) for speed | ✔️ |
| cookies | Set cookies for the request | ✔️ |
| useragent | Custom user agent string | ✔️ |
| network_idle | Wait until no network connections for 500ms | ✔️ |
| load_dom | Wait for JavaScript to fully load (default: true) | ✔️ |
| timeout | Timeout in milliseconds (default: 30000) | ✔️ |
| wait | Additional wait time after page load | ✔️ |
| wait_selector | Wait for specific CSS selector | ✔️ |
| wait_selector_state | State to wait for selector (default: attached) | ✔️ |
| google_search | Set referer as Google search (default: true) | ✔️ |
| extra_headers | Dictionary of extra HTTP headers | ✔️ |
| proxy | Proxy string or configuration | ✔️ |
| solve_cloudflare | Solve Cloudflare challenges automatically | ✔️ |
| block_webrtc | Force WebRTC to respect proxy settings | ✔️ |
| hide_canvas | Add noise to canvas for fingerprint prevention | ✔️ |
| allow_webgl | Enable WebGL (default: true) | ✔️ |
| real_chrome | Use installed Chrome browser | ✔️ |
| locale | Specify user locale (e.g., en-GB, de-DE) | ✔️ |
| timezone_id | Change browser timezone | ✔️ |

💡 Usage Examples

Example 1: Simple Web Scraping

{
  "command": "navigate",
  "arguments": {
    "url": "https://example.com",
    "wait_selector": "article.post"
  }
}

After navigation, extract titles:

{
  "command": "extract",
  "arguments": {
    "session_id": "<returned-session-id>",
    "selector": "h2.post-title",
    "multiple": true,
    "format": "json"
  }
}

Example 2: Cloudflare-Protected Site

{
  "command": "navigate",
  "arguments": {
    "url": "https://cloudflare-protected-site.com",
    "solve_cloudflare": true,
    "headless": true,
    "network_idle": true
  }
}

Example 3: Form Automation

Navigate, fill, and submit a form:

{
  "command": "navigate",
  "arguments": {
    "url": "https://example.com/login"
  }
}

Then fill and submit:

{
  "command": "fill_form",
  "arguments": {
    "session_id": "<session-id>",
    "fields": {
      "#username": "myuser",
      "#password": "mypass"
    }
  }
}
{
  "command": "click",
  "arguments": {
    "session_id": "<session-id>",
    "selector": "button[type='submit']",
    "wait_navigation": true
  }
}

Example 4: Using Apify Proxy

{
  "command": "navigate",
  "arguments": {
    "url": "https://example.com"
  },
  "proxyConfig": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

πŸƒ Running the Actor

On Apify Platform

  1. Create a new Actor
  2. Upload this code
  3. Configure input in JSON format
  4. Run the Actor

Locally

$ apify run

Input Schema

{
  "command": "navigate",
  "arguments": {
    "url": "https://example.com",
    "solve_cloudflare": true
  },
  "proxyConfig": {
    "useApifyProxy": true
  }
}
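
Programmatic runs are also possible. Below is a minimal sketch that starts a run with the input above using the Apify Python client (apify-client); the Actor ID <username>/browser-mcp and the token are placeholders, not the published identifiers.

from apify_client import ApifyClient

# Placeholders: substitute your own API token and the Actor's real ID from the Store.
client = ApifyClient("<YOUR_APIFY_TOKEN>")

run_input = {
    "command": "navigate",
    "arguments": {
        "url": "https://example.com",
        "solve_cloudflare": True,
    },
    "proxyConfig": {"useApifyProxy": True},
}

# call() starts the run and waits for it to finish.
run = client.actor("<username>/browser-mcp").call(run_input=run_input)
print(run["status"], run["defaultDatasetId"])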

MCP Server Mode

If no command is specified, the Actor runs in MCP server mode, accepting commands via stdio:

$ python -m src.main
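
For local experiments, an MCP client can connect to this stdio server. The sketch below uses the official MCP Python SDK (the mcp package) and assumes the server exposes tools under the command names listed above; treat it as an illustration rather than a tested client.

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Spawn the Actor's MCP server as a child process speaking stdio.
    server = StdioServerParameters(command="python", args=["-m", "src.main"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("Available tools:", [tool.name for tool in tools.tools])
            # Open a page; the response should include a session_id that later
            # tool calls (extract, click, screenshot, ...) can reuse.
            result = await session.call_tool("navigate", arguments={"url": "https://example.com"})
            print(result.content)

asyncio.run(main())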

📊 Output

The Actor outputs structured data to the Apify dataset:

{
  "command": "extract",
  "arguments": {
    "session_id": "abc-123",
    "selector": "h1"
  },
  "result": [
    {
      "data": "Example Title",
      "selector": "h1"
    }
  ],
  "status": "success"
}
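
The Apify Python client can read these records back from the run's default dataset. A minimal sketch, with the token and dataset ID left as placeholders:

from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# The dataset ID comes from the finished run (run["defaultDatasetId"] in the
# earlier sketch); a literal dataset ID works just as well.
for item in client.dataset("<DATASET_ID>").iterate_items():
    if item.get("status") == "success":
        print(item["command"], item.get("result"))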

🛠️ Development

Install Dependencies

$ pip install .

Project Structure

browser-mcp/
├── src/
│   ├── __init__.py
│   ├── __main__.py
│   ├── main.py                # Main entry point
│   ├── mcp_server.py          # MCP protocol implementation
│   └── browser_session.py     # Browser session management
├── pyproject.toml
└── README.md

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License.

πŸ› Troubleshooting

Common Issues

  1. Cloudflare not solving: Ensure solve_cloudflare: true is set in arguments
  2. Session not found: Always use the returned session_id from navigate command
  3. Timeout errors: Increase the timeout value or use network_idle: true
  4. Element not found: Verify selectors and use wait_selector to ensure page is loaded
  5. Sessions lost after migration: This is expected behavior. Browser sessions cannot be fully restored after server migration. Clients should re-create sessions when they receive migration notifications.

State Persistence & Migrations

The Actor implements automatic state persistence to handle Apify server migrations:

How it works:

  • When a migration event is detected, the Actor saves the current state (active session IDs and metadata)
  • The Actor then reboots automatically to speed up the migration process
  • On restart, the Actor checks for previously saved state

Important notes:

  • Browser sessions cannot be fully restored - Active browser contexts and page states are lost during migration
  • Session IDs are preserved for tracking purposes, but the underlying browser instances must be recreated
  • Clients should handle reconnection - After a migration, clients need to create new sessions via the navigate command
  • For long-running operations - Consider checkpointing your workflow at logical points

Migration frequency:

  • Migrations can occur at any time due to server maintenance, load balancing, or crashes
  • The Actor is optimized to complete migrations within seconds

Best practices:

  • Design workflows to be resumable
  • Store extracted data frequently using Actor.push_data()
  • For critical operations, implement retry logic in your client code (a sketch follows below)
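
To make the reconnection and retry advice above concrete, here is a rough client-side sketch. send_command is a hypothetical helper standing in for however your client dispatches commands to the Actor (Apify API call, MCP tool call, etc.), and the session_id / error fields reflect assumptions about the response shape rather than a guaranteed contract.

def send_command(command: str, arguments: dict) -> dict:
    """Hypothetical helper: dispatch one command to the Actor and return its result."""
    raise NotImplementedError  # replace with your real transport

def run_with_session_retry(url: str, command: str, arguments: dict, retries: int = 2) -> dict:
    """Re-create the browser session and retry a command if the session was lost."""
    # Assumption: navigate returns the new session ID under a "session_id" key.
    session_id = send_command("navigate", {"url": url})["session_id"]
    result: dict = {}
    for _ in range(retries + 1):
        result = send_command(command, {**arguments, "session_id": session_id})
        # Assumption: a lost session surfaces as an "error" field mentioning the session.
        if "session" not in str(result.get("error", "")).lower():
            return result
        # Session gone (e.g. after a migration): navigate again for a fresh session_id.
        session_id = send_command("navigate", {"url": url})["session_id"]
    return result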

📞 Support

For issues and questions:

  • Open an issue on GitHub
  • Contact via Apify platform
  • Check the Scrapling/Camoufox documentation for browser-specific issues

Usage

  1. Configure the command, arguments, and proxy settings in the Actor input.
  2. Run the Actor to execute the command or start the MCP server.
  3. Review the results in the output dataset.