Browser MCP Actor - RAG Web Browser
Browser automation bridge for AI agents using the Model Context Protocol (MCP) with RAG optimization
The Browser MCP Actor integrates the robust browser automation capabilities of Scrapling (powered by Camoufox) with the Model Context Protocol (MCP), enabling AI agents and language models to perform web scraping, testing, and automation tasks through a standardized interface. This Actor acts as a bridge between AI systems and web browsers, allowing models to navigate websites, extract data, fill forms, and perform complex browser interactions.
NEW: Includes RAG-optimized content extraction, Google Search integration, and intelligent content processing for LLM consumption.
Key Features
- MCP Protocol Integration: Facilitates seamless communication between AI agents and web browsers
- RAG-Optimized Content Extraction: Clean markdown, text, and HTML output perfect for LLM consumption
- Google Search Integration: Query Google and scrape top results automatically
- Intelligent Content Processing: Remove navigation, ads, cookie banners; extract readable content
- Comprehensive Browser Automation: Supports multiple browsers with device emulation and stealth mode
- Cloudflare Challenge Solving: Automatically bypasses Cloudflare protection when enabled
- Intelligent Element Detection: Includes retry mechanisms and error handling for robust automation
- Multiple Output Formats: Text, Markdown, and HTML formats (["markdown", "text", "html"])
- Session Management: Maintains persistent browser sessions for complex multi-step workflows
- State Persistence: Automatically saves and restores state during server migrations
- Proxy Support: Built-in support for Apify proxy and custom proxy configurations
Target Audience
- AI Developers: Building autonomous agents that need web interaction capabilities
- QA Engineers: Implementing AI-assisted testing workflows
- Data Scientists: Requiring intelligent web scraping solutions
- Businesses: Looking to automate web-based processes through conversational AI interfaces
Benefits
- Reduced Development Time: Eliminates the need for custom browser automation code
- Enhanced Reliability: Features AI-driven error recovery and adaptive element selection
- Improved Accessibility: Allows non-technical users to describe tasks in natural language
- Scalable Automation: Handles dynamic websites and complex user workflows with minimal manual intervention
Available Tools
RAG-Optimized Tools
1. RAG Web Browser (Recommended)
The all-in-one tool for RAG pipelines: Search Google or scrape a URL, automatically extract clean content optimized for LLM consumption.
Parameters:
- `query` (required): Google Search keywords OR a direct URL to scrape
- `maxResults` (optional): Maximum search results to scrape (default: 3, max: 10)
- `outputFormats` (optional): Array of formats - `text`, `markdown`, `html` (default: `["markdown"]`)
- `htmlTransformer` (optional): `none` or `readability` for main content extraction (default: `none`)
- `removeElements` (optional): Array of CSS selectors for elements to remove
- `removeCookieWarnings` (optional): Remove cookie consent dialogs (default: `true`)
- `solveCloudflare` (optional): Solve Cloudflare challenges (default: `true`)
- `proxy` (optional): Proxy URL
- `timeout` (optional): Request timeout in milliseconds (default: `40000`)
Example - Search and scrape:
```json
{
  "command": "rag_web_browser",
  "arguments": {
    "query": "python async programming best practices",
    "maxResults": 3,
    "outputFormats": ["markdown", "text"],
    "htmlTransformer": "readability",
    "solveCloudflare": true
  }
}
```
Example - Direct URL:
```json
{
  "command": "rag_web_browser",
  "arguments": {
    "query": "https://docs.python.org/3/library/asyncio.html",
    "outputFormats": ["markdown"]
  }
}
```
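The same input can be sent programmatically through the Apify API. The snippet below is a minimal sketch using the `apify-client` Python package; the token and Actor ID are placeholders, and the assumption that results land in the run's default dataset follows the Output section later in this README.

```python
# Minimal sketch: run the Actor with a rag_web_browser command via the Apify API.
# Requires `pip install apify-client`; the token and Actor ID below are placeholders.
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

run_input = {
    "command": "rag_web_browser",
    "arguments": {
        "query": "python async programming best practices",
        "maxResults": 3,
        "outputFormats": ["markdown"],
        "htmlTransformer": "readability",
    },
}

# Start the run and wait for it to finish.
run = client.actor("<ACTOR_ID>").call(run_input=run_input)

# Structured results are pushed to the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).list_items().items:
    print(item.get("status"), list(item.keys()))
```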
2. Google Search
Search Google and return top organic results with URLs, titles, and descriptions.
Parameters:
- `query` (required): Search query
- `maxResults` (optional): Maximum results (default: 5, max: 20)
- `proxy` (optional): Proxy URL
Example:
```json
{
  "command": "google_search",
  "arguments": {
    "query": "machine learning tutorials site:medium.com",
    "maxResults": 10
  }
}
```
3. Extract Content
Extract and process content from a current browser session in RAG-optimized formats.
Parameters:
- `session_id` (required): Active browser session ID
- `outputFormats` (optional): Array of formats (default: `["markdown"]`)
- `htmlTransformer` (optional): `none` or `readability` (default: `none`)
- `removeElements` (optional): CSS selectors to remove
- `includeMetadata` (optional): Include page metadata (default: `true`)
Example:
```json
{
  "command": "extract_content",
  "arguments": {
    "session_id": "abc-123",
    "outputFormats": ["markdown", "text", "html"],
    "htmlTransformer": "readability",
    "includeMetadata": true
  }
}
```
Browser Automation Tools
4. Navigate
Navigate to a URL with comprehensive configuration options.
Parameters:
- `url` (required): Target URL to navigate to
- `headless` (optional): Run in headless mode (default: `true`)
- `solve_cloudflare` (optional): Automatically solve Cloudflare challenges (default: `false`)
- `network_idle` (optional): Wait for the network to be idle (default: `false`)
- `wait_selector` (optional): CSS selector to wait for before returning
- `timeout` (optional): Timeout in milliseconds (default: `30000`)
- `proxy` (optional): Proxy URL or configuration
- `disable_resources` (optional): Disable loading of images, fonts, etc. for speed
- `session_id` (optional): Session ID to reuse an existing browser session
Example:
```json
{
  "command": "navigate",
  "arguments": {
    "url": "https://example.com",
    "solve_cloudflare": true,
    "headless": true,
    "network_idle": true
  }
}
```
5. Extract
Extract data from the current page using CSS selectors.
Parameters:
- `session_id` (required): Session ID of the browser
- `selector` (required): CSS selector to extract data from
- `attribute` (optional): Attribute to extract (text, href, src, etc.) - default: `text`
- `multiple` (optional): Extract multiple elements (default: `false`)
- `format` (optional): Output format - `json`, `text`, or `html` (default: `text`)
Example:
```json
{
  "command": "extract",
  "arguments": {
    "session_id": "abc-123",
    "selector": "h1.title",
    "attribute": "text",
    "format": "json"
  }
}
```
6. Screenshot
Take a screenshot of the current page or specific element.
Parameters:
- `session_id` (required): Session ID of the browser
- `selector` (optional): CSS selector of the element to screenshot (full page if omitted)
- `format` (optional): Image format - `png` or `jpeg` (default: `png`)
- `full_page` (optional): Capture the full scrollable page (default: `false`)
Example:
```json
{
  "command": "screenshot",
  "arguments": {
    "session_id": "abc-123",
    "full_page": true,
    "format": "png"
  }
}
```
7. Click
Click on an element identified by CSS selector.
Parameters:
- `session_id` (required): Session ID of the browser
- `selector` (required): CSS selector of the element to click
- `wait_navigation` (optional): Wait for navigation after the click (default: `false`)
- `timeout` (optional): Timeout in milliseconds
Example:
```json
{
  "command": "click",
  "arguments": {
    "session_id": "abc-123",
    "selector": "button.submit",
    "wait_navigation": true
  }
}
```
8. Fill Form
Fill form fields with provided data.
Parameters:
- `session_id` (required): Session ID of the browser
- `fields` (required): Map of CSS selectors to values to fill
Example:
```json
{
  "command": "fill_form",
  "arguments": {
    "session_id": "abc-123",
    "fields": {
      "input[name='email']": "user@example.com",
      "input[name='password']": "secret123"
    }
  }
}
```
9. Execute Script
Execute JavaScript code on the current page.
Parameters:
- `session_id` (required): Session ID of the browser
- `script` (required): JavaScript code to execute
- `args` (optional): Arguments to pass to the script
Example:
```json
{
  "command": "execute_script",
  "arguments": {
    "session_id": "abc-123",
    "script": "return document.title"
  }
}
```
10. Wait
Wait for a specific condition on the page.
Parameters:
- `session_id` (required): Session ID of the browser
- `selector` (optional): CSS selector to wait for
- `state` (optional): State to wait for - `attached`, `detached`, `visible`, `hidden` (default: `attached`)
- `timeout` (optional): Timeout in milliseconds
Example:
```json
{
  "command": "wait",
  "arguments": {
    "session_id": "abc-123",
    "selector": ".content-loaded",
    "state": "visible",
    "timeout": 5000
  }
}
```
11. Get Page Info
Get information about the current page (URL, title, cookies, etc.).
Parameters:
- `session_id` (required): Session ID of the browser
Example:
```json
{
  "command": "get_page_info",
  "arguments": {
    "session_id": "abc-123"
  }
}
```
12. Close Session
Close a browser session and free resources.
Parameters:
- `session_id` (required): Session ID of the browser to close
Example:
```json
{
  "command": "close_session",
  "arguments": {
    "session_id": "abc-123"
  }
}
```
Full Browser Configuration
When navigating, the Actor supports the full set of underlying browser configuration options (a combined example follows the table):
| Argument | Description | Optional |
|---|---|---|
| url | Target URL | ❌ |
| headless | Run browser in headless (true) or headful (false) mode | ✔️ |
| disable_resources | Drop unnecessary resources (fonts, images, media) for speed | ✔️ |
| cookies | Set cookies for the request | ✔️ |
| useragent | Custom user agent string | ✔️ |
| network_idle | Wait until there are no network connections for 500 ms | ✔️ |
| load_dom | Wait for JavaScript to fully load (default: true) | ✔️ |
| timeout | Timeout in milliseconds (default: 30000) | ✔️ |
| wait | Additional wait time after page load | ✔️ |
| wait_selector | Wait for a specific CSS selector | ✔️ |
| wait_selector_state | State to wait for on the selector (default: attached) | ✔️ |
| google_search | Set referer as a Google search (default: true) | ✔️ |
| extra_headers | Dictionary of extra HTTP headers | ✔️ |
| proxy | Proxy string or configuration | ✔️ |
| solve_cloudflare | Solve Cloudflare challenges automatically | ✔️ |
| block_webrtc | Force WebRTC to respect proxy settings | ✔️ |
| hide_canvas | Add noise to canvas for fingerprint prevention | ✔️ |
| allow_webgl | Enable WebGL (default: true) | ✔️ |
| real_chrome | Use the installed Chrome browser | ✔️ |
| locale | Specify user locale (e.g., en-GB, de-DE) | ✔️ |
| timezone_id | Change the browser timezone | ✔️ |
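To illustrate how these options combine, the dict below sketches a single navigate input that trims heavy resources, waits for a selector, and pins the locale. All values are placeholders chosen for the example, not defaults recommended by this README.

```python
# Illustrative navigate input combining several options from the table above.
# Values are placeholders; adjust them to the target site.
navigate_input = {
    "command": "navigate",
    "arguments": {
        "url": "https://example.com",
        "headless": True,
        "disable_resources": True,        # skip fonts/images/media for speed
        "network_idle": True,             # wait until the network goes quiet
        "wait_selector": "main.content",  # assumed selector for the example
        "wait_selector_state": "visible",
        "timeout": 60000,
        "solve_cloudflare": True,
        "locale": "en-GB",
    },
}
```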
Usage Examples
Example 1: Simple Web Scraping
```json
{
  "command": "navigate",
  "arguments": {
    "url": "https://example.com",
    "wait_selector": "article.post"
  }
}
```
After navigation, extract titles:
```json
{
  "command": "extract",
  "arguments": {
    "session_id": "<returned-session-id>",
    "selector": "h2.post-title",
    "multiple": true,
    "format": "json"
  }
}
```
Example 2: Cloudflare-Protected Site
```json
{
  "command": "navigate",
  "arguments": {
    "url": "https://cloudflare-protected-site.com",
    "solve_cloudflare": true,
    "headless": true,
    "network_idle": true
  }
}
```
Example 3: Form Automation
Navigate, fill, and submit a form:
```json
{
  "command": "navigate",
  "arguments": {
    "url": "https://example.com/login"
  }
}
```
Then fill and submit:
```json
{
  "command": "fill_form",
  "arguments": {
    "session_id": "<session-id>",
    "fields": {
      "#username": "myuser",
      "#password": "mypass"
    }
  }
}
```
```json
{
  "command": "click",
  "arguments": {
    "session_id": "<session-id>",
    "selector": "button[type='submit']",
    "wait_navigation": true
  }
}
```
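In client code, these three steps are mainly about threading the session_id returned by navigate into the later commands. The sketch below assumes a hypothetical `send_command()` helper that submits one command to the Actor (over MCP or the Apify API) and returns its result dict; the `session_id` field name in the navigate response is also an assumption to verify against the actual output.

```python
# Hypothetical helper: submit one command to the Browser MCP Actor and return its result.
# How it transports the command (MCP stdio, Apify API, ...) is left to your client.
def send_command(command: str, arguments: dict) -> dict:
    raise NotImplementedError("wire this up to your MCP client or Apify run")

# 1. Navigate and capture the session ID returned by the Actor.
nav = send_command("navigate", {"url": "https://example.com/login"})
session_id = nav["session_id"]  # assumed field name; verify against the navigate output

# 2. Fill the login form inside the same browser session.
send_command("fill_form", {
    "session_id": session_id,
    "fields": {"#username": "myuser", "#password": "mypass"},
})

# 3. Submit and wait for the post-login navigation.
send_command("click", {
    "session_id": session_id,
    "selector": "button[type='submit']",
    "wait_navigation": True,
})
```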
Example 4: Using Apify Proxy
```json
{
  "command": "navigate",
  "arguments": {
    "url": "https://example.com"
  },
  "proxyConfig": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```
Running the Actor
On Apify Platform
- Create a new Actor
- Upload this code
- Configure input in JSON format
- Run the Actor
Locally
```bash
apify run
```
Input Schema
```json
{
  "command": "navigate",
  "arguments": {
    "url": "https://example.com",
    "solve_cloudflare": true
  },
  "proxyConfig": {
    "useApifyProxy": true
  }
}
```
MCP Server Mode
If no command is specified, the Actor runs in MCP server mode, accepting commands via stdio:
```bash
python -m src.main
```
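A client can drive this stdio server with the official MCP Python SDK. The snippet below is a minimal sketch assuming the `mcp` package is installed and the command is run from the project root so that `python -m src.main` resolves; the tool name and arguments mirror the navigate example above.

```python
# Minimal sketch of an MCP client driving the Actor over stdio (assumes `pip install mcp`).
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    server = StdioServerParameters(command="python", args=["-m", "src.main"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Call the navigate tool; other tools (extract, click, ...) work the same way.
            result = await session.call_tool(
                "navigate",
                {"url": "https://example.com", "headless": True},
            )
            print(result)


asyncio.run(main())
```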
Output
The Actor outputs structured data to the Apify dataset:
```json
{
  "command": "extract",
  "arguments": {
    "session_id": "abc-123",
    "selector": "h1"
  },
  "result": [
    {
      "data": "Example Title",
      "selector": "h1"
    }
  ],
  "status": "success"
}
```
Development
Install Dependencies
```bash
pip install .
```
Project Structure
```
browser-mcp/
├── src/
│   ├── __init__.py
│   ├── __main__.py
│   ├── main.py              # Main entry point
│   ├── mcp_server.py        # MCP protocol implementation
│   └── browser_session.py   # Browser session management
├── pyproject.toml
└── README.md
```
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License.
Troubleshooting
Common Issues
- Cloudflare not solving: Ensure `solve_cloudflare: true` is set in arguments
- Session not found: Always use the `session_id` returned by the navigate command
- Timeout errors: Increase the `timeout` value or use `network_idle: true`
- Element not found: Verify selectors and use `wait_selector` to ensure the page is loaded
- Sessions lost after migration: This is expected behavior. Browser sessions cannot be fully restored after a server migration; clients should re-create sessions when they receive migration notifications.
State Persistence & Migrations
The Actor implements automatic state persistence to handle Apify server migrations:
How it works (see the sketch after this list):
- When a migration event is detected, the Actor saves the current state (active session IDs and metadata)
- The Actor then reboots automatically to speed up the migration process
- On restart, the Actor checks for previously saved state
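The save/reboot/restore flow above can be expressed with the Apify Python SDK roughly as follows. This is an illustrative sketch, not the Actor's actual implementation; it assumes your SDK version exposes the `Actor.on` / `Event.MIGRATING` hook along with the key-value store helpers, and the state key name is made up for the example.

```python
# Illustrative migration-aware persistence with the Apify Python SDK (not the Actor's real code).
from apify import Actor, Event  # assumes an SDK version that exports Event

STATE_KEY = "BROWSER_MCP_STATE"  # hypothetical key-value store key


async def main() -> None:
    async with Actor:
        # Restore any state saved before a previous migration.
        state = await Actor.get_value(STATE_KEY) or {"sessions": {}}

        async def on_migrating(event_data) -> None:
            # Save session IDs and metadata, then reboot to speed up the migration.
            await Actor.set_value(STATE_KEY, state)
            await Actor.reboot()

        Actor.on(Event.MIGRATING, on_migrating)

        # ... start the MCP server and update `state` as sessions are created/closed ...
```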
Important notes:
- Browser sessions cannot be fully restored: Active browser contexts and page states are lost during migration
- Session IDs are preserved for tracking purposes, but the underlying browser instances must be recreated
- Clients should handle reconnection: After a migration, clients need to create new sessions via the `navigate` command
- For long-running operations: Consider checkpointing your workflow at logical points
Migration frequency:
- Migrations can occur at any time due to server maintenance, load balancing, or crashes
- The Actor is optimized to complete migrations within seconds
Best practices:
- Design workflows to be resumable
- Store extracted data frequently using `Actor.push_data()`
- For critical operations, implement retry logic in your client code (see the sketch below)
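For the client-side retry bullet, a minimal pattern (building on the hypothetical `send_command()` helper sketched earlier) is to re-create the session with a fresh navigate call and retry once; the error check below is an assumption about how a lost session surfaces, so adapt it to the Actor's real responses.

```python
# Illustrative retry: if the session was lost (e.g. after a migration), re-create it and retry once.
# `send_command` is the hypothetical helper from the form-automation sketch; the
# "session" error check is an assumption, not taken from this README.
def run_with_session_retry(command: str, arguments: dict, start_url: str) -> dict:
    try:
        return send_command(command, arguments)
    except Exception as err:
        if "session" not in str(err).lower():
            raise
        nav = send_command("navigate", {"url": start_url})
        retry_args = {**arguments, "session_id": nav["session_id"]}
        return send_command(command, retry_args)
```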
Support
For issues and questions:
- Open an issue on GitHub
- Contact via Apify platform
- Check the underlying browser automation (Scrapling) documentation for browser-specific issues
Usage
- Configure the command and its arguments in the Actor input.
- Run the Actor to execute the browser automation or RAG extraction task.
- Review the results in the output dataset.