Playwright Mcp

Under maintenance

Pricing

Pay per usage

Developer

Dilip S Chakravarthi

Maintained by Community

Last modified: 7 days ago

# Video-to-Playwright Automation Actor

Transform screen recordings into executable Playwright automation scripts using Google Gemini AI vision capabilities.

## 🎯 What It Does

This actor analyzes video recordings of user interactions with websites and automatically generates Python Playwright scripts that replicate those actions. Perfect for:

- **QA Testing**: Record manual tests once, replay them automatically
- **Web Scraping**: Show the actor how to navigate a site, get a working script
- **RPA Automation**: Convert manual workflows into automated browser tasks
- **Documentation**: Generate script documentation from video demos

## 🚀 How It Works

1. **Upload Video**: Place your screen recording (mp4, avi, mov, webm) in the videos directory
2. **AI Analysis**: Google Gemini 2.0 Flash analyzes every frame to identify:
   - Mouse clicks and movements
   - Keyboard inputs and text entries
   - Navigation patterns
   - Element interactions (buttons, forms, links)
3. **Script Generation**: Creates a complete, production-ready Playwright Python script with:
   - Proper selectors (CSS, text-based, IDs)
   - Wait conditions and timeouts
   - Error handling and screenshots on failure
   - Detailed comments explaining each step
4. **Auto-Execution**: Optionally runs the generated script to verify it works
5. **Storage**: Saves scripts to Apify key-value store for later use

## 📋 Input Parameters

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `videoFile` | string | Name of video file in videos directory | `test_1.mp4` |
| `taskDescription` | string | Optional context about the task shown | - |
| `includeScreenshots` | boolean | Add screenshot captures to script | `false` |
| `executeAfter` | boolean | Auto-run script after generation | `true` |
| `headless` | boolean | Run browser in headless mode | `false` |
| `slowMo` | integer | Slow motion delay (ms) for debugging | `100` |

## 🔧 Environment Variables

Set these in the Apify Console:

- **`GEMINI_API_KEY`** *(required)*: Your Google Gemini API key - [Get one here](https://aistudio.google.com/app/apikey)
- `VIDEO_UPLOAD_DIR`: Custom directory for videos (default: `./videos`)
- `MAX_VIDEO_SIZE_MB`: Max upload size in MB (default: `100`)
- `GEMINI_MODEL`: Model to use (default: `gemini-2.0-flash-exp`)

## 📦 Outputs

### Key-Value Store
- `generated_script.py`: The complete Playwright automation script
- Additional scripts if you request modifications

### Dataset (if `save_output=true`)
- Execution results with stdout/stderr
- Success/failure status
- Return codes

## 💡 Example Usage

### Basic Usage
```json
{
  "videoFile": "login_workflow.mp4",
  "taskDescription": "User logging into dashboard",
  "executeAfter": true
}
```

### Advanced Usage
```json
{
  "videoFile": "checkout_process.mp4",
  "taskDescription": "Complete e-commerce checkout flow",
  "includeScreenshots": true,
  "executeAfter": true,
  "headless": false,
  "slowMo": 500
}
```

## 🎥 Video Recording Tips

For best results when recording videos:

1. **Clear Actions**: Perform actions deliberately with visible mouse movements
2. **Wait for Loads**: Pause after page loads and before clicking
3. **Stable Elements**: Interact with elements that have consistent selectors
4. **Resolution**: Record in 1080p or higher for better element detection
5. **Duration**: Keep videos under 2-3 minutes (Gemini context limits)
6. **Single Task**: One clear workflow per video

## 🏗️ Architecture

```
┌─────────────────┐
│  Upload Video   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Gemini Vision  │ ◄── Frame-by-frame analysis
│  AI Analysis    │     Interaction detection
└────────┬────────┘     Element identification
         │
         ▼
┌─────────────────┐
│ Script Generator│ ◄── Playwright template
│  (AI Prompted)  │     Selector optimization
└────────┬────────┘     Error handling
         │
         ▼
┌─────────────────┐
│ Execute Script  │ ◄── Browser automation
│   (Optional)    │     Verification
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Save to Store  │ ◄── Key-value storage
│  Return Results │     Dataset output
└─────────────────┘
```

## 🛠️ Technical Details

### Supported Websites
- Works best with public websites (no login walls for search engines)
- Handles dynamic content (SPAs, AJAX)
- Supports consent dialogs and popups
- Compatible with YouTube, Google, e-commerce sites, etc.

### Script Features
Generated scripts include:
- Async/await pattern for modern Playwright
- Proper browser context setup
- Viewport and user-agent configuration
- Timeout management (15s elements, 30s navigation)
- Try/catch error handling
- Screenshot on failure
- Console logging for debugging

### Limitations
- Video must be clear and not too fast
- Complex multi-step workflows may need script refinement
- Personalized content (like YouTube homepage) requires search functionality
- Very long videos (>5 min) may hit token limits

## 🔄 MCP Server Mode

The actor can also run as an MCP (Model Context Protocol) server for interactive use:

Set `AUTO_ANALYZE_VIDEO=false` to enable MCP mode, then use these tools:

- `analyze_video`: Generate script from video
- `modify_script`: Refine script with natural language
- `execute_script`: Run the automation
- `get_script`: Retrieve current script
- `save_script`: Store to key-value store

## 📚 Example Output

```python
from playwright.async_api import async_playwright
import asyncio

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        page = await browser.new_page()

        # Step 1: Navigate to YouTube
        print("Navigating to YouTube...")
        await page.goto('https://www.youtube.com')

        # Step 2: Search for video
        print("Searching...")
        search_input = page.locator('input[name="search_query"]')
        await search_input.fill('Minecraft gameplay')
        await page.keyboard.press('Enter')
        await asyncio.sleep(2)

        # Step 3: Click first video
        print("Clicking video...")
        video = page.locator('a#video-title').nth(0)
        await video.click()

        print("[SUCCESS] Automation completed!")
        await browser.close()

asyncio.run(main())
```

## 🐛 Troubleshooting

**Script times out finding elements:**
- Video content may not be available (e.g., personalized feeds)
- Use search functionality instead of expecting content on homepage
- Increase timeout values in generated script

**Unicode/encoding errors:**
- Already handled with UTF-8 subprocess environment
- If issues persist, remove emojis from print statements

**Video upload fails:**
- Check file size (max 100MB by default)
- Ensure video format is supported (mp4, avi, mov, webm)
- Verify video path in `VIDEO_UPLOAD_DIR`

**Generated script doesn't work:**
- Use `modify_script` tool to refine selectors
- Add more wait conditions
- Handle dynamic content with explicit waits

## 📄 License

MIT

## 🔗 Links

- [Apify Platform](https://apify.com)
- [Playwright Documentation](https://playwright.dev/python/)
- [Google Gemini API](https://ai.google.dev/)
- [MCP Protocol](https://modelcontextprotocol.io/)

## 👥 Support

For issues or questions:
1. Check the execution logs in Apify Console
2. Review generated script for errors
3. Try modifying the script with natural language instructions
4. Ensure your Gemini API key is valid and has quota
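
The upload checks listed under Troubleshooting (supported formats, size limit) can be sketched as a small pre-flight helper. This is a minimal illustration, not the actor's own code: `validate_video` and `SUPPORTED_EXTENSIONS` are hypothetical names, and only the format list and the 100 MB default come from this README.

```python
from pathlib import Path

# Formats and default size limit are taken from the README;
# the helper and constant names are hypothetical.
SUPPORTED_EXTENSIONS = {".mp4", ".avi", ".mov", ".webm"}


def validate_video(path: str, max_size_mb: int = 100) -> Path:
    """Return the path if the file exists, has a supported suffix, and fits the size limit."""
    video = Path(path)
    if not video.is_file():
        raise FileNotFoundError(f"Video not found at: {video}")
    if video.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {video.suffix or '(none)'}")
    size_mb = video.stat().st_size / (1024 * 1024)
    if size_mb > max_size_mb:
        raise ValueError(f"Video size {size_mb:.1f}MB exceeds limit {max_size_mb}MB")
    return video
```

Running a check like this before starting the actor surfaces a bad file name or an oversized recording locally, before any Gemini quota is spent on the upload.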

{
  "actorSpecification": 1,
  "name": "playwright-mcp",
  "title": "Playwright MCP Server",
  "description": "MCP server that performs browser automation tasks using Playwright and integrates with AI-driven task interpretation.",
  "version": "1.0",
  "buildTag": "latest",
  "meta": {
    "templateId": "python-empty"
  },
  "input": "./input_schema.json",
  "dockerfile": "./Dockerfile",
  "readme": "./README.md",
  "environmentVariables": {
    "GEMINI_API_KEY": "Google Gemini API Key for task interpretation"
  },
  "defaultRunOptions": {
    "useCache": false,
    "memoryMbytes": 2048,
    "timeoutSecs": 1800
  }
}

{
  "title": "Playwright MCP Server Input",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "videoFile": {
      "title": "Video File",
      "type": "string",
      "description": "Name of the video file in the videos directory (e.g., test_1.mp4)",
      "editor": "textfield",
      "default": "test_1.mp4"
    },
    "taskDescription": {
      "title": "Task Description",
      "type": "string",
      "description": "Optional description of the task shown in the video",
      "editor": "textarea"
    },
    "includeScreenshots": {
      "title": "Include Screenshots",
      "type": "boolean",
      "description": "Include screenshot capture in generated script",
      "default": false
    },
    "executeAfter": {
      "title": "Execute After Generation",
      "type": "boolean",
      "description": "Automatically execute the generated script after analysis",
      "default": true
    },
    "headless": {
      "title": "Headless Mode",
      "type": "boolean",
      "description": "Run browser in headless mode",
      "default": false
    },
    "slowMo": {
      "title": "Slow Motion Delay (ms)",
      "type": "integer",
      "description": "Delay in milliseconds for debugging",
      "default": 100,
      "minimum": 0,
      "maximum": 5000
    }
  }
}
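
The schema's defaults and the `slowMo` bounds can be applied to a raw input dictionary with a few lines of plain Python. The sketch below is illustrative only: `normalize_input` and `DEFAULTS` are hypothetical names, and the code assumes the raw dictionary is whatever the Apify SDK's `Actor.get_input()` returned (possibly `None`). Note that the `main.py` listed on this page reads its settings from environment variables rather than from this input.

```python
from typing import Any, Optional

# Defaults and bounds copied from the input schema; names are hypothetical.
DEFAULTS: dict[str, Any] = {
    "videoFile": "test_1.mp4",
    "taskDescription": None,
    "includeScreenshots": False,
    "executeAfter": True,
    "headless": False,
    "slowMo": 100,
}
SLOW_MO_MIN, SLOW_MO_MAX = 0, 5000


def normalize_input(raw: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Merge raw Actor input with the schema defaults and validate slowMo."""
    merged = {**DEFAULTS, **(raw or {})}
    unknown = set(merged) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown input fields: {sorted(unknown)}")
    slow_mo = merged["slowMo"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(slow_mo, bool) or not isinstance(slow_mo, int):
        raise TypeError("slowMo must be an integer")
    if not SLOW_MO_MIN <= slow_mo <= SLOW_MO_MAX:
        raise ValueError(f"slowMo must be between {SLOW_MO_MIN} and {SLOW_MO_MAX}")
    return merged
```

Rejecting unknown keys is a design choice for this sketch; a more lenient variant could simply ignore them.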

"""
Enhanced MCP Server for Video-to-Playwright Automation
Improved video analysis and script generation
"""

import asyncio
import json
import os
from pathlib import Path
from typing import Optional, Any, List
import google.generativeai as genai
from mcp.server import Server
from mcp.types import Tool, TextContent
import mcp.server.stdio
from apify import Actor

# Load environment variables
GEMINI_API_KEY = os.getenv('GEMINI_API_KEY')
if GEMINI_API_KEY:
    genai.configure(api_key=GEMINI_API_KEY)

class VideoPlaywrightMCP:
    def __init__(self):
        self.server = Server(os.getenv('MCP_SERVER_NAME', 'video-playwright-automation'))
        self.model = genai.GenerativeModel(os.getenv('GEMINI_MODEL', 'gemini-2.5-pro'))
        self.conversation_history = []
        self.generated_script = None
        # Default matches the README (`./videos`) instead of a developer-machine path;
        # override with VIDEO_UPLOAD_DIR.
        self.video_upload_dir = Path(os.getenv('VIDEO_UPLOAD_DIR', './videos'))
        self.video_upload_dir.mkdir(parents=True, exist_ok=True)
        self.max_video_size_mb = int(os.getenv('MAX_VIDEO_SIZE_MB', '100'))

        self.setup_tools()

    def setup_tools(self):
        """Register MCP tools"""

        @self.server.list_tools()
        async def list_tools() -> list[Tool]:
            return [
                Tool(
                    name="analyze_video",
                    description="Analyze a video and generate a Playwright automation script using Gemini AI",
                    inputSchema={
                        "type": "object",
                        "properties": {
                            "video_path": {
                                "type": "string",
                                "description": "Path to the video file (supports mp4, avi, mov, webm)"
                            },
                            "task_description": {
                                "type": "string",
                                "description": "Optional description of the task shown in the video"
                            },
                            "include_screenshots": {
                                "type": "boolean",
                                "description": "Include screenshot capture in generated script",
                                "default": False
                            },
                            "slow_mo": {
                                "type": "integer",
                                "description": "Slow motion delay in milliseconds for debugging",
                                "default": 0
                            }
                        },
                        "required": ["video_path"]
                    }
                ),
                Tool(
                    name="modify_script",
                    description="Modify the generated Playwright script based on user feedback",
                    inputSchema={
                        "type": "object",
                        "properties": {
                            "modification_request": {
                                "type": "string",
                                "description": "Natural language description of changes to make"
                            }
                        },
                        "required": ["modification_request"]
                    }
                ),
                Tool(
                    name="execute_script",
                    description="Execute the generated Playwright script",
                    inputSchema={
                        "type": "object",
                        "properties": {
                            "headless": {
                                "type": "boolean",
                                "description": "Run browser in headless mode",
                                "default": True
                            },
                            "save_output": {
                                "type": "boolean",
                                "description": "Save execution results to Apify dataset",
                                "default": False
                            }
                        }
                    }
                ),
                Tool(
                    name="get_script",
                    description="Retrieve the current generated Playwright script",
                    inputSchema={
                        "type": "object",
                        "properties": {
                            "format": {
                                "type": "string",
                                "enum": ["python", "json"],
                                "description": "Output format",
                                "default": "python"
                            }
                        }
                    }
                ),
                Tool(
                    name="save_script",
                    description="Save the generated script to Apify key-value store",
                    inputSchema={
                        "type": "object",
                        "properties": {
                            "filename": {
                                "type": "string",
                                "description": "Filename to save the script as",
                                "default": "playwright_script.py"
                            }
                        }
                    }
                )
            ]

        @self.server.call_tool()
        async def call_tool(name: str, arguments: Any) -> list[TextContent]:
            try:
                if name == "analyze_video":
                    return await self.analyze_video(
                        arguments.get("video_path"),
                        arguments.get("task_description"),
                        arguments.get("include_screenshots", False),
                        arguments.get("slow_mo", 0)
                    )
                elif name == "modify_script":
                    return await self.modify_script(arguments.get("modification_request"))
                elif name == "execute_script":
                    return await self.execute_script(
                        arguments.get("headless", True),
                        arguments.get("save_output", False)
                    )
                elif name == "get_script":
                    return await self.get_script(arguments.get("format", "python"))
                elif name == "save_script":
                    return await self.save_script(arguments.get("filename", "playwright_script.py"))
                else:
                    raise ValueError(f"Unknown tool: {name}")
            except Exception as e:
                Actor.log.error(f"Error in {name}: {str(e)}")
                return [TextContent(type="text", text=f"❌ Error: {str(e)}")]

    async def analyze_video(
        self,
        video_path: str,
        task_description: Optional[str] = None,
        include_screenshots: bool = False,
        slow_mo: int = 0
    ) -> list[TextContent]:
        """Analyze video and generate Playwright script with enhanced accuracy"""
        try:
            input_path = Path(video_path)
            if not input_path.is_absolute():
                input_path = self.video_upload_dir / input_path

            if not input_path.exists():
                raise FileNotFoundError(f"Video not found at: {input_path}")

            size_mb = input_path.stat().st_size / (1024 * 1024)
            if size_mb > self.max_video_size_mb:
                raise ValueError(f"Video size {size_mb:.1f}MB exceeds limit {self.max_video_size_mb}MB")

            Actor.log.info(f"Analyzing video: {input_path}")

            # Upload video to Gemini
            video_file = genai.upload_file(path=str(input_path))
            Actor.log.info("Video uploaded, waiting for processing...")

            # Wait for video processing
            while video_file.state.name == "PROCESSING":
                await asyncio.sleep(2)
                video_file = genai.get_file(video_file.name)

            if video_file.state.name == "FAILED":
                raise ValueError("Video processing failed")

            Actor.log.info("Video processed successfully")

            # Enhanced prompt with detailed instructions
            prompt = f"""
Analyze this video FRAME BY FRAME and identify EVERY single user interaction in chronological order.

{"Task Context: " + task_description if task_description else ""}

CRITICAL ANALYSIS STEPS:
1. **Watch the entire video carefully** - Note every mouse movement, click, keyboard input, scroll, and navigation
2. **Identify the starting URL** - What webpage does the video begin on?
3. **Track each interaction** - For EACH action, note:
   - What element is being interacted with? (button, input field, link, dropdown, etc.)
   - What is the visible text or label of that element?
   - What type of action? (click, type, press Enter, select, scroll, etc.)
   - What happens after the action? (page loads, dropdown opens, search results appear, etc.)
4. **Note timing** - Identify when to wait for elements, page loads, or animations
5. **Identify text inputs** - What exact text is typed into each field?

IMPORTANT RULES:
- If the user searches for something, USE THE SEARCH FUNCTIONALITY instead of expecting content on homepage
- If a specific video/content is clicked, search for it first to ensure it's available
- Use simple, reliable selectors that work across sessions
- Do NOT assume personalized content (like YouTube homepage videos) will be the same
- Add proper error handling for dynamic content

Generate a script that will ACTUALLY WORK in any session, not just replay the exact video scenario.

PLAYWRIGHT SCRIPT REQUIREMENTS:

```python
from playwright.async_api import async_playwright, TimeoutError as PlaywrightTimeoutError
import asyncio

async def main():
    async with async_playwright() as p:
        # Launch browser
        browser = await p.chromium.launch(
            headless=False,
            slow_mo={slow_mo}
        )

        context = await browser.new_context(
            viewport={{'width': 1920, 'height': 1080}},
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        )

        page = await context.new_page()

        try:
            # STEP 1: Navigate to starting URL
            print("Step 1: Navigating to [URL]...")
            await page.goto('[URL]', wait_until='domcontentloaded', timeout=30000)
            await asyncio.sleep(2)  # Wait for page to settle

            # STEP 2: [First interaction]
            print("Step 2: [Description]...")
            # Use page.locator() with CSS selector or text
            element = page.locator('[SELECTOR]')
            await element.wait_for(state='visible', timeout=15000)
            await element.click()
            await asyncio.sleep(1)

            # IMPORTANT: If video shows clicking on specific content (like a YouTube video):
            # - First search for it using the search box
            # - Then click on the search result
            # Example for YouTube (click search icon first to activate search):
            # await page.click('ytd-masthead #search-icon-legacy')
            # await asyncio.sleep(1)
            # await page.fill('input[name="search_query"]', 'video title')
            # await page.keyboard.press('Enter')
            # await asyncio.sleep(3)
            # video = page.locator('a#video-title').nth(0)
            # await video.wait_for(state='visible', timeout=15000)
            # await video.click()

            # Add more steps as needed for each action in the video
            # CORRECT PATTERNS:
            # element = page.locator('selector').nth(0)
            # await element.wait_for(state='visible', timeout=15000)
            # await element.click()
            # OR
            # await page.fill('input#id', 'text')
            # await page.click('button#id')

            {"# Take screenshots" if include_screenshots else ""}
            {'''await page.screenshot(path='screenshot.png')
            print("Screenshot saved")''' if include_screenshots else ""}

            print("[SUCCESS] Automation completed!")

        except PlaywrightTimeoutError as e:
            print(f"[TIMEOUT] Element not found: {{e}}")
            await page.screenshot(path='error.png')

        except Exception as e:
            print(f"[ERROR] {{type(e).__name__}}: {{e}}")
            await page.screenshot(path='error.png')

        finally:
            await asyncio.sleep(3)  # Keep browser open briefly
            await context.close()
            await browser.close()
            print("[CLEANUP] Browser closed")

if __name__ == "__main__":
    asyncio.run(main())
```

CRITICAL SELECTOR STRATEGY (MUST FOLLOW EXACTLY):
1. Use `page.locator('css-selector')` and store in a variable
2. Use `page.locator('text=Exact Text')` for buttons/links with visible text
3. Use `page.fill('#input-id', 'text')` for input fields
4. NEVER use `.first()` - instead use `.nth(0)` or make selector more specific
5. Pattern: `element = page.locator('selector'); await element.wait_for(state='visible'); await element.click()`
6. Add `await asyncio.sleep(1-2)` after major actions to let page settle

CORRECT Examples:
```python
# Click first matching element
search_btn = page.locator('button.search').nth(0)
await search_btn.wait_for(state='visible', timeout=15000)
await search_btn.click()

# Or use more specific selector
video_link = page.locator('a#video-title').filter(has_text='Minecraft')
await video_link.wait_for(state='visible', timeout=15000)
await video_link.click()

# Fill input
await page.fill('input#search', 'search term')
await asyncio.sleep(1)
```

For YouTube:
- Search box: Try clicking search icon first, then fill `input[name="search_query"]` or `input#search`
- Pattern: await page.click('button#search-icon-legacy'); await asyncio.sleep(1); await page.fill('input[name="search_query"]', 'text')
- Search button: `button#search-icon-legacy`
- Video links: `a#video-title`
- Handle consent: Check for button with aria-label containing "Accept" or "Reject"

For searches:
- If user clicks on specific content, search for it first rather than expecting it on homepage
- Example: await page.fill('input#search', 'search term'); await page.keyboard.press('Enter'); await asyncio.sleep(2)

WRONG - DO NOT USE:
- `.first()` followed by parentheses
- `.get_by_role()` without proper chaining

Generate the COMPLETE, EXECUTABLE script with ALL steps from the video.
Include detailed comments for each step explaining what you observed in the video.
The script must work end-to-end without modifications.
"""

            # Generate script using Gemini
            Actor.log.info("Generating enhanced Playwright script...")
            response = self.model.generate_content([video_file, prompt])

            script_content = response.text

            # Clean up markdown code blocks
            if "```python" in script_content:
                script_content = script_content.split("```python")[1].split("```")[0].strip()
            elif "```" in script_content:
                script_content = script_content.split("```")[1].split("```")[0].strip()

            self.generated_script = script_content
            self.conversation_history.append({
                "role": "user",
                "content": f"Video: {video_path}, Task: {task_description or 'Not specified'}"
            })
            self.conversation_history.append({
                "role": "assistant",
                "content": script_content
            })

            # Auto-save the generated script
            try:
                await Actor.set_value('generated_script.py', script_content)
                Actor.log.info("Script saved to key-value store as 'generated_script.py'")
            except Exception as save_error:
                Actor.log.warning(f"Could not save script: {save_error}")

            Actor.log.info("Enhanced script generated successfully")

            return [
                TextContent(
                    type="text",
                    text=f"✅ Video analyzed with enhanced detection!\n\n**Generated Playwright Script:**\n\n```python\n{script_content}\n```\n\n**Next Steps:**\n- Review the script to ensure all steps match your video\n- Use `modify_script` if any steps are missing or incorrect\n- Use `execute_script` to test the automation\n- Use `save_script` to store in Apify KV store"
                )
            ]

        except Exception as e:
            Actor.log.error(f"Error analyzing video: {str(e)}")
            return [TextContent(type="text", text=f"❌ Error analyzing video: {str(e)}")]

    async def modify_script(self, modification_request: str) -> list[TextContent]:
        """Modify the generated script based on user feedback"""
        if not self.generated_script:
            return [TextContent(
                type="text",
                text="❌ No script has been generated yet. Use `analyze_video` first."
            )]

        try:
            Actor.log.info(f"Modifying script: {modification_request}")

            prompt = f"""
Here is the current Playwright script:
```python
{self.generated_script}
```

User modification request: {modification_request}

Please modify the script according to the request. Ensure:
1. All actions are properly sequenced
2. Appropriate waits are added (wait_for_selector, wait_for_load_state, wait_for_timeout)
3. Selectors are robust and specific
4. Error handling is comprehensive
5. The script remains complete and executable
6. Comments explain what each step does

If the user mentions missing steps or actions that didn't work:
- Add explicit waits before interactions
- Try alternative selectors
- Add visibility/enabled checks
- Consider if consent dialogs or popups need to be handled first

Return ONLY the complete updated Python code with detailed comments.
"""

            response = self.model.generate_content(prompt)
            modified_script = response.text

            # Clean up markdown
            if "```python" in modified_script:
                modified_script = modified_script.split("```python")[1].split("```")[0].strip()
            elif "```" in modified_script:
                modified_script = modified_script.split("```")[1].split("```")[0].strip()

            self.generated_script = modified_script
            self.conversation_history.append({
                "role": "user",
                "content": f"Modify: {modification_request}"
            })
            self.conversation_history.append({
                "role": "assistant",
                "content": modified_script
            })

            Actor.log.info("Script modified successfully")

            return [
                TextContent(
                    type="text",
                    text=f"✅ Script modified successfully!\n\n```python\n{modified_script}\n```"
                )
            ]

        except Exception as e:
            Actor.log.error(f"Error modifying script: {str(e)}")
            return [TextContent(type="text", text=f"❌ Error modifying script: {str(e)}")]

    async def execute_script(self, headless: bool = True, save_output: bool = False) -> list[TextContent]:
        """Execute the generated Playwright script"""
        if not self.generated_script:
            return [TextContent(
                type="text",
                text="❌ No script has been generated yet. Use `analyze_video` first."
            )]

        try:
            Actor.log.info("Executing Playwright script...")

            # Save script to temporary file (force UTF-8 to avoid Windows codec issues)
            script_path = self.video_upload_dir / "temp_playwright_script.py"
            script_path.write_text(self.generated_script, encoding="utf-8")

            # Execute the script
            # Ensure UTF-8 mode for Python subprocess to handle Unicode output
            env = os.environ.copy()
            env["PYTHONUTF8"] = "1"
            env["PYTHONIOENCODING"] = "utf-8"
            process = await asyncio.create_subprocess_exec(
                "python", str(script_path),
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
                env=env
            )

            stdout, stderr = await process.communicate()

            # Decode with error handling for Unicode issues
            stdout_text = stdout.decode('utf-8', errors='replace')
            stderr_text = stderr.decode('utf-8', errors='replace')

            # Log the output
            Actor.log.info(f"Script output:\n{stdout_text}")
            if stderr_text:
                Actor.log.warning(f"Script errors:\n{stderr_text}")

            result = {
                "success": process.returncode == 0,
                "stdout": stdout_text,
                "stderr": stderr_text,
                "return_code": process.returncode
            }

            if save_output:
                await Actor.push_data(result)
                Actor.log.info("Execution results saved to dataset")

            if result["success"]:
                Actor.log.info("Script executed successfully")
                return [TextContent(
                    type="text",
                    text=f"✅ Script executed successfully!\n\n**Output:**\n```\n{result['stdout']}\n```"
                )]
            else:
                Actor.log.error(f"Script execution failed: {result['stderr']}")
                return [TextContent(
                    type="text",
                    text=f"❌ Script execution failed:\n```\n{result['stderr']}\n```\n\n**Tip:** Use `modify_script` to fix the issues. Common problems:\n- Incorrect selectors\n- Missing waits\n- Elements not visible/enabled\n- Page not loaded"
                )]

        except Exception as e:
            Actor.log.error(f"Error executing script: {str(e)}")
            return [TextContent(type="text", text=f"❌ Error executing script: {str(e)}")]

    async def get_script(self, format: str = "python") -> list[TextContent]:
        """Retrieve the current script"""
        if not self.generated_script:
            return [TextContent(type="text", text="❌ No script has been generated yet.")]

        if format == "json":
            script_data = {
                "script": self.generated_script,
                "conversation_history": self.conversation_history,
                "format": "python"
            }
            return [TextContent(
                type="text",
                text=f"```json\n{json.dumps(script_data, indent=2)}\n```"
            )]
        else:
            return [TextContent(
                type="text",
                text=f"**Current Playwright Script:**\n\n```python\n{self.generated_script}\n```"
            )]

    async def save_script(self, filename: str) -> list[TextContent]:
        """Save script to Apify key-value store"""
        if not self.generated_script:
            return [TextContent(type="text", text="❌ No script has been generated yet.")]

        try:
            await Actor.set_value(filename, self.generated_script)
            Actor.log.info(f"Script saved to key-value store as {filename}")
            return [TextContent(
                type="text",
                text=f"✅ Script saved to Apify key-value store as `{filename}`"
            )]
        except Exception as e:
            Actor.log.error(f"Error saving script: {str(e)}")
            return [TextContent(type="text", text=f"❌ Error saving script: {str(e)}")]

    async def run(self):
        """Run the MCP server"""
        async with mcp.server.stdio.stdio_server() as (read_stream, write_stream):
            await self.server.run(
                read_stream,
                write_stream,
                self.server.create_initialization_options()
            )

async def main():
    """Main entry point for Apify Actor"""
    async with Actor:
        Actor.log.info("Starting Enhanced Video-to-Playwright MCP Server...")

        if not GEMINI_API_KEY:
            Actor.log.error("GEMINI_API_KEY not found in environment variables!")
            raise ValueError("GEMINI_API_KEY is required")

        mcp = VideoPlaywrightMCP()

        # Optional auto-analyze mode
        if os.getenv('AUTO_ANALYZE_VIDEO', 'false').lower() == 'true':
            video_file = os.getenv('VIDEO_FILE', 'test_1.mp4')
            include_screenshots = os.getenv('INCLUDE_SCREENSHOTS', 'false').lower() == 'true'
            execute_after = os.getenv('EXECUTE_AFTER', 'true').lower() == 'true'
            slow_mo = int(os.getenv('SLOW_MO', '0'))

            Actor.log.info(f"Auto-analyze enabled. Video: {video_file}")
            try:
                await mcp.analyze_video(
                    video_file,
                    include_screenshots=include_screenshots,
                    slow_mo=slow_mo
                )
                if execute_after:
                    await mcp.execute_script(
                        headless=os.getenv('PLAYWRIGHT_HEADLESS', 'true').lower() == 'true'
                    )
            except Exception as e:
                Actor.log.error(f"Auto-analyze failed: {e}")
        else:
            await mcp.run()

if __name__ == "__main__":
    asyncio.run(main())
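
The server above strips Markdown fences from the model response by splitting on the fence marker, which leaves a stray language tag in the script when the model labels the block with something other than exactly `python` (for example `py` or `python3`). A regex-based alternative is sketched below. It is a hypothetical helper, not part of the listed `main.py`; `extract_code`, `FENCE`, and `_FENCE_RE` are names introduced here for illustration.

```python
import re

# Built indirectly so the marker never appears literally at the start of a line.
FENCE = "`" * 3

# Opening fence, optional language tag, newline, lazily matched body, closing fence.
_FENCE_RE = re.compile(
    FENCE + r"[ \t]*([\w+-]*)[ \t]*\r?\n(.*?)" + FENCE,
    re.DOTALL,
)


def extract_code(text: str, preferred_lang: str = "python") -> str:
    """Return the first block tagged preferred_lang, else the first block, else the text."""
    blocks = _FENCE_RE.findall(text)
    for lang, body in blocks:
        if lang.lower() == preferred_lang:
            return body.strip()
    if blocks:
        return blocks[0][1].strip()
    return text.strip()
```

Falling back to the unfenced text mirrors the behaviour of the original code when the response contains no fence at all.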

from playwright.async_api import async_playwright, TimeoutError as PlaywrightTimeoutError
import asyncio

async def main():
    async with async_playwright() as p:
        # Launch browser in non-headless mode with a slight delay for observation
        browser = await p.chromium.launch(
            headless=False,
            slow_mo=100
        )

        # Create a new browser context with a specific viewport and user agent
        context = await browser.new_context(
            viewport={'width': 1920, 'height': 1080},
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36'
        )

        # Allow up to 60 seconds for navigations in this context by default
        context.set_default_navigation_timeout(60000)

        page = await context.new_page()

        try:
            # STEP 1: Navigate to YouTube.com
            # The user first searches for "yt" and then clicks the link.
            # A more direct and reliable approach is to navigate directly to youtube.com.
            print("Step 1: Navigating to https://www.youtube.com...")
            await page.goto('https://www.youtube.com', wait_until='domcontentloaded', timeout=30000)
            await asyncio.sleep(2)  # Wait for the page to settle

            # STEP 2: Handle YouTube Consent Pop-up (if it appears)
            # YouTube often presents a consent dialog on the first visit.
            print("Step 2: Checking for and handling consent dialog...")
            try:
                # This selector targets the "Accept all" button based on its ARIA label
                accept_button = page.locator('button[aria-label*="Accept all"]')
                await accept_button.wait_for(state='visible', timeout=5000)
                await accept_button.click()
                print(" - Consent dialog accepted.")
                await asyncio.sleep(2)  # Wait for dialog to disappear
            except PlaywrightTimeoutError:
                print(" - Consent dialog not found, continuing.")

            # STEP 3: Search for the specific video
            # The video on the homepage is personalized. To ensure the script works every time,
            # we will search for the video title instead of trying to find it on the homepage.
            print("Step 3: Searching for the video 'I Survived 1000 Days in Minecraft Hardcore'...")

            # Click the search input field to focus it
            search_input = page.locator('input#search')
            await search_input.wait_for(state='visible', timeout=15000)
            await search_input.click()
            await asyncio.sleep(1)

            # Fill the search field with the video title
            await search_input.fill('I Survived 1000 Days in Minecraft Hardcore')
            await asyncio.sleep(1)

            # Press Enter to initiate the search
            await page.keyboard.press('Enter')
            print(" - Search initiated.")

            # Wait for the search results page to load
            await page.wait_for_url('**/results?search_query=*', timeout=30000)
            await asyncio.sleep(2)

            # STEP 4: Click on the correct video from the search results
            # We locate the video link by its title to ensure we click the right one.
            print("Step 4: Clicking on the video from search results...")

            # This locator finds a video renderer containing the specific title and channel name,
            # then targets the clickable title link within it. It depends on YouTube's current
            # markup, so it may need updating if the page layout changes.
            video_title = "I Survived 1000 Days in Minecraft Hardcore"
            channel_name = "ItsNotLudo"

            video_link = page.locator(f'ytd-video-renderer:has-text("{video_title}"):has-text("{channel_name}") a#video-title')
            await video_link.wait_for(state='visible', timeout=20000)
            await video_link.click()
            print(f" - Clicked on video titled '{video_title}'.")

            # Wait for the video page to load
            await page.wait_for_url('**/watch?v=*', timeout=30000)
            print(" - Video page loaded successfully.")

            await asyncio.sleep(5)  # Allow some time to see the video playing

            print("[SUCCESS] Automation completed!")

        except PlaywrightTimeoutError as e:
            print(f"[TIMEOUT] An element was not found in time: {e}")
            await page.screenshot(path='error_screenshot.png')

        except Exception as e:
            print(f"[ERROR] An unexpected error occurred: {type(e).__name__}: {e}")
            await page.screenshot(path='error_screenshot.png')

        finally:
            await asyncio.sleep(3)  # Keep browser open briefly for final review
            await context.close()
            await browser.close()
            print("[CLEANUP] Browser and context closed.")

if __name__ == "__main__":
    asyncio.run(main())
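
Step 2 of the script above treats the consent dialog as optional: it waits briefly, and a timeout means "not shown" rather than failure. The same idea can be sketched without Playwright so that it runs anywhere. `optional_step` is a hypothetical helper, not part of the repository, and it uses `asyncio.wait_for` in place of Playwright's own `timeout=` arguments.

```python
import asyncio
from typing import Awaitable, Callable


async def optional_step(step: Callable[[], Awaitable[None]], timeout_s: float) -> bool:
    """Run `step`; return False instead of raising if it exceeds `timeout_s`.

    Hypothetical helper illustrating the "optional step" pattern. Errors other
    than a timeout still propagate to the caller.
    """
    try:
        await asyncio.wait_for(step(), timeout=timeout_s)
        return True
    except asyncio.TimeoutError:
        return False
```

The caller can then log "not found, continuing" when the helper returns `False`, which keeps the main flow free of nested `try` blocks.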

from playwright.async_api import async_playwright, TimeoutError as PlaywrightTimeoutError
import asyncio

async def main():
    async with async_playwright() as p:
        # Launch browser in non-headless mode to observe
        browser = await p.chromium.launch(
            headless=False,
            slow_mo=100
        )

        context = await browser.new_context(
            viewport={'width': 1920, 'height': 1080},
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        )

        page = await context.new_page()

        try:
            # STEP 1: Navigate directly to YouTube
            print("Step 1: Navigating to YouTube...")
            await page.goto('https://www.youtube.com', wait_until='domcontentloaded', timeout=30000)
            await asyncio.sleep(3)  # Wait for page to fully load

            # STEP 2: Handle consent if it appears
            print("Step 2: Checking for consent dialog...")
            try:
                # Try multiple consent button selectors
                consent_selectors = [
                    'button[aria-label*="Accept"]',
                    'button[aria-label*="Reject"]',
                    'button:has-text("Accept all")',
                    'tp-yt-paper-button:has-text("Accept")'
                ]

                for selector in consent_selectors:
                    try:
                        consent_btn = page.locator(selector).first
                        # Note: is_visible() checks the current state and returns immediately;
                        # recent Playwright versions ignore its timeout argument.
                        if await consent_btn.is_visible(timeout=3000):
                            print(" - Found consent button, clicking...")
                            await consent_btn.click()
                            await asyncio.sleep(2)
                            break
                    except Exception:
                        continue
                else:
                    # Runs only when the loop finished without a break
                    print(" - No consent dialog found, continuing...")
            except Exception as e:
                print(f" - Consent handling skipped: {e}")

            # STEP 3: Click the search icon to activate search box
            print("Step 3: Activating YouTube search...")
            try:
                # Try to click search icon in header
                search_icon = page.locator('button#search-icon-legacy, ytd-masthead button#search-icon-legacy').first
                await search_icon.wait_for(state='visible', timeout=10000)
                await search_icon.click()
                await asyncio.sleep(1)
            except Exception:
                print(" - Search already active or using different layout...")

            # STEP 4: Type in search box
            print("Step 4: Searching for video...")
            video_title = "I Survived 1000 Days in Minecraft Hardcore"

            # Try multiple search input selectors
            search_input = page.locator('input[name="search_query"], input#search, ytd-searchbox input').first
            await search_input.wait_for(state='visible', timeout=15000)
            await search_input.fill(video_title)
            await asyncio.sleep(1)

            # STEP 5: Press Enter to submit the search
            print("Step 5: Submitting search...")
            await page.keyboard.press('Enter')
            await page.wait_for_load_state('domcontentloaded')
            await asyncio.sleep(3)  # Wait for search results

            # STEP 6: Click first video result
            print("Step 6: Clicking on video from search results...")
            # Use nth(0) instead of .first to avoid the callable issue
            video_link = page.locator('a#video-title').nth(0)
            await video_link.wait_for(state='visible', timeout=15000)
            await video_link.click()

            # STEP 7: Wait for video page to load
            print("Step 7: Waiting for video page...")
            await page.wait_for_url('**/watch?v=**', timeout=30000)
            await asyncio.sleep(2)

            print("\n[SUCCESS] Automation completed! Video is now playing.")

        except PlaywrightTimeoutError as e:
            print(f"\n[TIMEOUT] Element not found in time: {e}")
            await page.screenshot(path='error.png')
            print("Screenshot saved to error.png")

        except Exception as e:
            print(f"\n[ERROR] {type(e).__name__}: {e}")
            await page.screenshot(path='error.png')

        finally:
            # Keep browser open to see result
            await asyncio.sleep(5)
            await context.close()
            await browser.close()
            print("[CLEANUP] Browser closed")

if __name__ == "__main__":
    asyncio.run(main())
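
The consent handling in the script above walks a list of candidate selectors and stops at the first one that is visible, skipping any that raise. A minimal, Playwright-free sketch of that fallback pattern is shown here; `first_match` and its `is_match` probe are hypothetical names introduced for illustration, and in the real script the probe would be a call such as `page.locator(selector).first.is_visible()`.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence


async def first_match(
    candidates: Sequence[str],
    is_match: Callable[[str], Awaitable[bool]],
) -> Optional[str]:
    """Return the first candidate for which `is_match` is true, else None.

    Hypothetical helper: a probe that raises is treated as "no match",
    mirroring the try/except/continue loop in the script.
    """
    for candidate in candidates:
        try:
            if await is_match(candidate):
                return candidate
        except Exception:
            # A failing probe should not abort the search; try the next one.
            continue
    return None
```

Returning `None` plays the role of the loop's `else` branch in the script: the caller can report that no consent dialog was found and carry on.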

.git
.mise.toml
.nvim.lua
storage

# The rest is copied from https://github.com/github/gitignore/blob/main/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
.python-version

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

# Visual Studio Code
# Ignores the folder created by VS Code when changing workspace settings, doing debugger
# configuration, etc. Can be commented out to share Workspace Settings within a team
.vscode

# Zed editor
# Ignores the folder created when setting Project Settings in the Zed editor. Can be commented out
# to share Project Settings within a team
.zed

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
ENV/
env/

# Environment variables
.env
.env.local

# Apify
apify_storage/
storage/

# IDEs
.vscode/
.idea/
*.swp
*.swo

# Videos and temp files
/tmp/
*.mp4
*.avi
*.mov
*.webm

# Logs
*.log

# Apify Actors Development Guide

Important: Before you begin, fill in the `generatedBy` property in the meta section of `.actor/actor.json`. Replace it with the model you're currently using. This helps Apify monitor and improve AGENTS.md for specific LLM models.

## What are Apify Actors?

- Actors are serverless programs that run in the cloud. They're inspired by the UNIX philosophy - programs that do one thing well and can be easily combined to build complex systems.
- Actors are programs packaged as Docker images that run in isolated containers

## Core Concepts

- Accept well-defined JSON input
- Perform isolated tasks (web scraping, automation, data processing)
- Produce structured JSON output to datasets and/or store data in key-value stores
- Can run from seconds to hours or even indefinitely
- Persist state and can be restarted

## Do

- accept well-defined JSON input and produce structured JSON output
- use Apify SDK (`apify`) for code running ON Apify platform
- validate input early with proper error handling and fail gracefully
- use CheerioCrawler for static HTML content (10x faster than browsers)
- use PlaywrightCrawler only for JavaScript-heavy sites and dynamic content
- use router pattern (createCheerioRouter/createPlaywrightRouter) for complex crawls
- implement retry strategies with exponential backoff for failed requests
- use proper concurrency settings (HTTP: 10-50, Browser: 1-5)
- set sensible defaults in `.actor/input_schema.json` for all optional fields
- set up output schema in `.actor/output_schema.json`
- clean and validate data before pushing to dataset
- use semantic CSS selectors and fallback strategies for missing elements
- respect robots.txt, ToS, and implement rate limiting with delays
- check which tools (cheerio/playwright/crawlee) are installed before applying guidance

## Don't

- do not rely on `Dataset.getInfo()` for final counts on Cloud platform
- do not use browser crawlers when HTTP/Cheerio works (massive performance gains with HTTP)
- do not hard code values that should be in input schema or environment variables
- do not skip input validation or error handling
- do not overload servers - use appropriate concurrency and delays
- do not scrape prohibited content or ignore Terms of Service
- do not store personal/sensitive data unless explicitly permitted
- do not use deprecated options like `requestHandlerTimeoutMillis` on CheerioCrawler (v3.x)
- do not use `additionalHttpHeaders` - use `preNavigationHooks` instead

## Commands

```bash
# Local development
apify run    # Run Actor locally

# Authentication & deployment
apify login  # Authenticate account
apify push   # Deploy to Apify platform

# Help
apify help   # List all commands
```

## Safety and Permissions

Allowed without prompt:

- read files with `Actor.get_value()`
- push data with `Actor.push_data()`
- set values with `Actor.set_value()`
- enqueue requests to RequestQueue
- run locally with `apify run`

Ask first:

- npm/pip package installations
- apify push (deployment to cloud)
- proxy configuration changes (requires paid plan)
- Dockerfile changes affecting builds
- deleting datasets or key-value stores

## Project Structure

.actor/
├── actor.json          # Actor config: name, version, env vars, runtime settings
├── input_schema.json   # Input validation & Console form definition
└── output_schema.json  # Specifies where an Actor stores its output
src/
└── main.py             # Actor entry point and orchestrator
storage/                # Local storage (mirrors Cloud during development)
├── datasets/           # Output items (JSON objects)
├── key_value_stores/   # Files, config, INPUT
└── request_queues/     # Pending crawl requests
Dockerfile              # Container image definition
AGENTS.md               # AI agent instructions (this file)

## Actor Input Schema

The input schema defines the input parameters for an Actor. It's a JSON object comprising various field types supported by the Apify platform.

### Structure

```json
{
  "title": "<INPUT-SCHEMA-TITLE>",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    /* define input fields here */
  },
  "required": []
}
```

### Example

```json
{
  "title": "E-commerce Product Scraper Input",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrls": {
      "title": "Start URLs",
      "type": "array",
      "description": "URLs to start scraping from (category pages or product pages)",
      "editor": "requestListSources",
      "default": [{ "url": "https://example.com/category" }],
      "prefill": [{ "url": "https://example.com/category" }]
    },
    "followVariants": {
      "title": "Follow Product Variants",
      "type": "boolean",
      "description": "Whether to scrape product variants (different colors, sizes)",
      "default": true
    },
    "maxRequestsPerCrawl": {
      "title": "Max Requests per Crawl",
      "type": "integer",
      "description": "Maximum number of pages to scrape (0 = unlimited)",
      "default": 1000,
      "minimum": 0
    },
    "proxyConfiguration": {
      "title": "Proxy Configuration",
      "type": "object",
      "description": "Proxy settings for anti-bot protection",
      "editor": "proxy",
      "default": { "useApifyProxy": false }
    },
    "locale": {
      "title": "Locale",
      "type": "string",
      "description": "Language/country code for localized content",
      "default": "cs",
      "enum": ["cs", "en", "de", "sk"],
      "enumTitles": ["Czech", "English", "German", "Slovak"]
    }
  },
  "required": ["startUrls"]
}
```

## Actor Output Schema

The Actor output schema builds upon the schemas for the dataset and key-value store. It specifies where an Actor stores its output and defines templates for accessing that output. Apify Console uses these output definitions to display run results.

### Structure

```json
{
  "actorOutputSchemaVersion": 1,
  "title": "<OUTPUT-SCHEMA-TITLE>",
  "properties": {
    /* define your outputs here */
  }
}
```

### Example

```json
{
  "actorOutputSchemaVersion": 1,
  "title": "Output schema of the files scraper",
  "properties": {
    "files": {
      "type": "string",
      "title": "Files",
      "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys"
    },
    "dataset": {
      "type": "string",
      "title": "Dataset",
      "template": "{{links.apiDefaultDatasetUrl}}/items"
    }
  }
}
```

### Output Schema Template Variables

- `links` (object) - Contains quick links to most commonly used URLs
- `links.publicRunUrl` (string) - Public run url in format `https://console.apify.com/view/runs/:runId`
- `links.consoleRunUrl` (string) - Console run url in format `https://console.apify.com/actors/runs/:runId`
- `links.apiRunUrl` (string) - API run url in format `https://api.apify.com/v2/actor-runs/:runId`
- `links.apiDefaultDatasetUrl` (string) - API url of default dataset in format `https://api.apify.com/v2/datasets/:defaultDatasetId`
- `links.apiDefaultKeyValueStoreUrl` (string) - API url of default key-value store in format `https://api.apify.com/v2/key-value-stores/:defaultKeyValueStoreId`
- `links.containerRunUrl` (string) - URL of a webserver running inside the run in format `https://<containerId>.runs.apify.net/`
- `run` (object) - Contains the same information about the run as the `GET Run` API endpoint returns
- `run.defaultDatasetId` (string) - ID of the default dataset
- `run.defaultKeyValueStoreId` (string) - ID of the default key-value store

## Dataset Schema Specification

The dataset schema defines how your Actor's output data is structured, transformed, and displayed in the Output tab in the Apify Console.

### Example

Consider an example Actor that calls `Actor.push_data()` to store data into the dataset:

```python
# Dataset push example (Python)
import asyncio
from datetime import datetime
from apify import Actor

async def main():
    await Actor.init()

    # Actor code
    await Actor.push_data({
        'numericField': 10,
        'pictureUrl': 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png',
        'linkUrl': 'https://google.com',
        'textField': 'Google',
        'booleanField': True,
        'dateField': datetime.now().isoformat(),
        'arrayField': ['#hello', '#world'],
        'objectField': {},
    })

    # Exit successfully
    await Actor.exit()

if __name__ == '__main__':
    asyncio.run(main())
```

To set up the Actor's output tab UI, reference a dataset schema file in `.actor/actor.json`:

```json
{
  "actorSpecification": 1,
  "name": "book-library-scraper",
  "title": "Book Library Scraper",
  "version": "1.0.0",
  "storages": {
    "dataset": "./dataset_schema.json"
  }
}
```

Then create the dataset schema in `.actor/dataset_schema.json`:

```json
{
  "actorSpecification": 1,
  "fields": {},
  "views": {
    "overview": {
      "title": "Overview",
      "transformation": {
        "fields": [
          "pictureUrl",
          "linkUrl",
          "textField",
          "booleanField",
          "arrayField",
          "objectField",
          "dateField",
          "numericField"
        ]
      },
      "display": {
        "component": "table",
        "properties": {
          "pictureUrl": {
            "label": "Image",
            "format": "image"
          },
          "linkUrl": {
            "label": "Link",
            "format": "link"
          },
          "textField": {
            "label": "Text",
            "format": "text"
          },
          "booleanField": {
            "label": "Boolean",
            "format": "boolean"
          },
          "arrayField": {
            "label": "Array",
            "format": "array"
          },
          "objectField": {
            "label": "Object",
            "format": "object"
          },
          "dateField": {
            "label": "Date",
            "format": "date"
          },
          "numericField": {
            "label": "Number",
            "format": "number"
          }
        }
      }
    }
  }
}
```

### Structure

```json
{
  "actorSpecification": 1,
  "fields": {},
  "views": {
    "<VIEW_NAME>": {
      "title": "string (required)",
      "description": "string (optional)",
      "transformation": {
        "fields": ["string (required)"],
        "unwind": ["string (optional)"],
        "flatten": ["string (optional)"],
        "omit": ["string (optional)"],
        "limit": "integer (optional)",
        "desc": "boolean (optional)"
      },
      "display": {
        "component": "table (required)",
        "properties": {
          "<FIELD_NAME>": {
            "label": "string (optional)",
            "format": "text|number|date|link|boolean|image|array|object (optional)"
          }
        }
      }
    }
  }
}
```

**Dataset Schema Properties:**

- `actorSpecification` (integer, required) - Specifies the version of dataset schema structure document (currently only version 1)
- `fields` (JSONSchema object, required) - Schema of one dataset object (use JsonSchema Draft 2020-12 or compatible)
- `views` (DatasetView object, required) - Object with API and UI views description

**DatasetView Properties:**

- `title` (string, required) - Visible in UI Output tab and API
- `description` (string, optional) - Only available in API response
- `transformation` (ViewTransformation object, required) - Data transformation applied when loading from Dataset API
- `display` (ViewDisplay object, required) - Output tab UI visualization definition

**ViewTransformation Properties:**

- `fields` (string[], required) - Fields to present in output (order matches column order)
- `unwind` (string[], optional) - Deconstructs nested children into parent object
- `flatten` (string[], optional) - Transforms nested object into flat structure
- `omit` (string[], optional) - Removes specified fields from output
- `limit` (integer, optional) - Maximum number of results (default: all)
- `desc` (boolean, optional) - Sort order (true = newest first)

**ViewDisplay Properties:**

- `component` (string, required) - Only `table` is available
- `properties` (Object, optional) - Keys matching `transformation.fields` with ViewDisplayProperty values

**ViewDisplayProperty Properties:**

- `label` (string, optional) - Table column header
- `format` (string, optional) - One of: `text`, `number`, `date`, `link`, `boolean`, `image`, `array`, `object`

## Key-Value Store Schema Specification

The key-value store schema organizes keys into logical groups called collections for easier data management.

### Example

Consider an example Actor that calls `Actor.set_value()` to save records into the key-value store:

```python
# Key-Value Store set example (Python)
import asyncio
from apify import Actor

async def main():
    await Actor.init()

    # Actor code
    await Actor.set_value('document-1', 'my text data', content_type='text/plain')

    image_id = '123'  # example placeholder
    image_buffer = b'...'  # bytes buffer with image data
    await Actor.set_value(f'image-{image_id}', image_buffer, content_type='image/jpeg')

    # Exit successfully
    await Actor.exit()

if __name__ == '__main__':
    asyncio.run(main())
```

To configure the key-value store schema, reference a schema file in `.actor/actor.json`:

```json
{
  "actorSpecification": 1,
  "name": "data-collector",
  "title": "Data Collector",
  "version": "1.0.0",
  "storages": {
    "keyValueStore": "./key_value_store_schema.json"
  }
}
```

Then create the key-value store schema in `.actor/key_value_store_schema.json`:

```json
{
  "actorKeyValueStoreSchemaVersion": 1,
  "title": "Key-Value Store Schema",
  "collections": {
    "documents": {
      "title": "Documents",
      "description": "Text documents stored by the Actor",
      "keyPrefix": "document-"
    },
    "images": {
      "title": "Images",
      "description": "Images stored by the Actor",
      "keyPrefix": "image-",
      "contentTypes": ["image/jpeg"]
    }
  }
}
```

### Structure

```json
{
  "actorKeyValueStoreSchemaVersion": 1,
  "title": "string (required)",
  "description": "string (optional)",
  "collections": {
    "<COLLECTION_NAME>": {
      "title": "string (required)",
      "description": "string (optional)",
      "key": "string (conditional - use key OR keyPrefix)",
      "keyPrefix": "string (conditional - use key OR keyPrefix)",
      "contentTypes": ["string (optional)"],
      "jsonSchema": "object (optional)"
    }
  }
}
```

**Key-Value Store Schema Properties:**

- `actorKeyValueStoreSchemaVersion` (integer, required) - Version of key-value store schema structure document (currently only version 1)
- `title` (string, required) - Title of the schema
- `description` (string, optional) - Description of the schema
- `collections` (Object, required) - Object where each key is a collection ID and value is a Collection object

**Collection Properties:**

- `title` (string, required) - Collection title shown in UI tabs
- `description` (string, optional) - Description appearing in UI tooltips
- `key` (string, conditional\*) - Single specific key for this collection
- `keyPrefix` (string, conditional\*) - Prefix for keys included in this collection
- `contentTypes` (string[], optional) - Allowed content types for validation
- `jsonSchema` (object, optional) - JSON Schema Draft 07 format for `application/json` content type validation

\*Either `key` or `keyPrefix` must be specified for each collection, but not both.

## Apify MCP Tools

If MCP server is configured, use these tools for documentation:

- `search-apify-docs` - Search documentation
- `fetch-apify-docs` - Get full doc pages

Otherwise, reference: `@https://mcp.apify.com/`

## Resources

- [docs.apify.com/llms.txt](https://docs.apify.com/llms.txt) - Quick reference
- [docs.apify.com/llms-full.txt](https://docs.apify.com/llms-full.txt) - Complete docs
- [crawlee.dev](https://crawlee.dev) - Crawlee documentation
- [whitepaper.actor](https://raw.githubusercontent.com/apify/actor-whitepaper/refs/heads/master/README.md) - Complete Actor specification
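
The "Do" list in the guide above recommends retry strategies with exponential backoff but does not show one. The following is a minimal sketch in plain `asyncio`, not an Apify SDK or Crawlee API: `backoff_delays` and `retry_with_backoff` are hypothetical names, and the base delay, growth factor, cap, and jitter range are illustrative assumptions.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base_s: float = 0.5,
                   factor: float = 2.0, cap_s: float = 30.0) -> list[float]:
    """Delays before each retry: base, base*factor, ... capped at cap_s."""
    return [min(cap_s, base_s * factor ** attempt) for attempt in range(retries)]


async def retry_with_backoff(operation: Callable[[], Awaitable[T]],
                             retries: int = 3, base_s: float = 0.5,
                             jitter: bool = True) -> T:
    """Call `operation`, retrying up to `retries` times on any exception.

    Hypothetical helper: the last failure is re-raised once retries run out.
    """
    delays = backoff_delays(retries, base_s)
    for attempt in range(retries + 1):
        try:
            return await operation()
        except Exception:
            if attempt == retries:
                raise
            delay = delays[attempt]
            if jitter:
                # Randomize the wait so parallel workers do not retry in lockstep.
                delay *= random.uniform(0.5, 1.0)
            await asyncio.sleep(delay)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

In a real Actor the retried operation would typically be a single request, and catching a narrower exception type than `Exception` is usually preferable so that programming errors are not retried.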

# Use Apify's Python base image (3.13)
FROM apify/actor-python:3.13

# -----------------------------
# Install Playwright + Chromium (as root)
# -----------------------------
USER root
RUN pip install --no-cache-dir playwright && \
    playwright install --with-deps chromium

# Switch to non-root user for app
USER myuser

# -----------------------------
# Install Python dependencies
# -----------------------------
# Copy only requirements first to leverage Docker caching
COPY --chown=myuser:myuser requirements.txt ./requirements.txt

RUN echo "Python version:" \
 && python --version \
 && echo "Pip version:" \
 && pip --version \
 && echo "Installing dependencies from requirements.txt:" \
 && pip install --no-cache-dir -r requirements.txt \
 && echo "All installed Python packages:" \
 && pip freeze

# -----------------------------
# Copy remaining source code
# -----------------------------
COPY --chown=myuser:myuser . ./

# Optional: compile Python files to verify they are valid
RUN python3 -m compileall -q src/

# -----------------------------
# ENTRYPOINT
# -----------------------------
# Option A: run module (preferred)
CMD ["python3", "-m", "src.main"]

# Option B: if your entry is plain script:
# CMD ["python3", "src/main.py"]

.actor/README.md

.actor/actor.json

.actor/input_schema.json

src/__init__.py

src/__main__.py

src/main.py

src/py.typed

videos/temp_playwright_script.py

videos/working_script.py

.dockerignore

.gitignore

AGENTS.md

Dockerfile

error.png

error_screenshot.png

requirements.txt

unexpected_error_screenshot.png
