MCP Nexus Universal AI Tool Bridge
Connect AI agents to real data. MCP Nexus runs tools that fetch, extract, summarize, classify, and crawl web content with caching, multi-LLM support, HMAC webhooks, circuit breakers, and full observability in a stateless, production-ready Apify Actor.
AI-powered web data bridge with smart caching, multi-LLM support, and production-grade reliability. Extract, transform, and analyze web content at scale on Apify platform.
Quick Start
Run on Apify Platform
- Configure your input parameters
- Click "Start" to run
- View results in the Dataset tab
30-Second Tutorial
Fetch and extract data from any webpage in three simple steps:
Step 1: Select Tool
Choose fetch_web from the tool dropdown
Step 2: Configure
{"mode": "single","tool": "fetch_web","params": {"url": "https://example.com"}}
Step 3: Run
Click Start and view extracted content in the dataset
One-Line API Call
curl "https://api.apify.com/v2/acts/SRHAma9FEsmuewetK/runs?token=YOUR_TOKEN" \-X POST -H "Content-Type: application/json" \-d '{"mode":"single","tool":"fetch_web","params":{"url":"https://example.com"}}'
Legal & Compliance Note
This actor respects robots.txt by default. Always review target site Terms of Service. Use proxies and rendering responsibly. You are responsible for compliance (GDPR/PII/ToS) in your jurisdiction.
What MCP Nexus Can Do
MCP Nexus provides 9 specialized tools for web data operations:
- fetch_web - Fetch and extract content from web pages
- extract - Extract specific data using CSS, XPath, or regex selectors
- summarize - Generate AI summaries of text content
- classify - Classify text into predefined categories using AI
- transform - Transform JSON data with mapping operations
- crawl_lite - Crawl multiple pages with depth and link following
- extract_structured - Extract structured data using AI and JSON schemas
- search_web - Parse sitemaps and RSS feeds for URL discovery
- diff_text - Compare two texts and calculate semantic differences
Table of Contents
- Chapter 1: Core Concepts
- Chapter 2: Getting Started
- Chapter 3: Tools Reference
- Chapter 4: Execution Modes
- Chapter 5: AI/LLM Integration
- Chapter 6: Performance & Optimization
- Chapter 7: Security & Compliance
- Chapter 8: Production Deployment
- Chapter 9: Development Guide
- Chapter 10: API & Integration
- Appendix A: Input Schema Reference
- Appendix B: Output Schema Reference
- Appendix C: Error Codes
- Appendix D: Troubleshooting
- Appendix E: FAQ
- Appendix F: Changelog
Chapter 1: Core Concepts
What is MCP Nexus
MCP Nexus is a universal AI tool bridge that connects AI agents, workflows, and applications to real-world web data. It provides a production-ready actor on the Apify platform that orchestrates nine specialized tools for web scraping, data extraction, AI-powered analysis, and content transformation.
Key Characteristics:
- Stateless: Each run is independent with no persistent state
- Observable: Full metrics and logging for debugging and monitoring
- Resilient: Built-in circuit breakers and retry logic
- Scalable: Runs on Apify's cloud infrastructure
- Compliant: Respects robots.txt and implements security best practices
Architecture Overview
MCP Nexus Actor
├─ Input Validation (Zod)
│  ├─ Single Mode / Batch Mode / DAG Mode
│  └─ Budget Tracking & Quota Management
├─ Tool Router
│  ├─ fetch_web, extract, summarize, classify, transform
│  └─ crawl_lite, extract_structured, search_web, diff_text
├─ Infrastructure Layer
│  ├─ HTTP Client (caching, ETags, Last-Modified)
│  ├─ Circuit Breakers (per-domain failure detection)
│  ├─ Deduplication (URL/content/hybrid fingerprinting)
│  ├─ LLM Client (OpenAI, Anthropic, Azure)
│  ├─ Browser (Playwright minimal/full rendering)
│  └─ Proxy Manager (Apify Proxy, custom rotation)
└─ Output & Storage
   ├─ Dataset (structured run reports)
   ├─ Key-Value Store (HTML, screenshots, text)
   └─ Webhook Delivery (HMAC-signed notifications)
How It Works
- Input Processing: Validates JSON input against schema, applies defaults
- Tool Selection: Routes to appropriate tool handler based on mode
- Execution: Runs tool with context (config, tracking, storage)
- Metric Collection: Records bytes, tokens, retries, cache hits
- Result Assembly: Builds structured report with metadata
- Output: Pushes to dataset, sends webhook if configured
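A rough sketch of this flow, with a hypothetical runTool dispatcher standing in for the real tool router (the actual orchestrator in src/main.ts also handles batch/DAG modes, budgets, and webhooks):

import { Actor } from 'apify';

// Hypothetical stand-in for the tool router described above.
const runTool = async (tool: string, params: Record<string, unknown>) => ({ tool, params });

await Actor.init();
const input: any = await Actor.getInput();                 // 1. input processing (validation omitted)
const started = Date.now();
const report: any = { ok: true, errors: [], usage: {} };
try {
  report.result = await runTool(input.tool, input.params); // 2-3. tool selection and execution
} catch (err) {
  report.ok = false;
  report.errors.push(String(err));                         // errors surface in the run report
}
report.usage.durationMs = Date.now() - started;            // 4. metric collection
await Actor.pushData(report);                              // 5-6. assemble report, push to dataset
await Actor.exit();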
Key Features
Performance:
- HTTP caching with ETag/Last-Modified support
- Request deduplication (URL, content, hybrid)
- Per-domain circuit breakers
- Browser rendering (none/minimal/full)
- Proxy rotation
AI/LLM:
- Multi-provider support (OpenAI, Anthropic, Azure)
- Cost tracking per request
- Token usage monitoring
- Structured JSON extraction
Observability:
- Per-tool execution metrics
- Cache hit/miss ratios
- Circuit breaker trip counts
- Correlation IDs for request tracking
- Detailed error messages
Security:
- HMAC webhook signatures
- Robots.txt enforcement
- Allow/deny list URL filtering
- Log redaction for PII
- Secret management via Apify
Chapter 2: Getting Started
Installation
Option 1: Use on Apify Console (Recommended)
- Open Actor
- Click "Try for free"
- Configure input via UI
- Click "Start"
Option 2: Deploy to Your Apify Account
- Visit the Actor page
- Click "Schedule" or "API" to integrate
- Use Apify API or SDK to run programmatically
Authentication
Apify API Token:
Get your token from Apify Console → Settings → Integrations
LLM API Keys:
Store as Apify secrets:
- Go to Apify Console → Settings → Secrets
- Add a secret: OPENAI_API_KEY=sk-...
- Reference it in the input: "apiKeySecret": "OPENAI_API_KEY"
Or set as environment variables:
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
Your First Run
Example 1: Fetch a Web Page
{"mode": "single","tool": "fetch_web","params": {"url": "https://example.com","stripBoilerplate": true}}
Example 2: Summarize Text
{"mode": "single","tool": "summarize","params": {"text": "Long article text here...","language": "en","style": "concise"},"llm": {"provider": "openai","model": "gpt-4o-mini","apiKeySecret": "OPENAI_API_KEY"}}
Example 3: Extract Data
{"mode": "single","tool": "extract","params": {"source": "url","input": "https://news.ycombinator.com","selectors": [{ "name": "titles", "css": ".titleline > a" }]}}
Understanding Results
All runs produce a structured RunReport:
{"correlationId": "abc-123","schemaVersion": 1,"ok": true,"mode": "single","toolsExecuted": 1,"usage": {"durationMs": 1234,"httpBytes": 45678,"llmTokens": 150,"retries": 0,"cacheHits": 0,"cacheMisses": 1,"circuitBreakerTrips": 0},"costEstimateUSD": 0.0002,"warnings": [],"errors": [],"timestamp": "2025-01-07T12:34:56.789Z","result": {"status": 200,"url": "https://example.com","contentText": "Extracted content here...","htmlSnippet": "<html>...","links": []}}
Key Fields:
- ok: Overall success indicator
- usage: Resource consumption metrics
- costEstimateUSD: Estimated LLM costs
- result: Tool output (single mode)
- results: Array of outputs (batch mode)
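For example, a caller using the apify-client package can branch on these fields after a run (a sketch; the actor ID and field names follow the examples in this guide):

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
const run = await client.actor('USERNAME/mcp-nexus').call({
  mode: 'single',
  tool: 'fetch_web',
  params: { url: 'https://example.com' },
});

// Each run pushes one RunReport item to its default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
const report: any = items[0];

if (!report.ok) {
  console.error('Run failed:', report.errors);
} else {
  console.log('Cost estimate (USD):', report.costEstimateUSD);
  console.log('Fetched text:', report.result.contentText);
}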
Recommended Default Configuration
For optimal performance and cost savings, use these defaults:
{"cache": {"enabled": true,"ttlSec": 3600},"dedupe": {"enabled": true,"strategy": "url","ttlSec": 86400},"budgets": {"maxDurationSec": 60,"maxTotalBytes": 5242880,"maxTotalTokens": 20000},"security": {"redactLogs": true}}
Why these defaults:
- Caching (1 hour) provides immediate ROI by avoiding duplicate fetches
- URL deduplication (24 hours) prevents processing same pages multiple times
- Budget limits prevent runaway costs
- Log redaction protects sensitive data
Conversion-Optimized Examples
Example 1: Batch Mix (fetch + extract + summarize)
{"mode": "batch","concurrency": 2,"dag": true,"calls": [{"callId": "fetch","tool": "fetch_web","params": {"url": "https://example.com/article"}},{"callId": "extract","tool": "extract","params": {"source": "text","input": {"ref": "fetch.result.contentText"},"selectors": [{"name": "title", "regex": "^#\\s+(.+)$"}]},"dependsOn": ["fetch"]},{"callId": "summarize","tool": "summarize","params": {"text": {"ref": "fetch.result.contentText"}},"dependsOn": ["fetch"]}],"llm": {"provider": "openai","model": "gpt-4o-mini"}}
Example 2: Structured Extract with Schema
{"mode": "single","tool": "extract_structured","params": {"source": "url","input": "https://example.com/pricing","jsonSchema": {"type": "object","properties": {"plans": {"type": "array","items": {"type": "object","properties": {"name": {"type": "string"},"price": {"type": "number"}}}}}}},"llm": {"provider": "openai","model": "gpt-4o-mini"}}
Example 3: Crawl with Storage
{"mode": "single","tool": "crawl_lite","params": {"startUrl": "https://example.com","maxPages": 10,"maxDepth": 2},"store": {"html": true,"text": true}}
Chapter 3: Tools Reference
fetch_web
Purpose: Download and parse web pages with smart content extraction
When to Use:
- Fetching article content
- Downloading HTML for later processing
- Extracting clean text from pages
Parameters:
{
  url: string
  stripBoilerplate?: boolean
  headers?: Record<string, string>
  timeoutMs?: number
  maxBytes?: number
  respectRobotsTxt?: boolean
}
Complete Example:
{"mode": "single","tool": "fetch_web","params": {"url": "https://blog.example.com/article","stripBoilerplate": true},"cache": {"enabled": true,"ttlSec": 3600}}
Output:
{"status": 200,"url": "https://blog.example.com/article","contentText": "Clean article text...","htmlSnippet": "<html>...","links": [{ "href": "/about", "text": "About Us" }],"meta": {"finalUrl": "https://blog.example.com/article","contentType": "text/html","bytes": 25678,"language": "en","rendered": false}}
Advanced Usage:
Enable browser rendering for JavaScript-heavy sites:
{"mode": "single","tool": "fetch_web","params": {"url": "https://spa-example.com"},"render": "minimal"}
Store artifacts:
{"mode": "single","tool": "fetch_web","params": {"url": "https://example.com"},"store": {"html": true,"text": true,"screenshot": true}}
extract
Purpose: Parse and extract data from HTML/text using selectors and patterns
When to Use:
- Scraping structured data from web pages
- Extracting specific fields
- Pattern matching with regex
Parameters:
{
  source: 'url' | 'html' | 'text'
  input: string
  selectors?: Array<{
    name: string
    css?: string
    xpath?: string
    regex?: string
  }>
  patterns?: Array<{
    name: string
    regex: string
    group?: number
  }>
}
Complete Example:
{"mode": "single","tool": "extract","params": {"source": "url","input": "https://news.ycombinator.com","selectors": [{"name": "titles","css": ".titleline > a"},{"name": "scores","css": ".score"}],"patterns": [{"name": "points","regex": "(\\d+) points?","group": 1}]}}
Output:
{"fields": {"titles": ["Show HN: My New Project","Ask HN: How do you...","Tell HN: Something..."],"scores": ["123 points", "45 points", "67 points"]},"matches": {"points": ["123", "45", "67"]}}
Advanced Usage:
Extract from HTML string:
{"mode": "single","tool": "extract","params": {"source": "html","input": "<article><h1>Title</h1><p>Body</p></article>","selectors": [{ "name": "headline", "css": "h1" },{ "name": "body", "css": "p" }]}}
Use XPath for complex queries:
{"mode": "single","tool": "extract","params": {"source": "url","input": "https://example.com","selectors": [{"name": "metadata","xpath": "//meta[@property='og:title']/@content"}]}}
summarize
Purpose: AI-powered text summarization with language and style control
When to Use:
- Condensing long articles
- Creating executive summaries
- Generating TL;DR versions
Parameters:
{
  text: string
  language?: string
  style?: string
  maxTokens?: number
  model?: string
  apiKeySecret?: string
}
Complete Example:
{"mode": "single","tool": "summarize","params": {"text": "Long article about climate change spanning multiple paragraphs...","language": "en","style": "concise","maxTokens": 200},"llm": {"provider": "openai","model": "gpt-4o-mini","apiKeySecret": "OPENAI_API_KEY"}}
Output:
{"summary": "Climate change is accelerating due to human activities. Key impacts include rising temperatures, extreme weather, and ecosystem disruption. Immediate action is needed.","tokens": 150}
Advanced Usage:
Multi-language summarization:
{"mode": "single","tool": "summarize","params": {"text": "Article en français...","language": "fr","style": "detailed"},"llm": {"provider": "anthropic","model": "claude-3-5-sonnet-20241022"}}
Bullet-point summaries:
{"mode": "single","tool": "summarize","params": {"text": "Long technical document...","style": "bullet"}}
classify
Purpose: Categorize text into predefined labels using AI
When to Use:
- Support ticket routing
- Content moderation
- Sentiment analysis
- Topic classification
Parameters:
{
  text: string
  labels: string[]
  maxTokens?: number
  model?: string
  apiKeySecret?: string
}
Complete Example:
{"mode": "single","tool": "classify","params": {"text": "My account was charged twice for the same purchase. How do I get a refund?","labels": ["billing", "technical", "account", "general"]},"llm": {"provider": "openai","model": "gpt-4o-mini","apiKeySecret": "OPENAI_API_KEY"}}
Output:
{"label": "billing","confidence": 0.95,"tokens": 50}
Advanced Usage:
Sentiment classification:
{"mode": "single","tool": "classify","params": {"text": "This product exceeded my expectations!","labels": ["positive", "neutral", "negative"]}}
transform
Purpose: Transform and reshape JSON data with mapping rules
When to Use:
- Data normalization
- API response transformation
- Field mapping and renaming
Parameters:
{
  inputJson: any
  mapping: Array<{
    from?: string
    to: string
    op?: string
    value?: any
  }>
}
Complete Example:
{"mode": "single","tool": "transform","params": {"inputJson": {"user": {"firstName": "John","lastName": "Doe","tags": ["vip", "beta"],"created": "2025-01-07"}},"mapping": [{"from": "user.firstName","to": "customer.name"},{"from": "user.tags","to": "customer.segments","op": "join","value": ","},{"from": "user.created","to": "customer.joinDate","op": "dateParse"}]}}
Output:
{"customer": {"name": "John","segments": "vip,beta","joinDate": "2025-01-07T00:00:00.000Z"}}
Available Operations:
- copy: Copy value as-is (default)
- const: Set constant value
- join: Join array elements with delimiter
- split: Split string into array
- pick: Extract nested value by path
- concat: Concatenate values
- replace: Replace text patterns
- dateParse: Parse date strings
- numberParse: Parse numeric values
- lookup: Map values using dictionary
- pickByPath: Extract by dot notation path
crawl_lite
Purpose: Lightweight web crawler with configurable depth and pagination
When to Use:
- Crawling small to medium sites
- Following pagination
- Discovering internal links
Parameters:
{
  startUrl: string
  maxPages?: number
  maxDepth?: number
  sameOriginOnly?: boolean
  delayMs?: number
}
Complete Example:
{"mode": "single","tool": "crawl_lite","params": {"startUrl": "https://blog.example.com","maxPages": 10,"maxDepth": 2,"sameOriginOnly": true,"delayMs": 500},"dedupe": {"enabled": true,"strategy": "url"}}
Output:
{"pages": [{"url": "https://blog.example.com","status": 200,"bytes": 12345,"linksCount": 15,"cached": false},{"url": "https://blog.example.com/about","status": 200,"bytes": 8900,"linksCount": 5,"cached": false}]}
Advanced Usage:
Store crawled HTML:
{"mode": "single","tool": "crawl_lite","params": {"startUrl": "https://example.com","maxPages": 20},"store": {"html": true}}
extract_structured
Purpose: Extract data matching JSON schemas using AI
When to Use:
- Extracting complex structured data
- Schema-driven extraction
- Semi-structured content parsing
Parameters:
{
  source: 'text' | 'html' | 'url'
  input: string
  jsonSchema: object
  llm?: {
    provider?: string
    model?: string
    apiKeySecret?: string
    maxTokens?: number
  }
}
Complete Example:
{"mode": "single","tool": "extract_structured","params": {"source": "text","input": "John Doe works as a Senior Engineer at Acme Corp. His email is john@acme.com and phone is +1-555-0123. He joined in January 2020.","jsonSchema": {"type": "object","properties": {"name": { "type": "string" },"position": { "type": "string" },"company": { "type": "string" },"email": { "type": "string" },"phone": { "type": "string" },"joinDate": { "type": "string" }}}},"llm": {"provider": "openai","model": "gpt-4o","apiKeySecret": "OPENAI_API_KEY"}}
Output:
{"data": {"name": "John Doe","position": "Senior Engineer","company": "Acme Corp","email": "john@acme.com","phone": "+1-555-0123","joinDate": "January 2020"},"confidence": 0.9,"tokens": 320}
Advanced Usage:
Extract arrays:
{"mode": "single","tool": "extract_structured","params": {"source": "text","input": "We offer three plans: Basic ($9/mo), Pro ($29/mo), Enterprise ($99/mo)","jsonSchema": {"type": "object","properties": {"plans": {"type": "array","items": {"type": "object","properties": {"name": { "type": "string" },"price": { "type": "number" }}}}}}}}
search_web
Purpose: Find URLs via sitemaps, RSS feeds, or search APIs
When to Use:
- Discovering content URLs
- Sitemap parsing
- RSS feed aggregation
Parameters:
{
  query?: string
  sitemapUrl?: string
  rssUrl?: string
  maxResults?: number
}
Complete Example:
{"mode": "single","tool": "search_web","params": {"sitemapUrl": "https://example.com/sitemap.xml","maxResults": 50}}
Output:
{"urls": ["https://example.com/page1","https://example.com/page2","https://example.com/page3"],"count": 3,"source": "sitemap"}
Advanced Usage:
Parse RSS feeds:
{"mode": "single","tool": "search_web","params": {"rssUrl": "https://blog.example.com/feed","maxResults": 20}}
diff_text
Purpose: Compare text with semantic or character-level differences
When to Use:
- Content change detection
- Version comparison
- Update monitoring
Parameters:
{
  text1: string
  text2: string
  semantic?: boolean
}
Complete Example:
{"mode": "single","tool": "diff_text","params": {"text1": "The quick brown fox jumps.","text2": "The quick red fox leaps.","semantic": true}}
Output:
{"additions": ["red", "leaps"],"deletions": ["brown", "jumps"],"changeScore": 0.286}
Advanced Usage:
Character-level diff:
{"mode": "single","tool": "diff_text","params": {"text1": "hello","text2": "helo","semantic": false}}
Chapter 4: Execution Modes
Single Mode
Execute one tool at a time.
Example:
{"mode": "single","tool": "fetch_web","params": {"url": "https://example.com"}}
When to Use:
- Simple one-off operations
- Testing tools
- API integrations
Batch Mode
Execute multiple tools in parallel with configurable concurrency.
Example:
{"mode": "batch","concurrency": 3,"calls": [{"tool": "fetch_web","params": { "url": "https://example.com/page1" }},{"tool": "fetch_web","params": { "url": "https://example.com/page2" }},{"tool": "summarize","params": { "text": "Long text..." }}]}
When to Use:
- Processing multiple URLs
- Parallel data operations
- Bulk transformations
Output:
{"results": [{"tool": "fetch_web","ok": true,"output": { "status": 200, "contentText": "..." }},{"tool": "fetch_web","ok": true,"output": { "status": 200, "contentText": "..." }},{"tool": "summarize","ok": true,"output": { "summary": "...", "tokens": 150 }}]}
DAG Dependencies
Execute tools with dependencies using Directed Acyclic Graph resolution.
Example:
{"mode": "batch","dag": true,"calls": [{"callId": "fetch","tool": "fetch_web","params": { "url": "https://example.com" }},{"callId": "extract","tool": "extract","params": {"source": "html","input": { "ref": "fetch.htmlSnippet" },"selectors": [{ "name": "title", "css": "h1" }]},"dependsOn": ["fetch"]},{"callId": "summarize","tool": "summarize","params": {"text": { "ref": "fetch.contentText" }},"dependsOn": ["fetch"]}]}
When to Use:
- Multi-step workflows
- Chained transformations
- Complex data pipelines
Reference Syntax:
{ "ref": "callId" }- Reference entire result{ "ref": "callId.path.to.field" }- Reference nested field{ "ref": "callId.array.0" }- Reference array element
Performance Tips
Optimize Concurrency:
- HTTP-only: 5-10 concurrent
- With proxies: 2-5 concurrent
- Browser rendering: 1-2 concurrent
Use Caching:
{"cache": {"enabled": true,"ttlSec": 3600}}
Enable Deduplication:
{"dedupe": {"enabled": true,"strategy": "url"}}
Set Budgets:
{"budgets": {"maxDurationSec": 300,"maxTotalBytes": 52428800,"maxTotalTokens": 100000}}
Chapter 5: AI/LLM Integration
Supported Providers
OpenAI:
- Models: gpt-4o, gpt-4o-mini, gpt-4, gpt-3.5-turbo
- Best for: General purpose, structured extraction
- Cost: Approximately $0.15-$10 per 1M tokens (subject to change)
Anthropic (Claude):
- Models: claude-3-5-sonnet-20241022, claude-3-haiku-20240307
- Best for: Long-form content, complex reasoning
- Cost: Approximately $0.25-$15 per 1M tokens (subject to change)
Azure OpenAI:
- Models: Same as OpenAI, deployed to Azure
- Best for: Enterprise compliance, regional requirements
- Cost: Similar to OpenAI, billed through Azure (subject to change)
Model Selection
Configuration:
{"llm": {"provider": "openai","model": "gpt-4o-mini","apiKeySecret": "OPENAI_API_KEY","maxTokens": 4000}}
Choosing Models:
| Task | Recommended Model | Reason |
|---|---|---|
| Summarization | gpt-4o-mini | Fast, cheap, accurate |
| Classification | gpt-4o-mini | Low latency, cost-effective |
| Structured extraction | gpt-4o | Better schema adherence |
| Complex reasoning | claude-3-5-sonnet | Superior reasoning |
| Bulk operations | gpt-4o-mini | Cost optimization |
Cost Optimization
1. Use Cheaper Models:
{"llm": {"provider": "openai","model": "gpt-4o-mini"}}
2. Limit Token Usage:
{"llm": {"maxTokens": 500},"budgets": {"maxTotalTokens": 50000}}
3. Cache Results:
{"cache": {"enabled": true,"ttlSec": 86400}}
4. Monitor Costs:
Check costEstimateUSD in run reports:
{"costEstimateUSD": 0.0045,"usage": {"llmTokens": 3000,"llmCosts": {"openai": 0.0045,"anthropic": 0.0000,"azure": 0.0000,"total": 0.0045}}}
Automatic Cost Tracking
MCP Nexus automatically tracks LLM costs per provider with detailed breakdowns.
How It Works:
- Costs are calculated automatically for each LLM call
- Per-provider breakdown is maintained (OpenAI, Anthropic, Azure)
- Costs are displayed in logs during execution
- Final cost summary included in run report
Cost Tracking in Logs:
During execution, you'll see cost information for each LLM call:
[INFO] LLM cost: $0.0012 (openai, gpt-4o-mini, 450 tokens)
[INFO] LLM cost: $0.0035 (anthropic, claude-3-5-sonnet-20241022, 890 tokens)
At the end of the run, a summary is displayed:
[INFO] LLM Costs: OpenAI $0.0024, Anthropic $0.0035, Azure $0.0000, Total $0.0059
Cost Breakdown in Output:
The usage.llmCosts field provides a detailed breakdown:
{"usage": {"llmTokens": 1340,"llmCosts": {"openai": 0.0024,"anthropic": 0.0035,"azure": 0.0000,"total": 0.0059}},"costEstimateUSD": 0.0059}
Per-Tool Cost Tracking:
Costs are tracked individually for each tool that uses LLM:
- summarize: Full cost per summary generated
- classify: Cost per classification
- extract_structured: Cost per extraction
Multi-Provider Support:
If you use multiple LLM providers in a single run (e.g., OpenAI for classification and Anthropic for summarization), costs are tracked separately:
{"mode": "batch","calls": [{"tool": "classify","params": {"text": "...", "labels": ["..."]},"llm": {"provider": "openai", "model": "gpt-4o-mini"}},{"tool": "summarize","params": {"text": "..."},"llm": {"provider": "anthropic", "model": "claude-3-5-sonnet-20241022"}}]}
Result:
{"usage": {"llmCosts": {"openai": 0.0008,"anthropic": 0.0042,"total": 0.0050}}}
Benefits:
- Transparency: Know exactly what each LLM call costs
- Optimization: Identify expensive operations and optimize
- Budgeting: Track costs against allocated budgets
- Multi-Provider: Compare costs across different providers
Token Management
Token Limits by Model:
| Model | Input Limit | Output Limit |
|---|---|---|
| gpt-4o | 128K | 16K |
| gpt-4o-mini | 128K | 16K |
| claude-3-5-sonnet | 200K | 8K |
| claude-3-haiku | 200K | 4K |
Tracking Usage:
Every LLM tool returns token count:
{"summary": "...","tokens": 450}
Total tokens tracked in usage:
{"usage": {"llmTokens": 1250}}
Structured Extraction Details
Use extract_structured for complex data extraction:
{"mode": "single","tool": "extract_structured","params": {"source": "text","input": "Product: iPhone 15 Pro\nPrice: $999\nColor: Blue","jsonSchema": {"type": "object","properties": {"product": { "type": "string" },"price": { "type": "number" },"color": { "type": "string" }},"required": ["product", "price"]}},"llm": {"provider": "openai","model": "gpt-4o"}}
Tips:
- Use detailed schemas with descriptions
- Prefer gpt-4o over gpt-4o-mini for complex schemas
- Validate extracted data in your application
Chapter 6: Performance & Optimization
HTTP Caching
How It Works:
MCP Nexus implements intelligent HTTP caching with:
- ETag header support
- Last-Modified header support
- Configurable TTL
- Per-URL cache entries
Configuration:
{"cache": {"enabled": true,"ttlSec": 3600}}
Cache Metrics:
Monitor effectiveness:
{"usage": {"cacheHits": 15,"cacheMisses": 3}}
Aim for >70% hit rate for repeated workloads.
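Hit rate can be computed straight from these counters, for example:

// Cache hit rate from a run report's usage block.
const usage = { cacheHits: 15, cacheMisses: 3 };
const hitRate = usage.cacheHits / (usage.cacheHits + usage.cacheMisses);
console.log(`Cache hit rate: ${(hitRate * 100).toFixed(1)}%`); // "Cache hit rate: 83.3%"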
TTL Guidelines:
| Content Type | Recommended TTL |
|---|---|
| Static content | 86400 (24h) |
| News/blogs | 3600 (1h) |
| Product prices | 300 (5min) |
| Stock data | 60 (1min) |
| User content | 0 (disabled) |
Request Deduplication
Strategies:
- URL-based: Same URL = duplicate
- Content-based: Same content hash = duplicate
- Hybrid: URL + content hash
Configuration:
{"dedupe": {"enabled": true,"strategy": "hybrid","ttlSec": 86400}}
When to Use:
- Crawling workflows
- Batch processing
- RSS/sitemap parsing
When Not to Use:
- Real-time data fetching
- Dynamic content
Example:
{"mode": "single","tool": "crawl_lite","params": {"startUrl": "https://example.com","maxPages": 100},"dedupe": {"enabled": true,"strategy": "url"}}
Circuit Breakers
Purpose: Prevent cascading failures by detecting and isolating failing services.
How It Works:
- Track failures per domain
- Open circuit after N failures
- Half-open after cooldown period
- Close after successful requests
Default Behavior:
- Failure threshold: 3 failures
- Cooldown: 60-120 seconds (randomized)
- Success threshold: 2 successes to close
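The state machine roughly corresponds to the sketch below (illustrative only; the actor's circuitBreaker.ts is the authoritative implementation):

// Illustrative per-domain circuit breaker using the defaults listed above.
type BreakerState = { failures: number; successes: number; openedAt?: number };
const breakers = new Map<string, BreakerState>();

const canRequest = (domain: string): boolean => {
  const b = breakers.get(domain) ?? { failures: 0, successes: 0 };
  if (b.openedAt === undefined) return true;                  // closed: allow requests
  const cooldownMs = 60_000 + Math.random() * 60_000;         // 60-120 s, randomized
  return Date.now() - b.openedAt > cooldownMs;                // half-open after cooldown
};

const record = (domain: string, ok: boolean) => {
  const b = breakers.get(domain) ?? { failures: 0, successes: 0 };
  if (ok) {
    b.successes += 1;
    if (b.successes >= 2) { b.failures = 0; b.openedAt = undefined; }  // close
  } else {
    b.failures += 1;
    b.successes = 0;
    if (b.failures >= 3) b.openedAt = Date.now();                      // open
  }
  breakers.set(domain, b);
};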
Monitoring:
{"usage": {"circuitBreakerTrips": 2}}
High trip counts indicate:
- Target site issues
- Rate limiting
- Network problems
- Need for tuning
Best Practices:
- Monitor trip counts
- Investigate domains with frequent trips
- Adjust delays between requests
- Use proxies for problematic domains
Proxy Configuration
When to Use Proxies:
- Scraping rate-limited sites
- Avoiding IP blocks
- Geographic targeting
- High-volume scraping
Apify Proxy (Recommended):
{"proxy": {"useApifyProxy": true}}
Benefits:
- Residential and datacenter IPs
- Automatic rotation
- Geographic targeting
- Built-in retry logic
Cost: Approximately $0.50 per GB (subject to change)
Custom Proxies:
{"proxy": {"proxyUrls": ["http://user:pass@proxy1.example.com:8000","http://user:pass@proxy2.example.com:8000"]}}
User-Agent Rotation:
Automatic rotation through realistic browser User-Agents. No configuration needed.
Browser Rendering
Modes:
None (Default):
- HTTP-only fetching
- Fastest (100-500ms per page)
- No JavaScript execution
- Use for static content
Minimal:
{"render": "minimal"}
- Launches headless browser
- Waits 2-3 seconds for JS
- No screenshots
- Use for light JavaScript sites
Full:
{"render": "full"}
- Full browser rendering
- Waits for network idle
- Captures screenshots
- Use for complex SPAs
Performance Impact:
| Mode | Speed | Memory | CPU | Cost |
|---|---|---|---|---|
| None | 1x | 50MB | 1x | 1x |
| Minimal | 20x slower | 300MB | 5x | 5x |
| Full | 40x slower | 500MB | 10x | 10x |
When to Use:
- None: Static HTML, APIs, RSS feeds
- Minimal: E-commerce, news sites with JS
- Full: SPAs, React/Vue apps, complex UIs
Chapter 7: Security & Compliance
HMAC Webhook Verification
Overview:
All webhooks include HMAC-SHA256 signatures for verification.
Signature Format:
X-Signature: sha256=<hex-encoded-hmac>
X-Timestamp: <ISO-8601-timestamp>
X-Request-Id: <UUID-v4>
HMAC computed over: timestamp + "." + body
Node.js Verification:
const crypto = require('crypto');

function verifyWebhook(body, timestamp, signature, secret) {
  const payload = `${timestamp}.${JSON.stringify(body)}`;
  const expectedSignature = crypto
    .createHmac('sha256', secret)
    .update(payload)
    .digest('hex');
  const expected = Buffer.from(`sha256=${expectedSignature}`, 'utf8');
  const actual = Buffer.from(signature, 'utf8');
  if (expected.length !== actual.length) {
    return false;
  }
  return crypto.timingSafeEqual(expected, actual);
}

app.post('/webhook', (req, res) => {
  const secret = process.env.WEBHOOK_SECRET;
  const signature = req.headers['x-signature'];
  const timestamp = req.headers['x-timestamp'];
  if (!verifyWebhook(req.body, timestamp, signature, secret)) {
    return res.status(401).send('Invalid signature');
  }
  console.log('Webhook verified:', req.body);
  res.status(200).send('OK');
});
Python Verification:
import hmac
import hashlib

def verify_webhook(signature, timestamp, body, secret):
    expected = 'sha256=' + hmac.new(
        secret.encode('utf-8'),
        f'{timestamp}.{body}'.encode('utf-8'),
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(signature, expected)

@app.route('/webhook', methods=['POST'])
def webhook():
    signature = request.headers.get('X-Signature')
    timestamp = request.headers.get('X-Timestamp')
    body = request.get_data(as_text=True)
    secret = os.environ['WEBHOOK_SECRET']
    if not verify_webhook(signature, timestamp, body, secret):
        return 'Invalid signature', 401
    data = request.json
    print('Webhook verified:', data)
    return 'OK', 200
Replay Attack Prevention:
- Check timestamp (reject >5 minutes old)
- Store and check idempotency keys
- Use HTTPS only
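A simple guard for the first two checks, layered in front of the signature verification above (a sketch with an in-memory store; use a persistent store in production):

const seenRequestIds = new Set();                    // replace with a persistent store in production

const isFresh = (timestamp) =>
  Math.abs(Date.now() - new Date(timestamp).getTime()) < 5 * 60 * 1000;

app.post('/webhook', (req, res) => {
  const timestamp = req.headers['x-timestamp'];
  const requestId = req.headers['x-request-id'];
  if (!isFresh(timestamp)) return res.status(401).send('Stale timestamp');
  if (seenRequestIds.has(requestId)) return res.status(200).send('Duplicate ignored');
  seenRequestIds.add(requestId);
  // ...then verify the HMAC signature as shown above and process req.body
  res.status(200).send('OK');
});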
Robots.txt Respect
Default Behavior:
Respects robots.txt for all fetch_web and crawl_lite operations.
Features:
- Wildcard pattern support
- Crawl-delay extraction
- User-agent: * rules
Override Per Domain:
{"security": {"ignoreRobotsFor": ["example.com", "api.example.com"]}}
Legal Considerations:
- Respecting robots.txt is a best practice
- Check Terms of Service of target sites
- Public data ≠ permission to scrape at scale
- Some countries have specific web scraping laws
Domain Allow/Deny Lists
Allowlist (Whitelist):
Only process URLs matching patterns:
{"security": {"allowlist": ["^https://example\\.com/.*","^https://api\\.mysite\\.com/.*"]}}
Denylist (Blacklist):
Block specific patterns:
{"security": {"denylist": ["^https://example\\.com/admin/.*","^https://.*\\.gov/.*","^https://.*\\.mil/.*"]}}
SSRF Protection:
Block internal networks:
{"security": {"denylist": ["^https?://127\\.0\\.0\\.1/.*","^https?://localhost/.*","^https?://169\\.254\\..*","^https?://10\\..*","^https?://172\\.(1[6-9]|2[0-9]|3[0-1])\\..*","^https?://192\\.168\\..*"]}}
PII Redaction
Enable Log Redaction:
{"security": {"redactLogs": true}}
What Gets Redacted:
- Tool results in console logs
- result field in single mode
- results array in batch mode
What's NOT Redacted:
- Metadata (timing, tokens, errors)
- Dataset outputs
- Webhook payloads
- Key-value store artifacts
Secret Management
Using Apify Secrets:
- Go to Apify Console → Settings → Secrets
- Add a secret (e.g., OPENAI_API_KEY)
- Reference it in input:
{"llm": {"apiKeySecret": "OPENAI_API_KEY"}}
Environment Variables:
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export WEBHOOK_SECRET=your-secret
Best Practices:
- Never commit secrets to repositories
- Use different secrets for dev/staging/prod
- Rotate secrets quarterly
- Use minimal required permissions
- Monitor secret usage
- Delete unused secrets
Content Security
Safe HTML Parsing:
- Uses cheerio and jsdom safely
- No eval() or code execution
- Sandboxed DOM operations
- XSS-safe by design
PDF Parsing:
- Memory-limited parsing
- No code execution
- Timeout protection
XML Parsing:
- Entity expansion disabled
- DTD processing disabled
- XXE attack prevention
Chapter 8: Production Deployment
Rate Limits & Best Practices
Respecting Target Sites:
- Always respect robots.txt
- Use appropriate delays (300ms minimum)
- Implement exponential backoff for 429 responses
- Monitor circuit breaker trips
Recommended Settings:
{"budgets": {"maxDurationSec": 300,"maxCalls": 100,"maxPages": 50,"maxTotalBytes": 52428800,"maxTotalTokens": 100000}}
Rate Limiting Strategy:
- Per-domain circuit breakers (automatic)
- HTTP caching (reduce requests)
- Deduplication (avoid duplicates)
- Delays in crawl_lite (300-1000 ms)
Anti-Bot Strategies
When to Use Proxies:
- Sites with strict rate limits
- Many concurrent requests
- IP blocking issues
- Geographic targeting needed
User-Agent Rotation:
Automatic rotation through realistic browser User-Agents.
Additional Techniques:
- Random delays in crawl_lite
- Respect crawl-delay from robots.txt
- Use browser rendering for JS-heavy sites
- Limit batch concurrency (2-5)
Example:
{"mode": "single","tool": "fetch_web","params": {"url": "https://strict-site.com"},"proxy": {"useApifyProxy": true},"render": "minimal"}
When to Use Browser Rendering
Use "minimal" mode when:
- Site requires JavaScript but loads quickly
- Need basic interactivity
- Performance is a priority
Use "full" mode when:
- Complex JavaScript applications
- Need to wait for async content
- Screenshots required for verification
- SPAs (Single Page Applications)
Avoid browser rendering when:
- Static HTML is sufficient
- Performance is critical
- Costs need minimization
Cost Comparison:
| Mode | Pages/Hour | Cost Multiplier |
|---|---|---|
| HTTP-only | 3600 | 1x |
| Minimal | 180 | 20x |
| Full | 90 | 40x |
LLM Provider Limits
OpenAI:
| Model | TPM Limit (Free) | Approx. Cost per 1M Tokens |
|---|---|---|
| gpt-4o | 10,000 | ~$2.50 input, ~$10 output |
| gpt-4o-mini | 200,000 | ~$0.15 input, ~$0.60 output |
Anthropic:
| Model | TPM Limit | Approx. Cost per 1M Tokens |
|---|---|---|
| claude-3-5-sonnet | Varies | ~$3 input, ~$15 output |
| claude-3-haiku | Higher | ~$0.25 input, ~$1.25 output |
Optimization Tips:
- Use cheaper models for simple tasks
- Cache LLM results
- Limit maxTokens
- Use structured extraction sparingly
- Monitor costEstimateUSD
Circuit Breaker Tuning
Default Settings:
- Failure threshold: 3 failures
- Cooldown: 60-120 seconds
- Success threshold: 2 successes
Adjust For:
Aggressive (Critical Production):
- Lower failure threshold (2)
- Longer cooldown (180s)
Lenient (Flaky Sources):
- Higher failure threshold (5)
- Shorter cooldown (30s)
Monitoring:
{"usage": {"circuitBreakerTrips": 3}}
High trips indicate:
- Target site issues
- Rate limiting
- Network problems
- Need for adjustment
Cache TTL Guidelines
By Content Type:
| Type | TTL (seconds) | Rationale |
|---|---|---|
| Static content | 86400 | Changes rarely |
| News/blogs | 3600 | Updated hourly |
| Product prices | 300 | Frequent changes |
| Stock data | 60 | Real-time needs |
| User content | 0 | Always fresh |
Configuration:
{"cache": {"enabled": true,"ttlSec": 3600}}
Monitor Effectiveness:
{"usage": {"cacheHits": 85,"cacheMisses": 15}}
Aim for >70% hit rate for repeated workloads.
Cost Optimization Strategies
1. Tiered Approach:
Try HTTP → Try minimal browser → Use full rendering
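One way to apply the tiered approach from a client (a sketch using the apify-client package; the 500-character threshold is an arbitrary assumption):

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Escalate rendering only when the cheaper mode returns too little text.
const fetchTiered = async (url: string) => {
  for (const render of ['none', 'minimal', 'full']) {
    const run = await client.actor('USERNAME/mcp-nexus').call({
      mode: 'single',
      tool: 'fetch_web',
      params: { url },
      render,
    });
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    const report: any = items[0];
    if (report?.ok && (report.result?.contentText?.length ?? 0) > 500) return report;
  }
  return null;                                       // even full rendering produced little content
};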
2. Batch Similar Operations:
Group by domain to leverage cache and circuit breakers:
{"mode": "batch","calls": [{"tool": "fetch_web", "params": {"url": "https://example.com/page1"}},{"tool": "fetch_web", "params": {"url": "https://example.com/page2"}},{"tool": "fetch_web", "params": {"url": "https://example.com/page3"}}]}
3. Enable Deduplication:
{"dedupe": {"enabled": true,"strategy": "url"}}
4. Minimize LLM Usage:
- Use extract instead of extract_structured when possible
- Cache LLM results
- Use smaller models (gpt-4o-mini)
- Set aggressive maxTokens limits
5. Optimize Concurrency:
| Scenario | Recommended Concurrency |
|---|---|
| HTTP-only | 5-10 |
| With proxies | 2-5 |
| Browser rendering | 1-2 |
6. Store Only What You Need:
{"store": {"html": false,"screenshot": false,"text": true}}
Chapter 9: Development Guide
Project Structure
mcp-nexus/
├── .actor/
│   ├── actor.json                    # Actor metadata and config
│   ├── input_schema.json             # Input validation schema
│   ├── dataset_schema.json           # Dataset view schema
│   └── key_value_store_schema.json   # KVS collection schema
├── src/
│   ├── main.ts                       # Entry point and orchestrator
│   ├── types.ts                      # TypeScript type definitions
│   ├── lib/
│   │   ├── validators.ts             # Input validation (Zod)
│   │   ├── http.ts                   # HTTP client with caching
│   │   ├── circuitBreaker.ts         # Circuit breaker logic
│   │   ├── deduplication.ts          # Duplicate detection
│   │   ├── llm.ts                    # LLM client wrapper
│   │   ├── browser.ts                # Playwright browser manager
│   │   ├── proxy.ts                  # Proxy and UA rotation
│   │   ├── sitemap.ts                # Sitemap/RSS parser
│   │   ├── diff.ts                   # Text diff utilities
│   │   ├── transform.ts              # JSON transformation
│   │   └── webhook.ts                # Webhook delivery
│   └── tools/
│       ├── fetchWeb.ts               # Web fetching tool
│       ├── extract.ts                # Data extraction tool
│       ├── summarize.ts              # AI summarization tool
│       ├── classify.ts               # AI classification tool
│       ├── transform.ts              # JSON transformation tool
│       ├── crawlLite.ts              # Web crawler tool
│       ├── extractStructured.ts      # Structured extraction tool
│       ├── searchWeb.ts              # URL discovery tool
│       └── diffText.ts               # Text comparison tool
├── storage/                          # Local dev storage
│   ├── datasets/
│   ├── key_value_stores/
│   └── request_queues/
├── Dockerfile                        # Container image definition
├── package.json                      # Dependencies
├── tsconfig.json                     # TypeScript config
└── README.md                         # This file
Understanding the Code
Key Components:
Main Orchestrator (src/main.ts):
- Entry point using Apify SDK
- Input validation and parsing
- Tool routing and execution
- Metric collection and reporting
- Webhook delivery
Tool Runtime Context:
Each tool receives a context object with:
- Configuration (cache, dedupe, render, etc.)
- Recording functions (HTTP bytes, tokens, retries)
- Key-value store access
- Circuit breaker state
- User agent
Tool Implementation Pattern:
export const runMyTool = async (
  params: MyToolParams,
  ctx: ToolRuntimeContext
) => {
  // Tool logic here
  return {
    // Tool output
  }
}
Validators (src/lib/validators.ts):
- Zod schemas for all tool parameters
- Input parsing and validation
- Default value resolution
- Type safety guarantees
Infrastructure Libraries:
- http.ts: Fetch with caching, robots.txt, PDF parsing
- circuitBreaker.ts: Per-domain failure tracking
- deduplication.ts: URL/content fingerprinting
- llm.ts: Multi-provider LLM client
- browser.ts: Playwright rendering
- proxy.ts: User-agent rotation
Testing
Local Testing:
Testing is handled through the Apify platform. Use the Apify Console to:
- Configure input
- Run locally or on cloud
- View results in Dataset tab
Test with specific inputs:
Use the Console UI to test different:
- Tool configurations
- Execution modes
- Cache settings
- Error scenarios
Debugging
Enable Verbose Logging:
Check console output for:
- Request/response details
- Cache hits/misses
- Circuit breaker state
- Token usage
Inspect Storage:
Local development stores data in storage/:
- datasets/default/ - Run reports
- key_value_stores/default/ - Artifacts
- key_value_stores/default/INPUT.json - Input
Check Metrics:
Every run includes detailed metrics:
{"usage": {"durationMs": 1234,"httpBytes": 45678,"llmTokens": 150,"retries": 0,"cacheHits": 5,"cacheMisses": 2,"circuitBreakerTrips": 0}}
Use Correlation IDs:
Track requests across systems:
{"correlationId": "my-request-123"}
Chapter 10: API & Integration
Apify API Usage
Run Actor:
curl "https://api.apify.com/v2/acts/USERNAME~mcp-nexus/runs?token=YOUR_TOKEN" \-X POST \-H 'content-type: application/json' \-d '{"mode": "single","tool": "fetch_web","params": {"url": "https://example.com"}}'
Get Run Status:
$curl "https://api.apify.com/v2/acts/USERNAME~mcp-nexus/runs/RUN_ID?token=YOUR_TOKEN"
Get Dataset Items:
$curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_TOKEN"
Full documentation: https://docs.apify.com/api/v2
Webhook Setup
Configuration:
{"webhook": {"url": "https://api.example.com/webhook","secret": "your-webhook-secret","batching": true}}
Webhook Payload:
Receives complete RunReport:
{"correlationId": "abc-123","ok": true,"mode": "single","result": {...},"usage": {...}}
Headers:
Content-Type: application/json
X-Signature: sha256=<hmac>
X-Timestamp: <iso-timestamp>
X-Request-Id: <uuid>
Verification:
See HMAC Webhook Verification for code examples.
Webhook Batching
Overview:
Webhook batching groups simultaneous webhook updates in batch mode, reducing the number of webhook calls and improving efficiency.
How It Works:
- When multiple tool calls complete within a time window (500ms), their results are batched
- A single webhook is sent with all grouped results
- Only applies to batch mode execution
- Maintains order and correlation
Enable Batching:
{"mode": "batch","calls": [{"tool": "fetch_web", "params": {"url": "https://example.com/page1"}},{"tool": "fetch_web", "params": {"url": "https://example.com/page2"}},{"tool": "summarize", "params": {"text": "..."}}],"webhook": {"url": "https://api.example.com/webhook","secret": "your-secret","batching": true}}
Batched Webhook Payload:
When multiple updates are grouped, the webhook receives:
{"type": "batch","count": 3,"items": [{"tool": "fetch_web","result": {"status": 200,"contentText": "..."}},{"tool": "fetch_web","result": {"status": 200,"contentText": "..."}},{"tool": "summarize","result": {"summary": "...","tokens": 150}}]}
Single vs. Batch Payload:
If only one update is in the batch window, it sends the regular format:
{"correlationId": "abc-123","ok": true,"mode": "batch","results": [...]}
Logs:
During execution with batching enabled:
[INFO] Webhook batch: 3 updates grouped
[INFO] Sending batched webhook
Configuration Options:
| Field | Type | Default | Description |
|---|---|---|---|
| batching | boolean | true | Enable webhook batching for batch mode |
Disable Batching:
To send individual webhooks for each result:
{"webhook": {"url": "https://api.example.com/webhook","secret": "your-secret","batching": false}}
Benefits:
- Reduced Calls: Fewer webhook requests to your endpoint
- Efficiency: Lower network overhead and processing
- Grouping: Related results arrive together
- Cost Savings: Reduced webhook processing costs
Use Cases:
- High-volume batch processing: Process many tool calls efficiently
- API rate limits: Reduce webhook endpoint load
- Correlated updates: Group related results for easier processing
- Cost optimization: Minimize webhook infrastructure costs
Important Notes:
- Batching only applies to batch mode ("mode": "batch")
- Single mode always sends individual webhooks
- Batch window is 500ms (not configurable)
- Empty batches are not sent
- Default is enabled (batching: true)
Handling Batched Webhooks:
Your webhook endpoint should handle both regular and batched formats:
app.post('/webhook', (req, res) => {
  const payload = req.body;
  if (payload.type === 'batch') {
    console.log(`Received batch of ${payload.count} items`);
    payload.items.forEach(item => {
      console.log(`Tool: ${item.tool}`, item.result);
    });
  } else {
    console.log('Received single result');
    console.log(payload.result || payload.results);
  }
  res.status(200).send('OK');
});
n8n Integration
Step 1: HTTP Request Node
Configure HTTP Request node:
- Method: POST
- URL: https://api.apify.com/v2/acts/USERNAME~mcp-nexus/runs?token=YOUR_TOKEN
- Body: JSON
Step 2: Pass Input
{"mode": "single","tool": "fetch_web","params": {"url": "{{$json.url}}"}}
Step 3: Wait for Completion
Add Wait node or use webhooks for async notification.
Step 4: Process Results
Parse dataset output in subsequent nodes.
REST API Examples
Example 1: Fetch and Summarize
curl "https://api.apify.com/v2/acts/USERNAME~mcp-nexus/runs?token=TOKEN" \-H 'content-type: application/json' \-d '{"mode": "batch","dag": true,"calls": [{"callId": "fetch","tool": "fetch_web","params": {"url": "https://example.com/article"}},{"callId": "summarize","tool": "summarize","params": {"text": {"ref": "fetch.contentText"}},"dependsOn": ["fetch"]}]}'
Example 2: Crawl and Extract
curl "https://api.apify.com/v2/acts/USERNAME~mcp-nexus/runs?token=TOKEN" \-H 'content-type: application/json' \-d '{"mode": "single","tool": "crawl_lite","params": {"startUrl": "https://example.com","maxPages": 10},"store": {"html": true}}'
SDK Usage
JavaScript:
import { ApifyClient } from 'apify-client'

const client = new ApifyClient({ token: 'YOUR_TOKEN' })

const run = await client.actor('USERNAME/mcp-nexus').call({
  mode: 'single',
  tool: 'fetch_web',
  params: {
    url: 'https://example.com'
  }
})

const dataset = await client.dataset(run.defaultDatasetId).listItems()
console.log(dataset.items[0])
Python:
from apify_client import ApifyClient

client = ApifyClient('YOUR_TOKEN')

run = client.actor('USERNAME/mcp-nexus').call(run_input={
    'mode': 'single',
    'tool': 'fetch_web',
    'params': {'url': 'https://example.com'}
})

dataset = client.dataset(run['defaultDatasetId']).list_items()
print(dataset.items[0])
Appendices
Appendix A: Input Schema Reference
Top-Level Fields:
| Field | Type | Required | Description |
|---|---|---|---|
| mode | 'single' or 'batch' | Yes | Execution mode |
| correlationId | string | No | Tracking identifier |
| tool | ToolName | Conditional | Tool name (single mode) |
| params | object | Conditional | Tool parameters (single mode) |
| calls | array | Conditional | Tool calls (batch mode) |
| dag | boolean | No | Enable DAG execution |
| concurrency | number | No | Batch concurrency (default: 2) |
Configuration Objects:
llm:
{
  provider: 'openai' | 'anthropic' | 'azure'
  model: string
  apiKeySecret?: string
  maxTokens?: number
}
cache:
{
  enabled: boolean
  ttlSec: number
}
dedupe:
{
  enabled: boolean
  ttlSec: number
  strategy: 'url' | 'content' | 'hybrid'
}
render:
'none' | 'minimal' | 'full'
store:
{
  html: boolean
  screenshot: boolean
  text: boolean
}
proxy:
{
  useApifyProxy?: boolean
  proxyUrls?: string[]
}
security:
{
  allowlist?: string[]
  denylist?: string[]
  ignoreRobotsFor?: string[]
  redactLogs?: boolean
}
budgets:
{
  maxDurationSec?: number
  maxCalls?: number
  maxPages?: number
  maxTotalBytes?: number
  maxTotalTokens?: number
  maxLLMTokens?: number
  maxFetchBytes?: number
}
webhook:
{
  url?: string
  secret?: string
  batching?: boolean
}
Appendix B: Output Schema Reference
RunReport:
{
  correlationId: string
  schemaVersion: number
  ok: boolean
  mode: 'single' | 'batch'
  toolsExecuted: number
  usage: {
    durationMs: number
    httpBytes: number
    llmTokens: number
    retries: number
    cacheHits: number
    cacheMisses: number
    circuitBreakerTrips: number
    llmCosts: {
      openai: number
      anthropic: number
      azure: number
      total: number
    }
  }
  costEstimateUSD: number
  warnings: string[]
  errors: string[]
  timestamp: string
  result?: any
  results?: Array<{
    tool: string
    ok: boolean
    output?: any
    error?: string
  }>
  toolMetrics?: Record<string, {
    durationMs: number
    retries: number
    bytes: number
    tokens: number
  }>
}
Appendix C: Error Codes
Common Errors:
| Error | Cause | Solution |
|---|---|---|
| Unsupported tool | Invalid tool name | Check tool names in schema |
| LLM API key not found | Missing API key | Set apiKeySecret or env var |
| Max total bytes quota exceeded | Budget limit hit | Increase maxTotalBytes |
| Max total tokens quota exceeded | Token budget exceeded | Increase maxTotalTokens |
| Circuit breaker open | Domain failures | Wait for cooldown |
| Failed to execute | Tool execution error | Check tool parameters |
| Circular dependency detected | Invalid DAG | Fix dependsOn references |
| Reference to unknown call | Invalid ref | Check callId values |
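Most of these errors indicate a fix on the caller's side, but an open circuit breaker is transient; a caller can simply retry after the cooldown window (a sketch using the apify-client package):

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Retry only when the failure is a transient "Circuit breaker open" error.
const runWithRetry = async (input: any, attempts = 3) => {
  for (let i = 0; i < attempts; i++) {
    const run = await client.actor('USERNAME/mcp-nexus').call(input);
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    const report: any = items[0];
    if (report.ok) return report;
    const transient = report.errors.some((e: string) => e.includes('Circuit breaker open'));
    if (!transient) return report;                    // non-transient errors need input changes
    await new Promise((r) => setTimeout(r, 120_000)); // wait out the 60-120 s cooldown
  }
  return null;
};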
Appendix D: Troubleshooting
Issue: Circuit Breaker Constantly Tripping
Symptoms: Many circuit breaker trips in usage
Solutions:
- Check if target site is up
- Increase delay between requests
- Use proxies
- Check if IP is blocked
Issue: High LLM Costs
Symptoms: High costEstimateUSD values
Solutions:
- Use cheaper models (gpt-4o-mini)
- Enable caching
- Reduce maxTokens
- Switch to rule-based extraction
Issue: Browser Rendering Timeouts
Symptoms: Errors with render: "full"
Solutions:
- Increase Actor timeout
- Use "minimal" instead
- Check if site loads locally
- Consider HTTP-only approach
Issue: Low Cache Hit Rate
Symptoms: High cache misses, low hits
Solutions:
- Increase cache TTL
- Check if URLs have unique parameters
- Enable deduplication
- Use canonical URLs
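If the URLs you submit carry volatile query parameters (session IDs, UTM tags), normalizing them before the run improves both cache and dedupe hit rates (an illustrative sketch; the parameter list is an assumption, adjust it for your sources):

// Strip common tracking parameters so repeated fetches hit the cache.
const canonicalize = (raw: string): string => {
  const url = new URL(raw);
  for (const key of [...url.searchParams.keys()]) {
    if (key.startsWith('utm_') || ['fbclid', 'gclid', 'sessionid'].includes(key)) {
      url.searchParams.delete(key);
    }
  }
  url.hash = '';
  return url.toString();
};

canonicalize('https://example.com/page?utm_source=x&id=42');
// => 'https://example.com/page?id=42'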
Issue: Webhooks Not Delivered
Symptoms: No webhook received
Solutions:
- Check webhook URL is accessible
- Verify HMAC secret
- Check for 429 responses
- Review idempotency logs
Appendix E: FAQ
Q: Can I run this without Apify?
No, MCP Nexus is designed as an Apify Actor and relies on the Apify platform infrastructure.
Q: How much does it cost?
Costs include:
- Apify compute units (approximately $0.25/hour, subject to change)
- LLM API calls (provider-dependent, subject to change)
- Apify Proxy (if used, approximately $0.50/GB, subject to change)
Q: Can I use my own LLM API keys?
Yes, store them as Apify secrets and reference via apiKeySecret.
Q: Is there a rate limit?
Limits depend on:
- Your Apify plan
- LLM provider limits
- Target site restrictions
Q: Can I scrape any website?
You should:
- Respect robots.txt
- Follow Terms of Service
- Comply with local laws
- Use responsibly
Q: How do I debug failed runs?
Check:
- Error messages in output
- Circuit breaker trips
- Budget violations
- Tool parameters
Q: What's the maximum execution time?
Default: 60 seconds (configurable via maxDurationSec)
Appendix F: Changelog
See CHANGELOG.md for complete version history.
Latest Version: 2.0.0
Major features:
- Multi-provider LLM support
- HTTP caching with ETags
- Circuit breakers
- Browser rendering
- DAG execution mode
- Structured extraction
- 9 specialized tools
License & Support
License: This actor is proprietary software available on the Apify platform.
Support:
- Issues & Questions: Contact via tuguidragos.com
- Feature Requests: Reach out via website
- Commercial Support: Available upon request
Built by Țugui Dragoș. Web: tuguidragos.com. Support development: Buy Me a Coffee.
Last Updated: 2025-11-11