MCP Nexus Universal AI Tool Bridge

Connect AI agents to real data. MCP Nexus runs tools that fetch, extract, summarize, classify, and crawl web content, with caching, multi-LLM support, HMAC webhooks, circuit breakers, and full observability in a stateless, production-ready Apify actor.


AI-powered web data bridge with smart caching, multi-LLM support, and production-grade reliability. Extract, transform, and analyze web content at scale on the Apify platform.

Quick Start

Run on Apify Platform

  1. Configure your input parameters
  2. Click "Start" to run
  3. View results in the Dataset tab

30-Second Tutorial

Fetch and extract data from any webpage in three simple steps:

Step 1: Select Tool

Choose fetch_web from the tool dropdown

Step 2: Configure

{
"mode": "single",
"tool": "fetch_web",
"params": {
"url": "https://example.com"
}
}

Step 3: Run

Click Start and view extracted content in the dataset

One-Line API Call

curl "https://api.apify.com/v2/acts/SRHAma9FEsmuewetK/runs?token=YOUR_TOKEN" \
-X POST -H "Content-Type: application/json" \
-d '{"mode":"single","tool":"fetch_web","params":{"url":"https://example.com"}}'

This actor respects robots.txt by default. Always review target site Terms of Service. Use proxies and rendering responsibly. You are responsible for compliance (GDPR/PII/ToS) in your jurisdiction.

What MCP Nexus Can Do

MCP Nexus provides 9 specialized tools for web data operations:

  1. fetch_web - Fetch and extract content from web pages
  2. extract - Extract specific data using CSS, XPath, or regex selectors
  3. summarize - Generate AI summaries of text content
  4. classify - Classify text into predefined categories using AI
  5. transform - Transform JSON data with mapping operations
  6. crawl_lite - Crawl multiple pages with depth and link following
  7. extract_structured - Extract structured data using AI and JSON schemas
  8. search_web - Parse sitemaps and RSS feeds for URL discovery
  9. diff_text - Compare two texts and calculate semantic differences

Chapter 1: Core Concepts

What is MCP Nexus

MCP Nexus is a universal AI tool bridge that connects AI agents, workflows, and applications to real-world web data. It provides a production-ready actor on the Apify platform that orchestrates nine specialized tools for web scraping, data extraction, AI-powered analysis, and content transformation.

Key Characteristics:

  • Stateless: Each run is independent with no persistent state
  • Observable: Full metrics and logging for debugging and monitoring
  • Resilient: Built-in circuit breakers and retry logic
  • Scalable: Runs on Apify's cloud infrastructure
  • Compliant: Respects robots.txt and implements security best practices

Architecture Overview

┌─────────────────────────────────────────────────────────┐
│ MCP Nexus Actor
├─────────────────────────────────────────────────────────┤
│ Input Validation (Zod)
│   ├─ Single Mode / Batch Mode / DAG Mode
│   └─ Budget Tracking & Quota Management
├─────────────────────────────────────────────────────────┤
│ Tool Router
│   ├─ fetch_web      ├─ crawl_lite
│   ├─ extract        ├─ extract_structured
│   ├─ summarize      ├─ search_web
│   ├─ classify       ├─ diff_text
│   └─ transform
├─────────────────────────────────────────────────────────┤
│ Infrastructure Layer
│   ├─ HTTP Client (caching, ETags, Last-Modified)
│   ├─ Circuit Breakers (per-domain failure detection)
│   ├─ Deduplication (URL/content/hybrid fingerprinting)
│   ├─ LLM Client (OpenAI, Anthropic, Azure)
│   ├─ Browser (Playwright minimal/full rendering)
│   └─ Proxy Manager (Apify Proxy, custom rotation)
├─────────────────────────────────────────────────────────┤
│ Output & Storage
│   ├─ Dataset (structured run reports)
│   ├─ Key-Value Store (HTML, screenshots, text)
│   └─ Webhook Delivery (HMAC-signed notifications)
└─────────────────────────────────────────────────────────┘

How It Works

  1. Input Processing: Validates JSON input against schema, applies defaults
  2. Tool Selection: Routes to appropriate tool handler based on mode
  3. Execution: Runs tool with context (config, tracking, storage)
  4. Metric Collection: Records bytes, tokens, retries, cache hits
  5. Result Assembly: Builds structured report with metadata
  6. Output: Pushes to dataset, sends webhook if configured
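
Conceptually, this lifecycle maps to a few SDK calls. A minimal TypeScript sketch of the flow (illustrative only; the runTool dispatcher and the simplified report shape are assumptions, not the actor's actual source):

import { Actor } from 'apify';
import { z } from 'zod';

const InputSchema = z.object({
  mode: z.enum(['single', 'batch']),
  tool: z.string().optional(),
  params: z.record(z.unknown()).optional(),
});

// Hypothetical dispatcher stub; the real actor routes to its nine tools.
const runTool = async (tool: string, params: Record<string, unknown>) => ({ tool, params });

await Actor.init();
const input = InputSchema.parse(await Actor.getInput());   // 1. validate input
const started = Date.now();

// 2-3. route to the selected tool and execute it
const result = await runTool(input.tool!, input.params ?? {});

// 4-5. collect metrics and assemble a structured report
const report = {
  ok: true,
  mode: input.mode,
  usage: { durationMs: Date.now() - started },
  result,
};

// 6. push to the dataset (and optionally deliver a webhook)
await Actor.pushData(report);
await Actor.exit();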

Key Features

Performance:

  • HTTP caching with ETag/Last-Modified support
  • Request deduplication (URL, content, hybrid)
  • Per-domain circuit breakers
  • Browser rendering (none/minimal/full)
  • Proxy rotation

AI/LLM:

  • Multi-provider support (OpenAI, Anthropic, Azure)
  • Cost tracking per request
  • Token usage monitoring
  • Structured JSON extraction

Observability:

  • Per-tool execution metrics
  • Cache hit/miss ratios
  • Circuit breaker trip counts
  • Correlation IDs for request tracking
  • Detailed error messages

Security:

  • HMAC webhook signatures
  • Robots.txt enforcement
  • Allow/deny list URL filtering
  • Log redaction for PII
  • Secret management via Apify

Chapter 2: Getting Started

Installation

Option 1: Use on Apify Console (Recommended)

  1. Open Actor
  2. Click "Try for free"
  3. Configure input via UI
  4. Click "Start"

Option 2: Deploy to Your Apify Account

  1. Visit the Actor page
  2. Click "Schedule" or "API" to integrate
  3. Use Apify API or SDK to run programmatically

Authentication

Apify API Token:

Get your token from Apify Console → Settings → Integrations

LLM API Keys:

Store as Apify secrets:

  1. Go to Apify Console → Settings → Secrets
  2. Add secret: OPENAI_API_KEY = sk-...
  3. Reference in input: "apiKeySecret": "OPENAI_API_KEY"

Or set as environment variables:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

Your First Run

Example 1: Fetch a Web Page

{
"mode": "single",
"tool": "fetch_web",
"params": {
"url": "https://example.com",
"stripBoilerplate": true
}
}

Example 2: Summarize Text

{
"mode": "single",
"tool": "summarize",
"params": {
"text": "Long article text here...",
"language": "en",
"style": "concise"
},
"llm": {
"provider": "openai",
"model": "gpt-4o-mini",
"apiKeySecret": "OPENAI_API_KEY"
}
}

Example 3: Extract Data

{
"mode": "single",
"tool": "extract",
"params": {
"source": "url",
"input": "https://news.ycombinator.com",
"selectors": [
{ "name": "titles", "css": ".titleline > a" }
]
}
}

Understanding Results

All runs produce a structured RunReport:

{
"correlationId": "abc-123",
"schemaVersion": 1,
"ok": true,
"mode": "single",
"toolsExecuted": 1,
"usage": {
"durationMs": 1234,
"httpBytes": 45678,
"llmTokens": 150,
"retries": 0,
"cacheHits": 0,
"cacheMisses": 1,
"circuitBreakerTrips": 0
},
"costEstimateUSD": 0.0002,
"warnings": [],
"errors": [],
"timestamp": "2025-01-07T12:34:56.789Z",
"result": {
"status": 200,
"url": "https://example.com",
"contentText": "Extracted content here...",
"htmlSnippet": "<html>...",
"links": []
}
}

Key Fields:

  • ok: Overall success indicator
  • usage: Resource consumption metrics
  • costEstimateUSD: Estimated LLM costs
  • result: Tool output (single mode)
  • results: Array of outputs (batch mode)
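
A minimal TypeScript sketch for reading these fields from a finished run's dataset, using the apify-client package (the actor path USERNAME/mcp-nexus and the APIFY_TOKEN variable are placeholders):

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the actor and wait for it to finish
const run = await client.actor('USERNAME/mcp-nexus').call({
  mode: 'single',
  tool: 'fetch_web',
  params: { url: 'https://example.com' },
});

// Each run pushes one RunReport item to its default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
const report = items[0] as Record<string, any>;

if (!report.ok) {
  console.error('Run failed:', report.errors);
} else {
  console.log('Cost estimate (USD):', report.costEstimateUSD);
  console.log('Duration (ms):', report.usage?.durationMs);
  console.log('Result:', report.result);
}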

For optimal performance and cost savings, use these defaults:

{
"cache": {
"enabled": true,
"ttlSec": 3600
},
"dedupe": {
"enabled": true,
"strategy": "url",
"ttlSec": 86400
},
"budgets": {
"maxDurationSec": 60,
"maxTotalBytes": 5242880,
"maxTotalTokens": 20000
},
"security": {
"redactLogs": true
}
}

Why these defaults:

  • Caching (1 hour) provides immediate ROI by avoiding duplicate fetches
  • URL deduplication (24 hours) prevents processing same pages multiple times
  • Budget limits prevent runaway costs
  • Log redaction protects sensitive data

Conversion-Optimized Examples

Example 1: Batch Mix (fetch + extract + summarize)

{
"mode": "batch",
"concurrency": 2,
"dag": true,
"calls": [
{
"callId": "fetch",
"tool": "fetch_web",
"params": {"url": "https://example.com/article"}
},
{
"callId": "extract",
"tool": "extract",
"params": {
"source": "text",
"input": {"ref": "fetch.result.contentText"},
"selectors": [{"name": "title", "regex": "^#\\s+(.+)$"}]
},
"dependsOn": ["fetch"]
},
{
"callId": "summarize",
"tool": "summarize",
"params": {"text": {"ref": "fetch.result.contentText"}},
"dependsOn": ["fetch"]
}
],
"llm": {
"provider": "openai",
"model": "gpt-4o-mini"
}
}

Example 2: Structured Extract with Schema

{
"mode": "single",
"tool": "extract_structured",
"params": {
"source": "url",
"input": "https://example.com/pricing",
"jsonSchema": {
"type": "object",
"properties": {
"plans": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string"},
"price": {"type": "number"}
}
}
}
}
}
},
"llm": {
"provider": "openai",
"model": "gpt-4o-mini"
}
}

Example 3: Crawl with Storage

{
"mode": "single",
"tool": "crawl_lite",
"params": {
"startUrl": "https://example.com",
"maxPages": 10,
"maxDepth": 2
},
"store": {
"html": true,
"text": true
}
}

Chapter 3: Tools Reference

fetch_web

Purpose: Download and parse web pages with smart content extraction

When to Use:

  • Fetching article content
  • Downloading HTML for later processing
  • Extracting clean text from pages

Parameters:

{
url: string
stripBoilerplate?: boolean
headers?: Record<string, string>
timeoutMs?: number
maxBytes?: number
respectRobotsTxt?: boolean
}

Complete Example:

{
"mode": "single",
"tool": "fetch_web",
"params": {
"url": "https://blog.example.com/article",
"stripBoilerplate": true
},
"cache": {
"enabled": true,
"ttlSec": 3600
}
}

Output:

{
"status": 200,
"url": "https://blog.example.com/article",
"contentText": "Clean article text...",
"htmlSnippet": "<html>...",
"links": [
{ "href": "/about", "text": "About Us" }
],
"meta": {
"finalUrl": "https://blog.example.com/article",
"contentType": "text/html",
"bytes": 25678,
"language": "en",
"rendered": false
}
}

Advanced Usage:

Enable browser rendering for JavaScript-heavy sites:

{
"mode": "single",
"tool": "fetch_web",
"params": {
"url": "https://spa-example.com"
},
"render": "minimal"
}

Store artifacts:

{
"mode": "single",
"tool": "fetch_web",
"params": {
"url": "https://example.com"
},
"store": {
"html": true,
"text": true,
"screenshot": true
}
}

extract

Purpose: Parse and extract data from HTML/text using selectors and patterns

When to Use:

  • Scraping structured data from web pages
  • Extracting specific fields
  • Pattern matching with regex

Parameters:

{
source: 'url' | 'html' | 'text'
input: string
selectors?: Array<{
name: string
css?: string
xpath?: string
regex?: string
}>
patterns?: Array<{
name: string
regex: string
group?: number
}>
}

Complete Example:

{
"mode": "single",
"tool": "extract",
"params": {
"source": "url",
"input": "https://news.ycombinator.com",
"selectors": [
{
"name": "titles",
"css": ".titleline > a"
},
{
"name": "scores",
"css": ".score"
}
],
"patterns": [
{
"name": "points",
"regex": "(\\d+) points?",
"group": 1
}
]
}
}

Output:

{
"fields": {
"titles": [
"Show HN: My New Project",
"Ask HN: How do you...",
"Tell HN: Something..."
],
"scores": ["123 points", "45 points", "67 points"]
},
"matches": {
"points": ["123", "45", "67"]
}
}

Advanced Usage:

Extract from HTML string:

{
"mode": "single",
"tool": "extract",
"params": {
"source": "html",
"input": "<article><h1>Title</h1><p>Body</p></article>",
"selectors": [
{ "name": "headline", "css": "h1" },
{ "name": "body", "css": "p" }
]
}
}

Use XPath for complex queries:

{
"mode": "single",
"tool": "extract",
"params": {
"source": "url",
"input": "https://example.com",
"selectors": [
{
"name": "metadata",
"xpath": "//meta[@property='og:title']/@content"
}
]
}
}

summarize

Purpose: AI-powered text summarization with language and style control

When to Use:

  • Condensing long articles
  • Creating executive summaries
  • Generating TL;DR versions

Parameters:

{
text: string
language?: string
style?: string
maxTokens?: number
model?: string
apiKeySecret?: string
}

Complete Example:

{
"mode": "single",
"tool": "summarize",
"params": {
"text": "Long article about climate change spanning multiple paragraphs...",
"language": "en",
"style": "concise",
"maxTokens": 200
},
"llm": {
"provider": "openai",
"model": "gpt-4o-mini",
"apiKeySecret": "OPENAI_API_KEY"
}
}

Output:

{
"summary": "Climate change is accelerating due to human activities. Key impacts include rising temperatures, extreme weather, and ecosystem disruption. Immediate action is needed.",
"tokens": 150
}

Advanced Usage:

Multi-language summarization:

{
"mode": "single",
"tool": "summarize",
"params": {
"text": "Article en français...",
"language": "fr",
"style": "detailed"
},
"llm": {
"provider": "anthropic",
"model": "claude-3-5-sonnet-20241022"
}
}

Bullet-point summaries:

{
"mode": "single",
"tool": "summarize",
"params": {
"text": "Long technical document...",
"style": "bullet"
}
}

classify

Purpose: Categorize text into predefined labels using AI

When to Use:

  • Support ticket routing
  • Content moderation
  • Sentiment analysis
  • Topic classification

Parameters:

{
text: string
labels: string[]
maxTokens?: number
model?: string
apiKeySecret?: string
}

Complete Example:

{
"mode": "single",
"tool": "classify",
"params": {
"text": "My account was charged twice for the same purchase. How do I get a refund?",
"labels": ["billing", "technical", "account", "general"]
},
"llm": {
"provider": "openai",
"model": "gpt-4o-mini",
"apiKeySecret": "OPENAI_API_KEY"
}
}

Output:

{
"label": "billing",
"confidence": 0.95,
"tokens": 50
}

Advanced Usage:

Sentiment classification:

{
"mode": "single",
"tool": "classify",
"params": {
"text": "This product exceeded my expectations!",
"labels": ["positive", "neutral", "negative"]
}
}

transform

Purpose: Transform and reshape JSON data with mapping rules

When to Use:

  • Data normalization
  • API response transformation
  • Field mapping and renaming

Parameters:

{
inputJson: any
mapping: Array<{
from?: string
to: string
op?: string
value?: any
}>
}

Complete Example:

{
"mode": "single",
"tool": "transform",
"params": {
"inputJson": {
"user": {
"firstName": "John",
"lastName": "Doe",
"tags": ["vip", "beta"],
"created": "2025-01-07"
}
},
"mapping": [
{
"from": "user.firstName",
"to": "customer.name"
},
{
"from": "user.tags",
"to": "customer.segments",
"op": "join",
"value": ","
},
{
"from": "user.created",
"to": "customer.joinDate",
"op": "dateParse"
}
]
}
}

Output:

{
"customer": {
"name": "John",
"segments": "vip,beta",
"joinDate": "2025-01-07T00:00:00.000Z"
}
}

Available Operations:

  • copy: Copy value as-is (default)
  • const: Set constant value
  • join: Join array elements with delimiter
  • split: Split string into array
  • pick: Extract nested value by path
  • concat: Concatenate values
  • replace: Replace text patterns
  • dateParse: Parse date strings
  • numberParse: Parse numeric values
  • lookup: Map values using dictionary
  • pickByPath: Extract by dot notation path

crawl_lite

Purpose: Lightweight web crawler with configurable depth and pagination

When to Use:

  • Crawling small to medium sites
  • Following pagination
  • Discovering internal links

Parameters:

{
startUrl: string
maxPages?: number
maxDepth?: number
sameOriginOnly?: boolean
delayMs?: number
}

Complete Example:

{
"mode": "single",
"tool": "crawl_lite",
"params": {
"startUrl": "https://blog.example.com",
"maxPages": 10,
"maxDepth": 2,
"sameOriginOnly": true,
"delayMs": 500
},
"dedupe": {
"enabled": true,
"strategy": "url"
}
}

Output:

{
"pages": [
{
"url": "https://blog.example.com",
"status": 200,
"bytes": 12345,
"linksCount": 15,
"cached": false
},
{
"url": "https://blog.example.com/about",
"status": 200,
"bytes": 8900,
"linksCount": 5,
"cached": false
}
]
}

Advanced Usage:

Store crawled HTML:

{
"mode": "single",
"tool": "crawl_lite",
"params": {
"startUrl": "https://example.com",
"maxPages": 20
},
"store": {
"html": true
}
}

extract_structured

Purpose: Extract data matching JSON schemas using AI

When to Use:

  • Extracting complex structured data
  • Schema-driven extraction
  • Semi-structured content parsing

Parameters:

{
source: 'text' | 'html' | 'url'
input: string
jsonSchema: object
llm?: {
provider?: string
model?: string
apiKeySecret?: string
maxTokens?: number
}
}

Complete Example:

{
"mode": "single",
"tool": "extract_structured",
"params": {
"source": "text",
"input": "John Doe works as a Senior Engineer at Acme Corp. His email is john@acme.com and phone is +1-555-0123. He joined in January 2020.",
"jsonSchema": {
"type": "object",
"properties": {
"name": { "type": "string" },
"position": { "type": "string" },
"company": { "type": "string" },
"email": { "type": "string" },
"phone": { "type": "string" },
"joinDate": { "type": "string" }
}
}
},
"llm": {
"provider": "openai",
"model": "gpt-4o",
"apiKeySecret": "OPENAI_API_KEY"
}
}

Output:

{
"data": {
"name": "John Doe",
"position": "Senior Engineer",
"company": "Acme Corp",
"email": "john@acme.com",
"phone": "+1-555-0123",
"joinDate": "January 2020"
},
"confidence": 0.9,
"tokens": 320
}

Advanced Usage:

Extract arrays:

{
"mode": "single",
"tool": "extract_structured",
"params": {
"source": "text",
"input": "We offer three plans: Basic ($9/mo), Pro ($29/mo), Enterprise ($99/mo)",
"jsonSchema": {
"type": "object",
"properties": {
"plans": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string" },
"price": { "type": "number" }
}
}
}
}
}
}
}

search_web

Purpose: Find URLs via sitemaps, RSS feeds, or search APIs

When to Use:

  • Discovering content URLs
  • Sitemap parsing
  • RSS feed aggregation

Parameters:

{
query?: string
sitemapUrl?: string
rssUrl?: string
maxResults?: number
}

Complete Example:

{
"mode": "single",
"tool": "search_web",
"params": {
"sitemapUrl": "https://example.com/sitemap.xml",
"maxResults": 50
}
}

Output:

{
"urls": [
"https://example.com/page1",
"https://example.com/page2",
"https://example.com/page3"
],
"count": 3,
"source": "sitemap"
}

Advanced Usage:

Parse RSS feeds:

{
"mode": "single",
"tool": "search_web",
"params": {
"rssUrl": "https://blog.example.com/feed",
"maxResults": 20
}
}

diff_text

Purpose: Compare text with semantic or character-level differences

When to Use:

  • Content change detection
  • Version comparison
  • Update monitoring

Parameters:

{
text1: string
text2: string
semantic?: boolean
}

Complete Example:

{
"mode": "single",
"tool": "diff_text",
"params": {
"text1": "The quick brown fox jumps.",
"text2": "The quick red fox leaps.",
"semantic": true
}
}

Output:

{
"additions": ["red", "leaps"],
"deletions": ["brown", "jumps"],
"changeScore": 0.286
}

Advanced Usage:

Character-level diff:

{
"mode": "single",
"tool": "diff_text",
"params": {
"text1": "hello",
"text2": "helo",
"semantic": false
}
}

Chapter 4: Execution Modes

Single Mode

Execute one tool at a time.

Example:

{
"mode": "single",
"tool": "fetch_web",
"params": {
"url": "https://example.com"
}
}

When to Use:

  • Simple one-off operations
  • Testing tools
  • API integrations

Batch Mode

Execute multiple tools in parallel with configurable concurrency.

Example:

{
"mode": "batch",
"concurrency": 3,
"calls": [
{
"tool": "fetch_web",
"params": { "url": "https://example.com/page1" }
},
{
"tool": "fetch_web",
"params": { "url": "https://example.com/page2" }
},
{
"tool": "summarize",
"params": { "text": "Long text..." }
}
]
}

When to Use:

  • Processing multiple URLs
  • Parallel data operations
  • Bulk transformations

Output:

{
"results": [
{
"tool": "fetch_web",
"ok": true,
"output": { "status": 200, "contentText": "..." }
},
{
"tool": "fetch_web",
"ok": true,
"output": { "status": 200, "contentText": "..." }
},
{
"tool": "summarize",
"ok": true,
"output": { "summary": "...", "tokens": 150 }
}
]
}

DAG Dependencies

Execute tools with dependencies using Directed Acyclic Graph resolution.

Example:

{
"mode": "batch",
"dag": true,
"calls": [
{
"callId": "fetch",
"tool": "fetch_web",
"params": { "url": "https://example.com" }
},
{
"callId": "extract",
"tool": "extract",
"params": {
"source": "html",
"input": { "ref": "fetch.htmlSnippet" },
"selectors": [{ "name": "title", "css": "h1" }]
},
"dependsOn": ["fetch"]
},
{
"callId": "summarize",
"tool": "summarize",
"params": {
"text": { "ref": "fetch.contentText" }
},
"dependsOn": ["fetch"]
}
]
}

When to Use:

  • Multi-step workflows
  • Chained transformations
  • Complex data pipelines

Reference Syntax:

  • { "ref": "callId" } - Reference entire result
  • { "ref": "callId.path.to.field" } - Reference nested field
  • { "ref": "callId.array.0" } - Reference array element

Performance Tips

Optimize Concurrency:

  • HTTP-only: 5-10 concurrent
  • With proxies: 2-5 concurrent
  • Browser rendering: 1-2 concurrent

Use Caching:

{
"cache": {
"enabled": true,
"ttlSec": 3600
}
}

Enable Deduplication:

{
"dedupe": {
"enabled": true,
"strategy": "url"
}
}

Set Budgets:

{
"budgets": {
"maxDurationSec": 300,
"maxTotalBytes": 52428800,
"maxTotalTokens": 100000
}
}

Chapter 5: AI/LLM Integration

Supported Providers

OpenAI:

  • Models: gpt-4o, gpt-4o-mini, gpt-4, gpt-3.5-turbo
  • Best for: General purpose, structured extraction
  • Cost: Approximately $0.15-$10 per 1M tokens (subject to change)

Anthropic (Claude):

  • Models: claude-3-5-sonnet-20241022, claude-3-haiku-20240307
  • Best for: Long-form content, complex reasoning
  • Cost: Approximately $0.25-$15 per 1M tokens (subject to change)

Azure OpenAI:

  • Models: Same as OpenAI, deployed to Azure
  • Best for: Enterprise compliance, regional requirements
  • Cost: Similar to OpenAI, billed through Azure (subject to change)

Model Selection

Configuration:

{
"llm": {
"provider": "openai",
"model": "gpt-4o-mini",
"apiKeySecret": "OPENAI_API_KEY",
"maxTokens": 4000
}
}

Choosing Models:

Task                  | Recommended Model | Reason
Summarization         | gpt-4o-mini       | Fast, cheap, accurate
Classification        | gpt-4o-mini       | Low latency, cost-effective
Structured extraction | gpt-4o            | Better schema adherence
Complex reasoning     | claude-3-5-sonnet | Superior reasoning
Bulk operations       | gpt-4o-mini       | Cost optimization

Cost Optimization

1. Use Cheaper Models:

{
"llm": {
"provider": "openai",
"model": "gpt-4o-mini"
}
}

2. Limit Token Usage:

{
"llm": {
"maxTokens": 500
},
"budgets": {
"maxTotalTokens": 50000
}
}

3. Cache Results:

{
"cache": {
"enabled": true,
"ttlSec": 86400
}
}

4. Monitor Costs:

Check costEstimateUSD in run reports:

{
"costEstimateUSD": 0.0045,
"usage": {
"llmTokens": 3000,
"llmCosts": {
"openai": 0.0045,
"anthropic": 0.0000,
"azure": 0.0000,
"total": 0.0045
}
}
}

Automatic Cost Tracking

MCP Nexus automatically tracks LLM costs per provider with detailed breakdowns.

How It Works:

  • Costs are calculated automatically for each LLM call
  • Per-provider breakdown is maintained (OpenAI, Anthropic, Azure)
  • Costs are displayed in logs during execution
  • Final cost summary included in run report
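
If you run the actor repeatedly, per-provider totals can be aggregated from the dataset. A short TypeScript sketch (dataset ID and token are placeholders; field names follow the RunReport schema documented here):

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Sum per-provider LLM spend across all run reports in one dataset.
const { items } = await client.dataset('YOUR_DATASET_ID').listItems();
const totals = { openai: 0, anthropic: 0, azure: 0, total: 0 };

for (const item of items as Array<{ usage?: { llmCosts?: Record<string, number> } }>) {
  const costs = item.usage?.llmCosts ?? {};
  for (const key of Object.keys(totals) as Array<keyof typeof totals>) {
    totals[key] += costs[key] ?? 0;
  }
}
console.log('LLM spend (USD):', totals);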

Cost Tracking in Logs:

During execution, you'll see cost information for each LLM call:

[INFO] LLM cost: $0.0012 (openai, gpt-4o-mini, 450 tokens)
[INFO] LLM cost: $0.0035 (anthropic, claude-3-5-sonnet-20241022, 890 tokens)

At the end of the run, a summary is displayed:

[INFO] LLM Costs: OpenAI $0.0024, Anthropic $0.0035, Azure $0.0000, Total $0.0059

Cost Breakdown in Output:

The usage.llmCosts field provides a detailed breakdown:

{
"usage": {
"llmTokens": 1340,
"llmCosts": {
"openai": 0.0024,
"anthropic": 0.0035,
"azure": 0.0000,
"total": 0.0059
}
},
"costEstimateUSD": 0.0059
}

Per-Tool Cost Tracking:

Costs are tracked individually for each tool that uses LLM:

  • summarize: Full cost per summary generated
  • classify: Cost per classification
  • extract_structured: Cost per extraction

Multi-Provider Support:

If you use multiple LLM providers in a single run (e.g., OpenAI for classification and Anthropic for summarization), costs are tracked separately:

{
"mode": "batch",
"calls": [
{
"tool": "classify",
"params": {"text": "...", "labels": ["..."]},
"llm": {"provider": "openai", "model": "gpt-4o-mini"}
},
{
"tool": "summarize",
"params": {"text": "..."},
"llm": {"provider": "anthropic", "model": "claude-3-5-sonnet-20241022"}
}
]
}

Result:

{
"usage": {
"llmCosts": {
"openai": 0.0008,
"anthropic": 0.0042,
"total": 0.0050
}
}
}

Benefits:

  • Transparency: Know exactly what each LLM call costs
  • Optimization: Identify expensive operations and optimize
  • Budgeting: Track costs against allocated budgets
  • Multi-Provider: Compare costs across different providers

Token Management

Token Limits by Model:

Model             | Input Limit | Output Limit
gpt-4o            | 128K        | 16K
gpt-4o-mini       | 128K        | 16K
claude-3-5-sonnet | 200K        | 8K
claude-3-haiku    | 200K        | 4K

Tracking Usage:

Every LLM tool returns token count:

{
"summary": "...",
"tokens": 450
}

Total tokens tracked in usage:

{
"usage": {
"llmTokens": 1250
}
}

Structured Extraction Details

Use extract_structured for complex data extraction:

{
"mode": "single",
"tool": "extract_structured",
"params": {
"source": "text",
"input": "Product: iPhone 15 Pro\nPrice: $999\nColor: Blue",
"jsonSchema": {
"type": "object",
"properties": {
"product": { "type": "string" },
"price": { "type": "number" },
"color": { "type": "string" }
},
"required": ["product", "price"]
}
},
"llm": {
"provider": "openai",
"model": "gpt-4o"
}
}

Tips:

  • Use detailed schemas with descriptions
  • Prefer gpt-4o over gpt-4o-mini for complex schemas
  • Validate extracted data in your application

Chapter 6: Performance & Optimization

HTTP Caching

How It Works:

MCP Nexus implements intelligent HTTP caching with:

  • ETag header support
  • Last-Modified header support
  • Configurable TTL
  • Per-URL cache entries

Configuration:

{
"cache": {
"enabled": true,
"ttlSec": 3600
}
}

Cache Metrics:

Monitor effectiveness:

{
"usage": {
"cacheHits": 15,
"cacheMisses": 3
}
}

Aim for >70% hit rate for repeated workloads.

TTL Guidelines:

Content Type   | Recommended TTL
Static content | 86400 (24h)
News/blogs     | 3600 (1h)
Product prices | 300 (5min)
Stock data     | 60 (1min)
User content   | 0 (disabled)

Request Deduplication

Strategies:

  1. URL-based: Same URL = duplicate
  2. Content-based: Same content hash = duplicate
  3. Hybrid: URL + content hash
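
Conceptually, each strategy differs only in what gets hashed into the duplicate fingerprint. A TypeScript sketch under that assumption (illustrative; the actor's normalization rules may differ):

import { createHash } from 'node:crypto';

const sha256 = (s: string) => createHash('sha256').update(s).digest('hex');

// URL-based: normalize the URL (here: drop the fragment) and hash it.
const urlFingerprint = (url: string) => sha256(new URL(url).href.split('#')[0]);

// Content-based: hash the fetched body, so mirrored pages collapse to one entry.
const contentFingerprint = (body: string) => sha256(body.trim());

// Hybrid: combine both, so a page is a duplicate only if URL and content match.
const hybridFingerprint = (url: string, body: string) =>
  sha256(`${urlFingerprint(url)}:${contentFingerprint(body)}`);

console.log(hybridFingerprint('https://example.com/a#top', '<html>...</html>'));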

Configuration:

{
"dedupe": {
"enabled": true,
"strategy": "hybrid",
"ttlSec": 86400
}
}

When to Use:

  • Crawling workflows
  • Batch processing
  • RSS/sitemap parsing

When to Avoid:

  • Real-time data fetching
  • Dynamic content

Example:

{
"mode": "single",
"tool": "crawl_lite",
"params": {
"startUrl": "https://example.com",
"maxPages": 100
},
"dedupe": {
"enabled": true,
"strategy": "url"
}
}

Circuit Breakers

Purpose: Prevent cascading failures by detecting and isolating failing services.

How It Works:

  1. Track failures per domain
  2. Open circuit after N failures
  3. Half-open after cooldown period
  4. Close after successful requests
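
A minimal per-domain circuit breaker in TypeScript that follows these state transitions (thresholds mirror the documented defaults; this is a sketch, not the actor's implementation):

type State = 'closed' | 'open' | 'half-open';

class DomainBreaker {
  private state: State = 'closed';
  private failures = 0;
  private successes = 0;
  private openedAt = 0;

  constructor(
    private failureThreshold = 3,   // open after N consecutive failures
    private successThreshold = 2,   // close after N successes in half-open
    private cooldownMs = 90_000,    // ~60-120 s in the actor (randomized)
  ) {}

  canRequest(): boolean {
    if (this.state === 'open' && Date.now() - this.openedAt >= this.cooldownMs) {
      this.state = 'half-open';     // allow probe requests after cooldown
      this.successes = 0;
    }
    return this.state !== 'open';
  }

  onSuccess(): void {
    if (this.state === 'half-open' && ++this.successes >= this.successThreshold) {
      this.state = 'closed';
    }
    this.failures = 0;
  }

  onFailure(): void {
    this.failures += 1;
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = Date.now();
    }
  }
}

// One breaker per domain, consulted before each fetch.
const breakers = new Map<string, DomainBreaker>();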

Default Behavior:

  • Failure threshold: 3 failures
  • Cooldown: 60-120 seconds (randomized)
  • Success threshold: 2 successes to close

Monitoring:

{
"usage": {
"circuitBreakerTrips": 2
}
}

High trip counts indicate:

  • Target site issues
  • Rate limiting
  • Network problems
  • Need for tuning

Best Practices:

  • Monitor trip counts
  • Investigate domains with frequent trips
  • Adjust delays between requests
  • Use proxies for problematic domains

Proxy Configuration

When to Use Proxies:

  • Scraping rate-limited sites
  • Avoiding IP blocks
  • Geographic targeting
  • High-volume scraping

Apify Proxy (Recommended):

{
"proxy": {
"useApifyProxy": true
}
}

Benefits:

  • Residential and datacenter IPs
  • Automatic rotation
  • Geographic targeting
  • Built-in retry logic

Cost: Approximately $0.50 per GB (subject to change)

Custom Proxies:

{
"proxy": {
"proxyUrls": [
"http://user:pass@proxy1.example.com:8000",
"http://user:pass@proxy2.example.com:8000"
]
}
}

User-Agent Rotation:

Automatic rotation through realistic browser User-Agents. No configuration needed.

Browser Rendering

Modes:

None (Default):

  • HTTP-only fetching
  • Fastest (100-500ms per page)
  • No JavaScript execution
  • Use for static content

Minimal:

{
"render": "minimal"
}
  • Launches headless browser
  • Waits 2-3 seconds for JS
  • No screenshots
  • Use for light JavaScript sites

Full:

{
"render": "full"
}
  • Full browser rendering
  • Waits for network idle
  • Captures screenshots
  • Use for complex SPAs

Performance Impact:

Mode    | Speed      | Memory | CPU | Cost
None    | 1x         | 50MB   | 1x  | 1x
Minimal | 20x slower | 300MB  | 5x  | 5x
Full    | 40x slower | 500MB  | 10x | 10x

When to Use:

  • None: Static HTML, APIs, RSS feeds
  • Minimal: E-commerce, news sites with JS
  • Full: SPAs, React/Vue apps, complex UIs

Chapter 7: Security & Compliance

HMAC Webhook Verification

Overview:

All webhooks include HMAC-SHA256 signatures for verification.

Signature Format:

X-Signature: sha256=<hex-encoded-hmac>
X-Timestamp: <ISO-8601-timestamp>
X-Request-Id: <UUID-v4>

HMAC computed over: timestamp + "." + body

Node.js Verification:

const crypto = require('crypto');
function verifyWebhook(body, timestamp, signature, secret) {
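// Note: the signature is computed over the raw body bytes; if your framework has
// already parsed the JSON, make sure JSON.stringify(body) reproduces the exact
// payload that was signed, or verify against the raw request body instead.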
const payload = `${timestamp}.${JSON.stringify(body)}`;
const expectedSignature = crypto
.createHmac('sha256', secret)
.update(payload)
.digest('hex');
const expected = Buffer.from(`sha256=${expectedSignature}`, 'utf8');
const actual = Buffer.from(signature, 'utf8');
if (expected.length !== actual.length) {
return false;
}
return crypto.timingSafeEqual(expected, actual);
}
app.post('/webhook', (req, res) => {
const secret = process.env.WEBHOOK_SECRET;
const signature = req.headers['x-signature'];
const timestamp = req.headers['x-timestamp'];
if (!verifyWebhook(req.body, timestamp, signature, secret)) {
return res.status(401).send('Invalid signature');
}
console.log('Webhook verified:', req.body);
res.status(200).send('OK');
});

Python Verification:

import hmac
import hashlib
def verify_webhook(signature, timestamp, body, secret):
expected = 'sha256=' + hmac.new(
secret.encode('utf-8'),
f'{timestamp}.{body}'.encode('utf-8'),
hashlib.sha256
).hexdigest()
return hmac.compare_digest(signature, expected)
@app.route('/webhook', methods=['POST'])
def webhook():
signature = request.headers.get('X-Signature')
timestamp = request.headers.get('X-Timestamp')
body = request.get_data(as_text=True)
secret = os.environ['WEBHOOK_SECRET']
if not verify_webhook(signature, timestamp, body, secret):
return 'Invalid signature', 401
data = request.json
print('Webhook verified:', data)
return 'OK', 200

Replay Attack Prevention:

  1. Check timestamp (reject >5 minutes old)
  2. Store and check idempotency keys
  3. Use HTTPS only
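
A hedged TypeScript sketch of the first two checks, to run before the signature verification shown above (the in-memory idempotency set is illustrative; use a shared store such as Redis in production):

const MAX_SKEW_MS = 5 * 60 * 1000;            // reject webhooks older than 5 minutes
const seenRequestIds = new Set<string>();     // illustrative; use a shared store in production

export function isReplay(timestamp: string, requestId: string): boolean {
  const age = Date.now() - new Date(timestamp).getTime();
  if (Number.isNaN(age) || age < 0 || age > MAX_SKEW_MS) return true;  // stale or malformed
  if (seenRequestIds.has(requestId)) return true;                      // already processed
  seenRequestIds.add(requestId);
  return false;
}

// Usage inside the webhook handler, before HMAC verification:
// if (isReplay(req.headers['x-timestamp'], req.headers['x-request-id'])) {
//   return res.status(409).send('Replay detected');
// }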

Robots.txt Respect

Default Behavior:

Respects robots.txt for all fetch_web and crawl_lite operations.

Features:

  • Wildcard pattern support
  • Crawl-delay extraction
  • User-agent: * rules

Override Per Domain:

{
"security": {
"ignoreRobotsFor": ["example.com", "api.example.com"]
}
}

Legal Considerations:

  • Respecting robots.txt is a best practice
  • Check Terms of Service of target sites
  • Public data ≠ permission to scrape at scale
  • Some countries have specific web scraping laws

Domain Allow/Deny Lists

Allowlist (Whitelist):

Only process URLs matching patterns:

{
"security": {
"allowlist": [
"^https://example\\.com/.*",
"^https://api\\.mysite\\.com/.*"
]
}
}

Denylist (Blacklist):

Block specific patterns:

{
"security": {
"denylist": [
"^https://example\\.com/admin/.*",
"^https://.*\\.gov/.*",
"^https://.*\\.mil/.*"
]
}
}

SSRF Protection:

Block internal networks:

{
"security": {
"denylist": [
"^https?://127\\.0\\.0\\.1/.*",
"^https?://localhost/.*",
"^https?://169\\.254\\..*",
"^https?://10\\..*",
"^https?://172\\.(1[6-9]|2[0-9]|3[0-1])\\..*",
"^https?://192\\.168\\..*"
]
}
}

PII Redaction

Enable Log Redaction:

{
"security": {
"redactLogs": true
}
}

What Gets Redacted:

  • Tool results in console logs
  • result field in single mode
  • results array in batch mode

What's NOT Redacted:

  • Metadata (timing, tokens, errors)
  • Dataset outputs
  • Webhook payloads
  • Key-value store artifacts

Secret Management

Using Apify Secrets:

  1. Go to Apify Console → Settings → Secrets
  2. Add secret (e.g., OPENAI_API_KEY)
  3. Reference in input:
{
"llm": {
"apiKeySecret": "OPENAI_API_KEY"
}
}

Environment Variables:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export WEBHOOK_SECRET=your-secret

Best Practices:

  • Never commit secrets to repositories
  • Use different secrets for dev/staging/prod
  • Rotate secrets quarterly
  • Use minimal required permissions
  • Monitor secret usage
  • Delete unused secrets

Content Security

Safe HTML Parsing:

  • Uses cheerio and jsdom safely
  • No eval() or code execution
  • Sandboxed DOM operations
  • XSS-safe by design

PDF Parsing:

  • Memory-limited parsing
  • No code execution
  • Timeout protection

XML Parsing:

  • Entity expansion disabled
  • DTD processing disabled
  • XXE attack prevention

Chapter 8: Production Deployment

Rate Limits & Best Practices

Respecting Target Sites:

  • Always respect robots.txt
  • Use appropriate delays (300ms minimum)
  • Implement exponential backoff for 429 responses
  • Monitor circuit breaker trips

Recommended Settings:

{
"budgets": {
"maxDurationSec": 300,
"maxCalls": 100,
"maxPages": 50,
"maxTotalBytes": 52428800,
"maxTotalTokens": 100000
}
}

Rate Limiting Strategy:

  1. Per-domain circuit breakers (automatic)
  2. HTTP caching (reduce requests)
  3. Deduplication (avoid duplicates)
  4. Delays in crawl_lite (300-1000ms)

Anti-Bot Strategies

When to Use Proxies:

  • Sites with strict rate limits
  • Many concurrent requests
  • IP blocking issues
  • Geographic targeting needed

User-Agent Rotation:

Automatic rotation through realistic browser User-Agents.

Additional Techniques:

  1. Random delays in crawl_lite
  2. Respect crawl-delay from robots.txt
  3. Use browser rendering for JS-heavy sites
  4. Limit batch concurrency (2-5)

Example:

{
"mode": "single",
"tool": "fetch_web",
"params": {
"url": "https://strict-site.com"
},
"proxy": {
"useApifyProxy": true
},
"render": "minimal"
}

When to Use Browser Rendering

Use "minimal" mode when:

  • Site requires JavaScript but loads quickly
  • Need basic interactivity
  • Performance is a priority

Use "full" mode when:

  • Complex JavaScript applications
  • Need to wait for async content
  • Screenshots required for verification
  • SPAs (Single Page Applications)

Avoid browser rendering when:

  • Static HTML is sufficient
  • Performance is critical
  • Costs need minimization

Cost Comparison:

Mode      | Pages/Hour | Cost Multiplier
HTTP-only | 3600       | 1x
Minimal   | 180        | 20x
Full      | 90         | 40x

LLM Provider Limits

OpenAI:

Model       | TPM Limit (Free) | Approx. Cost per 1M Tokens
gpt-4o      | 10,000           | ~$2.50 input, ~$10 output
gpt-4o-mini | 200,000          | ~$0.15 input, ~$0.60 output

Anthropic:

Model             | TPM Limit | Approx. Cost per 1M Tokens
claude-3-5-sonnet | Varies    | ~$3 input, ~$15 output
claude-3-haiku    | Higher    | ~$0.25 input, ~$1.25 output

Optimization Tips:

  1. Use cheaper models for simple tasks
  2. Cache LLM results
  3. Limit maxTokens
  4. Use structured extraction sparingly
  5. Monitor costEstimateUSD

Circuit Breaker Tuning

Default Settings:

  • Failure threshold: 3 failures
  • Cooldown: 60-120 seconds
  • Success threshold: 2 successes

Adjust For:

Aggressive (Critical Production):

  • Lower failure threshold (2)
  • Longer cooldown (180s)

Lenient (Flaky Sources):

  • Higher failure threshold (5)
  • Shorter cooldown (30s)

Monitoring:

{
"usage": {
"circuitBreakerTrips": 3
}
}

High trips indicate:

  • Target site issues
  • Rate limiting
  • Network problems
  • Need for adjustment

Cache TTL Guidelines

By Content Type:

Type           | TTL (seconds) | Rationale
Static content | 86400         | Changes rarely
News/blogs     | 3600          | Updated hourly
Product prices | 300           | Frequent changes
Stock data     | 60            | Real-time needs
User content   | 0             | Always fresh

Configuration:

{
"cache": {
"enabled": true,
"ttlSec": 3600
}
}

Monitor Effectiveness:

{
"usage": {
"cacheHits": 85,
"cacheMisses": 15
}
}

Aim for >70% hit rate for repeated workloads.

Cost Optimization Strategies

1. Tiered Approach:

Try HTTP → Try minimal browser → Use full rendering
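
A TypeScript sketch of that escalation using the apify-client package (the success test, a non-empty contentText, is an assumption; adjust it to your own criteria):

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Try the cheapest rendering mode first and escalate only when extraction fails.
async function fetchTiered(url: string) {
  for (const render of ['none', 'minimal', 'full'] as const) {
    const run = await client.actor('USERNAME/mcp-nexus').call({
      mode: 'single',
      tool: 'fetch_web',
      params: { url },
      render,
    });
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    const report = items[0] as any;
    // Assumed success criterion: the page yielded some extracted text.
    if (report?.ok && report?.result?.contentText?.trim()) return report;
  }
  throw new Error(`All rendering tiers failed for ${url}`);
}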

2. Batch Similar Operations:

Group by domain to leverage cache and circuit breakers:

{
"mode": "batch",
"calls": [
{"tool": "fetch_web", "params": {"url": "https://example.com/page1"}},
{"tool": "fetch_web", "params": {"url": "https://example.com/page2"}},
{"tool": "fetch_web", "params": {"url": "https://example.com/page3"}}
]
}

3. Enable Deduplication:

{
"dedupe": {
"enabled": true,
"strategy": "url"
}
}

4. Minimize LLM Usage:

  • Use extract instead of extract_structured when possible
  • Cache LLM results
  • Use smaller models (gpt-4o-mini)
  • Set aggressive maxTokens limits

5. Optimize Concurrency:

Scenario          | Recommended Concurrency
HTTP-only         | 5-10
With proxies      | 2-5
Browser rendering | 1-2

6. Store Only What You Need:

{
"store": {
"html": false,
"screenshot": false,
"text": true
}
}

Chapter 9: Development Guide

Project Structure

mcp-nexus/
├── .actor/
│ ├── actor.json # Actor metadata and config
│ ├── input_schema.json # Input validation schema
│ ├── dataset_schema.json # Dataset view schema
│ └── key_value_store_schema.json # KVS collection schema
├── src/
│ ├── main.ts # Entry point and orchestrator
│ ├── types.ts # TypeScript type definitions
│ ├── lib/
│ │ ├── validators.ts # Input validation (Zod)
│ │ ├── http.ts # HTTP client with caching
│ │ ├── circuitBreaker.ts # Circuit breaker logic
│ │ ├── deduplication.ts # Duplicate detection
│ │ ├── llm.ts # LLM client wrapper
│ │ ├── browser.ts # Playwright browser manager
│ │ ├── proxy.ts # Proxy and UA rotation
│ │ ├── sitemap.ts # Sitemap/RSS parser
│ │ ├── diff.ts # Text diff utilities
│ │ ├── transform.ts # JSON transformation
│ │ └── webhook.ts # Webhook delivery
│ └── tools/
│ ├── fetchWeb.ts # Web fetching tool
│ ├── extract.ts # Data extraction tool
│ ├── summarize.ts # AI summarization tool
│ ├── classify.ts # AI classification tool
│ ├── transform.ts # JSON transformation tool
│ ├── crawlLite.ts # Web crawler tool
│ ├── extractStructured.ts # Structured extraction tool
│ ├── searchWeb.ts # URL discovery tool
│ └── diffText.ts # Text comparison tool
├── storage/ # Local dev storage
│ ├── datasets/
│ ├── key_value_stores/
│ └── request_queues/
├── Dockerfile # Container image definition
├── package.json # Dependencies
├── tsconfig.json # TypeScript config
└── README.md # This file

Understanding the Code

Key Components:

Main Orchestrator (src/main.ts):

  • Entry point using Apify SDK
  • Input validation and parsing
  • Tool routing and execution
  • Metric collection and reporting
  • Webhook delivery

Tool Runtime Context:

Each tool receives a context object with:

  • Configuration (cache, dedupe, render, etc.)
  • Recording functions (HTTP bytes, tokens, retries)
  • Key-value store access
  • Circuit breaker state
  • User agent

Tool Implementation Pattern:

export const runMyTool = async (
params: MyToolParams,
ctx: ToolRuntimeContext
) => {
// Tool logic here
return {
// Tool output
}
}

Validators (src/lib/validators.ts):

  • Zod schemas for all tool parameters
  • Input parsing and validation
  • Default value resolution
  • Type safety guarantees

Infrastructure Libraries:

  • http.ts: Fetch with caching, robots.txt, PDF parsing
  • circuitBreaker.ts: Per-domain failure tracking
  • deduplication.ts: URL/content fingerprinting
  • llm.ts: Multi-provider LLM client
  • browser.ts: Playwright rendering
  • proxy.ts: User-agent rotation

Testing

Local Testing:

Testing is handled through the Apify platform. Use the Apify Console to:

  1. Configure input
  2. Run locally or on cloud
  3. View results in Dataset tab

Test with specific inputs:

Use the Console UI to test different:

  • Tool configurations
  • Execution modes
  • Cache settings
  • Error scenarios

Debugging

Enable Verbose Logging:

Check console output for:

  • Request/response details
  • Cache hits/misses
  • Circuit breaker state
  • Token usage

Inspect Storage:

Local development stores data in storage/:

  • datasets/default/ - Run reports
  • key_value_stores/default/ - Artifacts
  • key_value_stores/default/INPUT.json - Input

Check Metrics:

Every run includes detailed metrics:

{
"usage": {
"durationMs": 1234,
"httpBytes": 45678,
"llmTokens": 150,
"retries": 0,
"cacheHits": 5,
"cacheMisses": 2,
"circuitBreakerTrips": 0
}
}

Use Correlation IDs:

Track requests across systems:

{
"correlationId": "my-request-123"
}

Chapter 10: API & Integration

Apify API Usage

Run Actor:

curl "https://api.apify.com/v2/acts/USERNAME~mcp-nexus/runs?token=YOUR_TOKEN" \
-X POST \
-H 'content-type: application/json' \
-d '{
"mode": "single",
"tool": "fetch_web",
"params": {"url": "https://example.com"}
}'

Get Run Status:

curl "https://api.apify.com/v2/acts/USERNAME~mcp-nexus/runs/RUN_ID?token=YOUR_TOKEN"

Get Dataset Items:

curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_TOKEN"

Full Documentation:

Apify API Reference

Webhook Setup

Configuration:

{
"webhook": {
"url": "https://api.example.com/webhook",
"secret": "your-webhook-secret",
"batching": true
}
}

Webhook Payload:

Receives complete RunReport:

{
"correlationId": "abc-123",
"ok": true,
"mode": "single",
"result": {...},
"usage": {...}
}

Headers:

  • Content-Type: application/json
  • X-Signature: sha256=<hmac>
  • X-Timestamp: <iso-timestamp>
  • X-Request-Id: <uuid>

Verification:

See HMAC Webhook Verification for code examples.

Webhook Batching

Overview:

Webhook batching groups simultaneous webhook updates in batch mode, reducing the number of webhook calls and improving efficiency.

How It Works:

  • When multiple tool calls complete within a time window (500ms), their results are batched
  • A single webhook is sent with all grouped results
  • Only applies to batch mode execution
  • Maintains order and correlation

Enable Batching:

{
"mode": "batch",
"calls": [
{"tool": "fetch_web", "params": {"url": "https://example.com/page1"}},
{"tool": "fetch_web", "params": {"url": "https://example.com/page2"}},
{"tool": "summarize", "params": {"text": "..."}}
],
"webhook": {
"url": "https://api.example.com/webhook",
"secret": "your-secret",
"batching": true
}
}

Batched Webhook Payload:

When multiple updates are grouped, the webhook receives:

{
"type": "batch",
"count": 3,
"items": [
{
"tool": "fetch_web",
"result": {
"status": 200,
"contentText": "..."
}
},
{
"tool": "fetch_web",
"result": {
"status": 200,
"contentText": "..."
}
},
{
"tool": "summarize",
"result": {
"summary": "...",
"tokens": 150
}
}
]
}

Single vs. Batch Payload:

If only one update is in the batch window, it sends the regular format:

{
"correlationId": "abc-123",
"ok": true,
"mode": "batch",
"results": [...]
}

Logs:

During execution with batching enabled:

[INFO] Webhook batch: 3 updates grouped
[INFO] Sending batched webhook

Configuration Options:

Field    | Type    | Default | Description
batching | boolean | true    | Enable webhook batching for batch mode

Disable Batching:

To send individual webhooks for each result:

{
"webhook": {
"url": "https://api.example.com/webhook",
"secret": "your-secret",
"batching": false
}
}

Benefits:

  • Reduced Calls: Fewer webhook requests to your endpoint
  • Efficiency: Lower network overhead and processing
  • Grouping: Related results arrive together
  • Cost Savings: Reduced webhook processing costs

Use Cases:

  • High-volume batch processing: Process many tool calls efficiently
  • API rate limits: Reduce webhook endpoint load
  • Correlated updates: Group related results for easier processing
  • Cost optimization: Minimize webhook infrastructure costs

Important Notes:

  • Batching only applies to batch mode ("mode": "batch")
  • Single mode always sends individual webhooks
  • Batch window is 500ms (not configurable)
  • Empty batches are not sent
  • Default is enabled (batching: true)

Handling Batched Webhooks:

Your webhook endpoint should handle both regular and batched formats:

app.post('/webhook', (req, res) => {
const payload = req.body;
if (payload.type === 'batch') {
console.log(`Received batch of ${payload.count} items`);
payload.items.forEach(item => {
console.log(`Tool: ${item.tool}`, item.result);
});
} else {
console.log('Received single result');
console.log(payload.result || payload.results);
}
res.status(200).send('OK');
});

n8n Integration

Step 1: HTTP Request Node

Configure HTTP Request node:

  • Method: POST
  • URL: https://api.apify.com/v2/acts/USERNAME~mcp-nexus/runs?token=YOUR_TOKEN
  • Body: JSON

Step 2: Pass Input

{
"mode": "single",
"tool": "fetch_web",
"params": {
"url": "{{$json.url}}"
}
}

Step 3: Wait for Completion

Add Wait node or use webhooks for async notification.

Step 4: Process Results

Parse dataset output in subsequent nodes.

REST API Examples

Example 1: Fetch and Summarize

curl "https://api.apify.com/v2/acts/USERNAME~mcp-nexus/runs?token=TOKEN" \
-H 'content-type: application/json' \
-d '{
"mode": "batch",
"dag": true,
"calls": [
{
"callId": "fetch",
"tool": "fetch_web",
"params": {"url": "https://example.com/article"}
},
{
"callId": "summarize",
"tool": "summarize",
"params": {
"text": {"ref": "fetch.contentText"}
},
"dependsOn": ["fetch"]
}
]
}'

Example 2: Crawl and Extract

curl "https://api.apify.com/v2/acts/USERNAME~mcp-nexus/runs?token=TOKEN" \
-H 'content-type: application/json' \
-d '{
"mode": "single",
"tool": "crawl_lite",
"params": {
"startUrl": "https://example.com",
"maxPages": 10
},
"store": {"html": true}
}'

SDK Usage

JavaScript:

import { ApifyClient } from 'apify-client'
const client = new ApifyClient({ token: 'YOUR_TOKEN' })
const run = await client.actor('USERNAME/mcp-nexus').call({
mode: 'single',
tool: 'fetch_web',
params: {
url: 'https://example.com'
}
})
const dataset = await client.dataset(run.defaultDatasetId).listItems()
console.log(dataset.items[0])

Python:

from apify_client import ApifyClient
client = ApifyClient('YOUR_TOKEN')
run = client.actor('USERNAME/mcp-nexus').call(
run_input={
'mode': 'single',
'tool': 'fetch_web',
'params': {
'url': 'https://example.com'
}
}
)
dataset = client.dataset(run['defaultDatasetId']).list_items()
print(dataset.items[0])

Appendices

Appendix A: Input Schema Reference

Top-Level Fields:

Field         | Type                  | Required    | Description
mode          | 'single' or 'batch'   | Yes         | Execution mode
correlationId | string                | No          | Tracking identifier
tool          | ToolName              | Conditional | Tool name (single mode)
params        | object                | Conditional | Tool parameters (single mode)
calls         | array                 | Conditional | Tool calls (batch mode)
dag           | boolean               | No          | Enable DAG execution
concurrency   | number                | No          | Batch concurrency (default: 2)

Configuration Objects:

llm:

{
provider: 'openai' | 'anthropic' | 'azure'
model: string
apiKeySecret?: string
maxTokens?: number
}

cache:

{
enabled: boolean
ttlSec: number
}

dedupe:

{
enabled: boolean
ttlSec: number
strategy: 'url' | 'content' | 'hybrid'
}

render:

'none' | 'minimal' | 'full'

store:

{
html: boolean
screenshot: boolean
text: boolean
}

proxy:

{
useApifyProxy?: boolean
proxyUrls?: string[]
}

security:

{
allowlist?: string[]
denylist?: string[]
ignoreRobotsFor?: string[]
redactLogs?: boolean
}

budgets:

{
maxDurationSec?: number
maxCalls?: number
maxPages?: number
maxTotalBytes?: number
maxTotalTokens?: number
maxLLMTokens?: number
maxFetchBytes?: number
}

webhook:

{
url?: string
secret?: string
batching?: boolean
}

Appendix B: Output Schema Reference

RunReport:

{
correlationId: string
schemaVersion: number
ok: boolean
mode: 'single' | 'batch'
toolsExecuted: number
usage: {
durationMs: number
httpBytes: number
llmTokens: number
retries: number
cacheHits: number
cacheMisses: number
circuitBreakerTrips: number
llmCosts: {
openai: number
anthropic: number
azure: number
total: number
}
}
costEstimateUSD: number
warnings: string[]
errors: string[]
timestamp: string
result?: any
results?: Array<{
tool: string
ok: boolean
output?: any
error?: string
}>
toolMetrics?: Record<string, {
durationMs: number
retries: number
bytes: number
tokens: number
}>
}

Appendix C: Error Codes

Common Errors:

Error                           | Cause                 | Solution
Unsupported tool                | Invalid tool name     | Check tool names in schema
LLM API key not found           | Missing API key       | Set apiKeySecret or env var
Max total bytes quota exceeded  | Budget limit hit      | Increase maxTotalBytes
Max total tokens quota exceeded | Token budget exceeded | Increase maxTotalTokens
Circuit breaker open            | Domain failures       | Wait for cooldown
Failed to execute               | Tool execution error  | Check tool parameters
Circular dependency detected    | Invalid DAG           | Fix dependsOn references
Reference to unknown call       | Invalid ref           | Check callId values

Appendix D: Troubleshooting

Issue: Circuit Breaker Constantly Tripping

Symptoms: Many circuit breaker trips in usage

Solutions:

  • Check if target site is up
  • Increase delay between requests
  • Use proxies
  • Check if IP is blocked

Issue: High LLM Costs

Symptoms: High costEstimateUSD values

Solutions:

  • Use cheaper models (gpt-4o-mini)
  • Enable caching
  • Reduce maxTokens
  • Switch to rule-based extraction

Issue: Browser Rendering Timeouts

Symptoms: Errors with render: "full"

Solutions:

  • Increase Actor timeout
  • Use "minimal" instead
  • Check if site loads locally
  • Consider HTTP-only approach

Issue: Low Cache Hit Rate

Symptoms: High cache misses, low hits

Solutions:

  • Increase cache TTL
  • Check if URLs have unique parameters
  • Enable deduplication
  • Use canonical URLs

Issue: Webhooks Not Delivered

Symptoms: No webhook received

Solutions:

  • Check webhook URL is accessible
  • Verify HMAC secret
  • Check for 429 responses
  • Review idempotency logs

Appendix E: FAQ

Q: Can I run this without Apify?

No, MCP Nexus is designed as an Apify Actor and relies on the Apify platform infrastructure.

Q: How much does it cost?

Costs include:

  • Apify compute units (approximately $0.25/hour, subject to change)
  • LLM API calls (provider-dependent, subject to change)
  • Apify Proxy (if used, approximately $0.50/GB, subject to change)

Q: Can I use my own LLM API keys?

Yes, store them as Apify secrets and reference via apiKeySecret.

Q: Is there a rate limit?

Limits depend on:

  • Your Apify plan
  • LLM provider limits
  • Target site restrictions

Q: Can I scrape any website?

You should:

  • Respect robots.txt
  • Follow Terms of Service
  • Comply with local laws
  • Use responsibly

Q: How do I debug failed runs?

Check:

  • Error messages in output
  • Circuit breaker trips
  • Budget violations
  • Tool parameters

Q: What's the maximum execution time?

Default: 60 seconds (configurable via maxDurationSec)

Appendix F: Changelog

See CHANGELOG.md for complete version history.

Latest Version: 2.0.0

Major features:

  • Multi-provider LLM support
  • HTTP caching with ETags
  • Circuit breakers
  • Browser rendering
  • DAG execution mode
  • Structured extraction
  • 9 specialized tools


License & Support

License: This actor is proprietary software available on the Apify platform.

Support:

  • Issues & Questions: Contact via tuguidragos.com
  • Feature Requests: Reach out via website
  • Commercial Support: Available upon request

Built by Tugui Dragos · Web: tuguidragos.com · Support development: Buy Me a Coffee


Last Updated: 2025-11-11