Pricing

Pay per event + usage

Knowledge Graph Causal Discovery MCP

Construct causal graphs from multi-domain data, apply do-calculus reasoning, and estimate causal effects via semiparametric methods -- all through a single MCP interface.

Developer

Ryan Clinton

Last modified

2 months ago

Knowledge Graph Causal Discovery MCP Server

Knowledge graph causal discovery over multi-domain research data, delivered through a single Model Context Protocol interface. This MCP server is built for researchers, data scientists, and AI agents that need to go beyond correlation — discovering directed causal structure, estimating treatment effects, and reasoning about counterfactuals from the published literature and public datasets.

The server orchestrates 17 Apify actors in parallel across five source domains — academic, biomedical, regulatory, economic, and safety — assembling the results into a unified causal knowledge graph. Eight specialized tools then apply rigorous causal inference algorithms: FCI skeleton learning, GES with BIC scoring, Pearl's do-calculus with the ID algorithm, twin network counterfactuals, TMLE estimation, RotatE knowledge graph embeddings, sheaf cohomology consistency checking, and Shapley source attribution. Every tool call returns structured JSON with mathematical scores and supporting evidence.

⬇️ What data can you access?

Data Point	Source	Coverage
📄 Academic papers and citations	OpenAlex, Semantic Scholar, Crossref	250M+ scholarly works with citation graphs
📑 Preprints and open access	arXiv, CORE	Physics, CS, quantitative biology, math
🧬 Biomedical literature	PubMed	36M+ citations with MeSH indexing
🏥 Clinical trials	ClinicalTrials.gov	450K+ registered studies with protocol data
💊 Drug adverse event reports	OpenFDA	FDA FAERS pharmacovigilance database
🔬 NIH research grants	NIH Reporter	Active and historical funded projects
📜 Federal regulations	Federal Register	US regulatory actions and proposed rules
🏛️ Congressional legislation	Congress.gov	Bills, resolutions, and amendments
🗂️ Government datasets	Data.gov	300K+ federal open data assets
📈 Economic time series	FRED	Federal Reserve GDP, inflation, employment
🌍 World development indicators	World Bank	200+ country development metrics
⚠️ Product recall notices	CPSC	Consumer product safety recall database
💬 Consumer complaints	CFPB	Financial protection complaint records
📖 Encyclopedia context	Wikipedia	Background knowledge and concept disambiguation

Why use Knowledge Graph Causal Discovery MCP?

Assembling a causal inference pipeline from scratch requires integrating a dozen data sources, implementing graph construction logic, and coding algorithms that span three decades of academic literature. A typical research team spending a week on this still ends up with a pipeline that covers two or three data domains at best.

This MCP server covers 17 data sources, applies 10 peer-reviewed causal algorithms, and returns structured results, typically within 20–90 seconds per call, directly inside Claude, Cursor, Windsurf, or any MCP-compatible AI client.

Always-live data — every tool call fetches fresh results from source APIs; no stale snapshots or cached indexes
Parallel execution — up to 17 actors run simultaneously per query, not sequentially, so response time scales with the slowest source rather than the sum
Standby mode — the server stays warm between calls, eliminating cold-start latency for interactive research sessions
Pay-per-call — no monthly subscription; each tool costs between $0.035 and $0.050, so a full 8-tool pipeline costs under $0.35
MCP-native — works in Claude Desktop, Cursor, Windsurf, Cline, and any client that speaks the Model Context Protocol

⬆️ MCP tools

Tool	Price	Algorithm	Best for
`discover_causal_structure`	$0.045	FCI + GES + additive noise model	Initial causal graph structure from observational data
`compute_interventional_effects`	$0.050	Pearl's do-calculus + ID algorithm + Balke-Pearl LP	Policy evaluation, treatment planning, intervention design
`simulate_counterfactuals`	$0.045	Twin network method + Tian-Pearl bounds	"What if" analysis, legal causation, necessity/sufficiency
`extract_causal_claims_literature`	$0.035	NLP pattern matching + evidence classification	Systematic reviews, evidence synthesis, claim auditing
`embed_causal_knowledge_graph`	$0.040	RotatE complex-valued embeddings	Link prediction, entity similarity, pathway discovery
`estimate_causal_effect_tmle`	$0.050	TMLE + Super Learner ensemble + influence function CI	Semiparametric ATE estimation with doubly-robust CI
`check_graph_consistency`	$0.035	Sheaf cohomology H¹(G,F)	Validating causal assumptions, identifiability checks
`attribute_source_contribution`	$0.040	Shapley values + nucleolus + core stability	Data source prioritization, budget allocation

Use cases for knowledge graph causal discovery

Drug safety signal detection

Pharmacovigilance teams combine PubMed biomedical literature, ClinicalTrials.gov outcome data, and FDA adverse event reports into a single causal graph. The discover_causal_structure tool identifies directed edges between compounds and adverse outcomes. The compute_interventional_effects tool estimates P(adverse event | do(prescribe drug)) using back-door adjustment on confounders sourced from NIH grant data and OpenAlex citations.

Policy impact assessment

Policy analysts estimate causal effects of regulatory interventions on economic outcomes by combining Federal Register rules, FRED economic time series, and World Bank development indicators. The estimate_causal_effect_tmle tool applies TMLE with Super Learner to produce doubly-robust average treatment effect estimates with 95% confidence intervals from the influence function — going beyond naive before/after comparison.

Systematic review and evidence synthesis

Literature reviewers use extract_causal_claims_literature to scan thousands of academic papers across OpenAlex, Semantic Scholar, Crossref, arXiv, and CORE simultaneously. Claims are classified by strength (strong/moderate/weak/correlational) and evidence level (RCT/observational/case study/review). Conflicting claims across sources are flagged automatically, replacing weeks of manual screening.

Counterfactual reasoning for legal and regulatory causation

Legal teams and regulators assessing causation in product liability or pharmaceutical harm cases use simulate_counterfactuals to compute the Probability of Necessity (PN = P(Y_{x'}=0 | X=x, Y=y)) and Probability of Sufficiency (PS = P(Y_x=1 | X=x', Y=0)) via the twin network method. Tian-Pearl monotonicity bounds are validated to constrain the counterfactual probabilities.

Knowledge graph completion in biomedical AI

AI research teams use embed_causal_knowledge_graph to generate RotatE complex-valued entity embeddings where relations are unit-modulus rotations in complex space (t = h · r, |r_i| = 1). MRR and Hits@10 link prediction metrics identify missing drug-disease or gene-pathway edges. Self-adversarial negative sampling with margin gamma ensures high-quality embeddings even in sparse graph regions.

Data acquisition prioritization

Research operations teams with limited budgets use attribute_source_contribution to calculate Shapley values for each data domain (academic, biomedical, regulatory, economic, safety). The Shapley allocation phi_i quantifies each source's marginal contribution to causal graph quality across all subsets. Nucleolus computation and core non-emptiness check confirm allocation stability before committing to data subscriptions.

How to connect this MCP server

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "knowledge-graph-causal-discovery": {
      "url": "https://knowledge-graph-causal-discovery-mcp.apify.actor/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}

Cursor / Windsurf / Cline

Add the MCP endpoint in your editor's MCP settings panel:

Endpoint URL: https://knowledge-graph-causal-discovery-mcp.apify.actor/mcp
Authentication: Bearer token with your Apify API token

Python (MCP client)

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The MCP connector is a beta feature, so the call goes through client.beta
# with a beta flag. The flag below is the one documented at the time of
# writing and may change; check the current Anthropic API docs.
response = client.beta.messages.create(
    model="claude-opus-4-5",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": "Discover the causal structure linking smoking exposure to lung cancer outcomes using academic and biomedical sources."
    }],
    # Claude discovers the server's 8 tools itself; no local tool definitions are needed
    mcp_servers=[{
        "type": "url",
        "url": "https://knowledge-graph-causal-discovery-mcp.apify.actor/mcp",
        "name": "knowledge-graph-causal-discovery",
        "authorization_token": "YOUR_APIFY_TOKEN"
    }],
    betas=["mcp-client-2025-04-04"]
)
print(response.content)

Direct cURL

# Discover causal structure
curl -X POST "https://knowledge-graph-causal-discovery-mcp.apify.actor/mcp" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "discover_causal_structure",
      "arguments": {
        "query": "smoking lung cancer mortality",
        "sources": ["academic", "biomedical"]
      }
    },
    "id": 1
  }'

# Estimate treatment effect via TMLE
curl -X POST "https://knowledge-graph-causal-discovery-mcp.apify.actor/mcp" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "estimate_causal_effect_tmle",
      "arguments": {
        "query": "statin therapy cardiovascular mortality reduction",
        "sources": ["academic", "biomedical", "regulatory"]
      }
    },
    "id": 2
  }'

Tool reference

`discover_causal_structure`

Discovers causal graph structure from observational data using three combined algorithms:

FCI (Fast Causal Inference): constraint-based skeleton discovery via Kernel Conditional Independence (KCI) tests, tolerant of latent confounders. Builds a partial ancestral graph (PAG), the FCI counterpart of a CPDAG, in which bidirectional edges mark hidden common causes.
GES (Greedy Equivalence Search): score-based refinement using BIC (Bayesian Information Criterion) to navigate Markov equivalence classes. BIC = log(likelihood) − (k/2) · log(N), where k is the number of free parameters and N is the sample size.
Additive noise model: edge orientation via HSIC (Hilbert-Schmidt Independence Criterion) between regression residuals and the candidate cause. With e_Y the residual from regressing Y on X and e_X the residual from regressing X on Y, the model orients X → Y if HSIC(e_Y, X) < HSIC(e_X, Y).

Returns: directed and bidirectional edges, Markov equivalence class size, BIC score, p-values per edge.

Price: $0.045 per call. Calls up to 10 actors for academic + biomedical sources.
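
The additive-noise orientation step can be sketched in a few lines of NumPy. This is a minimal illustration, not the server's implementation: the Gaussian kernel with fixed bandwidth, the cubic polynomial regression, and the names `rbf_gram`, `hsic`, `residual`, and `orient` are all assumptions made for the example.

```python
import numpy as np

def rbf_gram(v, bandwidth=1.0):
    """Gaussian (RBF) Gram matrix of a 1-D sample."""
    d = v[:, None] - v[None, :]
    return np.exp(-(d ** 2) / (2.0 * bandwidth ** 2))

def hsic(a, b):
    """Biased empirical HSIC: trace(K H L H) / n^2."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K, L = rbf_gram(a), rbf_gram(b)
    return float(np.trace(K @ H @ L @ H)) / n ** 2

def residual(cause, effect, degree=3):
    """Residual of a polynomial regression of effect on cause (assumed regressor)."""
    coeffs = np.polyfit(cause, effect, degree)
    return effect - np.polyval(coeffs, cause)

def orient(x, y):
    """Prefer the direction whose residual is less dependent on the putative cause."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    forward = hsic(residual(x, y), x)    # dependence of e_Y on X
    backward = hsic(residual(y, x), y)   # dependence of e_X on Y
    return "X -> Y" if forward < backward else "Y -> X"

# Synthetic data with a nonlinear mechanism and non-Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 300)
y = x ** 3 + rng.uniform(-1, 1, 300)
direction = orient(x, y)
print(direction)
```

A production version would normally pick the kernel bandwidth from the data (for example with the median heuristic) and compare HSIC p-values rather than raw statistics.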

`compute_interventional_effects`

Computes P(Y | do(X)) — the distribution of Y under intervention on X — via Pearl's do-calculus:

Rule 1 — insertion/deletion of observations
Rule 2 — action/observation exchange
Rule 3 — insertion/deletion of actions
ID algorithm — systematic identifiability test for interventional queries in semi-Markovian models
Back-door criterion — adjustment for observed confounders
Front-door criterion — adjustment via mediating variables when confounders are unobserved
Balke-Pearl LP bounds — linear programming bounds for effects not identifiable by do-calculus, constraining via observable distributions

Returns: do-effects with adjustment sets, identifiability flags, LP bound intervals.

Price: $0.050 per call.
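
As a rough illustration of the back-door criterion listed above, the sketch below evaluates the adjustment formula P(y | do(x)) = Σ_z P(y | x, z) · P(z) on a small discrete table. The probabilities and the names `p_z`, `p_y_given_xz`, and `backdoor_adjust` are invented for the example and do not come from the server.

```python
# Hypothetical numbers: Z is a binary confounder, X a binary treatment.
p_z = {0: 0.6, 1: 0.4}                     # P(Z = z)
p_y_given_xz = {                           # P(Y = 1 | X = x, Z = z)
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.20, (1, 1): 0.50,
}

def backdoor_adjust(x):
    """P(Y = 1 | do(X = x)) = sum over z of P(Y = 1 | x, z) * P(z)."""
    return sum(p_y_given_xz[(x, z)] * pz for z, pz in p_z.items())

# Average causal effect of switching X from 0 to 1
effect = backdoor_adjust(1) - backdoor_adjust(0)
print(round(backdoor_adjust(1), 2), round(backdoor_adjust(0), 2), round(effect, 2))
```

The key point is that the sum weights each stratum by P(z), not by P(z | x), which is what separates the interventional quantity from the ordinary conditional probability.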

`simulate_counterfactuals`

Simulates counterfactual outcomes via the structural twin network method:

Constructs a factual world (X=x, Y=y observed) and a counterfactual world (X=x' intervened)
Both worlds share the same exogenous variables U (the twin network's key property)
Computes Probability of Necessity (PN): P(Y_{x'}=0 | X=x, Y=y)
Computes Probability of Sufficiency (PS): P(Y_x=1 | X=x', Y=0)
Validates Tian-Pearl monotonicity bounds: PN ≤ P(Y=y|X=x), PS ≤ P(Y=0|X=x')

Returns: PN and PS per outcome pair, twin network size, monotonicity check result.

Price: $0.045 per call.
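
The twin network idea, two worlds sharing the same exogenous variables U, can be illustrated by brute-force enumeration on a toy structural model. Everything here is an assumption made for the example: the structural equation Y = (X and U_a) or U_b, the probabilities `P_UA` and `P_UB`, the exogeneity of X, and the function names.

```python
from itertools import product

P_UA, P_UB = 0.5, 0.2   # hypothetical P(U_a = 1) and P(U_b = 1)

def f_y(x, u_a, u_b):
    """Hypothetical structural equation: Y = (X and U_a) or U_b."""
    return int((x and u_a) or u_b)

def weight(u_a, u_b):
    """Prior probability of one exogenous configuration."""
    return (P_UA if u_a else 1 - P_UA) * (P_UB if u_b else 1 - P_UB)

def counterfactual_prob(x_fact, y_fact, x_cf, y_cf):
    """P(Y_{x_cf} = y_cf | X = x_fact, Y = y_fact), assuming X is exogenous."""
    num = den = 0.0
    for u_a, u_b in product((0, 1), repeat=2):
        if f_y(x_fact, u_a, u_b) != y_fact:
            continue                       # U inconsistent with the factual world
        w = weight(u_a, u_b)
        den += w
        if f_y(x_cf, u_a, u_b) == y_cf:    # same U reused in the counterfactual world
            num += w
    return num / den

pn = counterfactual_prob(1, 1, 0, 0)   # Probability of Necessity
ps = counterfactual_prob(0, 0, 1, 1)   # Probability of Sufficiency
print(round(pn, 3), round(ps, 3))
```

Conditioning on the factual evidence filters the exogenous configurations; replaying the survivors under the alternative treatment is what the second copy of the network does.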

`extract_causal_claims_literature`

Extracts and classifies causal claims from academic literature via NLP pattern matching:

Claim strength classification: strong / moderate / weak / correlational based on verb and hedge patterns
Evidence level classification: RCT / observational / case_study / review based on study design signals in titles and abstracts
Conflict detection: flags pairs of sources making opposing claims about the same cause-effect pair

Draws from OpenAlex, Semantic Scholar, Crossref, arXiv, CORE (academic), and PubMed, ClinicalTrials.gov, NIH Grants, OpenFDA (biomedical) depending on selected sources.

Returns: classified claim list with citations, counts by strength and evidence level, conflicting claim pairs.

Price: $0.035 per call.
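
A pattern-based classifier of this kind might look like the sketch below. The cue words, their precedence (correlational cues first, then hedges, then causal verbs), and the function names are illustrative guesses; the server's actual patterns are not documented here.

```python
import re

# Hypothetical cue lists; the real patterns may differ.
CORRELATIONAL = re.compile(r"\b(associated with|correlat\w*|linked to)\b", re.I)
HEDGE = re.compile(r"\b(may|might|could|possibly|suggests?)\b", re.I)
STRONG = re.compile(r"\b(causes?|leads? to|results? in)\b", re.I)
MODERATE = re.compile(r"\b(increases?|reduces?|contributes? to)\b", re.I)

EVIDENCE = [
    ("RCT", re.compile(r"randomi[sz]ed|double-blind|placebo-controlled", re.I)),
    ("review", re.compile(r"systematic review|meta-analysis", re.I)),
    ("case_study", re.compile(r"case (report|study|series)", re.I)),
    ("observational", re.compile(r"cohort|case-control|cross-sectional|observational", re.I)),
]

def classify_strength(sentence):
    """Return strong / moderate / weak / correlational, or None if no cue matches."""
    if CORRELATIONAL.search(sentence):
        return "correlational"
    if HEDGE.search(sentence):
        return "weak"            # a hedge downgrades any causal verb that follows
    if STRONG.search(sentence):
        return "strong"
    if MODERATE.search(sentence):
        return "moderate"
    return None

def classify_evidence(title):
    """Return the first evidence level whose design signal appears in the title."""
    for label, pattern in EVIDENCE:
        if pattern.search(title):
            return label
    return None

print(classify_strength("Statins may reduce mortality."))
print(classify_evidence("A randomized controlled trial of atorvastatin"))
```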

`embed_causal_knowledge_graph`

Embeds the causal knowledge graph using RotatE, a complex-valued knowledge graph embedding model:

Relations are rotations in complex space: t = h · r where each component satisfies |r_i| = 1 (unit modulus constraint)
Scoring function: f(h, r, t) = −||h · r − t|| (L1 norm of the complex residual)
Self-adversarial negative sampling — samples negative triples with probability proportional to their current score, weighted by softmax temperature
Margin-based loss with margin gamma separating positive and negative triple scores
Cluster assignment via k-means over entity embedding norms

Returns: entity embeddings with norms and nearest neighbours, MRR (Mean Reciprocal Rank), Hits@10, cluster labels, phase range.

Price: $0.040 per call.
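
The RotatE scoring function above is short enough to write out directly. In this sketch the relation is parameterised by a phase vector so that the unit-modulus constraint holds by construction; the two-dimensional embeddings and the name `rotate_score` are illustrative only.

```python
import numpy as np

def rotate_score(h, phase, t):
    """f(h, r, t) = -||h * r - t||_1 with r = exp(i * phase), so |r_i| = 1."""
    r = np.exp(1j * phase)                  # element-wise rotation in the complex plane
    return -float(np.abs(h * r - t).sum())

h = np.array([1 + 0j, 0 + 1j])              # head entity embedding
phase = np.array([np.pi / 2, np.pi])        # relation phases
t_true = h * np.exp(1j * phase)             # tail that the rotation maps to exactly
t_corrupt = np.array([1 + 0j, 1 + 0j])      # a corrupted (negative) tail

print(rotate_score(h, phase, t_true))       # close to 0: perfect match
print(rotate_score(h, phase, t_corrupt))    # clearly negative: poor match
```

Higher (less negative) scores mean the rotated head lands closer to the tail, which is what the margin-based loss pushes apart for positive and negative triples.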

`estimate_causal_effect_tmle`

Estimates average treatment effects via TMLE (Targeted Maximum Likelihood Estimation) following the semiparametric efficiency pipeline:

Initial estimate Q⁰(A, W) via Super Learner ensemble (weighted cross-validated learner combination)
Propensity score g(A | W) with positivity truncation at [0.01, 0.99] to prevent near-deterministic treatment
Clever covariate H(A, W) = A/g(1|W) − (1−A)/g(0|W)
Targeting step — fit epsilon via MLE of logistic model indexed by H, updating Q⁰
Updated estimate Q*(A, W) = expit(logit(Q⁰) + epsilon · H)
ATE = E[Q*(1, W)] − E[Q*(0, W)] (plug-in estimator from targeted fit)
Influence function IC(O) for 95% Wald confidence interval: ATE ± 1.96 · SE(IC)

Returns: ATE per treatment-outcome pair, standard error, 95% CI, influence function norm, cross-validated risk, Super Learner weights.

Price: $0.050 per call.
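
Three pieces of the pipeline above are simple enough to show in isolation: the clever covariate with positivity truncation, the logistic fluctuation that maps Q⁰ to Q*, and the Wald interval. This is a sketch under stated assumptions: treatment is binary so that g(0|W) = 1 − g(1|W), epsilon is passed in rather than fitted, and the function names are invented for the example.

```python
import numpy as np

def clever_covariate(a, g1, lower=0.01, upper=0.99):
    """H(A, W) = A / g(1|W) - (1 - A) / g(0|W), with g truncated to [lower, upper]."""
    g1 = np.clip(g1, lower, upper)          # positivity truncation
    return a / g1 - (1 - a) / (1 - g1)

def targeted_update(q0, h, epsilon):
    """Q* = expit(logit(Q0) + epsilon * H)."""
    logit_q0 = np.log(q0 / (1 - q0))
    return 1 / (1 + np.exp(-(logit_q0 + epsilon * h)))

def wald_ci(ate, se, z=1.96):
    """95% Wald interval: ATE +/- 1.96 * SE."""
    return ate - z * se, ate + z * se

a = np.array([1, 0, 1])
g1 = np.array([0.5, 0.25, 0.001])           # last propensity is truncated to 0.01
h = clever_covariate(a, g1)
q_star = targeted_update(np.array([0.2, 0.4, 0.6]), h, epsilon=0.0)

low, high = wald_ci(-0.082, 0.019)
print(h, q_star, (round(low, 3), round(high, 3)))
```

Plugging in the first ATE and standard error from the statin output example further down reproduces its reported interval of [-0.119, -0.045].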

`check_graph_consistency`

Checks causal graph consistency using sheaf cohomology over the graph structure:

Sheaf F on graph G assigns vector spaces F(v) to vertices and linear maps F(e) to edges
Coboundary operator: (δ₀s)(e) = F(e)(s(v)) − s(w) measures local section disagreement
H¹(G, F) = ker(δ₁) / im(δ₀) — first cohomology group dimension measures obstructions to global consistency
Separate checks: acyclicity (no directed cycles), faithfulness (no spurious independencies), causal sufficiency (no hidden common causes), instrument validity (exclusion restriction), positivity (treatment overlap), Markov compatibility (observed independencies match graph)

Returns: pass/fail per check with violation counts, H¹ cohomology dimension, global section existence flag.

Price: $0.035 per call.
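
For intuition, the cohomology dimensions can be computed with plain linear algebra on a small graph. The sketch below assumes the simplest possible sheaf (the same stalk R^dim on every vertex and edge, identity restriction maps unless overridden) and follows the sign convention of the coboundary formula above. Because a graph has no 2-cells, H¹ reduces to the cokernel of δ₀. The function names are hypothetical.

```python
import numpy as np

def coboundary(num_vertices, edges, dim=1, maps=None):
    """Matrix of delta_0, with (delta_0 s)(e) = F(e) s(v) - s(w) for each edge e = (v, w)."""
    delta = np.zeros((len(edges) * dim, num_vertices * dim))
    for k, (v, w) in enumerate(edges):
        f_e = np.eye(dim) if maps is None else maps[k]
        rows = slice(k * dim, (k + 1) * dim)
        delta[rows, v * dim:(v + 1) * dim] += f_e
        delta[rows, w * dim:(w + 1) * dim] -= np.eye(dim)
    return delta

def cohomology_dims(num_vertices, edges, dim=1, maps=None):
    """Return (dim H^0, dim H^1) = (dim ker delta_0, dim coker delta_0)."""
    delta = coboundary(num_vertices, edges, dim, maps)
    rank = np.linalg.matrix_rank(delta)
    h0 = num_vertices * dim - rank    # global sections
    h1 = len(edges) * dim - rank      # obstructions
    return int(h0), int(h1)

triangle = [(0, 1), (1, 2), (2, 0)]   # contains one undirected cycle
path = [(0, 1), (1, 2)]               # a tree, no cycle
print(cohomology_dims(3, triangle))
print(cohomology_dims(3, path))
```

With identity maps, the H¹ dimension simply counts independent cycles; non-trivial restriction maps are what let the dimension reflect disagreement between local assignments.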

`attribute_source_contribution`

Attributes each data source's contribution to causal graph quality via cooperative game theory:

Each data domain (academic, biomedical, regulatory, economic, safety) is a player in the coalition game
Value function v(S) = quality of causal graph (node count · edge density · mean edge weight) using only sources in coalition S
Shapley value phi_i = Σ_{S ⊆ N∖{i}} [|S|!(n−|S|−1)!/n!] · [v(S ∪ {i}) − v(S)], the weighted average of source i's marginal contribution over every coalition S that excludes it
Nucleolus — lexicographically minimises the maximum excess, finding the most stable payoff allocation
Core non-emptiness check — tests whether the Shapley allocation is stable against all coalitional deviations

Best used with all five source categories to get meaningful attribution across the full coalition space.

Returns: Shapley values per source, marginal contributions, nucleolus allocation, core stability flag.

Price: $0.040 per call.
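
The Shapley formula can be evaluated exactly by enumerating coalitions, which is cheap for five players. The sketch below uses three players and a made-up value function (`BASE` scores plus a synergy bonus) purely for illustration; the server's value function is the graph-quality measure described above.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition S that excludes player i."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

# Hypothetical value function: additive base scores plus a synergy bonus
BASE = {"academic": 4.0, "biomedical": 3.0, "regulatory": 1.0}

def graph_quality(coalition):
    score = sum(BASE[p] for p in coalition)
    if {"academic", "biomedical"} <= coalition:
        score += 2.0   # the two sources are worth more together
    return score

phi = shapley_values(list(BASE), graph_quality)
print(phi)
```

The synergy bonus is split evenly between the two sources that create it, and the values add up to the quality of the full coalition, which is the efficiency property that makes the allocation interpretable as shares.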

Output examples

`discover_causal_structure` — smoking and lung cancer

{
  "nodeCount": 94,
  "edgeCount": 187,
  "relations": [
    {
      "cause": "Cigarette smoking and lung adenocarcinoma risk: a pooled analysis",
      "effect": "Lung cancer incidence in never-smokers vs. ever-smokers cohort",
      "edgeType": "causes",
      "strength": 0.74,
      "pValue": 0.003,
      "method": "FCI-KCI"
    },
    {
      "cause": "KRAS mutation frequency in tobacco-exposed lung tissue",
      "effect": "Non-small-cell lung carcinoma progression",
      "edgeType": "causes",
      "strength": 0.61,
      "pValue": 0.011,
      "method": "GES-BIC"
    },
    {
      "cause": "Secondhand smoke exposure biomarker cotinine",
      "effect": "Lung cancer incidence in never-smokers vs. ever-smokers cohort",
      "edgeType": "bidirectional",
      "strength": 0.43,
      "pValue": 0.048,
      "method": "additive-noise-HSIC"
    }
  ],
  "totalEdges": 187,
  "directedEdges": 141,
  "bidirectionalEdges": 46,
  "markovEquivalenceSize": 12,
  "bicScore": -4823.7
}

`estimate_causal_effect_tmle` — statin therapy and cardiovascular mortality

{
  "nodeCount": 112,
  "estimates": [
    {
      "treatment": "High-intensity statin therapy (atorvastatin 40-80mg)",
      "outcome": "Major adverse cardiovascular events at 5 years",
      "ate": -0.082,
      "standardError": 0.019,
      "confidenceInterval": [-0.119, -0.045],
      "influenceFunctionNorm": 0.041
    },
    {
      "treatment": "High-intensity statin therapy (atorvastatin 40-80mg)",
      "outcome": "All-cause mortality",
      "ate": -0.031,
      "standardError": 0.014,
      "confidenceInterval": [-0.058, -0.004],
      "influenceFunctionNorm": 0.028
    }
  ],
  "significantCount": 2,
  "averageATE": -0.056,
  "crossValidatedRisk": 0.113,
  "superLearnerWeights": {
    "logistic": 0.34,
    "randomForest": 0.41,
    "xgboost": 0.25
  }
}

`simulate_counterfactuals` — treatment necessity and sufficiency

{
  "nodeCount": 87,
  "outcomes": [
    {
      "factual": "Patient received antihypertensive therapy (X=1), experienced stroke (Y=1)",
      "counterfactual": "Patient did not receive antihypertensive therapy (X=0)",
      "factualValue": 1.0,
      "counterfactualValue": 0.0,
      "probabilityOfNecessity": 0.71,
      "probabilityOfSufficiency": 0.38
    }
  ],
  "twinNetworkSize": 174,
  "averagePN": 0.71,
  "averagePS": 0.38,
  "monotonicityHolds": true
}

`check_graph_consistency` — causal assumption validation

{
  "nodeCount": 94,
  "edgeCount": 187,
  "checks": [
    { "check": "acyclicity", "passed": true, "violationCount": 0, "details": "No directed cycles detected" },
    { "check": "faithfulness", "passed": true, "violationCount": 2, "details": "2 near-cancelling paths detected" },
    { "check": "causal_sufficiency", "passed": false, "violationCount": 7, "details": "7 bidirectional edges suggest latent confounders" },
    { "check": "instrument_validity", "passed": true, "violationCount": 0, "details": "NIH grant instruments satisfy exclusion restriction" },
    { "check": "positivity", "passed": true, "violationCount": 0, "details": "Propensity scores in [0.04, 0.96]" },
    { "check": "markov_compatibility", "passed": true, "violationCount": 1, "details": "1 d-separation violation" }
  ],
  "totalChecks": 6,
  "passedChecks": 5,
  "sheafCohomologyDim": 3,
  "globalSectionExists": false
}

How much does it cost to use the Knowledge Graph Causal Discovery MCP?

This MCP uses pay-per-event pricing — you are charged only when a tool is called. Platform compute costs are included.

Tool	Price per call	10 calls	50 calls
`discover_causal_structure`	$0.045	$0.45	$2.25
`compute_interventional_effects`	$0.050	$0.50	$2.50
`simulate_counterfactuals`	$0.045	$0.45	$2.25
`extract_causal_claims_literature`	$0.035	$0.35	$1.75
`embed_causal_knowledge_graph`	$0.040	$0.40	$2.00
`estimate_causal_effect_tmle`	$0.050	$0.50	$2.50
`check_graph_consistency`	$0.035	$0.35	$1.75
`attribute_source_contribution`	$0.040	$0.40	$2.00

Full 8-tool pipeline per query: $0.34. Running the complete causal discovery pipeline daily for a month costs approximately $10.

Apify's free plan includes $5 of monthly platform credits, which covers roughly 14 full-pipeline runs at no cost.
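
The figures above follow directly from the per-call prices in the table. A quick check, using `Decimal` to avoid floating-point rounding (variable names are illustrative):

```python
from decimal import Decimal

PRICES = {
    "discover_causal_structure": Decimal("0.045"),
    "compute_interventional_effects": Decimal("0.050"),
    "simulate_counterfactuals": Decimal("0.045"),
    "extract_causal_claims_literature": Decimal("0.035"),
    "embed_causal_knowledge_graph": Decimal("0.040"),
    "estimate_causal_effect_tmle": Decimal("0.050"),
    "check_graph_consistency": Decimal("0.035"),
    "attribute_source_contribution": Decimal("0.040"),
}

full_pipeline = sum(PRICES.values())            # all 8 tools, one call each
monthly = full_pipeline * 30                    # one full pipeline per day
free_runs = int(Decimal("5") / full_pipeline)   # whole runs covered by $5 of credits

# The four-step pipeline: discover, check, intervene, estimate
four_step = sum(PRICES[name] for name in (
    "discover_causal_structure",
    "check_graph_consistency",
    "compute_interventional_effects",
    "estimate_causal_effect_tmle",
))

print(full_pipeline, monthly, free_runs, four_step)
```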

You can set a maximum spending limit per session in your Apify account to prevent unexpected charges. The MCP server stops charging and returns an error message if your event limit is reached.

How the Knowledge Graph Causal Discovery MCP works

Phase 1 — parallel data ingestion

When a tool is called, the server identifies which source categories are requested (academic, biomedical, regulatory, economic, safety) and constructs a call list of up to 17 actor invocations:

Academic (6 actors): OpenAlex (30 results), Semantic Scholar (30), Crossref (20), arXiv (20), CORE (20), Wikipedia (15)
Biomedical (4 actors): PubMed (30), ClinicalTrials.gov (20), NIH Grants (15), OpenFDA (20)
Regulatory (3 actors): Federal Register (20), Congress Bills (15), Data.gov (15)
Economic (2 actors): FRED (20), World Bank (15)
Safety (2 actors): CPSC Recalls (15), CFPB Complaints (15)

All actors run via Promise.all — parallel, not sequential. Each actor has a 180-second timeout. A failed actor returns an empty array rather than failing the entire request, ensuring partial results are always returned.
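
The same fan-out pattern (run everything concurrently, cap each call with a timeout, and turn any failure into an empty list) looks like this in Python's asyncio. The server itself uses JavaScript's Promise.all; `run_actor` here is a stand-in that sleeps instead of calling a real actor, and all names are hypothetical.

```python
import asyncio

async def run_actor(name, delay, fail=False):
    """Stand-in for one actor call; a real call would hit the upstream API."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} unavailable")
    return [f"{name}-result"]

async def safe_call(coro, timeout):
    """Apply a per-actor timeout and convert any failure into an empty list."""
    try:
        return await asyncio.wait_for(coro, timeout)
    except Exception:
        return []

async def ingest(timeout=180):
    calls = {
        "openalex": run_actor("openalex", 0.01),
        "pubmed": run_actor("pubmed", 0.02, fail=True),   # simulated outage
        "fred": run_actor("fred", 0.01),
    }
    # gather runs all calls concurrently and preserves their order
    results = await asyncio.gather(*(safe_call(c, timeout) for c in calls.values()))
    return dict(zip(calls, results))

out = asyncio.run(ingest())
print(out)
```

Because failures are swallowed per call, total latency is bounded by the slowest source (or the timeout), and one outage only thins the graph instead of failing the request.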

Phase 2 — causal graph construction

Results from all actors are merged into a typed causal graph (CausalGraph). Nodes are classified by domain signals:

Biomedical results containing "trial", "treatment", "therapy", or "drug" → intervention nodes
Other biomedical results → outcome nodes
Wikipedia articles → confounder nodes (background knowledge)
NIH grants → instrument nodes (funding as instrumental variable)
Clinical trial records → intervention nodes
Regulatory and economic results → confounder and variable nodes respectively

Edges are built from domain heuristics: interventions connect to outcomes with causal weights; confounders connect to both interventions and outcomes; instruments connect to their intervention targets. Variable-to-variable edges are oriented by the additive noise model: HSIC(residual, X) vs HSIC(residual, Y) determines direction.
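
The node classification rules above amount to a short decision function. The sketch below is one possible reading: the source identifiers, the rule precedence, and the fallback to a variable node for cases the rules do not mention are assumptions, not the server's code.

```python
INTERVENTION_WORDS = ("trial", "treatment", "therapy", "drug")

def classify_node(source, category, title):
    """Map one search result to a node type using the heuristics listed above."""
    if source == "wikipedia":
        return "confounder"        # background knowledge
    if source == "nih_grants":
        return "instrument"        # funding as instrumental variable
    if source == "clinicaltrials":
        return "intervention"
    if category == "biomedical":
        text = title.lower()
        if any(word in text for word in INTERVENTION_WORDS):
            return "intervention"
        return "outcome"
    if category == "regulatory":
        return "confounder"
    return "variable"              # economic results; assumed fallback for the rest

examples = [
    ("pubmed", "biomedical", "Statin therapy and stroke"),
    ("pubmed", "biomedical", "Lung cancer incidence"),
    ("wikipedia", "academic", "Smoking"),
    ("fred", "economic", "Real GDP"),
]
for source, category, title in examples:
    print(title, "->", classify_node(source, category, title))
```

Substring matching on titles is what makes ambiguous entities easy to misclassify: a paper titled "Drug-resistant tuberculosis incidence" would be labelled an intervention under these rules.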

Phase 3 — algorithm application

The requested algorithm is applied to the constructed graph:

FCI builds a skeleton from KCI tests, then runs orientation rules for v-structures and Meek's propagation rules, producing a partially oriented graph in which bidirectional edges mark possible hidden common causes. GES refines via BIC-scored forward/backward/turning phases. The additive noise model resolves remaining unoriented edges via HSIC.
Do-calculus applies Rules 1-3 iteratively, testing back-door and front-door criteria against the graph topology. The ID algorithm determines identifiability. Balke-Pearl LP bounds are computed for non-identifiable effects.
Twin network duplicates the graph, wires shared exogenous nodes, then propagates structural equations through both copies to compute PN and PS.
TMLE initialises Q⁰ via Super Learner, estimates propensity scores with truncation, constructs the clever covariate H, fits epsilon via logistic regression, and computes ATE from the targeted Q*.
RotatE initialises entity embeddings, applies unit-modulus rotational updates via self-adversarial negative sampling, and reports MRR and Hits@10.
Sheaf cohomology constructs coboundary matrices δ₀ and δ₁ from the graph's incidence structure, computes ker(δ₁)/im(δ₀), and maps violations to specific causal assumption failures.
Shapley enumerates all 2^n subsets of the source coalition, computes graph quality for each, and applies the Shapley formula. Nucleolus is found via lexicographic minimax excess optimisation.

Phase 4 — structured response

Results are serialised to JSON and returned via the MCP protocol. Every response includes nodeCount and edgeCount from the constructed graph, plus the algorithm-specific metrics.

Tips for best results

Start with discover_causal_structure before interventional tools. The FCI/GES structure output tells you which adjustment sets are valid for do-calculus. Running compute_interventional_effects without knowing the graph structure risks incorrect confounder adjustment.
Use academic + biomedical sources as your baseline. These two categories trigger 10 actors and cover the densest evidence base. Add regulatory for policy questions, economic for macroeconomic analyses, and safety for product harm or financial misconduct queries.
For counterfactual and legal causation work, use check_graph_consistency first. The sheaf cohomology check confirms whether the graph satisfies causal sufficiency and instrument validity — two assumptions that simulate_counterfactuals relies on for valid PN/PS estimates.
Run attribute_source_contribution with all five sources to get meaningful Shapley values. With fewer than three sources, the coalition game has too few subsets to produce stable marginal contributions. The nucleolus calculation requires at least three active players.
For systematic reviews, extract_causal_claims_literature with academic + biomedical is the most cost-effective entry point at $0.035 per call. Use the returned conflicting claim pairs to identify which relationships need deeper structure discovery or TMLE estimation.
Phrase queries as domain-variable pairs for best graph construction: "smoking lung cancer" rather than "does smoking cause cancer?" The graph builder identifies causal nodes from result titles, and specific entity names produce cleaner node classification.
For rare or niche topics, select the academic category, which includes arXiv and CORE: preprint servers often carry causal evidence earlier than indexed journals in fast-moving research areas.
Combine tools in a pipeline for full causal analysis: discover_causal_structure → check_graph_consistency → compute_interventional_effects → estimate_causal_effect_tmle. Total pipeline cost: $0.18 per complete analysis.

Combine with other Apify MCP servers

MCP Server	How to combine
ryanclinton/market-microstructure-manipulation-mcp	Feed causal structure output into market microstructure analysis; Granger causality in that MCP complements Pearl-style do-calculus here
ryanclinton/litigation-intelligence-mcp	Use counterfactual PN/PS scores as inputs to pre-litigation risk scoring; necessary causation probability is a key legal standard
ryanclinton/open-source-supply-chain-risk-mcp	Use causal structure discovery to identify which OSS dependencies causally propagate vulnerabilities vs. correlate with them
ryanclinton/esg-risk-assessment-mcp	Combine regulatory causal graphs with ESG risk scoring to distinguish causal regulatory exposure from correlated industry effects
ryanclinton/drug-pipeline-intelligence-mcp	Feed TMLE treatment effect estimates into drug pipeline analysis to supplement trial data with observational causal evidence

Limitations

No primary data access. This server analyses published literature, trial registries, and government databases. It does not access raw patient-level data, proprietary biobank records, or paywalled journal content.
Graph construction uses heuristic node classification, not ground-truth ontology mapping. Node types (intervention, outcome, confounder) are inferred from title text patterns, which can misclassify ambiguous entities.
Causal algorithms operate on the constructed proxy graph, not on the original numeric data. The FCI, GES, TMLE, and other algorithms produce relative estimates calibrated to the graph structure rather than estimates from primary observations.
TMLE requires sufficient node density to produce meaningful Super Learner estimates. Queries returning fewer than 20 nodes may produce wide confidence intervals.
RotatE embeddings are initialised fresh per call — there is no persistent knowledge graph that improves over time with repeated queries. Embedding quality scales with node count; sparse graphs produce lower MRR.
Sheaf cohomology results are sensitive to bidirectional edge prevalence. Graphs with many hidden-confounder edges (common in observational literature) will show positive H¹ dimension even for well-studied domains.
Source availability is not guaranteed. All 17 upstream actors call live public APIs. Outages, rate limiting, or temporary API changes at any source return empty arrays rather than errors, which reduces graph density but does not fail the request.
Regulatory and economic sources are US-centric. The Federal Register, Congress Bills, FRED, and CPSC cover US institutions. For international regulatory causal analysis, rely on academic and biomedical sources which have global coverage.

Integrations

Apify API — call the MCP server programmatically from Python, JavaScript, or any HTTP client using the Apify Actor API
Webhooks — trigger downstream workflows (Slack alerts, database writes, report generation) when a causal analysis completes
Zapier — connect causal discovery results to Google Sheets, HubSpot, Notion, or any of Zapier's 6,000+ apps without code
Make — build multi-step automation scenarios that chain causal discovery with data enrichment, notifications, and CRM updates
LangChain / LlamaIndex — use the MCP server as a causal reasoning tool within RAG pipelines and autonomous agent frameworks

❓ FAQ

How many data sources does a single causal discovery query touch? Up to 17 actors run in parallel, depending on which source categories you select. The academic category triggers 6 actors (OpenAlex, Semantic Scholar, Crossref, arXiv, CORE, Wikipedia), biomedical triggers 4 (PubMed, ClinicalTrials.gov, NIH Grants, OpenFDA), regulatory 3 (Federal Register, Congress Bills, Data.gov), economic 2 (FRED, World Bank), and safety 2 (CPSC Recalls, CFPB Complaints). Selecting all five categories runs all 17 actors simultaneously.

How is this different from a standard literature review tool or RAG pipeline? Standard literature review tools return ranked documents. This server constructs a typed causal graph from those documents and applies formal causal inference algorithms — FCI, do-calculus, twin networks, TMLE — to extract directional causal relationships, not just associations. The output is mathematical causal structure, not retrieved text.

How fresh is the data returned? All data is fetched live at query time from each source API. There is no cached index. Results reflect the current state of OpenAlex, PubMed, FRED, and the other databases at the moment of the call.

Can I use only one or two source categories to reduce cost? Yes. Every tool accepts a sources array with any combination of academic, biomedical, regulatory, economic, and safety. Using only academic + biomedical is sufficient for most research questions and is the default for all tools except attribute_source_contribution.

What does a Shapley value of 0.4 for biomedical sources mean? Shapley values sum to the quality of the graph built from all five categories, so if the values are reported as shares of that total, 0.4 means the biomedical sources (PubMed, ClinicalTrials.gov, NIH Grants, OpenFDA) account for 40% of total causal graph quality. Each value is the average marginal contribution of that data domain across all possible subsets of the five source categories.

Is it legal to use the data from these sources? All 17 sources are publicly available APIs and open government databases. PubMed, ClinicalTrials.gov, FDA, FRED, World Bank, and the others are free public resources. See Apify's guide on web scraping legality.

Can this replace a randomised controlled trial? No. TMLE and do-calculus provide observational causal inference, which relies on assumptions (no unmeasured confounding, positivity, consistency) that are untestable from data alone. The tools identify causal hypotheses and estimate effect sizes from observational evidence — they do not generate experimental evidence. The check_graph_consistency tool explicitly flags violations of causal sufficiency and other key assumptions.

How long does a typical tool call take? Most tool calls complete in 20–60 seconds, depending on which source categories are selected: academic + biomedical (10 actors) typically takes 25–45 seconds, while all five categories (17 actors) may take 45–90 seconds. Each source actor has a 180-second timeout.

Can I use this with a custom MCP client or agent framework? Yes. The server implements the standard Model Context Protocol at /mcp. Any MCP-capable client, including Cursor, Windsurf, Cline, custom Python MCP clients, and LangChain agent frameworks, can connect to https://knowledge-graph-causal-discovery-mcp.apify.actor/mcp.

What happens if a data source is temporarily unavailable? Individual actor failures return empty arrays rather than propagating errors. The graph is built from available sources, and the causal algorithm runs on the reduced graph. The response always includes nodeCount and edgeCount so you can verify graph density and re-run with different sources if needed.
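Since failed sources produce a smaller graph rather than an error, a client may want to inspect `nodeCount` and `edgeCount` before trusting a result. The following is a rough sketch of such a check: the two field names come from this page, while the `assess_graph` helper, the flat response shape, and both thresholds are illustrative assumptions to tune for your own queries.

```python
def assess_graph(response, min_nodes=10, min_edges_per_node=1.0):
    """Return a list of warnings for a sparse graph; an empty list means usable.

    Thresholds are arbitrary illustrative defaults, not server recommendations.
    """
    nodes = response.get("nodeCount", 0)
    edges = response.get("edgeCount", 0)
    warnings = []
    if nodes == 0:
        warnings.append("empty graph: every selected source may have failed")
        return warnings
    if nodes < min_nodes:
        warnings.append(f"only {nodes} nodes: consider adding source categories")
    if edges / nodes < min_edges_per_node:
        warnings.append(f"low density: {edges} edges across {nodes} nodes")
    return warnings


# Hypothetical responses, trimmed to the two documented count fields.
healthy = {"nodeCount": 50, "edgeCount": 120}
degraded = {"nodeCount": 4, "edgeCount": 2}

print(assess_graph(healthy))
print(assess_graph(degraded))
```

A non-empty warning list is a cue to re-run the tool with a different `sources` selection rather than to interpret the causal output as-is.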

Can I run structure discovery and TMLE estimation on the same query to cross-validate results? Yes, and this is the recommended workflow for high-stakes analyses. discover_causal_structure identifies the graph topology and adjustment sets. estimate_causal_effect_tmle uses that topology to select valid confounders for the Super Learner and propensity model. Running both costs $0.095.

Does the server support streaming responses for long-running queries? The server uses the Streamable HTTP transport from the MCP SDK, which supports streaming. MCP clients that implement streaming (including Claude Desktop) will receive incremental updates during long-running actor calls.

Help us improve

If you encounter unexpected results or errors, enable run sharing so we can diagnose issues faster:

Go to Account Settings > Privacy
Enable Share runs with public Actor creators

This lets us see your run details when something goes wrong. Your data is visible only to the actor developer, not publicly.

Support

Found a bug or need a feature? Open an issue in the Issues tab on this actor's page. For custom causal inference configurations, domain-specific ontology integration, or enterprise deployments, reach out through the Apify platform.

