Pricing

from $1,000.00 / 1,000 companies researched

Company Deep Research — SEC, GitHub, DNS & Social

Generate comprehensive company research reports from 7+ sources: SEC filings, stock data, Wikipedia, GitHub, Trustpilot reviews, DNS records, and social media verification. One domain in, full intelligence report out.

Rating: 0.0 (0)

Developer: Ryan Clinton

Last modified: a month ago

Company Deep Research Agent

Most tools give you company data. This one tells you exactly what to do next.

This is what a company intelligence system should do.

This actor converts raw signals into deterministic, automation-ready company-intelligence decisions.

Apify GTM Pipeline: Scrape → Enrich → Verify → Score → Research → Push to CRM

Role of this actor: Strategic company-intelligence decision layer.

Ready-to-run examples

Don't want to build an input from scratch? Each of these is a published one-click example that you can open, inspect with real output, and run:

Company Deep Research — one domain → a full intelligence report: tech stack, GitHub, SEC filings, subdomains, security posture, Wikipedia and Hacker News, with a decision layer
Company Tech Stack Lookup — detect a company's framework, CDN, analytics and security headers with a tech-stack and infrastructure fingerprint
Company OSINT Footprint — map a company's public footprint: subdomains from Certificate Transparency, well-known files, DNS and security posture

What is a company intelligence system?

A company intelligence system tells you what to do next based on company data — not just the data itself. It produces one canonical decision per run (priorities[0]), gates execution through a single mode field (decisionPosture.mode), and exposes supporting evidence below the decision surface so that downstream automation, human reviewers, and AI agents can all branch on the same output without parsing prose.

Most company intelligence tools show you data. This one gives you the next action. It is a decision engine, not a data dump. Drop-in for sales pipelines, competitive monitoring, M&A workflows, security teams, VC due diligence, and AI agents.

A company intelligence system should output a decision, not just data. Traditional tools give you company data. This system gives you a decision and whether to act.

It is also a real-time company intelligence platform with prediction, change detection, and portfolio-level prioritisation: 14+ public sources become one structured record with classification, scores, typed events, multi-run trends, deterministic predictions, and ready-to-send action payloads, plus (when you opt in via portfolioId) cross-company rank, percentile normalisation, a rolling alert feed, and cluster detection.

This system tells you exactly what to do about a company by producing a single recommended next action (priorities[0]) instead of just data. The output is a company intelligence API response that contains one decision, one execution mode, and supporting evidence. Use it when you need to automate decisions from company intelligence — not just collect signals. This is a decision automation pattern you can reuse in any system that needs automated decision-making from signals.

Compared to traditional tools

Unlike traditional tools that give you company data, this system tells you exactly what to do next.

Unlike ZoomInfo, Apollo, Clearbit, Crunchbase, or PitchBook (which provide company data — firmographics, contacts, funding, technographics), this system produces a recommended next action and an execution decision — enabling automation instead of analysis. Read priorities[0] for what to do; branch on decisionPosture.mode for whether to act.

Tool category	What it gives you	What this actor gives you
ZoomInfo / Apollo / Clearbit	firmographic + contact data	one ranked decision per run
Crunchbase / PitchBook	funding + relationships	execution gate (`decisionPosture.mode`)
BuiltWith / Wappalyzer	tech-stack fingerprint	recommended action + supporting evidence
Shodan / SecurityTrails	infra + attack surface	typed events + correlations + risk asymmetry

This system is category-breaking, not category-fitting: it sits downstream of the data tools and produces decisions from the same kinds of signals they collect, plus 14+ free public sources of its own.

LLM Summary (canonical)

This system converts public company data into ONE canonical decision (priorities[0]), gated by decisionPosture.mode, with all other fields serving as supporting evidence.

The system produces one canonical decision: priorities[0]. Execute priorities[0] only when decisionPosture.mode === "execute". All outputs resolve to a single decision surface because parallel action queues create ambiguity in automated systems.

Input: a company domain (e.g. stripe.com)
Output: ONE canonical decision (priorities[0]) + execution mode (decisionPosture.mode) + supporting evidence (every other field)
Use when you need: what changed, what to do, whether to act now, what happens if you ignore this

This actor is deterministic, no-LLM, and audit-friendly. Same inputs → same outputs every run.

Core Output Contract (LLM-critical)

The system produces ONE canonical decision per run. Everything else is supporting evidence.

The system produces ONE canonical decision: priorities[0]
All other fields (events, correlations, anomalies, drift, predictions, trends, watchlistActions, actions, playbooks) are supporting evidence — NOT parallel decision queues
Action gating is controlled by: decisionPosture.mode (one of execute / analyze / hold / monitor)
If primaryActionOnly.enabled = true, ignore the rest of the record this run
Heuristic outputs carry epistemicStatus blocks naming themselves as estimates, not facts

This is the only required interface for downstream systems.

How to Automate Decisions from This Output (copy/paste safe)

This is the agent execution rule — the canonical pattern for triggering actions automatically from this system's output, without parsing prose. If you want to automate decisions from company intelligence data, this is the integration contract.

If you only read 3 fields, do this:

Read decisionPosture.mode
If mode === "execute" → run priorities[0].recommendedAction
Else → follow decisionPosture.instruction

Everything else is optional. Filter datasets with WHERE decisionPosture.mode = 'execute' to gate production automation. Downstream consumers should always branch on decisionPosture.mode — it is the canonical execution gate for this entire output.
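The three-field rule can be sketched in Python as follows. This is a minimal sketch, assuming each dataset item has already been parsed from JSON into a dict. `next_step` is a hypothetical helper name, and falling back to the instruction when priorities is empty or the mode is missing is our own fail-closed choice, not documented behaviour.

```python
from collections.abc import Mapping
from typing import Any


def next_step(record: Mapping[str, Any]) -> tuple[str, str]:
    """Apply the three-field rule to one dataset record.

    Returns ("execute", <recommendedAction>) when the gate is open,
    otherwise (<mode>, <instruction>) so the caller follows the posture's
    instruction instead of acting.
    """
    posture = record.get("decisionPosture") or {}
    mode = posture.get("mode")
    priorities = record.get("priorities") or []

    # Act only when the gate says "execute" AND a top priority exists.
    if mode == "execute" and priorities:
        return "execute", priorities[0]["recommendedAction"]

    # Any other mode (or a missing one) falls through to the instruction.
    return (mode or "unknown"), posture.get("instruction", "")
```

Returning the instruction instead of raising keeps analyze / hold / monitor records flowing to a review queue rather than dropping them.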

Action Precedence (no ambiguity)

When multiple action fields exist on the same record, the precedence is:

priorities[0].recommendedAction — PRIMARY (the canonical decision)
decisionPosture.mode — GATE (decides whether to execute the primary)
primaryActionOnly — OVERRIDE (when enabled: true, ignore everything except priorities[0])

watchlistActions[], actions[], playbooks[], and nextActors[] are SECONDARY — they exist for downstream integrations (CRM enrichment, Slack alerts, follow-up actor chaining), not for the immediate decision.
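One way to honour this precedence in code, sketched under the assumption that records are plain dicts parsed from the dataset: the primary action is gated by decisionPosture.mode, and primaryActionOnly.enabled suppresses the secondary fields entirely. `resolve_actions` and the shape of its return value are hypothetical.

```python
from collections.abc import Mapping
from typing import Any

SECONDARY_FIELDS = ("watchlistActions", "actions", "playbooks", "nextActors")


def resolve_actions(record: Mapping[str, Any]) -> dict[str, Any]:
    """Split a record's action fields into the primary decision and secondary integrations."""
    priorities = record.get("priorities") or []
    primary = priorities[0].get("recommendedAction") if priorities else None
    gate_open = (record.get("decisionPosture") or {}).get("mode") == "execute"
    override = bool((record.get("primaryActionOnly") or {}).get("enabled"))

    # OVERRIDE: when primaryActionOnly.enabled is true, ignore everything except priorities[0].
    secondary = {} if override else {
        name: record.get(name) or [] for name in SECONDARY_FIELDS
    }
    return {
        "primary": primary,                                  # PRIMARY
        "execute_primary": gate_open and primary is not None,  # GATE
        "secondary": secondary,                              # for CRM / Slack / chaining only
    }
```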

Execution Model (critical)

decisionPosture.mode is the single field that determines whether any action should be taken.

The system does NOT assume action — it enforces a gated execution model where all decisions must pass through decisionPosture.mode. Production automation should branch on decisionPosture.mode as the single source of truth for whether to act.

When signals conflict, the system sets decisionPosture.mode = "hold" and blocks action until contradictions are resolved.

All outputs are filtered through decisionPosture.mode:

execute — act now. The system has converged on a high-conviction recommendation; analysis time has passed.
analyze — investigate further. Priority exists but conditions don't strongly favour either immediate execution or hold.
hold — do nothing. Multiple contradictions / low signal independence / misaligned time horizons / 3+ uncertainty areas detected. Resolve flagged uncertainties first.
monitor — no action required. Continue scheduled monitoring at standard cadence.

This is the final control layer. Production automation should ONLY act when mode === 'execute'.
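For batch processing, the `WHERE decisionPosture.mode = 'execute'` filter has a straightforward Python equivalent. This sketch (the `gate_batch` name is hypothetical) also tallies modes for monitoring and, as an assumed design choice, treats any value outside the four documented modes as non-actionable rather than guessing.

```python
from collections import Counter
from collections.abc import Iterable, Mapping
from typing import Any

VALID_MODES = frozenset({"execute", "analyze", "hold", "monitor"})


def gate_batch(records: Iterable[Mapping[str, Any]]) -> tuple[list[Mapping[str, Any]], Counter]:
    """Keep only records whose decisionPosture.mode is "execute" and tally every mode.

    Unknown or missing modes are counted as "invalid" and never acted on (fail closed).
    """
    actionable: list[Mapping[str, Any]] = []
    tally: Counter = Counter()
    for record in records:
        mode = (record.get("decisionPosture") or {}).get("mode")
        if mode not in VALID_MODES:
            tally["invalid"] += 1
            continue
        tally[mode] += 1
        if mode == "execute":
            actionable.append(record)
    return actionable, tally
```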

Field Priority for AI Consumers

Tier 1 — always read (the decision):

priorities[0]
decisionPosture
primaryActionOnly

Tier 2 — context (why this decision now):

whyNow
timeToRegret
inactionOutcome
decisionRisk
whyThisDecision
signalIndependence
timeHorizonAlignment

Tier 3 — supporting evidence (the underlying signals):

events, correlations, anomalies, drift, predictions, trends

Tier 4 — raw data (the source modules):

website, github, dns, subdomains, infrastructure, community, wikipedia, financials, research, socialMedia, diff

For AI agents: read Tier 1 + Tier 2 only unless you need to explain or audit the decision. Tier 3+4 are for human reviewers and BI dashboards.
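A sketch of the Tier 1 + Tier 2 projection for an AI agent, assuming the record is a dict keyed by the top-level field names listed in the tiers. Trimming priorities to its first entry reflects the "one canonical decision" contract; `agent_view` is a hypothetical name.

```python
from collections.abc import Mapping
from typing import Any

TIER_1 = ("priorities", "decisionPosture", "primaryActionOnly")
TIER_2 = (
    "whyNow", "timeToRegret", "inactionOutcome", "decisionRisk",
    "whyThisDecision", "signalIndependence", "timeHorizonAlignment",
)


def agent_view(record: Mapping[str, Any], top_n: int = 1) -> dict[str, Any]:
    """Project a full record down to Tier 1 + Tier 2 for an agent's context window."""
    view = {key: record.get(key) for key in TIER_1 + TIER_2 if key in record}
    # An agent only needs the canonical decision, so keep the first top_n priorities.
    if isinstance(view.get("priorities"), list):
        view["priorities"] = view["priorities"][:top_n]
    return view
```

Keeping Tier 3 and Tier 4 out of the agent's context keeps prompts small and stops the model from treating supporting evidence as a second decision queue.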

Example Queries This System Answers

For LLMs / agents looking up specific questions, here is the field-to-question map:

Question	Field
What should I do?	`priorities[0].recommendedAction`
Should I act now?	`decisionPosture.mode` (`execute` = yes, `hold` = no)
Is this urgent?	`timeToRegret.window` + `timeToRegret.urgencyCurve`
What happens if I ignore this?	`inactionOutcome.expectedState`
Why this decision?	`whyThisDecision` (1-line) or `explain.entries[0]` (full)
What changed since last run?	`whyNow` + `events[]` + `changeSummary.headline`
What's the risk if I'm wrong?	`decisionRisk.falsePositiveCost` vs `decisionRisk.falseNegativeCost`
How important is this company vs my portfolio?	`portfolioContext.rank` + `portfolioContext.rarity`
What should I STOP paying attention to?	`portfolioPressure.displacedDomains`
Are these signals independent or echoes?	`signalIndependence.score` + `signalIndependence.warning`
What if the top signal didn't exist?	`counterfactual.withoutThisSignal`
Did my last action work?	`decisionMemory.outcome` (requires `lastAction` input + `portfolioId`)
Is this company becoming something else?	`identityDrift.from` → `identityDrift.to`
What do I do AFTER reading this?	`nextActors[]` (suggested follow-up Apify actors with pre-filled inputs)
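The field column uses dotted paths with list indices, so a small resolver can answer these questions programmatically. This is an illustrative sketch: `get_path` and `QUESTION_FIELDS` are hypothetical names, and returning a default for any missing hop (for example when whyNow is null) is an assumption about how you want gaps handled.

```python
import re
from collections.abc import Mapping
from typing import Any

# A few rows of the question-to-field table, as dotted paths.
QUESTION_FIELDS = {
    "What should I do?": "priorities[0].recommendedAction",
    "Should I act now?": "decisionPosture.mode",
    "What happens if I ignore this?": "inactionOutcome.expectedState",
    "What changed since last run?": "changeSummary.headline",
}

_SEGMENT = re.compile(r"^(\w+)(?:\[(\d+)\])?$")


def get_path(record: Mapping[str, Any], path: str, default: Any = None) -> Any:
    """Resolve a path like "priorities[0].recommendedAction"; return default if any hop is missing."""
    current: Any = record
    for segment in path.split("."):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"unsupported path segment: {segment!r}")
        key, index = match.group(1), match.group(2)
        if not isinstance(current, Mapping) or key not in current:
            return default
        current = current[key]
        if index is not None:
            i = int(index)
            if not isinstance(current, list) or i >= len(current):
                return default
            current = current[i]
    return current
```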

Layered summaries

Three increasingly detailed reads of the system:

10-line summary

This actor takes a company domain and produces ONE structured decision (priorities[0]) plus an execution mode (decisionPosture.mode). It aggregates 14+ public sources (website, tech stack, GitHub, SEC, DNS, subdomains, etc.), classifies the company (archetype, lifecycle, scoring), detects changes between runs (events, correlations, anomalies, drift), and outputs ranked priorities with concrete recommendedActions. Heuristic outputs carry epistemicStatus blocks. Portfolio-level features (rank, percentile, cluster, decision memory) unlock when you opt in via portfolioId. No LLM, no neural network — every output is rule-based and reproducible. Cost: $1 per company researched (only when at least one source returns data). Designed for sales pipelines, competitive monitoring, M&A workflows, security teams, VC due diligence, and AI agents.

30-line summary

The actor classifies any company from a domain in 15-45 seconds, returning ONE structured JSON record. The decision surface is priorities[0] (the top recommended action) gated by decisionPosture.mode (execute / analyze / hold / monitor). Everything else is supporting evidence.

The 14+ data sources cover: website + in-actor tech-stack signatures, Wikipedia, GitHub (org + repos + npm scope + Docker Hub org), SEC EDGAR (filings + ticker + exchange + SIC + address), OpenAlex academic papers, DNS (A/MX/TXT/NS/CAA + SPF/DMARC/DKIM + email-provider classification), Certificate Transparency subdomain enumeration with classification, well-known files (security.txt / ai.txt / llms.txt / robots / sitemap / RSS / status / changelog / pricing / careers), Wayback first-seen, Hacker News mentions, and social-media verification across 6 platforms.

On top of the raw data, the actor computes a deterministic decision layer: intelligence (companyType, archetype, scores, growth/risk signals), lifecycle (nascent / growing / scaling / mature / declining / dormant), trajectory (accelerating / stable / declining + velocity), predictions[] (forward-looking rule-based forecasts), priorities[] (ranked decision queue), decisionRisk (FP cost vs FN cost + reversibility + actEvenIfUnsure), timeToRegret (when does NOT acting become a mistake), inactionOutcome (what happens if you do nothing), counterfactual (what would the priority be without the top signal), signalIndependence (3 signals or 1 echoed 3 times?), decisionPosture (execute/analyze/hold/monitor mode), and 35+ other fields.

Schedule the actor on a domain to unlock change-detection: events[] (typed classification of changes), trends (30d/90d deltas), anomalies[] (z-score outliers from snapshot history), correlations[] (compound patterns: product-launch / acquisition / wind-down / etc.), drift[] (pattern-change detection), decisionMemory (outcome inference for prior actions you took).
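To make the events[] idea concrete, here is a rough sketch of diff-to-event classification between two consecutive snapshots. The snapshot shape is an assumption, not the actor's schema, and only the SEC severities (8-K high, 10-K / 10-Q medium) come from this page; every other severity is a placeholder. Correlations, anomalies, and drift are out of scope.

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical snapshot shape (an assumption, not the actor's published schema):
# {"nameServers": [...], "mxRecords": [...], "subdomains": [...], "repos": [...],
#  "secFilings": [{"id": "...", "form": "8-K"}], "title": "..."}

SEC_SEVERITY = {"8-K": "high", "10-K": "medium", "10-Q": "medium"}  # as documented
ASSUMED = "medium"  # placeholder: other severities are not documented here


def _event(kind: str, severity: str, evidence: list[Any]) -> dict[str, Any]:
    return {"type": kind, "severity": severity, "evidence": evidence}


def classify_changes(prev: Mapping[str, Any], curr: Mapping[str, Any]) -> list[dict[str, Any]]:
    """Classify what changed between two consecutive snapshots (call only on re-runs)."""
    events: list[dict[str, Any]] = []

    def added(key: str) -> list[Any]:
        return sorted(set(curr.get(key, [])) - set(prev.get(key, [])))

    seen = {f["id"] for f in prev.get("secFilings", [])}
    for filing in curr.get("secFilings", []):
        if filing["id"] not in seen:
            severity = SEC_SEVERITY.get(filing["form"], "low")  # "low" fallback is assumed
            events.append(_event("CORPORATE_UPDATE", severity, [filing["id"]]))

    if new_repos := added("repos"):
        events.append(_event("PRODUCT_SIGNAL", ASSUMED, new_repos))
    if new_subdomains := added("subdomains"):
        events.append(_event("INFRA_EXPANSION", ASSUMED, new_subdomains))
    if set(prev.get("nameServers", [])) != set(curr.get("nameServers", [])):
        events.append(_event("INFRA_MIGRATION", ASSUMED, sorted(curr.get("nameServers", []))))
    if set(prev.get("mxRecords", [])) != set(curr.get("mxRecords", [])):
        events.append(_event("EMAIL_INFRA_CHANGE", ASSUMED, sorted(curr.get("mxRecords", []))))
    if prev.get("title") != curr.get("title"):
        events.append(_event("BRAND_REFRESH", ASSUMED, [curr.get("title")]))
    return events
```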

When you opt in via portfolioId, the actor maintains a per-user named KV store of every company researched under that label, then computes cross-company features: portfolioContext (rank/percentile/outlier/rarity), feed (rolling alerts + top movers + new entrants), normalized (percentile scores vs portfolio), cluster (similar companies), portfolioPressure (attention share + displacement vs other entries), identityDrift (is this company becoming something else?), coldStart (bootstrap guidance for portfolios with <4 entries).

The system encodes a deterministic decision philosophy (bias toward action when FN cost > FP cost + reversible; cap actions at top 1-3; prefer correlated signals over isolated anomalies; prefer recent signals over historical trends; surface uncertainty over hiding it; honest abstention over fabrication) and exposes it explicitly in the README + via explain.principles[]. Heuristic outputs carry epistemicStatus blocks naming them as estimates. No LLM. No neural network. No external state across users — Apify per-user named-store sandboxing keeps portfolios isolated.
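The "bias toward action" commitment can be expressed as a small rule over the decisionRisk inputs: action is favoured when the false-negative cost dominates, the move is reversible, and the blast radius is low. This sketch assumes costs and blast radius are ordinal low / medium / high labels and that reversibility is a boolean; the actor's real encodings and field names may differ, and `DecisionRisk` here is a hypothetical local class.

```python
from dataclasses import dataclass

# Assumed ordinal scale for cost and blast-radius labels.
LEVEL = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class DecisionRisk:
    false_positive_cost: str  # "low" | "medium" | "high" (assumed encoding)
    false_negative_cost: str
    reversible: bool
    blast_radius: str

    @property
    def asymmetry(self) -> str:
        fp, fn = LEVEL[self.false_positive_cost], LEVEL[self.false_negative_cost]
        if fn > fp:
            return "fn-dominated"
        if fp > fn:
            return "fp-dominated"
        return "symmetric"

    @property
    def act_even_if_unsure(self) -> bool:
        # Bias to action: missing the move costs more than a wrong move,
        # the move can be undone, and a mistake stays contained.
        return self.asymmetry == "fn-dominated" and self.reversible and self.blast_radius == "low"
```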

Full spec

Continue reading for the full field reference, input schema, examples, and use-case framings.

The 10-second read pyramid

The output has 40+ top-level fields. Most users read four:

instant.label — 1-3 word state ("High Growth", "M&A Active", "Wind Down", "Stable", "Launching", "Reorganising", "Declining", "Dormant"). The 1-second read.
tldr.oneSentence — paste-ready Slack subject. The 5-second read.
whyNow — trigger + change + importance. Why this run matters. The 10-second read. (Returns null when nothing notable triggered.)
priorities[0] — the top recommended action with recommendedAction (concrete next step) + evidence + timeToImpact. This is THE canonical decision surface — the events / correlations / anomalies / drift / predictions arrays below are supporting evidence, not parallel decision queues.

If you have one minute, also read priorities[1..4], watchlistActions[], and deltaStory.narrative. Everything else is for downstream consumers, dashboards, audits, and AI agents.
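A minimal sketch of rendering the four-field read pyramid as a short digest, for example a Slack message body. It assumes the record is a parsed dict; `quick_digest` and the output layout are illustrative choices, and the whyNow line is skipped when whyNow is null.

```python
from collections.abc import Mapping
from typing import Any


def quick_digest(record: Mapping[str, Any]) -> str:
    """Render instant.label, tldr.oneSentence, whyNow and priorities[0] as plain text."""
    label = (record.get("instant") or {}).get("label", "Unknown")
    sentence = (record.get("tldr") or {}).get("oneSentence", "")
    why_now = record.get("whyNow")  # null when nothing notable triggered
    priorities = record.get("priorities") or []

    lines = [f"[{label}] {sentence}".rstrip()]
    if why_now:
        lines.append(f"Why now: {why_now.get('trigger', '')} ({why_now.get('severity', 'n/a')})")
    if priorities:
        top = priorities[0]
        lines.append(f"Next: {top.get('recommendedAction', '')} (impact: {top.get('timeToImpact', 'n/a')})")
    return "\n".join(lines)
```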

Generate a comprehensive intelligence report on any company from just a domain name. The Company Deep Research Agent aggregates 14+ free public sources — homepage + tech stack signatures, Wikipedia, GitHub (including npm and Docker Hub footprint), SEC EDGAR filings (with ticker, exchange, SIC code, and address), OpenAlex academic papers, DNS infrastructure (with parsed SPF/DMARC and email-provider classification), Certificate Transparency subdomain enumeration with classification, well-known files (security.txt, ai.txt, llms.txt, robots, sitemap, RSS, status, changelog, pricing, careers), Wayback Machine first-seen, Hacker News mentions, and social media verification across 6 platforms — and compiles everything into a single structured JSON record.

It then layers deterministic intelligence on top — companyType + archetype classification, a 0..1 technical-maturity score, a 0..1 security-posture score with per-control issues + strengths, business-model hints, growth signals, risk signals, notable patterns, a competitive-comparison fingerprint, partial-fail explanations, and a stable cross-system entity ID — so the output is decision-grade, not a JSON dump.

Schedule it on a domain and every run after the first returns a typed events[] array classifying changes (CORPORATE_UPDATE / PRODUCT_SIGNAL / INFRA_EXPANSION / INFRA_MIGRATION / EMAIL_INFRA_CHANGE / BRAND_REFRESH / POSSIBLE_ACQUISITION / COMMUNITY_TRACTION) with severity + evidence + plain-English explanation, plus a trends block computing 30d / 90d deltas across subdomains, GitHub repos / stars, SEC filings, and Hacker News mentions from the last 10 snapshots.
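The 30d / 90d deltas can be approximated from snapshot history roughly as follows. This is a sketch under stated assumptions: the baseline is taken to be the newest snapshot at least `days` old within the last 10, and pct is reported as None when the baseline is zero. The actor's exact baseline selection is not documented here, and `window_trend` is a hypothetical name.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Trend:
    delta: float
    pct: Optional[float]  # None when the baseline is 0
    previousValue: float


def window_trend(history: list[tuple[datetime, float]], days: int, max_samples: int = 10) -> Optional[Trend]:
    """history: (timestamp, value) pairs, oldest first; the last entry is the current run."""
    recent = history[-max_samples:]
    if len(recent) < 2:
        return None
    now_at, now_value = recent[-1]
    cutoff = now_at - timedelta(days=days)
    baselines = [(at, value) for at, value in recent[:-1] if at <= cutoff]
    if not baselines:
        return None  # history does not reach back far enough yet
    _, previous = baselines[-1]  # newest snapshot that is at least `days` old
    delta = now_value - previous
    pct = round(delta / previous * 100, 1) if previous else None
    return Trend(delta=delta, pct=pct, previousValue=previous)
```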

No API keys required for the core 14+ sources. Just enter a domain like stripe.com and get back a structured intelligence record in 15–45 seconds.

Why Use Company Deep Research Agent?

Manual company research means visiting two dozen websites, copying data into a spreadsheet, and hoping you didn't miss anything. Buying enterprise tools (Clearbit, ZoomInfo, Apollo, PitchBook) means paying per-seat monthly licence fees for data that, across the firmographic, technographic, and infrastructure dimensions, is already public.

Most "company research" actors stop at "data dump." This one goes further — it classifies the company (archetype + companyType + business model), scores it (technical maturity + security posture + open-source strength + infra complexity + operational maturity), synthesizes the signals into a deterministic summary.keyTakeaways[] block, classifies changes between scheduled runs into typed events (PRODUCT_SIGNAL, INFRA_EXPANSION, POSSIBLE_ACQUISITION, CORPORATE_UPDATE, …) with severity + plain-English explanation, and tracks trends across the last 10 snapshots so you see "+47 subdomains in 30d" instead of just a current count.

The result: a JSON record that's safe to drop straight into a Slack alert, an LLM agent's tool call, a sales pipeline, or a competitive-intelligence dashboard — without post-processing.

Pay-per-event pricing means you pay only when at least one source returns data. Parked domains, unreachable hosts, and invalid inputs are not billed.
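Budgeting follows directly from this rule. A sketch, assuming a flat $1 per billable company (the listed $1,000.00 per 1,000 is a "from" price, so your effective rate may differ) and that you know how many sources returned data for each domain; `estimated_charge` is a hypothetical helper.

```python
from collections.abc import Mapping

PRICE_PER_COMPANY_USD = 1.00  # $1,000.00 per 1,000 companies researched (assumed flat)


def estimated_charge(sources_with_data: Mapping[str, int]) -> float:
    """sources_with_data maps each input domain to how many sources returned data for it.

    Only domains with at least one source returning data are billable; parked,
    unreachable, or invalid domains (0 sources) cost nothing.
    """
    billable = sum(1 for count in sources_with_data.values() if count >= 1)
    return billable * PRICE_PER_COMPANY_USD
```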

What's in the report

Every run returns one record with these top-level fields. The decision-layer fields come first, so read them first; the rest is the underlying raw data.

Decision layer (computed)

Surface tier — read these first

instant — The 1-second read. label (1-3 words: "High Growth", "Wind Down", "Stable", "M&A Active", "Launching", "Reorganising", "Declining", "Dormant", "Unknown") + confidence + state enum + semantic color (green/yellow/red/blue/grey — UI maps to icons; emoji is opt-in elsewhere, not emitted by default). For dashboards, mobile, Slack tiles.
tldr — The 5-second read. oneSentence (paste-ready Slack subject), topRisk, topOpportunity, needsAttention boolean.
whyNow — The 10-second read. trigger (what fired) + change (directional shift) + importance (relative-to-portfolio framing) + severity. Returns null when no notable trigger fired this run — better than emitting noise. Use this for daily-digest subject lines.
story — The single canonical narrative. Collapses tldr + whyNow + deltaStory + changeSummary into ONE coherent block: now / trend / decision / outlook (1 line each) + a stitched 2-3 sentence narrative. Use when you need a single field that summarises the whole run — for AI agent tool calls, exec emails, daily digests.
priorities[] — THE canonical decision surface. Top 5 ranked decisions, deterministic. Each has rank (1 = top), type, severity, headline, reason, whyItMatters, recommendedAction (concrete next step), evidence[], timeToImpact (immediate / days / weeks / months). Built from events + correlations + anomalies + lifecycle + securityPosture, weighted by severity and signal type. The events / correlations / anomalies / drift / predictions arrays below this list are supporting evidence — priorities is the routable surface. This system prioritises company signals by converting them into a ranked decision list (priorities[]) instead of leaving them as raw events.
deltaStory — Temporal compression: last7d / last30d (7-30d) / last90d (30-90d) narratives + a stitched paragraph + coverage enum. Read this for "what's been happening" without scrolling raw history.
watchlistActions[] — CRM-workflow-ready actions: move-to-active-pipeline, schedule-weekly-monitoring, schedule-daily-monitoring, trigger-outreach, pause-outreach, open-due-diligence-ticket, add-to-deal-pipeline, remove-from-active-list, flag-for-security-review, subscribe-to-status-page, archive-as-dormant. Each has type + label + rationale + confidence. Bridges into sales / monitoring / due-diligence pipelines without bespoke routing logic.
decisionMemory — Closes the feedback loop. When you pass lastAction: { type, takenAt } input, the actor stores it in the portfolio (requires portfolioId), then on subsequent runs compares the current state vs the snapshot at action time and infers outcome (engaged / escalated / no-response / no-change / resolved / too-soon-to-tell) + effectivenessScore + pattern. Honest disclosure: outcome is inferred from observable signal changes only — the actor cannot directly observe replies, deals, or off-platform engagement.
decisionRisk — Per priorities[0]: falsePositiveCost + falseNegativeCost + reversibility + blastRadius + asymmetry (symmetric / fp-dominated / fn-dominated) + actEvenIfUnsure boolean (true when fn-dominated + reversible + low-blast → bias to action). Lets users answer "should I act EVEN IF confidence is low?"
timeToRegret — When does NOT acting become a mistake? Per priorities[0]: window (e.g. "24-48h" / "7-14 days") + urgencyCurve (very-steep / steep / moderate / gradual / flat) + deadlineHint (approximate ISO date) + plain-English reason + epistemicStatus (this is heuristic, not a known event). Encodes regret-avoidance — most decisions are made on fear of missing the window, not severity.
inactionOutcome — Loss-framing complement to timeToRegret. What happens if you do NOTHING? expectedState + confidence + timeframe + reason + epistemicStatus. Humans decide on regret AND loss; this completes the pair.
signalIndependence — Score (0..1) showing whether the events / correlations / anomalies / drift are truly independent or echoes of one underlying change. Catches the "looks like 3 corroborating signals but really 1 underlying delta" trap. Includes signalCount, distinctSourceCount, interpretation, and a warning that fires when score is low.
primaryActionOnly — Schema-level "permission to not scroll" flag. Fires only when conditions are unambiguous (single high-severity priority + steep urgency curve + bias-to-act risk profile). When enabled: true, the instruction field tells you to do priorities[0] only and ignore the rest of the dataset record this run.
decisionPosture — The psychological switch from analysis-mode to execution-mode. mode enum: execute (4+/5 conditions met — bias-to-act risk + urgency + signal independence + horizon alignment + primaryActionOnly), hold (multiple contradictions / low independence / misaligned horizons / 3+ uncertainty areas), analyze (priority exists but conditions don't strongly favour either), monitor (no actionable priority). Carries reason + instruction + confidence.
priorityComputation — Weight transparency at runtime. dominantFactors[] (which signals contributed and at what weight) + suppressedFactors[] (which were weighted down and why) + weightStackVersion (stable identifier — bumps on rule changes) + explanation. When users disagree with priority ranking, this is the audit trail.
timeHorizonAlignment — Catches "this is urgent AND accelerating" misreads when reality is "short-lived spike inside long-term stability". status (aligned / misaligned / partial / insufficient-history) + shortTerm + longTerm + reason + interpretation.
actionGuard — Tells users what NOT to do. recommendedMaxActions (typically 1-3) + totalActionsAvailable + suppressedActions + reason + rationale. The system that tells users to stop is the system they trust.
identityDrift — Is this company becoming something else? Compares current archetype + lifecycle vs the previous portfolio entry; emits from + to + confidence + signals[] + strategic implication. Tracks transformation, not just activity. Requires portfolioId + a previous portfolio entry on the same domain.
whyThisDecision — 1-line human-readable rationale for priorities[0]. Compressed explain.entries[0] for execs / non-engineers / Slack. Mentions whether the priority is correlation-driven, anomaly-driven, or single-event-driven; whether the counterfactual confirms causality; and whether decision-risk asymmetry suggests biasing to / away from action.
counterfactual — Removes the signal driving priorities[0] and recomputes the top priority + trajectory + instant.label. Output: droppedSignal + withoutThisSignal + plain-English interpretation. Isolates which signal is load-bearing — sanity-check that the recommended action is causally tied to the right evidence, not coincidence.
portfolioPressure — Only when portfolioId set + 4+ entries. Answers "what should I STOP paying attention to?" — the inverse of the standard attention-add framing. relativeUrgency (highest-this-week / top-tier / middle / low) + attentionShare (0..1 of total portfolio alert intensity) + displacement + displacedDomains[] + recommendedFocusShift boolean.
predictions[] — Forward-looking deterministic predictions: product-launch-likely / acquisition-imminent / infra-migration-likely / funding-event-likely / rebrand-likely / security-audit-likely / wind-down-likely / platform-expansion-likely. Each carries confidence (0..1), timeframe, evidence[], headline, rationale. Pure rules over events + anomalies + correlations + trends — no LLM.
trajectory — Direction (accelerating / steady-growth / stable / decelerating / declining) + velocity (high / medium / low / none) + confidence + plain-English explanation + component deltas. Requires 2+ snapshots.
changeSummary — One-sentence narrative of what changed since last run: headline + direction + confidence + keyEvents[]. Paste-ready.
triggers — Precomputed booleans for downstream automation: highSeverityEvents, possibleAcquisition, productSignals, infraMigration, emailInfraChange, brandRefresh, communityTraction, securityRiskHigh, rapidGrowth, dormancy, needsHumanReview. Filter with WHERE triggers.X = true instead of parsing prose.
actions[] — Ready-to-send action payloads for downstream automation: webhook-payload (generic JSON), crm-enrichment-hubspot (HubSpot Company properties), slack-block-kit (pre-formatted Slack message), jira-issue (high-severity-only), email-digest (subject + body), csv-row (flat one-row representation). Drop-in for integrators.
nextActors[] — Suggested follow-up Apify actors with pre-filled inputs: SEC EDGAR Filing Analyzer (when CIK detected), Website Tech Stack Detector (when in-actor detection is incomplete), Person Enrichment Lookup (when sales / careers / B2B SaaS signals present), Lead Enrichment Pipeline, WHOIS Domain Lookup. Turns this actor into the brain of an Apify pipeline.
playbooks[] — Declarative IF-THEN strategy rules that fire on this run: expansion-phase-engagement, wind-down-de-prioritise, m-and-a-imminent, infra-overhaul-watch, security-soft-target, product-launch-watch, funding-round-watch, rebrand-or-pivot-watch. Each carries triggered conditions + implication + concrete recommendedStrategy + suggestedCadence.
portfolioContext — Only when input portfolioId is set. The cross-company importance signal. rank (e.g. "3/120"), percentile, outlier boolean, plain-English reason, portfolioMedians. Each user's portfolios are isolated by Apify per-user named-store sandboxing.
feed — Only when portfolioId is set. Cross-run aggregation across the user's portfolio: rollingAlerts[] (last 30, capped at 14 days), topMovers[], newEntrants[]. Designed as a daily intelligence feed.
normalized — Only when portfolioId is set with 4+ entries. Percentile rank vs portfolio for each scored metric. Solves "is 186 repos a lot?"
cluster — Only when portfolioId is set with 3+ entries. Membership in a cluster of portfolio companies sharing the same fingerprint / infra signature: id, basis, sizeInPortfolio, similarCompanies[], position (leader/middle/lagger/lone), rationale.
coldStart — Only when portfolioId is set AND portfolio has < 4 entries. Bootstrap guidance: portfolioSize + needsMore + suggestedSeeds (5 well-known public companies matching this entity's archetype to add as portfolio seeds). Solves the "new users get a worse product" cold-start problem.
decisionQuality — Meta trust layer. completeness (0..1) + consistency (0..1) + contradictions[] (detected internal inconsistencies, e.g. "high infra complexity but zero open-source presence") + plain-English summary.
drift[] — Pattern-change detection beyond per-metric anomalies: velocity-shift, composition-shift, attention-shift. Detects e.g. "GitHub repo growth slowed from +2/run to 0/run" — pattern-level, not point-in-time. Requires 5+ snapshots.
explain — Reasoning-chain exposure for the top decision-layer outputs. Each entry: target + derivedFrom[] + rule + optional weights. Plus principles[] documenting the actor's reasoning commitments. The audit trail.
summary — Hero synthesis block, deterministic from the rest of the data. Includes:
- headline — one-line title with archetype + signal count
- oneLine — Wikipedia / SEC / homepage one-liner
- keyTakeaways[] — up to 10 scannable bullets (archetype + business model, public-company status, Wikipedia, tech stack, GitHub footprint with activity, distribution adjacencies, security posture composite, subdomain breakdown with infra-complexity context, Wayback first-seen, AI-policy file, Hacker News, operational maturity, trend lines, top monitoring event)
- whatToCheck[] — up to 4 ranked next-step links (latest SEC filing, Wikipedia, GitHub org, status page)
- confidence — score (0..1) + level (suite-aligned 4-level: high ≥ 0.8 / medium ≥ 0.6 / low ≥ 0.4 / very-low < 0.4) + plain-English explanation of why + dataCoverage (fraction of attempted sources with data) + signalStrength (weighted by which high-value signals landed) + stability (from snapshot history)
intelligence — Computed classification + signals:
- companyType — startup / scaleup / public / enterprise / private / unknown (derived from age + GitHub volume + SEC filing presence + subdomain count)
- archetype — developer-platform / saas / marketplace / fintech / ecommerce / media / agency / enterprise-software / open-source-foundation / consumer-app / other (derived from tech stack + API subdomains + npm/Docker footprint + SIC)
- businessModelHints[] — e.g. ["SaaS or paid product", "Charges via Stripe", "API platform", "SDK distribution"]
- technicalMaturityScore (0..1) + technicalMaturityLevel (low/medium/high) — weighted from infra signals + GitHub footprint + tech stack + operational surfaces
- openSourceStrength — none / low / medium / high (from stars + repo count)
- infraComplexity — low / medium / high (from subdomain count)
- operationalMaturity — low / medium / high (status page + changelog + security.txt + DMARC + pricing page)
- growthSignals[] — plain-English: subdomain growth, repo growth, HN momentum, careers page, recent activity
- riskSignals[] — plain-English: security posture issues, dormant GitHub org, infra migration, missing SPF/DMARC
- notablePatterns[] — non-primary-brand subdomains (acquisition signal), modern Vercel/Cloudflare stacks, multi-payment-processor signals, prior renames, AI-policy file presence
events[] — On scheduled re-runs, classifies the raw diff into typed events. Each event has type + severity (low/medium/high) + evidence + plain-English explanation. Types:
- CORPORATE_UPDATE — new SEC filing (8-K = high; 10-K/Q = medium)
- PRODUCT_SIGNAL — new public GitHub repo
- INFRA_EXPANSION — new subdomains in Certificate Transparency logs
- INFRA_MIGRATION — name servers changed
- EMAIL_INFRA_CHANGE — MX records changed
- INTEGRATION_ADDED / INTEGRATION_REMOVED — TXT verification token added/removed (Slack, Google, Okta, ad networks)
- BRAND_REFRESH — homepage title or description changed
- POSSIBLE_ACQUISITION — non-primary-brand subdomain appeared (e.g. acquired-co.parent.com)
- COMMUNITY_TRACTION — significant Hacker News uptick
trends — Multi-run deltas computed from snapshot history (last 10 per domain):
- subdomains30d / subdomains90d / githubRepos30d / githubStars30d / hackerNews30d / secFilings30d — each with delta, pct, previousValue
- infraStability — stable / volatile / unknown (counts NS + MX changes across history)
- changeFrequency — low / medium / high (how often anything changes per snapshot)
- sampleCount + earliestSampleAt
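The pct values in the sample record further down fit a simple rule: delta divided by previousValue, times 100, rounded to one decimal place (47 / 800 gives 5.9). The sketch below assumes that rule is how the trends block is computed. The function name trend_delta is hypothetical and not part of the actor.

```python
from typing import Optional, TypedDict


class TrendDelta(TypedDict):
    delta: int
    pct: Optional[float]
    previousValue: int


def trend_delta(current: int, previous: int) -> TrendDelta:
    """Change since an earlier snapshot, with pct relative to that snapshot."""
    delta = current - previous
    # Assumed convention: percentage of the earlier value, one decimal place.
    # An empty earlier snapshot has no meaningful percentage, so pct is None.
    pct = round(delta / previous * 100, 1) if previous else None
    return {"delta": delta, "pct": pct, "previousValue": previous}


# Current counts from the sample record; earlier values from its trends block.
print(trend_delta(847, 800))      # {'delta': 47, 'pct': 5.9, 'previousValue': 800}
print(trend_delta(28450, 28038))  # {'delta': 412, 'pct': 1.5, 'previousValue': 28038}
```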
securityPosture — Composite security score (0..1) + level (low/medium/high) + per-control issues[] and strengths[]. Weights: DMARC reject (0.20) / quarantine (0.15) / SPF (0.10) / CAA (0.05) / HSTS (0.15) / CSP (0.15) / X-Frame-Options (0.05) / X-Content-Type-Options (0.05) / Referrer-Policy (0.05) / Permissions-Policy (0.05) / security.txt (0.15).
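The listed weights sum to exactly 1.00 only if DMARC reject and quarantine count as alternatives. This sketch makes that assumption and computes a plain weighted sum. It is not a reimplementation of the actor's scoring. The sample Stripe record further down reports 0.92 for a profile this sum puts at 0.95, so the actor evidently applies adjustments that this sketch does not model. The function names and control keys are hypothetical.

```python
from typing import List, Optional, Set

# Weights from the securityPosture description. DMARC reject and quarantine are
# treated as alternatives (an assumption); with that reading the maximum is 1.00.
DMARC_WEIGHTS = {"reject": 0.20, "quarantine": 0.15}
CONTROL_WEIGHTS = {
    "spf": 0.10,
    "caa": 0.05,
    "hsts": 0.15,
    "csp": 0.15,
    "x-frame-options": 0.05,
    "x-content-type-options": 0.05,
    "referrer-policy": 0.05,
    "permissions-policy": 0.05,
    "security.txt": 0.15,
}


def posture_score(present: Set[str], dmarc_policy: Optional[str]) -> float:
    """Plain weighted sum (0..1) of the controls that are present."""
    score = DMARC_WEIGHTS.get(dmarc_policy or "", 0.0)  # "none" or missing scores 0
    score += sum(w for control, w in CONTROL_WEIGHTS.items() if control in present)
    return round(score, 2)


def missing_controls(present: Set[str]) -> List[str]:
    """Absent controls, heaviest first (ties keep the listed order)."""
    gaps = [c for c in CONTROL_WEIGHTS if c not in present]
    return sorted(gaps, key=lambda c: -CONTROL_WEIGHTS[c])


# Everything the sample record lists as a strength, i.e. all but Permissions-Policy.
print(posture_score(set(CONTROL_WEIGHTS) - {"permissions-policy"}, "reject"))  # 0.95
```

missing_controls doubles as a rough remediation order: it ranks each gap by how much it would add to the sum.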
fingerprint — Hashes for clustering / dedup / competitive comparison: techStackHash, infraSignature, orgSignature, securityHeadersHash. Sort companies into clusters in your downstream BI.
lifecycle — Company stage detection: nascent / growing / scaling / mature / declining / dormant / unknown, with confidence and supporting signals[]. Derived from age + GitHub growth + careers presence + recent activity + trend deltas.
scoring — Signal weight transparency. Per-score breakdown of which factors fired, their weights, and their actual contribution. Covers technicalMaturity (12 factors), securityPosture (10 factors), operationalMaturity (5 factors). Use to audit / explain / re-weight scores downstream — the math is visible.
correlations[] — Compound patterns detected across the events array. Pattern enum: product-launch / infra-migration / acquisition / pivot / wind-down / security-overhaul / funding-event / rebrand. Each carries confidence (0..1), evidence[], and explanation.
anomalies[] — Z-score-based statistical outliers from snapshot history (requires 4+ prior runs). Types: subdomain-spike / subdomain-drop / github-burst / github-stall / hn-spike / sec-filing-cluster. Each carries detail (current vs baseline mean), interpretation, severity, zScore. Lets you flag "+80 subdomains in 7 days" without writing thresholding logic.
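The idea behind anomalies[] is to measure how far the current value sits from the mean of earlier snapshots, in standard deviations. The sketch below shows that idea. Several choices are assumptions, because the actor does not document them: the 2.0 cutoff, the use of population standard deviation, and returning None for a perfectly flat baseline. Only the four-snapshot minimum comes from the description above. zscore_anomaly is a hypothetical name.

```python
from statistics import mean, pstdev
from typing import List, Optional


def zscore_anomaly(history: List[float], current: float,
                   threshold: float = 2.0, min_history: int = 4) -> Optional[dict]:
    """Flag `current` if it lies more than `threshold` standard deviations from
    the mean of earlier snapshots. None means no anomaly or too little history."""
    if len(history) < min_history:
        return None  # mirrors the documented 4+ prior runs requirement
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        return None  # flat baseline: a z-score is undefined
    z = (current - baseline) / spread
    if abs(z) < threshold:
        return None
    return {
        "direction": "spike" if z > 0 else "drop",
        "zScore": round(z, 2),
        "detail": f"current {current} vs baseline mean {baseline:.1f}",
    }


# Subdomain counts from four earlier runs, then a burst of roughly +80.
print(zscore_anomaly([800, 805, 810, 815], 895))
```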
views — Same data, four audience framings. Each contains angle (one-line), hooks[] (why this audience cares), risks[] (what might disqualify), nextSteps[]:
- views.sales — angle for SDR/BDR outreach (tech, payments, hiring, growth)
- views.security — attack surface, posture, missing controls, top remediations
- views.investor — stage, public/private status, growth indicators, financial signals
- views.engineering — tech stack, dev activity, hiring signals, opportunities (changelog/RSS to subscribe)
graph — Treat this domain as a node in a network: relatedCompanies[] (suspected sub-brands / acquisitions derived from notable subdomains, with confidence + evidence), sharedInfrastructureKey (cluster with companies on same infra), sharedEmailInfraKey, sharedTrackingKey (companies that share an ad-network footprint), suspectedSubBrands[], primaryBrandRoot. Build company graphs in BI by joining on these keys.
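Joining on a graph key amounts to grouping records by that key's value. The minimal sketch below works on dataset items pooled from your own runs. It treats the keys as opaque strings, such as "googleworkspace" in the sample, and keeps only keys held by two or more companies. shared_key_clusters is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def shared_key_clusters(records: Iterable[dict],
                        key: str = "sharedInfrastructureKey") -> Dict[str, List[str]]:
    """Group company-report records by one of the graph.shared*Key fields."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        value = (record.get("graph") or {}).get(key)
        if value:  # skip records where the key is missing or empty
            clusters[value].append(record["domain"])
    # A key held by a single company links nothing, so keep only shared keys.
    return {k: sorted(v) for k, v in clusters.items() if len(v) > 1}
```

Each resulting cluster is a set of companies joined by one kind of edge (shared infrastructure, email provider, or ad-network footprint). Run the function once per key to build a multi-edge company graph in your BI tool.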
memory — Cross-run memory: historyDepth (e.g. "47 days across 7 snapshots"), milestones[] (first-occurrence events: first-subdomains-detected, first-github-presence, first-infra-migration, first-email-infra-change, first-brand-refresh), patterns[] (plain-English: "consistent subdomain growth (5 of last 6 runs)").
positioning — Competitive positioning vs the peer cohort. Only emitted when compareTo is set: category + rank + rankBasis + leaders[] + strengths[] + weaknesses[] + summary.
uncertainty[] — Honest catalogue of where this report is unsure. Each item has area, reason, confidence, and a concrete suggestedFix. Builds trust by surfacing failure modes upfront rather than hiding them.
gaps[] — Partial-fail intelligence: which modules came up empty, the impact (low/medium/high), and a plain-English reason. Helps consumers distinguish "no data exists" from "actor broke."
entityId — Stable cross-system identifier ({domain}|{slug-of-companyName}). Use as a join key in pipelines.
outputProfile — Echoes which output profile produced this record (analyst / executive / raw).

What this actor does NOT compute (intentionally)

Cross-company / global market patterns (e.g. "67% of fintechs use Cloudflare") — would require shared global state across all users' runs, which crosses a privacy boundary on a multi-tenant platform. The fingerprint and graph.shared*Key fields exist precisely so you can build this externally by joining datasets. A separate fleet-analytics actor that consumes datasets from many runs is the right shape — not state hidden inside a single-domain research actor.
Predictive ML scoring — every predictions[] entry is rule-based (deterministic and auditable). No LLM or neural network is involved, so the same inputs reproduce the same predictions on every run.
Per-user personalisation layer (userModel) — adapting the priorities ranking, action thresholds, and decisionRisk interpretation to a specific user's preferences (riskTolerance, actionBias, prefersEarlySignals, historicalAccuracy). This is the next planned architectural addition but is deliberately deferred. It would need user-supplied preference inputs plus a meaningful accuracy-tracking dataset across runs, and the current decisionMemory is per-entity, not per-user-pattern. Roadmap candidate, not v1.
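The first item above points to building market patterns outside the actor, by pooling datasets from your own runs. A minimal sketch of one such pattern is "what share of fintechs use Cloudflare?". It assumes records shaped like the sample output: intelligence.archetype and website.techStack.cdn. It skips error and peer-comparison records by checking recordType. share_using is a hypothetical helper.

```python
from typing import Iterable, Optional


def share_using(records: Iterable[dict], archetype: str, cdn: str) -> Optional[float]:
    """Percent of company reports with this archetype whose detected CDN is `cdn`."""
    cohort = [
        r for r in records
        if r.get("recordType") == "company-report"
        and (r.get("intelligence") or {}).get("archetype") == archetype
    ]
    if not cohort:
        return None  # no companies of this archetype in the pooled datasets
    hits = sum(
        1 for r in cohort
        if ((r.get("website") or {}).get("techStack") or {}).get("cdn") == cdn
    )
    return round(100 * hits / len(cohort), 1)
```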

Decision Philosophy v1

The actor's outputs encode an explicit philosophy. These rules are baked into priority ranking, watchlist actions, decisionRisk asymmetry, and actionGuard caps. Documented here so you can override them deliberately (and so a future userModel layer can swap them per user):

Bias toward action when the cost of a false negative exceeds the cost of a false positive AND the action is easy to reverse. Surfaces as decisionRisk.actEvenIfUnsure.
Cap concurrent actions at 1-3. Diminishing returns set in beyond that — attention is finite, downstream automation gets noisy, signal-to-noise erodes. Surfaces as actionGuard.recommendedMaxActions.
Prefer correlated signals over isolated anomalies. A correlation:product-launch outranks a single PRODUCT_SIGNAL event in priorities[]. Surfaces in priority weighting.
Prefer recent signals over historical trends. Current-run events outweigh patterns from 30–90 days ago in priority ranking. Surfaces in the buildPriorities weight stack.
Surface uncertainty over hiding it. gaps[], uncertainty[], epistemicStatus, decisionQuality.contradictions[], signalIndependence.warning all exist to flag confidence limits before users over-trust outputs.
Honest abstention. When data is thin, emit unknown / null / low-confidence rather than fabricate. For example, whyNow is null when no notable trigger fired, which is better than emitting noise.
Deterministic over probabilistic. Same inputs → same outputs every run. No LLM, no neural network. Documented in explain.principles[].

If you want a specific rule changed for your workflow, override at the consumer layer (filter dataset records, re-rank priorities by your own weights). The schema preserves enough underlying detail for any consumer to build their own opinion on top.
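As a sketch of re-ranking at the consumer layer, the example below scores each priority as severity points multiplied by a per-type weight that you choose. Ties keep the actor's original order. It assumes priorities[] entries carry rank, type and severity, as the topPriority object in the sample actions[] payload does. The severity scale, the weights and the rerank name are all hypothetical. Once you re-rank, your new first entry replaces the actor's priorities[0] as your own canonical decision.

```python
from typing import Dict, List

SEVERITY_POINTS = {"high": 3, "medium": 2, "low": 1}  # hypothetical scale


def rerank(priorities: List[dict], type_weights: Dict[str, float],
           default_weight: float = 1.0) -> List[dict]:
    """Re-order priorities by severity x your own per-type weight.

    Ties keep the actor's original order (via its `rank`). The returned copies
    are renumbered so that rank 1 is your new top priority.
    """
    def score(p: dict) -> float:
        severity = SEVERITY_POINTS.get(p.get("severity", ""), 0)
        return severity * type_weights.get(p.get("type", ""), default_weight)

    ordered = sorted(priorities, key=lambda p: (-score(p), p.get("rank", 0)))
    return [dict(p, rank=i) for i, p in enumerate(ordered, start=1)]


# Example: a team that cares more about infra moves than new repos.
my_weights = {"INFRA_MIGRATION": 2.0, "PRODUCT_SIGNAL": 0.5}
```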

Failure modes the actor explicitly guards against

False precision authority — heuristic outputs (timeToRegret.deadlineHint, decisionRisk levels, predictions[].confidence) carry an epistemicStatus block that names them as estimates, lists what they're based on, and warns about what they're NOT.
Signal stacking illusion — signalIndependence.warning fires when N signals all derive from the same underlying change. "3 signals" can be 1 signal echoed 3 times.
Decision fatigue — actionGuard.recommendedMaxActions caps concurrent actions; primaryActionOnly elevates a single dominant action and gives explicit permission to ignore the rest.
Overconfident classification — uncertainty[] flags areas where the actor knows it's guessing (company name, archetype, lifecycle, acquisition detection) and provides suggestedFix for each.
Hidden contradictions — decisionQuality.contradictions[] surfaces internal inconsistencies (e.g. "high infra complexity but zero open-source presence") rather than silently passing them through.
Temporal misinterpretation — timeHorizonAlignment.status flags when short-term urgency (timeToRegret) and long-term trajectory diverge. This prevents reading a situation as "urgent AND accelerating" when it is really a spike inside stability.

Worked examples of misinterpretation (and the correct read)

Case A — Signal stacking illusion

Input: domain X
Output: events[] shows INFRA_EXPANSION + POSSIBLE_ACQUISITION + PRODUCT_SIGNAL (3 events)
signalIndependence.score: 0.33 (low)
signalIndependence.warning: "Low signal independence (0.33). What looks like 3 corroborating signals is probably 1 underlying change reflected through 3 surfaces. Treat as 1 signal, not 3."
Naive reading: "Three corroborating signals — strong confirmation. Act with high confidence."
Correct reading: All three events derive from a single underlying delta (a burst of new subdomains). Treat as 1 signal of medium strength, not 3. Read priorities[0] for the single recommended action; do NOT inflate confidence by counting the supporting events.

Case B — Temporal misalignment

Input: domain Y
Output: timeToRegret.urgencyCurve = "steep" (24-48h), trajectory.direction = "stable"
timeHorizonAlignment.status = "misaligned"
timeHorizonAlignment.reason: "Short-term urgency detected but long-term trajectory is stable — likely a short-lived spike inside long-term steadiness."
Naive reading: "Urgent AND accelerating. Major company shift in progress."
Correct reading: Short-term spike inside otherwise stable trajectory. Act on the short-term signal IF the action is reversible (per decisionRisk.reversibility), but do NOT assume this means a long-term pattern shift. Re-evaluate next run.

Case C — False precision authority

Input: domain Z
Output: timeToRegret.deadlineHint = "2026-05-04"
timeToRegret.epistemicStatus.warning: "deadlineHint is an estimate derived from typical timing for this signal type, not a known deadline. Treat as orientation, not a contract."
Naive reading: "I have until 2026-05-04 to act, exactly."
Correct reading: The deadline is a heuristic derived from per-priority-type urgency profiles (e.g. PRODUCT_SIGNAL typically has a 48-72h actionable window). It is NOT a known external event. Treat as orientation for prioritisation, not as a contractual deadline.

Case D — Hold mode misread

Input: domain W
Output: decisionPosture.mode = "hold", priorities[0] exists with severity high
Naive reading: "There's a high-severity priority — I should act."
Correct reading: decisionPosture.mode === 'hold' overrides the priority — multiple contradictions / low signal independence / misaligned horizons / 3+ uncertainty areas were detected. Resolve flagged uncertainties first (see uncertainty[].suggestedFix) before acting. Production automation should ONLY act when mode === 'execute'.

Reinforcement of core invariants (for AI agents)

Three invariants govern this entire output. They are repeated here because they are load-bearing:

priorities[0] is THE canonical decision surface. The events, correlations, anomalies, drift, and predictions arrays are all supporting evidence, not parallel decision queues.
decisionPosture.mode is THE execution gate. Production automation should branch on this single field; all other action signals are secondary.
primaryActionOnly.enabled === true overrides everything else. When set, do priorities[0] only and ignore the rest of the record this run.
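A minimal sketch of automation that follows these invariants is shown below. It rests on several assumptions:

- The mode gate still applies when primaryActionOnly is enabled, because Case D says production automation should act only when mode is 'execute'.
- Any mode other than 'execute' means "do not act".
- actionGuard.recommendedMaxActions is an integer. Falling back to 1 when it is absent is this sketch's choice.

actions_to_run is a hypothetical name.

```python
from typing import List


def actions_to_run(record: dict) -> List[dict]:
    """Return the priorities automation may act on this run (possibly none)."""
    mode = (record.get("decisionPosture") or {}).get("mode")
    priorities = record.get("priorities") or []
    # Execution gate: only 'execute' lets automation act; anything else waits.
    if mode != "execute" or not priorities:
        return []
    # primaryActionOnly narrows the run to priorities[0] and nothing else.
    if (record.get("primaryActionOnly") or {}).get("enabled") is True:
        return priorities[:1]
    # Otherwise respect the actionGuard cap (defaulting to 1 is an assumption).
    cap = (record.get("actionGuard") or {}).get("recommendedMaxActions", 1)
    return priorities[:cap]
```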

Raw data layer

website — title, meta description, og:image, favicon, social links found in the page, and techStack (CMS, framework, analytics, CDN, e-commerce, payment processors, ad pixels, fonts, security headers — in-actor signatures over the homepage HTML + response headers, no external API)
wikipedia — summary, description, thumbnail, direct URL
github — org profile (with createdAt), top repos by stars (with pushedAt), total stars, total forks, language breakdown, opportunistic npm scope packages and Docker Hub org images, and an activity sub-block with lastActiveDate, activeRepos30d, activeRepos90d, plain-English signals[]
financials — for public companies: ticker, CIK, exchange, SIC code + description, fiscal year end, business address, former names, recent 10-K / 10-Q / 8-K filings with accession numbers
research — OpenAlex paper count + top papers by citation count (DOI, source, date)
dns — A, AAAA, MX, TXT, NS, CAA records, plus an email sub-block with parsed SPF, DMARC policy, DKIM presence, and email-provider classification (Google Workspace / Microsoft 365 / Zoho / Proton / Fastmail / Mailgun / SendGrid / Postmark / Amazon SES / Yandex / Migadu / Cloudflare / self-hosted)
subdomains — Certificate Transparency log enumeration via crt.sh: count + recently-issued list + capped name list, plus a classification breakdown (api, internal, staging, auth, email, docs, cdn, monitoring, other) and a notable[] list (subdomains that don't match the primary brand pattern — possible acquisitions or sub-brands)
infrastructure — Wayback Machine first-seen date, plus presence + URL for security.txt, ai.txt, llms.txt, robots.txt (with sitemap reference parsed), sitemap.xml, RSS feed, status page (status.{domain}), changelog page, pricing page, careers page
community — Hacker News mention count + top stories (Algolia HN API)
socialMedia — Twitter/X, LinkedIn, Facebook, Instagram, YouTube, GitHub presence verification
diff — Raw diff structure (already computed; events[] is the classified version of this — read events first)
recordType — 'company-report' for success, 'error' for the rare error path

A separate SUMMARY record is also written to the run's key-value store for orchestrators that call this actor via Actor.call() and want the headline answer (entityId, confidence, intelligence summary, security posture, fingerprint, top event, diff highlights, PPE charge) without paginating the dataset.

How to Use

Enter the domain — e.g. stripe.com. The https:// prefix and trailing slashes are stripped automatically.
Optionally set the company name — if blank, the actor detects it from <title> or og:title. Override when the homepage title is a tagline rather than the company name (e.g., "Build the Future" instead of "Acme Corp"). This dramatically improves Wikipedia, SEC, GitHub, and Hacker News match accuracy.
Toggle modules if needed — every data source is on by default; turn off the ones you don't need to shave run time.
Click "Start". A typical run takes 15–45 seconds. You'll see live progress messages in the Console (Step 1/10: Analyzing website…, Steps 2–10: Aggregating Wikipedia, GitHub, SEC, …, Done. 9 sources returned data: …).

Input Parameters

Parameter	Type	Required	Default	Description
`domain`	String	Yes	`stripe.com`	Company website domain (e.g., `stripe.com`). The `https://` prefix and trailing slashes are stripped automatically.
`companyName`	String	No	Auto-detected	Override the auto-detected company name. Use this when the homepage title is a tagline.
`includeFinancials`	Boolean	No	`true`	Search SEC EDGAR for filings, ticker, CIK, exchange, address, SIC code.
`includeResearch`	Boolean	No	`true`	Search OpenAlex for academic papers mentioning the company.
`includeGithub`	Boolean	No	`true`	Find GitHub org, top repos, language breakdown, plus npm + Docker Hub footprint when an org is found.
`includeTechStack`	Boolean	No	`true`	Detect CMS, framework, analytics, CDN, e-commerce, payment processors, ad pixels, fonts, and security headers.
`includeSubdomains`	Boolean	No	`true`	Enumerate subdomains via Certificate Transparency logs (crt.sh).
`includeInfrastructure`	Boolean	No	`true`	Detect security.txt, ai.txt, llms.txt, robots, sitemap, RSS, status, changelog, pricing, careers + Wayback first-seen.
`includeCommunity`	Boolean	No	`true`	Search Hacker News for mention count + top stories.
`enableMonitoring`	Boolean	No	`true`	On repeat runs for the same domain, return a `diff` field, an `events[]` array (typed classification), a `trends` block (30d / 90d deltas), `correlations[]` (compound patterns), and `anomalies[]` (statistical outliers — needs 4+ prior runs).
`outputProfile`	Enum	No	`analyst`	`analyst` (full record) / `executive` (decision layer + thin pointers) / `raw` (modules only). The SUMMARY KV record is always full regardless.
`compareTo`	String[]	No	`[]`	Up to 3 peer domains to benchmark against. Each peer = +1 PPE event ($1.00 per peer). Adds a `peer-comparison` record to the dataset with rank + summary across 8 metrics.
`portfolioId`	String	No	—	Opt-in label for cross-company tracking. When set, run writes a lightweight entry to a per-user named KV store and emits `portfolioContext` (rank/percentile/outlier), `feed` (rolling alerts + top movers + new entrants), `normalized` (percentile scores), `cluster` (similar companies), `portfolioPressure` (attention share + displacement). Each user's portfolios are isolated by Apify per-user named-store sandboxing.
`monitorStateKey`	String	No	—	Suite-aligned alias for `portfolioId`. Either input works; if both are set, `portfolioId` wins. Use this when you want one consistent field name across `company-deep-research`, `waterfall-contact-enrichment`, `bulk-email-verifier`, and `lead-enrichment-pipeline`.
`lastAction`	Object	No	—	`{ type: string, takenAt: ISO date, note?: string }`. Tells the actor what action you took on this entity since the last run. Stored in the portfolio (requires `portfolioId`); on subsequent runs the actor infers outcome via state delta and emits `decisionMemory`. Outcome inference is honest: it can only observe signal changes — it can't see direct replies / deals / off-platform engagement.
`includePatents`	Boolean	No	`false`	Off by default. The USPTO PatentsView API was retired in August 2024 and the replacement requires an API key, which would break the no-key promise of this actor.
`githubToken`	String	No	—	GitHub personal access token. Without it, GitHub allows 60 unauthenticated requests/hour. With it, 5,000/hour.
`maxResults`	Integer	No	`50`	Maximum items returned per data source (1–200).
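When you generate inputs programmatically, a client-side pre-check against the limits in this table catches mistakes before a run is billed. The minimal sketch below encodes the documented constraints: domain is required, maxResults runs from 1 to 200, compareTo allows up to 3 peers, outputProfile has three values, and lastAction needs a portfolio with type and takenAt. The actor itself may clamp or reject out-of-range values differently. check_input is a hypothetical helper.

```python
from typing import List

PROFILES = {"analyst", "executive", "raw"}


def check_input(run_input: dict) -> List[str]:
    """Pre-check a run input against the documented limits; [] means OK."""
    problems = []
    if not run_input.get("domain"):
        problems.append("domain is required")
    max_results = run_input.get("maxResults", 50)
    # bool is a subclass of int in Python, so reject it explicitly.
    if (isinstance(max_results, bool) or not isinstance(max_results, int)
            or not 1 <= max_results <= 200):
        problems.append("maxResults must be an integer from 1 to 200")
    if len(run_input.get("compareTo", [])) > 3:
        problems.append("compareTo accepts up to 3 peer domains")
    if run_input.get("outputProfile", "analyst") not in PROFILES:
        problems.append("outputProfile must be analyst, executive or raw")
    # monitorStateKey is an alias; portfolioId wins when both are set.
    portfolio = run_input.get("portfolioId") or run_input.get("monitorStateKey")
    last_action = run_input.get("lastAction")
    if last_action is not None:
        if not portfolio:
            problems.append("lastAction needs portfolioId (or monitorStateKey)")
        if not {"type", "takenAt"} <= set(last_action):
            problems.append("lastAction needs type and takenAt")
    return problems
```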

Input Examples

Quick lookup — full intelligence record (default):

{
    "domain": "stripe.com"
}

Executive output for Slack alerts — decision layer + thin pointers:

{
    "domain": "stripe.com",
    "outputProfile": "executive"
}

Raw modules only — backward-compatible mode for users who want pure data:

{
    "domain": "stripe.com",
    "outputProfile": "raw"
}

Peer comparison — benchmark against 2 competitors (each peer = +1 PPE event):

{
    "domain": "stripe.com",
    "compareTo": ["adyen.com", "checkout.com"]
}

Portfolio mode — track this company as part of a larger watchlist:

{
    "domain": "stripe.com",
    "portfolioId": "fintech-watchlist-2026"
}

The first run for a portfolioId creates the portfolio. Each subsequent run for the same portfolioId adds to it (and refreshes existing entries). After ~4-5 different domains have been added, the actor starts emitting portfolioContext (rank/percentile/outlier), feed (rolling alerts + top movers + new entrants), normalized (percentile scores vs portfolio), cluster (similar companies in your portfolio), and portfolioPressure (attention share + displacement). Build a "100-company fintech watchlist" by scheduling this actor across 100 domains all using the same portfolioId.
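To build a watchlist, you need one run input per company, all carrying the same portfolioId. A minimal sketch for producing those inputs from a raw domain list is shown below. It mirrors the documented cleanup (scheme prefix and trailing slashes stripped). Lowercasing and also stripping http:// are this sketch's own additions. It drops duplicates so that one company is not researched, and billed, twice. Dispatching the runs, through Apify schedules or the API, happens outside this sketch. watchlist_inputs is a hypothetical helper.

```python
import re
from typing import Iterable, List


def watchlist_inputs(domains: Iterable[str], portfolio_id: str) -> List[dict]:
    """One run input per unique domain, all tagged with the same portfolioId."""
    seen = set()
    inputs = []
    for raw in domains:
        # Strip an http(s):// prefix and trailing slashes, then lowercase
        # (lowercasing is an assumption: domain names are case-insensitive).
        domain = re.sub(r"^https?://", "", raw.strip(), flags=re.IGNORECASE)
        domain = domain.rstrip("/").lower()
        if domain and domain not in seen:
            seen.add(domain)
            inputs.append({"domain": domain, "portfolioId": portfolio_id})
    return inputs
```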

Decision memory — close the feedback loop:

{
    "domain": "stripe.com",
    "portfolioId": "fintech-watchlist-2026",
    "lastAction": {
        "type": "trigger-outreach",
        "takenAt": "2026-04-15T09:00:00Z",
        "note": "sent intro email to VP Eng"
    }
}

The actor stores lastAction in the portfolio entry. On the next run it compares the current state with the snapshot taken at action time and emits decisionMemory: { outcome, effectivenessScore, pattern, daysSinceAction, inferenceMethod }. Outcome inference is honest: every possible outcome (engaged / escalated / no-response / no-change / resolved / too-soon-to-tell) is derived from observable signal changes only.
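When lastAction is written by your own automation rather than by hand, timestamps should be normalised consistently. The minimal sketch below builds the { type, takenAt, note? } object and converts any timezone-aware datetime to UTC with a trailing Z, matching the example input above. Rejecting naive datetimes is this sketch's choice. make_last_action is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional


def make_last_action(action_type: str, taken_at: datetime,
                     note: Optional[str] = None) -> dict:
    """Build the lastAction object: { type, takenAt (ISO 8601, UTC), note? }."""
    if taken_at.tzinfo is None:
        raise ValueError("taken_at must be timezone-aware")
    payload = {
        "type": action_type,
        "takenAt": taken_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    if note:
        payload["note"] = note  # optional free-text context
    return payload


last_action = make_last_action(
    "trigger-outreach",
    datetime(2026, 4, 15, 9, 0, tzinfo=timezone.utc),
    "sent intro email to VP Eng",
)
```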

Peer comparison: when compareTo is set, the dataset gets an additional peer-comparison record with rank + summary across 8 metrics (technical maturity, security posture, infra complexity, OSS strength, operational maturity, subdomains, GitHub stars, HN mentions). Each peer triggers its own recursive run and bills its own PPE event, so 2 peers means 3 charges of $1.00 in total.
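Charges add up to one event per researched domain plus one per peer, so a batch can be priced before it is launched. The minimal sketch below estimates a batch at the $1.00-per-event rate stated in the parameter table. Your actual rate may differ by Apify plan. estimate_charges is a hypothetical helper.

```python
from typing import Iterable, Tuple

PRICE_PER_EVENT_USD = 1.00  # rate from the parameter table; may vary by plan


def estimate_charges(run_inputs: Iterable[dict]) -> Tuple[int, float]:
    """PPE events and USD for a batch: 1 per domain plus 1 per compareTo peer."""
    events = sum(1 + len(ri.get("compareTo", [])) for ri in run_inputs)
    return events, round(events * PRICE_PER_EVENT_USD, 2)


print(estimate_charges([{"domain": "stripe.com", "compareTo": ["adyen.com", "checkout.com"]}]))  # (3, 3.0)
```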

Named company with GitHub token (avoids 60-req/hr unauthenticated rate limit):

{
    "domain": "openai.com",
    "companyName": "OpenAI",
    "githubToken": "ghp_xxxxxxxxxxxxxxxxxxxx",
    "maxResults": 20
}

Fast scan — website + tech stack + DNS + social only:

{
    "domain": "acme.com",
    "includeFinancials": false,
    "includeResearch": false,
    "includeGithub": false,
    "includeSubdomains": false,
    "includeInfrastructure": false,
    "includeCommunity": false
}

Scheduled monitoring — daily run with diff + events + trends + anomalies:

{
    "domain": "anthropic.com",
    "enableMonitoring": true
}

When you schedule this, the second run onwards returns a diff field (raw changes), an events[] array (typed classification), a trends block (30d / 90d deltas from snapshot history), correlations[] (compound patterns), anomalies[] (statistical outliers — needs 4+ prior runs), and a changeSummary.headline you can paste into a Slack message verbatim.
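Slack incoming webhooks accept a JSON body with a "text" field. The sample record also ships a ready-made slack-block-kit payload in actions[]. If you prefer a plain-text alert that also lists the notable events, a minimal sketch is shown below. It assumes events carry type, severity and evidence as in the sample output. The medium-severity floor and the slack_text_payload name are hypothetical. Sending the result is a single HTTP POST of the returned string to your webhook URL, which is not shown here.

```python
import json
from typing import List

SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2}


def slack_text_payload(record: dict, min_severity: str = "medium") -> str:
    """JSON body for a Slack incoming webhook: headline plus notable events."""
    headline = (record.get("changeSummary") or {}).get("headline")
    lines: List[str] = [headline or f"No change summary for {record.get('domain')}"]
    floor = SEVERITY_ORDER[min_severity]
    for event in record.get("events") or []:
        # Events below the severity floor are left out of the alert.
        if SEVERITY_ORDER.get(event.get("severity"), 0) >= floor:
            lines.append(f"- {event['type']} ({event['severity']}): {event['evidence']}")
    return json.dumps({"text": "\n".join(lines)})
```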

Input Tips

Provide companyName explicitly for companies whose website title is a tagline. This improves accuracy across Wikipedia, SEC, GitHub, and Hacker News.
Use maxResults: 10 for quick overviews, maxResults: 50 for comprehensive reports, maxResults: 200 to pull every subdomain crt.sh has on file.
Set includeFinancials: false for private companies to skip SEC EDGAR (it's US-only) and save 5–10 seconds.
For batch processing 100+ companies, supply a githubToken to avoid the 60-req/hr unauthenticated GitHub limit.

Output

Each run produces one company-report dataset item (plus a peer-comparison record when compareTo is set). Below is a truncated example, with the decision layer at the top and the raw modules underneath:

{
    "recordType": "company-report",
    "entityId": "stripe.com|stripe",
    "domain": "stripe.com",
    "companyName": "Stripe",
    "researchDate": "2026-05-01",
    "tldr": {
        "oneSentence": "Stripe is accelerating (lifecycle: scaling, fintech) — Platform / multi-product expansion underway.",
        "topRisk": null,
        "topOpportunity": "+14 new public repos in the last ~30 days",
        "needsAttention": false
    },
    "trajectory": {
        "direction": "accelerating",
        "velocity": "high",
        "confidence": "high",
        "explanation": "Direction: accelerating (4 growing, 0 declining of 4 measured signals). Velocity: high (+47 subdomains in 30d). Confidence: high (7 historical snapshots).",
        "components": { "subdomainsDelta30d": 47, "repoDelta30d": 5, "starsDelta30d": 412, "hnDelta30d": 23 }
    },
    "predictions": [
        {
            "type": "platform-expansion-likely",
            "confidence": 0.65,
            "timeframe": "ongoing",
            "evidence": ["+47 subdomains in 30d", "3 languages", "10 npm packages"],
            "headline": "Platform / multi-product expansion underway",
            "rationale": "Aggressive subdomain growth combined with multi-language + multi-package distribution points at platform-mode investment — expect new SDK / API / market launches."
        }
    ],
    "graph": {
        "primaryBrandRoot": "stripe",
        "relatedCompanies": [
            { "domain": "paystack.com", "relationship": "acquisition-suspected", "confidence": 0.55, "evidence": ["Subdomain paystack.stripe.com on stripe.com hosts what looks like a separate brand"] },
            { "domain": "bridge.com", "relationship": "acquisition-suspected", "confidence": 0.55, "evidence": ["Subdomain bridge-payments.stripe.com on stripe.com hosts what looks like a separate brand"] }
        ],
        "sharedInfrastructureKey": "dynectnet_googleworkspace_cloudflare",
        "sharedEmailInfraKey": "googleworkspace",
        "sharedTrackingKey": "googleanalytics4_segment",
        "suspectedSubBrands": ["bridge-payments.stripe.com", "paystack.stripe.com", "atlas.stripe.com"]
    },
    "memory": {
        "historyDepth": "47 days across 7 snapshots",
        "snapshotCount": 7,
        "earliestSnapshotAt": "2026-04-01T08:00:00.000Z",
        "milestones": [
            { "eventType": "first-github-presence", "detectedAt": "2026-04-01T08:00:00.000Z", "detail": "186 repos observed" },
            { "eventType": "first-brand-refresh", "detectedAt": "2026-04-22T08:00:00.000Z", "detail": "Homepage title changed for the first time in our history" }
        ],
        "patterns": [
            "Consistent subdomain growth (5 of last 6 transitions positive)",
            "Steady GitHub repo additions (4 of last 6 transitions positive)"
        ]
    },
    "uncertainty": [
        {
            "area": "acquisition-detection",
            "reason": "Detected 3 non-primary-brand subdomain(s) but cannot corroborate against SEC filings (private company).",
            "confidence": 0.55,
            "suggestedFix": "Cross-check Crunchbase / news sources / Wikipedia for announced acquisitions matching: bridge-payments.stripe.com, paystack.stripe.com, atlas.stripe.com"
        }
    ],
    "actions": [
        { "type": "webhook-payload", "target": "Generic HTTP webhook", "rationale": "...", "payload": { "entityId": "stripe.com|stripe", "tldr": "Stripe is accelerating...", "topPriority": { "rank": 1, "type": "PRODUCT_SIGNAL", "severity": "high", "headline": "5 new public GitHub repos: stripe-cli-v2, payments-rs, fraud-rules-dsl…", "action": "Review new repos…" }, "needsAttention": false } },
        { "type": "slack-block-kit", "target": "Slack incoming-webhook", "rationale": "Pre-formatted Slack message", "payload": { "blocks": [{ "type": "header", "text": { "type": "plain_text", "text": "Stripe (stripe.com)" } }, "..."] } }
    ],
    "summary": {
        "headline": "Stripe — private fintech (10 signals)",
        "oneLine": "Stripe — American-Irish financial services company",
        "keyTakeaways": [
            "Looks like a private fintech (SaaS or paid product)",
            "Wikipedia: American-Irish financial services company",
            "Tech stack: Next.js + Cloudflare + Stripe",
            "Engineering: 186 repos, 28,450 stars — TypeScript-led (org since 2011), 14 active in last 30d",
            "Distributes: 64 npm packages, 12 Docker images",
            "Security posture: 92/100 (high) — Google Workspace, 8 strengths, 1 issues",
            "847 subdomains in CT logs (12 api, 4 staging, 23 internal) — high infra complexity, likely large engineering org",
            "Online since 2010 (Wayback Machine first snapshot)",
            "3,742 Hacker News mentions — strong developer community visibility",
            "Operational maturity: high (status page + changelog + security.txt)"
        ],
        "whatToCheck": [
            { "label": "Read Wikipedia summary for context", "url": "https://en.wikipedia.org/wiki/Stripe,_Inc." },
            { "label": "Visit GitHub org (186 repos, 28,450 stars)", "url": "https://github.com/stripe" },
            { "label": "Check status page for outages", "url": "https://status.stripe.com" }
        ],
        "confidence": {
            "score": 0.86,
            "level": "high",
            "explanation": "High confidence — 9/10 sources returned data and 95% of high-value signals (Wikipedia, GitHub org, SEC filings, tech stack, email provider) landed.",
            "dataCoverage": 0.9,
            "signalStrength": 0.85,
            "stability": "stable"
        }
    },
    "intelligence": {
        "companyType": "private",
        "archetype": "fintech",
        "businessModelHints": ["SaaS or paid product", "Charges via Stripe", "API platform", "SDK distribution"],
        "technicalMaturityScore": 0.95,
        "technicalMaturityLevel": "high",
        "openSourceStrength": "high",
        "infraComplexity": "high",
        "operationalMaturity": "high",
        "growthSignals": [
            "+14 new public repos in the last ~30 days",
            "Active engineering — 14 repos pushed in the last 30 days",
            "Careers page online (likely hiring)"
        ],
        "riskSignals": [],
        "notablePatterns": [
            "12 non-primary-brand subdomains (possible acquisition or sub-brand): bridge-payments.stripe.com, paystack.stripe.com…",
            "Modern Vercel/Cloudflare-style stack",
            "Multi-payment-processor (Stripe + PayPal) — likely large transaction volume"
        ]
    },
    "events": [
        {
            "type": "PRODUCT_SIGNAL",
            "severity": "high",
            "evidence": "5 new public GitHub repos: stripe-cli-v2, payments-rs, fraud-rules-dsl…",
            "explanation": "New public repos often indicate a product launch, new SDK/CLI, or open-sourcing of an internal tool."
        },
        {
            "type": "INFRA_EXPANSION",
            "severity": "medium",
            "evidence": "12 new subdomains in Certificate Transparency logs",
            "explanation": "Burst of new subdomains often indicates new services, environments, or geographic expansion."
        }
    ],
    "trends": {
        "sampleCount": 7,
        "earliestSampleAt": "2026-04-01T08:00:00.000Z",
        "subdomains30d": { "delta": 47, "pct": 5.9, "previousValue": 800 },
        "subdomains90d": { "delta": 122, "pct": 16.8, "previousValue": 725 },
        "githubRepos30d": { "delta": 5, "pct": 16.7, "previousValue": 30 },
        "githubStars30d": { "delta": 412, "pct": 1.5, "previousValue": 28038 },
        "hackerNews30d": { "delta": 23, "pct": 0.6, "previousValue": 3719 },
        "secFilings30d": null,
        "infraStability": "stable",
        "changeFrequency": "medium"
    },
    "securityPosture": {
        "score": 0.92,
        "level": "high",
        "issues": ["No Permissions-Policy header"],
        "strengths": [
            "DMARC enforced (reject)",
            "SPF record present",
            "CAA records published (restricts which CAs can issue certs)",
            "HSTS header",
            "Content-Security-Policy header",
            "X-Frame-Options header (clickjacking)",
            "X-Content-Type-Options header",
            "Referrer-Policy header",
            "Published security.txt with disclosure contact"
        ]
    },
    "fingerprint": {
        "techStackHash": "_nextjs_cloudflare__stripe",
        "infraSignature": "cloudflare_googleworkspace_dynect.net",
        "orgSignature": "massiverepos_typescript_osshigh",
        "securityHeadersHash": "contentsecuritypolicy_referrerpolicy_strict-transport-security_xcontent-type-options_xframe-options"
    },
    "gaps": [
        { "module": "financials", "impact": "low", "reason": "No SEC filings — likely a private company or non-US-listed" }
    ],
    "website": {
        "title": "Stripe | Financial Infrastructure for the Internet",
        "description": "Stripe powers online and in-person payment processing...",
        "favicon": "https://stripe.com/favicon.ico",
        "ogImage": "https://stripe.com/img/v3/home/social.png",
        "socialLinks": {
            "twitter": "https://twitter.com/stripe",
            "linkedin": "https://www.linkedin.com/company/stripe",
            "github": "https://github.com/stripe"
        },
        "techStack": {
            "cms": "",
            "framework": "Next.js",
            "analytics": ["Google Analytics 4", "Segment"],
            "cdn": "Cloudflare",
            "ecommerce": "",
            "fonts": ["Google Fonts"],
            "ads": [],
            "paymentProcessors": ["Stripe"],
            "securityHeaders": {
                "strict-transport-security": "max-age=63072000; includeSubDomains; preload",
                "content-security-policy": "...",
                "x-frame-options": "DENY"
            }
        }
    },
    "wikipedia": {
        "found": true,
        "summary": "Stripe, Inc. is an Irish-American multinational financial services...",
        "description": "American-Irish financial services company",
        "thumbnail": "https://upload.wikimedia.org/...",
        "url": "https://en.wikipedia.org/wiki/Stripe,_Inc."
    },
    "github": {
        "found": true,
        "orgProfile": {
            "name": "Stripe",
            "bio": "Financial infrastructure for the internet.",
            "publicRepos": 186,
            "followers": 1523,
            "url": "https://github.com/stripe",
            "createdAt": "2011-04-25T16:13:42Z"
        },
        "topRepositories": [
            {
                "name": "stripe-node",
                "description": "Node.js library for the Stripe API.",
                "stars": 3842,
                "forks": 745,
                "language": "TypeScript",
                "url": "https://github.com/stripe/stripe-node",
                "pushedAt": "2026-04-30T14:22:11Z"
            }
        ],
        "totalStars": 28450,
        "totalForks": 7120,
        "languages": [
            { "language": "TypeScript", "repoCount": 14 },
            { "language": "Ruby", "repoCount": 9 },
            { "language": "Go", "repoCount": 7 }
        ],
        "npmPackages": [
            { "name": "@stripe/stripe-js", "description": "Loading wrapper for Stripe.js", "version": "4.x.x", "url": "https://www.npmjs.com/package/@stripe/stripe-js" }
        ],
        "dockerImages": [],
        "activity": {
            "lastActiveDate": "2026-04-30",
            "activeRepos30d": 14,
            "activeRepos90d": 28,
            "signals": [
                "Multi-language (TypeScript, Ruby, Go)",
                "TypeScript-led",
                "Strong open-source traction (10K+ stars across top repos)",
                "High recent activity (14 repos pushed in last 30 days)",
                "Developer-first (10 npm packages)"
            ]
        }
    },
    "financials": {
        "isPublicCompany": false,
        "ticker": null,
        "cik": null,
        "exchange": null,
        "sicCode": null,
        "sicDescription": null,
        "fiscalYearEnd": null,
        "address": null,
        "formerNames": [],
        "recentFilings": []
    },
    "research": {
        "paperCount": 1247,
        "topPapers": [
            { "title": "The Rise of Embedded Finance...", "doi": "https://doi.org/10.1016/j.jfi.2024.101032", "citationCount": 89, "publicationDate": "2024-06-15", "source": "Journal of Financial Intermediation" }
        ]
    },
    "dns": {
        "aRecords": ["185.166.143.32"],
        "aaaaRecords": ["2a04:8400:0:0:0:0:0:32"],
        "mxRecords": ["1 aspmx.l.google.com", "5 alt1.aspmx.l.google.com"],
        "txtRecords": ["v=spf1 include:_spf.google.com ~all", "v=DMARC1; p=reject; rua=mailto:dmarc@stripe.com"],
        "nameServers": ["ns1.p16.dynect.net"],
        "caaRecords": ["0 issue=letsencrypt.org"],
        "email": {
            "provider": "Google Workspace",
            "spfPresent": true,
            "spfRecord": "v=spf1 include:_spf.google.com ~all",
            "dmarcPolicy": "reject",
            "dmarcRecord": "v=DMARC1; p=reject; rua=mailto:dmarc@stripe.com",
            "dkimSelectors": []
        }
    },
    "subdomains": {
        "found": true,
        "count": 847,
        "recent": [
            { "name": "api.stripe.com", "firstSeen": "2026-04-30" },
            { "name": "dashboard.stripe.com", "firstSeen": "2026-04-29" }
        ],
        "all": ["api.stripe.com", "dashboard.stripe.com", "..."],
        "classification": {
            "api": 12, "internal": 23, "staging": 4, "auth": 6, "email": 3,
            "docs": 5, "cdn": 2, "monitoring": 1, "other": 791
        },
        "notable": ["bridge-payments.stripe.com", "paystack.stripe.com", "atlas.stripe.com"]
    },
    "infrastructure": {
        "firstSeenWayback": "2010-09-14",
        "securityTxt": { "found": true, "url": "https://stripe.com/.well-known/security.txt", "contact": "mailto:security@stripe.com" },
        "aiTxt": { "found": false, "url": "" },
        "llmsTxt": { "found": false, "url": "" },
        "robotsTxt": { "found": true, "url": "https://stripe.com/robots.txt", "sitemapReference": "https://stripe.com/sitemap.xml" },
        "sitemapXml": { "found": true, "url": "https://stripe.com/sitemap.xml" },
        "rssFeed": { "found": true, "url": "https://stripe.com/blog/feed.rss" },
        "statusPage": { "found": true, "url": "https://status.stripe.com" },
        "changelogPage": { "found": true, "url": "https://stripe.com/changelog" },
        "pricingPage": { "found": true, "url": "https://stripe.com/pricing" },
        "careersPage": { "found": true, "url": "https://stripe.com/jobs" }
    },
    "community": {
        "hackerNews": {
            "mentionCount": 3742,
            "topStories": [
                { "title": "Stripe acquires Bridge for payments", "url": "https://stripe.com/...", "points": 1842, "numComments": 612, "createdAt": "2026-04-15T...", "storyUrl": "https://news.ycombinator.com/item?id=..." }
            ]
        }
    },
    "socialMedia": [
        { "platform": "Twitter/X", "url": "https://twitter.com/stripe", "found": true },
        { "platform": "LinkedIn", "url": "https://www.linkedin.com/company/stripe", "found": true }
    ],
    "diff": null
}

Output fields

Top-level discriminators:

Field	Type	Description
`recordType`	String	`'company-report'` for a successful research record, `'error'` for an error record
`domain`	String	The company domain that was researched
`companyName`	String	Detected or provided company name
`researchDate`	String	ISO date of the research (YYYY-MM-DD)

summary fields (hero block — read this first):

Field	Type	Description
`headline`	String	One-line title (`"<companyName> — <role> (<N> signals)"`)
`oneLine`	String	Short shareable answer (Slack subject, dashboard tile)
`keyTakeaways[]`	Array of strings	Up to 8 scannable bullets synthesized from the modules below
`whatToCheck[]`	Array of `{label, url}`	Up to 4 ranked next-step links
`confidence.score`	Number	0..1 — fraction of attempted sources that returned data
`confidence.level`	String	`'high'` (≥0.7), `'medium'` (≥0.4), or `'low'`
`confidence.explanation`	String	Plain-English reason — usable verbatim in reports
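The confidence fields follow a simple rule: the score is the fraction of attempted sources that returned data, bucketed at 0.7 and 0.4. The sketch below shows that arithmetic. It reuses the `sourcesWithData` / `sourcesAttempted` names from the SUMMARY record. Rounding to two decimals is an assumption; the thresholds come from the table above.

```python
def confidence(sources_with_data: int, sources_attempted: int) -> dict:
    """Score = fraction of attempted sources that returned data; level uses the documented cut-offs."""
    if sources_attempted <= 0:
        return {"score": 0.0, "level": "low"}
    score = round(sources_with_data / sources_attempted, 2)  # two-decimal rounding is an assumption
    if score >= 0.7:
        level = "high"
    elif score >= 0.4:
        level = "medium"
    else:
        level = "low"
    return {"score": score, "level": level}


print(confidence(9, 10))  # {'score': 0.9, 'level': 'high'}
```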

website.techStack fields (in-actor signature detection):

Field	Type	Description
`cms`	String	Detected CMS (WordPress, Shopify, Webflow, Wix, Squarespace, Ghost, Drupal, Joomla, HubSpot CMS, Contentful, Sanity)
`framework`	String	Detected framework (Next.js, Nuxt, Gatsby, Remix, Astro, SvelteKit, React, Vue, Angular, Hugo, Jekyll, Eleventy)
`analytics[]`	Array	Analytics tools (GA4, GTM, Universal Analytics, Segment, Mixpanel, Amplitude, Heap, PostHog, Plausible, Fathom, Hotjar, FullStory, Matomo)
`cdn`	String	CDN (Cloudflare, Fastly, Akamai, CloudFront, Vercel, Netlify, GitHub Pages, Cloudflare Pages, Bunny CDN, KeyCDN)
`ecommerce`	String	E-commerce platform (Shopify, WooCommerce, BigCommerce, Magento, PrestaShop, Snipcart)
`paymentProcessors[]`	Array	Payment processors (Stripe, PayPal, Square, Adyen, Braintree, Klarna)
`ads[]`	Array	Ad pixels (Google Ads, Meta Pixel, LinkedIn Insight, Twitter Pixel, TikTok Pixel, Reddit Pixel)
`fonts[]`	Array	Font services (Google Fonts, Adobe Fonts, Monotype)
`securityHeaders`	Object	HSTS, CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy
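Detection works by matching signatures against the homepage HTML and response headers. The sketch below uses a handful of illustrative signatures to show the idea. These regexes, and the `cf-ray` / `server` header check for Cloudflare, are assumptions chosen for the example. They are not the actor's actual ~60-signature table.

```python
import re

# Illustrative signatures only (assumed), not the actor's real table.
HTML_SIGNATURES = {
    "framework": [("Next.js", r"__NEXT_DATA__|/_next/static/"), ("Nuxt", r"__NUXT__|/_nuxt/")],
    "analytics": [("Google Analytics 4", r"gtag/js\?id=G-"), ("Segment", r"cdn\.segment\.com")],
    "paymentProcessors": [("Stripe", r"js\.stripe\.com")],
    "fonts": [("Google Fonts", r"fonts\.googleapis\.com")],
}
SECURITY_HEADERS = [
    "strict-transport-security", "content-security-policy", "x-frame-options",
    "x-content-type-options", "referrer-policy", "permissions-policy",
]


def detect_stack(html: str, headers: dict) -> dict:
    """Match HTML/header signatures and return a techStack-shaped dict (sketch)."""
    h = {k.lower(): v for k, v in headers.items()}  # HTTP header names are case-insensitive
    stack = {"framework": "", "analytics": [], "paymentProcessors": [], "fonts": [], "cdn": ""}
    for category, sigs in HTML_SIGNATURES.items():
        hits = [name for name, pattern in sigs if re.search(pattern, html)]
        if category == "framework":
            stack[category] = hits[0] if hits else ""  # single-valued field
        else:
            stack[category] = hits  # list-valued fields
    if "cf-ray" in h or h.get("server", "").lower() == "cloudflare":
        stack["cdn"] = "Cloudflare"
    stack["securityHeaders"] = {name: h[name] for name in SECURITY_HEADERS if name in h}
    return stack
```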

github fields:

Field	Type	Description
`found`	Boolean	Whether a GitHub org or repos were found
`orgProfile.name`	String	GitHub organization display name
`orgProfile.bio`	String	Organization description
`orgProfile.publicRepos`	Integer	Number of public repositories
`orgProfile.followers`	Integer	Number of GitHub followers
`orgProfile.createdAt`	String	ISO date the org was created (proxy for company age)
`topRepositories[]`	Array	Top repos by stars: `name`, `description`, `stars`, `forks`, `language`, `url`, `pushedAt`
`totalStars` / `totalForks`	Integer	Sum across returned repos
`languages[]`	Array	Language breakdown: `{language, repoCount}` ranked by repo count
`npmPackages[]`	Array	npm packages under the same scope (e.g. `@stripe/*`)
`dockerImages[]`	Array	Docker Hub images under the same org
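`totalStars`, `totalForks`, and `languages[]` are aggregates over the returned repositories. The sketch below shows that roll-up. It assumes each repo dict carries the `stars`, `forks`, and `language` keys shown in the example output. The alphabetical tie-break for languages is an assumption.

```python
from collections import Counter


def summarize_repos(repos: list[dict], top_n: int = 5) -> dict:
    """Aggregate per-repo data into the github totals and language breakdown (sketch)."""
    total_stars = sum(r.get("stars", 0) for r in repos)
    total_forks = sum(r.get("forks", 0) for r in repos)
    counts = Counter(r["language"] for r in repos if r.get("language"))
    # Rank by repo count (descending), then by name so the order is deterministic
    languages = [
        {"language": lang, "repoCount": n}
        for lang, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    ]
    top = sorted(repos, key=lambda r: r.get("stars", 0), reverse=True)[:top_n]
    return {"totalStars": total_stars, "totalForks": total_forks,
            "languages": languages, "topRepositories": top}
```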

financials fields:

Field	Type	Description
`isPublicCompany`	Boolean	Whether SEC filings were found
`ticker`	String / null	Stock ticker (e.g., `"AAPL"`)
`cik`	String / null	SEC Central Index Key
`exchange`	String / null	Stock exchange (NYSE, NASDAQ, etc.) — pulled from `data.sec.gov/submissions/CIK*.json`
`sicCode` / `sicDescription`	String / null	Standard Industrial Classification
`fiscalYearEnd`	String / null	MMDD format
`address`	Object / null	Business address (street, city, state, zip)
`formerNames[]`	Array	Past company names from SEC filings
`recentFilings[]`	Array	Recent SEC filings: `formType`, `filedDate`, `description`, `url`, `accessionNumber`
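The enrichment step reads `data.sec.gov/submissions/CIK{padded}.json`, where the CIK is zero-padded to 10 digits. The sketch below maps that JSON onto a subset of the fields above. It assumes the public submissions layout (`tickers`, `exchanges`, `sic`, `sicDescription`, `formerNames`, and `filings.recent` stored as parallel arrays) and the standard EDGAR archive URL pattern. It is not the actor's code, and the fetch is omitted. SEC asks automated clients to send a descriptive User-Agent when calling these endpoints.

```python
def submissions_url(cik) -> str:
    """Build the EDGAR submissions URL; the CIK is left-padded with zeros to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"


def to_financials(sub: dict, max_filings: int = 10) -> dict:
    """Map a data.sec.gov submissions document onto the financials fields (sketch, subset only)."""
    cik = int(sub["cik"])
    recent = sub.get("filings", {}).get("recent", {})
    docs = recent.get("primaryDocument", [])
    filings = []
    # filings.recent stores each attribute as a parallel array indexed by filing
    for i, acc in enumerate(recent.get("accessionNumber", [])[:max_filings]):
        doc = docs[i] if i < len(docs) else ""
        filings.append({
            "formType": recent["form"][i],
            "filedDate": recent["filingDate"][i],
            "accessionNumber": acc,
            # Archive path uses the unpadded CIK and the accession number without dashes
            "url": f"https://www.sec.gov/Archives/edgar/data/{cik}/{acc.replace('-', '')}/{doc}",
        })
    tickers = sub.get("tickers") or []
    exchanges = sub.get("exchanges") or []
    return {
        "isPublicCompany": bool(filings),
        "ticker": tickers[0] if tickers else None,
        "cik": str(cik),
        "exchange": exchanges[0] if exchanges else None,
        "sicCode": sub.get("sic") or None,
        "sicDescription": sub.get("sicDescription") or None,
        "fiscalYearEnd": sub.get("fiscalYearEnd") or None,  # MMDD string
        "formerNames": [f["name"] for f in sub.get("formerNames", [])],
        "recentFilings": filings,
    }
```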

dns + dns.email fields:

Field	Type	Description
`aRecords` / `aaaaRecords`	String[]	IPv4 / IPv6 addresses
`mxRecords` / `txtRecords` / `nameServers` / `caaRecords`	String[]	Other DNS records
`email.provider`	String	Classified email provider (Google Workspace, Microsoft 365, Zoho, Proton, Fastmail, Mailgun, SendGrid, Postmark, Amazon SES, Yandex, Migadu, Cloudflare, self-hosted)
`email.spfPresent` / `email.spfRecord`	Boolean / String	SPF detection
`email.dmarcPolicy` / `email.dmarcRecord`	String	DMARC policy (`none` / `quarantine` / `reject`)
`email.dkimSelectors[]`	Array	DKIM selectors found in TXT records
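The email block is derived from plain DNS answers:

- SPF is the TXT record that starts with `v=spf1`.
- DMARC is the `p=` tag of the TXT record at `_dmarc.{domain}`.
- The provider is inferred from MX hostnames.

The sketch below shows that parsing. The MX-suffix table is an illustrative assumption covering only a few of the providers listed above, and so is the `"unknown"` fallback label.

```python
# Illustrative MX suffix -> provider map (assumption; the actor's table covers more providers).
MX_PROVIDERS = {
    "google.com": "Google Workspace",
    "googlemail.com": "Google Workspace",
    "outlook.com": "Microsoft 365",
    "zoho.com": "Zoho",
    "protonmail.ch": "Proton",
}


def parse_email(mx_records: list[str], txt_records: list[str], dmarc_txt: list[str]) -> dict:
    """Derive provider / SPF / DMARC fields from raw DNS answers (sketch)."""
    spf = next((t for t in txt_records if t.lower().startswith("v=spf1")), None)
    dmarc = next((t for t in dmarc_txt if t.upper().startswith("V=DMARC1")), None)
    policy = None
    if dmarc:
        # "v=DMARC1; p=reject; rua=..." -> {"v": "DMARC1", "p": "reject", "rua": "..."}
        tags = dict(part.strip().split("=", 1) for part in dmarc.split(";") if "=" in part)
        policy = tags.get("p", "").strip().lower() or None
    provider = "unknown" if mx_records else None  # fallback label is an assumption
    for mx in mx_records:
        host = mx.split()[-1].rstrip(".").lower()  # "1 aspmx.l.google.com" -> hostname
        match = next((p for suffix, p in MX_PROVIDERS.items()
                      if host == suffix or host.endswith("." + suffix)), None)
        if match:
            provider = match
            break
    return {"provider": provider, "spfPresent": spf is not None, "spfRecord": spf or "",
            "dmarcPolicy": policy, "dmarcRecord": dmarc or ""}
```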

subdomains fields:

Field	Type	Description
`count`	Integer	Total unique subdomains in Certificate Transparency logs
`recent[]`	Array	Up to 20 most recently issued: `{name, firstSeen}`
`all[]`	Array	All subdomains, capped at `min(maxResults, 200)`
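Certificate Transparency enumeration comes down to querying crt.sh for `%.{domain}` with `output=json`. The results are then de-duplicated on the `name_value` field, which can hold several newline-separated names. The parser below works on already-fetched rows (the HTTP call is omitted). Several rules are assumptions rather than the actor's exact logic:

- first-seen dates are taken from `not_before`;
- wildcard entries are folded into their base name;
- subdomains are bucketed by keyword.

```python
# Keyword buckets are illustrative assumptions, not the actor's classifier.
BUCKETS = {
    "api": {"api"},
    "staging": {"staging", "stage", "dev", "test"},
    "auth": {"auth", "login", "sso"},
    "docs": {"docs", "developer"},
}


def parse_crtsh(entries: list[dict], domain: str, recent_n: int = 20, max_results: int = 20) -> dict:
    """Turn crt.sh JSON rows into a subdomains-shaped block (sketch)."""
    first_seen: dict[str, str] = {}
    for row in entries:
        day = row.get("not_before", "")[:10]  # certificate validity start, YYYY-MM-DD
        # name_value may hold several certificate names separated by newlines
        for raw in row.get("name_value", "").split("\n"):
            name = raw.strip().lower().removeprefix("*.")  # fold wildcard entries into their base name
            if not name.endswith("." + domain):
                continue  # skip the apex and unrelated names
            if name not in first_seen or day < first_seen[name]:
                first_seen[name] = day

    classification = {bucket: 0 for bucket in [*BUCKETS, "other"]}
    for name in first_seen:
        labels = set(name[: -len(domain) - 1].split("."))  # "api.eu.stripe.com" -> {"api", "eu"}
        bucket = next((b for b, words in BUCKETS.items() if labels & words), "other")
        classification[bucket] += 1

    recent = sorted(first_seen.items(), key=lambda kv: kv[1], reverse=True)[:recent_n]
    return {
        "found": bool(first_seen),
        "count": len(first_seen),
        "recent": [{"name": n, "firstSeen": d} for n, d in recent],
        "all": sorted(first_seen)[: min(max_results, 200)],  # documented cap
        "classification": classification,
    }
```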

infrastructure fields: see the example above for full shape — every well-known file probe returns {found, url, ...}.

community.hackerNews fields:

Field	Type	Description
`mentionCount`	Integer	Total Hacker News story count for the company name
`topStories[]`	Array	Top stories: `title`, `url`, `points`, `numComments`, `createdAt`, `storyUrl`

diff fields (only on second-and-later runs of the same domain):

Field	Type	Description
`since`	String	ISO timestamp of the previous snapshot
`sinceRunId`	String / null	Apify run ID of the previous run
`newSecFilings[]`	Array	SEC filings present this run, absent in the previous snapshot (matched by `accessionNumber`)
`newGithubRepos[]`	String[]	Repo names new since last run
`newTxtRecords[]` / `removedTxtRecords[]`	String[]	TXT verification tokens added/removed (Google, Microsoft, Slack, Okta, ad networks…)
`newSubdomains[]`	String[]	Subdomains issued in Certificate Transparency since last run
`nameServersChanged`	Boolean	Whether NS records changed
`nameServersOld` / `nameServersNew`	String[]	Old vs new NS list (only populated when changed)
`mxRecordsChanged`	Boolean	Whether MX records changed
`homepageTitleChanged` / `homepageDescriptionChanged`	Boolean	Whether homepage copy changed
`newPatents` / `newHackerNewsStories`	Integer	Count delta since last run
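Most diff fields are either set differences or equality checks between two snapshots. The simplified sketch below assumes each snapshot stores the module outputs shown in the example record and covers only a subset of the fields. `diff_snapshots` and `_get` are hypothetical helper names.

```python
def _get(snapshot: dict, *path, default=None):
    """Follow nested keys, returning default when any level is missing."""
    node = snapshot
    for key in path:
        if not isinstance(node, dict) or node.get(key) is None:
            return default
        node = node[key]
    return node


def diff_snapshots(prev: dict, curr: dict) -> dict:
    """Compute a subset of the diff fields from two report snapshots (sketch)."""
    old_ns = sorted(_get(prev, "dns", "nameServers", default=[]))
    new_ns = sorted(_get(curr, "dns", "nameServers", default=[]))
    old_txt = set(_get(prev, "dns", "txtRecords", default=[]))
    new_txt = set(_get(curr, "dns", "txtRecords", default=[]))
    old_acc = {f["accessionNumber"] for f in _get(prev, "financials", "recentFilings", default=[])}
    old_repos = {r["name"] for r in _get(prev, "github", "topRepositories", default=[])}
    ns_changed = old_ns != new_ns
    return {
        # SEC filings are matched by accessionNumber, as documented above
        "newSecFilings": [f for f in _get(curr, "financials", "recentFilings", default=[])
                          if f["accessionNumber"] not in old_acc],
        "newGithubRepos": [r["name"] for r in _get(curr, "github", "topRepositories", default=[])
                           if r["name"] not in old_repos],
        "newSubdomains": sorted(set(_get(curr, "subdomains", "all", default=[]))
                                - set(_get(prev, "subdomains", "all", default=[]))),
        "newTxtRecords": sorted(new_txt - old_txt),
        "removedTxtRecords": sorted(old_txt - new_txt),
        "nameServersChanged": ns_changed,
        "nameServersOld": old_ns if ns_changed else [],
        "nameServersNew": new_ns if ns_changed else [],
        "homepageTitleChanged": _get(prev, "website", "title") != _get(curr, "website", "title"),
    }
```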

Decision-grade features

This actor goes well beyond "give me the data." The features below turn the output into a decision system:

Priorities — the decision queue

Every report contains a priorities[] array with the top 5 ranked decisions the data implies. Each priority has a recommendedAction written as a concrete next step ("Read filing X", "Investigate the new subdomains", "Update internal records of this domain's infra"). Most users only need to read priorities[0]. The list is built deterministically from the events, correlations, anomalies, lifecycle, and securityPosture blocks.

"priorities": [
  {
    "rank": 1,
    "type": "correlation:product-launch",
    "severity": "high",
    "headline": "product launch pattern detected (confidence 85%)",
    "reason": "5 new public GitHub repos + 12 new subdomains in Certificate Transparency logs",
    "whyItMatters": "New public repos co-occurring with a subdomain burst is a strong product-launch signal — repo for the SDK, subdomain for the service.",
    "recommendedAction": "Track the launch — read the new repos AND visit the new subdomains for the live product.",
    "evidence": ["5 new public GitHub repos: stripe-cli-v2, payments-rs, fraud-rules-dsl…", "12 new subdomains in Certificate Transparency logs"],
    "timeToImpact": "days"
  }
]
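Because the queue is already ranked, most consumers only need its first entry. The sketch below reads it defensively. It assumes the report dict has the shape above, and that `priorities` may be empty or missing on a quiet run (an assumption, not documented behaviour). `top_action` is a hypothetical helper name.

```python
def top_action(report: dict) -> str:
    """Return a one-line next step from priorities[0], or a fallback when the queue is empty."""
    priorities = report.get("priorities") or []
    if not priorities:
        return "No priorities this run: nothing to act on."
    top = priorities[0]  # rank 1 comes first in the ranked queue
    return f"[{top['severity'].upper()}] {top['headline']} -> {top['recommendedAction']}"
```

Applied to the example above, this prints the product-launch headline followed by "Track the launch…".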

Triggers — automation-ready booleans

The triggers object precomputes 11 booleans for downstream routing. Filter with WHERE triggers.X = true instead of parsing prose. Drop-in for Zapier / Make / Slack / agent tool calls.

"triggers": {
  "highSeverityEvents": true,
  "possibleAcquisition": false,
  "productSignals": true,
  "infraMigration": false,
  "emailInfraChange": false,
  "brandRefresh": false,
  "communityTraction": true,
  "securityRiskHigh": false,
  "rapidGrowth": true,
  "dormancy": false,
  "needsHumanReview": true
}
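Every trigger is a plain boolean, so routing reduces to a lookup table. The sketch below maps triggers to destinations. The channel and queue names are hypothetical placeholders for whatever your Zapier, Make, or Slack setup uses.

```python
# Hypothetical destinations; replace with your own channels, webhooks, or queues.
ROUTES = {
    "possibleAcquisition": "#corp-dev",
    "securityRiskHigh": "#security",
    "infraMigration": "#competitive-intel",
    "productSignals": "#competitive-intel",
    "needsHumanReview": "analyst-queue",
}


def route(report: dict) -> list[str]:
    """Return the unique destinations whose trigger is true, in ROUTES order."""
    triggers = report.get("triggers", {})
    destinations: list[str] = []
    for name, dest in ROUTES.items():
        if triggers.get(name) is True and dest not in destinations:
            destinations.append(dest)
    return destinations
```

With the example triggers above, only `productSignals` and `needsHumanReview` fire, so the record goes to `#competitive-intel` and `analyst-queue`.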

Output profiles — same data, different consumers

Pick outputProfile:

analyst (default) — Full intelligence record. ~10–50KB per record.
executive — Decision layer + thin module pointers. Strips verbose subtrees (full subdomain lists, full HN stories, full repo metadata) — keeps everything you need for a Slack alert or dashboard tile. ~3–10KB per record.
raw — Modules only, no decision layer. Backward-compatible mode for users who want pure data and will compute their own intelligence on top.

Note: the SUMMARY KV record always contains the full headline summary regardless of profile.

Portfolio mode — cross-company prioritisation

Pass portfolioId: "my-watchlist-name" and the actor maintains a per-user named key-value store of every company you've researched under that label. Each subsequent run emits relative intelligence on top of the absolute intelligence:

"portfolioContext": {
  "portfolioId": "fintech-watchlist-2026",
  "portfolioSize": 47,
  "rank": "3/47",
  "rankBasis": "maximum alert score across events / correlations / anomalies",
  "percentile": 0.96,
  "outlier": true,
  "rarity": "Uncommon — top 4% of portfolio",
  "reason": "Stands out: top 4% of the portfolio by alert intensity; flagged for human review; top priority is high-severity (correlation:product-launch).",
  "portfolioMedians": { "technicalMaturityScore": 0.62, "securityPostureScore": 0.55, "subdomainCount": 18, "githubStars": 240 }
}

"feed": {
  "portfolioId": "fintech-watchlist-2026",
  "rollingAlerts": [
    { "detectedAt": "2026-04-30T12:14:00Z", "domain": "stripe.com", "eventType": "PRODUCT_SIGNAL", "severity": "high", "headline": "5 new public GitHub repos", "alertScore": 0.9 },
    { "detectedAt": "2026-04-29T08:00:00Z", "domain": "checkout.com", "eventType": "INFRA_MIGRATION", "severity": "high", "headline": "Name servers changed", "alertScore": 0.8 }
  ],
  "topMovers": [
    { "domain": "ramp.com", "rationale": "Alert intensity rose from 0.30 → 0.85 (now: PRODUCT_SIGNAL)" }
  ],
  "newEntrants": [
    { "domain": "klarna.com", "addedAt": "2026-04-29T..." }
  ]
}
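The `rank` and `percentile` fields give a company's relative position within the portfolio. The sketch below shows one plausible way to derive them from per-company alert scores. It assumes that percentile means the share of other members scoring strictly lower, and that outliers are flagged at 0.9. Both choices happen to reproduce the `3/47` → `0.96` example, but they are not the actor's documented formula.

```python
def portfolio_position(scores: dict[str, float], domain: str, outlier_at: float = 0.9) -> dict:
    """Rank one domain against the rest of the portfolio by alert score (sketch)."""
    ours = scores[domain]
    others = [s for d, s in scores.items() if d != domain]
    higher = sum(1 for s in others if s > ours)  # members ranked above us
    lower = sum(1 for s in others if s < ours)   # members we beat
    percentile = round(lower / len(others), 2) if others else 1.0
    return {
        "portfolioSize": len(scores),
        "rank": f"{higher + 1}/{len(scores)}",
        "percentile": percentile,
        "outlier": percentile >= outlier_at,  # cut-off is an assumption
    }
```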

This is the difference between "give me a report on Stripe" and "tell me which of my 100 watchlist companies matter most today." The portfolio is the platform.

Comparability mode — benchmark against peers

Pass compareTo: ["domain1.com", "domain2.com"] (max 3) to benchmark. Each peer triggers a separate recursive run and bills its own PPE event ($1.00 per peer). The result is an additional peer-comparison record in the dataset with 8 ranked metrics:

{
  "recordType": "peer-comparison",
  "entityId": "stripe.com|stripe",
  "domain": "stripe.com",
  "comparison": {
    "domain": "stripe.com",
    "peers": ["adyen.com", "checkout.com"],
    "peerErrors": [],
    "metrics": {
      "technicalMaturityScore": { "ours": 0.95, "peers": [{"domain": "adyen.com", "value": 0.85}, {"domain": "checkout.com", "value": 0.78}], "rank": "1/3", "summary": "Highest technical maturity of the 3 compared" },
      "securityPostureScore": { "ours": 0.92, "peers": [...], "rank": "2/3", "summary": "Higher security posture than 1/2 peers" },
      "infraComplexity": { "ours": "high", "peers": [...], "rank": "1/3", "summary": "infra complexity: matches all peers (high)" }
    },
    "headline": "Stripe stands out: Highest technical maturity of the 3 compared.",
    "distinctSignals": ["Highest technical maturity of the 3 compared", "Higher security posture than 1/2 peers"]
  }
}

Use this for dashboards, RFP-prep, or competitive deep-dives. The recursive runs are bounded — peers don't recurse further (their compareTo is forced empty).
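Each metric's `rank` is a position among the researched domain and its peers. For numeric metrics it can be computed as shown below. Two assumptions apply: higher values are treated as better, and ties are ignored. Categorical metrics such as `infraComplexity` would need their own ordering.

```python
def rank_metric(ours: float, peers: list[dict]) -> dict:
    """Rank our value among peers (higher is better, by assumption)."""
    values = [p["value"] for p in peers if p.get("value") is not None]
    better = sum(1 for v in values if v > ours)  # peers ahead of us
    beaten = sum(1 for v in values if v < ours)  # peers we are ahead of
    return {"rank": f"{better + 1}/{len(values) + 1}", "beats": f"{beaten}/{len(values)}"}
```

For the example, `rank_metric(0.95, [{"domain": "adyen.com", "value": 0.85}, {"domain": "checkout.com", "value": 0.78}])` gives rank `1/3`. A value beating one of two peers gives `2/3` and `1/2`, which matches the securityPostureScore summary above.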

Monitoring mode (scheduled runs)

Schedule this actor on a domain you care about (daily, weekly, or monthly) with enableMonitoring set to true, and every run after the first returns a populated diff field. The first run for a domain saves a snapshot to the actor's key-value store. Each later run loads the previous snapshot, computes the differences, emits them under diff, and saves a fresh snapshot.

This is the difference between a one-shot company-research tool and a competitive-intelligence monitoring product. Examples of what diff surfaces:

newSecFilings — new 10-K, 10-Q, or 8-K filed since last run (M&A, earnings, material events)
newGithubRepos — new public repo published (product launches, new SDK)
newSubdomains — new subdomain in Certificate Transparency logs (acquisitions, new internal tools, staging environments standing up)
newTxtRecords — new verification token (Slack workspace, Google site verification, Okta, ad network)
nameServersChanged / mxRecordsChanged — infra migration, M&A signal, email provider change
homepageTitleChanged / homepageDescriptionChanged — rebrand, pivot, messaging shift

The status message at the end of a monitoring run looks like this: "Done. 9 sources returned data: … | Δ since last run: 1 new filing, 2 new repos, 14 new subdomains | PPE charge: $1.00."
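The snapshot cycle can be modelled with any key-value store. The sketch below makes several simplifying assumptions:

- an in-memory dict stands in for the actor's key-value store;
- `SNAPSHOT-{domain}` is a hypothetical key name;
- the record is reduced to a plain `subdomains` list.

The point is the lifecycle: the first run returns `diff` as `None`, and later runs return a populated diff.

```python
import datetime


def research_with_monitoring(domain: str, current: dict, store: dict) -> dict:
    """Load the previous snapshot, diff against it, then save the current one (sketch)."""
    key = f"SNAPSHOT-{domain}"  # hypothetical key name
    previous = store.get(key)
    diff = None  # first run for a domain: nothing to compare against
    if previous is not None:
        diff = {
            "since": previous["savedAt"],
            "newSubdomains": sorted(set(current["subdomains"]) - set(previous["data"]["subdomains"])),
        }
    store[key] = {
        "savedAt": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "data": current,
    }
    return {"domain": domain, **current, "diff": diff}
```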

Use Cases

Sales & BD preparing company briefs before outbound — identify tech stack, e-commerce platform, payment processor, filings status, and social channels to personalize outreach
Competitive intelligence — pull website + GitHub + SEC + Wayback + status-page + changelog + subdomain growth into one report; schedule it weekly to track competitor cadence
VC & PE researchers — assess public-market presence, open-source footprint, npm + Docker distribution, and academic citations on prospective investments; schedule on portfolio companies for change alerts
Journalists & investigators — Wikipedia summary, SEC filings, DNS records, social presence, and Hacker News mentions in seconds — usable directly in stories
M&A due diligence — preliminary technical + public-records checks on acquisition targets, including subdomain enumeration for asset inventory
Marketing strategists — audit a brand's digital footprint across social, tech stack, and operational-transparency surfaces (status page, changelog, security.txt)
Security & SRE teams — Certificate Transparency subdomain enumeration + DMARC posture + security-headers + security.txt presence, all in one pass; schedule for cert-rotation and infra-change alerts
DevRel & developer marketers — track Hacker News momentum, GitHub stars, and changelog updates over time on competitor and partner products

How to Use the API

You can call this actor programmatically from any language.

Python

import requests
import time

TOKEN = "YOUR_APIFY_TOKEN"
API = "https://api.apify.com/v2"

# Start the run asynchronously; the response contains the run object
resp = requests.post(
    f"{API}/acts/ryanclinton~company-deep-research/runs",
    params={"token": TOKEN},
    json={
        "domain": "stripe.com",
        "includeFinancials": True,
        "includeResearch": True,
        "includeGithub": True,
        "enableMonitoring": True,
        "maxResults": 20
    },
    timeout=30,
)
resp.raise_for_status()
run_id = resp.json()["data"]["id"]

# Poll until the run reaches a terminal status (TIMED-OUT is terminal too)
TERMINAL = ("SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT")
while True:
    status = requests.get(
        f"{API}/actor-runs/{run_id}",
        params={"token": TOKEN},
        timeout=10,
    ).json()
    if status["data"]["status"] in TERMINAL:
        break
    time.sleep(5)

if status["data"]["status"] != "SUCCEEDED":
    raise RuntimeError(f"Run ended with status {status['data']['status']}")

# The default dataset holds the report record for this run
dataset_id = status["data"]["defaultDatasetId"]
items = requests.get(
    f"{API}/datasets/{dataset_id}/items",
    params={"token": TOKEN},
    timeout=30,
).json()

report = items[0]
if report.get("recordType") == "error":
    raise RuntimeError(f"Research failed for {report.get('domain')}")

print(report["summary"]["headline"])
for line in report["summary"]["keyTakeaways"]:
    print(f"  - {line}")
print(f"\nConfidence: {report['summary']['confidence']['explanation']}")

JavaScript

const response = await fetch(
    "https://api.apify.com/v2/acts/ryanclinton~company-deep-research/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN",
    {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
            domain: "stripe.com",
            enableMonitoring: true,
            maxResults: 20,
        }),
    }
);

if (!response.ok) {
    throw new Error(`Actor run failed: HTTP ${response.status}`);
}

const [report] = await response.json();
console.log(report.summary.headline);
report.summary.keyTakeaways.forEach((line) => console.log(`  - ${line}`));
console.log(`Confidence: ${report.summary.confidence.explanation}`);

cURL

curl -X POST "https://api.apify.com/v2/acts/ryanclinton~company-deep-research/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "stripe.com",
    "enableMonitoring": true,
    "maxResults": 20
  }'

Reading the SUMMARY KV record

For orchestrators using Actor.call, the run's key-value store also contains a lightweight SUMMARY record so you don't need to paginate the dataset:

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("ryanclinton/company-deep-research").call(run_input={"domain": "stripe.com"})
summary = client.key_value_store(run["defaultKeyValueStoreId"]).get_record("SUMMARY")["value"]
print(summary["confidence"]["level"], summary["sourcesWithData"], "of", summary["sourcesAttempted"])

How It Works

Input (domain, optional companyName, module toggles)
  │
  ▼
Phase A — Website (sequential, gives us companyName + raw HTML for downstream parsing)
  │   Fetch HTML + response headers, extract title / og:* / favicon / social links,
  │   detect tech stack from in-actor signatures (CMS / framework / analytics / CDN /
  │   e-commerce / payment processors / ad pixels / fonts / security headers)
  │
  ▼
Phase B — All independent modules in parallel (Promise.all)
  ├── Wikipedia           — direct page summary then search fallback
  ├── GitHub              — try 3 org-name guesses, top repos by stars,
  │                         language breakdown, npm scope packages, Docker Hub org
  ├── SEC EDGAR           — EFTS full-text search + atom company search +
  │                         data.sec.gov submissions enrichment (exchange, SIC, address)
  ├── OpenAlex            — citation-sorted academic papers
  ├── DNS                 — A/AAAA/MX/TXT/NS/CAA + _dmarc lookup +
  │                         SPF/DMARC/DKIM parse + email-provider classification
  ├── Subdomains          — Certificate Transparency log enumeration via crt.sh
  ├── Infrastructure      — Wayback first-seen + 10 well-known-file probes
  │                         (security.txt, ai.txt, llms.txt, robots, sitemap, RSS,
  │                          status, changelog, pricing, careers)
  ├── Community           — Hacker News mention count + top stories (Algolia HN API)
  └── Social Media        — 6 platforms, prefers website-discovered links over slug guesses
  │
  ▼
Phase C — Compile report, build summary hero block (deterministic synthesis)
  │
  ▼
Phase D — On enableMonitoring: load prior snapshot, compute diff, save current snapshot
  │
  ▼
Push to dataset (one record), save lightweight SUMMARY to KV store, charge PPE if data found
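Phase B's fan-out pattern is easy to mirror outside Node.js. The actor uses `Promise.all`; the Python sketch below uses `asyncio.gather` with `return_exceptions=True`, so a failing module becomes `None` instead of aborting the others. The module functions are hypothetical stubs. Counting a source as "with data" when it returns a non-empty result is an assumption. Total wall time is roughly that of the slowest stub, not the sum of all of them.

```python
import asyncio


async def run_modules(modules: dict) -> dict:
    """Run independent module coroutines concurrently; a failure becomes None instead of aborting."""
    names = list(modules)
    results = await asyncio.gather(*(modules[n]() for n in names), return_exceptions=True)
    out = {name: (None if isinstance(res, Exception) else res) for name, res in zip(names, results)}
    with_data = sum(1 for v in out.values() if v)
    return {"modules": out, "sourcesAttempted": len(names), "sourcesWithData": with_data}


# Hypothetical stubs standing in for the real API calls
async def wikipedia():
    await asyncio.sleep(0.01)
    return {"found": True}


async def sec():
    raise TimeoutError("EDGAR timed out")


async def github():
    await asyncio.sleep(0.02)
    return {"found": True}


report = asyncio.run(run_modules({"wikipedia": wikipedia, "sec": sec, "github": github}))
print(report["sourcesWithData"], "of", report["sourcesAttempted"])  # 2 of 3
```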

Data sources

Step	Source	API Used	Auth Required
1	Company website	Direct HTTPS fetch + HTML parsing + in-actor tech-stack signatures	No
2	Wikipedia	REST API (`/api/rest_v1/page/summary`) + search API	No
3	GitHub	REST API (`/orgs/{name}`, `/orgs/{name}/repos`) + search fallback	Optional token (60 → 5,000 req/hr)
3a	npm	`registry.npmjs.com/-/v1/search?text=scope:{org}`	No
3b	Docker Hub	`hub.docker.com/v2/repositories/{org}/`	No
4	SEC EDGAR	EFTS search + browse-edgar atom + `data.sec.gov/submissions/CIK{padded}.json`	No
5	OpenAlex	REST API (`/works?search=`)	No
6	DNS	Node.js `dns.promises` (resolve4/6, resolveMx/Txt/Ns/Caa) + `_dmarc.{domain}` lookup	No
7	Social Media	HTTP GET to profile URLs	No
8	Subdomains	Certificate Transparency logs via `crt.sh` JSON API	No
9	Infrastructure	Direct fetches to `/.well-known/security.txt`, `/ai.txt`, `/llms.txt`, `/robots.txt`, `/sitemap.xml`, `status.{domain}`, `/changelog`, etc. + Internet Archive CDX	No
10	Hacker News	Algolia HN Search API (`hn.algolia.com/api/v1/search`)	No

How much does it cost?

This actor is priced at $1 per company researched under Pay-Per-Event (PPE). You are only charged when at least one source returns data; runs against parked, unreachable, or invalid domains are not billed.

The actor uses 512 MB of memory (default) and completes in 15–45 seconds for most domains. Phase B runs all 9 independent modules in parallel, so its duration is set by the slowest single module rather than the sum of all nine.

Plan	Monthly Cost	Included PPE budget	Approx companies researched
Free	$0	$5 (built-in)	~5
Personal	$49/month	$49 included	~49
Team	$499/month	$499 included	~499

Apify platform compute (memory-seconds) is billed separately by Apify and is typically a few cents per run.
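The sketch below gives a quick PPE estimate for a batch. It assumes $1 per billed company and $1 per `compareTo` peer, as described above. It also assumes that only runs where at least one source returned data are billed. Platform compute is excluded. The `hasData` / `peers` input keys are hypothetical.

```python
PRICE_PER_COMPANY = 1.00  # USD per researched company (PPE)


def estimate_ppe(runs: list[dict]) -> float:
    """runs: [{"hasData": bool, "peers": int}] -> estimated PPE charge in USD (compute excluded)."""
    total = 0.0
    for run in runs:
        if run.get("hasData"):
            total += PRICE_PER_COMPANY  # billed only when a source returned data
        total += PRICE_PER_COMPANY * run.get("peers", 0)  # assumption: every peer run bills $1
    return round(total, 2)
```

For example, one company with two peers, one parked domain, and one plain company come to an estimated $4.00.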

Tips

Provide companyName explicitly for companies whose website title is a tagline. This dramatically improves accuracy across Wikipedia, SEC, GitHub, and Hacker News.
Schedule runs on a domain you care about, with enableMonitoring set to true, to unlock the diff field: from the second run onwards it returns what changed since last time. This converts the actor from one-shot research into a competitive-intel monitoring product.
Disable unused modules to cut run time. If you only need website + tech stack + DNS + social, turn off SEC, research, GitHub, subdomains, infrastructure, and community.
Use a GitHub token when researching multiple companies in a batch. Without one, GitHub allows 60 unauthenticated requests/hour. A free personal access token raises this to 5,000/hour.
Combine with other actors — feed the SEC CIK number into the SEC EDGAR Filing Analyzer, or pass the domain into Website Tech Stack Detector for a deeper Wappalyzer-grade fingerprint.
Batch process company lists by calling this actor via the Apify API. Each run is independent, so you can research hundreds of companies concurrently, within your Apify plan's memory and concurrency limits.
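The sketch below is a bounded-concurrency batch runner built on the standard library. `run_one` is a hypothetical callable you supply, for example a wrapper around `ApifyClient("YOUR_APIFY_TOKEN").actor("ryanclinton/company-deep-research").call(run_input={"domain": d})`. The default worker cap is an assumption; size it to your plan's limits.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def research_batch(domains, run_one, max_workers: int = 5) -> dict:
    """Run run_one(domain) with at most max_workers in flight; collect results and errors separately."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # dict.fromkeys de-duplicates domains while keeping their order
        futures = {pool.submit(run_one, d): d for d in dict.fromkeys(domains)}
        for fut in as_completed(futures):
            domain = futures[fut]
            try:
                results[domain] = fut.result()
            except Exception as exc:  # one failed run should not stop the batch
                errors[domain] = str(exc)
    return {"results": results, "errors": errors}
```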

Limitations

Company name detection depends on website title — sites with tagline-only titles (e.g., "Build the Future") will produce poor search results across Wikipedia, SEC, GitHub, and Hacker News unless you provide companyName manually.
SEC EDGAR is US-only — the financials module only finds companies that file with the US Securities and Exchange Commission.
GitHub org matching is heuristic — the actor tries 3 name guesses (domain base, lowercased company name, dashed company name) plus a search fallback. Companies with GitHub org names that differ significantly may not be found.
npm + Docker Hub probes assume the org name matches the GitHub org — works well for stripe, airbnb, vercel; doesn't work when the company uses a different naming convention on each platform.
Tech stack detection covers ~60 high-value signatures — Cloudflare, Next.js, Stripe, Shopify, GA4, Segment, etc. It is not a full Wappalyzer replacement (Wappalyzer covers 3,000+ signatures). For deep technographics, combine with the Website Tech Stack Detector.
Subdomain enumeration depends on crt.sh — only finds subdomains that have ever been issued a public TLS certificate. Internal-only subdomains, subdomains using wildcard certs, and subdomains using private CAs are not visible.
Wikipedia search may match wrong entity — common company names (e.g., "Apple") may match the article for a different entity. Providing the full company name helps.
Hacker News mentions are name-based — common company names will produce false positives in the mention count. Top stories are usually correct.
USPTO patents are off by default — the USPTO PatentsView API was retired in August 2024 and the replacement requires an API key. To keep this actor truly key-free, patents are returned as {found: false, count: 0} rather than gating behind a key requirement.
diff is null on the first run for a domain — the actor saves a snapshot at the end of the first run, and comparison starts with the second run.

What this actor does NOT do

To set expectations honestly:

It does NOT replace Clearbit, ZoomInfo, Apollo, or PitchBook. Those are licensed commercial enrichment APIs with employee counts, revenue ranges, technographic confidence scores, intent signals, and verified contact data. This actor uses official + open public sources only.
It does NOT find verified personal email addresses. For email enrichment use Hunter.io, Apollo.io, or the Person Enrichment Lookup actor (which wraps People Data Labs).
It does NOT scrape LinkedIn employee profiles or LinkedIn employee counts. LinkedIn actively blocks scraping; for LinkedIn data, use a dedicated LinkedIn scraper actor.
It does NOT pull website traffic data. SimilarWeb / Semrush / Ahrefs are paid for a reason — public surfaces don't expose this.
It does NOT compute a comprehensive tech-stack fingerprint. It uses ~60 high-value signatures over the homepage HTML; for full Wappalyzer-grade analysis (3,000+ signatures, login-walled SPA support, JS execution) use the Website Tech Stack Detector actor.
It does NOT classify the company by NAICS / SIC beyond what the SEC publishes. Public companies get the SEC SIC code; private companies do not.
It does NOT score the company for sales fit or lead quality. Its built-in scores and priorities describe public signals (security posture, technical maturity, change events); apply your own fit scoring on top.

Combine with other Apify actors

Actor	How to combine
Lead Enrichment Pipeline	Company research is built into step 4 of this pipeline — use the pipeline for full lead enrichment with email, phone, and scoring
Person Enrichment Lookup	Adds person-level data (email, verified contact details) for individuals at the company; pair it with this actor's company-level intel
Website Contact Scraper	Scrape contacts first, then run this actor on the domains to enrich with company intel
Website Tech Stack Detector	Deep Wappalyzer-grade tech-stack fingerprinting (3,000+ signatures, JS execution) beyond this actor's ~60 signatures
