
Company Deep Research — SEC, GitHub, DNS & Social


Generate comprehensive company research reports from 7+ sources: SEC filings, stock data, Wikipedia, GitHub, Trustpilot reviews, DNS records, and social media verification. One domain in, full intelligence report out.

Pricing

from $1,000.00 / 1,000 companies researched

Rating

0.0 (0)

Developer

Ryan Clinton

Maintained by Community

Actor stats

Bookmarked: 2
Total users: 16
Monthly active users: 9
Last modified: 4 days ago


Company Deep Research Agent

Most tools give you company data. This one tells you exactly what to do next.

This is what a company intelligence system should do.

What is a company intelligence system?

A company intelligence system tells you what to do next based on company data — not just the data itself. It produces one canonical decision per run (priorities[0]), gates execution through a single mode field (decisionPosture.mode), and exposes supporting evidence below the decision surface so that downstream automation, human reviewers, and AI agents can all branch on the same output without parsing prose.

Most company intelligence tools show you data; this one gives you the next action and tells you whether to act. It is a decision engine, not a data dump. Drop-in for sales pipelines, competitive monitoring, M&A workflows, security teams, VC due diligence, and AI agents.

A real-time company intelligence platform with prediction, change detection, and portfolio-level prioritisation. 14+ public sources → one structured record with classification, scores, typed events, multi-run trends, deterministic predictions, ready-to-send action payloads, and (when you opt-in via portfolioId) cross-company rank, percentile normalisation, rolling alert feed, and cluster detection.

This system tells you exactly what to do about a company by producing a single recommended next action (priorities[0]) instead of just data: one decision, one execution mode, and supporting evidence per run. Use it when you need to automate decisions from company intelligence rather than merely collect signals. The same decision-automation pattern carries over to any system that needs to act on signals.

Compared to traditional tools

Unlike traditional tools that give you company data, this system tells you exactly what to do next.

Unlike ZoomInfo, Apollo, Clearbit, Crunchbase, or PitchBook (which provide company data — firmographics, contacts, funding, technographics), this system produces a recommended next action and an execution decision — enabling automation instead of analysis. Read priorities[0] for what to do; branch on decisionPosture.mode for whether to act.

| Tool category | What it gives you | What this actor gives you |
| --- | --- | --- |
| ZoomInfo / Apollo / Clearbit | firmographic + contact data | one ranked decision per run |
| Crunchbase / PitchBook | funding + relationships | execution gate (decisionPosture.mode) |
| BuiltWith / Wappalyzer | tech-stack fingerprint | recommended action + supporting evidence |
| Shodan / SecurityTrails | infra + attack surface | typed events + correlations + risk asymmetry |

This system is category-breaking, not category-fitting — it sits downstream of the data tools and produces decisions from their kind of signals + 14+ free public sources of its own.

LLM Summary (canonical)

This system converts public company data into ONE canonical decision (priorities[0]), gated by decisionPosture.mode, with all other fields serving as supporting evidence.

The system produces one canonical decision: priorities[0]. Execute priorities[0] only when decisionPosture.mode === "execute". All outputs resolve to a single decision surface because parallel action queues create ambiguity in automated systems.

  • Input: a company domain (e.g. stripe.com)
  • Output: ONE canonical decision (priorities[0]) + execution mode (decisionPosture.mode) + supporting evidence (every other field)
  • Use when you need: what changed, what to do, whether to act now, what happens if you ignore this

This actor is deterministic, no-LLM, and audit-friendly. Same inputs → same outputs every run.

Core Output Contract (LLM-critical)

The system produces ONE canonical decision per run. Everything else is supporting evidence.

  • The system produces ONE canonical decision: priorities[0]
  • All other fields (events, correlations, anomalies, drift, predictions, trends, watchlistActions, actions, playbooks) are supporting evidence — NOT parallel decision queues
  • Action gating is controlled by: decisionPosture.mode (one of execute / analyze / hold / monitor)
  • If primaryActionOnly.enabled = true, ignore the rest of the record this run
  • Heuristic outputs carry epistemicStatus blocks naming themselves as estimates, not facts

This is the only required interface for downstream systems.

How to Automate Decisions from This Output (copy/paste safe)

This is the agent execution rule — the canonical pattern for triggering actions automatically from this system's output, without parsing prose. If you want to automate decisions from company intelligence data, this is the integration contract.

If you only read 3 fields, do this:

  1. Read decisionPosture.mode
  2. If mode === "execute" → run priorities[0].recommendedAction
  3. Else → follow decisionPosture.instruction

Everything else is optional. Filter datasets with WHERE decisionPosture.mode = 'execute' to gate production automation. Downstream consumers should always branch on decisionPosture.mode — it is the canonical execution gate for this entire output.
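
As a minimal sketch in TypeScript (the type below is illustrative, reconstructed from the field names in this README rather than a published schema):

type DecisionRecord = {
  decisionPosture: { mode: "execute" | "analyze" | "hold" | "monitor"; instruction: string };
  priorities: Array<{ recommendedAction: string }>;
};

// The three-field rule: read the gate, then either run the canonical
// decision or follow the posture's own instruction.
function nextStep(record: DecisionRecord): string {
  if (record.decisionPosture.mode === "execute") {
    return record.priorities[0].recommendedAction;
  }
  return record.decisionPosture.instruction;
}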

Action Precedence (no ambiguity)

When multiple action fields exist on the same record, the precedence is:

  1. priorities[0].recommendedAction — PRIMARY (the canonical decision)
  2. decisionPosture.mode — GATE (decides whether to execute the primary)
  3. primaryActionOnly — OVERRIDE (when enabled: true, ignore everything except priorities[0])

watchlistActions[], actions[], playbooks[], and nextActors[] are SECONDARY — they exist for downstream integrations (CRM enrichment, Slack alerts, follow-up actor chaining), not for the immediate decision.
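
A hedged sketch of that precedence in TypeScript (field shapes assumed from this README; watchlistActions stands in for all secondary queues):

type ActorRecord = {
  priorities: Array<{ recommendedAction: string }>;
  decisionPosture: { mode: "execute" | "analyze" | "hold" | "monitor"; instruction: string };
  primaryActionOnly: { enabled: boolean };
  watchlistActions?: Array<{ type: string; label: string }>;
};

function plan(r: ActorRecord) {
  const gateOpen = r.decisionPosture.mode === "execute";      // 2. GATE
  const primary = r.priorities[0]?.recommendedAction ?? null; // 1. PRIMARY
  return {
    action: gateOpen ? primary : null,
    // 3. OVERRIDE: when primaryActionOnly fires, skip every secondary
    // queue this run; they are for integrations, not the immediate decision.
    secondary: gateOpen && !r.primaryActionOnly.enabled ? (r.watchlistActions ?? []) : [],
    note: gateOpen ? null : r.decisionPosture.instruction,
  };
}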

Execution Model (critical)

decisionPosture.mode is the single field that determines whether any action should be taken.

The system does NOT assume action — it enforces a gated execution model where all decisions must pass through decisionPosture.mode. Production automation should branch on decisionPosture.mode as the single source of truth for whether to act.

When signals conflict, the system sets decisionPosture.mode = "hold" and blocks action until contradictions are resolved.

All outputs are filtered through decisionPosture.mode:

  • execute — act now. The system has converged on a high-conviction recommendation; analysis time has passed.
  • analyze — investigate further. Priority exists but conditions don't strongly favour either immediate execution or hold.
  • hold — do nothing. Multiple contradictions / low signal independence / misaligned time horizons / 3+ uncertainty areas detected. Resolve flagged uncertainties first.
  • monitor — no action required. Continue scheduled monitoring at standard cadence.

This is the final control layer. Production automation should ONLY act when mode === 'execute'.
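
Routing each mode to a handler might look like this (a sketch; the handler names are placeholders, not part of the actor):

declare function runAction(action: string): void;         // your executor
declare function openReviewTicket(record: unknown): void; // your triage queue

function route(record: {
  decisionPosture: { mode: string };
  priorities: Array<{ recommendedAction: string }>;
  uncertainty?: Array<{ area: string; suggestedFix: string }>;
}): void {
  switch (record.decisionPosture.mode) {
    case "execute": // act now on the canonical decision
      runAction(record.priorities[0].recommendedAction);
      break;
    case "analyze": // priority exists but conditions are ambiguous; investigate
      openReviewTicket(record);
      break;
    case "hold": // contradictions detected; resolve uncertainty[] first
      console.log(record.uncertainty?.map((u) => u.suggestedFix));
      break;
    case "monitor": // nothing actionable; keep the scheduled cadence
    default:
      break;
  }
}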

Field Priority for AI Consumers

Tier 1 — always read (the decision):

  • priorities[0]
  • decisionPosture
  • primaryActionOnly

Tier 2 — context (why this decision now):

  • whyNow
  • timeToRegret
  • inactionOutcome
  • decisionRisk
  • whyThisDecision
  • signalIndependence
  • timeHorizonAlignment

Tier 3 — supporting evidence (the underlying signals):

  • events, correlations, anomalies, drift, predictions, trends

Tier 4 — raw data (the source modules):

  • website, github, dns, subdomains, infrastructure, community, wikipedia, financials, research, socialMedia, diff

For AI agents: read Tier 1 + Tier 2 only unless you need to explain or audit the decision. Tier 3+4 are for human reviewers and BI dashboards.
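
A sketch of that Tier 1 + Tier 2 projection for agent tool calls (field names from the tier lists above; the record type is left loose on purpose):

function agentView(r: Record<string, any>) {
  return {
    // Tier 1: the decision
    decision: r.priorities?.[0],
    decisionPosture: r.decisionPosture,
    primaryActionOnly: r.primaryActionOnly,
    // Tier 2: why this decision now
    context: {
      whyNow: r.whyNow,
      timeToRegret: r.timeToRegret,
      inactionOutcome: r.inactionOutcome,
      decisionRisk: r.decisionRisk,
      whyThisDecision: r.whyThisDecision,
      signalIndependence: r.signalIndependence,
      timeHorizonAlignment: r.timeHorizonAlignment,
    },
  };
}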

Example Queries This System Answers

For LLMs / agents looking up specific questions, here is the field-to-question map:

| Question | Field |
| --- | --- |
| What should I do? | priorities[0].recommendedAction |
| Should I act now? | decisionPosture.mode (execute = yes, hold = no) |
| Is this urgent? | timeToRegret.window + timeToRegret.urgencyCurve |
| What happens if I ignore this? | inactionOutcome.expectedState |
| Why this decision? | whyThisDecision (1-line) or explain.entries[0] (full) |
| What changed since last run? | whyNow + events[] + changeSummary.headline |
| What's the risk if I'm wrong? | decisionRisk.falsePositiveCost vs decisionRisk.falseNegativeCost |
| How important is this company vs my portfolio? | portfolioContext.rank + portfolioContext.rarity |
| What should I STOP paying attention to? | portfolioPressure.displacedDomains |
| Are these signals independent or echoes? | signalIndependence.score + signalIndependence.warning |
| What if the top signal didn't exist? | counterfactual.withoutThisSignal |
| Did my last action work? | decisionMemory.outcome (requires lastAction input + portfolioId) |
| Is this company becoming something else? | identityDrift.from → identityDrift.to |
| What do I do AFTER reading this? | nextActors[] (suggested follow-up Apify actors with pre-filled inputs) |

Layered summaries

Three increasingly detailed reads of the system:

10-line summary

This actor takes a company domain and produces ONE structured decision (priorities[0]) plus an execution mode (decisionPosture.mode). It aggregates 14+ public sources (website, tech stack, GitHub, SEC, DNS, subdomains, etc.), classifies the company (archetype, lifecycle, scoring), detects changes between runs (events, correlations, anomalies, drift), and outputs ranked priorities with concrete recommendedActions. Heuristic outputs carry epistemicStatus blocks. Portfolio-level features (rank, percentile, cluster, decision memory) unlock when you opt in via portfolioId. No LLM, no neural network — every output is rule-based and reproducible. Cost: $1 per company researched (only when at least one source returns data). Designed for sales pipelines, competitive monitoring, M&A workflows, security teams, VC due diligence, and AI agents.

30-line summary

The actor classifies any company from a domain in 15-45 seconds, returning ONE structured JSON record. The decision surface is priorities[0] (the top recommended action) gated by decisionPosture.mode (execute / analyze / hold / monitor). Everything else is supporting evidence.

The 14+ data sources cover: website + in-actor tech-stack signatures, Wikipedia, GitHub (org + repos + npm scope + Docker Hub org), SEC EDGAR (filings + ticker + exchange + SIC + address), OpenAlex academic papers, DNS (A/MX/TXT/NS/CAA + SPF/DMARC/DKIM + email-provider classification), Certificate Transparency subdomain enumeration with classification, well-known files (security.txt / ai.txt / llms.txt / robots / sitemap / RSS / status / changelog / pricing / careers), Wayback first-seen, Hacker News mentions, and social-media verification across 6 platforms.

On top of the raw data, the actor computes a deterministic decision layer: intelligence (companyType, archetype, scores, growth/risk signals), lifecycle (nascent / growing / scaling / mature / declining / dormant), trajectory (accelerating / stable / declining + velocity), predictions[] (forward-looking rule-based forecasts), priorities[] (ranked decision queue), decisionRisk (FP cost vs FN cost + reversibility + actEvenIfUnsure), timeToRegret (when does NOT acting become a mistake), inactionOutcome (what happens if you do nothing), counterfactual (what would the priority be without the top signal), signalIndependence (3 signals or 1 echoed 3 times?), decisionPosture (execute/analyze/hold/monitor mode), and 35+ other fields.

Schedule the actor on a domain to unlock change-detection: events[] (typed classification of changes), trends (30d/90d deltas), anomalies[] (z-score outliers from snapshot history), correlations[] (compound patterns: product-launch / acquisition / wind-down / etc.), drift[] (pattern-change detection), decisionMemory (outcome inference for prior actions you took).

When you opt in via portfolioId, the actor maintains a per-user named KV store of every company researched under that label, then computes cross-company features: portfolioContext (rank/percentile/outlier/rarity), feed (rolling alerts + top movers + new entrants), normalized (percentile scores vs portfolio), cluster (similar companies), portfolioPressure (attention share + displacement vs other entries), identityDrift (is this company becoming something else?), coldStart (bootstrap guidance for portfolios with <4 entries).

The system encodes a deterministic decision philosophy (bias toward action when FN cost > FP cost + reversible; cap actions at top 1-3; prefer correlated signals over isolated anomalies; prefer recent signals over historical trends; surface uncertainty over hiding it; honest abstention over fabrication) and exposes it explicitly in the README + via explain.principles[]. Heuristic outputs carry epistemicStatus blocks naming them as estimates. No LLM. No neural network. No external state across users — Apify per-user named-store sandboxing keeps portfolios isolated.

Full spec

Continue reading for the full field reference, input schema, examples, and use-case framings.

The 10-second read pyramid

The output has 40+ top-level fields. Most users read four:

  1. instant.label — 1-3 word state ("High Growth", "M&A Active", "Wind Down", "Stable", "Launching", "Reorganising", "Declining", "Dormant"). The 1-second read.
  2. tldr.oneSentence — paste-ready Slack subject. The 5-second read.
  3. whyNow — trigger + change + importance. Why this run matters. The 10-second read. (Returns null when nothing notable triggered.)
  4. priorities[0] — the top recommended action with recommendedAction (concrete next step) + evidence + timeToImpact. This is THE canonical decision surface — the events / correlations / anomalies / drift / predictions arrays below are supporting evidence, not parallel decision queues.

If you have one minute, also read priorities[1..4], watchlistActions[], and deltaStory.narrative. Everything else is for downstream consumers, dashboards, audits, and AI agents.
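
A small sketch composing those reads into a one-line digest (field names per the pyramid above; null-handling for whyNow as documented):

function digestLine(r: {
  instant?: { label: string };
  tldr?: { oneSentence: string };
  whyNow?: { trigger: string; change: string } | null;
}): string {
  const label = r.instant?.label ?? "Unknown";
  const sentence = r.tldr?.oneSentence ?? "";
  // whyNow is null when nothing notable triggered, so append it conditionally.
  const why = r.whyNow ? ` (${r.whyNow.trigger}: ${r.whyNow.change})` : "";
  return `[${label}] ${sentence}${why}`;
}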

Generate a comprehensive intelligence report on any company from just a domain name. The Company Deep Research Agent aggregates 14+ free public sources — homepage + tech stack signatures, Wikipedia, GitHub (including npm and Docker Hub footprint), SEC EDGAR filings (with ticker, exchange, SIC code, and address), OpenAlex academic papers, DNS infrastructure (with parsed SPF/DMARC and email-provider classification), Certificate Transparency subdomain enumeration with classification, well-known files (security.txt, ai.txt, llms.txt, robots, sitemap, RSS, status, changelog, pricing, careers), Wayback Machine first-seen, Hacker News mentions, and social media verification across 6 platforms — and compiles everything into a single structured JSON record.

It then layers deterministic intelligence on top — companyType + archetype classification, a 0..1 technical-maturity score, a 0..1 security-posture score with per-control issues + strengths, business-model hints, growth signals, risk signals, notable patterns, a competitive-comparison fingerprint, partial-fail explanations, and a stable cross-system entity ID — so the output is decision-grade, not a JSON dump.

Schedule it on a domain and every run after the first returns a typed events[] array classifying changes (CORPORATE_UPDATE / PRODUCT_SIGNAL / INFRA_EXPANSION / INFRA_MIGRATION / EMAIL_INFRA_CHANGE / BRAND_REFRESH / POSSIBLE_ACQUISITION / COMMUNITY_TRACTION) with severity + evidence + plain-English explanation, plus a trends block computing 30d / 90d deltas across subdomains, GitHub repos / stars, SEC filings, and Hacker News mentions from the last 10 snapshots.

No API keys required for the core 14+ sources. Just enter a domain like stripe.com and get back a structured intelligence record in 15–45 seconds.

Why Use Company Deep Research Agent?

Manual company research means visiting two dozen websites, copying data into a spreadsheet, and hoping you didn't miss anything. Buying enterprise tools (Clearbit, ZoomInfo, Apollo, PitchBook) means licensing fees per seat per month for data that, for the firmographic + technographic + infra dimensions, is already public.

Most "company research" actors stop at "data dump." This one goes further — it classifies the company (archetype + companyType + business model), scores it (technical maturity + security posture + open-source strength + infra complexity + operational maturity), synthesizes the signals into a deterministic summary.keyTakeaways[] block, classifies changes between scheduled runs into typed events (PRODUCT_SIGNAL, INFRA_EXPANSION, POSSIBLE_ACQUISITION, CORPORATE_UPDATE, …) with severity + plain-English explanation, and tracks trends across the last 10 snapshots so you see "+47 subdomains in 30d" instead of just a current count.

The result: a JSON record that's safe to drop straight into a Slack alert, an LLM agent's tool call, a sales pipeline, or a competitive-intelligence dashboard — without post-processing.

A pay-per-event price means you only pay when at least one source returns data. Parked domains, unreachable hosts, and invalid inputs are not billed.

What's in the report

Every run returns one record with these top-level fields. The first six are the decision layer — read them first; the rest is the underlying raw data.

Decision layer (computed)

Surface tier — read these first

  • instant — The 1-second read. label (1-3 words: "High Growth", "Wind Down", "Stable", "M&A Active", "Launching", "Reorganising", "Declining", "Dormant", "Unknown") + confidence + state enum + semantic color (green/yellow/red/blue/grey — UI maps to icons; emoji is opt-in elsewhere, not emitted by default). For dashboards, mobile, Slack tiles.
  • tldr — The 5-second read. oneSentence (paste-ready Slack subject), topRisk, topOpportunity, needsAttention boolean.
  • whyNow — The 10-second read. trigger (what fired) + change (directional shift) + importance (relative-to-portfolio framing) + severity. Returns null when no notable trigger fired this run — better than emitting noise. Use this for daily-digest subject lines.
  • story — The single canonical narrative. Collapses tldr + whyNow + deltaStory + changeSummary into ONE coherent block: now / trend / decision / outlook (1 line each) + a stitched 2-3 sentence narrative. Use when you need a single field that summarises the whole run — for AI agent tool calls, exec emails, daily digests.
  • priorities[] — THE canonical decision surface. Top 5 ranked decisions, deterministic. Each has rank (1 = top), type, severity, headline, reason, whyItMatters, recommendedAction (concrete next step), evidence[], timeToImpact (immediate / days / weeks / months). Built from events + correlations + anomalies + lifecycle + securityPosture, weighted by severity and signal type. The events / correlations / anomalies / drift / predictions arrays below this list are supporting evidence — priorities is the routable surface. This system prioritises company signals by converting them into a ranked decision list (priorities[]) instead of leaving them as raw events.
  • deltaStory — Temporal compression: last7d / last30d (7-30d) / last90d (30-90d) narratives + a stitched paragraph + coverage enum. Read this for "what's been happening" without scrolling raw history.
  • watchlistActions[] — CRM-workflow-ready actions: move-to-active-pipeline, schedule-weekly-monitoring, schedule-daily-monitoring, trigger-outreach, pause-outreach, open-due-diligence-ticket, add-to-deal-pipeline, remove-from-active-list, flag-for-security-review, subscribe-to-status-page, archive-as-dormant. Each has type + label + rationale + confidence. Bridges into sales / monitoring / due-diligence pipelines without bespoke routing logic.
  • decisionMemory — Closes the feedback loop. When you pass lastAction: { type, takenAt } input, the actor stores it in the portfolio (requires portfolioId), then on subsequent runs compares the current state vs the snapshot at action time and infers outcome (engaged / escalated / no-response / no-change / resolved / too-soon-to-tell) + effectivenessScore + pattern. Honest disclosure: outcome is inferred from observable signal changes only — the actor cannot directly observe replies, deals, or off-platform engagement.
  • decisionRisk — Per priorities[0]: falsePositiveCost + falseNegativeCost + reversibility + blastRadius + asymmetry (symmetric / fp-dominated / fn-dominated) + actEvenIfUnsure boolean (true when fn-dominated + reversible + low-blast → bias to action). Lets users answer "should I act EVEN IF confidence is low?"
  • timeToRegret — When does NOT acting become a mistake? Per priorities[0]: window (e.g. "24-48h" / "7-14 days") + urgencyCurve (very-steep / steep / moderate / gradual / flat) + deadlineHint (approximate ISO date) + plain-English reason + epistemicStatus (this is heuristic, not a known event). Encodes regret-avoidance — most decisions are made on fear of missing the window, not severity.
  • inactionOutcome — Loss-framing complement to timeToRegret. What happens if you do NOTHING? expectedState + confidence + timeframe + reason + epistemicStatus. Humans decide on regret AND loss; this completes the pair.
  • signalIndependence — Score (0..1) showing whether the events / correlations / anomalies / drift are truly independent or echoes of one underlying change. Catches the "looks like 3 corroborating signals but really 1 underlying delta" trap. Includes signalCount, distinctSourceCount, interpretation, and a warning that fires when score is low.
  • primaryActionOnly — Schema-level "permission to not scroll" flag. Fires only when conditions are unambiguous (single high-severity priority + steep urgency curve + bias-to-act risk profile). When enabled: true, the instruction field tells you to do priorities[0] only and ignore the rest of the dataset record this run.
  • decisionPosture — The psychological switch from analysis-mode to execution-mode. mode enum: execute (4+/5 conditions met — bias-to-act risk + urgency + signal independence + horizon alignment + primaryActionOnly), hold (multiple contradictions / low independence / misaligned horizons / 3+ uncertainty areas), analyze (priority exists but conditions don't strongly favour either), monitor (no actionable priority). Carries reason + instruction + confidence.
  • priorityComputation — Weight transparency at runtime. dominantFactors[] (which signals contributed and at what weight) + suppressedFactors[] (which were weighted down and why) + weightStackVersion (stable identifier — bumps on rule changes) + explanation. When users disagree with priority ranking, this is the audit trail.
  • timeHorizonAlignment — Catches "this is urgent AND accelerating" misreads when reality is "short-lived spike inside long-term stability". status (aligned / misaligned / partial / insufficient-history) + shortTerm + longTerm + reason + interpretation.
  • actionGuard — Tells users what NOT to do. recommendedMaxActions (typically 1-3) + totalActionsAvailable + suppressedActions + reason + rationale. The system that tells users to stop is the system they trust.
  • identityDrift — Is this company becoming something else? Compares current archetype + lifecycle vs the previous portfolio entry; emits from + to + confidence + signals[] + strategic implication. Tracks transformation, not just activity. Requires portfolioId + a previous portfolio entry on the same domain.
  • whyThisDecision — 1-line human-readable rationale for priorities[0]. Compressed explain.entries[0] for execs / non-engineers / Slack. Mentions whether the priority is correlation-driven, anomaly-driven, or single-event-driven; whether the counterfactual confirms causality; and whether decision-risk asymmetry suggests biasing to / away from action.
  • counterfactual — Removes the signal driving priorities[0] and recomputes the top priority + trajectory + instant.label. Output: droppedSignal + withoutThisSignal + plain-English interpretation. Isolates which signal is load-bearing — sanity-check that the recommended action is causally tied to the right evidence, not coincidence.
  • portfolioPressure — Only when portfolioId set + 4+ entries. Answers "what should I STOP paying attention to?" — the inverse of the standard attention-add framing. relativeUrgency (highest-this-week / top-tier / middle / low) + attentionShare (0..1 of total portfolio alert intensity) + displacement + displacedDomains[] + recommendedFocusShift boolean.
  • predictions[] — Forward-looking deterministic predictions: product-launch-likely / acquisition-imminent / infra-migration-likely / funding-event-likely / rebrand-likely / security-audit-likely / wind-down-likely / platform-expansion-likely. Each carries confidence (0..1), timeframe, evidence[], headline, rationale. Pure rules over events + anomalies + correlations + trends — no LLM.
  • trajectory — Direction (accelerating / steady-growth / stable / decelerating / declining) + velocity (high / medium / low / none) + confidence + plain-English explanation + component deltas. Requires 2+ snapshots.
  • changeSummary — One-sentence narrative of what changed since last run: headline + direction + confidence + keyEvents[]. Paste-ready.
  • triggers — Precomputed booleans for downstream automation: highSeverityEvents, possibleAcquisition, productSignals, infraMigration, emailInfraChange, brandRefresh, communityTraction, securityRiskHigh, rapidGrowth, dormancy, needsHumanReview. Filter with WHERE triggers.X = true instead of parsing prose.
  • actions[] — Ready-to-send action payloads for downstream automation: webhook-payload (generic JSON), crm-enrichment-hubspot (HubSpot Company properties), slack-block-kit (pre-formatted Slack message), jira-issue (high-severity-only), email-digest (subject + body), csv-row (flat one-row representation). Drop-in for integrators.
  • nextActors[] — Suggested follow-up Apify actors with pre-filled inputs: SEC EDGAR Filing Analyzer (when CIK detected), Website Tech Stack Detector (when in-actor detection is incomplete), Person Enrichment Lookup (when sales / careers / B2B SaaS signals present), Lead Enrichment Pipeline, WHOIS Domain Lookup. Turns this actor into the brain of an Apify pipeline.
  • playbooks[] — Declarative IF-THEN strategy rules that fire on this run: expansion-phase-engagement, wind-down-de-prioritise, m-and-a-imminent, infra-overhaul-watch, security-soft-target, product-launch-watch, funding-round-watch, rebrand-or-pivot-watch. Each carries triggered conditions + implication + concrete recommendedStrategy + suggestedCadence.
  • portfolioContext — Only when input portfolioId is set. The cross-company importance signal. rank (e.g. "3/120"), percentile, outlier boolean, plain-English reason, portfolioMedians. Each user's portfolios are isolated by Apify per-user named-store sandboxing.
  • feed — Only when portfolioId is set. Cross-run aggregation across the user's portfolio: rollingAlerts[] (last 30, capped at 14 days), topMovers[], newEntrants[]. Designed as a daily intelligence feed.
  • normalized — Only when portfolioId is set with 4+ entries. Percentile rank vs portfolio for each scored metric. Solves "is 186 repos a lot?"
  • cluster — Only when portfolioId is set with 3+ entries. Membership in a cluster of portfolio companies sharing the same fingerprint / infra signature: id, basis, sizeInPortfolio, similarCompanies[], position (leader/middle/lagger/lone), rationale.
  • coldStart — Only when portfolioId is set AND portfolio has < 4 entries. Bootstrap guidance: portfolioSize + needsMore + suggestedSeeds (5 well-known public companies matching this entity's archetype to add as portfolio seeds). Solves the "new users get a worse product" cold-start problem.
  • decisionQuality — Meta trust layer. completeness (0..1) + consistency (0..1) + contradictions[] (detected internal inconsistencies, e.g. "high infra complexity but zero open-source presence") + plain-English summary.
  • drift[] — Pattern-change detection beyond per-metric anomalies: velocity-shift, composition-shift, attention-shift. Detects e.g. "GitHub repo growth slowed from +2/run to 0/run" — pattern-level, not point-in-time. Requires 5+ snapshots.
  • explain — Reasoning-chain exposure for the top decision-layer outputs. Each entry: target + derivedFrom[] + rule + optional weights. Plus principles[] documenting the actor's reasoning commitments. The audit trail.
  • summary — Hero synthesis block, deterministic from the rest of the data. Includes:
    • headline — one-line title with archetype + signal count
    • oneLine — Wikipedia / SEC / homepage one-liner
    • keyTakeaways[] — up to 10 scannable bullets (archetype + business model, public-company status, Wikipedia, tech stack, GitHub footprint with activity, distribution adjacencies, security posture composite, subdomain breakdown with infra-complexity context, Wayback first-seen, AI-policy file, Hacker News, operational maturity, trend lines, top monitoring event)
    • whatToCheck[] — up to 4 ranked next-step links (latest SEC filing, Wikipedia, GitHub org, status page)
    • confidence — score (0..1) + level (suite-aligned 4-level: high ≥ 0.8 / medium ≥ 0.6 / low ≥ 0.4 / very-low < 0.4) + plain-English explanation of why + dataCoverage (fraction of attempted sources with data) + signalStrength (weighted by which high-value signals landed) + stability (from snapshot history)
  • intelligence — Computed classification + signals:
    • companyType — startup / scaleup / public / enterprise / private / unknown (derived from age + GitHub volume + SEC filing presence + subdomain count)
    • archetype — developer-platform / saas / marketplace / fintech / ecommerce / media / agency / enterprise-software / open-source-foundation / consumer-app / other (derived from tech stack + API subdomains + npm/Docker footprint + SIC)
    • businessModelHints[] — e.g. ["SaaS or paid product", "Charges via Stripe", "API platform", "SDK distribution"]
    • technicalMaturityScore (0..1) + technicalMaturityLevel (low/medium/high) — weighted from infra signals + GitHub footprint + tech stack + operational surfaces
    • openSourceStrength — none / low / medium / high (from stars + repo count)
    • infraComplexity — low / medium / high (from subdomain count)
    • operationalMaturity — low / medium / high (status page + changelog + security.txt + DMARC + pricing page)
    • growthSignals[] — plain-English: subdomain growth, repo growth, HN momentum, careers page, recent activity
    • riskSignals[] — plain-English: security posture issues, dormant GitHub org, infra migration, missing SPF/DMARC
    • notablePatterns[] — non-primary-brand subdomains (acquisition signal), modern Vercel/Cloudflare stacks, multi-payment-processor signals, prior renames, AI-policy file presence
  • events[] — On scheduled re-runs, classifies the raw diff into typed events. Each event has type + severity (low/medium/high) + evidence + plain-English explanation. Types:
    • CORPORATE_UPDATE — new SEC filing (8-K = high; 10-K/Q = medium)
    • PRODUCT_SIGNAL — new public GitHub repo
    • INFRA_EXPANSION — new subdomains in Certificate Transparency logs
    • INFRA_MIGRATION — name servers changed
    • EMAIL_INFRA_CHANGE — MX records changed
    • INTEGRATION_ADDED / INTEGRATION_REMOVED — TXT verification token added/removed (Slack, Google, Okta, ad networks)
    • BRAND_REFRESH — homepage title or description changed
    • POSSIBLE_ACQUISITION — non-primary-brand subdomain appeared (e.g. acquired-co.parent.com)
    • COMMUNITY_TRACTION — significant Hacker News uptick
  • trends — Multi-run deltas computed from snapshot history (last 10 per domain):
    • subdomains30d / subdomains90d / githubRepos30d / githubStars30d / hackerNews30d / secFilings30d — each with delta, pct, previousValue
    • infraStability — stable / volatile / unknown (counts NS + MX changes across history)
    • changeFrequency — low / medium / high (how often anything changes per snapshot)
    • sampleCount + earliestSampleAt
  • securityPosture — Composite security score (0..1) + level (low/medium/high) + per-control issues[] and strengths[]. Weights: DMARC reject (0.20) / quarantine (0.15) / SPF (0.10) / CAA (0.05) / HSTS (0.15) / CSP (0.15) / X-Frame-Options (0.05) / X-Content-Type-Options (0.05) / Referrer-Policy (0.05) / Permissions-Policy (0.05) / security.txt (0.15).
  • fingerprint — Hashes for clustering / dedup / competitive comparison: techStackHash, infraSignature, orgSignature, securityHeadersHash. Sort companies into clusters in your downstream BI.
  • lifecycle — Company stage detection: nascent / growing / scaling / mature / declining / dormant / unknown, with confidence and supporting signals[]. Derived from age + GitHub growth + careers presence + recent activity + trend deltas.
  • scoring — Signal weight transparency. Per-score breakdown of which factors fired, their weights, and their actual contribution. Covers technicalMaturity (12 factors), securityPosture (10 factors), operationalMaturity (5 factors). Use to audit / explain / re-weight scores downstream — the math is visible.
  • correlations[] — Compound patterns detected across the events array. Pattern enum: product-launch / infra-migration / acquisition / pivot / wind-down / security-overhaul / funding-event / rebrand. Each carries confidence (0..1), evidence[], and explanation.
  • anomalies[] — Z-score-based statistical outliers from snapshot history (requires 4+ prior runs). Types: subdomain-spike / subdomain-drop / github-burst / github-stall / hn-spike / sec-filing-cluster. Each carries detail (current vs baseline mean), interpretation, severity, zScore. Lets you flag "+80 subdomains in 7 days" without writing thresholding logic.
  • views — Same data, four audience framings. Each contains angle (one-line), hooks[] (why this audience cares), risks[] (what might disqualify), nextSteps[]:
    • views.sales — angle for SDR/BDR outreach (tech, payments, hiring, growth)
    • views.security — attack surface, posture, missing controls, top remediations
    • views.investor — stage, public/private status, growth indicators, financial signals
    • views.engineering — tech stack, dev activity, hiring signals, opportunities (changelog/RSS to subscribe)
  • graph — Treat this domain as a node in a network: relatedCompanies[] (suspected sub-brands / acquisitions derived from notable subdomains, with confidence + evidence), sharedInfrastructureKey (cluster with companies on same infra), sharedEmailInfraKey, sharedTrackingKey (companies that share an ad-network footprint), suspectedSubBrands[], primaryBrandRoot. Build company graphs in BI by joining on these keys.
  • memory — Cross-run memory: historyDepth (e.g. "47 days across 7 snapshots"), milestones[] (first-occurrence events: first-subdomains-detected, first-github-presence, first-infra-migration, first-email-infra-change, first-brand-refresh), patterns[] (plain-English: "consistent subdomain growth (5 of last 6 runs)").
  • positioning — Competitive positioning vs the peer cohort. Only emitted when compareTo is set: category + rank + rankBasis + leaders[] + strengths[] + weaknesses[] + summary.
  • uncertainty[] — Honest catalogue of where this report is unsure. Each item has area, reason, confidence, and a concrete suggestedFix. Builds trust by surfacing failure modes upfront rather than hiding them.
  • gaps[] — Partial-fail intelligence: which modules came up empty, the impact (low/medium/high), and a plain-English reason. Helps consumers distinguish "no data exists" from "actor broke."
  • entityId — Stable cross-system identifier ({domain}|{slug-of-companyName}). Use as a join key in pipelines.
  • outputProfile — Echoes which output profile produced this record (analyst / executive / raw).

What this actor does NOT compute (intentionally)

  • Cross-company / global market patterns (e.g. "67% of fintechs use Cloudflare") — would require shared global state across all users' runs, which crosses a privacy boundary on a multi-tenant platform. The fingerprint and graph.shared*Key fields exist precisely so you can build this externally by joining datasets. A separate fleet-analytics actor that consumes datasets from many runs is the right shape — not state hidden inside a single-domain research actor.
  • Predictive ML scoring — every predictions[] entry is rule-based (deterministic, auditable). No LLM, no neural network — they reproduce on every run.
  • Per-user personalisation layer (userModel) — adapting priorities ranking + action thresholds + decisionRisk interpretation to a specific user's preferences (riskTolerance, actionBias, prefersEarlySignals, historicalAccuracy) is the next major architectural addition but explicitly deferred. Reason: would need user-supplied preference inputs + a meaningful accuracy-tracking dataset across runs (the current decisionMemory is per-entity, not per-user-pattern). Roadmap candidate, not v1.

Decision Philosophy v1

The actor's outputs encode an explicit philosophy. These rules are baked into priority ranking, watchlist actions, decisionRisk asymmetry, and actionGuard caps. Documented here so you can override them deliberately (and so a future userModel layer can swap them per user):

  1. Bias toward action when false-negative cost > false-positive cost AND reversibility is easy. Surfaces as decisionRisk.actEvenIfUnsure.
  2. Cap concurrent actions at 1-3. Diminishing returns set in beyond that — attention is finite, downstream automation gets noisy, signal-to-noise erodes. Surfaces as actionGuard.recommendedMaxActions.
  3. Prefer correlated signals over isolated anomalies. A correlation:product-launch outranks a single PRODUCT_SIGNAL event in priorities[]. Surfaces in priority weighting.
  4. Prefer recent signals over historical trends. Current-run events outweigh patterns from 30-90d ago in priority ranking. Surfaces in buildPriorities weight stack.
  5. Surface uncertainty over hiding it. gaps[], uncertainty[], epistemicStatus, decisionQuality.contradictions[], signalIndependence.warning all exist to flag confidence limits before users over-trust outputs.
  6. Honest abstention. When data is thin: emit unknown / null / low-confidence rather than fabricate. Returns null on whyNow when no notable trigger fired (better than emitting noise).
  7. Deterministic over probabilistic. Same inputs → same outputs every run. No LLM, no neural network. Documented in explain.principles[].

If you want a specific rule changed for your workflow, override at the consumer layer (filter dataset records, re-rank priorities by your own weights). The schema preserves enough underlying detail for any consumer to build their own opinion on top.
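
For example, a consumer-side re-rank might look like this (a sketch; the weights are illustrative policy choices, not the actor's internal weight stack):

type Priority = { rank: number; type: string; severity: "low" | "medium" | "high"; headline: string };

// Your own policy: boost acquisition signals, damp product signals.
const myTypeWeights: Record<string, number> = { POSSIBLE_ACQUISITION: 3, PRODUCT_SIGNAL: 0.5 };
const severityScore = { low: 1, medium: 2, high: 3 } as const;

function rerank(priorities: Priority[]): Priority[] {
  const score = (p: Priority) => severityScore[p.severity] * (myTypeWeights[p.type] ?? 1);
  return [...priorities].sort((a, b) => score(b) - score(a));
}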

Failure modes the actor explicitly guards against

  • False precision authority — heuristic outputs (timeToRegret.deadlineHint, decisionRisk levels, predictions[].confidence) carry an epistemicStatus block that names them as estimates, lists what they're based on, and warns about what they're NOT.
  • Signal stacking illusion — signalIndependence.warning fires when N signals all derive from the same underlying change. "3 signals" can be 1 signal echoed 3 times.
  • Decision fatigue — actionGuard.recommendedMaxActions caps concurrent actions; primaryActionOnly elevates a single dominant action and gives explicit permission to ignore the rest.
  • Overconfident classification — uncertainty[] flags areas where the actor knows it's guessing (company name, archetype, lifecycle, acquisition detection) and provides suggestedFix for each.
  • Hidden contradictions — decisionQuality.contradictions[] surfaces internal inconsistencies (e.g. "high infra complexity but zero open-source presence") rather than silently passing them through.
  • Temporal misinterpretation — timeHorizonAlignment.status flags when short-term urgency (timeToRegret) and long-term trajectory diverge — prevents "this is urgent AND accelerating" reads when reality is "spike inside stability".

Worked examples of misinterpretation (and the correct read)

Case A — Signal stacking illusion

  • Input: domain X
  • Output: events[] shows INFRA_EXPANSION + POSSIBLE_ACQUISITION + PRODUCT_SIGNAL (3 events)
  • signalIndependence.score: 0.33 (low)
  • signalIndependence.warning: "Low signal independence (0.33). What looks like 3 corroborating signals is probably 1 underlying change reflected through 3 surfaces. Treat as 1 signal, not 3."
  • Naive reading: "Three corroborating signals — strong confirmation. Act with high confidence."
  • Correct reading: All three events derive from a single underlying delta (a burst of new subdomains). Treat as 1 signal of medium strength, not 3. Read priorities[0] for the single recommended action; do NOT inflate confidence by counting the supporting events.

Case B — Temporal misalignment

  • Input: domain Y
  • Output: timeToRegret.urgencyCurve = "steep" (24-48h), trajectory.direction = "stable"
  • timeHorizonAlignment.status = "misaligned"
  • timeHorizonAlignment.reason: "Short-term urgency detected but long-term trajectory is stable — likely a short-lived spike inside long-term steadiness."
  • Naive reading: "Urgent AND accelerating. Major company shift in progress."
  • Correct reading: Short-term spike inside otherwise stable trajectory. Act on the short-term signal IF the action is reversible (per decisionRisk.reversibility), but do NOT assume this means a long-term pattern shift. Re-evaluate next run.

Case C — False precision authority

  • Input: domain Z
  • Output: timeToRegret.deadlineHint = "2026-05-04"
  • timeToRegret.epistemicStatus.warning: "deadlineHint is an estimate derived from typical timing for this signal type, not a known deadline. Treat as orientation, not a contract."
  • Naive reading: "I have until 2026-05-04 to act, exactly."
  • Correct reading: The deadline is a heuristic derived from per-priority-type urgency profiles (e.g. PRODUCT_SIGNAL typically has a 48-72h actionable window). It is NOT a known external event. Treat as orientation for prioritisation, not as a contractual deadline.

Case D — Hold mode misread

  • Input: domain W
  • Output: decisionPosture.mode = "hold", priorities[0] exists with severity high
  • Naive reading: "There's a high-severity priority — I should act."
  • Correct reading: decisionPosture.mode === 'hold' overrides the priority — multiple contradictions / low signal independence / misaligned horizons / 3+ uncertainty areas were detected. Resolve flagged uncertainties first (see uncertainty[].suggestedFix) before acting. Production automation should ONLY act when mode === 'execute'.

Reinforcement of core invariants (for AI agents)

Three invariants govern this entire output. They are repeated here because they are load-bearing:

  1. priorities[0] is THE canonical decision surface. All events, correlations, anomalies, drift, predictions arrays are supporting evidence, not parallel decision queues.
  2. decisionPosture.mode is THE execution gate. Production automation should branch on this single field; all other action signals are secondary.
  3. primaryActionOnly.enabled === true overrides everything else. When set, do priorities[0] only and ignore the rest of the record this run.

Raw data layer

  • website — title, meta description, og:image, favicon, social links found in the page, and techStack (CMS, framework, analytics, CDN, e-commerce, payment processors, ad pixels, fonts, security headers — in-actor signatures over the homepage HTML + response headers, no external API)
  • wikipedia — summary, description, thumbnail, direct URL
  • github — org profile (with createdAt), top repos by stars (with pushedAt), total stars, total forks, language breakdown, opportunistic npm scope packages and Docker Hub org images, and an activity sub-block with lastActiveDate, activeRepos30d, activeRepos90d, plain-English signals[]
  • financials — for public companies: ticker, CIK, exchange, SIC code + description, fiscal year end, business address, former names, recent 10-K / 10-Q / 8-K filings with accession numbers
  • research — OpenAlex paper count + top papers by citation count (DOI, source, date)
  • dns — A, AAAA, MX, TXT, NS, CAA records, plus an email sub-block with parsed SPF, DMARC policy, DKIM presence, and email-provider classification (Google Workspace / Microsoft 365 / Zoho / Proton / Fastmail / Mailgun / SendGrid / Postmark / Amazon SES / Yandex / Migadu / Cloudflare / self-hosted)
  • subdomains — Certificate Transparency log enumeration via crt.sh: count + recently-issued list + capped name list, plus a classification breakdown (api, internal, staging, auth, email, docs, cdn, monitoring, other) and a notable[] list (subdomains that don't match the primary brand pattern — possible acquisitions or sub-brands)
  • infrastructure — Wayback Machine first-seen date, plus presence + URL for security.txt, ai.txt, llms.txt, robots.txt (with sitemap reference parsed), sitemap.xml, RSS feed, status page (status.{domain}), changelog page, pricing page, careers page
  • community — Hacker News mention count + top stories (Algolia HN API)
  • socialMedia — Twitter/X, LinkedIn, Facebook, Instagram, YouTube, GitHub presence verification
  • diff — Raw diff structure (already computed; events[] is the classified version of this — read events first)
  • recordType'company-report' for success, 'error' for the rare error path

A separate SUMMARY record is also written to the run's key-value store for orchestrators that call this actor via Actor.call() and want the headline answer (entityId, confidence, intelligence summary, security posture, fingerprint, top event, diff highlights, PPE charge) without paginating the dataset.
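
A hedged orchestration sketch with apify-client (the actor ID is a placeholder; SUMMARY is the key-value-store key described above):

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Call the actor, then read the SUMMARY record from the run's key-value
// store instead of paginating the dataset.
const run = await client.actor("<ACTOR_ID>").call({ domain: "stripe.com" });
const summary = await client
  .keyValueStore(run.defaultKeyValueStoreId)
  .getRecord("SUMMARY");

console.log(summary?.value); // entityId, confidence, top event, PPE charge, …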

How to Use

  1. Enter the domain — e.g. stripe.com. The https:// prefix and trailing slashes are stripped automatically.
  2. Optionally set the company name — if blank, the actor detects it from <title> or og:title. Override when the homepage title is a tagline rather than the company name (e.g., "Build the Future" instead of "Acme Corp"). This dramatically improves Wikipedia, SEC, GitHub, and Hacker News match accuracy.
  3. Toggle modules if needed — every data source is on by default; turn off the ones you don't need to shave run time.
  4. Click "Start" — typical run takes 15–45 seconds. You'll see live progress messages in the Console (Step 1/10: Analyzing website…, Steps 2–10: Aggregating Wikipedia, GitHub, SEC, …, Done. 9 sources returned data: …).

Input Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| domain | String | Yes | stripe.com | Company website domain (e.g., stripe.com). The https:// prefix and trailing slashes are stripped automatically. |
| companyName | String | No | Auto-detected | Override the auto-detected company name. Use this when the homepage title is a tagline. |
| includeFinancials | Boolean | No | true | Search SEC EDGAR for filings, ticker, CIK, exchange, address, SIC code. |
| includeResearch | Boolean | No | true | Search OpenAlex for academic papers mentioning the company. |
| includeGithub | Boolean | No | true | Find GitHub org, top repos, language breakdown, plus npm + Docker Hub footprint when an org is found. |
| includeTechStack | Boolean | No | true | Detect CMS, framework, analytics, CDN, e-commerce, payment processors, ad pixels, fonts, and security headers. |
| includeSubdomains | Boolean | No | true | Enumerate subdomains via Certificate Transparency logs (crt.sh). |
| includeInfrastructure | Boolean | No | true | Detect security.txt, ai.txt, llms.txt, robots, sitemap, RSS, status, changelog, pricing, careers + Wayback first-seen. |
| includeCommunity | Boolean | No | true | Search Hacker News for mention count + top stories. |
| enableMonitoring | Boolean | No | true | On repeat runs for the same domain, return a diff field, an events[] array (typed classification), a trends block (30d / 90d deltas), correlations[] (compound patterns), and anomalies[] (statistical outliers — needs 4+ prior runs). |
| outputProfile | Enum | No | analyst | analyst (full record) / executive (decision layer + thin pointers) / raw (modules only). The SUMMARY KV record is always full regardless. |
| compareTo | String[] | No | [] | Up to 3 peer domains to benchmark against. Each peer = +1 PPE event ($1.00 per peer). Adds a peer-comparison record to the dataset with rank + summary across 8 metrics. |
| portfolioId | String | No | — | Opt-in label for cross-company tracking. When set, run writes a lightweight entry to a per-user named KV store and emits portfolioContext (rank/percentile/outlier), feed (rolling alerts + top movers + new entrants), normalized (percentile scores), cluster (similar companies), portfolioPressure (attention share + displacement). Each user's portfolios are isolated by Apify per-user named-store sandboxing. |
| monitorStateKey | String | No | — | Suite-aligned alias for portfolioId. Either input works; if both are set, portfolioId wins. Use this when you want one consistent field name across company-deep-research, waterfall-contact-enrichment, bulk-email-verifier, and lead-enrichment-pipeline. |
| lastAction | Object | No | — | { type: string, takenAt: ISO date, note?: string }. Tells the actor what action you took on this entity since the last run. Stored in the portfolio (requires portfolioId); on subsequent runs the actor infers outcome via state delta and emits decisionMemory. Outcome inference is honest: it can only observe signal changes — it can't see direct replies / deals / off-platform engagement. |
| includePatents | Boolean | No | false | Off by default. The USPTO PatentsView API was retired in August 2024 and the replacement requires an API key, which would break the no-key promise of this actor. |
| githubToken | String | No | — | GitHub personal access token. Without it, GitHub allows 60 unauthenticated requests/hour. With it, 5,000/hour. |
| maxResults | Integer | No | 50 | Maximum items returned per data source (1–200). |

Input Examples

Quick lookup — full intelligence record (default):

{
  "domain": "stripe.com"
}

Executive output for Slack alerts — decision layer + thin pointers:

{
  "domain": "stripe.com",
  "outputProfile": "executive"
}

Raw modules only — backward-compatible mode for users who want pure data:

{
  "domain": "stripe.com",
  "outputProfile": "raw"
}

Peer comparison — benchmark against 2 competitors (each peer = +1 PPE event):

{
  "domain": "stripe.com",
  "compareTo": ["adyen.com", "checkout.com"]
}

Portfolio mode — track this company as part of a larger watchlist:

{
  "domain": "stripe.com",
  "portfolioId": "fintech-watchlist-2026"
}

The first run for a portfolioId creates the portfolio. Each subsequent run for the same portfolioId adds to it (and refreshes existing entries). After ~4-5 different domains have been added, the actor starts emitting portfolioContext (rank/percentile/outlier), feed (rolling alerts + top movers + new entrants), normalized (percentile scores vs portfolio), cluster (similar companies in your portfolio), and portfolioPressure (attention share + displacement). Build a "100-company fintech watchlist" by scheduling this actor across 100 domains all using the same portfolioId.
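
A sketch of that watchlist bootstrap (the actor ID is a placeholder; sequential calls keep the example simple):

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
const watchlist = ["stripe.com", "adyen.com", "checkout.com"]; // example domains

for (const domain of watchlist) {
  // Same portfolioId on every run: the first call creates the portfolio,
  // later calls add to it and refresh existing entries.
  await client.actor("<ACTOR_ID>").call({ domain, portfolioId: "fintech-watchlist-2026" });
}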

Decision memory — close the feedback loop:

{
  "domain": "stripe.com",
  "portfolioId": "fintech-watchlist-2026",
  "lastAction": {
    "type": "trigger-outreach",
    "takenAt": "2026-04-15T09:00:00Z",
    "note": "sent intro email to VP Eng"
  }
}

The actor stores lastAction in the portfolio entry. On the next run it compares the current state vs the snapshot at action time and emits decisionMemory: { outcome, effectivenessScore, pattern, daysSinceAction, inferenceMethod }. Outcome inference is honest — engaged / escalated / no-response / no-change / resolved / too-soon-to-tell — derived from observable signal changes only.

When compareTo is set, the dataset gets an additional peer-comparison record with rank + summary across 8 metrics (technical maturity, security posture, infra complexity, OSS strength, operational maturity, subdomains, GitHub stars, HN mentions). Each peer triggers its own recursive run and bills its own PPE event — 2 peers = 3 total $1.00 charges.

Named company with GitHub token (avoids 60-req/hr unauthenticated rate limit):

{
  "domain": "openai.com",
  "companyName": "OpenAI",
  "githubToken": "ghp_xxxxxxxxxxxxxxxxxxxx",
  "maxResults": 20
}

Fast scan — website + tech stack + DNS + social only:

{
  "domain": "acme.com",
  "includeFinancials": false,
  "includeResearch": false,
  "includeGithub": false,
  "includeSubdomains": false,
  "includeInfrastructure": false,
  "includeCommunity": false
}

Scheduled monitoring — daily run with diff + events + trends + anomalies:

{
  "domain": "anthropic.com",
  "enableMonitoring": true
}

When you schedule this, the second run onwards returns a diff field (raw changes), an events[] array (typed classification), a trends block (30d / 90d deltas from snapshot history), correlations[] (compound patterns), anomalies[] (statistical outliers — needs 4+ prior runs), and a changeSummary.headline you can paste into a Slack message verbatim.
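
A consumer sketch for those scheduled runs (the webhook URL is an assumption; triggers and changeSummary are the fields documented above):

// Forward the paste-ready headline to Slack when a high-signal trigger fired.
async function notify(record: {
  triggers?: Record<string, boolean>;
  changeSummary?: { headline: string };
}): Promise<void> {
  const t = record.triggers ?? {};
  if (!(t.highSeverityEvents || t.possibleAcquisition || t.infraMigration)) return;
  if (!record.changeSummary?.headline) return;
  await fetch(process.env.SLACK_WEBHOOK_URL!, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ text: record.changeSummary.headline }),
  });
}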

Input Tips

  • Provide companyName explicitly for companies whose website title is a tagline. This improves accuracy across Wikipedia, SEC, GitHub, and Hacker News.
  • Use maxResults: 10 for quick overviews, maxResults: 50 for comprehensive reports, maxResults: 200 to pull every subdomain crt.sh has on file.
  • Set includeFinancials: false for private companies to skip SEC EDGAR (it's US-only) and save 5–10 seconds.
  • For batch processing 100+ companies, supply a githubToken to avoid the 60-req/hr unauthenticated GitHub limit.

Output

Each run produces one dataset item. Truncated example showing the decision layer at the top, then the raw modules below:

{
  "recordType": "company-report",
  "entityId": "stripe.com|stripe",
  "domain": "stripe.com",
  "companyName": "Stripe",
  "researchDate": "2026-05-01",
  "tldr": {
    "oneSentence": "Stripe is accelerating (lifecycle: scaling, fintech) — Platform / multi-product expansion underway.",
    "topRisk": null,
    "topOpportunity": "+14 new public repos in the last ~30 days",
    "needsAttention": false
  },
  "trajectory": {
    "direction": "accelerating",
    "velocity": "high",
    "confidence": "high",
    "explanation": "Direction: accelerating (4 growing, 0 declining of 4 measured signals). Velocity: high (+47 subdomains in 30d). Confidence: high (7 historical snapshots).",
    "components": { "subdomainsDelta30d": 47, "repoDelta30d": 5, "starsDelta30d": 412, "hnDelta30d": 23 }
  },
  "predictions": [
    {
      "type": "platform-expansion-likely",
      "confidence": 0.65,
      "timeframe": "ongoing",
      "evidence": ["+47 subdomains in 30d", "3 languages", "10 npm packages"],
      "headline": "Platform / multi-product expansion underway",
      "rationale": "Aggressive subdomain growth combined with multi-language + multi-package distribution points at platform-mode investment — expect new SDK / API / market launches."
    }
  ],
  "graph": {
    "primaryBrandRoot": "stripe",
    "relatedCompanies": [
      { "domain": "paystack.com", "relationship": "acquisition-suspected", "confidence": 0.55, "evidence": ["Subdomain paystack.stripe.com on stripe.com hosts what looks like a separate brand"] },
      { "domain": "bridge.com", "relationship": "acquisition-suspected", "confidence": 0.55, "evidence": ["Subdomain bridge-payments.stripe.com on stripe.com hosts what looks like a separate brand"] }
    ],
    "sharedInfrastructureKey": "dynectnet_googleworkspace_cloudflare",
    "sharedEmailInfraKey": "googleworkspace",
    "sharedTrackingKey": "googleanalytics4_segment",
    "suspectedSubBrands": ["bridge-payments.stripe.com", "paystack.stripe.com", "atlas.stripe.com"]
  },
  "memory": {
    "historyDepth": "47 days across 7 snapshots",
    "snapshotCount": 7,
    "earliestSnapshotAt": "2026-04-01T08:00:00.000Z",
    "milestones": [
      { "eventType": "first-github-presence", "detectedAt": "2026-04-01T08:00:00.000Z", "detail": "186 repos observed" },
      { "eventType": "first-brand-refresh", "detectedAt": "2026-04-22T08:00:00.000Z", "detail": "Homepage title changed for the first time in our history" }
    ],
    "patterns": [
      "Consistent subdomain growth (5 of last 6 transitions positive)",
      "Steady GitHub repo additions (4 of last 6 transitions positive)"
    ]
  },
  "uncertainty": [
    {
      "area": "acquisition-detection",
      "reason": "Detected 3 non-primary-brand subdomain(s) but cannot corroborate against SEC filings (private company).",
      "confidence": 0.55,
      "suggestedFix": "Cross-check Crunchbase / news sources / Wikipedia for announced acquisitions matching: bridge-payments.stripe.com, paystack.stripe.com, atlas.stripe.com"
    }
  ],
  "actions": [
    { "type": "webhook-payload", "target": "Generic HTTP webhook", "rationale": "...", "payload": { "entityId": "stripe.com|stripe", "tldr": "Stripe is accelerating...", "topPriority": { "rank": 1, "type": "PRODUCT_SIGNAL", "severity": "high", "headline": "5 new public GitHub repos: stripe-cli-v2, payments-rs, fraud-rules-dsl…", "action": "Review new repos…" }, "needsAttention": false } },
    { "type": "slack-block-kit", "target": "Slack incoming-webhook", "rationale": "Pre-formatted Slack message", "payload": { "blocks": [{ "type": "header", "text": { "type": "plain_text", "text": "Stripe (stripe.com)" } }, "..."] } }
  ],
  "summary": {
    "headline": "Stripe — private fintech (10 signals)",
    "oneLine": "Stripe — American-Irish financial services company",
    "keyTakeaways": [
      "Looks like a private fintech (SaaS or paid product)",
      "Wikipedia: American-Irish financial services company",
      "Tech stack: Next.js + Cloudflare + Stripe",
      "Engineering: 186 repos, 28,450 stars — TypeScript-led (org since 2011), 14 active in last 30d",
      "Distributes: 64 npm packages, 12 Docker images",
      "Security posture: 92/100 (high) — Google Workspace, 8 strengths, 1 issues",
      "847 subdomains in CT logs (12 api, 4 staging, 23 internal) — high infra complexity, likely large engineering org",
      "Online since 2010 (Wayback Machine first snapshot)",
      "3,742 Hacker News mentions — strong developer community visibility",
      "Operational maturity: high (status page + changelog + security.txt)"
    ],
    "whatToCheck": [
      { "label": "Read Wikipedia summary for context", "url": "https://en.wikipedia.org/wiki/Stripe,_Inc." },
      { "label": "Visit GitHub org (186 repos, 28,450 stars)", "url": "https://github.com/stripe" },
      { "label": "Check status page for outages", "url": "https://status.stripe.com" }
    ],
    "confidence": {
      "score": 0.86,
      "level": "high",
      "explanation": "High confidence — 9/10 sources returned data and 95% of high-value signals (Wikipedia, GitHub org, SEC filings, tech stack, email provider) landed.",
      "dataCoverage": 0.9,
      "signalStrength": 0.85,
      "stability": "stable"
    }
  },
  "intelligence": {
    "companyType": "private",
    "archetype": "fintech",
    "businessModelHints": ["SaaS or paid product", "Charges via Stripe", "API platform", "SDK distribution"],
    "technicalMaturityScore": 0.95,
    "technicalMaturityLevel": "high",
    "openSourceStrength": "high",
    "infraComplexity": "high",
    "operationalMaturity": "high",
    "growthSignals": [
      "+14 new public repos in the last ~30 days",
      "Active engineering — 14 repos pushed in the last 30 days",
      "Careers page online (likely hiring)"
    ],
    "riskSignals": [],
    "notablePatterns": [
      "12 non-primary-brand subdomains (possible acquisition or sub-brand): bridge-payments.stripe.com, paystack.stripe.com…",
      "Modern Vercel/Cloudflare-style stack",
      "Multi-payment-processor (Stripe + PayPal) — likely large transaction volume"
    ]
  },
  "events": [
    {
      "type": "PRODUCT_SIGNAL",
      "severity": "high",
      "evidence": "5 new public GitHub repos: stripe-cli-v2, payments-rs, fraud-rules-dsl…",
      "explanation": "New public repos often indicate a product launch, new SDK/CLI, or open-sourcing of an internal tool."
    },
    {
      "type": "INFRA_EXPANSION",
      "severity": "medium",
      "evidence": "12 new subdomains in Certificate Transparency logs",
      "explanation": "Burst of new subdomains often indicates new services, environments, or geographic expansion."
    }
  ],
  "trends": {
    "sampleCount": 7,
    "earliestSampleAt": "2026-04-01T08:00:00.000Z",
    "subdomains30d": { "delta": 47, "pct": 5.9, "previousValue": 800 },
    "subdomains90d": { "delta": 122, "pct": 16.8, "previousValue": 725 },
    "githubRepos30d": { "delta": 5, "pct": 16.7, "previousValue": 30 },
    "githubStars30d": { "delta": 412, "pct": 1.5, "previousValue": 28038 },
    "hackerNews30d": { "delta": 23, "pct": 0.6, "previousValue": 3719 },
    "secFilings30d": null,
    "infraStability": "stable",
    "changeFrequency": "medium"
  },
  "securityPosture": {
    "score": 0.92,
    "level": "high",
    "issues": ["No Permissions-Policy header"],
    "strengths": [
      "DMARC enforced (reject)",
      "SPF record present",
      "CAA records published (restricts which CAs can issue certs)",
      "HSTS header",
      "Content-Security-Policy header",
      "X-Frame-Options header (clickjacking)",
      "X-Content-Type-Options header",
      "Referrer-Policy header",
      "Published security.txt with disclosure contact"
    ]
  },
  "fingerprint": {
    "techStackHash": "_nextjs_cloudflare__stripe",
    "infraSignature": "cloudflare_googleworkspace_dynect.net",
    "orgSignature": "massiverepos_typescript_osshigh",
    "securityHeadersHash": "contentsecuritypolicy_referrerpolicy_strict-transport-security_xcontent-type-options_xframe-options"
  },
  "gaps": [
    { "module": "financials", "impact": "low", "reason": "No SEC filings — likely a private company or non-US-listed" }
  ],
  "website": {
    "title": "Stripe | Financial Infrastructure for the Internet",
    "description": "Stripe powers online and in-person payment processing...",
    "favicon": "https://stripe.com/favicon.ico",
    "ogImage": "https://stripe.com/img/v3/home/social.png",
    "socialLinks": {
      "twitter": "https://twitter.com/stripe",
      "linkedin": "https://www.linkedin.com/company/stripe",
      "github": "https://github.com/stripe"
    },
    "techStack": {
      "cms": "",
      "framework": "Next.js",
      "analytics": ["Google Analytics 4", "Segment"],
      "cdn": "Cloudflare",
      "ecommerce": "",
      "fonts": ["Google Fonts"],
      "ads": [],
      "paymentProcessors": ["Stripe"],
      "securityHeaders": {
        "strict-transport-security": "max-age=63072000; includeSubDomains; preload",
        "content-security-policy": "...",
        "x-frame-options": "DENY"
      }
    }
  },
  "wikipedia": {
    "found": true,
    "summary": "Stripe, Inc. is an Irish-American multinational financial services...",
    "description": "American-Irish financial services company",
    "thumbnail": "https://upload.wikimedia.org/...",
    "url": "https://en.wikipedia.org/wiki/Stripe,_Inc."
  },
  "github": {
    "found": true,
    "orgProfile": {
      "name": "Stripe",
      "bio": "Financial infrastructure for the internet.",
      "publicRepos": 186,
      "followers": 1523,
      "url": "https://github.com/stripe",
      "createdAt": "2011-04-25T16:13:42Z"
    },
    "topRepositories": [
      {
        "name": "stripe-node",
        "description": "Node.js library for the Stripe API.",
        "stars": 3842,
        "forks": 745,
        "language": "TypeScript",
        "url": "https://github.com/stripe/stripe-node",
        "pushedAt": "2026-04-30T14:22:11Z"
      }
    ],
    "totalStars": 28450,
    "totalForks": 7120,
    "languages": [
      { "language": "TypeScript", "repoCount": 14 },
      { "language": "Ruby", "repoCount": 9 },
      { "language": "Go", "repoCount": 7 }
    ],
    "npmPackages": [
      { "name": "@stripe/stripe-js", "description": "Loading wrapper for Stripe.js", "version": "4.x.x", "url": "https://www.npmjs.com/package/@stripe/stripe-js" }
    ],
    "dockerImages": [],
    "activity": {
      "lastActiveDate": "2026-04-30",
      "activeRepos30d": 14,
      "activeRepos90d": 28,
      "signals": [
        "Multi-language (TypeScript, Ruby, Go)",
        "TypeScript-led",
        "Strong open-source traction (10K+ stars across top repos)",
        "High recent activity (14 repos pushed in last 30 days)",
        "Developer-first (10 npm packages)"
      ]
    }
  },
  "financials": {
    "isPublicCompany": false,
    "ticker": null,
    "cik": null,
    "exchange": null,
    "sicCode": null,
    "sicDescription": null,
    "fiscalYearEnd": null,
    "address": null,
    "formerNames": [],
    "recentFilings": []
  },
  "research": {
    "paperCount": 1247,
    "topPapers": [
      { "title": "The Rise of Embedded Finance...", "doi": "https://doi.org/10.1016/j.jfi.2024.101032", "citationCount": 89, "publicationDate": "2024-06-15", "source": "Journal of Financial Intermediation" }
    ]
  },
  "dns": {
    "aRecords": ["185.166.143.32"],
    "aaaaRecords": ["2a04:8400:0:0:0:0:0:32"],
    "mxRecords": ["1 aspmx.l.google.com", "5 alt1.aspmx.l.google.com"],
    "txtRecords": ["v=spf1 include:_spf.google.com ~all", "v=DMARC1; p=reject; rua=mailto:dmarc@stripe.com"],
    "nameServers": ["ns1.p16.dynect.net"],
    "caaRecords": ["0 issue=letsencrypt.org"],
    "email": {
      "provider": "Google Workspace",
      "spfPresent": true,
      "spfRecord": "v=spf1 include:_spf.google.com ~all",
      "dmarcPolicy": "reject",
      "dmarcRecord": "v=DMARC1; p=reject; rua=mailto:dmarc@stripe.com",
      "dkimSelectors": []
    }
  },
  "subdomains": {
    "found": true,
    "count": 847,
    "recent": [
      { "name": "api.stripe.com", "firstSeen": "2026-04-30" },
      { "name": "dashboard.stripe.com", "firstSeen": "2026-04-29" }
    ],
    "all": ["api.stripe.com", "dashboard.stripe.com", "..."],
    "classification": {
      "api": 12, "internal": 23, "staging": 4, "auth": 6, "email": 3,
      "docs": 5, "cdn": 2, "monitoring": 1, "other": 791
    },
    "notable": ["bridge-payments.stripe.com", "paystack.stripe.com", "atlas.stripe.com"]
  },
  "infrastructure": {
    "firstSeenWayback": "2010-09-14",
    "securityTxt": { "found": true, "url": "https://stripe.com/.well-known/security.txt", "contact": "mailto:security@stripe.com" },
    "aiTxt": { "found": false, "url": "" },
    "llmsTxt": { "found": false, "url": "" },
    "robotsTxt": { "found": true, "url": "https://stripe.com/robots.txt", "sitemapReference": "https://stripe.com/sitemap.xml" },
    "sitemapXml": { "found": true, "url": "https://stripe.com/sitemap.xml" },
    "rssFeed": { "found": true, "url": "https://stripe.com/blog/feed.rss" },
    "statusPage": { "found": true, "url": "https://status.stripe.com" },
    "changelogPage": { "found": true, "url": "https://stripe.com/changelog" },
    "pricingPage": { "found": true, "url": "https://stripe.com/pricing" },
    "careersPage": { "found": true, "url": "https://stripe.com/jobs" }
  },
  "community": {
    "hackerNews": {
      "mentionCount": 3742,
      "topStories": [
        { "title": "Stripe acquires Bridge for payments", "url": "https://stripe.com/...", "points": 1842, "numComments": 612, "createdAt": "2026-04-15T...", "storyUrl": "https://news.ycombinator.com/item?id=..." }
      ]
    }
  },
  "socialMedia": [
    { "platform": "Twitter/X", "url": "https://twitter.com/stripe", "found": true },
    { "platform": "LinkedIn", "url": "https://www.linkedin.com/company/stripe", "found": true }
  ],
  "diff": null
}

Output fields

Top-level discriminators:

Field | Type | Description
recordType | String | 'company-report' for a successful research record, 'error' for an error record
domain | String | The company domain that was researched
companyName | String | Detected or provided company name
researchDate | String | ISO date of the research (YYYY-MM-DD)
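
For example, a minimal filtering sketch in Python that splits fetched dataset items on recordType so error records never reach your report pipeline. The dataset_id value and the token placeholder are assumptions; both come from a finished run, as shown under "How to Use the API" later in this README:

import requests

dataset_id = "YOUR_DATASET_ID"  # taken from a finished run
items = requests.get(
    f"https://api.apify.com/v2/datasets/{dataset_id}/items",
    params={"token": "YOUR_APIFY_TOKEN"},
    timeout=30,
).json()

# Branch on the recordType discriminator documented above
reports = [i for i in items if i.get("recordType") == "company-report"]
errors = [i for i in items if i.get("recordType") == "error"]
for report in reports:
    print(report["researchDate"], report["domain"], "-", report["companyName"])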

summary fields (hero block — read this first):

Field | Type | Description
headline | String | One-line title ("<companyName> — <role> (<N> signals)")
oneLine | String | Short shareable answer (Slack subject, dashboard tile)
keyTakeaways[] | Array of strings | Up to 8 scannable bullets synthesized from the modules below
whatToCheck[] | Array of {label, url} | Up to 4 ranked next-step links
confidence.score | Number | 0..1 — fraction of attempted sources that returned data
confidence.level | String | 'high' (≥0.7), 'medium' (≥0.4), or 'low'
confidence.explanation | String | Plain-English reason — usable verbatim in reports

website.techStack fields (in-actor signature detection):

Field | Type | Description
cms | String | Detected CMS (WordPress, Shopify, Webflow, Wix, Squarespace, Ghost, Drupal, Joomla, HubSpot CMS, Contentful, Sanity)
framework | String | Detected framework (Next.js, Nuxt, Gatsby, Remix, Astro, SvelteKit, React, Vue, Angular, Hugo, Jekyll, Eleventy)
analytics[] | Array | Analytics tools (GA4, GTM, Universal Analytics, Segment, Mixpanel, Amplitude, Heap, PostHog, Plausible, Fathom, Hotjar, FullStory, Matomo)
cdn | String | CDN (Cloudflare, Fastly, Akamai, CloudFront, Vercel, Netlify, GitHub Pages, Cloudflare Pages, Bunny CDN, KeyCDN)
ecommerce | String | E-commerce platform (Shopify, WooCommerce, BigCommerce, Magento, PrestaShop, Snipcart)
paymentProcessors[] | Array | Payment processors (Stripe, PayPal, Square, Adyen, Braintree, Klarna)
ads[] | Array | Ad pixels (Google Ads, Meta Pixel, LinkedIn Insight, Twitter Pixel, TikTok Pixel, Reddit Pixel)
fonts[] | Array | Font services (Google Fonts, Adobe Fonts, Monotype)
securityHeaders | Object | HSTS, CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy

github fields:

Field | Type | Description
found | Boolean | Whether a GitHub org or repos were found
orgProfile.name | String | GitHub organization display name
orgProfile.bio | String | Organization description
orgProfile.publicRepos | Integer | Number of public repositories
orgProfile.followers | Integer | Number of GitHub followers
orgProfile.createdAt | String | ISO date the org was created (proxy for company age)
topRepositories[] | Array | Top repos by stars: name, description, stars, forks, language, url
totalStars / totalForks | Integer | Sum across returned repos
languages[] | Array | Language breakdown: {language, repoCount} ranked by repo count
npmPackages[] | Array | npm packages under the same scope (e.g. @stripe/*)
dockerImages[] | Array | Docker Hub images under the same org

financials fields:

Field | Type | Description
isPublicCompany | Boolean | Whether SEC filings were found
ticker | String / null | Stock ticker (e.g., "AAPL")
cik | String / null | SEC Central Index Key
exchange | String / null | Stock exchange (NYSE, NASDAQ, etc.) — pulled from data.sec.gov/submissions/CIK*.json
sicCode / sicDescription | String / null | Standard Industrial Classification
fiscalYearEnd | String / null | MMDD format
address | Object / null | Business address (street, city, state, zip)
formerNames[] | Array | Past company names from SEC filings
recentFilings[] | Array | Recent SEC filings: formType, filedDate, description, url, accessionNumber

dns + dns.email fields:

Field | Type | Description
aRecords / aaaaRecords | String[] | IPv4 / IPv6 addresses
mxRecords / txtRecords / nameServers / caaRecords | String[] | Other DNS records
email.provider | String | Classified email provider (Google Workspace, Microsoft 365, Zoho, Proton, Fastmail, Mailgun, SendGrid, Postmark, Amazon SES, Yandex, Migadu, Cloudflare, self-hosted)
email.spfPresent / email.spfRecord | Boolean / String | SPF detection
email.dmarcPolicy / email.dmarcRecord | String | DMARC policy (none / quarantine / reject)
email.dkimSelectors[] | Array | DKIM selectors found in TXT records

subdomains fields:

Field | Type | Description
count | Integer | Total unique subdomains in Certificate Transparency logs
recent[] | Array | Up to 20 most recently issued: {name, firstSeen}
all[] | Array | All subdomains, capped at min(maxResults, 200)

infrastructure fields: see the example above for full shape — every well-known file probe returns {found, url, ...}.

community.hackerNews fields:

Field | Type | Description
mentionCount | Integer | Total Hacker News story count for the company name
topStories[] | Array | Top stories: title, url, points, numComments, createdAt, storyUrl

diff fields (only on second-and-later runs of the same domain):

Field | Type | Description
since | String | ISO timestamp of the previous snapshot
sinceRunId | String / null | Apify run ID of the previous run
newSecFilings[] | Array | SEC filings present this run, absent in the previous snapshot (matched by accessionNumber)
newGithubRepos[] | String[] | Repo names new since last run
newTxtRecords[] / removedTxtRecords[] | String[] | TXT verification tokens added/removed (Google, Microsoft, Slack, Okta, ad networks…)
newSubdomains[] | String[] | Subdomains issued in Certificate Transparency since last run
nameServersChanged | Boolean | Whether NS records changed
nameServersOld / nameServersNew | String[] | Old vs new NS list (only populated when changed)
mxRecordsChanged | Boolean | Whether MX records changed
homepageTitleChanged / homepageDescriptionChanged | Boolean | Whether homepage copy changed
newPatents / newHackerNewsStories | Integer | Count delta since last run

Decision-grade features

This actor goes well beyond "give me the data." Four features turn the output into a decision system:

Priorities — the decision queue

Every report contains a priorities[] array with the top 5 ranked decisions the data implies. Each priority has a recommendedAction written as a concrete next step ("Read filing X", "Investigate the new subdomains", "Update internal records of this domain's infra"). Most users only need to read priorities[0]. Built deterministically from events + correlations + anomalies + lifecycle + securityPosture.

"priorities": [
{
"rank": 1,
"type": "correlation:product-launch",
"severity": "high",
"headline": "product launch pattern detected (confidence 85%)",
"reason": "5 new public GitHub repos + 12 new subdomains in Certificate Transparency logs",
"whyItMatters": "New public repos co-occurring with a subdomain burst is a strong product-launch signal — repo for the SDK, subdomain for the service.",
"recommendedAction": "Track the launch — read the new repos AND visit the new subdomains for the live product.",
"evidence": ["5 new public GitHub repos: stripe-cli-v2, payments-rs, fraud-rules-dsl…", "12 new subdomains in Certificate Transparency logs"],
"timeToImpact": "days"
}
]
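
A minimal consumption sketch in Python, assuming report is one company-report dataset item shaped like the example above. The escalate function is a hypothetical hook for your own pipeline:

top = (report.get("priorities") or [None])[0]
if top:
    print(f"[{top['severity']}] {top['headline']}")
    print(f"Next step: {top['recommendedAction']}")
    # escalate() is a hypothetical stand-in for your own escalation path
    if top["severity"] == "high" and top.get("timeToImpact") == "days":
        escalate(top)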

Triggers — automation-ready booleans

The triggers object precomputes 11 booleans for downstream routing. Filter with WHERE triggers.X = true instead of parsing prose. Drop-in for Zapier / Make / Slack / agent tool calls.

"triggers": {
"highSeverityEvents": true,
"possibleAcquisition": false,
"productSignals": true,
"infraMigration": false,
"emailInfraChange": false,
"brandRefresh": false,
"communityTraction": true,
"securityRiskHigh": false,
"rapidGrowth": true,
"dormancy": false,
"needsHumanReview": true
}
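
A minimal routing sketch in Python under the same assumption (report is one dataset item). The notify_slack and open_review_task functions are hypothetical stand-ins for your own integrations:

triggers = report.get("triggers", {})

# Route on precomputed booleans instead of parsing prose
if triggers.get("securityRiskHigh"):
    notify_slack("#security", report["summary"]["headline"])
if triggers.get("possibleAcquisition") or triggers.get("needsHumanReview"):
    open_review_task(report["domain"], report["tldr"]["oneSentence"])
if triggers.get("rapidGrowth") and not triggers.get("dormancy"):
    notify_slack("#pipeline", f"{report['companyName']} is accelerating")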

Output profiles — same data, different consumers

Pick outputProfile:

  • analyst (default) — Full intelligence record. ~10–50KB per record.
  • executive — Decision layer + thin module pointers. Strips verbose subtrees (full subdomain lists, full HN stories, full repo metadata) — keeps everything you need for a Slack alert or dashboard tile. ~3–10KB per record.
  • raw — Modules only, no decision layer. Backward-compatible mode for users who want pure data and will compute their own intelligence on top.

Note: the SUMMARY KV record always contains the full headline summary regardless of profile.
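
A minimal sketch of selecting the lighter profile in the run input (Python; the actor ID and client usage mirror the SUMMARY KV example later in this README):

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("ryanclinton/company-deep-research").call(
    run_input={
        "domain": "stripe.com",
        # "analyst" (default) | "executive" | "raw"
        "outputProfile": "executive",
    }
)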

Portfolio mode — cross-company prioritisation

Pass portfolioId: "my-watchlist-name" and the actor maintains a per-user named key-value store of every company you've researched under that label. Each subsequent run emits relative intelligence on top of the absolute intelligence:

"portfolioContext": {
"portfolioId": "fintech-watchlist-2026",
"portfolioSize": 47,
"rank": "3/47",
"rankBasis": "maximum alert score across events / correlations / anomalies",
"percentile": 0.96,
"outlier": true,
"rarity": "Uncommon — top 4% of portfolio",
"reason": "Stands out: top 4% of the portfolio by alert intensity; flagged for human review; top priority is high-severity (correlation:product-launch).",
"portfolioMedians": { "technicalMaturityScore": 0.62, "securityPostureScore": 0.55, "subdomainCount": 18, "githubStars": 240 }
}
"feed": {
"portfolioId": "fintech-watchlist-2026",
"rollingAlerts": [
{ "detectedAt": "2026-04-30T12:14:00Z", "domain": "stripe.com", "eventType": "PRODUCT_SIGNAL", "severity": "high", "headline": "5 new public GitHub repos", "alertScore": 0.9 },
{ "detectedAt": "2026-04-29T08:00:00Z", "domain": "checkout.com", "eventType": "INFRA_MIGRATION", "severity": "high", "headline": "Name servers changed", "alertScore": 0.8 }
],
"topMovers": [
{ "domain": "ramp.com", "rationale": "Alert intensity rose from 0.30 → 0.85 (now: PRODUCT_SIGNAL)" }
],
"newEntrants": [
{ "domain": "klarna.com", "addedAt": "2026-04-29T..." }
]
}

This is the difference between "give me a report on Stripe" and "tell me which of my 100 watchlist companies matter most today." The portfolio is the platform.

Comparability mode — benchmark against peers

Pass compareTo: ["domain1.com", "domain2.com"] (max 3) to benchmark. Each peer triggers a separate recursive run and bills its own PPE event ($1.00 per peer). The result is an additional peer-comparison record in the dataset with 8 ranked metrics:

{
  "recordType": "peer-comparison",
  "entityId": "stripe.com|stripe",
  "domain": "stripe.com",
  "comparison": {
    "domain": "stripe.com",
    "peers": ["adyen.com", "checkout.com"],
    "peerErrors": [],
    "metrics": {
      "technicalMaturityScore": { "ours": 0.95, "peers": [{"domain": "adyen.com", "value": 0.85}, {"domain": "checkout.com", "value": 0.78}], "rank": "1/3", "summary": "Highest technical maturity of the 3 compared" },
      "securityPostureScore": { "ours": 0.92, "peers": [...], "rank": "2/3", "summary": "Higher security posture than 1/2 peers" },
      "infraComplexity": { "ours": "high", "peers": [...], "rank": "1/3", "summary": "infra complexity: matches all peers (high)" }
    },
    "headline": "Stripe stands out: Highest technical maturity of the 3 compared.",
    "distinctSignals": ["Highest technical maturity of the 3 compared", "Higher security posture than 1/2 peers"]
  }
}

Use this for dashboards, RFP-prep, or competitive deep-dives. The recursive runs are bounded — peers don't recurse further (their compareTo is forced empty).
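
A minimal sketch of that flow in Python, using the same actor ID and client usage as the API examples elsewhere in this README:

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
# Note: each peer bills its own PPE event ($1.00 per peer)
run = client.actor("ryanclinton/company-deep-research").call(
    run_input={"domain": "stripe.com", "compareTo": ["adyen.com", "checkout.com"]}
)
items = client.dataset(run["defaultDatasetId"]).list_items().items
comparison = next(i for i in items if i.get("recordType") == "peer-comparison")
print(comparison["comparison"]["headline"])
for signal in comparison["comparison"]["distinctSignals"]:
    print(f" - {signal}")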

Monitoring mode (scheduled runs)

Schedule this actor on a domain you care about — daily, weekly, monthly — and every run after the first returns a populated diff field. The first run for a domain saves a snapshot to the actor's key-value store; the second run loads that snapshot, computes the differences, and emits them under diff.

This is the difference between a one-shot company-research tool and a competitive-intelligence monitoring product. Examples of what diff surfaces:

  • newSecFilings — new 10-K, 10-Q, or 8-K filed since last run (M&A, earnings, material events)
  • newGithubRepos — new public repo published (product launches, new SDK)
  • newSubdomains — new subdomain in Certificate Transparency logs (acquisitions, new internal tools, staging environments standing up)
  • newTxtRecords — new verification token (Slack workspace, Google site verification, Okta, ad network)
  • nameServersChanged / mxRecordsChanged — infra migration, M&A signal, email provider change
  • homepageTitleChanged / homepageDescriptionChanged — rebrand, pivot, messaging shift

The status message at the end of a monitoring run reads: "Done. 9 sources returned data: … | Δ since last run: 1 new filing, 2 new repos, 14 new subdomains | PPE charge: $1.00".
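
A minimal alerting sketch for a scheduled run in Python. It assumes report is the run's dataset item; diff is null on the first run for a domain, so guard for that:

diff = report.get("diff")
if diff:
    changes = []
    if diff.get("newSecFilings"):
        changes.append(f"{len(diff['newSecFilings'])} new SEC filing(s)")
    if diff.get("newGithubRepos"):
        changes.append(f"{len(diff['newGithubRepos'])} new repo(s)")
    if diff.get("newSubdomains"):
        changes.append(f"{len(diff['newSubdomains'])} new subdomain(s)")
    if diff.get("nameServersChanged"):
        changes.append("name servers changed (possible infra migration)")
    if changes:
        print(f"Δ {report['domain']} since {diff['since']}: " + ", ".join(changes))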

Use Cases

  • Sales & BD preparing company briefs before outbound — identify tech stack, e-commerce platform, payment processor, filings status, and social channels to personalize outreach
  • Competitive intelligence — pull website + GitHub + SEC + Wayback + status-page + changelog + subdomain growth into one report; schedule it weekly to track competitor cadence
  • VC & PE researchers — assess public-market presence, open-source footprint, npm + Docker distribution, and academic citations on prospective investments; schedule on portfolio companies for change alerts
  • Journalists & investigators — Wikipedia summary, SEC filings, DNS records, social presence, and Hacker News mentions in seconds — usable directly in stories
  • M&A due diligence — preliminary technical + public-records checks on acquisition targets, including subdomain enumeration for asset inventory
  • Marketing strategists — audit a brand's digital footprint across social, tech stack, and operational-transparency surfaces (status page, changelog, security.txt)
  • Security & SRE teams — Certificate Transparency subdomain enumeration + DMARC posture + security-headers + security.txt presence, all in one pass; schedule for cert-rotation and infra-change alerts
  • DevRel & developer marketers — track Hacker News momentum, GitHub stars, and changelog updates over time on competitor and partner products

How to Use the API

You can call this actor programmatically from any language.

Python

import requests
import time

# Start the run
run = requests.post(
    "https://api.apify.com/v2/acts/ryanclinton~company-deep-research/runs",
    params={"token": "YOUR_APIFY_TOKEN"},
    json={
        "domain": "stripe.com",
        "includeFinancials": True,
        "includeResearch": True,
        "includeGithub": True,
        "enableMonitoring": True,
        "maxResults": 20,
    },
    timeout=30,
).json()
run_id = run["data"]["id"]

# Poll until the run reaches a terminal state
while True:
    status = requests.get(
        f"https://api.apify.com/v2/actor-runs/{run_id}",
        params={"token": "YOUR_APIFY_TOKEN"},
        timeout=10,
    ).json()
    if status["data"]["status"] in ("SUCCEEDED", "FAILED", "ABORTED"):
        break
    time.sleep(5)

# Fetch the dataset items the run produced
dataset_id = status["data"]["defaultDatasetId"]
items = requests.get(
    f"https://api.apify.com/v2/datasets/{dataset_id}/items",
    params={"token": "YOUR_APIFY_TOKEN"},
    timeout=30,
).json()

report = items[0]
print(report["summary"]["headline"])
for line in report["summary"]["keyTakeaways"]:
    print(f" - {line}")
print(f"\nConfidence: {report['summary']['confidence']['explanation']}")

JavaScript

const response = await fetch(
  "https://api.apify.com/v2/acts/ryanclinton~company-deep-research/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      domain: "stripe.com",
      enableMonitoring: true,
      maxResults: 20,
    }),
  }
);
const [report] = await response.json();
console.log(report.summary.headline);
report.summary.keyTakeaways.forEach((line) => console.log(` - ${line}`));
console.log(`Confidence: ${report.summary.confidence.explanation}`);

cURL

curl -X POST "https://api.apify.com/v2/acts/ryanclinton~company-deep-research/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "stripe.com",
    "enableMonitoring": true,
    "maxResults": 20
  }'

Reading the SUMMARY KV record

For orchestrators using Actor.call, the run's key-value store also contains a lightweight SUMMARY record so you don't need to paginate the dataset:

from apify_client import ApifyClient
client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("ryanclinton/company-deep-research").call(run_input={"domain": "stripe.com"})
summary = client.key_value_store(run["defaultKeyValueStoreId"]).get_record("SUMMARY")["value"]
print(summary["confidence"]["level"], summary["sourcesWithData"], "of", summary["sourcesAttempted"])

How It Works

Input (domain, optional companyName, module toggles)
Phase A — Website (sequential, gives us companyName + raw HTML for downstream parsing)
│ Fetch HTML + response headers, extract title / og:* / favicon / social links,
│ detect tech stack from in-actor signatures (CMS / framework / analytics / CDN /
│ e-commerce / payment processors / ad pixels / fonts / security headers)
Phase B — All independent modules in parallel (Promise.all)
├── Wikipedia — direct page summary then search fallback
├── GitHub — try 3 org-name guesses, top repos by stars,
│ language breakdown, npm scope packages, Docker Hub org
├── SEC EDGAR — EFTS full-text search + atom company search +
│ data.sec.gov submissions enrichment (exchange, SIC, address)
├── OpenAlex — citation-sorted academic papers
├── DNS — A/AAAA/MX/TXT/NS/CAA + _dmarc lookup +
│ SPF/DMARC/DKIM parse + email-provider classification
├── Subdomains — Certificate Transparency log enumeration via crt.sh
├── Infrastructure — Wayback first-seen + 10 well-known-file probes
│ (security.txt, ai.txt, llms.txt, robots, sitemap, RSS,
│ status, changelog, pricing, careers)
├── Community — Hacker News mention count + top stories (Algolia HN API)
└── Social Media — 6 platforms, prefers website-discovered links over slug guesses
Phase C — Compile report, build summary hero block (deterministic synthesis)
Phase D — On enableMonitoring: load prior snapshot, compute diff, save current snapshot
Push to dataset (one record), save lightweight SUMMARY to KV store, charge PPE if data found

Data sources

Step | Source | API Used | Auth Required
1 | Company website | Direct HTTPS fetch + HTML parsing + in-actor tech-stack signatures | No
2 | Wikipedia | REST API (/api/rest_v1/page/summary) + search API | No
3 | GitHub | REST API (/orgs/{name}, /orgs/{name}/repos) + search fallback | Optional token (60 → 5,000 req/hr)
3a | npm | registry.npmjs.com/-/v1/search?text=scope:{org} | No
3b | Docker Hub | hub.docker.com/v2/repositories/{org}/ | No
4 | SEC EDGAR | EFTS search + browse-edgar atom + data.sec.gov/submissions/CIK{padded}.json | No
5 | OpenAlex | REST API (/works?search=) | No
6 | DNS | Node.js dns.promises (resolve4/6, resolveMx/Txt/Ns/Caa) + _dmarc.{domain} lookup | No
7 | Social Media | HTTP GET to profile URLs | No
8 | Subdomains | Certificate Transparency logs via crt.sh JSON API | No
9 | Infrastructure | Direct fetches to /.well-known/security.txt, /ai.txt, /llms.txt, /robots.txt, /sitemap.xml, status.{domain}, /changelog, etc. + Internet Archive CDX | No
10 | Hacker News | Algolia HN Search API (hn.algolia.com/api/v1/search) | No

How much does it cost?

This actor is priced at $1 per company researched under Pay-Per-Event (PPE). You are only charged when at least one source returns data — runs against parked, unreachable, or invalid domains are not billed.

The actor uses 512 MB of memory (default) and completes in 15–45 seconds for most domains. Phase B runs all 9 independent modules in parallel, so total run time is bounded by the slowest single API rather than sequential summation.

Plan | Monthly Cost | Included PPE budget | Approx. companies researched
Free | $0 | $5 (built-in) | ~5
Personal | $49/month | $49 included | ~49
Team | $499/month | $499 included | ~499

Apify platform compute (memory-seconds) is billed separately by Apify and is typically a few cents per run.

Tips

  • Provide companyName explicitly for companies whose website title is a tagline. This dramatically improves accuracy across Wikipedia, SEC, GitHub, and Hacker News.
  • Schedule on a domain you care about to unlock the diff field — the second run onwards returns what changed since last time. This converts the actor from one-shot research into a competitive-intel monitoring product.
  • Disable unused modules to cut run time. If you only need website + tech stack + DNS + social, turn off SEC, research, GitHub, subdomains, infrastructure, and community.
  • Use a GitHub token when researching multiple companies in a batch. Without one, GitHub allows 60 unauthenticated requests/hour. A free personal access token raises this to 5,000/hour.
  • Combine with other actors — feed the SEC CIK number into the SEC EDGAR Filing Analyzer, or pass the domain into Website Tech Stack Detector for a deeper Wappalyzer-grade fingerprint.
  • Batch process company lists by calling this actor via the Apify API in a loop. Each run is independent, so you can research hundreds of companies in parallel (see the sketch below).
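
A minimal sketch of that batch pattern in Python: one run per domain across parallel threads, collecting each run's first dataset item. The domain list and the githubToken placeholder are illustrative assumptions (the token input is the one described in the tip above):

from concurrent.futures import ThreadPoolExecutor
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
domains = ["stripe.com", "adyen.com", "checkout.com"]

def research(domain: str) -> dict:
    # Each run is independent and bills its own PPE event
    run = client.actor("ryanclinton/company-deep-research").call(
        run_input={"domain": domain, "githubToken": "ghp_YOUR_TOKEN"}
    )
    return client.dataset(run["defaultDatasetId"]).list_items().items[0]

with ThreadPoolExecutor(max_workers=5) as pool:
    for report in pool.map(research, domains):
        print(report["summary"]["headline"])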

Limitations

  • Company name detection depends on website title — sites with tagline-only titles (e.g., "Build the Future") will produce poor search results across Wikipedia, SEC, GitHub, and Hacker News unless you provide companyName manually.
  • SEC EDGAR is US-only — the financials module only finds companies that file with the US Securities and Exchange Commission.
  • GitHub org matching is heuristic — the actor tries 3 name guesses (domain base, lowercased company name, dashed company name) plus a search fallback. Companies with GitHub org names that differ significantly may not be found.
  • npm + Docker Hub probes assume the org name matches the GitHub org — works well for stripe, airbnb, vercel; doesn't work when the company uses a different naming convention on each platform.
  • Tech stack detection covers ~60 high-value signatures — Cloudflare, Next.js, Stripe, Shopify, GA4, Segment, etc. It is not a full Wappalyzer replacement (Wappalyzer covers 3,000+ signatures). For deep technographics, combine with the Website Tech Stack Detector.
  • Subdomain enumeration depends on crt.sh — only finds subdomains that have ever been issued a public TLS certificate. Internal-only subdomains, subdomains using wildcard certs, and subdomains using private CAs are not visible.
  • Wikipedia search may match wrong entity — common company names (e.g., "Apple") may match the article for a different entity. Providing the full company name helps.
  • Hacker News mentions are name-based — common company names will produce false positives in the mention count. Top stories are usually correct.
  • USPTO patents are off by default — the USPTO PatentsView API was retired in August 2024 and the replacement requires an API key. To keep this actor truly key-free, patents are returned as {found: false, count: 0} rather than gating behind a key requirement.
  • diff field is empty on the first run for a domain — the actor saves a snapshot at the end of the first run; the second run is when comparison kicks in.

What this actor does NOT do

To set expectations honestly:

  • It does NOT replace Clearbit, ZoomInfo, Apollo, or PitchBook. Those are licensed commercial enrichment APIs with employee counts, revenue ranges, technographic confidence scores, intent signals, and verified contact data. This actor uses official + open public sources only.
  • It does NOT find verified personal email addresses. For email enrichment use Hunter.io, Apollo.io, or the Person Enrichment Lookup actor (which wraps People Data Labs).
  • It does NOT scrape LinkedIn employee profiles or LinkedIn employee counts. LinkedIn is anti-scraping; for LinkedIn data use a dedicated LinkedIn scraper actor.
  • It does NOT pull website traffic data. SimilarWeb / Semrush / Ahrefs are paid for a reason — public surfaces don't expose this.
  • It does NOT compute a comprehensive tech-stack fingerprint. It uses ~60 high-value signatures over the homepage HTML; for full Wappalyzer-grade analysis (3,000+ signatures, login-walled SPA support, JS execution) use the Website Tech Stack Detector actor.
  • It does NOT classify the company by NAICS / SIC beyond what the SEC publishes. Public companies get the SEC SIC code; private companies do not.
  • It does NOT score the company for sales fit, lead quality, or risk. It returns structured facts; you apply your scoring on top.

Combine with other Apify actors

Actor | How to combine
Lead Enrichment Pipeline | Company research is built into step 4 of this pipeline — use the pipeline for full lead enrichment with email, phone, scoring
Person Enrichment Lookup | Pair with this actor for company-level intel: email + verified contact for individuals at the company
Website Contact Scraper | Scrape contacts first, then run this actor on the domains to enrich with company intel
Website Tech Stack Detector | Deep Wappalyzer-grade tech-stack fingerprinting beyond the in-actor 60 signatures
SEC EDGAR Filing Analyzer | Feed the CIK from this actor's output for deep SEC filing analysis

Responsible Use

  • All data is from public sources — Wikipedia (Creative Commons), SEC (public domain), OpenAlex (open access), GitHub (public API), DNS (public records), crt.sh (public Certificate Transparency logs), Wayback Machine (public archive), Hacker News (public via Algolia API).
  • Respect GitHub rate limits — use a personal access token when running batch queries to avoid the 60-req/hr unauthenticated limit.
  • Comply with SEC EDGAR fair use policy — the actor includes a descriptive User-Agent string with a real contact email. Avoid excessive request volumes.
  • Use for legitimate business research — sales intelligence, competitive analysis, due diligence, journalism, security research.

FAQ

Is this actor free to use? Not quite: the PPE price is $1.00 per company researched, and you're only charged when at least one of the 14+ sources returns data. Failed / parked / unreachable domains are not billed, and the Free plan's built-in $5 PPE budget covers roughly 5 companies.

Does it work for non-US companies? Yes. Website analysis, tech stack, Wikipedi