Company Deep Research — SEC, GitHub, DNS & Social
Pricing
from $1,000.00 / 1,000 companies researched
Generate comprehensive company research reports from 7+ sources: SEC filings, stock data, Wikipedia, GitHub, Trustpilot reviews, DNS records, and social media verification. One domain in, full intelligence report out.
Rating: 0.0 (0)
Developer: Ryan Clinton
Actor stats: 2 bookmarked · 16 total users · 9 monthly active users · last modified 4 days ago
Company Deep Research Agent
Most tools give you company data. This one tells you exactly what to do next.
This is what a company intelligence system should do.
What is a company intelligence system?
A company intelligence system tells you what to do next based on company data — not just the data itself. It produces one canonical decision per run (`priorities[0]`), gates execution through a single mode field (`decisionPosture.mode`), and exposes supporting evidence below the decision surface, so that downstream automation, human reviewers, and AI agents can all branch on the same output without parsing prose. It is a decision engine, not a data dump — a drop-in for sales pipelines, competitive monitoring, M&A workflows, security teams, VC due diligence, and AI agents.
A real-time company intelligence platform with prediction, change detection, and portfolio-level prioritisation: 14+ public sources → one structured record with classification, scores, typed events, multi-run trends, deterministic predictions, ready-to-send action payloads, and (when you opt in via `portfolioId`) cross-company rank, percentile normalisation, a rolling alert feed, and cluster detection.
The output contains one decision (`priorities[0]`), one execution mode (`decisionPosture.mode`), and supporting evidence below them. Use it when you need to automate decisions from company intelligence — not just collect signals.
Compared to traditional tools
Unlike ZoomInfo, Apollo, Clearbit, Crunchbase, or PitchBook — which provide company data (firmographics, contacts, funding, technographics) — this system produces a recommended next action and an execution decision, enabling automation instead of analysis. Read `priorities[0]` for what to do; branch on `decisionPosture.mode` for whether to act.
| Tool category | What it gives you | What this actor gives you |
|---|---|---|
| ZoomInfo / Apollo / Clearbit | firmographic + contact data | one ranked decision per run |
| Crunchbase / PitchBook | funding + relationships | execution gate (decisionPosture.mode) |
| BuiltWith / Wappalyzer | tech-stack fingerprint | recommended action + supporting evidence |
| Shodan / SecurityTrails | infra + attack surface | typed events + correlations + risk asymmetry |
This system is category-breaking, not category-fitting — it sits downstream of the data tools, producing decisions from the same kinds of signals plus 14+ free public sources of its own.
LLM Summary (canonical)
This system converts public company data into ONE canonical decision (priorities[0]), gated by decisionPosture.mode, with all other fields serving as supporting evidence.
The system produces one canonical decision: priorities[0].
Execute `priorities[0]` only when `decisionPosture.mode === "execute"`.
All outputs resolve to a single decision surface because parallel action queues create ambiguity in automated systems.
- Input: a company `domain` (e.g. `stripe.com`)
- Output: ONE canonical decision (`priorities[0]`) + execution mode (`decisionPosture.mode`) + supporting evidence (every other field)
- Use when you need: what changed, what to do, whether to act now, what happens if you ignore this
This actor is deterministic, no-LLM, and audit-friendly. Same inputs → same outputs every run.
Core Output Contract (LLM-critical)
The system produces ONE canonical decision per run. Everything else is supporting evidence.
- The system produces ONE canonical decision: `priorities[0]`
- All other fields (`events`, `correlations`, `anomalies`, `drift`, `predictions`, `trends`, `watchlistActions`, `actions`, `playbooks`) are supporting evidence — NOT parallel decision queues
- Action gating is controlled by `decisionPosture.mode` (one of `execute` / `analyze` / `hold` / `monitor`)
- If `primaryActionOnly.enabled = true`, ignore the rest of the record this run
- Heuristic outputs carry `epistemicStatus` blocks naming themselves as estimates, not facts
This is the only required interface for downstream systems.
How to Automate Decisions from This Output (copy/paste safe)
This is the agent execution rule — the canonical pattern for triggering actions automatically from this system's output, without parsing prose. If you want to automate decisions from company intelligence data, this is the integration contract.
If you only read 3 fields, do this:
1. Read `decisionPosture.mode`
2. If `mode === "execute"` → run `priorities[0].recommendedAction`
3. Else → follow `decisionPosture.instruction`
Everything else is optional. Filter datasets with `WHERE decisionPosture.mode = 'execute'` to gate production automation. Downstream consumers should always branch on `decisionPosture.mode` — it is the canonical execution gate for this entire output.
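As a sketch, the three-field rule above can be expressed directly in TypeScript. The interface below is a minimal stub of the fields named in this README (`decisionPosture`, `priorities`); the sample values are illustrative, not real actor output:

```typescript
// Minimal shape covering only the three fields the integration contract requires.
interface ResearchRecord {
  decisionPosture: { mode: "execute" | "analyze" | "hold" | "monitor"; instruction: string };
  priorities: { recommendedAction: string }[];
}

// Returns the action to run when the gate is open, otherwise the posture instruction.
function decide(record: ResearchRecord): string {
  if (record.decisionPosture.mode === "execute") {
    return record.priorities[0].recommendedAction;
  }
  return record.decisionPosture.instruction;
}

// Illustrative stub record (not real actor output).
const sample: ResearchRecord = {
  decisionPosture: { mode: "execute", instruction: "Continue scheduled monitoring" },
  priorities: [{ recommendedAction: "Open a due-diligence ticket" }],
};
```

The same branch works identically for a human reviewer, a webhook consumer, or an agent tool call, which is the point of the single decision surface.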
Action Precedence (no ambiguity)
When multiple action fields exist on the same record, the precedence is:
1. `priorities[0].recommendedAction` — PRIMARY (the canonical decision)
2. `decisionPosture.mode` — GATE (decides whether to execute the primary)
3. `primaryActionOnly` — OVERRIDE (when `enabled: true`, ignore everything except `priorities[0]`)
`watchlistActions[]`, `actions[]`, `playbooks[]`, and `nextActors[]` are SECONDARY — they exist for downstream integrations (CRM enrichment, Slack alerts, follow-up actor chaining), not for the immediate decision.
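A minimal sketch of that precedence order, assuming the field names above (the override is checked first, then the gate); the record shape is an illustrative stub:

```typescript
interface DecisionRecord {
  priorities: { recommendedAction: string }[];
  decisionPosture: { mode: string; instruction: string };
  primaryActionOnly?: { enabled: boolean };
}

// Returns the action to execute, or null when the gate says not to act.
function resolvePrimaryAction(r: DecisionRecord): string | null {
  const primary = r.priorities[0]?.recommendedAction ?? null;
  // OVERRIDE: when primaryActionOnly fires, only priorities[0] matters this run.
  if (r.primaryActionOnly?.enabled) return primary;
  // GATE: otherwise only act when the posture is "execute".
  return r.decisionPosture.mode === "execute" ? primary : null;
}
```

Note that the secondary arrays never enter this function at all; they are routed separately by integrations.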
Execution Model (critical)
decisionPosture.mode is the single field that determines whether any action should be taken.
The system does NOT assume action — it enforces a gated execution model where all decisions must pass through decisionPosture.mode. Production automation should branch on decisionPosture.mode as the single source of truth for whether to act.
When signals conflict, the system sets decisionPosture.mode = "hold" and blocks action until contradictions are resolved.
All outputs are filtered through decisionPosture.mode:
- `execute` — act now. The system has converged on a high-conviction recommendation; analysis time has passed.
- `analyze` — investigate further. A priority exists but conditions don't strongly favour either immediate execution or hold.
- `hold` — do nothing. Multiple contradictions / low signal independence / misaligned time horizons / 3+ uncertainty areas detected. Resolve flagged uncertainties first.
- `monitor` — no action required. Continue scheduled monitoring at standard cadence.
This is the final control layer. Production automation should ONLY act when mode === 'execute'.
Field Priority for AI Consumers
Tier 1 — always read (the decision):
`priorities[0]`, `decisionPosture`, `primaryActionOnly`
Tier 2 — context (why this decision now):
`whyNow`, `timeToRegret`, `inactionOutcome`, `decisionRisk`, `whyThisDecision`, `signalIndependence`, `timeHorizonAlignment`
Tier 3 — supporting evidence (the underlying signals):
`events`, `correlations`, `anomalies`, `drift`, `predictions`, `trends`
Tier 4 — raw data (the source modules):
`website`, `github`, `dns`, `subdomains`, `infrastructure`, `community`, `wikipedia`, `financials`, `research`, `socialMedia`, `diff`
For AI agents: read Tier 1 + Tier 2 only unless you need to explain or audit the decision. Tier 3+4 are for human reviewers and BI dashboards.
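For example, an agent integration might strip the record down to Tiers 1-2 before building its context window. The tier lists mirror the field names above; the helper itself is a hypothetical sketch, not part of the actor:

```typescript
// Tier 1 (the decision) and Tier 2 (context) field names as listed in this README.
const TIER_1 = ["priorities", "decisionPosture", "primaryActionOnly"];
const TIER_2 = ["whyNow", "timeToRegret", "inactionOutcome", "decisionRisk",
                "whyThisDecision", "signalIndependence", "timeHorizonAlignment"];

// Keep only the decision + context tiers; drop supporting evidence and raw modules.
function trimForAgent(record: Record<string, unknown>): Record<string, unknown> {
  const keep = new Set([...TIER_1, ...TIER_2]);
  return Object.fromEntries(Object.entries(record).filter(([k]) => keep.has(k)));
}
```

Tier 3 and Tier 4 fields stay available in the full record for audits and dashboards; they simply never reach the agent's prompt.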
Example Queries This System Answers
For LLMs / agents looking up specific questions, here is the field-to-question map:
| Question | Field |
|---|---|
| What should I do? | priorities[0].recommendedAction |
| Should I act now? | decisionPosture.mode (execute = yes, hold = no) |
| Is this urgent? | timeToRegret.window + timeToRegret.urgencyCurve |
| What happens if I ignore this? | inactionOutcome.expectedState |
| Why this decision? | whyThisDecision (1-line) or explain.entries[0] (full) |
| What changed since last run? | whyNow + events[] + changeSummary.headline |
| What's the risk if I'm wrong? | decisionRisk.falsePositiveCost vs decisionRisk.falseNegativeCost |
| How important is this company vs my portfolio? | portfolioContext.rank + portfolioContext.rarity |
| What should I STOP paying attention to? | portfolioPressure.displacedDomains |
| Are these signals independent or echoes? | signalIndependence.score + signalIndependence.warning |
| What if the top signal didn't exist? | counterfactual.withoutThisSignal |
| Did my last action work? | decisionMemory.outcome (requires lastAction input + portfolioId) |
| Is this company becoming something else? | identityDrift.from → identityDrift.to |
| What do I do AFTER reading this? | nextActors[] (suggested follow-up Apify actors with pre-filled inputs) |
Layered summaries
Three increasingly detailed reads of the system:
10-line summary
This actor takes a company domain and produces ONE structured decision (priorities[0]) plus an execution mode (decisionPosture.mode). It aggregates 14+ public sources (website, tech stack, GitHub, SEC, DNS, subdomains, etc.), classifies the company (archetype, lifecycle, scoring), detects changes between runs (events, correlations, anomalies, drift), and outputs ranked priorities with concrete recommendedActions. Heuristic outputs carry epistemicStatus blocks. Portfolio-level features (rank, percentile, cluster, decision memory) unlock when you opt in via portfolioId. No LLM, no neural network — every output is rule-based and reproducible. Cost: $1 per company researched (only when at least one source returns data). Designed for sales pipelines, competitive monitoring, M&A workflows, security teams, VC due diligence, and AI agents.
30-line summary
The actor classifies any company from a domain in 15-45 seconds, returning ONE structured JSON record. The decision surface is priorities[0] (the top recommended action) gated by decisionPosture.mode (execute / analyze / hold / monitor). Everything else is supporting evidence.
The 14+ data sources cover: website + in-actor tech-stack signatures, Wikipedia, GitHub (org + repos + npm scope + Docker Hub org), SEC EDGAR (filings + ticker + exchange + SIC + address), OpenAlex academic papers, DNS (A/MX/TXT/NS/CAA + SPF/DMARC/DKIM + email-provider classification), Certificate Transparency subdomain enumeration with classification, well-known files (security.txt / ai.txt / llms.txt / robots / sitemap / RSS / status / changelog / pricing / careers), Wayback first-seen, Hacker News mentions, and social-media verification across 6 platforms.
On top of the raw data, the actor computes a deterministic decision layer: intelligence (companyType, archetype, scores, growth/risk signals), lifecycle (nascent / growing / scaling / mature / declining / dormant), trajectory (accelerating / stable / declining + velocity), predictions[] (forward-looking rule-based forecasts), priorities[] (ranked decision queue), decisionRisk (FP cost vs FN cost + reversibility + actEvenIfUnsure), timeToRegret (when does NOT acting become a mistake), inactionOutcome (what happens if you do nothing), counterfactual (what would the priority be without the top signal), signalIndependence (3 signals or 1 echoed 3 times?), decisionPosture (execute/analyze/hold/monitor mode), and 35+ other fields.
Schedule the actor on a domain to unlock change-detection: events[] (typed classification of changes), trends (30d/90d deltas), anomalies[] (z-score outliers from snapshot history), correlations[] (compound patterns: product-launch / acquisition / wind-down / etc.), drift[] (pattern-change detection), decisionMemory (outcome inference for prior actions you took).
When you opt in via portfolioId, the actor maintains a per-user named KV store of every company researched under that label, then computes cross-company features: portfolioContext (rank/percentile/outlier/rarity), feed (rolling alerts + top movers + new entrants), normalized (percentile scores vs portfolio), cluster (similar companies), portfolioPressure (attention share + displacement vs other entries), identityDrift (is this company becoming something else?), coldStart (bootstrap guidance for portfolios with <4 entries).
The system encodes a deterministic decision philosophy (bias toward action when FN cost > FP cost + reversible; cap actions at top 1-3; prefer correlated signals over isolated anomalies; prefer recent signals over historical trends; surface uncertainty over hiding it; honest abstention over fabrication) and exposes it explicitly in the README + via explain.principles[]. Heuristic outputs carry epistemicStatus blocks naming them as estimates. No LLM. No neural network. No external state across users — Apify per-user named-store sandboxing keeps portfolios isolated.
Full spec
Continue reading for the full field reference, input schema, examples, and use-case framings.
The 10-second read pyramid
The output has 40+ top-level fields. Most users read four:
- `instant.label` — 1-3 word state ("High Growth", "M&A Active", "Wind Down", "Stable", "Launching", "Reorganising", "Declining", "Dormant"). The 1-second read.
- `tldr.oneSentence` — paste-ready Slack subject. The 5-second read.
- `whyNow` — `trigger` + `change` + `importance`. Why this run matters. The 10-second read. (Returns null when nothing notable triggered.)
- `priorities[0]` — the top recommended action with `recommendedAction` (concrete next step) + `evidence` + `timeToImpact`. This is THE canonical decision surface — the events / correlations / anomalies / drift / predictions arrays below are supporting evidence, not parallel decision queues.
If you have one minute, also read priorities[1..4], watchlistActions[], and deltaStory.narrative. Everything else is for downstream consumers, dashboards, audits, and AI agents.
Generate a comprehensive intelligence report on any company from just a domain name. The Company Deep Research Agent aggregates 14+ free public sources — homepage + tech stack signatures, Wikipedia, GitHub (including npm and Docker Hub footprint), SEC EDGAR filings (with ticker, exchange, SIC code, and address), OpenAlex academic papers, DNS infrastructure (with parsed SPF/DMARC and email-provider classification), Certificate Transparency subdomain enumeration with classification, well-known files (security.txt, ai.txt, llms.txt, robots, sitemap, RSS, status, changelog, pricing, careers), Wayback Machine first-seen, Hacker News mentions, and social media verification across 6 platforms — and compiles everything into a single structured JSON record.
It then layers deterministic intelligence on top — companyType + archetype classification, a 0..1 technical-maturity score, a 0..1 security-posture score with per-control issues + strengths, business-model hints, growth signals, risk signals, notable patterns, a competitive-comparison fingerprint, partial-fail explanations, and a stable cross-system entity ID — so the output is decision-grade, not a JSON dump.
Schedule it on a domain and every run after the first returns a typed events[] array classifying changes (CORPORATE_UPDATE / PRODUCT_SIGNAL / INFRA_EXPANSION / INFRA_MIGRATION / EMAIL_INFRA_CHANGE / BRAND_REFRESH / POSSIBLE_ACQUISITION / COMMUNITY_TRACTION) with severity + evidence + plain-English explanation, plus a trends block computing 30d / 90d deltas across subdomains, GitHub repos / stars, SEC filings, and Hacker News mentions from the last 10 snapshots.
No API keys required for the core 14+ sources. Just enter a domain like stripe.com and get back a structured intelligence record in 15–45 seconds.
Why Use Company Deep Research Agent?
Manual company research means visiting two dozen websites, copying data into a spreadsheet, and hoping you didn't miss anything. Buying enterprise tools (Clearbit, ZoomInfo, Apollo, PitchBook) means licensing fees per seat per month for data that, for the firmographic + technographic + infra dimensions, is already public.
Most "company research" actors stop at "data dump." This one goes further — it classifies the company (archetype + companyType + business model), scores it (technical maturity + security posture + open-source strength + infra complexity + operational maturity), synthesizes the signals into a deterministic summary.keyTakeaways[] block, classifies changes between scheduled runs into typed events (PRODUCT_SIGNAL, INFRA_EXPANSION, POSSIBLE_ACQUISITION, CORPORATE_UPDATE, …) with severity + plain-English explanation, and tracks trends across the last 10 snapshots so you see "+47 subdomains in 30d" instead of just a current count.
The result: a JSON record that's safe to drop straight into a Slack alert, an LLM agent's tool call, a sales pipeline, or a competitive-intelligence dashboard — without post-processing.
A pay-per-event price means you only pay when at least one source returns data. Parked domains, unreachable hosts, and invalid inputs are not billed.
What's in the report
Every run returns one record with these top-level fields. The first six are the decision layer — read them first; the rest is the underlying raw data.
Decision layer (computed)
Surface tier — read these first
- `instant` — The 1-second read. `label` (1-3 words: "High Growth", "Wind Down", "Stable", "M&A Active", "Launching", "Reorganising", "Declining", "Dormant", "Unknown") + `confidence` + `state` enum + semantic `color` (green/yellow/red/blue/grey — UI maps to icons; emoji is opt-in elsewhere, not emitted by default). For dashboards, mobile, Slack tiles.
- `tldr` — The 5-second read. `oneSentence` (paste-ready Slack subject), `topRisk`, `topOpportunity`, `needsAttention` boolean.
- `whyNow` — The 10-second read. `trigger` (what fired) + `change` (directional shift) + `importance` (relative-to-portfolio framing) + `severity`. Returns null when no notable trigger fired this run — better than emitting noise. Use this for daily-digest subject lines.
- `story` — The single canonical narrative. Collapses `tldr` + `whyNow` + `deltaStory` + `changeSummary` into ONE coherent block: `now` / `trend` / `decision` / `outlook` (1 line each) + a stitched 2-3 sentence `narrative`. Use when you need a single field that summarises the whole run — for AI agent tool calls, exec emails, daily digests.
- `priorities[]` — THE canonical decision surface. Top 5 ranked decisions, deterministic. Each has `rank` (1 = top), `type`, `severity`, `headline`, `reason`, `whyItMatters`, `recommendedAction` (concrete next step), `evidence[]`, `timeToImpact` (immediate/days/weeks/months). Built from events + correlations + anomalies + lifecycle + securityPosture, weighted by severity and signal type. The events / correlations / anomalies / drift / predictions arrays below this list are supporting evidence — `priorities` is the routable surface.
- `deltaStory` — Temporal compression: `last7d` / `last30d` (7-30d) / `last90d` (30-90d) narratives + a stitched paragraph + `coverage` enum. Read this for "what's been happening" without scrolling raw history.
- `watchlistActions[]` — CRM-workflow-ready actions: `move-to-active-pipeline`, `schedule-weekly-monitoring`, `schedule-daily-monitoring`, `trigger-outreach`, `pause-outreach`, `open-due-diligence-ticket`, `add-to-deal-pipeline`, `remove-from-active-list`, `flag-for-security-review`, `subscribe-to-status-page`, `archive-as-dormant`. Each has type + label + rationale + confidence. Bridges into sales / monitoring / due-diligence pipelines without bespoke routing logic.
- `decisionMemory` — Closes the feedback loop. When you pass a `lastAction: { type, takenAt }` input, the actor stores it in the portfolio (requires `portfolioId`), then on subsequent runs compares the current state vs the snapshot at action time and infers `outcome` (engaged/escalated/no-response/no-change/resolved/too-soon-to-tell) + `effectivenessScore` + `pattern`. Honest disclosure: outcome is inferred from observable signal changes only — the actor cannot directly observe replies, deals, or off-platform engagement.
- `decisionRisk` — Per `priorities[0]`: `falsePositiveCost` + `falseNegativeCost` + `reversibility` + `blastRadius` + `asymmetry` (symmetric / fp-dominated / fn-dominated) + `actEvenIfUnsure` boolean (true when fn-dominated + reversible + low-blast → bias to action). Lets users answer "should I act EVEN IF confidence is low?"
- `timeToRegret` — When does NOT acting become a mistake? Per `priorities[0]`: `window` (e.g. "24-48h" / "7-14 days") + `urgencyCurve` (very-steep / steep / moderate / gradual / flat) + `deadlineHint` (approximate ISO date) + plain-English `reason` + `epistemicStatus` (this is heuristic, not a known event). Encodes regret-avoidance — most decisions are made on fear of missing the window, not severity.
- `inactionOutcome` — Loss-framing complement to `timeToRegret`. What happens if you do NOTHING? `expectedState` + `confidence` + `timeframe` + `reason` + `epistemicStatus`. Humans decide on regret AND loss; this completes the pair.
- `signalIndependence` — Score (0..1) showing whether the events / correlations / anomalies / drift are truly independent or echoes of one underlying change. Catches the "looks like 3 corroborating signals but really 1 underlying delta" trap. Includes `signalCount`, `distinctSourceCount`, `interpretation`, and a `warning` that fires when the score is low.
- `primaryActionOnly` — Schema-level "permission to not scroll" flag. Fires only when conditions are unambiguous (single high-severity priority + steep urgency curve + bias-to-act risk profile). When `enabled: true`, the `instruction` field tells you to do `priorities[0]` only and ignore the rest of the record this run.
- `decisionPosture` — The psychological switch from analysis-mode to execution-mode. `mode` enum: `execute` (4+/5 conditions met — bias-to-act risk + urgency + signal independence + horizon alignment + primaryActionOnly), `hold` (multiple contradictions / low independence / misaligned horizons / 3+ uncertainty areas), `analyze` (priority exists but conditions don't strongly favour either), `monitor` (no actionable priority). Carries `reason` + `instruction` + `confidence`.
- `priorityComputation` — Weight transparency at runtime. `dominantFactors[]` (which signals contributed and at what weight) + `suppressedFactors[]` (which were weighted down and why) + `weightStackVersion` (stable identifier — bumps on rule changes) + `explanation`. When users disagree with priority ranking, this is the audit trail.
- `timeHorizonAlignment` — Catches "this is urgent AND accelerating" misreads when reality is "short-lived spike inside long-term stability". `status` (aligned / misaligned / partial / insufficient-history) + `shortTerm` + `longTerm` + `reason` + `interpretation`.
- `actionGuard` — Tells users what NOT to do. `recommendedMaxActions` (typically 1-3) + `totalActionsAvailable` + `suppressedActions` + `reason` + `rationale`. The system that tells users to stop is the system they trust.
- `identityDrift` — Is this company becoming something else? Compares current archetype + lifecycle vs the previous portfolio entry; emits `from` + `to` + `confidence` + `signals[]` + strategic `implication`. Tracks transformation, not just activity. Requires `portfolioId` + a previous portfolio entry on the same domain.
- `whyThisDecision` — 1-line human-readable rationale for `priorities[0]`. Compressed `explain.entries[0]` for execs / non-engineers / Slack. Mentions whether the priority is correlation-driven, anomaly-driven, or single-event-driven; whether the counterfactual confirms causality; and whether decision-risk asymmetry suggests biasing to / away from action.
- `counterfactual` — Removes the signal driving `priorities[0]` and recomputes the top priority + trajectory + `instant.label`. Output: `droppedSignal` + `withoutThisSignal` + plain-English `interpretation`. Isolates which signal is load-bearing — a sanity check that the recommended action is causally tied to the right evidence, not coincidence.
- `portfolioPressure` — Only when `portfolioId` is set + 4+ entries. Answers "what should I STOP paying attention to?" — the inverse of the standard attention-add framing. `relativeUrgency` (highest-this-week / top-tier / middle / low) + `attentionShare` (0..1 of total portfolio alert intensity) + `displacement` + `displacedDomains[]` + `recommendedFocusShift` boolean.
- `predictions[]` — Forward-looking deterministic predictions: `product-launch-likely` / `acquisition-imminent` / `infra-migration-likely` / `funding-event-likely` / `rebrand-likely` / `security-audit-likely` / `wind-down-likely` / `platform-expansion-likely`. Each carries `confidence` (0..1), `timeframe`, `evidence[]`, `headline`, `rationale`. Pure rules over events + anomalies + correlations + trends — no LLM.
- `trajectory` — Direction (accelerating/steady-growth/stable/decelerating/declining) + velocity (high/medium/low/none) + confidence + plain-English explanation + component deltas. Requires 2+ snapshots.
- `changeSummary` — One-sentence narrative of what changed since last run: `headline` + `direction` + `confidence` + `keyEvents[]`. Paste-ready.
- `triggers` — Precomputed booleans for downstream automation: `highSeverityEvents`, `possibleAcquisition`, `productSignals`, `infraMigration`, `emailInfraChange`, `brandRefresh`, `communityTraction`, `securityRiskHigh`, `rapidGrowth`, `dormancy`, `needsHumanReview`. Filter with `WHERE triggers.X = true` instead of parsing prose.
- `actions[]` — Ready-to-send action payloads for downstream automation: `webhook-payload` (generic JSON), `crm-enrichment-hubspot` (HubSpot Company properties), `slack-block-kit` (pre-formatted Slack message), `jira-issue` (high-severity-only), `email-digest` (subject + body), `csv-row` (flat one-row representation). Drop-in for integrators.
- `nextActors[]` — Suggested follow-up Apify actors with pre-filled inputs: SEC EDGAR Filing Analyzer (when CIK detected), Website Tech Stack Detector (when in-actor detection is incomplete), Person Enrichment Lookup (when sales / careers / B2B SaaS signals present), Lead Enrichment Pipeline, WHOIS Domain Lookup. Turns this actor into the brain of an Apify pipeline.
- `playbooks[]` — Declarative IF-THEN strategy rules that fire on this run: `expansion-phase-engagement`, `wind-down-de-prioritise`, `m-and-a-imminent`, `infra-overhaul-watch`, `security-soft-target`, `product-launch-watch`, `funding-round-watch`, `rebrand-or-pivot-watch`. Each carries triggered conditions + implication + a concrete `recommendedStrategy` + `suggestedCadence`.
- `portfolioContext` — Only when the input `portfolioId` is set. The cross-company importance signal. `rank` (e.g. "3/120"), `percentile`, `outlier` boolean, plain-English `reason`, `portfolioMedians`. Each user's portfolios are isolated by Apify per-user named-store sandboxing.
- `feed` — Only when `portfolioId` is set. Cross-run aggregation across the user's portfolio: `rollingAlerts[]` (last 30, capped at 14 days), `topMovers[]`, `newEntrants[]`. Designed as a daily intelligence feed.
- `normalized` — Only when `portfolioId` is set with 4+ entries. Percentile rank vs portfolio for each scored metric. Solves "is 186 repos a lot?"
- `cluster` — Only when `portfolioId` is set with 3+ entries. Membership in a cluster of portfolio companies sharing the same fingerprint / infra signature: `id`, `basis`, `sizeInPortfolio`, `similarCompanies[]`, `position` (leader/middle/lagger/lone), `rationale`.
- `coldStart` — Only when `portfolioId` is set AND the portfolio has < 4 entries. Bootstrap guidance: `portfolioSize` + `needsMore` + `suggestedSeeds` (5 well-known public companies matching this entity's archetype to add as portfolio seeds). Solves the "new users get a worse product" cold-start problem.
- `decisionQuality` — Meta trust layer. `completeness` (0..1) + `consistency` (0..1) + `contradictions[]` (detected internal inconsistencies, e.g. "high infra complexity but zero open-source presence") + plain-English `summary`.
- `drift[]` — Pattern-change detection beyond per-metric anomalies: `velocity-shift`, `composition-shift`, `attention-shift`. Detects e.g. "GitHub repo growth slowed from +2/run to 0/run" — pattern-level, not point-in-time. Requires 5+ snapshots.
- `explain` — Reasoning-chain exposure for the top decision-layer outputs. Each entry: `target` + `derivedFrom[]` + `rule` + optional `weights`. Plus `principles[]` documenting the actor's reasoning commitments. The audit trail.
- `summary` — Hero synthesis block, deterministic from the rest of the data. Includes:
  - `headline` — one-line title with archetype + signal count
  - `oneLine` — Wikipedia / SEC / homepage one-liner
  - `keyTakeaways[]` — up to 10 scannable bullets (archetype + business model, public-company status, Wikipedia, tech stack, GitHub footprint with activity, distribution adjacencies, security posture composite, subdomain breakdown with infra-complexity context, Wayback first-seen, AI-policy file, Hacker News, operational maturity, trend lines, top monitoring event)
  - `whatToCheck[]` — up to 4 ranked next-step links (latest SEC filing, Wikipedia, GitHub org, status page)
  - `confidence` — `score` (0..1) + `level` (suite-aligned 4-level: `high` ≥ 0.8 / `medium` ≥ 0.6 / `low` ≥ 0.4 / `very-low` < 0.4) + a plain-English `explanation` of why + `dataCoverage` (fraction of attempted sources with data) + `signalStrength` (weighted by which high-value signals landed) + `stability` (from snapshot history)
- `intelligence` — Computed classification + signals:
  - `companyType` — `startup` / `scaleup` / `public` / `enterprise` / `private` / `unknown` (derived from age + GitHub volume + SEC filing presence + subdomain count)
  - `archetype` — `developer-platform` / `saas` / `marketplace` / `fintech` / `ecommerce` / `media` / `agency` / `enterprise-software` / `open-source-foundation` / `consumer-app` / `other` (derived from tech stack + API subdomains + npm/Docker footprint + SIC)
  - `businessModelHints[]` — e.g. `["SaaS or paid product", "Charges via Stripe", "API platform", "SDK distribution"]`
  - `technicalMaturityScore` (0..1) + `technicalMaturityLevel` (low/medium/high) — weighted from infra signals + GitHub footprint + tech stack + operational surfaces
  - `openSourceStrength` — `none` / `low` / `medium` / `high` (from stars + repo count)
  - `infraComplexity` — `low` / `medium` / `high` (from subdomain count)
  - `operationalMaturity` — `low` / `medium` / `high` (status page + changelog + security.txt + DMARC + pricing page)
  - `growthSignals[]` — plain-English: subdomain growth, repo growth, HN momentum, careers page, recent activity
  - `riskSignals[]` — plain-English: security posture issues, dormant GitHub org, infra migration, missing SPF/DMARC
  - `notablePatterns[]` — non-primary-brand subdomains (acquisition signal), modern Vercel/Cloudflare stacks, multi-payment-processor signals, prior renames, AI-policy file presence
- `events[]` — On scheduled re-runs, classifies the raw `diff` into typed events. Each event has `type` + `severity` (low/medium/high) + `evidence` + plain-English `explanation`. Types:
  - `CORPORATE_UPDATE` — new SEC filing (8-K = high; 10-K/Q = medium)
  - `PRODUCT_SIGNAL` — new public GitHub repo
  - `INFRA_EXPANSION` — new subdomains in Certificate Transparency logs
  - `INFRA_MIGRATION` — name servers changed
  - `EMAIL_INFRA_CHANGE` — MX records changed
  - `INTEGRATION_ADDED` / `INTEGRATION_REMOVED` — TXT verification token added/removed (Slack, Google, Okta, ad networks)
  - `BRAND_REFRESH` — homepage title or description changed
  - `POSSIBLE_ACQUISITION` — non-primary-brand subdomain appeared (e.g. `acquired-co.parent.com`)
  - `COMMUNITY_TRACTION` — significant Hacker News uptick
- `trends` — Multi-run deltas computed from snapshot history (last 10 per domain):
  - `subdomains30d` / `subdomains90d` / `githubRepos30d` / `githubStars30d` / `hackerNews30d` / `secFilings30d` — each with `delta`, `pct`, `previousValue`
  - `infraStability` — `stable` / `volatile` / `unknown` (counts NS + MX changes across history)
  - `changeFrequency` — `low` / `medium` / `high` (how often anything changes per snapshot)
  - `sampleCount` + `earliestSampleAt`
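The `delta` / `pct` / `previousValue` triple can be reproduced from a snapshot history roughly like so. This is a sketch: the `Snapshot` shape and the single `subdomains` metric are hypothetical simplifications, not the actor's internal storage format:

```typescript
// Hypothetical snapshot shape: one ISO timestamp + one metric.
interface Snapshot { at: string; subdomains: number }

// 30d delta for the metric: newest value vs the oldest snapshot inside the window.
function delta30d(history: Snapshot[]): { delta: number; pct: number; previousValue: number } | null {
  const sorted = [...history].sort((a, b) => a.at.localeCompare(b.at));
  const latest = sorted[sorted.length - 1];
  if (!latest) return null;
  // Window start: 30 days before the most recent snapshot.
  const cutoff = new Date(new Date(latest.at).getTime() - 30 * 86400_000).toISOString();
  const baseline = sorted.find((s) => s.at >= cutoff);
  if (!baseline || baseline === latest) return null;
  const delta = latest.subdomains - baseline.subdomains;
  const pct = baseline.subdomains === 0 ? 0 : (delta / baseline.subdomains) * 100;
  return { delta, pct, previousValue: baseline.subdomains };
}
```

Run it on two snapshots (100 → 147 subdomains, 10 days apart) and you get the "+47 subdomains in 30d" framing the README describes, instead of a bare current count.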
- `securityPosture` — Composite security score (0..1) + `level` (low/medium/high) + per-control `issues[]` and `strengths[]`. Weights: DMARC reject (0.20) / quarantine (0.15) / SPF (0.10) / CAA (0.05) / HSTS (0.15) / CSP (0.15) / X-Frame-Options (0.05) / X-Content-Type-Options (0.05) / Referrer-Policy (0.05) / Permissions-Policy (0.05) / security.txt (0.15).
- `fingerprint` — Hashes for clustering / dedup / competitive comparison: `techStackHash`, `infraSignature`, `orgSignature`, `securityHeadersHash`. Sort companies into clusters in your downstream BI.
- `lifecycle` — Company stage detection: `nascent` / `growing` / `scaling` / `mature` / `declining` / `dormant` / `unknown`, with `confidence` and supporting `signals[]`. Derived from age + GitHub growth + careers presence + recent activity + trend deltas.
- `scoring` — Signal weight transparency. Per-score breakdown of which factors fired, their weights, and their actual contribution. Covers `technicalMaturity` (12 factors), `securityPosture` (10 factors), `operationalMaturity` (5 factors). Use to audit / explain / re-weight scores downstream — the math is visible.
- `correlations[]` — Compound patterns detected across the events array. Pattern enum: `product-launch` / `infra-migration` / `acquisition` / `pivot` / `wind-down` / `security-overhaul` / `funding-event` / `rebrand`. Each carries `confidence` (0..1), `evidence[]`, and `explanation`.
- `anomalies[]` — Z-score-based statistical outliers from snapshot history (requires 4+ prior runs). Types: `subdomain-spike` / `subdomain-drop` / `github-burst` / `github-stall` / `hn-spike` / `sec-filing-cluster`. Each carries `detail` (current vs baseline mean), `interpretation`, `severity`, `zScore`. Lets you flag "+80 subdomains in 7 days" without writing thresholding logic.
- `views` — Same data, four audience framings. Each contains `angle` (one-line), `hooks[]` (why this audience cares), `risks[]` (what might disqualify), `nextSteps[]`:
  - `views.sales` — angle for SDR/BDR outreach (tech, payments, hiring, growth)
  - `views.security` — attack surface, posture, missing controls, top remediations
  - `views.investor` — stage, public/private status, growth indicators, financial signals
  - `views.engineering` — tech stack, dev activity, hiring signals, opportunities (changelog/RSS to subscribe)
- `graph` — Treat this domain as a node in a network: `relatedCompanies[]` (suspected sub-brands / acquisitions derived from notable subdomains, with confidence + evidence), `sharedInfrastructureKey` (cluster with companies on same infra), `sharedEmailInfraKey`, `sharedTrackingKey` (companies that share an ad-network footprint), `suspectedSubBrands[]`, `primaryBrandRoot`. Build company graphs in BI by joining on these keys.
- `memory` — Cross-run memory: `historyDepth` (e.g. "47 days across 7 snapshots"), `milestones[]` (first-occurrence events: `first-subdomains-detected`, `first-github-presence`, `first-infra-migration`, `first-email-infra-change`, `first-brand-refresh`), `patterns[]` (plain-English: "consistent subdomain growth (5 of last 6 runs)").
- `positioning` — Competitive positioning vs the peer cohort. Only emitted when `compareTo` is set: `category` + `rank` + `rankBasis` + `leaders[]` + `strengths[]` + `weaknesses[]` + `summary`.
- `uncertainty[]` — Honest catalogue of where this report is unsure. Each item has `area`, `reason`, `confidence`, and a concrete `suggestedFix`. Builds trust by surfacing failure modes upfront rather than hiding them.
- `gaps[]` — Partial-fail intelligence: which modules came up empty, the impact (`low`/`medium`/`high`), and a plain-English reason. Helps consumers distinguish "no data exists" from "actor broke."
- `entityId` — Stable cross-system identifier (`{domain}|{slug-of-companyName}`). Use as a join key in pipelines.
- `outputProfile` — Echoes which output profile produced this record (`analyst`/`executive`/`raw`).
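The `anomalies[]` logic can also be reproduced consumer-side if you keep your own snapshot history. A minimal sketch of the z-score approach described above, assuming `history` is a list of one metric's values from prior runs; the function name and the 2.0 threshold are illustrative, not the actor's internals:

```python
from statistics import mean, stdev

def detect_anomaly(history, current, z_threshold=2.0):
    """Flag `current` as a statistical outlier vs the baseline of prior runs.

    Mirrors the documented behaviour: needs 4+ prior samples, otherwise
    abstains (returns None) instead of guessing.
    """
    if len(history) < 4:
        return None  # honest abstention: too little history
    baseline, spread = mean(history), stdev(history)
    if spread == 0:
        return None  # flat baseline, z-score undefined
    z = (current - baseline) / spread
    if abs(z) < z_threshold:
        return None
    return {
        "zScore": round(z, 2),
        "direction": "spike" if z > 0 else "drop",
        "detail": f"current {current} vs baseline mean {baseline:.1f}",
    }

# A subdomain burst: stable history, then a jump well above 2 sigma.
print(detect_anomaly([800, 805, 810, 808, 812], 880))
```

This is how "+80 subdomains in 7 days" gets flagged without hand-written thresholds: the baseline's own variance sets the bar.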
What this actor does NOT compute (intentionally)
- Cross-company / global market patterns (e.g. "67% of fintechs use Cloudflare") — would require shared global state across all users' runs, which crosses a privacy boundary on a multi-tenant platform. The `fingerprint` and `graph.shared*Key` fields exist precisely so you can build this externally by joining datasets. A separate fleet-analytics actor that consumes datasets from many runs is the right shape — not state hidden inside a single-domain research actor.
- Predictive ML scoring — every `predictions[]` entry is rule-based (deterministic, auditable). No LLM, no neural network — they reproduce on every run.
- Per-user personalisation layer (`userModel`) — adapting `priorities` ranking + action thresholds + `decisionRisk` interpretation to a specific user's preferences (`riskTolerance`, `actionBias`, `prefersEarlySignals`, `historicalAccuracy`) is the next major architectural addition but explicitly deferred. Reason: it would need user-supplied preference inputs + a meaningful accuracy-tracking dataset across runs (the current `decisionMemory` is per-entity, not per-user-pattern). Roadmap candidate, not v1.
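The external join that the `fingerprint` / `graph.shared*Key` fields enable can be as simple as a group-by over exported dataset records. A minimal sketch, assuming records shaped like the output example later in this README (the helper name and bucket label are ours):

```python
from collections import defaultdict

def cluster_by_key(records, key="sharedInfrastructureKey"):
    """Group company records by a graph.* fingerprint key.

    Records missing the key (module gap, unusual infra) land in an
    explicit "unknown" bucket instead of being silently dropped.
    """
    clusters = defaultdict(list)
    for rec in records:
        k = (rec.get("graph") or {}).get(key) or "unknown"
        clusters[k].append(rec["domain"])
    return dict(clusters)

records = [
    {"domain": "a.com", "graph": {"sharedInfrastructureKey": "cloudflare_gw"}},
    {"domain": "b.com", "graph": {"sharedInfrastructureKey": "cloudflare_gw"}},
    {"domain": "c.com", "graph": {}},
]
print(cluster_by_key(records))
# {'cloudflare_gw': ['a.com', 'b.com'], 'unknown': ['c.com']}
```

The same pattern works for `sharedEmailInfraKey`, `sharedTrackingKey`, or `fingerprint.techStackHash`; "67% of fintechs use Cloudflare" is then one aggregation over the clusters you built yourself.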
Decision Philosophy v1
The actor's outputs encode an explicit philosophy. These rules are baked into priority ranking, watchlist actions, decisionRisk asymmetry, and actionGuard caps. Documented here so you can override them deliberately (and so a future userModel layer can swap them per user):
- Bias toward action when false-negative cost > false-positive cost AND reversibility is easy. Surfaces as `decisionRisk.actEvenIfUnsure`.
- Cap concurrent actions at 1-3. Diminishing returns set in beyond that — attention is finite, downstream automation gets noisy, signal-to-noise erodes. Surfaces as `actionGuard.recommendedMaxActions`.
- Prefer correlated signals over isolated anomalies. A `correlation:product-launch` outranks a single `PRODUCT_SIGNAL` event in `priorities[]`. Surfaces in priority weighting.
- Prefer recent signals over historical trends. Current-run events outweigh patterns from 30-90d ago in priority ranking. Surfaces in the `buildPriorities` weight stack.
- Surface uncertainty over hiding it. `gaps[]`, `uncertainty[]`, `epistemicStatus`, `decisionQuality.contradictions[]`, and `signalIndependence.warning` all exist to flag confidence limits before users over-trust outputs.
- Honest abstention. When data is thin: emit `unknown` / `null` / low-confidence rather than fabricate. Returns null on `whyNow` when no notable trigger fired (better than emitting noise).
- Deterministic over probabilistic. Same inputs → same outputs every run. No LLM, no neural network. Documented in `explain.principles[]`.
If you want a specific rule changed for your workflow, override at the consumer layer (filter dataset records, re-rank priorities by your own weights). The schema preserves enough underlying detail for any consumer to build their own opinion on top.
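A consumer-layer re-rank can be a few lines over the dataset record. In the sketch below, the severity labels and event types come from this actor's schema, but the weight numbers are deliberately arbitrary: they stand in for "your own weights" and are yours to tune.

```python
# Re-rank priorities[] with consumer-chosen weights instead of the
# actor's built-in ranking. Weight values here are illustrative only.
SEVERITY = {"high": 3, "medium": 2, "low": 1}
MY_TYPE_WEIGHT = {"PRODUCT_SIGNAL": 1.5, "INFRA_EXPANSION": 0.5}

def rerank(priorities):
    """Sort priority items by severity scaled by a per-type multiplier."""
    return sorted(
        priorities,
        key=lambda p: SEVERITY.get(p["severity"], 0)
        * MY_TYPE_WEIGHT.get(p["type"], 1.0),
        reverse=True,
    )

priorities = [
    {"type": "INFRA_EXPANSION", "severity": "high"},
    {"type": "PRODUCT_SIGNAL", "severity": "medium"},
]
print(rerank(priorities)[0]["type"])  # PRODUCT_SIGNAL outranks despite lower severity
```

Because the record keeps severity, type, and evidence separate, a re-rank like this never needs to re-parse prose.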
Failure modes the actor explicitly guards against
- False precision authority — heuristic outputs (`timeToRegret.deadlineHint`, `decisionRisk` levels, `predictions[].confidence`) carry an `epistemicStatus` block that names them as estimates, lists what they're based on, and warns about what they're NOT.
- Signal stacking illusion — `signalIndependence.warning` fires when N signals all derive from the same underlying change. "3 signals" can be 1 signal echoed 3 times.
- Decision fatigue — `actionGuard.recommendedMaxActions` caps concurrent actions; `primaryActionOnly` elevates a single dominant action and gives explicit permission to ignore the rest.
- Overconfident classification — `uncertainty[]` flags areas where the actor knows it's guessing (company name, archetype, lifecycle, acquisition detection) and provides a `suggestedFix` for each.
- Hidden contradictions — `decisionQuality.contradictions[]` surfaces internal inconsistencies (e.g. "high infra complexity but zero open-source presence") rather than silently passing them through.
- Temporal misinterpretation — `timeHorizonAlignment.status` flags when short-term urgency (`timeToRegret`) and long-term trajectory diverge — prevents "this is urgent AND accelerating" reads when reality is "spike inside stability".
Worked examples of misinterpretation (and the correct read)
Case A — Signal stacking illusion
- Input: domain X
- Output: `events[]` shows `INFRA_EXPANSION` + `POSSIBLE_ACQUISITION` + `PRODUCT_SIGNAL` (3 events); `signalIndependence.score`: 0.33 (low); `signalIndependence.warning`: "Low signal independence (0.33). What looks like 3 corroborating signals is probably 1 underlying change reflected through 3 surfaces. Treat as 1 signal, not 3."
- Naive reading: "Three corroborating signals — strong confirmation. Act with high confidence."
- Correct reading: All three events derive from a single underlying delta (a burst of new subdomains). Treat as 1 signal of medium strength, not 3. Read `priorities[0]` for the single recommended action; do NOT inflate confidence by counting the supporting events.
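A consumer-side guard against this failure mode can deflate the raw event count by the independence score before any confidence math runs. The deflation formula below is our assumption, not the actor's:

```python
def effective_signal_count(events, independence_score):
    """Deflate a raw event count by the signal-independence score.

    With score 0.33 and 3 events, roughly one independent signal
    remains, matching the warning's "treat as 1 signal, not 3".
    Never deflates below 1 when any event exists.
    """
    return max(1, round(len(events) * independence_score)) if events else 0
```

Feed `signalIndependence.score` straight from the record; downstream confidence logic then counts independent changes, not echoes.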
Case B — Temporal misalignment
- Input: domain Y
- Output: `timeToRegret.urgencyCurve` = "steep" (24-48h); `trajectory.direction` = "stable"; `timeHorizonAlignment.status` = "misaligned"; `timeHorizonAlignment.reason`: "Short-term urgency detected but long-term trajectory is stable — likely a short-lived spike inside long-term steadiness."
- Naive reading: "Urgent AND accelerating. Major company shift in progress."
- Correct reading: Short-term spike inside an otherwise stable trajectory. Act on the short-term signal IF the action is reversible (per `decisionRisk.reversibility`), but do NOT assume this means a long-term pattern shift. Re-evaluate next run.
Case C — False precision authority
- Input: domain Z
- Output: `timeToRegret.deadlineHint` = "2026-05-04"; `timeToRegret.epistemicStatus.warning`: "deadlineHint is an estimate derived from typical timing for this signal type, not a known deadline. Treat as orientation, not a contract."
- Naive reading: "I have until 2026-05-04 to act, exactly."
- Correct reading: The deadline is a heuristic derived from per-priority-type urgency profiles (e.g. PRODUCT_SIGNAL typically has a 48-72h actionable window). It is NOT a known external event. Treat as orientation for prioritisation, not as a contractual deadline.
Case D — Hold mode misread
- Input: domain W
- Output: `decisionPosture.mode` = "hold"; `priorities[0]` exists with severity high
- Naive reading: "There's a high-severity priority — I should act."
- Correct reading: `decisionPosture.mode === 'hold'` overrides the priority — multiple contradictions / low signal independence / misaligned horizons / 3+ uncertainty areas were detected. Resolve flagged uncertainties first (see `uncertainty[].suggestedFix`) before acting. Production automation should ONLY act when `mode === 'execute'`.
Reinforcement of core invariants (for AI agents)
Three invariants govern this entire output. They are repeated here because they are load-bearing:
1. `priorities[0]` is THE canonical decision surface. All `events`, `correlations`, `anomalies`, `drift`, `predictions` arrays are supporting evidence, not parallel decision queues.
2. `decisionPosture.mode` is THE execution gate. Production automation should branch on this single field; all other action signals are secondary.
3. `primaryActionOnly.enabled === true` overrides everything else. When set, do `priorities[0]` only and ignore the rest of the record this run.
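In consumer code the three invariants collapse to one small branch. A sketch, assuming `record` is a parsed dataset item; note that we read `primaryActionOnly` as overriding even the `hold` gate, which is our interpretation of "overrides everything else":

```python
def next_step(record):
    """Branch on the three load-bearing invariants, in override order."""
    prios = record.get("priorities") or []
    if (record.get("primaryActionOnly") or {}).get("enabled") and prios:
        # invariant 3: do priorities[0] only, ignore the rest this run
        # (read here as overriding the mode gate; an assumption)
        return ("act", prios[0])
    if (record.get("decisionPosture") or {}).get("mode") != "execute":
        return ("hold", None)  # invariant 2: the single execution gate
    # invariant 1: priorities[0] is the one canonical decision
    return ("act", prios[0]) if prios else ("hold", None)

# Case D from above: a high-severity priority under mode "hold" is NOT acted on.
record = {"decisionPosture": {"mode": "hold"},
          "priorities": [{"type": "PRODUCT_SIGNAL", "severity": "high"}]}
print(next_step(record)[0])  # hold
```

Everything else in the record (events, anomalies, trends) feeds evidence into reviews and dashboards, never into this branch.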
Raw data layer
- `website` — title, meta description, og:image, favicon, social links found in the page, and `techStack` (CMS, framework, analytics, CDN, e-commerce, payment processors, ad pixels, fonts, security headers — in-actor signatures over the homepage HTML + response headers, no external API)
- `wikipedia` — summary, description, thumbnail, direct URL
- `github` — org profile (with `createdAt`), top repos by stars (with `pushedAt`), total stars, total forks, language breakdown, opportunistic npm scope packages and Docker Hub org images, and an `activity` sub-block with `lastActiveDate`, `activeRepos30d`, `activeRepos90d`, plain-English `signals[]`
- `financials` — for public companies: ticker, CIK, exchange, SIC code + description, fiscal year end, business address, former names, recent 10-K / 10-Q / 8-K filings with accession numbers
- `research` — OpenAlex paper count + top papers by citation count (DOI, source, date)
- `dns` — A, AAAA, MX, TXT, NS, CAA records, plus an `email` sub-block with parsed SPF, DMARC policy, DKIM presence, and email-provider classification (Google Workspace / Microsoft 365 / Zoho / Proton / Fastmail / Mailgun / SendGrid / Postmark / Amazon SES / Yandex / Migadu / Cloudflare / self-hosted)
- `subdomains` — Certificate Transparency log enumeration via crt.sh: count + recently-issued list + capped name list, plus a `classification` breakdown (`api`, `internal`, `staging`, `auth`, `email`, `docs`, `cdn`, `monitoring`, `other`) and a `notable[]` list (subdomains that don't match the primary brand pattern — possible acquisitions or sub-brands)
- `infrastructure` — Wayback Machine first-seen date, plus presence + URL for security.txt, ai.txt, llms.txt, robots.txt (with sitemap reference parsed), sitemap.xml, RSS feed, status page (status.{domain}), changelog page, pricing page, careers page
- `community` — Hacker News mention count + top stories (Algolia HN API)
- `socialMedia` — Twitter/X, LinkedIn, Facebook, Instagram, YouTube, GitHub presence verification
- `diff` — Raw diff structure (already computed; `events[]` is the classified version of this — read events first)
- `recordType` — `'company-report'` for success, `'error'` for the rare error path
A separate SUMMARY record is also written to the run's key-value store for orchestrators that call this actor via Actor.call() and want the headline answer (entityId, confidence, intelligence summary, security posture, fingerprint, top event, diff highlights, PPE charge) without paginating the dataset.
How to Use
- Enter the domain — e.g. `stripe.com`. The `https://` prefix and trailing slashes are stripped automatically.
- Optionally set the company name — if blank, the actor detects it from `<title>` or `og:title`. Override when the homepage title is a tagline rather than the company name (e.g., "Build the Future" instead of "Acme Corp"). This dramatically improves Wikipedia, SEC, GitHub, and Hacker News match accuracy.
- Toggle modules if needed — every data source is on by default; turn off the ones you don't need to shave run time.
- Click "Start" — a typical run takes 15–45 seconds. You'll see live progress messages in the Console (`Step 1/10: Analyzing website…`, `Steps 2–10: Aggregating Wikipedia, GitHub, SEC, …`, `Done. 9 sources returned data: …`).
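The domain normalisation from step 1 is easy to mirror client-side before you queue runs in bulk. A sketch; the doc promises only scheme and trailing-slash stripping, so the lowercasing and path removal here are our additions:

```python
import re

def normalize_domain(raw):
    """Normalise user input to a bare domain, mirroring the documented
    behaviour: drop the http(s):// prefix and anything after the host."""
    d = re.sub(r"^https?://", "", raw.strip().lower())
    return d.split("/")[0]

print(normalize_domain("https://Stripe.com/"))  # stripe.com
```

Pre-normalising keeps your own dedup keys (and any `portfolioId` bookkeeping) consistent with what the actor stores.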
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
domain | String | Yes | stripe.com | Company website domain (e.g., stripe.com). The https:// prefix and trailing slashes are stripped automatically. |
companyName | String | No | Auto-detected | Override the auto-detected company name. Use this when the homepage title is a tagline. |
includeFinancials | Boolean | No | true | Search SEC EDGAR for filings, ticker, CIK, exchange, address, SIC code. |
includeResearch | Boolean | No | true | Search OpenAlex for academic papers mentioning the company. |
includeGithub | Boolean | No | true | Find GitHub org, top repos, language breakdown, plus npm + Docker Hub footprint when an org is found. |
includeTechStack | Boolean | No | true | Detect CMS, framework, analytics, CDN, e-commerce, payment processors, ad pixels, fonts, and security headers. |
includeSubdomains | Boolean | No | true | Enumerate subdomains via Certificate Transparency logs (crt.sh). |
includeInfrastructure | Boolean | No | true | Detect security.txt, ai.txt, llms.txt, robots, sitemap, RSS, status, changelog, pricing, careers + Wayback first-seen. |
includeCommunity | Boolean | No | true | Search Hacker News for mention count + top stories. |
enableMonitoring | Boolean | No | true | On repeat runs for the same domain, return a diff field, an events[] array (typed classification), a trends block (30d / 90d deltas), correlations[] (compound patterns), and anomalies[] (statistical outliers — needs 4+ prior runs). |
outputProfile | Enum | No | analyst | analyst (full record) / executive (decision layer + thin pointers) / raw (modules only). The SUMMARY KV record is always full regardless. |
compareTo | String[] | No | [] | Up to 3 peer domains to benchmark against. Each peer = +1 PPE event ($1.00 per peer). Adds a peer-comparison record to the dataset with rank + summary across 8 metrics. |
portfolioId | String | No | — | Opt-in label for cross-company tracking. When set, run writes a lightweight entry to a per-user named KV store and emits portfolioContext (rank/percentile/outlier), feed (rolling alerts + top movers + new entrants), normalized (percentile scores), cluster (similar companies), portfolioPressure (attention share + displacement). Each user's portfolios are isolated by Apify per-user named-store sandboxing. |
monitorStateKey | String | No | — | Suite-aligned alias for portfolioId. Either input works; if both are set, portfolioId wins. Use this when you want one consistent field name across company-deep-research, waterfall-contact-enrichment, bulk-email-verifier, and lead-enrichment-pipeline. |
lastAction | Object | No | — | { type: string, takenAt: ISO date, note?: string }. Tells the actor what action you took on this entity since the last run. Stored in the portfolio (requires portfolioId); on subsequent runs the actor infers outcome via state delta and emits decisionMemory. Outcome inference is honest: it can only observe signal changes — it can't see direct replies / deals / off-platform engagement. |
includePatents | Boolean | No | false | Off by default. The USPTO PatentsView API was retired in August 2024 and the replacement requires an API key, which would break the no-key promise of this actor. |
githubToken | String | No | — | GitHub personal access token. Without it, GitHub allows 60 unauthenticated requests/hour. With it, 5,000/hour. |
maxResults | Integer | No | 50 | Maximum items returned per data source (1–200). |
Input Examples
Quick lookup — full intelligence record (default):
```json
{ "domain": "stripe.com" }
```
Executive output for Slack alerts — decision layer + thin pointers:
```json
{ "domain": "stripe.com", "outputProfile": "executive" }
```
Raw modules only — backward-compatible mode for users who want pure data:
```json
{ "domain": "stripe.com", "outputProfile": "raw" }
```
Peer comparison — benchmark against 2 competitors (each peer = +1 PPE event):
```json
{ "domain": "stripe.com", "compareTo": ["adyen.com", "checkout.com"] }
```
Portfolio mode — track this company as part of a larger watchlist:
```json
{ "domain": "stripe.com", "portfolioId": "fintech-watchlist-2026" }
```
The first run for a portfolioId creates the portfolio. Each subsequent run for the same portfolioId adds to it (and refreshes existing entries). After ~4-5 different domains have been added, the actor starts emitting portfolioContext (rank/percentile/outlier), feed (rolling alerts + top movers + new entrants), normalized (percentile scores vs portfolio), cluster (similar companies in your portfolio), and portfolioPressure (attention share + displacement). Build a "100-company fintech watchlist" by scheduling this actor across 100 domains all using the same portfolioId.
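The percentile normalisation can be approximated externally too, e.g. for watchlists you track outside the actor. Below is one standard definition (fraction of portfolio entries a score meets or beats), not necessarily the actor's exact formula:

```python
def percentile_rank(scores, this_score):
    """Percentile of one company's score within its portfolio (0..100):
    the share of portfolio entries it meets or beats."""
    if not scores:
        return None  # empty portfolio: abstain rather than fabricate
    at_or_below = sum(1 for s in scores if s <= this_score)
    return round(100 * at_or_below / len(scores))

# Five technicalMaturity scores across a hypothetical portfolio.
portfolio = [0.42, 0.55, 0.61, 0.73, 0.86]
print(percentile_rank(portfolio, 0.73))  # 80
```

This is also why the `normalized` block only becomes meaningful after ~4-5 domains: percentiles over tiny cohorts are mostly noise.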
Decision memory — close the feedback loop:
```json
{
  "domain": "stripe.com",
  "portfolioId": "fintech-watchlist-2026",
  "lastAction": {
    "type": "trigger-outreach",
    "takenAt": "2026-04-15T09:00:00Z",
    "note": "sent intro email to VP Eng"
  }
}
```
The actor stores lastAction in the portfolio entry. On the next run it compares the current state vs the snapshot at action time and emits decisionMemory: { outcome, effectivenessScore, pattern, daysSinceAction, inferenceMethod }. Outcome inference is honest — engaged / escalated / no-response / no-change / resolved / too-soon-to-tell — derived from observable signal changes only.
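What delta-based outcome inference looks like can be sketched in a few lines. Only the "observable signal changes only" constraint is taken from the doc; the 3-day `too-soon` window and the delta-to-outcome mapping below are illustrative assumptions, not the actor's rules:

```python
from datetime import datetime, timezone

def infer_outcome(last_action, metrics_then, metrics_now, min_days=3):
    """Minimal sketch of decisionMemory-style outcome inference.

    Honest by construction: it sees only metric deltas, never replies
    or deals. Window and mapping are illustrative, not the actor's.
    """
    taken = datetime.fromisoformat(last_action["takenAt"].replace("Z", "+00:00"))
    days = (datetime.now(timezone.utc) - taken).days
    if days < min_days:
        return {"outcome": "too-soon-to-tell", "daysSinceAction": days}
    delta = sum(metrics_now.values()) - sum(metrics_then.values())
    outcome = "escalated" if delta > 0 else "resolved" if delta < 0 else "no-change"
    return {"outcome": outcome, "daysSinceAction": days, "delta": delta}

action = {"type": "trigger-outreach", "takenAt": "2020-01-01T00:00:00Z"}
print(infer_outcome(action, {"subdomains": 800}, {"subdomains": 812})["outcome"])
```

The `too-soon-to-tell` branch is the important part: abstaining beats inventing an outcome the signals cannot support.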
When compareTo is set, the dataset gets an additional peer-comparison record with rank + summary across 8 metrics (technical maturity, security posture, infra complexity, OSS strength, operational maturity, subdomains, GitHub stars, HN mentions). Each peer triggers its own recursive run and bills its own PPE event — 2 peers = 3 total $1.00 charges.
Named company with GitHub token (avoids 60-req/hr unauthenticated rate limit):
```json
{
  "domain": "openai.com",
  "companyName": "OpenAI",
  "githubToken": "ghp_xxxxxxxxxxxxxxxxxxxx",
  "maxResults": 20
}
```
Fast scan — website + tech stack + DNS + social only:
```json
{
  "domain": "acme.com",
  "includeFinancials": false,
  "includeResearch": false,
  "includeGithub": false,
  "includeSubdomains": false,
  "includeInfrastructure": false,
  "includeCommunity": false
}
```
Scheduled monitoring — daily run with diff + events + trends + anomalies:
```json
{ "domain": "anthropic.com", "enableMonitoring": true }
```
When you schedule this, the second run onwards returns a diff field (raw changes), an events[] array (typed classification), a trends block (30d / 90d deltas from snapshot history), correlations[] (compound patterns), anomalies[] (statistical outliers — needs 4+ prior runs), and a changeSummary.headline you can paste into a Slack message verbatim.
Input Tips
- Provide `companyName` explicitly for companies whose website title is a tagline. This improves accuracy across Wikipedia, SEC, GitHub, and Hacker News.
- Use `maxResults: 10` for quick overviews, `maxResults: 50` for comprehensive reports, `maxResults: 200` to pull every subdomain crt.sh has on file.
- Set `includeFinancials: false` for private companies to skip SEC EDGAR (it's US-only) and save 5–10 seconds.
- For batch processing 100+ companies, supply a `githubToken` to avoid the 60-req/hr unauthenticated GitHub limit.
Output
Each run produces one dataset item. Truncated example showing the decision layer at the top, then the raw modules below:
{"recordType": "company-report","entityId": "stripe.com|stripe","domain": "stripe.com","companyName": "Stripe","researchDate": "2026-05-01","tldr": {"oneSentence": "Stripe is accelerating (lifecycle: scaling, fintech) — Platform / multi-product expansion underway.","topRisk": null,"topOpportunity": "+14 new public repos in the last ~30 days","needsAttention": false},"trajectory": {"direction": "accelerating","velocity": "high","confidence": "high","explanation": "Direction: accelerating (4 growing, 0 declining of 4 measured signals). Velocity: high (+47 subdomains in 30d). Confidence: high (7 historical snapshots).","components": { "subdomainsDelta30d": 47, "repoDelta30d": 5, "starsDelta30d": 412, "hnDelta30d": 23 }},"predictions": [{"type": "platform-expansion-likely","confidence": 0.65,"timeframe": "ongoing","evidence": ["+47 subdomains in 30d", "3 languages", "10 npm packages"],"headline": "Platform / multi-product expansion underway","rationale": "Aggressive subdomain growth combined with multi-language + multi-package distribution points at platform-mode investment — expect new SDK / API / market launches."}],"graph": {"primaryBrandRoot": "stripe","relatedCompanies": [{ "domain": "paystack.com", "relationship": "acquisition-suspected", "confidence": 0.55, "evidence": ["Subdomain paystack.stripe.com on stripe.com hosts what looks like a separate brand"] },{ "domain": "bridge.com", "relationship": "acquisition-suspected", "confidence": 0.55, "evidence": ["Subdomain bridge-payments.stripe.com on stripe.com hosts what looks like a separate brand"] }],"sharedInfrastructureKey": "dynectnet_googleworkspace_cloudflare","sharedEmailInfraKey": "googleworkspace","sharedTrackingKey": "googleanalytics4_segment","suspectedSubBrands": ["bridge-payments.stripe.com", "paystack.stripe.com", "atlas.stripe.com"]},"memory": {"historyDepth": "47 days across 7 snapshots","snapshotCount": 7,"earliestSnapshotAt": "2026-04-01T08:00:00.000Z","milestones": [{ "eventType": 
"first-github-presence", "detectedAt": "2026-04-01T08:00:00.000Z", "detail": "186 repos observed" },{ "eventType": "first-brand-refresh", "detectedAt": "2026-04-22T08:00:00.000Z", "detail": "Homepage title changed for the first time in our history" }],"patterns": ["Consistent subdomain growth (5 of last 6 transitions positive)","Steady GitHub repo additions (4 of last 6 transitions positive)"]},"uncertainty": [{"area": "acquisition-detection","reason": "Detected 3 non-primary-brand subdomain(s) but cannot corroborate against SEC filings (private company).","confidence": 0.55,"suggestedFix": "Cross-check Crunchbase / news sources / Wikipedia for announced acquisitions matching: bridge-payments.stripe.com, paystack.stripe.com, atlas.stripe.com"}],"actions": [{ "type": "webhook-payload", "target": "Generic HTTP webhook", "rationale": "...", "payload": { "entityId": "stripe.com|stripe", "tldr": "Stripe is accelerating...", "topPriority": { "rank": 1, "type": "PRODUCT_SIGNAL", "severity": "high", "headline": "5 new public GitHub repos: stripe-cli-v2, payments-rs, fraud-rules-dsl…", "action": "Review new repos…" }, "needsAttention": false } },{ "type": "slack-block-kit", "target": "Slack incoming-webhook", "rationale": "Pre-formatted Slack message", "payload": { "blocks": [{ "type": "header", "text": { "type": "plain_text", "text": "Stripe (stripe.com)" } }, "..."] } }],"summary": {"headline": "Stripe — private fintech (10 signals)","oneLine": "Stripe — American-Irish financial services company","keyTakeaways": ["Looks like a private fintech (SaaS or paid product)","Wikipedia: American-Irish financial services company","Tech stack: Next.js + Cloudflare + Stripe","Engineering: 186 repos, 28,450 stars — TypeScript-led (org since 2011), 14 active in last 30d","Distributes: 64 npm packages, 12 Docker images","Security posture: 92/100 (high) — Google Workspace, 8 strengths, 1 issues","847 subdomains in CT logs (12 api, 4 staging, 23 internal) — high infra complexity, likely 
large engineering org","Online since 2010 (Wayback Machine first snapshot)","3,742 Hacker News mentions — strong developer community visibility","Operational maturity: high (status page + changelog + security.txt)"],"whatToCheck": [{ "label": "Read Wikipedia summary for context", "url": "https://en.wikipedia.org/wiki/Stripe,_Inc." },{ "label": "Visit GitHub org (186 repos, 28,450 stars)", "url": "https://github.com/stripe" },{ "label": "Check status page for outages", "url": "https://status.stripe.com" }],"confidence": {"score": 0.86,"level": "high","explanation": "High confidence — 9/10 sources returned data and 95% of high-value signals (Wikipedia, GitHub org, SEC filings, tech stack, email provider) landed.","dataCoverage": 0.9,"signalStrength": 0.85,"stability": "stable"}},"intelligence": {"companyType": "private","archetype": "fintech","businessModelHints": ["SaaS or paid product", "Charges via Stripe", "API platform", "SDK distribution"],"technicalMaturityScore": 0.95,"technicalMaturityLevel": "high","openSourceStrength": "high","infraComplexity": "high","operationalMaturity": "high","growthSignals": ["+14 new public repos in the last ~30 days","Active engineering — 14 repos pushed in the last 30 days","Careers page online (likely hiring)"],"riskSignals": [],"notablePatterns": ["12 non-primary-brand subdomains (possible acquisition or sub-brand): bridge-payments.stripe.com, paystack.stripe.com…","Modern Vercel/Cloudflare-style stack","Multi-payment-processor (Stripe + PayPal) — likely large transaction volume"]},"events": [{"type": "PRODUCT_SIGNAL","severity": "high","evidence": "5 new public GitHub repos: stripe-cli-v2, payments-rs, fraud-rules-dsl…","explanation": "New public repos often indicate a product launch, new SDK/CLI, or open-sourcing of an internal tool."},{"type": "INFRA_EXPANSION","severity": "medium","evidence": "12 new subdomains in Certificate Transparency logs","explanation": "Burst of new subdomains often indicates new services, 
environments, or geographic expansion."}],"trends": {"sampleCount": 7,"earliestSampleAt": "2026-04-01T08:00:00.000Z","subdomains30d": { "delta": 47, "pct": 5.9, "previousValue": 800 },"subdomains90d": { "delta": 122, "pct": 16.8, "previousValue": 725 },"githubRepos30d": { "delta": 5, "pct": 16.7, "previousValue": 30 },"githubStars30d": { "delta": 412, "pct": 1.5, "previousValue": 28038 },"hackerNews30d": { "delta": 23, "pct": 0.6, "previousValue": 3719 },"secFilings30d": null,"infraStability": "stable","changeFrequency": "medium"},"securityPosture": {"score": 0.92,"level": "high","issues": ["No Permissions-Policy header"],"strengths": ["DMARC enforced (reject)","SPF record present","CAA records published (restricts which CAs can issue certs)","HSTS header","Content-Security-Policy header","X-Frame-Options header (clickjacking)","X-Content-Type-Options header","Referrer-Policy header","Published security.txt with disclosure contact"]},"fingerprint": {"techStackHash": "_nextjs_cloudflare__stripe","infraSignature": "cloudflare_googleworkspace_dynect.net","orgSignature": "massiverepos_typescript_osshigh","securityHeadersHash": "contentsecuritypolicy_referrerpolicy_strict-transport-security_xcontent-type-options_xframe-options"},"gaps": [{ "module": "financials", "impact": "low", "reason": "No SEC filings — likely a private company or non-US-listed" }],"website": {"title": "Stripe | Financial Infrastructure for the Internet","description": "Stripe powers online and in-person payment processing...","favicon": "https://stripe.com/favicon.ico","ogImage": "https://stripe.com/img/v3/home/social.png","socialLinks": {"twitter": "https://twitter.com/stripe","linkedin": "https://www.linkedin.com/company/stripe","github": "https://github.com/stripe"},"techStack": {"cms": "","framework": "Next.js","analytics": ["Google Analytics 4", "Segment"],"cdn": "Cloudflare","ecommerce": "","fonts": ["Google Fonts"],"ads": [],"paymentProcessors": ["Stripe"],"securityHeaders": 
{"strict-transport-security": "max-age=63072000; includeSubDomains; preload","content-security-policy": "...","x-frame-options": "DENY"}}},"wikipedia": {"found": true,"summary": "Stripe, Inc. is an Irish-American multinational financial services...","description": "American-Irish financial services company","thumbnail": "https://upload.wikimedia.org/...","url": "https://en.wikipedia.org/wiki/Stripe,_Inc."},"github": {"found": true,"orgProfile": {"name": "Stripe","bio": "Financial infrastructure for the internet.","publicRepos": 186,"followers": 1523,"url": "https://github.com/stripe","createdAt": "2011-04-25T16:13:42Z"},"topRepositories": [{"name": "stripe-node","description": "Node.js library for the Stripe API.","stars": 3842,"forks": 745,"language": "TypeScript","url": "https://github.com/stripe/stripe-node","pushedAt": "2026-04-30T14:22:11Z"}],"totalStars": 28450,"totalForks": 7120,"languages": [{ "language": "TypeScript", "repoCount": 14 },{ "language": "Ruby", "repoCount": 9 },{ "language": "Go", "repoCount": 7 }],"npmPackages": [{ "name": "@stripe/stripe-js", "description": "Loading wrapper for Stripe.js", "version": "4.x.x", "url": "https://www.npmjs.com/package/@stripe/stripe-js" }],"dockerImages": [],"activity": {"lastActiveDate": "2026-04-30","activeRepos30d": 14,"activeRepos90d": 28,"signals": ["Multi-language (TypeScript, Ruby, Go)","TypeScript-led","Strong open-source traction (10K+ stars across top repos)","High recent activity (14 repos pushed in last 30 days)","Developer-first (10 npm packages)"]}},"financials": {"isPublicCompany": false,"ticker": null,"cik": null,"exchange": null,"sicCode": null,"sicDescription": null,"fiscalYearEnd": null,"address": null,"formerNames": [],"recentFilings": []},"research": {"paperCount": 1247,"topPapers": [{ "title": "The Rise of Embedded Finance...", "doi": "https://doi.org/10.1016/j.jfi.2024.101032", "citationCount": 89, "publicationDate": "2024-06-15", "source": "Journal of Financial Intermediation" }]},"dns": 
{"aRecords": ["185.166.143.32"],"aaaaRecords": ["2a04:8400:0:0:0:0:0:32"],"mxRecords": ["1 aspmx.l.google.com", "5 alt1.aspmx.l.google.com"],"txtRecords": ["v=spf1 include:_spf.google.com ~all", "v=DMARC1; p=reject; rua=mailto:dmarc@stripe.com"],"nameServers": ["ns1.p16.dynect.net"],"caaRecords": ["0 issue=letsencrypt.org"],"email": {"provider": "Google Workspace","spfPresent": true,"spfRecord": "v=spf1 include:_spf.google.com ~all","dmarcPolicy": "reject","dmarcRecord": "v=DMARC1; p=reject; rua=mailto:dmarc@stripe.com","dkimSelectors": []}},"subdomains": {"found": true,"count": 847,"recent": [{ "name": "api.stripe.com", "firstSeen": "2026-04-30" },{ "name": "dashboard.stripe.com", "firstSeen": "2026-04-29" }],"all": ["api.stripe.com", "dashboard.stripe.com", "..."],"classification": {"api": 12, "internal": 23, "staging": 4, "auth": 6, "email": 3,"docs": 5, "cdn": 2, "monitoring": 1, "other": 791},"notable": ["bridge-payments.stripe.com", "paystack.stripe.com", "atlas.stripe.com"]},"infrastructure": {"firstSeenWayback": "2010-09-14","securityTxt": { "found": true, "url": "https://stripe.com/.well-known/security.txt", "contact": "mailto:security@stripe.com" },"aiTxt": { "found": false, "url": "" },"llmsTxt": { "found": false, "url": "" },"robotsTxt": { "found": true, "url": "https://stripe.com/robots.txt", "sitemapReference": "https://stripe.com/sitemap.xml" },"sitemapXml": { "found": true, "url": "https://stripe.com/sitemap.xml" },"rssFeed": { "found": true, "url": "https://stripe.com/blog/feed.rss" },"statusPage": { "found": true, "url": "https://status.stripe.com" },"changelogPage": { "found": true, "url": "https://stripe.com/changelog" },"pricingPage": { "found": true, "url": "https://stripe.com/pricing" },"careersPage": { "found": true, "url": "https://stripe.com/jobs" }},"community": {"hackerNews": {"mentionCount": 3742,"topStories": [{ "title": "Stripe acquires Bridge for payments", "url": "https://stripe.com/...", "points": 1842, "numComments": 612, 
"createdAt": "2026-04-15T...", "storyUrl": "https://news.ycombinator.com/item?id=..." }]}},"socialMedia": [{ "platform": "Twitter/X", "url": "https://twitter.com/stripe", "found": true },{ "platform": "LinkedIn", "url": "https://www.linkedin.com/company/stripe", "found": true }],"diff": null}
Output fields
Top-level discriminators:
| Field | Type | Description |
|---|---|---|
| recordType | String | 'company-report' for a successful research record, 'error' for an error record |
| domain | String | The company domain that was researched |
| companyName | String | Detected or provided company name |
| researchDate | String | ISO date of the research (YYYY-MM-DD) |
summary fields (hero block — read this first):
| Field | Type | Description |
|---|---|---|
| headline | String | One-line title ("<companyName> — <role> (<N> signals)") |
| oneLine | String | Short shareable answer (Slack subject, dashboard tile) |
| keyTakeaways[] | Array of strings | Up to 8 scannable bullets synthesized from the modules below |
| whatToCheck[] | Array of {label, url} | Up to 4 ranked next-step links |
| confidence.score | Number | 0..1 — fraction of attempted sources that returned data |
| confidence.level | String | 'high' (≥0.7), 'medium' (≥0.4), or 'low' |
| confidence.explanation | String | Plain-English reason — usable verbatim in reports |
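The confidence mapping is deterministic and easy to reproduce client-side. A minimal sketch in Python, using the thresholds from the table above (the function name is illustrative, not part of the actor's API):

```python
def confidence_level(sources_with_data: int, sources_attempted: int) -> tuple:
    """Map source coverage to a (score, level) pair.

    Thresholds from the table above: high >= 0.7, medium >= 0.4, low otherwise.
    """
    score = sources_with_data / sources_attempted if sources_attempted else 0.0
    if score >= 0.7:
        level = "high"
    elif score >= 0.4:
        level = "medium"
    else:
        level = "low"
    return round(score, 2), level

print(confidence_level(9, 10))  # (0.9, 'high')
```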
website.techStack fields (in-actor signature detection):
| Field | Type | Description |
|---|---|---|
| cms | String | Detected CMS (WordPress, Shopify, Webflow, Wix, Squarespace, Ghost, Drupal, Joomla, HubSpot CMS, Contentful, Sanity) |
| framework | String | Detected framework (Next.js, Nuxt, Gatsby, Remix, Astro, SvelteKit, React, Vue, Angular, Hugo, Jekyll, Eleventy) |
| analytics[] | Array | Analytics tools (GA4, GTM, Universal Analytics, Segment, Mixpanel, Amplitude, Heap, PostHog, Plausible, Fathom, Hotjar, FullStory, Matomo) |
| cdn | String | CDN (Cloudflare, Fastly, Akamai, CloudFront, Vercel, Netlify, GitHub Pages, Cloudflare Pages, Bunny CDN, KeyCDN) |
| ecommerce | String | E-commerce platform (Shopify, WooCommerce, BigCommerce, Magento, PrestaShop, Snipcart) |
| paymentProcessors[] | Array | Payment processors (Stripe, PayPal, Square, Adyen, Braintree, Klarna) |
| ads[] | Array | Ad pixels (Google Ads, Meta Pixel, LinkedIn Insight, Twitter Pixel, TikTok Pixel, Reddit Pixel) |
| fonts[] | Array | Font services (Google Fonts, Adobe Fonts, Monotype) |
| securityHeaders | Object | HSTS, CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy |
github fields:
| Field | Type | Description |
|---|---|---|
| found | Boolean | Whether a GitHub org or repos were found |
| orgProfile.name | String | GitHub organization display name |
| orgProfile.bio | String | Organization description |
| orgProfile.publicRepos | Integer | Number of public repositories |
| orgProfile.followers | Integer | Number of GitHub followers |
| orgProfile.createdAt | String | ISO date the org was created (proxy for company age) |
| topRepositories[] | Array | Top repos by stars: name, description, stars, forks, language, url |
| totalStars / totalForks | Integer | Sum across returned repos |
| languages[] | Array | Language breakdown: {language, repoCount} ranked by repo count |
| npmPackages[] | Array | npm packages under the same scope (e.g. @stripe/*) |
| dockerImages[] | Array | Docker Hub images under the same org |
financials fields:
| Field | Type | Description |
|---|---|---|
| isPublicCompany | Boolean | Whether SEC filings were found |
| ticker | String / null | Stock ticker (e.g., "AAPL") |
| cik | String / null | SEC Central Index Key |
| exchange | String / null | Stock exchange (NYSE, NASDAQ, etc.) — pulled from data.sec.gov/submissions/CIK*.json |
| sicCode / sicDescription | String / null | Standard Industrial Classification |
| fiscalYearEnd | String / null | MMDD format |
| address | Object / null | Business address (street, city, state, zip) |
| formerNames[] | Array | Past company names from SEC filings |
| recentFilings[] | Array | Recent SEC filings: formType, filedDate, description, url, accessionNumber |
dns + dns.email fields:
| Field | Type | Description |
|---|---|---|
| aRecords / aaaaRecords | String[] | IPv4 / IPv6 addresses |
| mxRecords / txtRecords / nameServers / caaRecords | String[] | Other DNS records |
| email.provider | String | Classified email provider (Google Workspace, Microsoft 365, Zoho, Proton, Fastmail, Mailgun, SendGrid, Postmark, Amazon SES, Yandex, Migadu, Cloudflare, self-hosted) |
| email.spfPresent / email.spfRecord | Boolean / String | SPF detection |
| email.dmarcPolicy / email.dmarcRecord | String | DMARC policy (none / quarantine / reject) |
| email.dkimSelectors[] | Array | DKIM selectors found in TXT records |
subdomains fields:
| Field | Type | Description |
|---|---|---|
| count | Integer | Total unique subdomains in Certificate Transparency logs |
| recent[] | Array | Up to 20 most recently issued: {name, firstSeen} |
| all[] | Array | All subdomains, capped at min(maxResults, 200) |
infrastructure fields: see the example above for full shape — every well-known file probe returns {found, url, ...}.
community.hackerNews fields:
| Field | Type | Description |
|---|---|---|
| mentionCount | Integer | Total Hacker News story count for the company name |
| topStories[] | Array | Top stories: title, url, points, numComments, createdAt, storyUrl |
diff fields (only on second-and-later runs of the same domain):
| Field | Type | Description |
|---|---|---|
| since | String | ISO timestamp of the previous snapshot |
| sinceRunId | String / null | Apify run ID of the previous run |
| newSecFilings[] | Array | SEC filings present this run, absent in the previous snapshot (matched by accessionNumber) |
| newGithubRepos[] | String[] | Repo names new since last run |
| newTxtRecords[] / removedTxtRecords[] | String[] | TXT verification tokens added/removed (Google, Microsoft, Slack, Okta, ad networks…) |
| newSubdomains[] | String[] | Subdomains issued in Certificate Transparency since last run |
| nameServersChanged | Boolean | Whether NS records changed |
| nameServersOld / nameServersNew | String[] | Old vs new NS list (only populated when changed) |
| mxRecordsChanged | Boolean | Whether MX records changed |
| homepageTitleChanged / homepageDescriptionChanged | Boolean | Whether homepage copy changed |
| newPatents / newHackerNewsStories | Integer | Count delta since last run |
Decision-grade features
This actor goes well beyond "give me the data." Four features turn the output into a decision system:
Priorities — the decision queue
Every report contains a priorities[] array with the top 5 ranked decisions the data implies. Each priority has a recommendedAction written as a concrete next step ("Read filing X", "Investigate the new subdomains", "Update internal records of this domain's infra"). Most users only need to read priorities[0]. Built deterministically from events + correlations + anomalies + lifecycle + securityPosture.
"priorities": [{"rank": 1,"type": "correlation:product-launch","severity": "high","headline": "product launch pattern detected (confidence 85%)","reason": "5 new public GitHub repos + 12 new subdomains in Certificate Transparency logs","whyItMatters": "New public repos co-occurring with a subdomain burst is a strong product-launch signal — repo for the SDK, subdomain for the service.","recommendedAction": "Track the launch — read the new repos AND visit the new subdomains for the live product.","evidence": ["5 new public GitHub repos: stripe-cli-v2, payments-rs, fraud-rules-dsl…", "12 new subdomains in Certificate Transparency logs"],"timeToImpact": "days"}]
Triggers — automation-ready booleans
The triggers object precomputes 11 booleans for downstream routing. Filter with WHERE triggers.X = true instead of parsing prose. Drop-in for Zapier / Make / Slack / agent tool calls.
"triggers": {"highSeverityEvents": true,"possibleAcquisition": false,"productSignals": true,"infraMigration": false,"emailInfraChange": false,"brandRefresh": false,"communityTraction": true,"securityRiskHigh": false,"rapidGrowth": true,"dormancy": false,"needsHumanReview": true}
Output profiles — same data, different consumers
Pick outputProfile:
- `analyst` (default) — Full intelligence record. ~10–50 KB per record.
- `executive` — Decision layer + thin module pointers. Strips verbose subtrees (full subdomain lists, full HN stories, full repo metadata) — keeps everything you need for a Slack alert or dashboard tile. ~3–10 KB per record.
- `raw` — Modules only, no decision layer. Backward-compatible mode for users who want pure data and will compute their own intelligence on top.
Note: the SUMMARY KV record always contains the full headline summary regardless of profile.
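A minimal run input selecting the compact profile might look like this (the domain value is just an example):

```json
{
  "domain": "stripe.com",
  "outputProfile": "executive"
}
```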
Portfolio mode — cross-company prioritisation
Pass portfolioId: "my-watchlist-name" and the actor maintains a per-user named key-value store of every company you've researched under that label. Each subsequent run emits relative intelligence on top of the absolute intelligence:
"portfolioContext": {"portfolioId": "fintech-watchlist-2026","portfolioSize": 47,"rank": "3/47","rankBasis": "maximum alert score across events / correlations / anomalies","percentile": 0.96,"outlier": true,"rarity": "Uncommon — top 4% of portfolio","reason": "Stands out: top 4% of the portfolio by alert intensity; flagged for human review; top priority is high-severity (correlation:product-launch).","portfolioMedians": { "technicalMaturityScore": 0.62, "securityPostureScore": 0.55, "subdomainCount": 18, "githubStars": 240 }}
"feed": {"portfolioId": "fintech-watchlist-2026","rollingAlerts": [{ "detectedAt": "2026-04-30T12:14:00Z", "domain": "stripe.com", "eventType": "PRODUCT_SIGNAL", "severity": "high", "headline": "5 new public GitHub repos", "alertScore": 0.9 },{ "detectedAt": "2026-04-29T08:00:00Z", "domain": "checkout.com", "eventType": "INFRA_MIGRATION", "severity": "high", "headline": "Name servers changed", "alertScore": 0.8 }],"topMovers": [{ "domain": "ramp.com", "rationale": "Alert intensity rose from 0.30 → 0.85 (now: PRODUCT_SIGNAL)" }],"newEntrants": [{ "domain": "klarna.com", "addedAt": "2026-04-29T..." }]}
This is the difference between "give me a report on Stripe" and "tell me which of my 100 watchlist companies matter most today." The portfolio is the platform.
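Consuming the feed is a matter of filtering rollingAlerts. A sketch with a hypothetical helper, relying on the fact that ISO-8601 UTC timestamps in the same format compare correctly as strings:

```python
def high_alerts_since(feed: dict, since_iso: str) -> list:
    """High-severity alerts detected at or after `since_iso` (ISO-8601 string compare)."""
    return [
        f"{a['domain']}: {a['headline']}"
        for a in feed.get("rollingAlerts", [])
        if a["severity"] == "high" and a["detectedAt"] >= since_iso
    ]

feed = {"rollingAlerts": [
    {"detectedAt": "2026-04-30T12:14:00Z", "domain": "stripe.com",
     "severity": "high", "headline": "5 new public GitHub repos"},
    {"detectedAt": "2026-04-29T08:00:00Z", "domain": "checkout.com",
     "severity": "high", "headline": "Name servers changed"},
]}
print(high_alerts_since(feed, "2026-04-30T00:00:00Z"))
# ['stripe.com: 5 new public GitHub repos']
```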
Comparability mode — benchmark against peers
Pass compareTo: ["domain1.com", "domain2.com"] (max 3) to benchmark. Each peer triggers a separate recursive run and bills its own PPE event ($1.00 per peer). The result is an additional peer-comparison record in the dataset with 8 ranked metrics:
{"recordType": "peer-comparison","entityId": "stripe.com|stripe","domain": "stripe.com","comparison": {"domain": "stripe.com","peers": ["adyen.com", "checkout.com"],"peerErrors": [],"metrics": {"technicalMaturityScore": { "ours": 0.95, "peers": [{"domain": "adyen.com", "value": 0.85}, {"domain": "checkout.com", "value": 0.78}], "rank": "1/3", "summary": "Highest technical maturity of the 3 compared" },"securityPostureScore": { "ours": 0.92, "peers": [...], "rank": "2/3", "summary": "Higher security posture than 1/2 peers" },"infraComplexity": { "ours": "high", "peers": [...], "rank": "1/3", "summary": "infra complexity: matches all peers (high)" }},"headline": "Stripe stands out: Highest technical maturity of the 3 compared.","distinctSignals": ["Highest technical maturity of the 3 compared", "Higher security posture than 1/2 peers"]}}
Use this for dashboards, RFP-prep, or competitive deep-dives. The recursive runs are bounded — peers don't recurse further (their compareTo is forced empty).
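Flattening the ranked metrics of a peer-comparison record for a dashboard can be done generically. A sketch (the helper is illustrative; the record shape follows the example above):

```python
def comparison_summary(record: dict) -> list:
    """One line per ranked metric in a peer-comparison record."""
    metrics = record["comparison"]["metrics"]
    return [f"{name}: {m['rank']} ({m['summary']})" for name, m in metrics.items()]

record = {"comparison": {"metrics": {
    "technicalMaturityScore": {"ours": 0.95, "rank": "1/3",
                               "summary": "Highest technical maturity of the 3 compared"},
    "securityPostureScore": {"ours": 0.92, "rank": "2/3",
                             "summary": "Higher security posture than 1/2 peers"},
}}}
for line in comparison_summary(record):
    print(line)
```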
Monitoring mode (scheduled runs)
Schedule this actor on a domain you care about — daily, weekly, monthly — and every run after the first returns a populated diff field. The first run for a domain saves a snapshot to the actor's key-value store; the second run loads that snapshot, computes the differences, and emits them under diff.
This is the difference between a one-shot company-research tool and a competitive-intelligence monitoring product. Examples of what diff surfaces:
- `newSecFilings` — new 10-K, 10-Q, or 8-K filed since last run (M&A, earnings, material events)
- `newGithubRepos` — new public repo published (product launches, new SDK)
- `newSubdomains` — new subdomain in Certificate Transparency logs (acquisitions, new internal tools, staging environments standing up)
- `newTxtRecords` — new verification token (Slack workspace, Google site verification, Okta, ad network)
- `nameServersChanged` / `mxRecordsChanged` — infra migration, M&A signal, email provider change
- `homepageTitleChanged` / `homepageDescriptionChanged` — rebrand, pivot, messaging shift
The status message at the end of a monitoring run reads `Done. 9 sources returned data: … | Δ since last run: 1 new filing, 2 new repos, 14 new subdomains | PPE charge: $1.00`.
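A scheduled consumer can turn the diff object into the same kind of one-line status. A hypothetical sketch using the diff field names from the table above (not the actor's own implementation):

```python
def diff_to_message(diff) -> str:
    """Summarize a monitoring diff into a one-line status, similar to the run status message."""
    if diff is None:
        return "First run: snapshot saved, no diff yet."
    parts = []
    if diff.get("newSecFilings"):
        parts.append(f"{len(diff['newSecFilings'])} new filing(s)")
    if diff.get("newGithubRepos"):
        parts.append(f"{len(diff['newGithubRepos'])} new repo(s)")
    if diff.get("newSubdomains"):
        parts.append(f"{len(diff['newSubdomains'])} new subdomain(s)")
    if diff.get("nameServersChanged"):
        parts.append("name servers changed")
    if not parts:
        return "No changes since last run."
    return "Δ since last run: " + ", ".join(parts)

print(diff_to_message({"newGithubRepos": ["payments-rs"], "nameServersChanged": True}))
# Δ since last run: 1 new repo(s), name servers changed
```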
Use Cases
- Sales & BD preparing company briefs before outbound — identify tech stack, e-commerce platform, payment processor, filings status, and social channels to personalize outreach
- Competitive intelligence — pull website + GitHub + SEC + Wayback + status-page + changelog + subdomain growth into one report; schedule it weekly to track competitor cadence
- VC & PE researchers — assess public-market presence, open-source footprint, npm + Docker distribution, and academic citations on prospective investments; schedule on portfolio companies for change alerts
- Journalists & investigators — Wikipedia summary, SEC filings, DNS records, social presence, and Hacker News mentions in seconds — usable directly in stories
- M&A due diligence — preliminary technical + public-records checks on acquisition targets, including subdomain enumeration for asset inventory
- Marketing strategists — audit a brand's digital footprint across social, tech stack, and operational-transparency surfaces (status page, changelog, security.txt)
- Security & SRE teams — Certificate Transparency subdomain enumeration + DMARC posture + security-headers + security.txt presence, all in one pass; schedule for cert-rotation and infra-change alerts
- DevRel & developer marketers — track Hacker News momentum, GitHub stars, and changelog updates over time on competitor and partner products
How to Use the API
You can call this actor programmatically from any language.
Python
```python
import requests
import time

run = requests.post(
    "https://api.apify.com/v2/acts/ryanclinton~company-deep-research/runs",
    params={"token": "YOUR_APIFY_TOKEN"},
    json={
        "domain": "stripe.com",
        "includeFinancials": True,
        "includeResearch": True,
        "includeGithub": True,
        "enableMonitoring": True,
        "maxResults": 20,
    },
    timeout=30,
).json()
run_id = run["data"]["id"]

while True:
    status = requests.get(
        f"https://api.apify.com/v2/actor-runs/{run_id}",
        params={"token": "YOUR_APIFY_TOKEN"},
        timeout=10,
    ).json()
    if status["data"]["status"] in ("SUCCEEDED", "FAILED", "ABORTED"):
        break
    time.sleep(5)

dataset_id = status["data"]["defaultDatasetId"]
items = requests.get(
    f"https://api.apify.com/v2/datasets/{dataset_id}/items",
    params={"token": "YOUR_APIFY_TOKEN"},
    timeout=30,
).json()

report = items[0]
print(report["summary"]["headline"])
for line in report["summary"]["keyTakeaways"]:
    print(f"  - {line}")
print(f"\nConfidence: {report['summary']['confidence']['explanation']}")
```
JavaScript
```javascript
const response = await fetch(
  "https://api.apify.com/v2/acts/ryanclinton~company-deep-research/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      domain: "stripe.com",
      enableMonitoring: true,
      maxResults: 20,
    }),
  },
);

const [report] = await response.json();
console.log(report.summary.headline);
report.summary.keyTakeaways.forEach((line) => console.log(`  - ${line}`));
console.log(`Confidence: ${report.summary.confidence.explanation}`);
```
cURL
```bash
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~company-deep-research/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"domain": "stripe.com", "enableMonitoring": true, "maxResults": 20}'
```
Reading the SUMMARY KV record
For orchestrators using Actor.call, the run's key-value store also contains a lightweight SUMMARY record so you don't need to paginate the dataset:
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("ryanclinton/company-deep-research").call(run_input={"domain": "stripe.com"})
summary = client.key_value_store(run["defaultKeyValueStoreId"]).get_record("SUMMARY")["value"]
print(summary["confidence"]["level"], summary["sourcesWithData"], "of", summary["sourcesAttempted"])
```
How It Works
```
Input (domain, optional companyName, module toggles)
│
▼
Phase A — Website (sequential, gives us companyName + raw HTML for downstream parsing)
│   Fetch HTML + response headers, extract title / og:* / favicon / social links,
│   detect tech stack from in-actor signatures (CMS / framework / analytics / CDN /
│   e-commerce / payment processors / ad pixels / fonts / security headers)
│
▼
Phase B — All independent modules in parallel (Promise.all)
├── Wikipedia — direct page summary then search fallback
├── GitHub — try 3 org-name guesses, top repos by stars,
│     language breakdown, npm scope packages, Docker Hub org
├── SEC EDGAR — EFTS full-text search + atom company search +
│     data.sec.gov submissions enrichment (exchange, SIC, address)
├── OpenAlex — citation-sorted academic papers
├── DNS — A/AAAA/MX/TXT/NS/CAA + _dmarc lookup +
│     SPF/DMARC/DKIM parse + email-provider classification
├── Subdomains — Certificate Transparency log enumeration via crt.sh
├── Infrastructure — Wayback first-seen + 10 well-known-file probes
│     (security.txt, ai.txt, llms.txt, robots, sitemap, RSS,
│     status, changelog, pricing, careers)
├── Community — Hacker News mention count + top stories (Algolia HN API)
└── Social Media — 6 platforms, prefers website-discovered links over slug guesses
│
▼
Phase C — Compile report, build summary hero block (deterministic synthesis)
│
▼
Phase D — On enableMonitoring: load prior snapshot, compute diff, save current snapshot
│
▼
Push to dataset (one record), save lightweight SUMMARY to KV store, charge PPE if data found
```
Data sources
| Step | Source | API Used | Auth Required |
|---|---|---|---|
| 1 | Company website | Direct HTTPS fetch + HTML parsing + in-actor tech-stack signatures | No |
| 2 | Wikipedia | REST API (/api/rest_v1/page/summary) + search API | No |
| 3 | GitHub | REST API (/orgs/{name}, /orgs/{name}/repos) + search fallback | Optional token (60 → 5,000 req/hr) |
| 3a | npm | registry.npmjs.com/-/v1/search?text=scope:{org} | No |
| 3b | Docker Hub | hub.docker.com/v2/repositories/{org}/ | No |
| 4 | SEC EDGAR | EFTS search + browse-edgar atom + data.sec.gov/submissions/CIK{padded}.json | No |
| 5 | OpenAlex | REST API (/works?search=) | No |
| 6 | DNS | Node.js dns.promises (resolve4/6, resolveMx/Txt/Ns/Caa) + _dmarc.{domain} lookup | No |
| 7 | Social Media | HTTP GET to profile URLs | No |
| 8 | Subdomains | Certificate Transparency logs via crt.sh JSON API | No |
| 9 | Infrastructure | Direct fetches to /.well-known/security.txt, /ai.txt, /llms.txt, /robots.txt, /sitemap.xml, status.{domain}, /changelog, etc. + Internet Archive CDX | No |
| 10 | Hacker News | Algolia HN Search API (hn.algolia.com/api/v1/search) | No |
How much does it cost?
This actor is priced at $1 per company researched under Pay-Per-Event (PPE). You are only charged when at least one source returns data — runs against parked, unreachable, or invalid domains are not billed.
The actor uses 512 MB of memory (default) and completes in 15–45 seconds for most domains. Phase B runs all 9 independent modules in parallel, so total run time is bounded by the slowest single API rather than the sum of all modules.
| Plan | Monthly Cost | Included PPE budget | Approx companies researched |
|---|---|---|---|
| Free | $0 | $5 (built-in) | ~5 |
| Personal | $49/month | $49 included | ~49 |
| Team | $499/month | $499 included | ~499 |
Apify platform compute (memory-seconds) is billed separately by Apify and is typically a few cents per run.
Tips
- Provide `companyName` explicitly for companies whose website title is a tagline. This dramatically improves accuracy across Wikipedia, SEC, GitHub, and Hacker News.
- Schedule on a domain you care about to unlock the `diff` field — the second run onwards returns what changed since last time. This converts the actor from one-shot research into a competitive-intel monitoring product.
- Disable unused modules to cut run time. If you only need website + tech stack + DNS + social, turn off SEC, research, GitHub, subdomains, infrastructure, and community.
- Use a GitHub token when researching multiple companies in a batch. Without one, GitHub allows 60 unauthenticated requests/hour. A free personal access token raises this to 5,000/hour.
- Combine with other actors — feed the SEC CIK number into the SEC EDGAR Filing Analyzer, or pass the domain into Website Tech Stack Detector for a deeper Wappalyzer-grade fingerprint.
- Batch process company lists by calling this actor via the Apify API in a loop. Each run is independent, so you can research hundreds of companies in parallel.
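The batch pattern is a plain fan-out. A sketch of the concurrency skeleton with a stand-in research function; in production you would swap in an Actor.call wrapper like the apify_client example shown above:

```python
from concurrent.futures import ThreadPoolExecutor

def research_all(domains, research_fn, max_workers=5):
    """Fan a research function out over many domains in parallel; returns {domain: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(domains, pool.map(research_fn, domains)))

# Stand-in function; in production this would start an actor run and read its SUMMARY record.
results = research_all(["stripe.com", "adyen.com"], lambda d: {"domain": d})
print(sorted(results))  # ['adyen.com', 'stripe.com']
```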
Limitations
- Company name detection depends on website title — sites with tagline-only titles (e.g., "Build the Future") will produce poor search results across Wikipedia, SEC, GitHub, and Hacker News unless you provide `companyName` manually.
- SEC EDGAR is US-only — the financials module only finds companies that file with the US Securities and Exchange Commission.
- GitHub org matching is heuristic — the actor tries 3 name guesses (domain base, lowercased company name, dashed company name) plus a search fallback. Companies with GitHub org names that differ significantly may not be found.
- npm + Docker Hub probes assume the org name matches the GitHub org — works well for `stripe`, `airbnb`, `vercel`; doesn't work when the company uses a different naming convention on each platform.
- Tech stack detection covers ~60 high-value signatures — Cloudflare, Next.js, Stripe, Shopify, GA4, Segment, etc. It is not a full Wappalyzer replacement (Wappalyzer covers 3,000+ signatures). For deep technographics, combine with the Website Tech Stack Detector.
- Subdomain enumeration depends on crt.sh — it only finds subdomains that have ever been issued a public TLS certificate. Internal-only subdomains, subdomains covered by wildcard certs, and subdomains using private CAs are not visible.
- Wikipedia search may match the wrong entity — common company names (e.g., "Apple") may match the article for a different entity. Providing the full company name helps.
- Hacker News mentions are name-based — common company names will produce false positives in the mention count. Top stories are usually correct.
- USPTO patents are off by default — the USPTO PatentsView API was retired in August 2024 and the replacement requires an API key. To keep this actor truly key-free, patents are returned as `{found: false, count: 0}` rather than gating behind a key requirement.
- The `diff` field is empty on the first run for a domain — the actor saves a snapshot at the end of the first run; the second run is when comparison kicks in.
What this actor does NOT do
To set expectations honestly:
- It does NOT replace Clearbit, ZoomInfo, Apollo, or PitchBook. Those are licensed commercial enrichment APIs with employee counts, revenue ranges, technographic confidence scores, intent signals, and verified contact data. This actor uses official + open public sources only.
- It does NOT find verified personal email addresses. For email enrichment use Hunter.io, Apollo.io, or the Person Enrichment Lookup actor (which wraps People Data Labs).
- It does NOT scrape LinkedIn employee profiles or LinkedIn employee counts. LinkedIn is anti-scraping; for LinkedIn data use a dedicated LinkedIn scraper actor.
- It does NOT pull website traffic data. SimilarWeb / Semrush / Ahrefs are paid for a reason — public surfaces don't expose this.
- It does NOT compute a comprehensive tech-stack fingerprint. It uses ~60 high-value signatures over the homepage HTML; for full Wappalyzer-grade analysis (3,000+ signatures, login-walled SPA support, JS execution) use the Website Tech Stack Detector actor.
- It does NOT classify the company by NAICS / SIC beyond what the SEC publishes. Public companies get the SEC SIC code; private companies do not.
- It does NOT score the company for sales fit, lead quality, or risk. It returns structured facts; you apply your scoring on top.
Combine with other Apify actors
| Actor | How to combine |
|---|---|
| Lead Enrichment Pipeline | Company research is built into step 4 of this pipeline — use the pipeline for full lead enrichment with email, phone, scoring |
| Person Enrichment Lookup | Pair with this actor for company-level intel: email + verified contact for individuals at the company |
| Website Contact Scraper | Scrape contacts first, then run this actor on the domains to enrich with company intel |
| Website Tech Stack Detector | Deep Wappalyzer-grade tech-stack fingerprinting beyond the in-actor 60 signatures |
| SEC EDGAR Filing Analyzer | Feed the CIK from this actor's output for deep SEC filing analysis |
Responsible Use
- All data is from public sources — Wikipedia (Creative Commons), SEC (public domain), OpenAlex (open access), GitHub (public API), DNS (public records), crt.sh (public Certificate Transparency logs), Wayback Machine (public archive), Hacker News (public via Algolia API).
- Respect GitHub rate limits — use a personal access token when running batch queries to avoid the 60-req/hr unauthenticated limit.
- Comply with SEC EDGAR fair use policy — the actor includes a descriptive User-Agent string with a real contact email. Avoid excessive request volumes.
- Use for legitimate business research — sales intelligence, competitive analysis, due diligence, journalism, security research.
FAQ
Is this actor free to use? The PPE price is $1.00 per company researched, and you're only charged when at least one of the 14+ sources returns data. Failed / parked / unreachable domains are not billed.
Does it work for non-US companies? Yes. Website analysis, tech stack, Wikipedi