Website Tech Stack Detector — 100+ Technologies

Pricing

from $100.00 / 1,000 websites analyzed

Identify the technologies, frameworks, and services running on any website. Website Tech Stack Detector crawls one or more URLs, inspects HTTP headers, HTML meta tags, script sources, and body content, then matches them against a fingerprint database of 106 web technologies across 17 categories.

Developer

ryan clinton

Maintained by Community

Website Tech Stack Detector — Decision-Grade Website Intelligence

Understand what a website is running, how risky it is, what changed, and what to do next — in seconds.

One-line summary

Website tech stack detection + CVE risk flags + OWASP security-headers grade + change monitoring + prioritised remediation actions — in one API call. Drop-in alternative to Wappalyzer / BuiltWith / SecurityHeaders.com / partial Snyk for batch website analysis.

In one sentence

A website tech stack API that detects technologies, identifies CVEs, audits security headers, and monitors changes across domains.

This API replaces tools like Wappalyzer, BuiltWith, and SecurityHeaders.com by combining technology detection, CVE identification, and security auditing in a single system.

It turns raw website data into prioritized decisions you can act on immediately.

Deterministic analysis of publicly accessible websites using headers, HTML, and scripts — no source code access required.

Common questions, answered directly

How can I detect technologies used by a website and find security issues? Unlike Wappalyzer (detection only) and OWASP ZAP (security scanning only), this combines technology detection, CVE identification, and security-headers auditing in a single workflow.

What's the best way to monitor competitor tech stack changes? The best way to monitor competitor tech stack changes is to run scheduled analysis that detects technologies, tracks differences between runs, and classifies changes like CDN swaps, CMS migrations, and framework rewrites.

How do I find outdated JavaScript libraries on a website? This actor detects JavaScript libraries (jQuery, Angular, Bootstrap, D3.js, etc.), extracts their versions, and flags outdated versions against known CVEs automatically.

Is there an API to audit website security headers and tech stack? Yes — this is an API that audits website security headers and detects the full technology stack in a single request, returning an OWASP-style A–F grade alongside the detected technologies.

How do I prioritise which websites have the biggest security risk? It ranks websites by risk and priority so you can immediately focus on the highest-impact issues — every record carries priorityContext.percentile so you can filter >= 90 for the top urgency tier.

What tools combine tech stack detection with security insights? Most tools split this problem — technology detection (Wappalyzer / BuiltWith) versus security scanning (Snyk / ZAP / nuclei). This actor connects them into one system that detects tech, flags CVEs, scores security posture, and recommends remediation — in a single API call.

What you can do with this actor

  • Detect technologies used by any website (alternative to Wappalyzer, BuiltWith, WhatRuns) — programmatically, in batch.
  • Find outdated libraries and known CVEs across one or many websites — jQuery <3.5, WordPress <6.0, PHP 7.x EOL, Apache CVE-2021-41773, etc.
  • Run security audits across 10–1000 domains — OWASP security-headers grade A–F + cookie-flag analysis + admin-path probing.
  • Monitor competitor tech stack changes over time — scheduled runs surface CDN swaps, CMS migrations, framework rewrites with classified implication strings.
  • Identify CMS migrations, CDN swaps, and framework rewrites: changeInsights.type takes one of cdn-swap / platform-migration / framework-migration / payment-replatform.
  • Generate prioritised remediation plans for web properties — typed action bundles per domain plus deduplicated fleet-wide actions.
  • Score website engineering quality and security posture — composite A–F grade across 5 dimensions (security, modernity, complexity, vendor lock-in, performance risk).
  • Enrich lead lists with technical and budget signals: leadInsights per domain with estimated company size, likely budget, pain points, sales angle.
  • Compare two domains head-to-head — pass compareDomains: ["a.com", "b.com"] to emit a verdict record with dimensional breakdown.
  • Surface alerts ready for Slack / email automation — every alert carries priority + shouldNotify so you only ping channels for events worth waking up for.
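
The "outdated libraries" item in the list above comes down to comparing a detected version with the version in which an issue was fixed. A minimal sketch of that comparison follows, assuming versions are dotted numeric strings like the `version` and `fixedIn` values shown in the output examples. The helper names `parse_version` and `is_outdated` are hypothetical, and the actor's real matching logic is not documented here.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '3.4.1' into (3, 4, 1), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def is_outdated(detected: str, fixed_in: str) -> bool:
    """True when the detected version sorts strictly below the fixed version."""
    a, b = parse_version(detected), parse_version(fixed_in)
    width = max(len(a), len(b))
    # Pad with zeros so '3.5' and '3.5.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b
```

With the sample values from this page, `is_outdated("3.4.1", "3.5.0")` flags the jQuery version and `is_outdated("5.4.2", "6.0.0")` flags the WordPress version.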

When to use this actor

Use this actor when you need to:

  • Audit multiple websites for security risks and outdated technologies — batch CVE detection plus security-headers grading in one call.
  • Monitor competitors for technology changes — CDN, CMS, framework, payment-processor swaps with intent inference.
  • Prioritise which websites need engineering or security attention: priorityContext.percentile ranks domains within a batch; filter >= 90 for top urgency.
  • Turn raw tech stack data into actionable insights — composite scoring + plain-English summaries + ranked remediation actions instead of just lists of names.
  • Generate reports or alerts based on website changes — scheduled runs emit Slack-ready notification strings on every meaningful shift.
  • Replace a manual analyst pass — what would take a developer an hour per site (DevTools + extensions + spreadsheets) runs in seconds at scale.

This actor is not for: full vulnerability scanning (use Snyk / OWASP Dependency-Check), penetration testing (use ZAP / nuclei), or detecting tech behind authentication (none of these tools can).

TL;DR — what one record actually looks like

This is the executive output mode payload — the 12 fields that drop straight into Slack, Sheets, Zapier, or an LLM tool call:

{
  "recordType": "domain",
  "schemaVersion": "3.0",
  "domain": "example.com",
  "grade": "C",
  "score": 62,
  "risk": "high",
  "rank": 2,
  "percentile": 93,
  "summary": "Legacy WordPress stack with 2 known CVEs (highest: high) — security grade C — low engineering maturity signal.",
  "topAction": "Upgrade WordPress core to >=6.0",
  "topSignals": ["High CVE risk", "Legacy CMS", "Missing CSP"],
  "actionBundle": { "type": "security-hardening", "priority": "high", "estimatedEffort": "low", "impact": "high" }
}

That's the whole product. Full enriched output adds 30+ more fields (vendor intel, change classification, deep security, page signals, breakdowns, alerts, etc.) — see Output layers below.

Output layers

The output is intentionally rich. To keep cognitive load low, it's organised in three layers:

| Layer | What it answers | Fields |
| --- | --- | --- |
| 1. Decision | "What do I do about this domain?" | grade, score (overall.score), risk / riskLevel, priorityContext, topAction, topSignals, actionBundle, alerts (with priority + shouldNotify) |
| 2. Explanation | "Why is the score what it is?" | overall.breakdown (signed deltas + confidenceImpact), detectionConfidence (with source-tagged reasons), coverage, changeInsights, competitivePosition |
| 3. Detail | "Show me the raw data" | technologies, categories, page, vendorIntel, lifecycle, deepSecurity, diff, security, notification |

Use outputMode: 'executive' to collapse everything to Layer 1. Use the default enriched mode to get all three.

Most tech-detection tools stop at "WordPress + jQuery + Cloudflare." That's a fact. It isn't a decision.

This actor connects the five dots that matter:

Technology → Risk → Change → Business meaning → Action

Crawl one URL or a thousand, and for every domain you get back: technologies matched against the 106-fingerprint database, CVE flags on outdated versions, an OWASP security-headers grade, a classified change diff vs the prior run, a plain-English summary + ranked recommended actions, lead intelligence for sales triggers, competitive cohort positioning, and Slack-ready alerts for the events that warrant a human looking at this domain right now.

Who this is for

| You are… | The pain | What this actor gives you |
| --- | --- | --- |
| Security teams / consultants | Full pentests are expensive. You need triage signal across 50–500 domains and a prioritised hit list. | CVE flags + headers grade + cookie-flag analysis + admin-path probing + priorityScore + riskLevel per domain. |
| Sales / lead-gen teams | Generic prospect lists with no urgency, budget, or fit signal. Cold emails are generic. | leadInsights (estimated company size + likely budget + pain points + sales angle) plus changeInsights for buying-trigger detection (framework migration, CDN swap, payment replatform). |
| Competitive intelligence / product | Competitors silently swap CDNs, replatform CMSs, modernise frameworks, and you find out months later. | Scheduled diff + classified change intelligence + cohort positioning (behindPeers, aheadOfPeers, techTrend) + alerts on every meaningful shift. |
| Agencies / freelancers | Manual audits take hours per site. Clients want actionable reports, not raw tech lists. | outputMode: 'executive' produces an audit-ready summary in one record per domain: grade, summary, recommended actions, risk level. |
| Investors / portfolio operators | Technical due diligence at scale across acquisitions / portfolios. Hidden tech debt = hidden risk. | Batch intelligence, engineeringMaturity signal, lifecycle flags on EOL tech, fleet-average overall + security scores in the run summary. |
| Internal engineering / DevOps | Large orgs with many web properties. Standards drift between teams. | Inventory + diff per property, alerts on regressions (security grade drops, new CVEs, exposed admin paths). |

What this actor does

For every URL you submit, the pipeline produces:

  • Composite scoring — overall grade A–F across 5 dimensions: security, modernity, complexity, vendor lock-in, performance risk.
  • Plain-English insight — one-sentence summary, risk level, priority score, engineering-maturity signal, and ranked recommended actions per domain.
  • CVE risk flags for outdated jQuery, Angular, Bootstrap, D3.js, PHP, Nginx, Apache, WordPress, Drupal, Joomla, ASP.NET, IIS.
  • OWASP security-headers grade with optional advanced probe of cookie flags + exposed admin paths.
  • Change intelligence — classified diffs (CDN swap, CMS migration, framework rewrite, payment replatform…) with implications, not just added / removed lists.
  • Lead intelligence — estimated company size, likely budget, pain points, sales angle.
  • Competitive cohort positioning — most-common stack in the batch, outlier flag, ahead/behind peers, dominant tech trend.
  • Lifecycle flags — EOL technologies, years-outdated counter.
  • Alerts — Slack/webhook-ready array fired on critical CVEs, failing security grades, new tech detected, exposed admin paths.
  • Custom detectors — encode your own internal-tool fingerprints; matches surface as first-class technologies.
  • Use-case presets — one-click config for security audits, sales prospecting, competitor tracking, portfolio analysis.
  • Output modes: raw for back-compat, enriched (default) for full premium output, executive for a collapsed decision-grade view.
  • Headless fallback: auto mode (default) re-fetches SPA-detected domains under Playwright Chromium so runtime-only technologies become visible to the same detection pipeline. Force always-on with headless, force always-off with cheerio.
  • Vendor intelligence — every detected tech is enriched with marketPosition, common alternatives, estimated cost tier, and typical adopter size. Stack data becomes a business signal, not just a list of names.
  • Score breakdown for explainability: overall.breakdown shows the signed deltas behind each sub-score so users can audit why a domain landed at C and not B.
  • Fix suggestions with example code: insight.fixSuggestions[] pairs each issue with a copy-paste-ready snippet (CSP / HSTS / X-Frame-Options / etc.) so the next step takes one minute, not one Stack Overflow trip.
  • Change intent inference: changeInsights.intent infers the business reason behind a stack change (performance / cost / modernization / replatform / consolidation / expansion) with calibrated confidence and supporting evidence.
  • Notification block: notification: { slackMessage, emailSubject, emailSummary } per domain, ready to drop into a Slack webhook or email digest with no post-processing.
  • Batch-insight record — a single recordType: 'batch-insight' row at the end of the dataset summarising the cohort: top recurring CVEs, most-adopted tech, highest-priority domains, dominant tech trend, change-type counts.
  • Compare two domains — pass compareDomains: ["a.com", "b.com"] to emit a recordType: 'comparison' verdict record with winner, dimensional breakdown, and confidence.
  • Action bundles: actionBundle per domain collapses fixSuggestions + recommendedActions into a typed playbook (security-hardening / modernization / replatform-prep / maintenance) with priority + effort + impact tags.
  • Fleet-wide deduplicated actions — the batch-insight record carries fleetActions[] ranked by how many domains in the batch carry the same action. Run on 30 prospects, get a one-page remediation plan that ignores per-domain noise.
  • Priority context — every domain ships with priorityContext: { rank, cohortSize, percentile } so downstream consumers can filter "top 5% most urgent" without re-sorting.
  • Risk bucket: insight.riskBucket (urgent / high / medium / low) is the operational queue label, distinct from the characterisation riskLevel.
  • Detection confidence: detectionConfidence: { score, level, reasons } per domain so users can audit "how much do I trust this detection?".
  • Coverage object: coverage: { complete, reason, renderedBy } is the primary detection-coverage indicator: explicit boolean + reason + which engine produced it.
  • Top signals: insight.topSignals[] surfaces the 3 most important plain-English signals about this domain; useful for dashboards, alerts, LLM tool-call summaries.
  • Closest-match competitor: competitivePosition.closestMatch: { domain, overlap } finds the cohort domain with the highest tech-set Jaccard overlap. Sort prospects by similarity to a converted customer in one query.
  • Distance from cohort median: competitivePosition.distanceFromMedian (signed) shows exactly how far ahead or behind a domain sits.
  • Schema versioned — every record carries schemaVersion: '3.0'. Consumers can pin to a major version and trust additive-only evolution within it.
  • Alert precision — every alert carries priority (urgent / high / medium / low) AND shouldNotify (boolean). Slack/email automation should branch on shouldNotify so you never paste low-priority noise into operational channels.
  • Confidence-tagged scoring — every breakdown entry includes confidenceImpact (high / medium / low) so users see not just why the score moved but how certain the system is about that move.
  • Source-tagged confidence reasons: detectionConfidence.reasons are {text, source} pairs, source ∈ headers / meta / scripts / html / render / pages / coverage. Auditable trail for enterprise.
  • Run context on every record: runContext: { preset, outputMode, renderMode, generatedAt } lets downstream pipelines reason about which configuration produced a record.
  • Distinct generatedAt vs analyzedAt: analyzedAt is when the crawl finished; generatedAt is when the record was emitted. Critical for caching, replays, and pipeline trust.
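
The closest-match item in the list above mentions Jaccard overlap between tech sets. For readers unfamiliar with the measure, here is a minimal sketch: the size of the intersection divided by the size of the union. The function names and the `cohort` mapping are hypothetical, and the rounding is an illustrative choice rather than documented actor behaviour.

```python
from typing import Dict, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: |a ∩ b| / |a ∪ b|, defined as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def closest_match(target: str, cohort: Dict[str, Set[str]]) -> Optional[dict]:
    """Return the other cohort domain whose tech set overlaps most with the target's."""
    best_domain, best_overlap = None, -1.0
    for domain, techs in cohort.items():
        if domain == target:
            continue
        overlap = jaccard(cohort[target], techs)
        if overlap > best_overlap:
            best_domain, best_overlap = domain, overlap
    if best_domain is None:
        return None  # cohort contains only the target
    return {"domain": best_domain, "overlap": round(best_overlap, 2)}
```

Two domains sharing two of three distinct technologies score 2/3, so a cohort of near-identical stacks clusters close to 1.0 while unrelated stacks sit near 0.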

High-value trigger events

The actor flags these events explicitly — they're the moments that justify a human action (an email, a Slack alert, a ticket, a follow-up call):

| Event | Why it matters | Where to find it |
| --- | --- | --- |
| New critical CVE detected | Security incident: patch window opens. | alerts[] with type: "new-critical-cve" |
| Security grade drops | Posture regression: site is now weaker than last week. | changeInsights + alerts[] with type: "failing-security-grade" |
| CMS migration (e.g. WooCommerce → Shopify) | Major replatform: competitor or vendor is actively rebuilding. Buying trigger. | changeInsights.type: "platform-migration" |
| Framework rewrite (e.g. AngularJS → React) | Engineering refresh in flight: modernisation budget signal. | changeInsights.type: "framework-migration" |
| CDN swap (e.g. Fastly → Cloudflare) | Performance / cost / security-policy decision. Worth checking what drove it. | changeInsights.type: "cdn-swap" |
| Payment processor replatform | Significant revenue-pathing change: high-stakes vendor decision. | changeInsights.type: "payment-replatform" |
| Admin path exposed (/.env, /.git/HEAD, /wp-config.php) | Active credential-leak risk. | deepSecurity.issues[] (CRITICAL severity) |
| New tech adopted since last run | Net-new tooling: vendor decision just landed. Useful for sales triggers + competitive intel. | alerts[] with type: "new-tech-detected" + diff.added[] |

These are why scheduled runs justify themselves — change intelligence beats one-shot detection for every persona above.
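
As a rough illustration of consuming these trigger events downstream, the sketch below maps the four `changeInsights.type` values listed above to a one-line message. The `TRIGGER_LABELS` mapping and `describe_trigger` helper are hypothetical names, and the record shape is assumed from the output examples on this page.

```python
from typing import Optional

# Labels copied from the trigger-event list; the mapping itself is illustrative.
TRIGGER_LABELS = {
    "cdn-swap": "CDN swap",
    "platform-migration": "CMS migration",
    "framework-migration": "Framework rewrite",
    "payment-replatform": "Payment processor replatform",
}


def describe_trigger(record: dict) -> Optional[str]:
    """Return a short message for a classified change, or None if there is none."""
    change = record.get("changeInsights") or {}
    label = TRIGGER_LABELS.get(change.get("type"))
    if label is None:
        return None
    domain = record.get("domain", "unknown")
    summary = change.get("summary", "")
    return f"{domain}: {label}. {summary}".strip()
```

Returning `None` for unclassified changes keeps the caller's branching simple: only records that produce a string need a ticket or a message.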

What it solves that competitors don't

Manual tech detection means viewing source, inspecting network, copy-pasting from browser extensions — one site at a time, with no follow-up signal. Existing detectors stop at "WordPress + jQuery + Cloudflare." This actor connects the gaps:

  • From facts to decisions. Composite grade + plain-English insight + ranked recommended actions = triage in seconds, not hours.
  • From snapshots to monitoring. Scheduled runs surface what changed AND classify the change type with an implication string ready to paste into Slack.
  • From individual to comparative. Run 30 competitor URLs and learn not just their stacks but who's the outlier, who's behind, where the cohort is moving.
  • From technology to business. Lead intelligence (estimated company size + likely budget + pain points + sales angle) lets ops teams convert raw stack data into prioritised outreach.
  • From shallow to deep on demand. securityDepth: 'advanced' adds cookie-flag analysis + admin-path probing without buying a second tool.
  • From off-the-shelf to your-shelf. Custom detectors encode internal-tool fingerprints once and ride every future run — your moat compounds.
  • From static to SPA-aware. renderMode: 'auto' detects SPA markers and re-fetches under Playwright Chromium so runtime-only React / Vue / Next.js apps become visible. Cheerio-only mode stays available for cheap batches.

The market has Wappalyzer (tech only), full security scanners (too deep, too expensive), and lead-intel tools (too shallow on tech). This actor is the only one that connects technology → risk → change → business meaning → action in a single run.

Best alternatives to Wappalyzer, BuiltWith, and SecurityHeaders.com

| Tool | What it does | What it doesn't | What this actor adds |
| --- | --- | --- | --- |
| Wappalyzer | Detects technologies on a website | No prioritisation, no CVE flags, no change tracking, no security grading | Prioritisation + CVEs + change classification + security grade + actionable remediation |
| BuiltWith | Tech-stack data + market share insights | Expensive at scale, no actionable output, no security signals | Same coverage at PPE pricing + decision-grade output + security signals |
| SecurityHeaders.com | Grades a single domain's security headers | One domain at a time, no tech detection, no CVE flags | Batch security grading + tech detection + CVE matching in one call |
| Snyk / OWASP Dependency-Check | Deep vulnerability scanning of source code | Requires source access; can't analyse external websites | Black-box CVE flagging from observable version numbers in HTML / headers |
| ZAP / nuclei | Active penetration testing | Heavy, slow, requires authorisation, not for batch reconnaissance | Fast, passive, batch-friendly external posture audit |
| Generic web scrapers | Raw HTML extraction | No interpretation, no scoring, no action layer | The full technology → risk → change → business meaning → action chain |

This actor sits between "raw detection tool" and "enterprise security platform" — fast, programmatic, decision-grade.


If you're currently using Wappalyzer, BuiltWith, or SecurityHeaders.com separately, this replaces all three in one API that detects technologies, flags CVEs, and audits security.


Quick start with presets

Pick a preset and the actor configures itself for the workflow:

| Preset | What it optimizes for | What it sets |
| --- | --- | --- |
| security-audit | Deep security probe of one or many domains | 5 pages/domain, advanced security depth, alerts on, full enriched output |
| sales-prospecting | Lead intelligence + cohort view for outreach lists | 3 pages/domain, lead insights surfaced, competitive intel on, diff off |
| competitor-tracking | Scheduled competitive monitoring | 3 pages/domain, diff + change classification + alerts on |
| portfolio-analysis | Multi-domain audit for agencies / investors | 5 pages/domain, advanced security, executive output mode, competitive on |
| raw | Backwards-compatible legacy detection | 3 pages/domain, no enrichment, raw output mode |

Explicit input fields always override the preset. The default preset is unset — the actor runs in enriched mode with diff on, security depth basic, competitive intel off, alerts on.
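
The precedence rule above (defaults, then preset, then explicit fields) can be pictured as a three-layer dictionary merge. The sketch below is illustrative only: the default values come from the input-parameter table, but the translation of each preset row into concrete input fields is an assumption, and `resolve_config` is a hypothetical helper, not part of the actor.

```python
# Defaults as listed in the input-parameter table.
DEFAULTS = {
    "outputMode": "enriched",
    "maxPagesPerDomain": 3,
    "compareToPriorRun": True,
    "securityDepth": "basic",
    "renderMode": "auto",
    "enableCompetitiveIntel": False,
    "emitAlerts": True,
}

# Assumed mapping of two preset descriptions onto input fields (not confirmed).
PRESETS = {
    "security-audit": {
        "maxPagesPerDomain": 5,
        "securityDepth": "advanced",
        "emitAlerts": True,
        "outputMode": "enriched",
    },
    "sales-prospecting": {
        "maxPagesPerDomain": 3,
        "enableCompetitiveIntel": True,
        "compareToPriorRun": False,
    },
}


def resolve_config(user_input: dict) -> dict:
    """Layer defaults, then the preset, then explicit fields (explicit wins)."""
    config = dict(DEFAULTS)
    config.update(PRESETS.get(user_input.get("preset"), {}))
    config.update({k: v for k, v in user_input.items() if v is not None})
    return config
```

For example, passing `preset: "security-audit"` together with `maxPagesPerDomain: 2` would keep the advanced security depth from the preset while the explicit page limit of 2 takes priority.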

Output modes

  • raw — only the legacy detection fields (technologies, categories, pagesAnalyzed). Backwards compatible with v1 consumers.
  • enriched (default) — full premium output: scoring + insight + change + competitive + lead + lifecycle + alerts + security + page signals + diff.
  • executive — collapsed decision-grade view: domain, grade, overallScore, riskLevel, priorityScore, summary, recommendedActions, changeInsights, competitivePosition, leadInsights, alerts. No raw tech arrays. Drop straight into a CRM, Slack, or LLM tool call without post-processing.

Render modes

| Mode | Behaviour | When to use |
| --- | --- | --- |
| auto (default) | Cheerio crawls every domain. Domains where Cheerio sees an SPA marker (__next, __nuxt, empty React/Vue root, ng-app, Gatsby) plus a hollow body get re-fetched under Playwright Chromium. The detection pipeline runs again on the rendered DOM. | Most workflows: gets the speed of Cheerio for static sites and the coverage of headless for SPAs without you choosing. |
| cheerio | Never spin up a browser. Pure Cheerio: fastest, cheapest, no Playwright memory cost. | Large batches of static sites, or when you've confirmed your targets aren't SPAs. |
| headless | Render every domain in Chromium up-front. Highest memory, highest coverage. | Small batches of SPA-heavy targets where you want maximum signal (e.g. portfolio audits). |

The headless fallback is capped at 50 domains per run to bound memory and time. Each result includes a renderedBy field — cheerio, headless, or cheerio+headless — so you can audit which engine produced the detection.

Memory note: Playwright Chromium needs ~2 GB. The actor defaults to 2048 MB / max 4096 MB. Drop to 1024 MB only if you set renderMode: 'cheerio'.
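
To make the auto-mode decision concrete, here is a rough sketch of an "SPA marker plus hollow body" check. It is not the actor's implementation: the marker strings, the 200-character threshold for "hollow", and the function names are all assumptions chosen for illustration.

```python
import re

# Assumed marker strings for Next.js, Nuxt, Gatsby, and AngularJS roots.
SPA_MARKERS = ('id="__next"', 'id="__nuxt"', 'id="___gatsby"', "ng-app")
# Assumed pattern for an empty React/Vue mount point.
EMPTY_ROOT = re.compile(r'<div id="(?:root|app)">\s*</div>')


def visible_text_length(html: str) -> int:
    """Approximate visible text: drop script/style blocks, strip tags, collapse spaces."""
    without_code = re.sub(r"(?is)<(script|style)\b.*?</\1>", "", html)
    text = re.sub(r"(?s)<[^>]+>", " ", without_code)
    return len(" ".join(text.split()))


def needs_headless(html: str, render_mode: str = "auto", hollow_threshold: int = 200) -> bool:
    """Decide whether a page should be re-fetched in a browser."""
    if render_mode == "headless":
        return True
    if render_mode == "cheerio":
        return False
    has_marker = any(m in html for m in SPA_MARKERS) or EMPTY_ROOT.search(html) is not None
    return has_marker and visible_text_length(html) < hollow_threshold
```

Requiring both conditions avoids rendering server-side-rendered Next.js or Nuxt pages, which carry the marker but already ship their content in the initial HTML.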

Input parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | String[] | Yes | sample of 3 | Website URLs or bare domains. maxItems: 1000. |
| preset | String | No | unset | One of security-audit, sales-prospecting, competitor-tracking, portfolio-analysis, raw. |
| outputMode | String | No | enriched | raw / enriched / executive. |
| maxPagesPerDomain | Integer | No | 3 | Max pages per domain (1–10). |
| compareToPriorRun | Boolean | No | true | Persist + diff against prior run. |
| securityDepth | String | No | basic | basic (headers grade only) or advanced (+ cookie flags + admin path probing). |
| renderMode | String | No | auto | auto (Cheerio first, Playwright fallback for SPAs) / cheerio (never render) / headless (always render). |
| emitNotification | Boolean | No | true | Adds notification (Slack / email-ready strings) to each domain record. |
| emitBatchInsight | Boolean | No | true | Emits a recordType: 'batch-insight' summary record at the end (auto-skipped on single-domain runs). |
| compareDomains | String[2] | No | | Two domains from urls to head-to-head compare. Emits a recordType: 'comparison' verdict record. |
| enableCompetitiveIntel | Boolean | No | false | Cohort analysis when batch has 2+ successful domains. |
| emitAlerts | Boolean | No | true | Populate alerts[] when conditions cross thresholds. |
| customDetectors | Object[] | No | | User-defined fingerprints, see Custom detectors below. |
| proxyConfiguration | Object | No | Apify Proxy | Proxy settings. |

Input examples

Default — analyze three domains with full enrichment:

{
  "urls": ["shopify.com", "stripe.com", "vercel.com"]
}

Sales prospecting preset:

{
  "urls": ["acme.co", "initech.com", "globex.io"],
  "preset": "sales-prospecting"
}

Security audit with advanced probes:

{
  "urls": ["example.com"],
  "preset": "security-audit"
}

Competitor tracking on a schedule (executive mode for Slack-ready output):

{
  "urls": ["competitor-a.com", "competitor-b.com", "competitor-c.com"],
  "preset": "competitor-tracking",
  "outputMode": "executive"
}

Custom detectors — track your own internal tools:

{
  "urls": ["customer-a.com", "customer-b.com"],
  "customDetectors": [
    {
      "name": "Acme CRM",
      "category": "Internal Tools",
      "scriptPattern": "crm\\.acme\\.io/loader\\.js"
    },
    {
      "name": "Acme Analytics",
      "category": "Internal Tools",
      "headerName": "X-Acme-Tracking",
      "headerPattern": ".+"
    }
  ]
}
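
To show how fingerprints like these might be evaluated, the sketch below tests each detector against a page's script URLs and response headers. It assumes `scriptPattern` and `headerPattern` are regular expressions (the escaped dots in the example suggest so, but this page does not confirm it) and that header names match case-insensitively. `match_custom_detectors` is a hypothetical helper, not the actor's code.

```python
import re
from typing import Dict, List


def match_custom_detectors(
    detectors: List[dict], script_srcs: List[str], headers: Dict[str, str]
) -> List[dict]:
    """Return name/category for every detector that matches a script URL or header."""
    lower_headers = {name.lower(): value for name, value in headers.items()}
    hits = []
    for det in detectors:
        matched = False
        script_pattern = det.get("scriptPattern")
        if script_pattern and any(re.search(script_pattern, src) for src in script_srcs):
            matched = True
        header_name = det.get("headerName")
        if header_name and header_name.lower() in lower_headers:
            value = lower_headers[header_name.lower()]
            # Assumption: a detector without headerPattern matches any header value.
            if re.search(det.get("headerPattern", ".*"), value):
                matched = True
        if matched:
            hits.append({"name": det["name"], "category": det["category"]})
    return hits
```

A detector may define a script pattern, a header check, or both; in this sketch either one is enough to count as a match.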

Using this actor in LLM / agent workflows

This actor is built for AI agents and automation pipelines. Typical flow:

  1. Pass a list of domains in urls (1–1000).
  2. Run with outputMode: "executive" to get the flat decision payload (12 fields per record).
  3. Filter by priorityContext.percentile >= 90 to get the worst offenders.
  4. Branch on actionBundle.type (security-hardening / modernization / replatform-prep / maintenance) for downstream routing.
  5. Use topAction + summary as ready-made copy for messages, tickets, or LLM tool replies.
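
Steps 3 to 5 above can be sketched as a small triage function. It assumes records shaped like the executive payload shown earlier; because the examples on this page show both a flat `percentile` field and a nested `priorityContext.percentile`, the sketch reads whichever is present. The names `percentile_of` and `triage`, and the "unclassified" fallback queue, are hypothetical.

```python
from typing import Dict, List


def percentile_of(record: dict) -> float:
    """Read the flat field if present, else fall back to priorityContext.percentile."""
    if "percentile" in record:
        return record["percentile"]
    return (record.get("priorityContext") or {}).get("percentile", 0)


def triage(records: List[dict], min_percentile: float = 90) -> Dict[str, List[dict]]:
    """Group the most urgent domain records by actionBundle.type."""
    queues: Dict[str, List[dict]] = {}
    for record in records:
        if record.get("recordType") != "domain":
            continue  # skip comparison / batch-insight / error rows
        if percentile_of(record) < min_percentile:
            continue
        bundle_type = (record.get("actionBundle") or {}).get("type", "unclassified")
        queues.setdefault(bundle_type, []).append(
            {
                "domain": record.get("domain"),
                "topAction": record.get("topAction"),
                "summary": record.get("summary"),
            }
        )
    return queues
```

Each queue entry carries only the fields an agent needs to write a ticket or message, which keeps the downstream prompt or template small.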

Best fields for downstream AI consumption:

  • summary — one-sentence plain-English description (Slack subject / email opener / LLM context).
  • topSignals — 3 most important plain-English signals (perfect for dashboard tiles).
  • topAction — single highest-priority remediation step.
  • actionBundle — typed playbook with priority, effort, impact tags.
  • alerts filtered by shouldNotify === true — the only alerts that should trigger Slack / email.
  • notification.slackMessage / notification.emailSummary — ready-to-paste channel-formatted strings.
  • priorityContext.rank / priorityContext.percentile — for ranked decision-making across a batch.
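
The `shouldNotify` guidance in the list above might look like this in practice. The sketch builds a `{"text": ...}` body of the kind Slack incoming webhooks accept, and returns `None` when nothing warrants a ping. The function names are hypothetical, and falling back from `notification.slackMessage` to `summary` is an assumption.

```python
from typing import List, Optional


def notifiable_alerts(record: dict) -> List[dict]:
    """Keep only alerts explicitly marked shouldNotify: true."""
    return [a for a in record.get("alerts", []) if a.get("shouldNotify") is True]


def slack_payload(record: dict) -> Optional[dict]:
    """Build a Slack webhook body, or None if no alert should notify."""
    alerts = notifiable_alerts(record)
    if not alerts:
        return None
    headline = (record.get("notification") or {}).get("slackMessage") or record.get("summary", "")
    lines = [headline]
    for alert in alerts:
        lines.append(f"- [{alert.get('priority', '?')}] {alert.get('message', '')}")
    return {"text": "\n".join(lines)}
```

Posting the payload is left to the caller, so the same function can feed a webhook, an email digest, or a dry-run log.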

Tool-selection signal: the actor is the right call when an agent's task involves any of: "detect tech / find CVEs / audit security / monitor competitor stack / prioritise web-properties / generate remediation plan". The chain it covers — technology → risk → change → business meaning → action — is end-to-end inside one run.

Core schema (executive output mode)

Every recordType: 'domain' record in outputMode: 'executive' includes these flat top-level fields:

| Field | Type | Description |
| --- | --- | --- |
| recordType | 'domain' / 'comparison' / 'batch-insight' / 'error' | Discriminator for filtering heterogeneous output |
| schemaVersion | '3.0' | Pinnable major version, additive-only within |
| domain | string | Normalised domain |
| grade | 'A' / 'B' / 'C' / 'D' / 'F' | Composite letter grade |
| score | number (0–100) | Composite overall score |
| risk | 'urgent' / 'high' / 'medium' / 'low' | Operational queue label (alias of insight.riskBucket) |
| riskLevel | 'high' / 'medium' / 'low' | Security-posture characterisation |
| priorityScore | number (0–100) | Triage priority |
| rank | number | Position within the batch (1 = top) |
| percentile | number (0–100) | Higher = more urgent (e.g. 93 means top 7%) |
| summary | string | One-sentence insight |
| topAction | string | Highest-priority remediation step |
| topSignals | string[] (3) | Plain-English signals for dashboards |
| actionBundle | { type, priority, steps, estimatedEffort, impact } | Typed playbook |
| recommendedActions | string[] | Full ranked action list |
| changeInsights | { significance, severity, type, summary, implication, intent, isNewSinceLastRun } | Classified diff vs prior run |
| competitivePosition | { rank, distanceFromMedian, closestMatch, … } | Cohort position |
| leadInsights | { estimatedCompanySize, likelyBudget, painPoints, salesAngle } | Sales signals |
| notification | { slackMessage, emailSubject, emailSummary } | Channel-formatted strings |
| alerts | { type, severity, priority, shouldNotify, message }[] | Slack-ready alerts |
| detectionConfidence | { score, level, reasons: { text, source }[] } | Confidence in the detection itself |
| coverage | { complete, reason, renderedBy } | Detection-coverage indicator |
| runContext | { preset, outputMode, renderMode, generatedAt } | Run config for downstream debugging |

Premium output example (enriched mode)

{
  "domain": "example.com",
  "url": "https://example.com",
  "techCount": 7,
  "riskCount": 2,
  "highestRiskSeverity": "high",
  "overall": {
    "score": 62,
    "grade": "C",
    "scores": {
      "security": 60,
      "modernity": 40,
      "complexity": 55,
      "vendorLockIn": 0,
      "performanceRisk": 30
    },
    "drivers": [
      "High-severity CVE flagged (2 total).",
      "Stack tilts legacy / EOL."
    ]
  },
  "insight": {
    "summary": "Legacy WordPress stack with 2 known CVEs (highest: high) — security grade C — low engineering maturity signal.",
    "riskLevel": "high",
    "priorityScore": 87,
    "engineeringMaturity": "low",
    "recommendedActions": [
      "Upgrade WordPress core to >=6.0 (currently 5.4.2).",
      "Upgrade jQuery to >=3.5.0 (currently 3.4.1).",
      "Add a Content-Security-Policy header to mitigate XSS / clickjacking.",
      "Add Strict-Transport-Security with max-age >= 1 year."
    ],
    "signalSources": ["Cloudflare", "WordPress", "jQuery", "..."]
  },
  "leadInsights": {
    "estimatedCompanySize": "SMB",
    "likelyBudget": "low",
    "painPoints": ["legacy stack", "outdated software with known CVEs", "weak security posture"],
    "salesAngle": "security upgrade + modernization"
  },
  "lifecycle": [
    { "tech": "WordPress", "version": "5.4.2", "status": "deprecated", "eol": true, "eolYear": null, "yearsOutdated": null }
  ],
  "alerts": [
    { "type": "multiple-high-cves", "severity": "medium", "message": "2 CVEs flagged with at least one high severity." },
    { "type": "new-tech-detected", "severity": "low", "message": "New technology since last run: Cloudflare." }
  ],
  "changeInsights": {
    "significance": "high",
    "type": "cdn-swap",
    "summary": "Switched CDN from Fastly to Cloudflare.",
    "implication": "Likely performance, cost, or security-policy driven. Re-test edge caching and rules."
  },
  "competitivePosition": {
    "cohortSize": 12,
    "mostCommonStackInBatch": ["React", "Cloudflare", "Google Analytics"],
    "overlapWithCohort": 1,
    "outlier": false,
    "behindPeers": true,
    "aheadOfPeers": false,
    "cohortMedianOverallScore": 78,
    "cohortMedianSecurityScore": 70,
    "techTrend": "cohort tilts modern frontend"
  },
  "security": {
    "grade": "C",
    "score": 60,
    "present": {
      "contentSecurityPolicy": false,
      "strictTransportSecurity": true,
      "xFrameOptions": true,
      "xContentTypeOptions": true,
      "referrerPolicy": true,
      "permissionsPolicy": false
    },
    "missing": ["Content-Security-Policy", "Permissions-Policy"],
    "notes": ["HSTS max-age <6 months — recommend 1 year+."]
  },
  "deepSecurity": {
    "cookieFlags": {
      "cookiesObserved": 4,
      "cookiesMissingSecure": 1,
      "cookiesMissingHttpOnly": 2,
      "cookiesMissingSameSite": 0,
      "notes": ["1/4 cookies missing Secure flag.", "2/4 cookies missing HttpOnly flag."]
    },
    "adminPaths": [
      { "path": "/wp-admin/", "status": 200, "accessible": true },
      { "path": "/.env", "status": 404, "accessible": false }
    ],
    "issues": [
      "1/4 cookies missing Secure flag.",
      "2/4 cookies missing HttpOnly flag.",
      "Admin path reachable: /wp-admin/ (HTTP 200)."
    ]
  },
  "diff": {
    "enabled": true,
    "isFirstRun": false,
    "lastSeenAt": "2026-04-23T09:14:21.000Z",
    "added": ["Cloudflare"],
    "removed": ["Fastly"],
    "unchanged": 6
  },
  "page": {
    "htmlBytes": 84320,
    "scriptCount": 14,
    "externalScriptCount": 9,
    "externalScriptDomains": ["www.googletagmanager.com", "static.cloudflareinsights.com"],
    "imageCount": 22,
    "stylesheetCount": 4,
    "iframeCount": 0,
    "hasOpenGraph": true,
    "hasTwitterCard": true,
    "hasFavicon": true,
    "hasCanonical": true,
    "hasViewport": true,
    "lang": "en",
    "title": "Example",
    "metaDescription": "Example domain for documentation."
  },
  "technologies": [
    { "name": "WordPress", "category": "CMS", "version": "5.4.2", "website": "https://wordpress.org", "confidence": "high", "risks": [{ "severity": "high", "cve": "WP-CORE-EOL", "summary": "WordPress core <6.0 is missing several years of security back-ports.", "fixedIn": "6.0.0" }] },
    { "name": "jQuery", "category": "JavaScript Libraries", "version": "3.4.1", "website": "https://jquery.com", "confidence": "high", "risks": [{ "severity": "medium", "cve": "CVE-2020-11023", "summary": "XSS via untrusted <option> elements.", "fixedIn": "3.5.0" }] }
  ],
  "categories": { "CMS": ["WordPress"], "JavaScript Libraries": ["jQuery"], "CDN & Performance": ["Cloudflare"] },
  "httpStatus": 200,
  "pagesAnalyzed": 3,
  "analyzedAt": "2026-04-30T10:30:00.000Z",
  "generatedAt": "2026-04-30T10:30:00.000Z",
  "failureType": null,
  "scrapeError": null,
  "recommendation": null,
  "coverage": { "complete": true, "reason": null, "renderedBy": "cheerio+headless" }
}

A run-level SUMMARY is also written to the actor's default key-value store with: preset, outputMode, totalRequested, successful, failed, failureBreakdown, totalDetections, totalCveFlags, domainsWithCriticalCve, domainsWithHighCve, fleetOverallScoreAvg, fleetSecurityScoreAvg, alertCounts, topPriorityDomains, ppe.totalChargedUsd.

Record types

The dataset is heterogeneous — every record carries a recordType discriminator so you can filter cleanly in SQL / Sheets / agent tool calls (WHERE recordType = 'comparison').

| recordType | When emitted | Contains |
| --- | --- | --- |
| domain | One per submitted URL | Per-domain analysis: tech, scoring, insight, change, alerts, notification, vendor intel, etc. |
| comparison | When compareDomains: [a, b] is set | Head-to-head verdict: winner (a / b / tie / no-call), dimensional breakdown, confidence, plain-language verdict + reason |
| batch-insight | At end of multi-domain runs (auto-skipped on single-domain) | Cohort summary: top recurring CVEs, most-adopted tech, highest-priority domains, dominant trend, change-type counts |
| error | On unrecoverable failure | failureType + message + timestamp |
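
As a minimal sketch of filtering on the recordType discriminator client-side, the helper below groups already-downloaded dataset items by type. The sample items are hypothetical and heavily trimmed, and the "unknown" bucket for items without a recordType is an assumption of this sketch, not documented actor behaviour.

```python
from collections import defaultdict


def split_by_record_type(items):
    """Group dataset items by their recordType discriminator."""
    groups = defaultdict(list)
    for item in items:
        # Items without the field go into an "unknown" bucket (assumption of this sketch).
        groups[item.get("recordType", "unknown")].append(item)
    return dict(groups)


# Hypothetical, trimmed dataset items for illustration only.
sample_items = [
    {"recordType": "domain", "domain": "example.com"},
    {"recordType": "domain", "domain": "example.org"},
    {"recordType": "comparison", "winner": "a"},
    {"recordType": "batch-insight", "cohortSize": 2},
]

groups = split_by_record_type(sample_items)
print({record_type: len(records) for record_type, records in groups.items()})
```

In practice the items would come from the dataset iterator shown in the Programmatic access section rather than from a literal list.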

Notification block

Every domain record carries a ready-to-paste notification block when emitNotification: true (default):

{
"notification": {
"slackMessage": ":rotating_light: *example.com* grade C (62/100) — Switched CDN from Fastly to Cloudflare.",
"emailSubject": "[example.com] grade C (62/100) — high priority",
"emailSummary": "Domain: example.com\nGrade: C (62/100)\nRisk level: high\nPriority score: 87/100\nSummary: ...\nChange since last run: Switched CDN from Fastly to Cloudflare.\nImplication: Likely performance, cost, or security-policy driven.\nTop recommended action: Upgrade WordPress core to >=6.0\n..."
}
}

Wire notification.slackMessage into a Slack incoming webhook body or notification.emailSummary into an email automation — no template engine needed.
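
A minimal sketch of that wiring, using only the Python standard library. It builds the POST request for a Slack incoming webhook (which accepts a JSON body with a text field) but does not send it. The webhook URL is a placeholder and the helper name build_slack_request is hypothetical.

```python
import json
import urllib.request


def build_slack_request(webhook_url, record):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    message = record["notification"]["slackMessage"]
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Trimmed domain record; only the notification block is needed here.
record = {
    "notification": {
        "slackMessage": ":rotating_light: *example.com* grade C (62/100)"
    }
}

# Placeholder URL: substitute your own incoming-webhook URL.
request = build_slack_request("https://hooks.slack.com/services/YOUR/WEBHOOK/PATH", record)
# urllib.request.urlopen(request)  # uncomment to actually post the message
```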

Comparison record

{
"recordType": "comparison",
"domainA": "siteA.com",
"domainB": "siteB.com",
"winner": "a",
"verdict": "siteA.com is the stronger stack overall (78 vs 62).",
"reason": "Wins on: overall, security, modernity, complexity (lower is better).",
"dimensions": [
{ "dimension": "overall", "aValue": 78, "bValue": 62, "verdict": "siteA.com better" },
{ "dimension": "security", "aValue": 80, "bValue": 60, "verdict": "siteA.com better" },
{ "dimension": "modernity", "aValue": 75, "bValue": 40, "verdict": "siteA.com better" }
],
"confidence": "high"
}

Use cases: sales (us vs them), investor due diligence (target A vs target B), competitor positioning (your site vs theirs).

Batch-insight record

{
"recordType": "batch-insight",
"cohortSize": 30,
"failed": 2,
"topRisks": ["jQuery — XSS via untrusted <option> elements (×7)", "WordPress — core <6.0 EOL (×4)"],
"topTech": [
{ "name": "Cloudflare", "count": 22 },
{ "name": "Google Analytics", "count": 19 },
{ "name": "React", "count": 14 }
],
"highestPriorityDomains": [
{ "domain": "example.com", "priorityScore": 87, "riskLevel": "high", "summary": "Legacy WordPress stack with 2 CVEs..." }
],
"dominantTrend": "cohort tilts toward modern JS frameworks (React / Next / Vue / Nuxt / Svelte)",
"changeTypeCounts": { "cdn-swap": 3, "framework-migration": 1 },
"analyzedAt": "2026-04-30T10:30:00.000Z"
}

This is what a decision-maker reads first. Filter the dataset with WHERE recordType = 'batch-insight' to get just the executive view.

How the scoring works

Composite overall score = weighted blend of five sub-scores, each 0–100:

| Sub-score | Weight | What it measures |
| --- | --- | --- |
| security | 40% | OWASP headers grade + CVE penalties (critical -35, high -20, medium -8 + per-CVE -3 capped at -15) |
| modernity | 25% | Modern-stack hits (Next.js, Vercel, Tailwind, etc. +8 each) minus legacy-stack hits (jQuery, AngularJS, PHP 7, etc. -6 each) minus EOL hits (-12 each), 60 baseline |
| complexity | 10% (inverse) | Tech count + external script domains + script count + HTML weight; high complexity is penalised |
| vendorLockIn | 10% (inverse) | Hosted-vendor presence (Shopify / Squarespace / Wix / Webflow / Ghost); high lock-in is penalised |
| performanceRisk | 15% (inverse) | HTML bytes + external scripts + iframes + script count; high risk is penalised |

Overall grade thresholds: A ≥90, B ≥75, C ≥60, D ≥40, F otherwise.
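
The weights and thresholds above can be sketched as follows. The grade thresholds are taken directly from this section. Treating an "inverse" sub-score as 100 minus its value is an assumption of this sketch, and the actor may apply additional rounding or caps, so the blend is not guaranteed to reproduce the scores the actor reports.

```python
WEIGHTS = {
    "security": 0.40,
    "modernity": 0.25,
    "complexity": 0.10,
    "vendorLockIn": 0.10,
    "performanceRisk": 0.15,
}
# Sub-scores marked "(inverse)" in the table: higher raw value means a worse result.
INVERSE = {"complexity", "vendorLockIn", "performanceRisk"}


def blend(scores):
    """Weighted blend of the five sub-scores (inverse handling is an assumption)."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = scores[name]
        if name in INVERSE:
            value = 100 - value  # assumption: invert so that lower risk scores higher
        total += weight * value
    return round(total)


def grade(score):
    """Map a 0-100 score to a letter using the documented thresholds."""
    for letter, floor in (("A", 90), ("B", 75), ("C", 60), ("D", 40)):
        if score >= floor:
            return letter
    return "F"


# Hypothetical sub-scores chosen for easy arithmetic.
sample = {"security": 80, "modernity": 80, "complexity": 20, "vendorLockIn": 20, "performanceRisk": 20}
overall = blend(sample)
print(overall, grade(overall))
```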

Score breakdown (explainability)

Every sub-score includes a breakdown[] array of signed deltas with notes — so you can audit exactly why the score landed where it did:

"overall": {
"score": 62,
"grade": "C",
"scores": { "security": 60, "modernity": 40, "complexity": 55, "vendorLockIn": 0, "performanceRisk": 30 },
"breakdown": {
"security": [
{ "delta": 70, "note": "Headers grade B (70/100)" },
{ "delta": -8, "note": "Medium-severity CVE detected" },
{ "delta": -2, "note": "1 total CVE flag (capped at -15)" }
],
"modernity": [
{ "delta": 60, "note": "Baseline" },
{ "delta": -6, "note": "Legacy tech detected: jQuery" },
{ "delta": -12, "note": "EOL tech detected: WordPress" }
]
}
}

This is critical for enterprise trust — when someone asks "why is this a C?", the breakdown answers in two lines.

Alert precision (priority + shouldNotify)

Every alert ships with two extra fields beyond severity:

"alerts": [
{
"type": "new-critical-cve",
"severity": "high",
"priority": "urgent",
"shouldNotify": true,
"message": "Critical CVE detected on WordPress."
},
{
"type": "new-tech-detected",
"severity": "low",
"priority": "low",
"shouldNotify": false,
"message": "New technology since last run: Cloudflare."
}
]

Wire shouldNotify === true as the gate in your Slack / email / PagerDuty automation. The remaining alerts stay in the dataset for audit but won't wake anyone up. This is the noise-control pattern that prevents alert fatigue.

Priority mapping:

  • new-critical-cve / critical-exposure → urgent (notify)
  • multiple-high-cves / failing-security-grade → high (notify)
  • new-tech-detected → low (no notify by default)
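
The shouldNotify gate can be sketched as a one-line filter. The sample record reuses the two alerts shown above; the helper name alerts_to_send is hypothetical.

```python
def alerts_to_send(record):
    """Return only the alerts flagged for notification."""
    return [alert for alert in record.get("alerts", []) if alert.get("shouldNotify") is True]


record = {
    "alerts": [
        {
            "type": "new-critical-cve",
            "severity": "high",
            "priority": "urgent",
            "shouldNotify": True,
            "message": "Critical CVE detected on WordPress.",
        },
        {
            "type": "new-tech-detected",
            "severity": "low",
            "priority": "low",
            "shouldNotify": False,
            "message": "New technology since last run: Cloudflare.",
        },
    ]
}

for alert in alerts_to_send(record):
    print(f'[{alert["priority"]}] {alert["message"]}')
```

Alerts that fail the gate remain in the record, so the full list is still available for audit.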

Batch priority + top action bundle

The batch-insight record now carries one-glance fleet sizing:

{
"batchPriority": { "urgent": 3, "high": 7, "medium": 12, "low": 8 },
"topActionBundle": { "type": "security-hardening", "affectedDomains": 9 }
}

batchPriority tells you how big the queue is at each tier. topActionBundle collapses the dominant remediation theme across the cohort — paste it into a sprint planner verbatim.

Action bundles

actionBundle collapses the per-domain remediation plan into one named playbook with priority + effort + impact tags:

"actionBundle": {
"type": "security-hardening",
"priority": "high",
"steps": [
"Upgrade WordPress core to >=6.0 (currently 5.4.2).",
"Upgrade jQuery to >=3.5.0 (currently 3.4.1).",
"Add a Content-Security-Policy header to mitigate XSS / clickjacking.",
"Add Strict-Transport-Security with max-age >= 1 year."
],
"estimatedEffort": "low",
"impact": "high"
}

Bundle types: security-hardening (CVEs / failing security grade), modernization (legacy / EOL stack), replatform-prep (active CMS / framework migration in flight), maintenance (everything else), no-action (clean run).

Fleet actions (deduplicated across batch)

The batch-insight record carries fleetActions[] — the same action recurring across domains is rolled up with a count, ranked by reach:

"fleetActions": [
{ "action": "Upgrade jQuery to >=3.5.0.", "affectedDomains": 7, "priority": "high" },
{ "action": "Add a Content-Security-Policy header to mitigate XSS / clickjacking.", "affectedDomains": 5, "priority": "medium" },
{ "action": "Upgrade WordPress core to >=6.0.", "affectedDomains": 3, "priority": "high" }
]

This is the agency / security-team / fleet-monitor view: a one-page remediation plan that ignores per-domain noise.

Priority context

Every successful record includes its rank within the batch:

"priorityContext": { "rank": 2, "cohortSize": 30, "percentile": 93 }

Read as "rank #2 of 30, top 7% most urgent". Filter the dataset with priorityContext.percentile >= 90 to grab just the worst offenders.
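
A short sketch of that filter. The percentile formula below is an assumption that happens to match the example (rank 2 of 30 gives 93); the actor's exact formula is not documented here. Records without a priorityContext block, such as batch-insight records, are skipped.

```python
def percentile_from_rank(rank, cohort_size):
    """Assumed formula: share of the cohort ranked below this domain, as a whole number."""
    return round(100 * (cohort_size - rank) / cohort_size)


def worst_offenders(records, min_percentile=90):
    """Keep records whose priorityContext.percentile meets the threshold."""
    return [
        record for record in records
        if record.get("priorityContext", {}).get("percentile", 0) >= min_percentile
    ]


# Hypothetical, trimmed records for illustration only.
records = [
    {"domain": "example.com", "priorityContext": {"rank": 2, "cohortSize": 30, "percentile": 93}},
    {"domain": "example.org", "priorityContext": {"rank": 18, "cohortSize": 30, "percentile": 40}},
    {"recordType": "batch-insight", "cohortSize": 30},
]

print([record["domain"] for record in worst_offenders(records)])
```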

Detection confidence

detectionConfidence answers "how much should I trust this detection?":

"detectionConfidence": {
"score": 82,
"level": "high",
"reasons": [
"Response headers captured",
"7 technologies detected",
"5 high-confidence detections",
"3 pages analyzed"
]
}

When coverage.complete: false, the confidence score is automatically reduced and a "Coverage incomplete — confidence reduced" reason is appended.

Coverage

Primary detection-coverage indicator — structured boolean + reason + render engine:

"coverage": {
"complete": true,
"reason": null,
"renderedBy": "cheerio+headless"
}

When complete: false, reason carries a human-readable explanation (e.g. "SPA detected, partial detection likely. Consider renderMode: 'headless'.").

Vendor intelligence

Each detected tech is enriched with business context — marketPosition, common alternatives, estimated cost tier, typical adopter size. Same crawl, no extra cost:

"vendorIntel": [
{ "vendor": "Cloudflare", "category": "CDN", "marketPosition": "leader", "alternatives": ["Fastly", "Akamai", "CloudFront"], "estimatedCostTier": "low", "typicalCompanySize": "SMB-Mid" },
{ "vendor": "WordPress", "category": "CMS", "marketPosition": "leader", "alternatives": ["Drupal", "Ghost", "Webflow", "Wix"], "estimatedCostTier": "free", "typicalCompanySize": "SMB" }
]

Use cases: lead scoring (cost tier + typical adopter size = budget signal), sales replacement plays (alternatives), investor due diligence (market position + maturity).

Fix suggestions

insight.fixSuggestions[] pairs each issue with a solution and a copy-paste-ready code snippet where one applies:

"fixSuggestions": [
{
"issue": "Missing Strict-Transport-Security (HSTS) header",
"solution": "Add HSTS with at least a 1-year max-age and include subdomains where safe.",
"example": "Strict-Transport-Security: max-age=63072000; includeSubDomains; preload",
"severity": "high"
},
{
"issue": "WordPress 5.4.2 — WP-CORE-EOL",
"solution": "Upgrade WordPress to >= 6.0.0. WordPress core <6.0 is missing several years of security back-ports.",
"example": null,
"severity": "high"
}
]

Sorted by severity (critical → high → medium → low). Capped at 12.
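
If you merge fixSuggestions from several domain records, the same ordering and cap can be re-applied client-side. This is a sketch of the documented rule (severity order, then a cap of 12), not the actor's own code; the sample issues are hypothetical.

```python
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def top_fixes(suggestions, limit=12):
    """Sort by severity (critical first) and keep at most `limit` entries."""
    ranked = sorted(
        suggestions,
        # Unknown severities sort last; sorted() is stable, so ties keep their input order.
        key=lambda s: SEVERITY_ORDER.get(s["severity"], len(SEVERITY_ORDER)),
    )
    return ranked[:limit]


# Hypothetical merged suggestions from two domain records.
merged = [
    {"issue": "Missing Permissions-Policy header", "severity": "low"},
    {"issue": "WordPress 5.4.2 (WP-CORE-EOL)", "severity": "high"},
    {"issue": "Exposed /.env", "severity": "critical"},
    {"issue": "jQuery 3.4.1 (CVE-2020-11023)", "severity": "medium"},
]

for fix in top_fixes(merged):
    print(fix["severity"], "-", fix["issue"])
```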

Change intent

changeInsights.intent infers the business reason behind a stack change with calibrated confidence:

"changeInsights": {
"type": "cdn-swap",
"summary": "Switched CDN from Fastly to Cloudflare.",
"implication": "Likely performance, cost, or security-policy driven. Re-test edge caching and rules.",
"intent": {
"reason": "cost-reduction",
"confidence": 0.65,
"evidence": ["Added: Cloudflare", "Removed: Fastly"]
}
}

Reason enum: performance-optimization / cost-reduction / modernization / replatform / consolidation / expansion / unknown.

Custom detectors

Encode your own fingerprints once and every future run picks them up. Each detector requires a name and at least one matcher:

{
"customDetectors": [
{
"name": "Acme CRM",
"category": "Internal Tools",
"website": "https://acme.io",
"scriptPattern": "crm\\.acme\\.io/loader\\.js"
},
{
"name": "Acme Tracking",
"category": "Internal Tools",
"headerName": "X-Acme-Tracking",
"headerPattern": ".+"
},
{
"name": "Acme A/B",
"category": "Internal Tools",
"htmlPattern": "data-acme-experiment="
}
]
}

Matched custom detectors flow through the entire pipeline as first-class technologies — they appear in technologies[], categories, the diff block, the lifecycle / lead / scoring layers (with neutral weight), and in custom alert rules. Once a customer encodes their internal stack, switching off Apify becomes painful — that's the moat.
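
To check detector definitions before a run, something like the following sketch can help. It enforces the stated minimum (a name plus at least one matcher) and shows how the three matcher kinds could be evaluated locally. The matching details (re.search, case-insensitive header names) are assumptions of this sketch; the actor's own matching rules may differ.

```python
import re


def is_valid_detector(detector):
    """A detector needs a name plus scriptPattern, htmlPattern, or headerName + headerPattern."""
    if not detector.get("name"):
        return False
    has_header_matcher = bool(detector.get("headerName") and detector.get("headerPattern"))
    return bool(detector.get("scriptPattern") or detector.get("htmlPattern") or has_header_matcher)


def detector_matches(detector, script_srcs, html, headers):
    """Evaluate one detector against a page (local approximation, not the actor's code)."""
    script_pattern = detector.get("scriptPattern")
    if script_pattern and any(re.search(script_pattern, src) for src in script_srcs):
        return True
    html_pattern = detector.get("htmlPattern")
    if html_pattern and re.search(html_pattern, html):
        return True
    if detector.get("headerName") and detector.get("headerPattern"):
        wanted = detector["headerName"].lower()
        for name, value in headers.items():
            # Header names are compared case-insensitively (assumption of this sketch).
            if name.lower() == wanted and re.search(detector["headerPattern"], value):
                return True
    return False


crm = {"name": "Acme CRM", "category": "Internal Tools", "scriptPattern": r"crm\.acme\.io/loader\.js"}
tracking = {"name": "Acme Tracking", "headerName": "X-Acme-Tracking", "headerPattern": ".+"}
broken = {"name": "No matcher"}

# Hypothetical page data for illustration only.
page_scripts = ["https://crm.acme.io/loader.js"]
page_html = "<html><body>Hello</body></html>"
page_headers = {"x-acme-tracking": "abc123"}

print(is_valid_detector(crm), is_valid_detector(broken))
print(detector_matches(crm, page_scripts, page_html, page_headers))
```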

Deep security mode (securityDepth: 'advanced')

Beyond the OWASP-headers grade, the advanced probe adds:

  1. Set-Cookie flag analysis — counts cookies missing Secure, HttpOnly, SameSite. Surfaces them as plain-language notes.
  2. Admin path probing — concurrent HEAD requests to /wp-admin/, /wp-login.php, /admin/, /.git/HEAD, /.env, /server-status, /phpmyadmin/, /wp-config.php, etc. Anything returning 200 / 301 / 302 / 401 is flagged as accessible: true and surfaced under deepSecurity.issues.
  3. CRITICAL escalations: /.env, /.git/HEAD, or /wp-config.php accessible → critical-severity alert in alerts[].

The probe runs per-domain after the main crawl, with a hard 8-second deadline so it never blocks the run.
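
The classification rule for probe results can be sketched as a pure function. The status set and the three critical paths come from the list above. The critical key is a hypothetical field added for illustration and is not part of the actor's adminPaths output.

```python
ACCESSIBLE_STATUSES = {200, 301, 302, 401}
CRITICAL_PATHS = {"/.env", "/.git/HEAD", "/wp-config.php"}


def classify_probe(path, status):
    """Flag a probed path as accessible and, for sensitive paths, as critical."""
    accessible = status in ACCESSIBLE_STATUSES
    return {
        "path": path,
        "status": status,
        "accessible": accessible,
        # Hypothetical field for this sketch: marks the escalation cases.
        "critical": accessible and path in CRITICAL_PATHS,
    }


# Hypothetical probe results for illustration only.
results = [
    classify_probe("/wp-admin/", 200),
    classify_probe("/.env", 404),
    classify_probe("/.git/HEAD", 200),
]

for result in results:
    print(result)
```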

Change intelligence

The diff against the prior run is classified, not just listed:

| Type | Trigger | Implication |
| --- | --- | --- |
| cdn-swap | One CDN tech in removed, another in added | "Likely performance, cost, or security-policy driven. Re-test edge caching." |
| platform-migration | One CMS swapped for another | "Major replatform. Re-validate URL structure, redirects, integrations, SEO." |
| framework-migration | Frontend framework swap (React ⇄ Vue, etc.) | "Frontend rewrite or major refactor in flight." |
| infrastructure-change | Hosting / web-server swap (Nginx ⇄ Apache, etc.) | "Origin moved. Audit DNS, HTTPS, uptime monitoring." |
| analytics-replatform | Analytics swap | "Tracking baseline reset — historical cohorts will not align." |
| payment-replatform | Payment processor swap | "Significant revenue-pathing change." |
| support-tool-change | Chat / help-desk swap | "Verify embedded widgets render across the funnel." |
| tech-added | Pure additions | Bucketed by count. |
| tech-removed | Pure removals | Cost-cut / consolidation signal. |
| mixed-changes | Both adds and removes, no clean swap | "Compare side-by-side to understand intent." |
| no-change | No diff | "No detectable stack changes since the prior run." |
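
A simplified sketch of how a diff could be classified along these lines. The CATEGORY map is a small hypothetical stand-in for the actor's fingerprint categories, only three of the swap types are covered, and the order in which swap types are checked is an assumption.

```python
# Hypothetical category lookup; the actor's real fingerprint database is not reproduced here.
CATEGORY = {
    "Cloudflare": "CDN",
    "Fastly": "CDN",
    "WordPress": "CMS",
    "Drupal": "CMS",
    "React": "Frontend",
    "Vue.js": "Frontend",
}
SWAP_TYPES = {"CDN": "cdn-swap", "CMS": "platform-migration", "Frontend": "framework-migration"}


def diff_stacks(previous, current):
    """Return (added, removed) technology names between two runs."""
    prev, curr = set(previous), set(current)
    return sorted(curr - prev), sorted(prev - curr)


def classify_change(added, removed):
    """Map an added/removed pair to one of the change types in the table."""
    if not added and not removed:
        return "no-change"
    if added and not removed:
        return "tech-added"
    if removed and not added:
        return "tech-removed"
    added_categories = {CATEGORY.get(tech) for tech in added}
    removed_categories = {CATEGORY.get(tech) for tech in removed}
    for category, change_type in SWAP_TYPES.items():
        # A swap means the same category shows up on both sides of the diff.
        if category in added_categories and category in removed_categories:
            return change_type
    return "mixed-changes"


added, removed = diff_stacks(["Fastly", "React"], ["Cloudflare", "React"])
print(added, removed, classify_change(added, removed))
```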

Programmatic access (API)

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("ryanclinton/website-tech-stack-detector").call(run_input={
    "urls": ["shopify.com", "stripe.com", "vercel.com"],
    "preset": "competitor-tracking",
    "outputMode": "executive",
})
for site in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f'{site["domain"]}: {site["grade"]} ({site["overallScore"]}) — {site["summary"]}')
    for action in site.get("recommendedActions", [])[:3]:
        print(f' → {action}')

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });
const run = await client.actor("ryanclinton/website-tech-stack-detector").call({
  urls: ["shopify.com", "stripe.com", "vercel.com"],
  preset: "competitor-tracking",
  outputMode: "executive",
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const site of items) {
  console.log(`${site.domain}: ${site.grade} (${site.overallScore}) — ${site.summary}`);
}

cURL

curl -X POST "https://api.apify.com/v2/acts/ryanclinton~website-tech-stack-detector/runs?token=YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"urls": ["shopify.com", "stripe.com"],
"preset": "security-audit"
}'

How it works

Detection pipeline

For each crawled page, three passes run in cost order:

  1. HTTP headers: Server, X-Powered-By, X-Drupal-Cache, X-Shopify-Stage, etc. High confidence.
  2. HTML meta + scripts: <meta name="generator"> + <script src> matchers. High confidence.
  3. HTML body patterns: full-body regex (wp-content, gatsby-image, data-wf-site). Medium confidence.

After the three passes, implies chains resolve (WooCommerce ⇒ WordPress, Next.js ⇒ React, Gatsby ⇒ React, Nuxt ⇒ Vue.js); custom detectors apply on the same page in parallel. The remaining stages then run in order: version extraction and CVE-map version comparisons; the OWASP headers grade; scoring, insight, lead, lifecycle, and alerts; per-batch competitive cohort analysis; and deep security probes (when enabled). Finally, results are sorted by priority score and pushed.
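
The implies-chain step can be sketched as a small closure over a lookup table. The four rules are the ones named above. The traversal itself is an illustrative assumption, not the actor's implementation, and it also follows chains of more than one hop if the table contains them.

```python
IMPLIES = {
    "WooCommerce": ["WordPress"],
    "Next.js": ["React"],
    "Gatsby": ["React"],
    "Nuxt": ["Vue.js"],
}


def resolve_implies(detected):
    """Add every technology implied (directly or transitively) by the detected ones."""
    resolved = list(detected)
    seen = set(detected)
    queue = list(detected)
    while queue:
        tech = queue.pop()
        for implied in IMPLIES.get(tech, []):
            if implied not in seen:  # avoid duplicates when several techs imply the same one
                seen.add(implied)
                resolved.append(implied)
                queue.append(implied)
    return resolved


print(resolve_implies(["WooCommerce", "Next.js"]))
```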

CVE risk database

Bundled, conservative — only well-documented advisories with a clear fixedIn boundary. Examples:

  • jQuery <3.5.0 → CVE-2020-11023 / CVE-2020-11022; jQuery <3.4.0 → CVE-2019-11358
  • Bootstrap <4.3.1 → CVE-2019-8331
  • Apache <2.4.51 → CVE-2021-42013 + CVE-2021-41773
  • Nginx <1.20.1 → CVE-2021-23017
  • AngularJS 1.x → EOL flag; PHP <8.0 → EOL flag
  • WordPress <6.0 / Drupal <10 / Joomla <4 / IIS <8 → EOL aggregate flags

Refresh cadence: quarterly, or whenever a major advisory drops.
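
A minimal sketch of a fixedIn boundary check, using four of the advisories listed above. Versions are compared as integer tuples so that 3.10.0 sorts after 3.9.0. The sketch assumes purely numeric dotted versions; suffixes such as -beta would need extra handling and are not covered.

```python
def parse_version(text):
    """Turn '3.4.1' into (3, 4, 1) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


# (technology, fixedIn, CVE ids), copied from the examples in this section.
ADVISORIES = [
    ("jQuery", "3.5.0", ["CVE-2020-11023", "CVE-2020-11022"]),
    ("jQuery", "3.4.0", ["CVE-2019-11358"]),
    ("Bootstrap", "4.3.1", ["CVE-2019-8331"]),
    ("Nginx", "1.20.1", ["CVE-2021-23017"]),
]


def flags_for(name, version):
    """Return the CVE ids whose fixedIn boundary is above the detected version."""
    detected = parse_version(version)
    flags = []
    for tech, fixed_in, cves in ADVISORIES:
        if tech == name and detected < parse_version(fixed_in):
            flags.extend(cves)
    return flags


print(flags_for("jQuery", "3.4.1"))
print(flags_for("jQuery", "3.5.0"))
```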

How much does it cost?

Cheerio HTML parsing (no headless browser):

| Scenario | Time | Cost |
| --- | --- | --- |
| 10 sites, 3 pages each | ~30 seconds | < $0.01 platform + $0.10/site PPE |
| 50 sites, 3 pages each | ~2–3 minutes | ~$0.03 platform + $0.10/site PPE |
| 100 sites, 3 pages each | ~5–8 minutes | ~$0.05–$0.10 platform + $0.10/site PPE |

PPE billing details:

  • $0.10 per successfully analyzed website. Failures (failureType set) are not charged.
  • eventChargeLimitReached halts cleanly so runs never overshoot spending limits.
  • Final status message reports total PPE charges explicitly.
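
To sanity-check a bill against the dataset, the PPE rule above can be approximated as follows. The sketch counts domain records whose failureType is null at $0.10 each; the failureType value "timeout" in the sample is hypothetical, and platform (compute) costs are not included.

```python
def estimated_ppe_charge(records, price_per_site=0.10):
    """Estimate PPE charges: one charge per domain record with no failureType."""
    billable = sum(
        1 for record in records
        if record.get("recordType") == "domain" and record.get("failureType") is None
    )
    return round(billable * price_per_site, 2)


# Hypothetical, trimmed records for illustration only.
records = [
    {"recordType": "domain", "domain": "example.com", "failureType": None},
    {"recordType": "domain", "domain": "example.org", "failureType": None},
    {"recordType": "domain", "domain": "example.net", "failureType": "timeout"},
    {"recordType": "batch-insight", "cohortSize": 3},
]

print(estimated_ppe_charge(records))
```

The run-level SUMMARY reports the actual total, so treat this only as a cross-check.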

Tips

  1. Start with preset: "competitor-tracking" for ongoing monitoring. The default 3 pages + diff + alerts catches changes without burning credits.
  2. Use outputMode: "executive" for Slack / email / agent integrations — drops straight into a notification template.
  3. Filter the dataset by insight.priorityScore in your downstream pipeline to triage which domains need action first.
  4. Sort by overall.scores.security in Sheets/SQL to identify weakest postures.
  5. Encode your internal tools as custom detectors — once they ride every run, switching costs become massive.
  6. Schedule weekly runs. The changeInsights block fires Slack-ready strings on every CDN swap / CMS migration / framework rewrite.

What this actor does NOT do

Honest scope so you pick the right tool:

  • Not a CVE database. The bundled list covers the 12 versioned technologies; it isn't a full NVD mirror. For exhaustive vulnerability scanning use Snyk / OWASP Dependency-Check.
  • Not a deep penetration scanner. Advanced security mode probes cookie flags + a small admin-path list. It does not run nuclei templates, fuzz parameters, or test authentication.
  • Not behind-auth coverage. The actor cannot reach pages that require login.
  • Not a Shopify-app / WordPress-plugin detector. That requires a 350+ signature DB with its own maintenance cadence.
  • Not a real-time API. This is a batch crawler. For interactive lookups, BuiltWith and Wappalyzer (paid) offer SaaS APIs.

Limitations

  • JS rendering is opt-in. Cheerio is fast; Playwright is thorough. auto mode chooses per-domain based on SPA markers, but a site that looks static while loading content via fetch() may still slip through. Force headless if you cannot afford to miss runtime-only tech.
  • Fingerprint scope — 106 technologies covers the most common web stack but may miss niche or very new tools.
  • CVE map staleness — bundled with the actor and refreshed quarterly. Pair with a live CVE feed for zero-day exposure.
  • Version detection limited — only the 12 technologies that expose versions through standard surfaces.
  • Confidence is binary: high (definitive signal) or medium (HTML pattern / implied dependency).
  • Security grade is homepage-only. Most security headers are origin-wide so this is reasonable.
  • Rate limiting — 10 concurrent / 120 per minute. Lists over 1000 sites should be split across runs.
  • Heuristic intelligence — composite scores, lead intelligence, and sales angle are deterministic templates over observable signals, not analyst-produced. Use them to prioritise, not to underwrite.
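
For the rate-limiting note above, splitting a long URL list into per-run batches can be sketched like this. The 1000-site ceiling comes from that note; the helper name and the generated URLs are hypothetical.

```python
def split_into_runs(urls, max_per_run=1000):
    """Split a URL list into consecutive batches of at most max_per_run items."""
    return [urls[start:start + max_per_run] for start in range(0, len(urls), max_per_run)]


# Hypothetical input list for illustration only.
all_urls = [f"site-{index}.example" for index in range(2500)]
batches = split_into_runs(all_urls)
print([len(batch) for batch in batches])
```

Each batch would then be passed as the urls input of a separate run.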

Responsible use

Public website content only:

  • Respect robots.txt. CheerioCrawler respects robots.txt; disallowed paths are skipped.
  • Use reasonable depth. 3 pages per domain is the conservative default.
  • Don't weaponise CVE flags or admin path findings. This is for defensive intel — your own audits, your own prospects' diligence, vendor risk assessment.

FAQ

What's the difference between riskLevel and overall.grade? overall.grade (A–F) is the composite score across five dimensions. insight.riskLevel (high/medium/low) is purely a security-and-CVE signal — even a high-grade site can show riskLevel: medium if it has one medium CVE. Use them together: grade for "how does this site compare?", riskLevel for "is there an immediate security action?".

How is priorityScore computed? Inverse of overall score, lifted by CVE severity (critical → ≥90, high → ≥75) and risk level (high → ≥70). Use it to triage which domains need action first.

How does the change classification work? On every run with compareToPriorRun: true, the actor stores the current tech list per domain in its key-value store. The next run compares against the snapshot, classifies the type of change (CDN swap, CMS migration, framework rewrite, etc.), and emits an implication string. First runs get isFirstRun: true and an empty changeInsights.

What does engineeringMaturity: "low" mean? A coarse signal blending modernity score + overall grade. Low = legacy stack and/or weak security posture; high = modern frameworks + strong scores; medium = otherwise. Useful for sales segmentation and tech-due-diligence triage.

Can I trust the lead intelligence for prospecting? Treat it as a signal, not a fact. estimatedCompanySize and likelyBudget are coarse heuristics from observable web signals — the actual answer requires people-data enrichment. Use the actor's signal to rank a list, then verify the top tier.

Why don't I see a competitivePosition block? Set enableCompetitiveIntel: true and run with at least 2 successful domains. Cohort analysis only fires when the batch is large enough to compute meaningful medians.

What does customDetectors need at minimum? A name plus at least one of: scriptPattern (regex matched against <script src> URLs), htmlPattern (regex matched against the full HTML body), or headerName + headerPattern (regex matched against a specific response header value).

What's in deepSecurity.issues? A flat list of plain-language issues found by the advanced probe — cookie flag warnings, admin path exposures, and any CRITICAL escalations (/.env, /.git/HEAD, /wp-config.php reachable).

Will I be charged for failures? No. PPE charges fire only when failureType === null.

What's in the run summary? A SUMMARY key in the actor's default KV store: preset, outputMode, totals, failure breakdown, fleet-average overall + security scores, alert counts by type, top 5 priority domains, total PPE charges.

Which preset matches my workflow? Sales / lead-gen → sales-prospecting. Scheduled competitor monitoring → competitor-tracking. Security-team triage → security-audit. Agency / investor multi-domain audit → portfolio-analysis. Backwards-compatible v1 detection → raw. Each preset's defaults are listed in the Quick start with presets table above.

What does "engineering maturity" actually measure? A coarse blend of overall.scores.modernity and the overall composite. Low = legacy stack signals (jQuery, AngularJS, EOL CMSs) and/or weak security posture. High = current frameworks (Next.js / Vercel / Tailwind / etc.) plus strong scores. Useful for sales segmentation and tech due diligence — treat it as a triage signal, not a verdict.

| Actor | What it does | How it complements this actor |
| --- | --- | --- |
| Website Contact Scraper | Extracts emails, phones, and contacts | Pair with leadInsights for sales-ready outreach lists |
| B2B Lead Qualifier | Scores and grades leads | This actor adds standalone tech depth + CVE + change intel |
| B2B Lead Gen Suite | Full enrichment pipeline | Combine for complete profiles: contacts + scores + tech + intelligence |
| Company Deep Research Agent | Comprehensive company intelligence | Pair stack intel with broader research |
| WHOIS Domain Lookup | Domain registration / ownership | Pair domain age and registrar with stack analysis |
| SaaS Competitive Intelligence | Competitive analysis for SaaS | Supplement with stack comparisons + change intel |