Tech Stack Detector — Frameworks, CMS, Analytics, JSON Out

Pricing: Pay per usage

Competitor tech stack as CSV/JSON in 2 min — frameworks, CMS, analytics, CDN, servers, trackers. No Wappalyzer seat fee, no BuiltWith cap. 19 runs. Backed by 951-run Trustpilot flagship + 31-actor portfolio. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai

Rating: 0.0 (0 ratings)

Developer: Alex (Maintained by Community)

Actor stats: 1 bookmarked · 2 total users · 0 monthly active users · last modified 4 days ago


Website Tech-Stack Detector — Detect ~60 Web Technologies (CMS, Frameworks, Analytics, Hosting, Payment) from URL via Static HTML+Header Patterns

Scan a list of URLs and return what each site is built with. Pattern-matches HTML body and HTTP response headers against ~60 fingerprints across 10 categories. Static analysis only (no JS rendering) — fast and cheap.

What It Does

  • Fetches each URL over plain HTTP via Crawlee and parses the static HTML with Cheerio (no headless browser)
  • Pattern-matches HTML + response headers against the technology dictionary
  • Returns one flat record per URL with detected technologies, categorized
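The matching step can be sketched in Python as follows. This is an illustrative reimplementation, not the actor's JavaScript source. The three-entry FINGERPRINTS dictionary and the detect() helper are hypothetical, although the WordPress HTML patterns and the "name: value" header format are taken from the Honest Disclosure section. The sketch assumes the HTML and headers have already been fetched. It counts a fingerprint as detected when any HTML pattern or any header pattern matches, which fits the WordPress header behavior described below.

```python
import re
from typing import Dict, List

# Hypothetical, heavily trimmed fingerprint dictionary. Each entry has a
# category, regexes tested against the raw HTML, and "name: value" strings
# tested against the response headers (case-insensitive substring match).
FINGERPRINTS: Dict[str, dict] = {
    "WordPress": {
        "category": "cms",
        "html": [r"/wp-content/", r"/wp-includes/", r"/wp-json/"],
        "headers": [],
    },
    "Cloudflare": {
        "category": "hosting",
        "html": [],
        "headers": ["server: cloudflare"],
    },
    "Google Tag Manager": {
        "category": "analytics",
        "html": [r"googletagmanager\.com/gtm\.js"],
        "headers": [],
    },
}

CATEGORIES = ["cms", "framework", "analytics", "hosting", "payment",
              "chat", "email", "ecommerce", "security", "other"]


def detect(html: str, headers: Dict[str, str]) -> dict:
    """Return a flat technology list plus per-category lists for one page."""
    # Flatten headers into lowercase "name: value" lines for substring checks.
    header_blob = "\n".join(f"{k}: {v}".lower() for k, v in headers.items())
    categories: Dict[str, List[str]] = {c: [] for c in CATEGORIES}
    found: List[str] = []
    for name, fp in FINGERPRINTS.items():
        html_hit = any(re.search(p, html, re.IGNORECASE) for p in fp["html"])
        header_hit = any(h.lower() in header_blob for h in fp["headers"])
        if html_hit or header_hit:  # binary match, no confidence score
            found.append(name)
            categories[fp["category"]].append(name)
    return {"technologiesCount": len(found), "technologies": found,
            "categories": categories}
```

For example, `detect('<link href="/wp-content/themes/x.css">', {"Server": "cloudflare"})` reports WordPress under cms and Cloudflare under hosting.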

Honest Disclosure (read this first)

This actor is intentionally lean — pure static fingerprinting. It does not do the things a Wappalyzer Pro or BuiltWith subscription does. Specifically:

  • No JS rendering. Cheerio reads HTML as delivered. SPA-heavy sites that render client-side (most React-only apps without SSR) will appear emptier than they are. If you need JS-rendered detection, this actor is not your tool.
  • No version detection. Output lists ["React", "Next.js"], NOT [{name:"React", version:"18.2"}]. Versions require deeper parsing this actor does not implement.
  • No confidence scoring. A pattern matches or it doesn't. Treat as binary.
  • No GA/GTM ID extraction. Output reports presence of "Google Analytics" / "Google Tag Manager", not the property IDs.
  • No SSL/TLS inspection. For TLS data, use the SSL Certificate Checker actor.
  • No script-domain enumeration. Output reports scriptsCount (a number), not the list of third-party domains.
  • Fingerprint dictionary is hardcoded in the actor source (60 entries verified against src/main.js). It updates when the actor is rebuilt — there is no live remote registry.
  • ⚠️ False-positive risk on unanchored regex. Patterns like /react|__react/i, /vue\.js|data-v-/i, /angular/i match anywhere in the HTML, including marketing copy ("our team uses React", "Vue.js Conference 2024 sponsor"). A site that mentions React in a blog post but doesn't actually use it will be flagged. Treat this dictionary as a screening tool, not a forensic verdict.
  • ⚠️ Known dictionary bug: WordPress header check. The WordPress entry (src/main.js line 12) sets headers: ['x-powered-by: Express']. That is the X-Powered-By value sent by Express.js (Node), not a WordPress header. As written, any site that sends X-Powered-By: Express WILL be flagged as WordPress via the header path, even if its HTML contains no WordPress markers. The HTML path (/wp-content\//, /wp-includes\//, /wp-json\//) is correct and is how WordPress is reliably detected, so treat a WordPress flag on a record whose poweredBy is Express as suspect. We're not auto-correcting this in production until we audit the impact on the existing 16 runs.
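Until the dictionary is fixed, you can screen the WordPress flag yourself with a post-processing check like the one below. It is a heuristic sketch and not part of the actor. It uses only the poweredBy and generator fields from the output schema. It assumes that a genuine WordPress site rarely sends X-Powered-By: Express and often exposes a WordPress <meta name="generator"> value. Because the actor does not return the HTML, the wp-content markers cannot be re-checked here. Treat flagged rows as "needs review", not as confirmed false positives.

```python
from typing import List, Tuple


def wordpress_suspect(record: dict) -> bool:
    """Heuristic: True when a WordPress flag may come only from the buggy
    x-powered-by: Express header pattern."""
    cms = record.get("categories", {}).get("cms", [])
    if "WordPress" not in cms:
        return False
    powered_by = (record.get("poweredBy") or "").lower()
    generator = (record.get("generator") or "").lower()
    # Express header present and no WordPress generator tag to back the flag up.
    return "express" in powered_by and "wordpress" not in generator


def split_wordpress(records: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Partition records into (trusted, needs_review) using the heuristic above."""
    trusted = [r for r in records if not wordpress_suspect(r)]
    review = [r for r in records if wordpress_suspect(r)]
    return trusted, review
```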

Detected Technologies (~60 total, 10 categories)

Category | Technologies
CMS (8) | WordPress, Shopify, Wix, Squarespace, Webflow, Drupal, Joomla, Ghost
Framework (9) | React, Next.js, Vue.js, Nuxt.js, Angular, Svelte, Gatsby, Remix, Astro
Analytics (10) | Google Analytics, Google Tag Manager, Facebook Pixel, Hotjar, Mixpanel, Amplitude, Segment, Plausible, PostHog, Clarity
Hosting / CDN (7) | Vercel, Netlify, Cloudflare, AWS, Google Cloud, Fastly, Akamai
Payment (4) | Stripe, PayPal, Paddle, LemonSqueezy
Chat / Support (6) | Intercom, Crisp, Drift, Zendesk, HubSpot, Tawk.to
Email (3) | Mailchimp, ConvertKit, SendGrid
E-commerce (3) | WooCommerce, BigCommerce, Magento
Security (3) | reCAPTCHA, hCaptcha, Cloudflare Turnstile
Other (7) | jQuery, Bootstrap, Tailwind CSS, Font Awesome, Google Fonts, Sentry, LaunchDarkly

The patterns behind each entry are defined in techPatterns in src/main.js. Pull requests that add fingerprints are welcome.

Input

{
  "urls": ["https://stripe.com", "https://shopify.com"]
}

Parameter | Type | Required | Description
urls | array | Yes | URLs to scan. Strings without a protocol get https:// prepended.

That's the entire input. There is no includeCategories, renderJs, followRedirects, or timeoutMs option. The actor runs with fixed Crawlee crawler settings: 30s request timeout, max concurrency 10, automatic HTTP redirect following, and 3 internal retries on transient errors.
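If you build the urls list programmatically, normalizing it before the run keeps the dataset predictable. The normalize_urls helper below is a hypothetical client-side sketch. It mirrors the stated rule (prepend https:// when no protocol is given) and also drops blanks and duplicates. The exact protocol check the actor applies is not documented here, so treat this as an approximation.

```python
import re
from typing import Iterable, List

# Matches a URL scheme such as "http://" or "https://" at the start of a string.
_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def normalize_urls(raw: Iterable[str]) -> List[str]:
    """Strip whitespace, prepend https:// when no scheme is present,
    and drop empty entries and exact duplicates (order preserved)."""
    seen = set()
    out: List[str] = []
    for item in raw:
        url = item.strip()
        if not url:
            continue
        if not _SCHEME.match(url):
            url = "https://" + url
        if url not in seen:
            seen.add(url)
            out.append(url)
    return out


run_input = {"urls": normalize_urls(["stripe.com", " https://shopify.com ", "stripe.com", ""])}
```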

Output Schema (one record per URL)

Field | Type | Description
url | string | The URL that was scanned
title | string | <title> tag content
description | string | <meta name="description"> content (truncated to 200 chars)
generator | string | <meta name="generator"> value, if present
server | string or null | Server HTTP header
poweredBy | string or null | X-Powered-By HTTP header
technologiesCount | int | Number of detected technologies
technologies | string[] | Flat list, e.g. ["React","Next.js","Google Analytics","Cloudflare"]
categories.cms | string[] | Detected CMS names
categories.framework | string[] | Detected frameworks
categories.analytics | string[] | Detected analytics platforms
categories.hosting | string[] | Detected hosting/CDN
categories.payment | string[] | Detected payment processors
categories.chat | string[] | Detected chat widgets
categories.email | string[] | Detected email tools
categories.ecommerce | string[] | Detected e-commerce platforms
categories.security | string[] | Detected captcha/security
categories.other | string[] | Anything else (jQuery, Tailwind, fonts, Sentry, LaunchDarkly)
scriptsCount | int | Count of <script src="..."> tags only; inline <script> blocks are NOT counted
stylesheetsCount | int | Count of <link rel="stylesheet"> tags only; inline <style> blocks are NOT counted
ogImage | string | <meta property="og:image"> content
scrapedAt | ISO timestamp | When the URL was scanned
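For typed downstream code, the schema above can be mirrored with TypedDict. The classes below are a hypothetical client-side description, not something the actor ships. The check_record helper assumes two invariants: technologies holds exactly the entries of the category lists combined, and technologiesCount is its length. Both are inferred from the output example, not documented.

```python
from typing import List, Optional, TypedDict


class Categories(TypedDict):
    cms: List[str]
    framework: List[str]
    analytics: List[str]
    hosting: List[str]
    payment: List[str]
    chat: List[str]
    email: List[str]
    ecommerce: List[str]
    security: List[str]
    other: List[str]


class TechRecord(TypedDict):
    url: str
    title: str
    description: str
    generator: str
    server: Optional[str]
    poweredBy: Optional[str]
    technologiesCount: int
    technologies: List[str]
    categories: Categories
    scriptsCount: int
    stylesheetsCount: int
    ogImage: str
    scrapedAt: str  # ISO 8601 timestamp


def check_record(rec: TechRecord) -> List[str]:
    """Return consistency problems (an empty list means OK).
    The invariants checked here are assumptions inferred from the example."""
    problems: List[str] = []
    if rec["technologiesCount"] != len(rec["technologies"]):
        problems.append("technologiesCount does not match technologies")
    from_categories = sorted(t for names in rec["categories"].values() for t in names)
    if from_categories != sorted(rec["technologies"]):
        problems.append("technologies differs from the union of categories")
    return problems
```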

Output Example

{
  "url": "https://stripe.com",
  "title": "Stripe | Financial infrastructure for the internet",
  "description": "Millions of companies use Stripe to ...",
  "generator": "",
  "server": "cloudflare",
  "poweredBy": null,
  "technologiesCount": 7,
  "technologies": ["Next.js","React","Stripe","Google Analytics","Google Tag Manager","Cloudflare","Google Fonts"],
  "categories": {
    "cms": [],
    "framework": ["Next.js","React"],
    "analytics": ["Google Analytics","Google Tag Manager"],
    "hosting": ["Cloudflare"],
    "payment": ["Stripe"],
    "chat": [],
    "email": [],
    "ecommerce": [],
    "security": [],
    "other": ["Google Fonts"]
  },
  "scriptsCount": 14,
  "stylesheetsCount": 3,
  "ogImage": "https://stripe.com/img/v3/home/og-image.png",
  "scrapedAt": "2026-04-29T11:50:00.000Z"
}

Python Usage

from apify_client import ApifyClient
from collections import Counter

client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("knotless_cadence/website-tech-stack-detector").call(
    run_input={"urls": [
        "https://stripe.com",
        "https://shopify.com",
        "https://notion.so",
    ]}
)

# Read every record from the run's default dataset
records = list(client.dataset(run["defaultDatasetId"]).iterate_items())

# Tally analytics adoption across the list
analytics = Counter()
for r in records:
    for a in r["categories"]["analytics"]:
        analytics[a] += 1

for name, count in analytics.most_common():
    print(f"{count:3d} {name}")

When This Is The Right Tool

  • Quick prospect filtering — scan 100-500 B2B prospect URLs to filter by tech (e.g. "Shopify users" or "Segment users"). Static HTML usually carries enough signal for this.
  • Competitive ecosystem mapping — detect which sites in a sector use a specific integration.
  • Cheap dataset enrichment — when you need "what tech does this site use" as a column in a CSV.
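For prospect filtering and CSV enrichment, you can filter and flatten the dataset items on the client side. The minimal sketch below assumes records shaped like the output schema above, such as the list returned by iterate_items() in the Python Usage section. The helper names, the column choice, and the ';' separator are arbitrary.

```python
import csv
import io
from typing import Iterable, List


def filter_by_tech(records: Iterable[dict], tech: str) -> List[dict]:
    """Keep records whose flat technologies list contains `tech` (exact name)."""
    return [r for r in records if tech in r.get("technologies", [])]


def to_csv(records: Iterable[dict]) -> str:
    """Flatten records into CSV text: one row per URL, technologies joined by ';'."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["url", "server", "poweredBy", "technologies"])
    for r in records:
        writer.writerow([
            r.get("url", ""),
            r.get("server") or "",
            r.get("poweredBy") or "",
            ";".join(r.get("technologies", [])),
        ])
    return buf.getvalue()


# Hypothetical sample records, for illustration only.
sample = [
    {"url": "https://a.example", "server": "cloudflare", "poweredBy": None,
     "technologies": ["Shopify", "Google Analytics"]},
    {"url": "https://b.example", "server": None, "poweredBy": "Express",
     "technologies": ["React"]},
]
shopify_csv = to_csv(filter_by_tech(sample, "Shopify"))
```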

When To Use Something Else

  • Sites with no SSR (pure SPA shells) — output will be near-empty. Use a JS-rendering actor.
  • You need versions or confidence scores — use Wappalyzer's API or BuiltWith.
  • You need GA property IDs / GTM container IDs — those require deeper parsing this actor does not do.
  • You need TLS / SSL data — use SSL Certificate Checker.
  • You need forensic-grade detection (no false-positives on copy) — use Wappalyzer or BuiltWith. This actor's regex patterns are unanchored.
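To see why unanchored patterns over-match, compare the quoted /react|__react/i pattern with a narrower check that only inspects <script src> URLs. This is an illustrative Python sketch. The SCRIPT_SRC and REACT_IN_SRC patterns are hypothetical and are not what the actor uses. They give fewer hits on marketing copy at the cost of recall, because React bundled into hashed or renamed chunks is missed.

```python
import re

# Pattern quoted in the Honest Disclosure section: matches anywhere in the HTML.
UNANCHORED = re.compile(r"react|__react", re.IGNORECASE)

# Hypothetical narrower check: look for "react" only inside <script src="..."> values.
SCRIPT_SRC = re.compile(r"<script[^>]+src=[\"']([^\"']+)[\"']", re.IGNORECASE)
REACT_IN_SRC = re.compile(r"(^|[/.-])react([.-]|$)", re.IGNORECASE)


def uses_react_narrow(html: str) -> bool:
    """True only when some external script URL names a react bundle."""
    return any(REACT_IN_SRC.search(src) for src in SCRIPT_SRC.findall(html))


marketing = "<p>Our team loves React and we had a great reaction at the conference.</p>"
real = '<script src="https://cdn.example.com/react.production.min.js"></script>'
```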

Related Actors

Actor | Returns
SSL Certificate Checker | TLS expiry, SAN list, signature algorithm
WHOIS Domain Lookup | Domain registration metadata (RDAP)
IP Geolocation Lookup | Country, ISP, ASN
Website Screenshot Scraper | PNG capture for any URL

Browse all 31 published actors → apify.com/knotless_cadence


Apify-as-a-Service — when you need data, not infrastructure

Tier | Price | What you get
Pilot | $97 | 1 custom actor, basic config, 7-day support
Standard | $297 | Custom actor + Slack/email alerts, 30-day support
Premium | $797 | Custom actor + dashboard + 90-day support + 1 modification round

A custom build can deliver: a corrected WordPress header pattern, anchored regex (lower false-positive rate), JS-rendering via Playwright, version detection via specific JS-bundle parsing, GA/GTM ID extraction, third-party domain enumeration.

Email: spinov001@gmail.com — drop your specs, schema, or target URLs.

Proof of work: 31 published actors (78 total in portfolio) — 949+ Trustpilot runs, 80+ Reddit, 25+ HN. Recently delivered a paid 3-article series for a client in the proxy industry ($150).

More tips: t.me/scraping_ai · blog.spinov.online


Honest disclosure: 60 fingerprints across 10 categories. WordPress header pattern is incorrectly set to Express.js's x-powered-by value — HTML detection still works, header-only false-positive risk on Express sites. Regex patterns are unanchored — marketing-copy mentions of frameworks can trigger false positives. Static HTML only, no JS rendering, no version detection, no confidence scoring. Crawlee defaults: 30s timeout, concurrency 10, 3 retries on transients.