Tech Stack Detector — Frameworks, CMS, Analytics, JSON Out
Pricing
Pay per usage
Competitor tech stack as CSV/JSON in 2 min — frameworks, CMS, analytics, CDN, servers, trackers. No Wappalyzer seat fee, no BuiltWith cap. 19 runs. Backed by 951-run Trustpilot flagship + 31-actor portfolio. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai
Developer
Alex
4 days ago
Last modified
Website Tech-Stack Detector — Detect ~60 Web Technologies (CMS, Frameworks, Analytics, Hosting, Payment) from URL via Static HTML+Header Patterns
Scan a list of URLs and return what each site is built with. Pattern-matches HTML body and HTTP response headers against ~60 fingerprints across 10 categories. Static analysis only (no JS rendering) — fast and cheap.
What It Does
- Fetches each URL with Cheerio (static HTML, no headless browser)
- Pattern-matches HTML + response headers against the technology dictionary
- Returns one flat record per URL with detected technologies, categorized
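The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the actor's code: the two-entry `TECH_PATTERNS` dictionary, the Shopify regex, and the `detect` helper are hypothetical, and the "name: value" header-matching rule is an assumption based on the header pattern format quoted in this README.

```python
import re

# Hypothetical two-entry dictionary for illustration only.
# The actor's real dictionary (techPatterns in src/main.js) is not reproduced here.
TECH_PATTERNS = {
    "Shopify": {"category": "cms", "html": [r"cdn\.shopify\.com"], "headers": []},
    "Cloudflare": {"category": "hosting", "html": [], "headers": ["server: cloudflare"]},
}


def detect(html, headers):
    """Return detected technologies for one page, grouped by category."""
    # Assumption: header patterns are "name: value" substrings, compared case-insensitively.
    header_lines = [f"{k}: {v}".lower() for k, v in headers.items()]
    technologies = []
    categories = {}
    for name, spec in TECH_PATTERNS.items():
        html_hit = any(re.search(p, html, re.IGNORECASE) for p in spec["html"])
        header_hit = any(
            pattern.lower() in line
            for pattern in spec["headers"]
            for line in header_lines
        )
        if html_hit or header_hit:  # binary match, no confidence score
            technologies.append(name)
            categories.setdefault(spec["category"], []).append(name)
    return {
        "technologies": technologies,
        "technologiesCount": len(technologies),
        "categories": categories,
    }


page = '<script src="https://cdn.shopify.com/s/files/theme.js"></script>'
print(detect(page, {"Server": "cloudflare"}))
```

Because a technology is reported as soon as either the HTML path or the header path matches, one wrong entry on either path is enough to produce a false positive.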
Honest Disclosure (read this first)
This actor is intentionally lean — pure static fingerprinting. It does not do the things a Wappalyzer Pro or BuiltWith subscription does. Specifically:
- No JS rendering. Cheerio reads HTML as delivered. SPA-heavy sites that hydrate clientside (most React-only apps without SSR) will appear emptier than they are. If you need JS-rendered detection, this actor is not your tool.
- No version detection. Output lists `["React", "Next.js"]`, NOT `[{name:"React", version:"18.2"}]`. Versions require deeper parsing this actor does not implement.
- No confidence scoring. A pattern matches or it doesn't. Treat as binary.
- No GA/GTM ID extraction. Output reports presence of "Google Analytics" / "Google Tag Manager", not the property IDs.
- No SSL/TLS inspection. For TLS data, use the SSL Certificate Checker actor.
- No script-domain enumeration. Output reports `scriptsCount` (a number), not the list of third-party domains.
- Fingerprint dictionary is hardcoded in the actor source (60 entries verified against `src/main.js`). It updates when the actor is rebuilt; there is no live remote registry.
- ⚠️ False-positive risk on unanchored regex. Patterns like `/react|__react/i`, `/vue\.js|data-v-/i`, `/angular/i` match anywhere in the HTML, including marketing copy ("our team uses React", "Vue.js Conference 2024 sponsor"). A site that mentions React in a blog post but doesn't actually use it will be flagged. Treat this dictionary as a screening tool, not a forensic verdict.
- ⚠️ Known dictionary bug: WordPress header check. The `WordPress` entry (`src/main.js` line 12) sets `headers: ['x-powered-by: Express']`. This is the Express.js (Node) `X-Powered-By` value, not a WordPress header. As written, an Express.js site WILL be flagged as WordPress via the header path. The HTML path (`/wp-content\//`, `/wp-includes\//`, `/wp-json\//`) is correct and is how WordPress is reliably detected, but if you process headerless responses or HTML-stripped payloads, expect a false-positive. We're not auto-correcting this in production until we audit the impact on existing 16 runs.
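To make the unanchored-regex caveat concrete, the sketch below runs the React pattern quoted above against a page that only mentions React in copy. The stricter `SCRIPT_SRC_REACT` pattern is a hypothetical alternative written for this example; it is not part of the actor, and it would miss React shipped inside a generically named bundle.

```python
import re

# Pattern quoted in this README, translated to Python's re syntax.
UNANCHORED_REACT = re.compile(r"react|__react", re.IGNORECASE)

# Hypothetical stricter pattern: only match "react" inside a <script src="..."> attribute.
SCRIPT_SRC_REACT = re.compile(r'<script[^>]+src="[^"]*react[^"]*"', re.IGNORECASE)

marketing_page = "<p>Our team uses React for internal tools.</p>"
react_page = '<script src="/static/js/react.production.min.js"></script>'


def matches(pattern, html):
    """True if the pattern occurs anywhere in the HTML string."""
    return pattern.search(html) is not None


for label, html in [("marketing copy", marketing_page), ("real script tag", react_page)]:
    print(label, matches(UNANCHORED_REACT, html), matches(SCRIPT_SRC_REACT, html))
```

The unanchored pattern flags both pages, while the scoped pattern flags only the page that actually loads a React script. Scoping trades recall for precision, which is why this actor's output is best treated as a screening signal.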
Detected Technologies (~60 total, 10 categories)
| Category | Examples |
|---|---|
| CMS (8) | WordPress, Shopify, Wix, Squarespace, Webflow, Drupal, Joomla, Ghost |
| Framework (9) | React, Next.js, Vue.js, Nuxt.js, Angular, Svelte, Gatsby, Remix, Astro |
| Analytics (10) | Google Analytics, Google Tag Manager, Facebook Pixel, Hotjar, Mixpanel, Amplitude, Segment, Plausible, PostHog, Clarity |
| Hosting / CDN (7) | Vercel, Netlify, Cloudflare, AWS, Google Cloud, Fastly, Akamai |
| Payment (4) | Stripe, PayPal, Paddle, LemonSqueezy |
| Chat / Support (6) | Intercom, Crisp, Drift, Zendesk, HubSpot, Tawk.to |
| Email (3) | Mailchimp, ConvertKit, SendGrid |
| E-commerce (3) | WooCommerce, BigCommerce, Magento |
| Security (3) | reCAPTCHA, hCaptcha, Cloudflare Turnstile |
| Other (7) | jQuery, Bootstrap, Tailwind CSS, Font Awesome, Google Fonts, Sentry, LaunchDarkly |
The full list is in src/main.js → techPatterns. Pull requests welcome to add fingerprints.
Input
{"urls": ["https://stripe.com", "https://shopify.com"]}
| Parameter | Type | Required | Description |
|---|---|---|---|
| `urls` | array | Yes | URLs to scan. Strings without protocol get `https://` prepended. |
That's the entire input. There is no includeCategories, no renderJs, no followRedirects, no timeoutMs. The actor uses Crawlee defaults (30s request timeout, max concurrency 10, follows HTTP redirects automatically, 3 internal retries on transient errors).
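Since `urls` is the only input, any cleanup has to happen on the caller's side. A minimal sketch, assuming you want to trim whitespace, drop blanks, and de-duplicate before a run; `normalize_url` and `build_run_input` are hypothetical helpers, and the prefix rule here only approximates the behaviour described in the table (the actor's exact logic is not shown in this README).

```python
def normalize_url(raw):
    """Trim whitespace and prepend https:// when no http(s) scheme is present."""
    url = raw.strip()
    if not url.lower().startswith(("http://", "https://")):
        url = "https://" + url
    return url


def build_run_input(raw_urls):
    """Build the actor input dict, skipping blanks and exact duplicates."""
    seen = set()
    urls = []
    for raw in raw_urls:
        if not raw.strip():
            continue  # ignore empty lines from a pasted list
        url = normalize_url(raw)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return {"urls": urls}


print(build_run_input(["stripe.com", " https://stripe.com ", "", "http://example.org"]))
```

De-duplicating up front avoids paying for the same URL twice, since each entry in `urls` is fetched and produces its own record.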
Output Schema (one record per URL)
| Field | Type | Description |
|---|---|---|
| `url` | string | The URL that was scanned |
| `title` | string | `<title>` tag content |
| `description` | string | `<meta name="description">` (truncated to 200 chars) |
| `generator` | string | `<meta name="generator">` value if present |
| `server` | string or null | `Server` HTTP header |
| `poweredBy` | string or null | `X-Powered-By` HTTP header |
| `technologiesCount` | int | Count of detected technologies |
| `technologies` | string[] | Flat list, e.g. `["React","Next.js","Google Analytics","Cloudflare"]` |
| `categories.cms` | string[] | Detected CMS names |
| `categories.framework` | string[] | Detected frameworks |
| `categories.analytics` | string[] | Detected analytics platforms |
| `categories.hosting` | string[] | Detected hosting/CDN |
| `categories.payment` | string[] | Detected payment processors |
| `categories.chat` | string[] | Detected chat widgets |
| `categories.email` | string[] | Detected email tools |
| `categories.ecommerce` | string[] | Detected e-commerce platforms |
| `categories.security` | string[] | Detected captcha/security |
| `categories.other` | string[] | Anything else (jQuery, Tailwind, fonts, Sentry, LaunchDarkly) |
| `scriptsCount` | int | Count of `<script src="">` tags only; inline `<script>` blocks NOT counted |
| `stylesheetsCount` | int | Count of `<link rel="stylesheet">` tags only; inline `<style>` NOT counted |
| `ogImage` | string | `<meta property="og:image">` content |
| `scrapedAt` | ISO timestamp | When the URL was scanned |
Output Example

    {
      "url": "https://stripe.com",
      "title": "Stripe | Financial infrastructure for the internet",
      "description": "Millions of companies use Stripe to ...",
      "generator": "",
      "server": "cloudflare",
      "poweredBy": null,
      "technologiesCount": 7,
      "technologies": ["Next.js", "React", "Stripe", "Google Analytics", "Google Tag Manager", "Cloudflare", "Google Fonts"],
      "categories": {
        "cms": [],
        "framework": ["Next.js", "React"],
        "analytics": ["Google Analytics", "Google Tag Manager"],
        "hosting": ["Cloudflare"],
        "payment": ["Stripe"],
        "chat": [],
        "email": [],
        "ecommerce": [],
        "security": [],
        "other": ["Google Fonts"]
      },
      "scriptsCount": 14,
      "stylesheetsCount": 3,
      "ogImage": "https://stripe.com/img/v3/home/og-image.png",
      "scrapedAt": "2026-04-29T11:50:00.000Z"
    }

Python Usage

    from apify_client import ApifyClient
    from collections import Counter

    client = ApifyClient("YOUR_APIFY_TOKEN")

    run = client.actor("knotless_cadence/website-tech-stack-detector").call(run_input={
        "urls": [
            "https://stripe.com",
            "https://shopify.com",
            "https://notion.so",
        ]
    })

    records = list(client.dataset(run["defaultDatasetId"]).iterate_items())

    # Tally analytics adoption across the list
    analytics = Counter()
    for r in records:
        for a in r["categories"]["analytics"]:
            analytics[a] += 1

    for name, count in analytics.most_common():
        print(f"{count:3d} {name}")

When This Is The Right Tool
- Quick prospect filtering — scan 100-500 B2B prospect URLs to filter by tech (e.g. "Shopify users" or "Segment users"). Static HTML usually carries enough signal for this.
- Competitive ecosystem mapping — detect which sites in a sector use a specific integration.
- Cheap dataset enrichment — when you need "what tech does this site use" as a column in a CSV.
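The filtering and enrichment use cases above come down to a membership test on `categories` plus a flat export. A minimal sketch using only the standard library; the `uses` and `to_csv` helpers, the choice of CSV columns, and the sample records are hypothetical, while the field names follow the Output Schema.

```python
import csv
import io


def uses(record, category, name):
    """True if `name` appears under the given category of one output record."""
    return name in record.get("categories", {}).get(category, [])


def to_csv(records):
    """Flatten records to CSV text: one row per URL, technologies joined by ';'."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["url", "technologiesCount", "technologies"])
    for r in records:
        writer.writerow([r["url"], r["technologiesCount"], ";".join(r["technologies"])])
    return buf.getvalue()


# Made-up records shaped like the actor's output, for illustration only.
SAMPLE_RECORDS = [
    {"url": "https://a.example", "technologiesCount": 2,
     "technologies": ["Shopify", "Segment"],
     "categories": {"cms": ["Shopify"], "analytics": ["Segment"]}},
    {"url": "https://b.example", "technologiesCount": 1,
     "technologies": ["WordPress"],
     "categories": {"cms": ["WordPress"], "analytics": []}},
]

shopify_sites = [r for r in SAMPLE_RECORDS if uses(r, "cms", "Shopify")]
print(to_csv(shopify_sites))
```

In practice `SAMPLE_RECORDS` would be replaced by the list fetched from the run's dataset, and the CSV text written to a file or joined onto an existing prospect sheet by URL.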
When To Use Something Else
- Sites with no SSR (pure SPA shells) — output will be near-empty. Use a JS-rendering actor.
- You need versions or confidence scores — use Wappalyzer's API or BuiltWith.
- You need GA property IDs / GTM container IDs — those require deeper parsing this actor does not do.
- You need TLS / SSL data — use SSL Certificate Checker.
- You need forensic-grade detection (no false-positives on copy) — use Wappalyzer or BuiltWith. This actor's regex patterns are unanchored.
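Before switching tools, it can help to flag which records are affected by the limits listed above. The sketch below is a hypothetical post-filter, not a feature of the actor: it marks empty results (a possible client-rendered SPA shell) and WordPress detections that coincide with an Express `X-Powered-By` value and no WordPress `generator` tag. Both rules are heuristics and assumptions; a site can strip its generator tag or sit behind an Express proxy, so treat the flags as prompts for a manual check.

```python
def triage(record):
    """Return a list of caution flags for one output record."""
    flags = []

    # Nothing detected: the page may hydrate client-side, which static HTML cannot see.
    if record.get("technologiesCount", 0) == 0:
        flags.append("possible_spa_shell")

    cms = record.get("categories", {}).get("cms", [])
    powered_by = (record.get("poweredBy") or "").lower()
    generator = (record.get("generator") or "").lower()

    # WordPress reported alongside an Express header and no generator evidence:
    # consistent with the known header-pattern bug described in this README.
    if "WordPress" in cms and "express" in powered_by and "wordpress" not in generator:
        flags.append("suspect_wordpress")

    return flags


example = {
    "technologiesCount": 1,
    "categories": {"cms": ["WordPress"]},
    "poweredBy": "Express",
    "generator": "",
}
print(triage(example))
```

Records flagged `possible_spa_shell` are candidates for a JS-rendering actor; records flagged `suspect_wordpress` are worth re-checking by looking for `wp-content` paths in the page source.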
Related Free Actors
| Actor | Returns |
|---|---|
| SSL Certificate Checker | TLS expiry, SAN list, signature algorithm |
| WHOIS Domain Lookup | Domain registration metadata (RDAP) |
| IP Geolocation Lookup | Country, ISP, ASN |
| Website Screenshot Scraper | PNG capture for any URL |
Browse all 31 published actors → apify.com/knotless_cadence
Apify-as-a-Service — when you need data, not infrastructure
| Tier | Price | What you get |
|---|---|---|
| Pilot | $97 | 1 custom actor, basic config, 7-day support |
| Standard | $297 | Custom actor + Slack/email alerts, 30-day support |
| Premium | $797 | Custom actor + dashboard + 90-day support + 1 modification round |
A custom build can deliver: a corrected WordPress header pattern, anchored regex (lower false-positive rate), JS-rendering via Playwright, version detection via specific JS-bundle parsing, GA/GTM ID extraction, third-party domain enumeration.
Email: spinov001@gmail.com — drop your specs, schema, or target URLs.
Proof of work: 31 published actors (78 total in portfolio) — 949+ Trustpilot runs, 80+ Reddit, 25+ HN. Recently delivered a paid 3-article series for a client in the proxy industry ($150).
More tips: t.me/scraping_ai · blog.spinov.online
Honest disclosure: 60 fingerprints across 10 categories. WordPress header pattern is incorrectly set to Express.js's x-powered-by value — HTML detection still works, header-only false-positive risk on Express sites. Regex patterns are unanchored — marketing-copy mentions of frameworks can trigger false positives. Static HTML only, no JS rendering, no version detection, no confidence scoring. Crawlee defaults: 30s timeout, concurrency 10, 3 retries on transients.