🧱 Wappalyzer Replacement: Tech Stack Detection API
Pricing
$10.00 / 1,000 domains analyzed
Bulk tech stack detection from any website. Uses OSS Wappalyzer fingerprint rulesets (6000+ technologies across 100+ categories upstream; about 200 of the highest-traffic fingerprints are bundled in this build) via HTTP analysis. Replaces the paywalled Wappalyzer API.
Developer
Stephan Corbeil
Wappalyzer Replacement: The Free Tech Stack Detection API That Actually Works
Bulk "what's this site built with?" lookups. OSS fingerprint ruleset. No paywall, no dead extension, no 30-request-per-day limits.
In July 2023 Wappalyzer, the browser extension that every SEO consultant, VC analyst, competitive-intel researcher, and sales-engineering team had pinned for a decade, quietly stopped shipping updates to its open-source ruleset, pulled its free API, and locked meaningful bulk access behind a $250/month "Professional" tier that still caps you at 1,000 lookups and a handful of parallel requests. The free browser extension still technically installs, but it's no longer a real API, just a pop-up on a single tab. There is no first-party "scan 500 domains overnight" story anymore.
This actor fixes that. It re-implements the useful 80% of Wappalyzer as a clean, bulk-friendly Apify actor, using the same style of regex-based fingerprint rules that Wappalyzer pioneered but drawing from the maintained community forks (enthec/webappanalyzer, tunetheweb/wappalyzer) instead of the frozen original. Pay-per-event. No monthly minimum. No sign-up beyond an Apify account.
How It Works
Every URL you submit goes through a three-signal HTTP analysis:
- Response headers: `cf-ray`, `x-powered-by`, `x-shopify-stage`, `server`, `x-vercel-id`, `via`, `x-amz-cf-id`, and ~80 other vendor-specific headers pinpoint CDNs, web servers, hosting platforms, and backend frameworks with a near-zero false-positive rate.
- HTML body patterns: every page's HTML is regex-matched against ~200 fingerprints for things like `<meta name="generator" content="WordPress 6.4">`, `__NEXT_DATA__`, `data-reactroot`, `window.__NUXT__`, `<meta property="og:...">`, GTM container IDs (`GTM-XXXXXX`), GA4 measurement IDs (`G-XXXXXXXXXX`), Facebook Pixel snippets (`fbq('init', ...)`), and hundreds more.
- `<script src>` URLs: third-party script tags are extracted and pattern-matched, because that's where most analytics tools, JS frameworks, payment SDKs, and chat widgets give themselves away: `js.stripe.com/v3`, `cdn.segment.com`, `googletagmanager.com/gtm.js`, `connect.facebook.net/.../fbevents.js`, `widget.intercom.io`, etc.
Every match gets a confidence score:
| Signal | Confidence |
|---|---|
| Header match | 100 |
| `<script src>` match | 90 |
| HTML body regex match | 85 |
| Implied-only (e.g. Shopify → Ruby) | 60 |
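The three signals and their confidence scores can be sketched in a few lines of Python. This is a minimal illustration, not the actor's source: the `FINGERPRINTS` layout, the `detect` function, and the sample response are all hypothetical, and the real `fingerprints.json` schema may differ. Only the confidence values come from the table above.

```python
import re

# Hypothetical mini-ruleset; the actor's real fingerprint schema is not shown on this page.
FINGERPRINTS = {
    "Cloudflare": {"headers": {"cf-ray": r".+"}},
    "Stripe": {"scripts": [r"js\.stripe\.com/v3"]},
    "Next.js": {"html": [r"__NEXT_DATA__"]},
    "WordPress": {"html": [r'<meta name="generator" content="WordPress']},
}

# Confidence per signal, taken from the table above.
CONFIDENCE = {"headers": 100, "scripts": 90, "html": 85}

SCRIPT_SRC_RE = re.compile(r'<script[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE)


def detect(headers, html):
    """Return {technology: (confidence, evidence)} for one fetched page."""
    headers = {key.lower(): value for key, value in headers.items()}
    script_urls = SCRIPT_SRC_RE.findall(html)
    found = {}

    def record(name, signal, evidence):
        score = CONFIDENCE[signal]
        # Keep only the strongest signal seen for each technology.
        if name not in found or score > found[name][0]:
            found[name] = (score, evidence)

    for name, rules in FINGERPRINTS.items():
        for header, pattern in rules.get("headers", {}).items():
            if header in headers and re.search(pattern, headers[header], re.IGNORECASE):
                record(name, "headers", f"header: {header}")
        for pattern in rules.get("scripts", []):
            for url in script_urls:
                if re.search(pattern, url, re.IGNORECASE):
                    record(name, "scripts", f"script: {url}")
        for pattern in rules.get("html", []):
            if re.search(pattern, html, re.IGNORECASE):
                record(name, "html", f"html: {pattern}")
    return found


# Canned response so the sketch runs without any network access.
sample_headers = {"CF-Ray": "8a1b2c3d4e5f-AMS", "Server": "cloudflare"}
sample_html = (
    '<html><head><script src="https://js.stripe.com/v3"></script></head>'
    '<body><script id="__NEXT_DATA__" type="application/json">{}</script></body></html>'
)
result = detect(sample_headers, sample_html)
```

Keeping the highest-scoring signal per technology is one plausible way to dedupe; the actor may combine evidence differently.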
Detections are then propagated transitively via `implies` rules, so a Shopify hit automatically adds Ruby and Liquid, a Next.js hit implies React and Node.js, and a WooCommerce hit implies WordPress + PHP. The union is deduped, category-summarized, and returned as one row per URL.
Nothing in this pipeline requires JavaScript rendering, Playwright, Chrome DevTools, or any form of browser automation; it's just gzipped HTTP fetches plus regex. That keeps it fast (typically 200–800 ms per URL), cheap ($0.01 per domain via PPE), and predictable at 1,000+ URL batch sizes. For SPA sites that do require rendering, see the FAQ entry on JavaScript-rendered sites below.
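Transitive propagation of this kind amounts to a breadth-first walk over an implies graph. The sketch below assumes a hypothetical `IMPLIES` map built from the examples in the text (modelling WooCommerce → WordPress → PHP as a two-step chain to show the transitivity) and assigns the implied-only score of 60 from the confidence table; it is not the actor's own code.

```python
from collections import deque

# Hypothetical implies map assembled from the examples above.
IMPLIES = {
    "Shopify": ["Ruby", "Liquid"],
    "Next.js": ["React", "Node.js"],
    "WooCommerce": ["WordPress"],
    "WordPress": ["PHP"],
}
IMPLIED_CONFIDENCE = 60  # "Implied-only" row of the confidence table


def propagate(detected):
    """Expand {technology: confidence} with everything reachable through IMPLIES."""
    result = dict(detected)
    queue = deque(detected)
    while queue:
        tech = queue.popleft()
        for implied in IMPLIES.get(tech, []):
            # Directly detected technologies keep their original, higher score.
            if implied not in result:
                result[implied] = IMPLIED_CONFIDENCE
                queue.append(implied)
    return result


stack = propagate({"WooCommerce": 90, "Next.js": 85, "React": 90})
```

Because the result is a dictionary keyed by technology name, the union is deduped for free: React was detected directly, so the Next.js implication does not overwrite its score.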
Input Parameters
| Field | Type | Default | Description |
|---|---|---|---|
| `urls` | string[] | `["https://stripe.com","https://shopify.com"]` | Websites to analyze. Bare hosts (`example.com`) or full URLs; `https://` is auto-prepended. |
| `categories_filter` | string[] | `[]` (all) | Only return technologies in these categories. Examples: CMS, Analytics, CDN, Ecommerce, JavaScript Frameworks, Payment Processors. |
| `include_confidence` | boolean | `true` | Include per-technology 0–100 confidence scores in the output. |
| `include_versions` | boolean | `true` | Extract semver version numbers when the fingerprint exposes one (e.g. React 18.2.0 from `react@18.2.0`). |
| `timeout_seconds` | integer (3–60) | `15` | Per-URL HTTP timeout. Raise for slow or heavy pages. |
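If you assemble the input programmatically, a small helper can validate it before the run starts. The sketch below is an assumption-laden convenience, not part of the actor: `normalize_url` and `build_run_input` are hypothetical names, and the client-side normalization simply mirrors the auto-prepend behaviour described in the table so that duplicates such as `example.com` and `https://example.com` can be collapsed before submission.

```python
def normalize_url(raw):
    """Trim whitespace and prepend https:// to bare hosts."""
    url = raw.strip()
    if not url.lower().startswith(("http://", "https://")):
        url = "https://" + url
    return url


def build_run_input(urls, categories_filter=None, include_confidence=True,
                    include_versions=True, timeout_seconds=15):
    """Assemble the actor input documented in the table above."""
    if not 3 <= timeout_seconds <= 60:
        raise ValueError("timeout_seconds must be between 3 and 60")
    seen, cleaned = set(), []
    for raw in urls:
        url = normalize_url(raw)
        if url not in seen:  # drop exact duplicates, keep first-seen order
            seen.add(url)
            cleaned.append(url)
    return {
        "urls": cleaned,
        "categories_filter": list(categories_filter or []),
        "include_confidence": include_confidence,
        "include_versions": include_versions,
        "timeout_seconds": timeout_seconds,
    }


run_input = build_run_input(
    ["example.com", "https://example.com", " stripe.com "],
    categories_filter=["CMS", "CDN"],
)
```

The resulting dictionary can be passed as `run_input` to the Apify client call.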
Example Request & Response
Input
```json
{
  "urls": ["https://stripe.com"],
  "include_confidence": true,
  "include_versions": true,
  "timeout_seconds": 15
}
```
Output (one row per URL in the dataset)
```json
{
  "url": "https://stripe.com",
  "final_url": "https://stripe.com/",
  "status_code": 200,
  "tech_count": 12,
  "categories": {
    "JavaScript Frameworks": ["React"],
    "CDN": ["Cloudflare"],
    "Analytics": ["Segment", "Google Analytics"],
    "Payment Processors": ["Stripe"],
    "Security": ["HSTS"],
    "Tag Managers": ["Google Tag Manager"],
    "Font Scripts": ["Google Fonts"]
  },
  "technologies": [
    {"name": "React", "version": "18.2.0", "category": "JavaScript Frameworks", "confidence": 95, "evidence": "script: /react@18.2.0/react.production.min.js"},
    {"name": "Cloudflare", "version": null, "category": "CDN", "confidence": 100, "evidence": "header: cf-ray"},
    {"name": "Stripe", "version": null, "category": "Payment Processors", "confidence": 90, "evidence": "script: js.stripe.com/v3"},
    {"name": "Google Tag Manager", "version": null, "category": "Tag Managers", "confidence": 90, "evidence": "script: googletagmanager.com/gtm.js"}
  ],
  "scan_time_ms": 430
}
```
Python ApifyClient Example
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("nexgendata/wappalyzer-replacement").call(
    run_input={
        "urls": [
            "https://notion.so",
            "https://vercel.com",
            "https://linear.app",
            "https://openai.com",
        ],
        "categories_filter": ["Analytics", "CDN", "JavaScript Frameworks"],
        "include_confidence": True,
        "include_versions": True,
    }
)

# Each dataset item is one scanned URL.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["url"], "->", item["tech_count"], "techs")
    for tech in item["technologies"]:
        print(f"  {tech['category']:<30} {tech['name']:<25} conf={tech.get('confidence')}")
```
5 Real Use Cases
1. SDR / Outbound Lead Qualification
Your SDR team needs to filter 10,000 target domains down to "companies actually running Shopify" or "sites using HubSpot Marketing Automation." Ship the domain list into this actor, filter on categories_filter: ["Ecommerce"] or by technology name, and hand the SDRs a pre-qualified list. The $0.01/domain cost is an order of magnitude cheaper than BuiltWith Pro.
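Filtering by technology name happens on your side, after the run. A minimal sketch, assuming dataset rows shaped like the example response above; `domains_using` is a hypothetical helper, and the sample rows are made up for illustration.

```python
def domains_using(items, tech_name, min_confidence=0):
    """Return URLs whose technologies include tech_name at or above min_confidence."""
    wanted = tech_name.lower()
    hits = []
    for item in items:
        for tech in item.get("technologies", []):
            confidence = tech.get("confidence")
            if confidence is None:
                # Scores are absent when include_confidence is false; let the row through.
                confidence = 100
            if tech["name"].lower() == wanted and confidence >= min_confidence:
                hits.append(item["url"])
                break
    return hits


# Made-up rows standing in for dataset items.
rows = [
    {"url": "https://shop-a.example", "technologies": [
        {"name": "Shopify", "category": "Ecommerce", "confidence": 100}]},
    {"url": "https://blog-b.example", "technologies": [
        {"name": "WordPress", "category": "CMS", "confidence": 85}]},
    {"url": "https://shop-c.example", "technologies": [
        {"name": "Shopify", "category": "Ecommerce", "confidence": 60}]},
]
all_shopify = domains_using(rows, "shopify")
confident_shopify = domains_using(rows, "Shopify", min_confidence=85)
```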
2. Competitive Tech Intelligence
Investors, product managers, and M&A teams routinely need to answer "what's this company's tech stack actually like?" for dozens of targets in a week. Point the actor at your competitor set and get one row per domain with framework, CDN, analytics, commerce platform, payment processor, and CMS identified: the same data every due-diligence deck wants, without paying SimilarWeb or BuiltWith.
3. SEO / Core Web Vitals Root Cause Analysis
When a client's site is slow, 60% of the time the culprit is the stack itself: a too-old WordPress theme, Shopify Plus bloat, seven different analytics pixels stacked on top of each other. Detect everything running on the page, then decide what to kill. Pairs especially well with our page-speed-bulk-checker actor.
4. Security & Supply-Chain Auditing
Running a vendor-risk review? You need to know if your SaaS vendors are on unpatched WordPress, running jQuery 1.x, shipping a Log4j-era Java stack, or using a payment processor you don't have a DPA with. Bulk-scan your entire vendor list once a quarter and flag anything interesting for a real security review.
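When `include_versions` is on, the version strings in each row can be compared against a minimum-version policy. The sketch below is illustrative only: `MINIMUM_VERSIONS` is a hypothetical policy, the sample row is invented, and technologies whose fingerprint exposes no version are skipped rather than flagged.

```python
# Hypothetical policy; choose thresholds that match your own risk rules.
MINIMUM_VERSIONS = {"jQuery": (3, 0), "WordPress": (6, 0)}


def parse_version(version):
    """Turn '1.12.4' into (1, 12, 4); the first non-numeric part ends the parse."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def flag_outdated(item):
    """Return (name, version) pairs that fall below the policy minimum."""
    flags = []
    for tech in item.get("technologies", []):
        minimum = MINIMUM_VERSIONS.get(tech["name"])
        version = tech.get("version")
        if minimum is None or not version:
            continue  # no policy for this technology, or no version was exposed
        parsed = parse_version(version)
        if parsed and parsed < minimum:
            flags.append((tech["name"], version))
    return flags


# Invented row standing in for one dataset item.
vendor_row = {
    "url": "https://vendor.example",
    "technologies": [
        {"name": "jQuery", "version": "1.12.4"},
        {"name": "WordPress", "version": "6.4"},
        {"name": "React", "version": None},
    ],
}
flags = flag_outdated(vendor_row)
```

Anything this flags is a prompt for a proper security review, not a verdict.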
5. Marketing-Tech Audit of Your Own Sites
Most mid-sized companies have 10+ marketing properties (microsites, landing pages, country subsites), all loading a slightly different mix of GTM containers, old Facebook Pixels, a forgotten HotJar install, a leftover Intercom widget from 2019, and one suspiciously-embedded Segment snippet. Scan all of them, diff against your intended stack, clean up.
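The "diff against your intended stack" step is a pair of set differences. A minimal sketch, assuming rows shaped like the example response; `audit_stack`, the `INTENDED` set, and the sample row are hypothetical.

```python
def audit_stack(item, intended):
    """Compare the technologies detected on one site with the intended set."""
    found = {tech["name"] for tech in item.get("technologies", [])}
    return {
        "unexpected": sorted(found - intended),  # detected but not approved
        "missing": sorted(intended - found),     # approved but not detected
    }


# Hypothetical approved marketing stack.
INTENDED = {"Google Tag Manager", "Segment", "Cloudflare"}

# Invented row standing in for one dataset item.
row = {
    "url": "https://promo.example",
    "technologies": [
        {"name": "Google Tag Manager"},
        {"name": "Hotjar"},
        {"name": "Intercom"},
        {"name": "Cloudflare"},
    ],
}
report = audit_stack(row, INTENDED)
```

A "missing" entry may also mean the tool is injected client-side, which an HTTP-only scan cannot see.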
Wappalyzer vs BuiltWith vs this Actor
| Capability | Wappalyzer Pro | BuiltWith Pro | Wappalyzer Replacement (this actor) |
|---|---|---|---|
| Bulk API access | Tier-locked, ≥$250/mo | ≥$295/mo | Yes, pay-per-event |
| Price per 1,000 domains | ~$125+ (Pro tier) | ~$295+ | ~$10 (PPE) |
| Free tier | Browser extension only | Short trial | No minimum, $0.01/domain |
| Fingerprint source | Frozen OSS (2023) | Proprietary panel | Maintained OSS forks |
| JS-rendering | Extension only | Yes | No (HTTP-only for speed + cost) |
| Sign-up required | Yes | Yes | Apify account only |
| Output format | Dashboard / CSV | Dashboard / CSV | Clean JSON, CSV, Excel, webhook |
| Fingerprints shipped | Full 6000+ | Proprietary | ~200 highest-traffic (expandable) |
If you specifically need full 6000+ fingerprint coverage plus JS-rendering of SPAs, BuiltWith Pro or Wappalyzer Pro are still the right call. For the 90%-common case ("is this domain running Shopify, WordPress, or Next.js, and what analytics pixels are on the page?"), this actor gets you there at 1/30th the cost.
Why Run This on NexGenData / Apify?
- Zero infra. No Python environment, no maintaining fingerprint rulesets, no regex engine tuning.
- Bulk-first. Ship 5,000 URLs in one run, get one clean JSON row per URL.
- Pay-per-event. $0.01 per domain analyzed. 1,000 domains = $10. No monthly minimum.
- Fast. ~200–800 ms per URL on most sites. 1,000 URLs finish in roughly 10–15 minutes.
- Integrations. Pipe straight to Google Sheets, Slack, Zapier, Make, n8n, or a webhook.
Related Actors in the NexGenData Suite
- company-tech-stack-detector: complementary "tech fingerprint per domain" with a slightly different detection heuristic; great to cross-check.
- company-data-aggregator: WHOIS + DNS + GitHub + SSL + tech headers per domain. Use this actor for the deep tech-stack slice of that profile.
- tranco-rank-lookup: domain popularity ranking. Pair with tech-stack data to answer "what stack are the top 1,000 e-commerce sites actually running?"
- page-speed-bulk-checker: Core Web Vitals + Lighthouse scores for a list of URLs. Tech stack + CWV is the canonical "why is this site slow?" combo.
FAQ
Q: How accurate is the detection?
For the ~200 technologies shipped with this actor, accuracy is comparable to the open-source Wappalyzer extension. That means very good for header and script-src matches (effectively 100% precision for things like Cloudflare, Shopify, Stripe, HubSpot, Google Tag Manager) and good-but-not-perfect for HTML body patterns (classic HTML regexes occasionally false-positive on tutorials, documentation pages, and "how to detect X" articles). The confidence score is there so you can filter. For mission-critical use, drop anything with confidence < 85, which removes implied-only detections; raise the cutoff to 90 if you also want to exclude HTML body matches.
Q: Are there rate limits?
No hard rate limits on the actor itself; you're limited by Apify's default concurrency (typically dozens of parallel runs per account) and by the target sites' own rate limits. The actor sends a single HTTPS request per URL, uses a browser-realistic User-Agent, and respects `timeout_seconds`. For very large batches (10K+ URLs), split across multiple runs to stay polite.
Q: How many technologies can you detect?
~200 of the highest-traffic fingerprints are bundled in this build, covering essentially every major CMS, JS framework, CDN, hosting platform, analytics tool, ads pixel, payment processor, chat widget, A/B test tool, authentication provider, and UI framework you'll routinely see on real-world sites. The full Wappalyzer OSS ruleset is ~6,000 technologies; most of the other 5,800 are extremely long-tail (regional CMSes, niche forums, etc.). If you need one in particular and we don't have it, open a request on our Apify page.
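Splitting a large list into several runs is a simple chunking step, and the per-domain price makes the cost easy to estimate up front. In the sketch below, `chunked` and `plan_runs` are hypothetical helpers, the 5,000-URL batch size is an assumption borrowed from the "Ship 5,000 URLs in one run" figure on this page, and the price is the $10.00 per 1,000 domains stated above.

```python
PRICE_PER_1000_USD = 10.00  # pricing stated on this page


def chunked(urls, size):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def plan_runs(urls, batch_size=5000):
    """Split urls into per-run batches and estimate the total pay-per-event cost."""
    batches = list(chunked(urls, batch_size))
    estimated_cost = len(urls) * PRICE_PER_1000_USD / 1000
    return batches, estimated_cost


# Placeholder hosts; substitute your own domain list.
urls = [f"site-{i}.example" for i in range(12000)]
batches, cost = plan_runs(urls)
```

Each batch would then be submitted as the `urls` field of its own actor run, one after another or with a small amount of parallelism.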
Q: Is there a self-hosted option?
Yes. The actor source is viewable on the Apify platform, and the fingerprint ruleset is a plain JSON file. You're welcome to fork it, run it in your own Apify account, or port the detection logic to any other environment. The underlying community rulesets (enthec/webappanalyzer, tunetheweb/wappalyzer) are MIT-licensed.
Q: Does this work on JavaScript-rendered / SPA sites?
Not in this version. The actor uses pure HTTP fetches, so if a site's tech stack is only detectable after client-side JS executes (think certain Gatsby hydration payloads, or sites that inject the Segment snippet from a code-split chunk), this actor may miss it. For ~95% of sites this is fine because most trackers, CDNs, frameworks, and payment providers leak enough via headers or the initial HTML response to be detected. For the remaining 5%, the roadmap has a Playwright-backed variant; upvote it on the Apify issue tracker if you need it.
Q: Is the data private? Do you log the URLs I scan?
Apify logs the actor's input and output per run (standard platform behavior), but nothing is stored outside your own account's datasets. We don't have any upstream "URL reputation" database that the actor contributes to. If you're scanning sensitive internal URLs, run against a private dataset and set retention to 1 day.
Q: Why $0.01 per domain?
It's the lowest round number that comfortably covers CPU + bandwidth cost on Apify's PPE infrastructure while keeping the actor radically cheaper than every commercial alternative. At 1,000 domains / $10 you're beating BuiltWith by 30x, Wappalyzer Pro by ~12x, and every "tech stack CSV" data broker by roughly 100x.
Q: Can I add my own fingerprints?
Not directly; the fingerprint JSON is bundled at build time. If you need custom rules, either fork the actor (the ruleset is a single `fingerprints.json` file) or open a request and we'll ship high-demand additions in the next version.