CMS & Tech Stack Detector
Pricing
from $3.50 / 1,000 results
Detects the CMS and full technology stack of any website in seconds. Returns the platform (WordPress, Shopify, Webflow, Wix, Drupal, custom, …) plus every other technology found on the homepage: frameworks, CDNs, analytics, ecommerce add-ons, payment processors, and more.
Developer
Thodor
Pass in a list of domains, get back what CMS each one runs, plus the rest of the stack (framework, CDN, analytics, marketing tools). One row per domain. Pay per result: failed fetches don't push a row and aren't billed.
This is a Wappalyzer alternative and a BuiltWith alternative for people who need to bulk-classify domains for lead generation, agency prospecting, ABM, or competitive analysis without a heavy monthly subscription, a 2-technology cap (BuiltWith Basic), or API credits that expire after 60 days (Wappalyzer Pro).
Wappalyzer alternative for bulk CMS detection
Each domain gets a single type label, so you can filter or segment a list of 100 or 100,000 prospects in one column:
| `type` | What it means | Examples |
|---|---|---|
| CMS | Traditional content management system | WordPress, Drupal, Joomla, Sitecore, Adobe Experience Manager, Ghost, Strapi, Contentful |
| Ecommerce | Online-store platform | Shopify, WooCommerce, Magento, BigCommerce, PrestaShop, Salesforce Commerce Cloud |
| Website builder | No-code drag-and-drop builder | Wix, Squarespace, Webflow, Carrd, Framer, Bubble, Duda |
| Blog | Dedicated blogging platform | Medium, Substack, Tumblr, Hashnode, Beehiiv, Bear Blog |
| Framework | No CMS, but a known web/JS framework | Next.js, Nuxt.js, React, Vue.js, Svelte, Gatsby, Astro, Remix |
| Unknown | We reached the site, but nothing in the HTML matched | Hand-built static sites, heavily-stripped custom builds, JS-rendered SPAs we can't see |
| `null` | We couldn't reach the site (4xx / 5xx / DNS error) | No row is pushed; you are not charged |
Alongside type, every row carries:
- `cms`: name and version of the specific platform, or `null` if no CMS was found.
- `framework`: the underlying tech as a plain name (Next.js, Webflow, WordPress, …).
- `cdn`, `analytics`, `marketing`: three ready-to-filter lists of the tools the site uses.
- `breakdown`: the complete list of every technology found, including the ones already shown above, so you have the full picture in one place. Each entry also carries a `pricing` array (when known): `low` / `mid` / `high` / `poa` for the cost tier, plus `freemium` / `recurring` / `onetime` / `payg` for the billing model.
What this is good for
Six concrete plays. If yours isn't here, the data fits any CRM or spreadsheet.
- Competitor-displacement campaigns. Filter your prospect list for every site running a specific marketing tool or CMS, then feed those URLs into Apollo or Hunter for verified emails. "Find every Klaviyo store missing a loyalty app" is two filters: keep the rows where `marketing` contains "Klaviyo" and the full tech list has no loyalty tool.
- Agency migration prospecting. Pull every site still on outdated platforms (Joomla, Drupal 7, classic ASP, Magento 1) in your geo and pitch a re-platform. Filter by CMS name and version in your spreadsheet, done in one column.
- Shopify-app and WordPress-plugin sales. Find every Shopify store in your region by filtering for type "Ecommerce" + CMS "Shopify", then scan the full tech list to see which apps they already run (or don't).
- Clay / n8n / Make enrichment column. Feed a "Company Domain" column through this Actor's quick-response endpoint and get CMS, framework, CDN, analytics, and marketing tools appended to every row, without paying for Clay's Explorer tier or a monthly BuiltWith plan.
- ABM TAM sizing. Measure Next.js / Shopify Plus / Webflow adoption across 10,000 sites in your vertical to validate ICP before spinning up an outbound team.
- Enterprise vs SMB segmentation. Each detected technology carries a pricing tag when known. Filter `breakdown` for entries with `pricing` containing `high` or `poa` to flag enterprise-priced stacks (HubSpot Enterprise, Marketo, …), or `freemium` for free-plan adopters you could upsell. About 60% of detected technologies carry pricing data.
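The two filters mentioned above (Klaviyo without a loyalty tool, and enterprise-priced stacks) can be sketched in a few lines of Python. This is a minimal sketch that assumes rows shaped like the examples in the Output section. The `LOYALTY_TOOLS` names, the sample rows, and their pricing tags are made up for illustration and are not taken from a real run.

```python
# Hypothetical loyalty-app names; replace with the names that actually
# appear in your own `breakdown` data.
LOYALTY_TOOLS = {"Smile", "Yotpo", "LoyaltyLion"}

# Cost tiers this README describes as enterprise-priced.
ENTERPRISE_TIERS = {"high", "poa"}


def klaviyo_without_loyalty(rows):
    """Domains where `marketing` has Klaviyo and `breakdown` has no loyalty tool."""
    keep = []
    for row in rows:
        names = {tech["name"] for tech in row.get("breakdown", [])}
        if "Klaviyo" in row.get("marketing", []) and not names & LOYALTY_TOOLS:
            keep.append(row["domain"])
    return keep


def enterprise_priced(rows):
    """Domains where any `breakdown` entry carries a `high` or `poa` pricing tag."""
    keep = []
    for row in rows:
        techs = row.get("breakdown", [])
        if any(ENTERPRISE_TIERS & set(t.get("pricing", [])) for t in techs):
            keep.append(row["domain"])
    return keep


# Made-up sample rows, trimmed to the fields the filters read.
sample_rows = [
    {"domain": "store-a.example", "marketing": ["Klaviyo"],
     "breakdown": [{"name": "Klaviyo", "pricing": ["mid", "recurring"]}]},
    {"domain": "store-b.example", "marketing": ["Klaviyo"],
     "breakdown": [{"name": "Klaviyo", "pricing": ["mid", "recurring"]},
                   {"name": "Smile", "pricing": ["low", "freemium"]}]},
    {"domain": "corp.example", "marketing": ["Marketo"],
     "breakdown": [{"name": "Marketo", "pricing": ["poa"]}]},
]

print(klaviyo_without_loyalty(sample_rows))  # ['store-a.example']
print(enterprise_priced(sample_rows))        # ['corp.example']
```

The same two conditions translate directly into spreadsheet filters or a Clay formula column; Python is used here only to make the logic explicit.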
A common adjacent need: this Actor returns the tech stack, not contact data. Feed the URL → CMS output into the Email Scraper to pull emails off the same domains, or pair with the Apollo Scraper, Contact Info Scraper, or Clay's email-enrichment column for decision-maker enrichment.
How it works
- Clean up the URL. Bare domains, `www.`-prefixed, or any deep URL all work; we always check the homepage. `www.` is dropped, so `www.x.com` and `x.com` end up as one row.
- Fetch the homepage looking like a Chrome browser. We mimic Chrome so the site doesn't realise it's being scraped, but we don't run a full browser. That's enough to get past most Cloudflare and bot-check pages.
- Try again as a search-engine crawler if the first fetch looks blocked. If the page looks paywalled or cookie-walled, we try a second time pretending to be the DuckDuckGo bot. Many sites serve search engines a cleaner version than they serve browsers, and that version is where the actual platform clues live.
- Match against ~7,500 known technologies from the open-source enthec/webappanalyzer project (the community successor to the original Wappalyzer database). We add extra checks for modern frameworks (Next.js, Nuxt, React, Vue, Svelte, Astro, Remix, Gatsby) so they show up even when the live site hides the usual clues.
- Clean up and categorise. One `type` label plus the curated `cms` / `framework` / `cdn` / `analytics` / `marketing` lists. We also fix common upstream quirks. For example, Amazon S3 is object storage and not a CDN (we move it), and Leadfeeder is a marketing tool and not analytics (we move it).
- Save one row per domain. Failed fetches (DNS error, 4xx, 5xx, timeout) do not create a row, so you're only billed for sites we actually returned a result for.
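The URL clean-up step in the first bullet can be illustrated as follows. This is a minimal sketch of the behaviour described above (lowercase the host, drop a leading `www.`, always check the homepage), not the Actor's actual code. The function name `normalise` and the choice to always use `https` are assumptions.

```python
from urllib.parse import urlsplit


def normalise(raw):
    """Return (domain, homepage_url) for a bare domain, www-prefixed host, or deep URL."""
    raw = raw.strip()
    # urlsplit only recognises the host when a scheme is present.
    if "://" not in raw:
        raw = "https://" + raw
    host = (urlsplit(raw).hostname or "").lower()
    # Only a leading "www." is dropped; other subdomains stay distinct.
    if host.startswith("www."):
        host = host[len("www."):]
    # Assumption: the homepage is always fetched over https.
    return host, f"https://{host}/"


inputs = ["www.x.com", "x.com", "https://docs.example.com/guide?page=2"]
unique = sorted({normalise(u)[0] for u in inputs})
print(unique)  # ['docs.example.com', 'x.com']
```

Running your own list through a step like this before submitting it shows how many distinct rows to expect.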
Input
Two ways to give it a list of domains:
Apify Console (spreadsheet-friendly). Open the Actor, paste one URL per line into the Start URLs or domains box, hit Start. Bare domains, www.-prefixed, deep URLs are all accepted.
API / programmatic. Pass an array under start_urls:
{
  "start_urls": [
    { "url": "shopify.com" },
    { "url": "https://www.nytimes.com" },
    { "url": "techcrunch.com" }
  ]
}
You can pass 1 URL or 100,000 in one call. 10,000 finishes in ~50–100 minutes, 100,000 runs overnight. Subdomains are distinct rows; docs.example.com and example.com produce two separate detections.
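If your domains live in a text file or a spreadsheet column, the `start_urls` input can be generated rather than typed. A minimal sketch follows; the helper name `build_input` is an assumption, and only the `start_urls` / `url` keys come from the input format shown above.

```python
import json


def build_input(domains):
    """Turn an iterable of domain strings into the Actor's `start_urls` input."""
    # Skip blank lines and strip stray whitespace or newlines from each entry.
    cleaned = [d.strip() for d in domains if d.strip()]
    return {"start_urls": [{"url": d} for d in cleaned]}


payload = build_input(["shopify.com", "https://www.nytimes.com", "", "techcrunch.com\n"])
print(json.dumps(payload, indent=2))
```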
Output
Three real examples, all from a live run.
A WordPress site (TechCrunch)
{
  "domain": "techcrunch.com",
  "url_checked": "https://techcrunch.com/",
  "type": "CMS",
  "cms": { "name": "WordPress", "version": "6.9.4" },
  "framework": "WordPress",
  "cdn": [],
  "analytics": [],
  "marketing": ["Google Tag Manager", "Sailthru"],
  "breakdown": [
    { "name": "MySQL", "version": null, "categories": ["Databases"], "pricing": [] },
    { "name": "Nginx", "version": null, "categories": ["Web servers"], "pricing": [] },
    { "name": "PHP", "version": null, "categories": ["Programming languages"], "pricing": [] },
    { "name": "React", "version": null, "categories": ["JavaScript frameworks"], "pricing": [] },
    { "name": "Sailthru", "version": null, "categories": ["Marketing automation"], "pricing": ["poa"] },
    { "name": "WordPress", "version": "6.9.4", "categories": ["CMS", "Blogs"], "pricing": ["low", "recurring", "freemium"] },
    { "name": "Yoast SEO Premium", "version": "25.1", "categories": ["SEO"], "pricing": ["low", "freemium", "recurring"] }
  ],
  "tech_count": 18
}
(Breakdown above is trimmed for readability; the live row contains every detected technology. Sites change: when this Actor was first written TechCrunch was on Amazon CloudFront with HubSpot installed; today the page leaks Sailthru instead.)
A Shopify store
{
  "domain": "shopify.com",
  "url_checked": "https://shopify.com/",
  "type": "Ecommerce",
  "cms": { "name": "Shopify", "version": null },
  "framework": "Shopify",
  "cdn": ["Cloudflare"],
  "analytics": [],
  "marketing": [],
  "breakdown": [
    { "name": "Cloudflare", "version": null, "categories": ["CDN"], "pricing": [] },
    { "name": "FedEx", "version": null, "categories": ["Shipping carriers"], "pricing": [] },
    { "name": "React", "version": null, "categories": ["JavaScript frameworks"], "pricing": [] },
    { "name": "Shopify", "version": null, "categories": ["Ecommerce"], "pricing": ["low", "recurring"] }
  ],
  "tech_count": 10
}
An Unknown result (heavily-stripped custom build). You'll see this for hand-rolled static sites and some JS-rendered SPAs where the production build hides every platform marker:
{
  "domain": "example.com",
  "url_checked": "https://example.com/",
  "type": "Unknown",
  "cms": null,
  "framework": null,
  "cdn": [],
  "analytics": [],
  "marketing": [],
  "breakdown": [],
  "tech_count": 0
}
| Field | Meaning |
|---|---|
| `domain` | Canonical hostname (lowercased, leading `www.` stripped). |
| `url_checked` | The exact homepage URL fetched. |
| `type` | One of the seven values in the table above. Always set when the fetch succeeded. |
| `cms` | `{ name, version }` of the platform, or `null` if no CMS-tier match. `version` may be `null` if the site doesn't expose it. |
| `framework` | Plain-string "what runs this site": Next.js / Webflow / WordPress / etc. |
| `cdn` | CDN providers (Cloudflare, CloudFront, Fastly, Akamai, BunnyCDN, jsDelivr, …). |
| `analytics` | Web-analytics tools (Google Analytics, Plausible, Mixpanel, Heap, Fathom, Amplitude, Matomo, …). |
| `marketing` | Marketing automation, email marketing, CRM, tag managers, live chat, A/B testing, retargeting, CDP. |
| `breakdown` | Full detection list, sorted alphabetically. Each item: `name`, optional `version`, list of `categories`, and a `pricing` array (when upstream data is available; see below). |
| `breakdown[].pricing` | Cost tier (`low` <$100/mo, `mid` $100–$1k/mo, `high` >$1k/mo, `poa` price-on-asking) and/or billing model (`freemium`, `recurring`, `onetime`, `payg`). Empty array when upstream has no pricing data, about 40% of techs (mostly open-source projects, browser APIs, and infrastructure primitives). |
| `tech_count` | Unique technologies detected; useful for sorting "most stack-rich" domains. |
Download the dataset as JSON, CSV, Excel, HTML, or XML from the Dataset tab. Tools like Clay, Make, n8n, and Zapier can stream rows out as they're produced via webhook.
BuiltWith alternative: pay per result, no monthly minimum
Billed per successful detection: one row pushed to the dataset per domain we returned a result for. Failed fetches (DNS error, 4xx, 5xx, timeout) do not push a row and are free. The maxItems setting on a run is respected: you won't get charged for more than you asked for.
Throughput: ~5–10 minutes for 1,000 domains, ~50–100 minutes for 10,000, overnight for 100,000 (concurrency capped at 10 simultaneous fetches to keep memory predictable).
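For budgeting, the figures above reduce to simple arithmetic. The sketch below assumes the listed "from $3.50 / 1,000 results" price applies to your plan and that run time scales linearly with list size; both are assumptions. The `success_rate` parameter is a hypothetical input for the share of domains that return a row.

```python
PRICE_PER_1000 = 3.50        # "from" price on the listing; your plan may differ
MINUTES_PER_1000 = (5, 10)   # throughput range quoted above


def estimate(domains, success_rate=1.0):
    """Rough cost and run-time estimate for a list of `domains` URLs."""
    # Only successful detections push a row, so only those are billed.
    billed = round(domains * success_rate)
    cost = billed / 1000 * PRICE_PER_1000
    # Failed fetches still take time, so duration uses the full list size.
    low = domains / 1000 * MINUTES_PER_1000[0]
    high = domains / 1000 * MINUTES_PER_1000[1]
    return {"billed_rows": billed, "cost_usd": round(cost, 2), "minutes": (low, high)}


print(estimate(10_000, success_rate=0.9))
# {'billed_rows': 9000, 'cost_usd': 31.5, 'minutes': (50.0, 100.0)}
```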
Compare:
| | This Actor | Wappalyzer Pro | BuiltWith Basic | WhatCMS |
|---|---|---|---|---|
| Billing model | Pay per result | Monthly subscription | Monthly subscription | Per-lookup or subscription |
| Monthly minimum | None | Yes | Yes | None |
| Credit expiration | None | 60 days | n/a | n/a |
| Technologies you can filter on | Unlimited | Unlimited | Capped at 2 | CMS only |
| Bulk lookup via API | Yes | Yes | Yes | Yes (paid tier) |
| Single `type` label per site | Yes | No | Partial | Yes (CMS only) |
| Bot-UA cloaking workaround | Yes | No | n/a | Unknown |
| Modern-framework heuristic layer (Next.js / Vue / Svelte / Astro) | Yes | Partial | Partial | n/a (CMS only) |
Where the technology database comes from
In August 2023, Wappalyzer closed its open-source rules and moved everything behind a paid subscription. The ~7,500-technology database that powered every "what is this site running" tool for a decade was suddenly frozen.
A few weeks later (September 2023), the enthec/webappanalyzer project picked up where Wappalyzer left off, keeping the database open, public, and actively maintained. It now covers ~7,500 technologies across 108 categories and is updated roughly weekly (most recent update: April 2026; 500+ stars, 119 forks on GitHub).
Almost no shipped tool uses it. Most "Wappalyzer alternative" libraries you'll find online are still using the frozen pre-2023 database, which means they don't recognise Next.js App Router, modern Shopify themes, recent Cloudflare products, the post-2024 wave of headless CMSes, or anything added in the last two years. This Actor reads the up-to-date enthec database directly, and on top of that adds a clean-up pass to fix the quirks that the upstream data still has (Amazon S3 wrongly tagged as a CDN, B2B retargeting tools tagged as analytics, and so on).
Custom changes on top of the upstream
Source for the curious: github.com/Polluxs/apify/tree/master/apify-cms-detector.
A naive "load enthec JSON + match regex" loop produces output most users would consider broken, both because the database has known false positives that any pre-2023 Wappalyzer fork would hit, and because enthec's category IDs aren't the same as the old database's. Everything I had to fix to get clean output:
- Of the ~7,500 fingerprints, exactly 884 are JS-only. They fire only on `window.X` globals, so any HTTP-only matcher (no headless browser) has a hard ceiling at ~6,634 detectable techs. Useful number if you're planning capacity or comparing alternatives.
- Category-ID remapping for the post-2023 database. Pre-2023 Wappalyzer numbered "Email marketing" as cat 95 and "Personalisation" as cat 70. enthec renumbered: cat 95 is now "Digital asset management", cat 70 is now "SSL/TLS certificate authorities". Every Wappalyzer port written before September 2023 silently mis-buckets technologies if its `MARKETING_CATEGORY_IDS` constants haven't been audited against the new `categories.json`. I rebuilt the bucket constants from scratch.
- "Cart Functionality" tie-break fix. Upstream ships a generic `cats=[6]` detector called "Cart Functionality" that triggers on any page with shopping-cart markup. In a naive port it beats out the real platform alphabetically, so `stripe.com`, `shopify.com`, and most ecommerce sites end up reporting "Cart Functionality" as the CMS instead of Shopify / Stripe. I exclude it from the CMS / framework picker (`SKIP_CMS_FRAMEWORK_PICK`); it still appears in `breakdown` so the match is visible.
- Bucket priority flip + ~25 curation overrides. The upstream double-tags hundreds of marketing-automation tools (Braze, CleverTap, Airship, …) as both Marketing automation AND Analytics. I flipped the bucket priority from analytics-first to marketing-first so those naturally land in `marketing`, then added explicit overrides for the edge cases the upstream gets wrong: Amazon S3 excluded from CDN (object storage, not CDN), styled-components / Emotion / JSS excluded from frameworks (CSS-in-JS), Leadfeeder / LinkedIn Insight Tag moved analytics → marketing (B2B retargeting), Datadog / BugSnag / etc. dropped from analytics (APM, not web analytics). All deltas live in one `BUCKET_OVERRIDES` dict so the curation is auditable.
- Label fixes on the breakdown rows so they match the top-level buckets. Amazon S3 reads "Object storage" (not "CDN"), styled-components reads "CSS-in-JS" (not "JavaScript frameworks"), Leadfeeder reads "B2B retargeting" (not "Analytics"), Datadog reads "APM", Ahrefs drops the "Analytics" tag, Imperva drops the "CDN" tag, etc.
- Modern-framework heuristic layer. Wappalyzer's strict rules occasionally miss Next.js / Nuxt / React / Vue / Svelte / Astro / Remix / Gatsby on production builds that strip the obvious markers. I look explicitly for `/_next/`, `__NEXT_DATA__`, `window.__NUXT__`, `data-reactroot`, `data-svelte-h`, `data-astro-`, `/_remix/`, `gatsby-image`, etc. as secondary markers.
- DuckDuckGo bot-UA fallback. Some sites cloak: paywalled or cookie-walled to real browsers, clean SEO version to crawlers. If the Chrome response looks materially different from the DuckDuckGo-UA response, I run detection on the bot version. Recovers a meaningful chunk of otherwise-`Unknown` sites for cheap.
Open an issue if you spot a curation case I've missed. BUCKET_OVERRIDES and SKIP_CMS_FRAMEWORK_PICK are one-line additions.
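To make the bucket-priority flip and the override table concrete, here is a minimal sketch of how such curation could be structured. The names `BUCKET_OVERRIDES` and `SKIP_CMS_FRAMEWORK_PICK` come from the text above. Their shape, the `CATEGORY_TO_BUCKET` mapping, and the helpers `pick_bucket` and `pick_cms` are assumptions for illustration and may not match the repository.

```python
# Marketing-first priority: a tool tagged both ways lands in `marketing`.
BUCKET_PRIORITY = ["marketing", "analytics", "cdn"]

# Hypothetical category-name mapping; the real code reportedly keys on category IDs.
CATEGORY_TO_BUCKET = {
    "Marketing automation": "marketing",
    "Analytics": "analytics",
    "CDN": "cdn",
}

# name -> forced bucket; None keeps the tool out of every curated list.
BUCKET_OVERRIDES = {
    "Amazon S3": None,          # object storage, not a CDN
    "Leadfeeder": "marketing",  # B2B retargeting, not web analytics
    "Datadog": None,            # APM, not web analytics
}

# Generic detectors that must never win the CMS / framework slot.
SKIP_CMS_FRAMEWORK_PICK = {"Cart Functionality"}


def pick_bucket(name, categories):
    """Return the curated bucket for a technology, or None if it belongs in none."""
    if name in BUCKET_OVERRIDES:
        return BUCKET_OVERRIDES[name]
    candidates = {CATEGORY_TO_BUCKET[c] for c in categories if c in CATEGORY_TO_BUCKET}
    for bucket in BUCKET_PRIORITY:
        if bucket in candidates:
            return bucket
    return None


def pick_cms(platform_matches):
    """Pick the first platform alphabetically, ignoring generic detectors."""
    eligible = sorted(n for n in platform_matches if n not in SKIP_CMS_FRAMEWORK_PICK)
    return eligible[0] if eligible else None


print(pick_bucket("Braze", ["Marketing automation", "Analytics"]))  # marketing
print(pick_bucket("Amazon S3", ["CDN"]))                            # None
print(pick_cms(["Shopify", "Cart Functionality"]))                  # Shopify
```

Keeping every exception in one table means a new curation case is a one-line change, which is the auditability point made above.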
Why open source matters here
The detection ruleset, the curation overrides, and the matcher source are all public. When a detection is wrong on a real site, you have actual leverage: open an issue on this Actor (curation overrides are one-line additions) or PR the fingerprint upstream at enthec/webappanalyzer where the fix lands for every tool that consumes the database. Many eyes on one shared ruleset means edge cases get found and patched faster than any single team could ship them. Closed-data tools (BuiltWith, the post-2023 Wappalyzer SaaS) are fine for what they are; the update loop just runs on their schedule, not the community's.
What we don't detect
We don't run a full browser. This Actor mimics a real Chrome browser so it doesn't get blocked the way ordinary scrapers do, but it doesn't run the page's JavaScript. Skipping JavaScript is what keeps it fast, and on ~95% of sites the platform still shows up in the page source so a real browser would add nothing. Every fetch is real-time; nothing is cached.
The 5% that does need a browser:
- Tools that only show up after the page's JavaScript has run. Most of the popular ones (HubSpot, Salesforce, Segment, Mixpanel, Amplitude, FullStory, …) still have static markers so we catch them. The notable misses are the Adobe enterprise stack (Adobe Analytics, Target, DTM, Launch), Drift, Cloudflare Turnstile / Zaraz / Rocket Loader, Microsoft Application Insights, and Amazon CloudWatch RUM; these are 100% JavaScript-only with no static fingerprint to fall back on.
- Sites that build their entire homepage with JavaScript and leave almost nothing in the page source, returning `type: "Unknown"`.
- The underlying framework (Next.js, Nuxt, React) usually still leaves traces in the page source (like `/_next/` paths or `__NEXT_DATA__` blocks) and we explicitly look for those, so even a JavaScript-heavy site typically returns `type: "Framework"` with the correct name.
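The page-source check for framework traces can be sketched as a plain substring scan. The marker strings are the ones named in this README. Which framework each marker maps to, the check order, and the function name `guess_framework` are assumptions of this sketch, not the Actor's actual code.

```python
# Marker -> framework mapping inferred from the marker names (an assumption).
# Next.js is listed before React because Next.js pages can also carry React markers.
FRAMEWORK_MARKERS = {
    "Next.js": ["/_next/", "__NEXT_DATA__"],
    "Nuxt.js": ["window.__NUXT__"],
    "React": ["data-reactroot"],
    "Svelte": ["data-svelte-h"],
    "Astro": ["data-astro-"],
    "Remix": ["/_remix/"],
    "Gatsby": ["gatsby-image"],
}


def guess_framework(html):
    """Return the first framework whose marker appears in the raw HTML, else None."""
    for name, markers in FRAMEWORK_MARKERS.items():
        if any(marker in html for marker in markers):
            return name
    return None


page = '<script id="__NEXT_DATA__" type="application/json">{}</script>'
print(guess_framework(page))                       # Next.js
print(guess_framework("<html><body>Hi</body></html>"))  # None
```

Because the scan only reads the served HTML, it costs nothing beyond the fetch already made, which is why no browser is needed for this layer.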
Want browser-rendered detection? Open an issue on the Issues tab with the URL(s) where you're seeing Unknown. If demand is there I'll add an opt-in browser-rendered mode (the default stays cheap and fast for the 95%).
How accurate is it? Independent testing across ~2,000 sites (Nick Sawinyh, SEOmator) puts CMS detection accuracy at 87–93% across all major tools, and backend-tech detection at ~27%. We sit in the same band, and the freshness of the enthec database is what tips the modern-tech edge cases (Next.js App Router, Astro, recent Shopify themes, post-2024 CDN products) where libraries still using the pre-2023 frozen database quietly miss things.
Integrations
Clay (HTTP Enrichment column). Skip the Explorer tier. Paste the run-sync URL into a Clay HTTP column and append cms / framework / cdn to every row:
GET https://api.apify.com/v2/acts/<username>~apify-cms-detector/run-sync-get-dataset-items?token=<APIFY_TOKEN>&method=POST&start_urls[0][url]={{Domain}}
The ?method=POST query parameter is the standard Apify trick for tools that only support GET webhooks (Clay, certain Zapier paths, browser bookmarks).
n8n. Add an HTTP Request node, method POST, URL:
https://api.apify.com/v2/acts/<username>~apify-cms-detector/run-sync-get-dataset-items?token=<APIFY_TOKEN>
Body (JSON): { "start_urls": [{ "url": "{{$json.domain}}" }] }. Wire a CRM trigger in, filter on {{$json.cms.name === "Shopify"}}, send to Apollo or Outreach.
Make / Zapier / Google Sheets. Same pattern: a single HTTP module pointed at run-sync-get-dataset-items, returning JSON in one round-trip. Combine with Apify Schedules to refresh a Google Sheet of prospects nightly and Slack-alert on CMS changes.
curl (any other workflow).
curl -X POST "https://api.apify.com/v2/acts/<username>~apify-cms-detector/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "start_urls": [{ "url": "shopify.com" }] }'
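The same call can be made from Python with only the standard library. This is a minimal sketch mirroring the curl command above. The function names `build_request` and `detect` and the 300-second timeout are assumptions, and `<username>` is left as the same placeholder used in the URLs above.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2/acts"


def build_request(actor_id, token, domains):
    """Build the POST request for the run-sync-get-dataset-items endpoint."""
    url = f"{API_BASE}/{actor_id}/run-sync-get-dataset-items?token={token}"
    body = json.dumps({"start_urls": [{"url": d} for d in domains]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def detect(actor_id, token, domains, timeout=300):
    """Send the request and return the dataset rows as a list of dicts."""
    request = build_request(actor_id, token, domains)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request does not touch the network; only detect() does.
req = build_request("<username>~apify-cms-detector", "YOUR_TOKEN", ["shopify.com"])
print(req.get_method(), json.loads(req.data))
```

To run it for real, call `detect(...)` with your own Actor ID and token (for example read from the `APIFY_TOKEN` environment variable) and iterate over the returned rows.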
FAQ
Does this work on JavaScript-heavy sites (single-page apps)? Partly. We don't run JavaScript; we read the page source like a search engine does. Any tool that only appears after JavaScript runs in your browser is invisible to us. The good news: most modern sites still leave framework traces in the page source (Next.js leaves /_next/ paths, React leaves data-reactroot markers, etc.) and we explicitly look for those, so you'll usually still see the framework name even when the rest is hidden. About 5% of sites return Unknown. If you need full browser-mode detection, open an issue with the URL.
Why not just run a real browser then? Speed and cost. Running a full browser per site is roughly 10× more expensive per URL and takes seconds instead of milliseconds. It also gets blocked more often by Cloudflare and Akamai bot-detection products, because their detectors specifically look for headless browsers. For the 95% of sites where the platform shows up in the page source, a browser adds nothing. The 5% where it'd genuinely help is exactly what an opt-in browser mode would solve; let me know via Issues if you're in that 5%.
What's the difference vs. BuiltWith / Wappalyzer / WhatCMS? No monthly subscription and no lock-in. BuiltWith Basic caps you at 2 technologies you can filter on; Wappalyzer Pro expires unused API credits after 60 days. This Actor bills per result with no minimum and no expiration. The detection engine itself uses the same upstream fingerprint database as Wappalyzer (the actively-maintained enthec fork, ~7,500 technologies).
How many technologies do you detect? Why not all 7,500? All 7,500 are loaded. Detection rate depends on how well a tech leaks into the HTML. Popular CMSes and ecommerce platforms (WordPress, Shopify, Webflow, Wix, Magento, Drupal) hit near 100%; obscure plugins and JS-only libraries hit much lower. We don't artificially cap.
Do you return version numbers? Confidence scores? Versions: yes, when the site exposes them (e.g. WordPress's <meta name="generator"> tag). Many sites strip these in production; version: null means matched-but-no-version. Confidence: not surfaced today; under consideration.
What about Cloudflare-protected sites? Most of them work. Mimicking a real Chrome browser gets us through Cloudflare's standard bot challenges, and the DuckDuckGo bot fallback recovers a chunk of what's left. Sites with the hardest JavaScript challenges (Cloudflare Turnstile, Akamai Bot Manager) will sometimes return Unknown. Open an issue with the URL if you hit one.
Do you detect headless CMSes (Contentful, Sanity, Strapi)? When they leak. Contentful's CDN domain and Sanity's API endpoints are in the fingerprint database. If the site fully proxies them, we'll see the framework (Next.js, Astro, etc.) but not the headless backend.
Why is cms null for this site? Either (a) the site isn't built on a CMS (check the framework field instead); (b) it's a JavaScript-heavy site where platform clues never appear in the page source; or (c) the site explicitly stripped all generator tags. Cross-check by running the URL through Wappalyzer's browser extension. If Wappalyzer sees it, we should too. Open an issue with the URL.
Why wasn't X detected even though I can see it on the site? Each detection has a confidence score, and a tool is only reported once the total confidence reaches 100. Strong signals (like a unique header or generator tag) count as 100 on their own, so they fire by themselves. Weaker signals are worth 50, which means one weak signal alone isn't enough — you need two. This is the upstream Wappalyzer model and we follow it. It cuts down on noisy false positives, but it does mean obscure tools sometimes hide behind a single weak marker.
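The threshold rule described above is easy to state in code. A minimal sketch, assuming signals contribute either 100 (strong) or 50 (weak) as described; the function name `is_reported` is hypothetical.

```python
REPORT_THRESHOLD = 100  # a technology is reported once total confidence reaches this


def is_reported(signal_confidences):
    """Sum the confidence of every matched signal and compare to the threshold."""
    return sum(signal_confidences) >= REPORT_THRESHOLD


print(is_reported([100]))     # True: one strong signal fires on its own
print(is_reported([50]))      # False: a single weak signal is not enough
print(is_reported([50, 50]))  # True: two weak signals together reach 100
```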
Why does a site that re-platformed still show its old platform? Migrations rarely strip every legacy marker. Old /wp-content/ paths in image references, <meta name="generator"> tags in the homepage source, robots.txt entries, sitemap structures — all linger for months or years. If a detection seems wrong on a site you know switched stacks, check the Wayback Machine to see when the change happened; the lingering markers typically fade over 6–18 months as the site is rebuilt.
Can a site lie to detection? Yes. Tech-stack detection reads public signals (HTML, headers, cookies, script src URLs) and any of those can be added or removed. Julien Verneaut demonstrated that Wappalyzer can be tricked into reporting 1,929 technologies on a single page by stuffing it with matchable scripts and cookies. Practical implication: treat tech-stack data as a signal, not gospel. For high-stakes calls (a million-dollar account, a security audit), cross-reference with another source before committing.
Why does the result sometimes show a strange category label inside breakdown? The upstream Wappalyzer database has some legacy and overlapping categories (you'll occasionally see "Captchas" tagged on unrelated tech). We pass most of those through as-is. For the handful of technologies the upstream consistently mislabels (Amazon S3 tagged as CDN, styled-components tagged as a JS framework, Leadfeeder tagged as analytics, …), we apply a small relabel, so Amazon S3 reads "Object storage", styled-components reads "CSS-in-JS", Leadfeeder reads "B2B retargeting", etc. That keeps breakdown[].categories consistent with the top-level cdn / analytics / marketing / framework arrays.
Do you provide contact data? No. This Actor returns the tech stack. Pipe the URL into the Email Scraper to pull addresses off the same domains, or pair with the Apollo Scraper, Contact Info Scraper, or Clay's email-finder for decision-maker enrichment.
How is this different from running Wappalyzer's open-source rules myself? Three things. (1) Fallbacks to actually get the page (cookie-accept walls, redirects, sites that cloak for browsers and serve the clean version only to search engines), all handled without a full headless browser. (2) An extra detection layer for modern frameworks (Next.js, Nuxt, React, Vue, Svelte, Astro, Remix, Gatsby) that fires when the strict Wappalyzer rules don't. (3) A clean-up pass on the categorisation: Amazon S3 doesn't end up in cdn (it's object storage), Leadfeeder ends up in marketing instead of analytics (it's a B2B retargeting tool), the generic "Cart Functionality" entry doesn't steal the CMS slot from Shopify, and so on.
What if I have a CSV of domains, not a JSON file? Open the Actor in the Apify Console, paste the column directly into the Start URLs or domains box (one per line). No JSON formatting needed.
What format do my URLs need to be in? Anything reasonable. Bare domain (example.com), www.example.com, https://example.com, deep URLs (https://example.com/blog/post); we strip everything to the homepage and lowercase the host. www. is dropped so www.x.com and x.com collapse into one row; other subdomains stay separate rows, so pass each one explicitly (e.g. docs.example.com) if you want it checked.
What if a detection is wrong? Open an issue on the Apify Console Issues tab with the URL and what you expected to see. Both false positives and false negatives are useful: they help us tune the curation rules.
Is scraping public site metadata legal? Detecting the platform from public HTML is generally permitted: you're reading already-published metadata, the same data Wappalyzer's browser extension reads. You remain responsible for following each target site's Terms of Service and applicable law.
Support
Open an issue on this Actor's Issues tab on the Apify Console. Include the URL, the field that's wrong, and what you expected. Detection bugs are usually fixable by adding a fingerprint or a curation override; those land in the next build.