Tech Stack Detector: BuiltWith Wappalyzer Alt

Pricing

$0.08 / 1,000 URLs analyzed

Detects a website's technologies (CMS, server, frameworks, analytics, payments) from its own served HTML, headers, and cookies. Each detection carries the evidence that proved it, plus a confidence. No guessing. Billed only per URL with at least one detection, no start fee.

Rating

0.0 (0)

Developer

Pono Data

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 3
  • Monthly active users: 2
  • Last modified: 6 days ago

Website Tech Stack Detector

Give it a list of website URLs and get back the technologies each site runs: CMS, web server, frameworks, JavaScript libraries, analytics, tag managers, payments, CDN, and hosting. Every detection comes with the exact on-page signal that proved it and a confidence score.

The rule is simple: no guessing. A technology is reported only when an explicit signal in the site's own served page proves it. If nothing on the page proves a technology, it is not in the results. You are never handed a guess to act on.

What proves a detection

The actor reads each page's own served HTML, HTTP response headers, and cookies, then fingerprints them against a curated, high-precision ruleset. Each detection records which signal matched:

  • response headers (for example Server: nginx, X-Powered-By: PHP/8.2, X-Generator: Drupal, CF-Ray for Cloudflare, X-Shopify-Stage for Shopify)
  • Set-Cookie names (for example wordpress_logged_in_*, _shopify_*, laravel_session, ASP.NET_SessionId)
  • the <meta name="generator"> tag (WordPress, Drupal, Joomla, Wix, Squarespace, Ghost, Hugo, and more, with the version when the tag carries one)
  • script and link URLs (for example /wp-content/ for WordPress, cdn.shopify.com for Shopify, /_next/static for Next.js, gtag/js for Google Analytics) and a few unambiguous HTML markers (for example ng-version for Angular)

This is a curated high-precision starter set covering the common, high-signal technologies. It is not an exhaustive fingerprint database, and it favors precision over coverage: a rule that would misfire on a real site is tightened or left out. A version is reported only when it was captured verbatim from the matched signal. Versions are never invented.
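The matching described above can be sketched in a few lines of Python. This is a minimal illustration, not the actor's ruleset: the `Detection` class, the `detect` function, the three rules, the category labels, and the `source` values are all hypothetical. It shows one header rule, one cookie rule, and one generator-tag rule, with the version copied verbatim from the tag when present.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    tech: str
    category: str
    evidence: str                  # the literal signal that matched
    source: str                    # which served artifact carried it
    version: Optional[str] = None  # only set when read verbatim from the signal


# Hypothetical pattern for <meta name="generator" content="...">.
GENERATOR_RE = re.compile(
    r'''<meta\s+name=["']generator["']\s+content=["']([^"']+)["']''',
    re.IGNORECASE,
)


def detect(headers: dict, cookie_names: list, html: str) -> list:
    """Return only detections backed by an explicit signal; never guess."""
    found = []
    hdrs = {k.lower(): v for k, v in headers.items()}

    # Header rule: Server: nginx
    server = hdrs.get("server", "")
    if server.lower().startswith("nginx"):
        found.append(Detection("nginx", "Web server", f"Server: {server}", "header"))

    # Cookie rule: a laravel_session cookie name
    for name in cookie_names:
        if name == "laravel_session":
            found.append(
                Detection("Laravel", "Web framework", f"Set-Cookie: {name}", "cookie")
            )

    # Generator-tag rule: version is taken verbatim, or left unset
    match = GENERATOR_RE.search(html)
    if match and match.group(1).startswith("WordPress"):
        content = match.group(1)
        version = re.match(r"WordPress\s+(\S+)", content)
        found.append(
            Detection(
                "WordPress",
                "CMS",
                f"meta generator: {content}",
                "meta",
                version.group(1) if version else None,
            )
        )

    return found
```

A page with no matching signal returns an empty list, which mirrors the rule that unproven technologies never appear in the results.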

Input

  • URLs: one per line.
  • Respect robots.txt: when on (default), the host's robots.txt is checked and disallowed URLs are skipped.
  • Delay between requests to one host: a polite throttle (default 1 second) so pages are never fetched back to back.
  • Max delivered URLs: cap on billed rows (0 = no cap).

Output

One row per analyzed URL: url, finalUrl, httpStatus, server, techCount, categories, and a detections array. Each entry in detections carries tech, category, evidence (the literal signal), confidence, and source (which served artifact carried the signal). Each row also carries provenance fields: sourceUrl, retrievedAt, confidence, and dataSource.

The row confidence is the strongest detection confidence on the page. Each detection carries its own. Confidence is 1.0 for an explicit version or an unambiguous vendor string, 0.8 for an unambiguous marker without a version, and 0.6 for a strong heuristic marker that several look-alikes could share.
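As a rough sketch of the confidence tiers above, assuming each detection is a dict with a `confidence` key: the signal-kind labels in `CONFIDENCE_BY_SIGNAL` and the `row_confidence` helper are hypothetical names used for illustration only.

```python
# Hypothetical labels for the three tiers described above.
CONFIDENCE_BY_SIGNAL = {
    "explicit_version": 1.0,    # the signal carries a version
    "vendor_string": 1.0,       # an unambiguous vendor string
    "unambiguous_marker": 0.8,  # unambiguous, but no version
    "heuristic_marker": 0.6,    # strong, but look-alikes could share it
}


def row_confidence(detections):
    """Row confidence is the strongest detection confidence on the page."""
    return max((d["confidence"] for d in detections), default=None)
```

For example, a row whose detections score 0.6 and 0.8 gets a row confidence of 0.8.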

How it works, and how it stays polite

The actor fetches each page once, with a declared identifying User-Agent. It reads robots.txt first and skips anything disallowed for its agent, keeps a jittered delay between requests to the same host, and treats a refusal (HTTP 401, 403, 429, or 451) as the host declining: it stops requesting that host for the rest of the run rather than retrying. A URL that is blocked, disallowed by robots.txt, fails to fetch, or yields no evidenced detection is written to a separate free dataset and is not billed. A site owner can ask us to skip their domain at https://ponodata.com/opt-out (opted-out hosts are skipped and never charged).
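The politeness rules above could look roughly like this in Python, using only the standard library. This is a sketch under assumptions: the `USER_AGENT` string, the `HostGate` class, and the jitter range are hypothetical, and the actor's own implementation is not shown in this listing.

```python
import random
from urllib.robotparser import RobotFileParser

REFUSAL_STATUSES = {401, 403, 429, 451}  # treated as the host declining
USER_AGENT = "ExampleTechDetector/1.0"   # hypothetical identifying agent


def allowed_by_robots(robots_txt: str, url: str, agent: str = USER_AGENT) -> bool:
    """Check a URL against an already-downloaded robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def jittered_delay(base_seconds: float = 1.0, jitter: float = 0.5) -> float:
    """Base delay plus a random extra; the 50% jitter range is an assumption."""
    return base_seconds + random.uniform(0, jitter * base_seconds)


class HostGate:
    """Remembers hosts that refused, so they are not requested again this run."""

    def __init__(self):
        self.declined = set()

    def record(self, host: str, status: int) -> None:
        if status in REFUSAL_STATUSES:
            self.declined.add(host)

    def may_request(self, host: str) -> bool:
        return host not in self.declined
```

A caller would check `may_request(host)` and `allowed_by_robots(...)` before each fetch, sleep for `jittered_delay()` between requests to the same host, and call `record(host, status)` after each response.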

Billing

Pay only per URL that returns at least one evidenced detection. There is no per-run start fee. URLs that are blocked, disallowed, fail, or return no detection cost nothing.
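To estimate what a run costs, a small calculation helps. The sketch below assumes the listed price of $0.08 per 1,000 billed URLs and the "Max delivered URLs" cap described under Input; the `estimate_cost` function and its parameter names are hypothetical.

```python
from decimal import Decimal

PRICE_PER_1000 = Decimal("0.08")  # listed price per 1,000 billed URLs


def estimate_cost(detected_urls: int, max_delivered: int = 0) -> Decimal:
    """Cost in USD; only URLs with at least one detection are billed.

    max_delivered caps the billed rows, with 0 meaning no cap.
    """
    billed = detected_urls if max_delivered == 0 else min(detected_urls, max_delivered)
    return PRICE_PER_1000 * billed / 1000
```

Under these assumptions, 2,500 URLs with detections come to $0.20, and the same run capped at 1,000 delivered URLs comes to $0.08. URLs that were blocked, disallowed, failed, or returned no detection do not enter the count at all.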

Sample output

A real run reading each site's own served HTML, headers, and cookies:

URL | Techs | Categories | Server
https://wordpress.org | 5 | CMS, Web server, Font / icons | nginx
https://www.shopify.com | 4 | CDN / proxy, Ecommerce, JavaScript framework | cloudflare
https://vercel.com | 3 | Hosting / PaaS, Web framework | Vercel
https://getbootstrap.com | 5 | CDN / proxy, Hosting / PaaS, UI framework | cloudflare

For wordpress.org, one detection reads:

{tech: "WordPress 7.1", category: "CMS", evidence: "meta generator: WordPress 7.1-alpha", confidence: 1.0}

Open the sourceUrl and view source to verify every signal. Sites that return no evidenced technology route to the free reject dataset.

See also

More clean, pay-only-for-results data tools from Pono Data are listed in the full catalog: https://apify.com/thoob