Tech Stack Detector: BuiltWith Wappalyzer Alt
Pricing
$0.08 / 1,000 analyzed URLs
Detects a website's technologies (CMS, server, frameworks, analytics, payments) from its own served HTML, headers, and cookies. Each detection carries the evidence that proved it, plus a confidence. No guessing. Billed only per URL with at least one detection, no start fee.
Developer
Pono Data
Maintained by Community
Website Tech Stack Detector
Give it a list of website URLs and get back the technologies each site runs: CMS, web server, frameworks, JavaScript libraries, analytics, tag managers, payments, CDN, and hosting. Every detection comes with the exact on-page signal that proved it and a confidence score.
The rule is simple: no guessing. A technology is reported only when an explicit signal in the site's own served page proves it. If nothing on the page proves a technology, it is not in the results. You are never handed a guess to act on.
What proves a detection
The actor reads each page's own served HTML, HTTP response headers, and cookies, then fingerprints them against a curated, high-precision ruleset. Each detection records which signal matched:
- response headers (for example `Server: nginx`, `X-Powered-By: PHP/8.2`, `X-Generator: Drupal`, `CF-Ray` for Cloudflare, `X-Shopify-Stage` for Shopify)
- Set-Cookie names (for example `wordpress_logged_in_*`, `_shopify_*`, `laravel_session`, `ASP.NET_SessionId`)
- the `<meta name="generator">` tag (WordPress, Drupal, Joomla, Wix, Squarespace, Ghost, Hugo, and more, with the version when the tag carries one)
- script and link URLs (for example `/wp-content/` for WordPress, `cdn.shopify.com` for Shopify, `/_next/static` for Next.js, `gtag/js` for Google Analytics) and a few unambiguous HTML markers (for example `ng-version` for Angular)
This is a curated high-precision starter set covering the common, high-signal technologies. It is not an exhaustive fingerprint database, and it favors precision over coverage: a rule that would misfire on a real site is tightened or left out. A version is reported only when it was captured verbatim from the matched signal. Versions are never invented.
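The matching step described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `Detection` class, the `RULES` table, the regular expressions, and the base confidence values are hypothetical and are not the actor's real ruleset. It only shows the idea of reporting a technology when a literal signal matches, and capturing a version verbatim or not at all.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    tech: str
    category: str
    evidence: str            # the literal signal that matched
    confidence: float
    source: str              # which artifact carried the signal
    version: Optional[str] = None


# Hypothetical rules: (source, pattern, tech, category, base confidence).
# A named group "v", when present and matched, holds the version verbatim.
RULES = [
    ("header", re.compile(r"^server:\s*nginx(?:/(?P<v>[\d.]+))?", re.I), "nginx", "Web server", 0.8),
    ("cookie", re.compile(r"^laravel_session$"), "Laravel", "Web framework", 0.8),
    ("meta", re.compile(r"^WordPress(?:\s+(?P<v>[\w.\-]+))?", re.I), "WordPress", "CMS", 0.8),
    ("url", re.compile(r"/_next/static"), "Next.js", "Web framework", 0.8),
]


def detect(signals: dict[str, list[str]]) -> list[Detection]:
    """Return one detection per rule that matches a served signal."""
    found = []
    for source, pattern, tech, category, base in RULES:
        for value in signals.get(source, []):
            match = pattern.search(value)
            if match is None:
                continue
            # The version is taken only from the matched text, never inferred.
            version = match.groupdict().get("v")
            confidence = 1.0 if version else base
            found.append(Detection(tech, category, value, confidence, source, version))
            break  # one piece of evidence per rule is enough
    return found


signals = {
    "header": ["Server: nginx/1.25.3"],
    "cookie": [],
    "meta": ["WordPress 6.5"],
    "url": ["https://example.com/_next/static/chunks/main.js"],
}
results = detect(signals)
```

In this sketch a rule with no matching signal simply contributes nothing, which mirrors the "no guessing" rule: the Laravel rule finds no cookie, so Laravel is absent from `results`.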
Input
- URLs: one per line.
- Respect robots.txt: when on (default), the host's robots.txt is checked and disallowed URLs are skipped.
- Delay between requests to one host: a polite throttle (default 1 second) so pages are never fetched back to back.
- Max delivered URLs: cap on billed rows (0 = no cap).
Output
One row per analyzed URL: `url`, `finalUrl`, `httpStatus`, `server`, `techCount`,
`categories`, and a `detections` array, plus row provenance (`sourceUrl`,
`retrievedAt`, `confidence`, `dataSource`). Each entry in `detections` carries
`tech`, `category`, `evidence` (the literal signal), `confidence`, and `source`
(which served artifact carried the signal).
The row confidence is the strongest detection confidence on the page. Each
detection carries its own. Confidence is 1.0 for an explicit version or an
unambiguous vendor string, 0.8 for an unambiguous marker without a version, and
0.6 for a strong heuristic marker that several look-alikes could share.
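As a rough illustration of how the row-level fields relate to the detections, here is a small sketch. The `build_row` helper is hypothetical, and it assumes that `techCount` is the number of detections and that `categories` is the set of distinct detection categories. The source does not spell out either, so treat both as guesses. The row confidence follows the stated rule: the strongest detection confidence on the page.

```python
def build_row(url: str, final_url: str, http_status: int, server: str,
              detections: list[dict]) -> dict:
    """Assemble one output row from a non-empty list of detections."""
    if not detections:
        # Pages with no evidenced detection go to the free reject dataset,
        # so a billed row always has at least one detection.
        raise ValueError("a row needs at least one detection")
    return {
        "url": url,
        "finalUrl": final_url,
        "httpStatus": http_status,
        "server": server,
        "techCount": len(detections),                              # assumption
        "categories": sorted({d["category"] for d in detections}),  # assumption
        "detections": detections,
        # Row confidence is the strongest detection confidence on the page.
        "confidence": max(d["confidence"] for d in detections),
    }


row = build_row(
    "https://example.com", "https://example.com/", 200, "nginx",
    [
        {"tech": "nginx", "category": "Web server", "evidence": "Server: nginx",
         "confidence": 1.0, "source": "header"},
        {"tech": "ExampleLib", "category": "JavaScript library",
         "evidence": "/examplelib.min.js", "confidence": 0.6, "source": "url"},
    ],
)
```

A consumer that wants only high-certainty results can filter on each detection's own `confidence` rather than on the row value, since the row value reflects only the strongest signal.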
How it works, and how it stays polite
The actor fetches each page once, with a declared, identifying User-Agent. It reads robots.txt first and skips anything disallowed for our agent, and it keeps a jittered delay between requests to the same host. It treats a refusal (HTTP 401, 403, 429, or 451) as the host declining: it stops requesting that host for the rest of the run rather than retrying. A URL that is blocked, disallowed by robots.txt, fails to fetch, or yields no evidenced detection is written to a separate free dataset and is not billed. A site owner can ask us to skip their domain at https://ponodata.com/opt-out; opted-out hosts are skipped and never charged.
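The politeness rules above can be sketched with Python's standard library. This is not the actor's code: the `USER_AGENT` string, the `HostGate` class, and the jitter range are hypothetical, chosen only to show a robots.txt check, a jittered per-host delay, and the "stop on refusal" behavior in one place.

```python
import random
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

REFUSAL_STATUSES = {401, 403, 429, 451}    # treated as the host declining
USER_AGENT = "ExampleTechDetector/1.0"     # hypothetical agent name


def allowed_by_robots(robots_txt: str, url: str, agent: str = USER_AGENT) -> bool:
    """Check a URL against already-downloaded robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def jittered_delay(base_seconds: float = 1.0, jitter: float = 0.5) -> float:
    """Base delay plus a random extra of up to jitter * base (assumed range)."""
    return base_seconds + random.uniform(0, jitter * base_seconds)


class HostGate:
    """Remembers hosts that refused a request during this run."""

    def __init__(self) -> None:
        self.declined: set[str] = set()

    def should_fetch(self, url: str) -> bool:
        return urlsplit(url).hostname not in self.declined

    def record_status(self, url: str, status: int) -> None:
        # A refusal ends all further requests to that host; no retries.
        if status in REFUSAL_STATUSES:
            self.declined.add(urlsplit(url).hostname)


robots = "User-agent: *\nDisallow: /private/\n"
gate = HostGate()
gate.record_status("https://example.com/a", 429)
```

After the 429 above, `gate.should_fetch` returns `False` for any other URL on example.com, while URLs on other hosts are unaffected.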
Billing
Pay only per URL that returns at least one evidenced detection. There is no per-run start fee. URLs that are blocked, disallowed, fail, or return no detection cost nothing.
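For budgeting, the arithmetic is simple enough to sketch. The `run_cost` helper below is hypothetical. It assumes the listed price of $0.08 per 1,000 billed URLs and assumes that the "Max delivered URLs" cap limits the number of billed rows, with 0 meaning no cap.

```python
from decimal import Decimal

PRICE_PER_1000_USD = Decimal("0.08")  # listed price per 1,000 billed URLs


def run_cost(urls_with_detections: int, max_delivered: int = 0) -> Decimal:
    """Estimated charge in USD; only URLs with a detection are billed."""
    billed = urls_with_detections
    if max_delivered > 0:
        # Assumed behavior: the cap bounds the number of billed rows.
        billed = min(billed, max_delivered)
    return PRICE_PER_1000_USD * billed / 1000


# 10,000 URLs submitted, 7,500 of them return at least one detection.
estimate = run_cost(7500)
capped_estimate = run_cost(7500, max_delivered=5000)
```

The 2,500 URLs that were blocked, disallowed, failed, or returned no detection do not enter the calculation at all.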
Sample output
A real run reading each site's own served HTML, headers, and cookies:
| URL | Techs | Categories | Server |
|---|---|---|---|
| https://wordpress.org | 5 | CMS, Web server, Font / icons | nginx |
| https://www.shopify.com | 4 | CDN / proxy, Ecommerce, JavaScript framework | cloudflare |
| https://vercel.com | 3 | Hosting / PaaS, Web framework | Vercel |
| https://getbootstrap.com | 5 | CDN / proxy, Hosting / PaaS, UI framework | cloudflare |
For wordpress.org, one detection reads
`{tech: "WordPress 7.1", category: "CMS", evidence: "meta generator: WordPress 7.1-alpha", confidence: 1.0}`. Open the row's sourceUrl and view source to verify every signal. Sites that return no
evidenced technology route to the free reject dataset.
See also
More clean, pay-only-for-results data tools from Pono Data:
- URL Metadata & OpenGraph Extractor - page head tags for link previews
- Bulk DNS Lookup - DNS records plus SPF, DMARC, and CAA
- Domain WHOIS via RDAP - registration data, structured from RDAP
Full catalog: https://apify.com/thoob