# Website Tech Stack Detector (`saregaa/techdetector-scraper`) Actor

Detect technologies on any website using the real Wappalyzer browser extension via Playwright — not HTTP guessing. Identifies CMS, JS frameworks, analytics, CDN, payments, and 1,000+ more. Built for bulk lead qualification, competitive analysis, and tech market research.

- **URL**: https://apify.com/saregaa/techdetector-scraper.md
- **Developed by:** [Saregaa](https://apify.com/saregaa) (community)
- **Categories:** Lead generation, Developer tools, Automation
- **Stats:** 3 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $8.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
The word "Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## 🔍 Website Tech Stack Detector (Wappalyzer Engine)

**High-precision technology stack detection powered by the real Wappalyzer browser extension and Playwright.**  
An open-source alternative to the paywalled BuiltWith and Wappalyzer APIs: pay only for what you scan.

---

### What it does

This Actor launches a real headless Chromium browser with the Wappalyzer extension injected, visits each target URL, and returns a full technology fingerprint. The result matches what Wappalyzer shows in your own browser, but is produced at scale and via API.

It detects **CMS, JavaScript frameworks, analytics tools, CDN providers, marketing pixels, databases, web servers, ecommerce platforms**, and more — with confidence scores, version numbers where available, and optional security risk flags.

---

### ✨ Key features

- **Real browser fingerprinting** — uses the actual Wappalyzer extension via Playwright, not HTTP header guessing
- **Confidence scoring** — every technology tagged as `high`, `medium`, or `low` confidence
- **Version detection** — captures version strings where Wappalyzer exposes them (e.g. `Next.js 15.1.12`)
- **Category grouping** — results grouped by category (Analytics, CMS, CDN, etc.) for easy filtering
- **Security risk flags** — optional checks for outdated jQuery, CMS without CDN/WAF, forms without CAPTCHA, missing security headers, and exposed `X-Powered-By`
- **Bulk processing** — scan hundreds of URLs in a single run
- **Run summary** — final dataset record with tech distribution stats and cost breakdown
- **Graceful error handling** — failed URLs are logged with error details, run never crashes

---

### 📥 Input

| Field | Type | Default | Description |
|---|---|---|---|
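| `urls` | `string[]` | — | List of URLs to scan. Prefix with `https://` or leave bare; normalization is automatic. Note that the Actor input schema declares this field as a single string with one URL per line |
| `include_risk` | `boolean` | `true` | Run security risk checks on each result |
| `proxyConfiguration` | `object` | — | Proxies used by the scanner. Datacenter proxies are recommended for cost-efficiency |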

**Example input:**
```json
{
  "urls": [
    "https://apify.com",
    "react.dev",
    "https://www.gymshark.com"
  ],
  "include_risk": true
}
````
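
The input schema further down declares `urls` as one newline-separated string, while the example above passes a JSON array. The sketch below builds the string form from a Python list, on the assumption that the schema's string form is what the Actor expects. `build_input` is a hypothetical helper, not part of the Actor or of the Apify client.

```python
def build_input(urls, include_risk=True, use_apify_proxy=True):
    """Build an input dict using the newline-separated `urls` form from the input schema."""
    cleaned = []
    for raw in urls:
        url = raw.strip()
        if not url:
            continue  # skip blank entries
        # Bare domains are accepted by the Actor; adding the scheme up front is optional.
        if not url.startswith(("http://", "https://")):
            url = "https://" + url
        if url not in cleaned:
            cleaned.append(url)  # drop exact duplicates, keep input order
    return {
        "urls": "\n".join(cleaned),
        "include_risk": include_risk,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


actor_input = build_input(["https://apify.com", "react.dev", " https://www.gymshark.com "])
print(actor_input["urls"])
```

The returned dict can be passed as `run_input` to the Python client call shown in the API section.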

***

### 🔌 Proxy

**A proxy is required for reliable operation.** The Actor uses a single browser instance and rotates requests through datacenter proxies to avoid blocks and rate limits. Datacenter proxies are sufficient; residential proxies are not needed.

You can use:

- **Apify Proxy** (datacenter): available directly in the Actor's proxy settings
- **Your own proxy**: pass it via the standard Apify proxy configuration

Without a proxy, many sites will block or rate-limit the scanner after just a few requests.

***

### ⏱️ Performance

The Actor runs a **single browser instance** and works through the URL list sequentially.

| Metric | Value |
|---|---|
| Throughput | ~120 URLs / hour |
| 100 URLs | ~50 min |
| 500 URLs | ~4 hours |
| 1,000 URLs | ~8 hours |

For large batches, consider splitting the URL list across multiple Actor runs.
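
As a rough planning aid, the throughput figure above can be turned into a run-time estimate and a batch split. This is only a sketch: the 250-URL batch size is an arbitrary assumption, and `estimate_hours` and `split_into_batches` are hypothetical helper names.

```python
URLS_PER_HOUR = 120  # approximate throughput from the table above


def estimate_hours(url_count, urls_per_hour=URLS_PER_HOUR):
    """Estimate the sequential run time in hours for `url_count` URLs."""
    return url_count / urls_per_hour


def split_into_batches(urls, batch_size=250):
    """Split a URL list into consecutive chunks, one chunk per run."""
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


urls = [f"https://example{i}.com" for i in range(1000)]
batches = split_into_batches(urls, batch_size=250)
print(len(batches), "runs of about", round(estimate_hours(250), 1), "hours each")
```

Each batch can then be submitted as its own run, so a failure in one run does not affect the others.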

***

### 📤 Output

Each scanned URL produces one JSON record in the dataset:

```json
{
  "url": "https://apify.com",
  "url_normalized": "apify.com",
  "scanned_at": "2026-05-26T17:02:05+00:00",
  "fetch_method": "wappalyzer_playwright",
  "status_code": 200,
  "technologies": [
    {
      "name": "Next.js",
      "category": "JavaScript frameworks",
      "confidence": "high",
      "version": "16.2.6",
      "detected_by": ["wappalyzer_browser_extension"]
    }
  ],
  "categories": {
    "JavaScript frameworks": ["React", "Next.js"],
    "Analytics": ["Google Analytics", "Microsoft Clarity"]
  },
  "risk_flags": [],
  "tech_count": 22,
  "error": null
}
```

The last record in every dataset is a `run_summary` with aggregated stats, top technology distribution across all scanned sites, and estimated cost breakdown.

#### Output fields

| Field | Description |
|---|---|
| `url` | Original input URL |
| `url_normalized` | Hostname only, for deduplication |
| `scanned_at` | ISO 8601 timestamp |
| `fetch_method` | Always `wappalyzer_playwright` |
| `status_code` | `200` if technologies detected, `0` on failure |
| `technologies` | Array of detected tech objects with name, category, confidence, version |
| `categories` | Technologies grouped by category |
| `http_headers` | Security-relevant headers (server, CSP, HSTS, etc.) |
| `risk_flags` | Array of security risk objects — see below |
| `tech_count` | Total number of detected technologies |
| `error` | `null` on success, error object on failure |
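
A minimal sketch of post-processing these records in Python, assuming the dataset items have already been downloaded into a list. It relies only on the statement above that the summary is the last record. The `{"type": "run_summary"}` placeholder and the shape of the `error` object are illustrative assumptions, since their real fields are not documented here.

```python
def split_results(items):
    """Separate per-URL records from the trailing run summary record."""
    if not items:
        return [], None
    return items[:-1], items[-1]


def sites_using(records, tech_name):
    """Return input URLs whose `technologies` array contains `tech_name`."""
    return [
        rec["url"]
        for rec in records
        if rec.get("error") is None
        and any(t["name"] == tech_name for t in rec.get("technologies", []))
    ]


items = [
    {"url": "https://apify.com", "technologies": [{"name": "Next.js", "confidence": "high"}], "error": None},
    {"url": "https://bad.example", "technologies": [], "error": {"message": "timeout"}},  # hypothetical error shape
    {"type": "run_summary"},  # placeholder for the summary record
]
records, summary = split_results(items)
print(sites_using(records, "Next.js"))
```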

***

### 🛡️ Security risk flags

When `include_risk` is `true`, the Actor runs lightweight security checks and adds flags to each result:

| Code | Level | Trigger |
|---|---|---|
| `CMS_WITHOUT_CDN` | medium | WordPress/Drupal/Joomla detected without Cloudflare, Fastly, Akamai, or similar |
| `OUTDATED_JQUERY` | medium | jQuery version < 3.x (known XSS vectors) |
| `JQUERY_ON_ECOMMERCE` | low | jQuery detected on a shop with unconfirmed version |
| `FORMS_WITHOUT_CAPTCHA` | low | Gravity Forms / Typeform / Formstack without reCAPTCHA or hCaptcha |
| `EXPOSED_X_POWERED_BY` | low | `X-Powered-By` header reveals PHP/ASP.NET/Express version |
| `MISSING_SECURITY_HEADERS` | low | Two or more of CSP, X-Frame-Options, X-Content-Type-Options, HSTS are absent |
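
To summarize flags across a batch, something like the following could work. It assumes each risk object carries `code` and `level` keys matching the table columns above; the actual key names are not shown in this README, so check a real record first. `count_flags` is a hypothetical helper.

```python
from collections import Counter

LEVEL_RANK = {"low": 0, "medium": 1}  # only the levels listed in the table above


def count_flags(records, min_level="low"):
    """Count risk flag codes at or above `min_level` across all records."""
    threshold = LEVEL_RANK[min_level]
    counts = Counter()
    for rec in records:
        for flag in rec.get("risk_flags") or []:
            # Assumed keys: "code" and "level"
            if LEVEL_RANK.get(flag.get("level"), 0) >= threshold:
                counts[flag["code"]] += 1
    return counts


records = [
    {"url": "https://a.example", "risk_flags": [
        {"code": "CMS_WITHOUT_CDN", "level": "medium"},
        {"code": "EXPOSED_X_POWERED_BY", "level": "low"},
    ]},
    {"url": "https://b.example", "risk_flags": [{"code": "CMS_WITHOUT_CDN", "level": "medium"}]},
    {"url": "https://c.example", "risk_flags": []},
]
print(count_flags(records, min_level="medium"))
```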

***

### 💡 Use cases

- **Lead enrichment & sales prospecting** — identify which CRM, chat, or marketing stack a prospect uses before outreach
- **Competitive analysis** — benchmark your tech choices against competitors or industry peers
- **Market research** — map technology adoption across a list of domains
- **Security audits** — surface missing headers and outdated dependencies at scale
- **Tech stack migrations** — inventory what needs replacing before a platform switch

***

### ⚖️ How it compares

| | This Actor | nexgendata/wappalyzer-replacement | misterkhan/tech-stack-scanner | Wappalyzer API | BuiltWith |
|---|---|---|---|---|---|
| **Detection engine** | Real browser + extension | OSS fingerprint rules (HTTP) | Multi-tier HTTP | Closed-source | Proprietary |
| **Browser rendering** | ✅ Yes | ❌ No | ❌ No | ✅ Yes | ✅ Yes |
| **Confidence scores** | ✅ | ❌ | ✅ | ✅ | ✅ |
| **Version detection** | ✅ | Partial | ✅ | ✅ | ✅ |
| **Security risk flags** | ✅ Built-in | ❌ | ❌ | ❌ | ❌ |
| **Price / 1,000 URLs** | ~$8 | ~$10 | ~$5 | $250/mo cap | $500+/mo |
| **Open source** | ✅ | ✅ | ❌ | ❌ | ❌ |

> The key differentiator: HTTP-only detectors miss technologies that are injected client-side (analytics pixels, chat widgets, A/B testing tools). A real browser with the Wappalyzer extension catches them reliably.

***

### 💰 Pricing

This Actor uses **pay-per-result** pricing on the Apify platform, with no subscription and no minimums.

| Volume | Estimated cost |
|---|---|
| 100 URLs | ~$0.84 |
| 1,000 URLs | ~$8.04 |
| 10,000 URLs | ~$80.04 |

Costs include the Apify platform run fee ($0.035/run) plus $0.008 per scanned URL.
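
The table values follow from those two figures. A small sketch of the arithmetic, using `Decimal` so that half-cent totals round up the same way the table does (`estimate_cost` is a hypothetical helper, and the rates are simply the ones quoted above):

```python
from decimal import Decimal, ROUND_HALF_UP

RUN_FEE = Decimal("0.035")        # per run, as quoted above
PRICE_PER_URL = Decimal("0.008")  # per scanned URL, as quoted above


def estimate_cost(url_count, runs=1):
    """Estimated cost in USD for scanning `url_count` URLs across `runs` runs."""
    total = RUN_FEE * runs + PRICE_PER_URL * url_count
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for n in (100, 1_000, 10_000):
    print(n, estimate_cost(n))
```

Splitting a list across several runs adds one run fee per run, for example `estimate_cost(1000, runs=4)`.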

New to Apify? The free tier includes enough credits to scan ~60 URLs before any payment is required.

***

### 🔧 Tips for best results

- **Always configure a proxy** — datacenter proxies are sufficient, residential are not required
- **Include `https://`** in URLs for fastest processing (normalization adds a small overhead)
- **Large batches (500+ URLs)** are more reliable when split across multiple Actor runs
- **SPAs and JS-heavy sites** (React, Next.js, Angular) are where this Actor shines over HTTP-only alternatives: the real browser executes client-side code before fingerprinting

***

### 📋 Example results

**Detected on `https://apify.com`** (22 technologies):
React, Next.js 16.2.6, Algolia, styled-components, Turbopack, Sentry, Amazon CloudFront, Amazon S3, Google Analytics, Google Tag Manager, Intercom, HubSpot, Segment, Microsoft Clarity, LinkedIn Insight Tag, and more.

**Detected on `https://www.whitehouse.gov`** (13 technologies):
WordPress, MySQL, PHP, Nginx, Yoast SEO, Google Analytics, Google Tag Manager, PWA, Preact, Parse.ly — plus a `CMS_WITHOUT_CDN` risk flag.

**Detected on `https://www.gymshark.com`** (14 technologies):
Shopify, Next.js 15.5.18, React, Algolia, Amazon CloudFront, Intercom, Braze, Datadog, mParticle, LinkedIn Insight Tag.

***

### 📄 License

Actor code is open source. The Wappalyzer fingerprint ruleset is MIT-licensed via the community fork. Output data is yours to use commercially — check the target site's Terms of Service for any scraping restrictions.

# Actor input Schema

## `urls` (type: `string`):

Paste your list of domains here. Place each website URL on a new line.

## `include_risk` (type: `boolean`):

If enabled, performs basic security configuration checks (outdated jQuery, CMS without CDN, form tools missing captchas).

## `proxyConfiguration` (type: `object`):

Select proxies to be used by the scanner. It is highly recommended to use Datacenter proxies for cost-efficiency.

## Actor input object example

```json
{
  "urls": "https://react.dev\nhttps://news.ycombinator.com\nhttps://www.whitehouse.gov\nhttps://www.gymshark.com",
  "include_risk": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

# Actor output Schema

## `results` (type: `string`):

URL string pointing to the default dataset items containing technology stack analysis.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": `https://react.dev
https://news.ycombinator.com
https://www.whitehouse.gov
https://www.gymshark.com`,
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("saregaa/techdetector-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "urls": """https://react.dev
https://news.ycombinator.com
https://www.whitehouse.gov
https://www.gymshark.com""",
    "proxyConfiguration": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("saregaa/techdetector-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
# A quoted heredoc keeps each \n as a JSON escape, regardless of how the shell's echo treats backslashes
cat <<'EOF' |
{
  "urls": "https://react.dev\nhttps://news.ycombinator.com\nhttps://www.whitehouse.gov\nhttps://www.gymshark.com",
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
EOF
apify call saregaa/techdetector-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=saregaa/techdetector-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Website Tech Stack Detector",
        "description": "Detect technologies on any website using the real Wappalyzer browser extension via Playwright — not HTTP guessing. Identifies CMS, JS frameworks, analytics, CDN, payments, and 1,000+ more. Built for bulk lead qualification, competitive analysis, and tech market research.",
        "version": "0.0",
        "x-build-id": "xY5m8TGf86GYHKHdi"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/saregaa~techdetector-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-saregaa-techdetector-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/saregaa~techdetector-scraper/runs": {
            "post": {
                "operationId": "runs-sync-saregaa-techdetector-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/saregaa~techdetector-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-saregaa-techdetector-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "urls"
                ],
                "properties": {
                    "urls": {
                        "title": "Target URLs (One per line)",
                        "type": "string",
                        "description": "Paste your list of domains here. Place each website URL on a new line."
                    },
                    "include_risk": {
                        "title": "Evaluate Security Risk Flags",
                        "type": "boolean",
                        "description": "If enabled, performs basic security configuration checks (outdated jQuery, CMS without CDN, form tools missing captchas).",
                        "default": true
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Select proxies to be used by the scanner. It is highly recommended to use Datacenter proxies for cost-efficiency."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
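
For projects that do not use the client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. The sketch below only builds the request with the Python standard library and does not send it. `build_sync_request` is a hypothetical helper, and the body fields mirror `inputSchema`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/saregaa~techdetector-scraper/run-sync-get-dataset-items"


def build_sync_request(token, urls, include_risk=True):
    """Build a POST request that runs the Actor and returns its dataset items."""
    body = {
        "urls": "\n".join(urls),  # the schema expects one URL per line
        "include_risk": include_risk,
        "proxyConfiguration": {"useApifyProxy": True},
    }
    return Request(
        url=f"{API_BASE}{ACTOR_PATH}?{urlencode({'token': token})}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_sync_request("<YOUR_API_TOKEN>", ["https://react.dev"])
print(req.get_method(), req.full_url.split("?")[0])
# To send it: pass `req` to urllib.request.urlopen() and parse the JSON response body.
```

Given the throughput quoted in the README, a synchronous call is likely practical only for a handful of URLs; longer lists are probably better served by the asynchronous `/runs` path.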
