Pricing

Pay per usage

Tech Stack Detector — Frameworks, CMS, Analytics, JSON Out

Competitor tech stack as CSV/JSON in 2 min — frameworks, CMS, analytics, CDN, servers, trackers. No Wappalyzer seat fee, no BuiltWith cap. 19 runs. Backed by 951-run Trustpilot flagship + 31-actor portfolio. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai

Developer

Alex

Last modified: 3 months ago

Website Tech-Stack Detector — Detect ~60 Web Technologies (CMS, Frameworks, Analytics, Hosting, Payment) from URL via Static HTML+Header Patterns

Scan a list of URLs and return what each site is built with. Pattern-matches HTML body and HTTP response headers against ~60 fingerprints across 10 categories. Static analysis only (no JS rendering) — fast and cheap.

What It Does

Fetches each URL over plain HTTP and parses the static HTML with Cheerio (no headless browser)
Pattern-matches HTML + response headers against the technology dictionary
Returns one flat record per URL with detected technologies, categorized
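
The three steps above can be sketched in a few lines of Python. This is only an illustration of static fingerprinting, not the actor's source: the `TECH_PATTERNS` entries, the `detect` helper, and the specific regexes are assumptions made up for the example. The real dictionary lives in src/main.js.

```python
import re
from typing import Dict, List, Mapping

# Hypothetical mini-dictionary for illustration only; not the actor's real entries.
TECH_PATTERNS = {
    "WordPress": {"category": "cms", "html": [r"/wp-content/", r"/wp-includes/"], "headers": []},
    "Next.js": {"category": "framework", "html": [r"/_next/static/"], "headers": [r"x-powered-by: next\.js"]},
    "Cloudflare": {"category": "hosting", "html": [], "headers": [r"server: cloudflare"]},
}


def detect(html: str, headers: Mapping[str, str]) -> Dict[str, List[str]]:
    """Return detected technology names grouped by category."""
    # Flatten headers into "name: value" lines so header patterns are plain regexes.
    header_blob = "\n".join(f"{k.lower()}: {v}" for k, v in headers.items())
    found: Dict[str, List[str]] = {}
    for name, spec in TECH_PATTERNS.items():
        html_hit = any(re.search(p, html, re.I) for p in spec["html"])
        header_hit = any(re.search(p, header_blob, re.I) for p in spec["headers"])
        if html_hit or header_hit:  # binary match, no confidence score
            found.setdefault(spec["category"], []).append(name)
    return found


result = detect('<link href="/wp-content/themes/x/style.css">', {"Server": "cloudflare"})
print(result)  # {'cms': ['WordPress'], 'hosting': ['Cloudflare']}
```

A match on either the HTML or the header path is enough to report a technology, which is why a single wrong header pattern can produce a false positive on its own.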

Honest Disclosure (read this first)

This actor is intentionally lean — pure static fingerprinting. It does not do the things a Wappalyzer Pro or BuiltWith subscription does. Specifically:

No JS rendering. Cheerio reads HTML as delivered. SPA-heavy sites that hydrate client-side (most React-only apps without SSR) will appear emptier than they are. If you need JS-rendered detection, this actor is not your tool.
No version detection. Output lists ["React", "Next.js"], NOT [{name:"React", version:"18.2"}]. Versions require deeper parsing this actor does not implement.
No confidence scoring. A pattern matches or it doesn't. Treat as binary.
No GA/GTM ID extraction. Output reports presence of "Google Analytics" / "Google Tag Manager", not the property IDs.
No SSL/TLS inspection. For TLS data, use the SSL Certificate Checker actor.
No script-domain enumeration. Output reports scriptsCount (a number), not the list of third-party domains.
Fingerprint dictionary is hardcoded in the actor source (60 entries verified against src/main.js). It updates when the actor is rebuilt — there is no live remote registry.
⚠️ False-positive risk on unanchored regex. Patterns like /react|__react/i, /vue\.js|data-v-/i, /angular/i match anywhere in the HTML, including marketing copy ("our team uses React", "Vue.js Conference 2024 sponsor"). A site that mentions React in a blog post but doesn't actually use it will be flagged. Treat this dictionary as a screening tool, not a forensic verdict.
⚠️ Known dictionary bug: WordPress header check. The WordPress entry (src/main.js line 12) sets headers: ['x-powered-by: Express']. That is the X-Powered-By value sent by Express.js (Node), not a WordPress header, so as written an Express.js site WILL be flagged as WordPress via the header path. The HTML path (/wp-content\//, /wp-includes\//, /wp-json\//) is correct and is how WordPress is reliably detected. Expect this false positive on any response that carries X-Powered-By: Express, whatever the HTML contains. We're not auto-correcting this in production until we audit the impact on the existing 16 runs.
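
To make the unanchored-regex risk concrete, the sketch below runs the /react|__react/i pattern quoted above against a page that only mentions React in copy and against a page that actually ships it. The `ANCHORED` pattern is a hypothetical stricter alternative written for this example; it is not part of the actor.

```python
import re

# Pattern style quoted in the disclosure: matches anywhere in the HTML.
UNANCHORED = re.compile(r"react|__react", re.I)

# Hypothetical stricter alternative: require a script src or a DOM marker.
ANCHORED = re.compile(
    r'<script[^>]+src="[^"]*react(?:-dom)?[.\w-]*\.js"|data-reactroot', re.I
)

marketing_page = "<p>Our team uses React and loves it.</p>"
real_page = (
    '<div id="root" data-reactroot=""></div>'
    '<script src="/static/react-dom.production.min.js"></script>'
)

unanchored_hits = [bool(UNANCHORED.search(p)) for p in (marketing_page, real_page)]
anchored_hits = [bool(ANCHORED.search(p)) for p in (marketing_page, real_page)]

print(unanchored_hits)  # [True, True]  -> marketing copy is a false positive
print(anchored_hits)    # [False, True] -> only the page that loads React matches
```

Anchoring trades recall for precision: the stricter pattern would miss React bundled under an unrelated filename, so neither form is a forensic verdict.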

Detected Technologies (~60 total, 10 categories)

Category	Examples
CMS (8)	WordPress, Shopify, Wix, Squarespace, Webflow, Drupal, Joomla, Ghost
Framework (9)	React, Next.js, Vue.js, Nuxt.js, Angular, Svelte, Gatsby, Remix, Astro
Analytics (10)	Google Analytics, Google Tag Manager, Facebook Pixel, Hotjar, Mixpanel, Amplitude, Segment, Plausible, PostHog, Clarity
Hosting / CDN (7)	Vercel, Netlify, Cloudflare, AWS, Google Cloud, Fastly, Akamai
Payment (4)	Stripe, PayPal, Paddle, LemonSqueezy
Chat / Support (6)	Intercom, Crisp, Drift, Zendesk, HubSpot, Tawk.to
Email (3)	Mailchimp, ConvertKit, SendGrid
E-commerce (3)	WooCommerce, BigCommerce, Magento
Security (3)	reCAPTCHA, hCaptcha, Cloudflare Turnstile
Other (7)	jQuery, Bootstrap, Tailwind CSS, Font Awesome, Google Fonts, Sentry, LaunchDarkly

The full list is in src/main.js → techPatterns. Pull requests welcome to add fingerprints.

Input

{
  "urls": ["https://stripe.com", "https://shopify.com"]
}

Parameter	Type	Required	Description
`urls`	array	Yes	URLs to scan. Strings without protocol get `https://` prepended.

That's the entire input. There is no includeCategories, no renderJs, no followRedirects, no timeoutMs. The actor uses Crawlee defaults (30s request timeout, max concurrency 10, follows HTTP redirects automatically, 3 internal retries on transient errors).
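
If you want to preview or deduplicate the list before starting a run, the prepending rule from the table can be mirrored client-side. This is a sketch under assumptions: `normalize_url` and `build_input` are hypothetical helper names, and the actor's own normalization may differ in edge cases (for example, how it treats whitespace or uppercase schemes).

```python
from typing import Dict, Iterable, List


def normalize_url(raw: str) -> str:
    """Prepend https:// when the string has no http(s) scheme."""
    raw = raw.strip()
    if raw.lower().startswith(("http://", "https://")):
        return raw
    return "https://" + raw


def build_input(raw_urls: Iterable[str]) -> Dict[str, List[str]]:
    """Build the actor input object, skipping blanks and exact duplicates."""
    seen = set()
    urls: List[str] = []
    for raw in raw_urls:
        if not raw.strip():
            continue  # ignore empty lines from a pasted list
        url = normalize_url(raw)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return {"urls": urls}


run_input = build_input(["stripe.com", " https://stripe.com ", "", "http://example.org"])
print(run_input)  # {'urls': ['https://stripe.com', 'http://example.org']}
```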

Output Schema (one record per URL)

Field	Type	Description
`url`	string	The URL that was scanned
`title`	string	`<title>` tag content
`description`	string	`<meta name="description">` (truncated to 200 chars)
`generator`	string	`<meta name="generator">` value if present
`server`	string|null	`Server` HTTP header
`poweredBy`	string|null	`X-Powered-By` HTTP header
`technologiesCount`	int	Count of detected technologies
`technologies`	string[]	Flat list, e.g. `["React","Next.js","Google Analytics","Cloudflare"]`
`categories.cms`	string[]	Detected CMS names
`categories.framework`	string[]	Detected frameworks
`categories.analytics`	string[]	Detected analytics platforms
`categories.hosting`	string[]	Detected hosting/CDN
`categories.payment`	string[]	Detected payment processors
`categories.chat`	string[]	Detected chat widgets
`categories.email`	string[]	Detected email tools
`categories.ecommerce`	string[]	Detected e-commerce platforms
`categories.security`	string[]	Detected captcha/security
`categories.other`	string[]	Anything else (jQuery, Tailwind, fonts, Sentry, LaunchDarkly)
`scriptsCount`	int	Count of `<script src="">` tags only — inline `<script>` blocks NOT counted
`stylesheetsCount`	int	Count of `<link rel="stylesheet">` tags only — inline `<style>` NOT counted
`ogImage`	string	`<meta property="og:image">` content
`scrapedAt`	ISO timestamp	When the URL was scanned

Output Example

{
  "url": "https://stripe.com",
  "title": "Stripe | Financial infrastructure for the internet",
  "description": "Millions of companies use Stripe to ...",
  "generator": "",
  "server": "cloudflare",
  "poweredBy": null,
  "technologiesCount": 7,
  "technologies": ["Next.js","React","Stripe","Google Analytics","Google Tag Manager","Cloudflare","Google Fonts"],
  "categories": {
    "cms": [],
    "framework": ["Next.js","React"],
    "analytics": ["Google Analytics","Google Tag Manager"],
    "hosting": ["Cloudflare"],
    "payment": ["Stripe"],
    "chat": [],
    "email": [],
    "ecommerce": [],
    "security": [],
    "other": ["Google Fonts"]
  },
  "scriptsCount": 14,
  "stylesheetsCount": 3,
  "ogImage": "https://stripe.com/img/v3/home/og-image.png",
  "scrapedAt": "2026-04-29T11:50:00.000Z"
}
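
Because each record carries `poweredBy` and `generator` alongside `categories.cms`, the WordPress header bug described in the disclosure can be screened for after the run. The helper below is a heuristic sketch, not a fix: `wordpress_is_suspect` is a hypothetical name, and it assumes a genuine WordPress site will usually not send X-Powered-By: Express. Sites that strip the generator meta tag and sit behind an Express proxy would still be flagged for review.

```python
from typing import Any, Dict, List


def wordpress_is_suspect(record: Dict[str, Any]) -> bool:
    """Flag WordPress detections that may come from the Express header pattern."""
    cms = record.get("categories", {}).get("cms", [])
    if "WordPress" not in cms:
        return False
    # Both fields can be null or empty in the output, so coerce to strings.
    powered_by = (record.get("poweredBy") or "").lower()
    generator = (record.get("generator") or "").lower()
    return "express" in powered_by and "wordpress" not in generator


# Hypothetical records shaped like the output schema above.
records: List[Dict[str, Any]] = [
    {"url": "https://a.example", "categories": {"cms": ["WordPress"]},
     "poweredBy": "Express", "generator": ""},
    {"url": "https://b.example", "categories": {"cms": ["WordPress"]},
     "poweredBy": None, "generator": "WordPress 6.5"},
]

suspects = [r["url"] for r in records if wordpress_is_suspect(r)]
print(suspects)  # ['https://a.example']
```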

Python Usage

from apify_client import ApifyClient
from collections import Counter

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("knotless_cadence/website-tech-stack-detector").call(
    run_input={"urls": [
        "https://stripe.com",
        "https://shopify.com",
        "https://notion.so",
    ]}
)
records = list(client.dataset(run["defaultDatasetId"]).iterate_items())

# Tally analytics adoption across the list
analytics = Counter()
for r in records:
    for a in r["categories"]["analytics"]:
        analytics[a] += 1

for name, count in analytics.most_common():
    print(f"{count:3d}  {name}")

When This Is The Right Tool

Quick prospect filtering — scan 100-500 B2B prospect URLs to filter by tech (e.g. "Shopify users" or "Segment users"). Static HTML usually carries enough signal for this.
Competitive ecosystem mapping — detect which sites in a sector use a specific integration.
Cheap dataset enrichment — when you need "what tech does this site use" as a column in a CSV.
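
For the prospect-filtering and CSV-enrichment cases, a small post-processing step is usually enough. The sketch below assumes records shaped like the output schema; `uses`, `to_row`, and `records_to_csv` are hypothetical helper names, and joining category values with ";" is just one possible convention.

```python
import csv
import io
from typing import Any, Dict, List

CATEGORY_KEYS = ["cms", "framework", "analytics", "hosting", "payment",
                 "chat", "email", "ecommerce", "security", "other"]


def uses(record: Dict[str, Any], tech: str) -> bool:
    """True when the flat technologies list contains the given name."""
    return tech in record.get("technologies", [])


def to_row(record: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one record into a CSV row, one column per category."""
    cats = record.get("categories", {})
    row = {"url": record.get("url", ""),
           "technologiesCount": record.get("technologiesCount", 0)}
    for key in CATEGORY_KEYS:
        row[key] = ";".join(cats.get(key, []))
    return row


def records_to_csv(records: List[Dict[str, Any]]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["url", "technologiesCount", *CATEGORY_KEYS],
                            lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(to_row(record))
    return buf.getvalue()


# Hypothetical sample shaped like the output example.
sample = [{"url": "https://stripe.com", "technologiesCount": 2,
           "technologies": ["Next.js", "React"],
           "categories": {"framework": ["Next.js", "React"]}}]

react_users = [r["url"] for r in sample if uses(r, "React")]
csv_text = records_to_csv(sample)
print(react_users)
print(csv_text)
```

Given the false-positive caveats in the disclosure, a filtered list like `react_users` is best treated as a shortlist to verify, not a final answer.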

When To Use Something Else

Sites with no SSR (pure SPA shells) — output will be near-empty. Use a JS-rendering actor.
You need versions or confidence scores — use Wappalyzer's API or BuiltWith.
You need GA property IDs / GTM container IDs — those require deeper parsing this actor does not do.
You need TLS / SSL data — use SSL Certificate Checker.
You need forensic-grade detection (no false-positives on copy) — use Wappalyzer or BuiltWith. This actor's regex patterns are unanchored.
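
One way to decide which URLs to hand to a JS-rendering tool is to look for records that load scripts but report almost no technologies, which is how a pure SPA shell tends to appear in static analysis. This is a rough heuristic sketch: `needs_js_rescan` and its thresholds are assumptions chosen for the example, not values taken from the actor.

```python
from typing import Any, Dict, List


def needs_js_rescan(record: Dict[str, Any], max_techs: int = 1, min_scripts: int = 1) -> bool:
    """Heuristic: few detections plus external scripts suggests a client-rendered shell."""
    few_detections = record.get("technologiesCount", 0) <= max_techs
    has_scripts = record.get("scriptsCount", 0) >= min_scripts
    return few_detections and has_scripts


# Hypothetical records shaped like the output schema.
records: List[Dict[str, Any]] = [
    {"url": "https://spa.example", "technologiesCount": 0, "scriptsCount": 3},
    {"url": "https://ssr.example", "technologiesCount": 7, "scriptsCount": 14},
    {"url": "https://static.example", "technologiesCount": 0, "scriptsCount": 0},
]

rescan = [r["url"] for r in records if needs_js_rescan(r)]
print(rescan)  # ['https://spa.example']
```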

Actor	Returns
SSL Certificate Checker	TLS expiry, SAN list, signature algorithm
WHOIS Domain Lookup	Domain registration metadata (RDAP)
IP Geolocation Lookup	Country, ISP, ASN
Website Screenshot Scraper	PNG capture for any URL

Browse all 31 published actors → apify.com/knotless_cadence

Apify-as-a-Service — when you need data, not infrastructure

Tier	Price	What you get
Pilot	$97	1 custom actor, basic config, 7-day support
Standard	$297	Custom actor + Slack/email alerts, 30-day support
Premium	$797	Custom actor + dashboard + 90-day support + 1 modification round

A custom build can deliver: a corrected WordPress header pattern, anchored regex (lower false-positive rate), JS-rendering via Playwright, version detection via specific JS-bundle parsing, GA/GTM ID extraction, third-party domain enumeration.

Email: spinov001@gmail.com — drop your specs, schema, or target URLs.

Proof of work: 31 published actors (78 total in portfolio) — 949+ Trustpilot runs, 80+ Reddit, 25+ HN. Recently delivered a paid 3-article series for a client in the proxy industry ($150).

More tips: t.me/scraping_ai · blog.spinov.online

Honest disclosure: 60 fingerprints across 10 categories. WordPress header pattern is incorrectly set to Express.js's x-powered-by value — HTML detection still works, header-only false-positive risk on Express sites. Regex patterns are unanchored — marketing-copy mentions of frameworks can trigger false positives. Static HTML only, no JS rendering, no version detection, no confidence scoring. Crawlee defaults: 30s timeout, concurrency 10, 3 retries on transients.
