Company Tech Stack Scraper — BuiltWith Alternative


A BuiltWith & Wappalyzer alternative: detect any company's real tech stack (Postgres, Django, AWS, Kubernetes, Snowflake…) from Greenhouse, Lever & Ashby job posts. 110+ frameworks, databases and cloud tools, export to JSON or CSV. B2B sales & recruiter intel — pay-per-result, no SaaS bill.

Developer

DevilScrapes

Maintained by Community

ATS Tech Stack Detector

Company Tech Stack Scraper — a BuiltWith & Wappalyzer Alternative

▶️ Full tutorial on YouTube

▶️ 45-second demo on YouTube

$5.05 / 1 000 rows  ·  pay only for results  ·  no credit card to try

We do the dirty work so your dataset stays clean. 😈

A subscription-free BuiltWith / Wappalyzer alternative for the back-end stack. Point this company tech stack scraper at any company's Greenhouse, Lever, or Ashby job board and get one structured row per active job, including a deduplicated list of canonical tech names (Postgres, Django, AWS, Kubernetes, Snowflake…) detected directly from the job description.

BuiltWith, Wappalyzer, and TheirStack all charge a subscription. BuiltWith and Wappalyzer sniff front-end signals (JavaScript libraries, tracking pixels, CDN headers) and are blind to the back-end: the databases, queues, cloud, and infra that engineering teams declare in their own job posts. TheirStack covers that layer only partially. This scraper reads exactly that, the place where companies tell you what they really run, and bills per result instead of per seat.

Sales reps qualifying accounts, recruiters sourcing by skill, and competitive-intelligence analysts mapping vendor footprint all need this signal. Use it as a standalone technographics source, or alongside a front-end tool for full-stack coverage.

⚖️ BuiltWith & Wappalyzer alternative — how it compares

A quick, honest comparison. These tools don't fully overlap — they read different halves of the stack — but if you're paying a subscription for back-end signal, this is the cheaper, more accurate source.

| | This Actor | BuiltWith | Wappalyzer | TheirStack |
|---|---|---|---|---|
| What it reads | Back-end stack from job descriptions | Front-end JS / pixels / headers | Front-end JS / pixels / headers | Job-post & web signals |
| Sees Postgres, Kafka, Snowflake, K8s? | ✅ Yes | ❌ No | ❌ No | ⚠️ Partial |
| Sees HubSpot, Segment, Cloudflare? | ❌ No (use a front-end tool) | ✅ Yes | ✅ Yes | ⚠️ Partial |
| Pricing | Pay-per-result (~$0.30 / 50-job run) | Subscription | Subscription | Subscription |
| Login / account required | ❌ No | Account | Account | Account |
| Export | JSON / CSV (Apify dataset) | CSV (paid tiers) | CSV (paid tiers) | CSV / API |
| Best for | Back-end technographics, hiring-signal intel | Marketing-stack detection | Marketing-stack detection | Broad firmographics |

TL;DR: if you want to know whether a company runs Django + Postgres + AWS (and you don't want a SaaS seat to find out), this is your tool. If you want their marketing stack, pair it with a front-end sniffer.

🎯 What this scrapes

Three ATS platforms, one unified schema:

  1. Greenhouse: boards-api.greenhouse.io/v1/boards/{token}/jobs?content=true
  2. Lever: api.lever.co/v0/postings/{token}
  3. Ashby: api.ashbyhq.com/posting-api/job-board/{token} (token is case-sensitive: Ramp, not ramp)
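
As a rough illustration, the three endpoint patterns above could be assembled from an atsType and token like this. The names `ENDPOINTS` and `board_url` are hypothetical, and the `https://` scheme is an assumption; only the host and path patterns come from the list.

```python
from urllib.parse import quote

# URL templates copied from the list above; the https:// scheme is assumed.
ENDPOINTS = {
    "greenhouse": "https://boards-api.greenhouse.io/v1/boards/{token}/jobs?content=true",
    "lever": "https://api.lever.co/v0/postings/{token}",
    "ashby": "https://api.ashbyhq.com/posting-api/job-board/{token}",
}


def board_url(ats_type: str, token: str) -> str:
    """Return the public job-board endpoint for one company (hypothetical helper)."""
    if ats_type not in ENDPOINTS:
        raise ValueError(f"atsType must be one of {sorted(ENDPOINTS)}, got {ats_type!r}")
    # The token is inserted without lowercasing because Ashby tokens are case-sensitive.
    return ENDPOINTS[ats_type].format(token=quote(token, safe=""))
```

Passing the token through untouched is the point of the sketch: normalising case would silently break Ashby boards.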

For each active job posting the Actor pulls the title, location, department, full description, and public URL — then runs a case-insensitive word-boundary regex against a curated vocabulary of ~110 canonical tech names. The result is a single detected_techs column you can pivot or filter without any per-company string normalisation.
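
The detection step described above can be sketched in a few lines. This is a minimal illustration, not the Actor's code: the `VOCAB` entries and alias spellings are assumptions standing in for the real ~110-name vocabulary, and `detect_techs` is a hypothetical name.

```python
import re

# Tiny stand-in vocabulary: canonical name -> accepted spellings.
# The Actor's real list is not reproduced here; these entries are assumptions.
VOCAB = {
    "Postgres": ["postgres", "postgresql"],
    "Django": ["django"],
    "AWS": ["aws", "amazon web services"],
    "Python": ["python"],
}


def _compile(aliases):
    # (?<!\w) and (?!\w) behave like word boundaries but also work for names
    # that end in punctuation (for example "C++"), where \b would not match.
    body = "|".join(re.escape(a) for a in sorted(aliases, key=len, reverse=True))
    return re.compile(rf"(?<!\w)(?:{body})(?!\w)", re.IGNORECASE)


PATTERNS = {name: _compile(aliases) for name, aliases in VOCAB.items()}


def detect_techs(text: str) -> list:
    """Return the sorted, deduplicated canonical names found in text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))
```

Mapping several spellings to one canonical name is what lets "PostgreSQL" and "postgres" land in the same `detected_techs` value, so downstream filters need no extra normalisation.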

🔥 Features

  • Three ATSs, one schema — Greenhouse, Lever, and Ashby normalised to an identical row shape.
  • Curated tech vocabulary — ~110 canonical names spanning languages (Python, Go, Rust), frameworks (Django, React, FastAPI), databases (Postgres, MongoDB, Redis, Snowflake), cloud (AWS, GCP, Azure, Vercel), infra (Kubernetes, Terraform, Docker), CI/CD, observability, and ML/data tools.
  • Per-company fault isolation — one bad token (404 from Lever, typo on Ashby) does not abort the run; the other companies still produce data.
  • Pydantic v2 validation — every input and every output row is model-validated before landing in the dataset.
  • Filter knobs: a maxJobsPerCompany cap and a minTechsDetected floor let you drop generic non-engineering postings.
  • Exponential backoff — 408 / 429 / 503 retries with Retry-After honoured; up to 5 attempts per request.
  • Deterministic detection — regex + vocabulary, not LLM. High precision, zero hallucinated tools.
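
The retry behaviour in the list above (408 / 429 / 503, Retry-After honoured, up to 5 attempts) might look roughly like the sketch below. The base delay, cap, and jitter are assumptions, the function names are hypothetical, and only the delta-seconds form of Retry-After is handled.

```python
import random
import time
from typing import Optional

RETRYABLE = {408, 429, 503}  # status codes named in the feature list
MAX_ATTEMPTS = 5


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)  # a server-provided delay wins
    return min(cap, base * 2 ** (attempt - 1))


def fetch_with_retry(send, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    `send` is any zero-argument callable returning (status, headers, body).
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, headers, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt == MAX_ATTEMPTS:
            break
        delay = retry_delay(attempt, headers.get("Retry-After"))
        sleep(delay + random.uniform(0, 0.1 * delay))  # small jitter (assumed)
    raise RuntimeError(f"gave up after {MAX_ATTEMPTS} attempts (last status {status})")
```

Taking `send` and `sleep` as parameters keeps the sketch independent of any particular HTTP client and makes it testable without network access.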

💡 Use cases

  • B2B sales qualification — enrich every Salesforce or HubSpot account with the company's live back-end stack: "they hire Django + Postgres + AWS" tells you whether to pitch your Postgres-tuning SaaS.
  • Recruiter sourcing — pull every senior backend role from your target accounts and filter by detected_techs to find teams that match your candidate's stack.
  • Competitive intelligence — track which competitors are hiring for Kubernetes or Snowflake and infer their roadmap before it's announced.
  • CRM enrichment — replace TheirStack / BuiltWith / Wappalyzer subscriptions for the segment of buyers who publish their stack in job descriptions anyway.
  • Investment research — map private-company tech footprint over time without a Crunchbase Pro seat.
  • Open-source stack maps — feed a public page that ranks the most-hired-for technologies among YC-backed startups.

⚙️ How to use it

  1. Open the Actor input form on the Apify Store page.
  2. Add one or more {companyToken, atsType} pairs under Companies to scrape (the form prefills with airtable on Greenhouse and Ramp on Ashby).
  3. (Optional) Set Max jobs per company — leave empty for no cap.
  4. (Optional) Set Minimum detected techs: a value of 2 or 3 drops generic non-engineering postings.
  5. Configure Apify Proxy — leave the default residential proxy group on; we rotate sessions automatically if the ATS endpoint rate-limits.
  6. Click Start. Rows stream into the default dataset as they are produced.

Quick examples

Two engineering-heavy companies, default settings:

{
  "companies": [
    { "companyToken": "airtable", "atsType": "greenhouse" },
    { "companyToken": "Ramp", "atsType": "ashby" }
  ]
}

Single Lever company, drop sales/marketing roles:

{
  "companies": [
    { "companyToken": "palantir", "atsType": "lever" }
  ],
  "minTechsDetected": 3
}

Three companies across all three ATSs, capped at 50 jobs each:

{
  "companies": [
    { "companyToken": "airtable", "atsType": "greenhouse" },
    { "companyToken": "palantir", "atsType": "lever" },
    { "companyToken": "Ramp", "atsType": "ashby" }
  ],
  "maxJobsPerCompany": 50,
  "minTechsDetected": 2
}

📥 Input

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| companies | array | Yes | | List of {companyToken, atsType} objects. atsType must be greenhouse, lever, or ashby. |
| maxJobsPerCompany | integer | No | unlimited | Hard cap on jobs fetched per company. |
| minTechsDetected | integer | No | 0 | Skip rows where detected_techs has fewer items than this value. |
| proxyConfiguration | object | No | Apify Proxy | Proxy settings. Residential group recommended for rate-limited targets. |
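
To make the two filter parameters concrete, here is one way they could combine for a single company. It is a sketch under an assumption: the cap is applied to jobs as fetched and the tech floor afterwards, which is how the table's wording reads. `apply_filters` is a hypothetical name.

```python
from typing import Iterable, Iterator, Optional


def apply_filters(rows: Iterable[dict],
                  max_jobs_per_company: Optional[int] = None,
                  min_techs_detected: int = 0) -> Iterator[dict]:
    """Yield one company's rows after applying both knobs."""
    for index, row in enumerate(rows):
        if max_jobs_per_company is not None and index >= max_jobs_per_company:
            break  # the cap counts jobs fetched, before the tech floor (assumed order)
        if len(row["detected_techs"]) >= min_techs_detected:
            yield row
```

Under this ordering a run with maxJobsPerCompany set to 50 and minTechsDetected set to 2 can emit fewer than 50 rows per company, because the floor trims the already capped list.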

How to find a company's ATS token:

  • Greenhouse: the path segment in boards.greenhouse.io/{token} or {token}.applytojob.com. Examples: airtable, figma, stripe.
  • Lever: the path segment in jobs.lever.co/{token}. Examples: palantir, netflix, eventbrite.
  • Ashby: the path segment in jobs.ashbyhq.com/{Token}. Case-sensitive — Ramp, PostHog, Linear work; lowercase variants return zero jobs.
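
If you already have a list of careers-page URLs, the token rules above can be applied mechanically. The helper below is a hypothetical sketch that covers the three main hosts only (it does not handle the {token}.applytojob.com form) and preserves the token's case.

```python
from urllib.parse import urlparse

# Public job-board hosts from the list above, mapped to atsType values.
HOSTS = {
    "boards.greenhouse.io": "greenhouse",
    "jobs.lever.co": "lever",
    "jobs.ashbyhq.com": "ashby",
}


def company_from_url(url: str) -> dict:
    """Turn a public jobs-page URL into a {companyToken, atsType} pair."""
    parsed = urlparse(url if "://" in url else "https://" + url)
    ats_type = HOSTS.get((parsed.hostname or "").lower())
    segments = [s for s in parsed.path.split("/") if s]
    if ats_type is None or not segments:
        raise ValueError(f"not a recognised Greenhouse, Lever, or Ashby board URL: {url!r}")
    # The first path segment is the token; its case is kept as-is for Ashby.
    return {"companyToken": segments[0], "atsType": ats_type}
```

The returned dictionaries have the same shape as the entries in the `companies` input array.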

📤 Output

One row per active job posting. Rows are pushed to the default Apify dataset as they are produced — long runs surface results immediately.

| Field | Type | Description |
|---|---|---|
| ats | string | greenhouse, lever, or ashby |
| company_token | string | Board slug passed to the ATS |
| job_id | string | ATS-canonical job identifier |
| title | string | Job posting title |
| location | string or null | Location string (Remote, NYC, Berlin…) |
| department | string or null | Department or team |
| url | string | Public job-post URL |
| description_text | string | Plain-text description (HTML stripped) |
| detected_techs | string[] | Sorted, deduplicated canonical tech names |
| posted_at | string or null | ISO 8601 UTC publication timestamp |
| scraped_at | string | ISO 8601 UTC row-creation timestamp |

Sample output row:

{
  "ats": "greenhouse",
  "company_token": "airtable",
  "job_id": "4812345",
  "title": "Senior Backend Engineer",
  "location": "Remote",
  "department": "Engineering",
  "url": "https://boards.greenhouse.io/airtable/jobs/4812345",
  "description_text": "We use Python, Django, PostgreSQL, and AWS...",
  "detected_techs": ["AWS", "Django", "Postgres", "Python"],
  "posted_at": "2026-05-01T09:00:00Z",
  "scraped_at": "2026-06-01T12:00:00Z"
}

🔌 Integrations

The dataset is plain JSON/CSV, so it drops into the usual destinations with zero glue code:

  • Google Sheets / Airtable — export the dataset and import, or push rows via the Apify integration.
  • Make / Zapier / n8n — trigger on run finish, then fan rows into your CRM or warehouse.
  • CRM enrichment (Salesforce / HubSpot) — match on company_token and write detected_techs to a custom field for tech-stack-based lead scoring.
  • Warehouses (Postgres / BigQuery / Snowflake) — one flat row per job; detected_techs stores cleanly as a JSON array or unnested table.
  • API / MCP — pull results programmatically via the Apify API, or wire the Actor into an AI agent over MCP.
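
For spreadsheet or warehouse loads, the "unnested table" shape mentioned above can be produced from exported rows with the standard library alone. The function names and the three-column layout are assumptions chosen for illustration.

```python
import csv
import io


def explode_techs(rows):
    """Yield one (company_token, job_id, tech) tuple per detected tech."""
    for row in rows:
        for tech in row["detected_techs"]:
            yield row["company_token"], row["job_id"], tech


def to_csv(rows) -> str:
    """Render the exploded rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["company_token", "job_id", "tech"])
    writer.writerows(explode_techs(rows))
    return buffer.getvalue()
```

Rows with an empty `detected_techs` list contribute no lines, so the resulting table holds only job and tech pairs that were actually detected.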

🍳 Sample queries & recipes

  • "Find companies hiring for Snowflake" — run a batch of target accounts, then filter rows where detected_techs contains Snowflake. Instant account list for a data-platform pitch.
  • Build a technographic lead list — feed your TAM's ATS tokens, set minTechsDetected: 3, and aggregate detected_techs per company_token into one stack profile per account.
  • Weekly competitor watch — schedule a run over competitor tokens and diff detected_techs week over week to catch new infra (a Kubernetes or Kafka appearing signals a roadmap shift).
  • Recruiter shortlist by stack — pull senior backend roles, filter to companies whose stack matches your candidate's, and pitch the warm fit.
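
The aggregation and week-over-week diff in these recipes could be done on exported rows as sketched below. All three function names are hypothetical; the only fields assumed are `company_token` and `detected_techs` from the output schema.

```python
from collections import defaultdict


def stack_profiles(rows):
    """Map each company_token to the set of techs seen across its job rows."""
    profiles = defaultdict(set)
    for row in rows:
        profiles[row["company_token"]].update(row["detected_techs"])
    return dict(profiles)


def companies_using(profiles, tech):
    """Return the sorted company tokens whose profile contains tech."""
    return sorted(c for c, techs in profiles.items() if tech in techs)


def diff_profiles(previous, current):
    """Return {company: {"added": [...], "removed": [...]}} for changed companies."""
    changes = {}
    for company in previous.keys() | current.keys():
        before = previous.get(company, set())
        after = current.get(company, set())
        if before != after:
            changes[company] = {"added": sorted(after - before),
                                "removed": sorted(before - after)}
    return changes
```

A tech showing up under "removed" may only mean the posting that mentioned it closed, so treat removals as weaker evidence than additions.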

💰 Pricing

| Event | Price | When it fires |
|---|---|---|
| actor-start | $0.05 | Once per run (covers warm-up) |
| result-row | $0.005 | Per job row emitted |

A typical 50-job run costs $0.05 + 50 × $0.005 = $0.30 on pay-per-result pricing — no monthly subscription, and the back-end stack is read straight from each job description.

You pay only for rows that land. If every token is misspelled and no jobs are returned, you pay only the $0.05 start fee.
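
The pricing arithmetic is simple enough to check before launching a large batch. The sketch below uses the two event prices from the table; `run_cost` is a hypothetical helper, and `Decimal` is used so fractions of a cent are not distorted by floating point.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.05")   # charged once per run
RESULT_ROW = Decimal("0.005")   # charged per job row emitted


def run_cost(rows_emitted: int) -> Decimal:
    """Estimated charge in USD for one run that emits rows_emitted rows."""
    if rows_emitted < 0:
        raise ValueError("rows_emitted must be non-negative")
    return ACTOR_START + RESULT_ROW * rows_emitted
```

With these prices, 50 rows come to $0.30 and 1 000 rows to $5.05, matching the figures quoted on this page.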

🚧 Limitations

  • Tech detection is regex-based, not LLM-based — it is deterministic and high-precision, but will miss tools not in the curated ~110-name vocabulary. Submit a feedback request to add more terms.
  • ~85% recall — job descriptions that omit or abbreviate tool names will not surface every dependency in a company's stack. This is an inherent limit of job-description parsing, not an Actor bug.
  • Ashby tokens are case-sensitive: Ramp works; ramp returns zero jobs. Double-check the exact token from the company's Ashby board URL.
  • Greenhouse double-encoding — Greenhouse wraps its content field in double HTML-encoding (&lt;div&gt; for <div>). The parser unescapes twice before stripping tags; unusual encoding in edge-case boards may still slip through.
  • Lever descriptionPlain gaps — Lever sometimes omits the "Requirements" bullet list from descriptionPlain. The parser concatenates every lists[].content chunk to recover skills listed there; however, lists in non-standard formats may be missed.
  • Default dataset retention — Apify FREE-tier default storage retains datasets for 7 days. Use Actor.open_dataset(name="…") or export immediately if you need to outlive that window.
  • No personal-data extraction — this Actor reads public job posts and company-aggregated tech signals only. It does not extract candidate data or applicant details from any ATS.
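
The Greenhouse and Lever handling described in this list can be approximated as follows. This is a simplified sketch, not the Actor's parser: it strips tags with a regex rather than a full HTML parser, the function names are hypothetical, and the Lever field names are the ones mentioned above (descriptionPlain and lists[].content).

```python
import html
import re

TAG = re.compile(r"<[^>]+>")
WHITESPACE = re.compile(r"\s+")


def _strip(markup: str) -> str:
    """Replace tags with spaces and collapse runs of whitespace."""
    return WHITESPACE.sub(" ", TAG.sub(" ", markup)).strip()


def greenhouse_text(content: str) -> str:
    """Unescape twice (content is double-encoded), then strip tags."""
    return _strip(html.unescape(html.unescape(content)))


def lever_text(posting: dict) -> str:
    """Join descriptionPlain with every lists[].content chunk, then strip tags."""
    parts = [posting.get("descriptionPlain", "")]
    parts += [item.get("content", "") for item in posting.get("lists", [])]
    return _strip(html.unescape(" ".join(parts)))
```

A regex tag stripper is enough to feed a vocabulary match, but it can mangle text that contains a literal "<" character, which is one way edge-case boards slip through.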

❓ FAQ

Q: What makes this different from BuiltWith or Wappalyzer? A: BuiltWith and Wappalyzer sniff front-end signals — JavaScript libraries, tracking pixels, and Cloudflare headers. They are excellent for marketing-stack detection (HubSpot, Marketo, Segment) but blind to back-end infrastructure (Postgres, Kafka, Kubernetes, Snowflake). This ATS tech stack detector reads job descriptions, which is where engineering teams declare their actual data platform and server-side stack. They complement each other; they don't overlap.

Q: Is scraping these ATS job boards allowed? A: Greenhouse, Lever, and Ashby all publish their job-board APIs as official public endpoints — the same ones embedded in company career pages. We read public job post text and return company-level tech signals. No personal data, no applicant details, no authenticated endpoints.

Q: How do I find the token for a company on Greenhouse / Lever / Ashby? A: Visit the company's public jobs page. The token is the path segment: jobs.lever.co/{token}, boards.greenhouse.io/{token}, or jobs.ashbyhq.com/{Token}. Ashby tokens are case-sensitive — copy them exactly as they appear in the URL.

Q: Can I scrape hundreds of companies in one run? A: Yes. Add as many {companyToken, atsType} pairs as you need. One bad token (404 or empty board) does not abort the run — the others still produce data. Use maxJobsPerCompany to cap the volume and keep costs predictable.
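
A minimal sketch of the per-company isolation described in this answer, assuming a hypothetical `fetch_jobs` callable that raises on a bad token. It also raises when every company fails, in line with the fail-loud behaviour covered in this FAQ; the exact error types are assumptions.

```python
def scrape_all(companies, fetch_jobs):
    """Run fetch_jobs per company; collect rows and failures separately."""
    rows, errors = [], []
    for company in companies:
        try:
            rows.extend(fetch_jobs(company))
        except Exception as exc:  # isolate the failure to this one company
            errors.append((company["atsType"], company["companyToken"],
                           f"{type(exc).__name__}: {exc}"))
    if companies and len(errors) == len(companies):
        # Every company failed: raise instead of returning an empty dataset.
        raise RuntimeError(f"all {len(companies)} companies failed: {errors}")
    return rows, errors
```

Returning the error list alongside the rows lets a caller report which tokens need fixing while still keeping the data from the boards that worked.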

Q: What happens when a company rate-limits my requests? A: We handle it. The Actor retries with exponential backoff, rotates proxy sessions on 429s and 503s, and honours Retry-After headers. If a company's board hits a hard block mid-run, the run's status message (set via set_status_message) reports a partial count; we never silently return an empty dataset.

Q: Why does the Actor fail loud instead of returning an empty dataset? A: A silent empty result is a lie. If every token is wrong, the run exits non-zero with a clear error message — so your pipeline knows to investigate rather than assuming there were just no open roles.

🗣 Your feedback

If a tech name you care about is missing from the vocabulary, open a feedback ticket on the Actor's Apify Store page — we add new terms in batches.

Found a bug or have a feature request? Use the Store feedback form or contact DevilScrapes at https://apify.com/DevilScrapes.


Related Actors:

  • LLM Pricing Monitor — track live API pricing across OpenAI, Anthropic, Google, Mistral, Groq, Together AI, and DeepSeek in one normalised schema.
  • GitHub Org Scraper — pair tech-stack signals from job descriptions with actual code-repo activity per org.
  • Y Combinator Companies Scraper — combine YC funding signal with ATS-derived tech stack for a complete startup profile.