Pricing

from $3.00 / 1,000 results

BuiltIn.com Tech Companies & Tech Stack Scraper

Built In scraper & API: export tech companies and startups by city and industry — company name, description, website, industry, size, funding stage, tech stack and open jobs. Tech-company intelligence and B2B lead generation — fast, no login.

Developer

Haketa

BuiltIn.com Tech Companies & Tech Stack Scraper — Category-Tagged Technology Fingerprints for Every Startup on Built In

The fastest way to extract category-labelled tech stacks (LANGUAGES / FRAMEWORKS / DATABASES / DEVOPS / CLOUD SERVICES / SALESFORCE) from every company profile on builtin.com. Pull the unique Technology We Use fingerprint that LinkedIn, Crunchbase, AngelList and PitchBook do not publish — and turn it into a B2B SaaS prospecting list, recruiter pipeline, or competitive-vendor landscape in minutes. HTTP-only, no browser, no login.

What This Actor Does

The BuiltIn.com Tech Companies & Tech Stack Scraper is a production-grade Apify Actor that turns any public Built In company profile (builtin.com/company/{slug}) into a clean, structured JSON record — including the single most valuable field Built In publishes that no other major company database carries in structured form: the per-company Technology We Use panel, tagged by category (LANGUAGES, FRAMEWORKS, DATABASES, DEVOPS, CLOUD SERVICES, SALESFORCE, ANALYTICS, DESIGN, etc.).

Built In is a tech-focused career platform used by thousands of startups, growth-stage scale-ups, and public tech companies to recruit engineering talent. To attract developers, these companies voluntarily publish a granular breakdown of the languages, frameworks, databases, cloud providers, observability stacks, and design tools they actually run in production — the kind of fingerprint that competitive-intelligence platforms typically gate behind $30,000-a-year contracts.

This actor extracts that fingerprint at HTTP speed (~1–2 seconds per company) so you can build:

B2B SaaS prospect lists segmented by exact technology adoption ("every Built In company using MongoDB but not Snowflake")
Recruiter pipelines filtered by stack ("every startup running PyTorch + AWS + Kubernetes")
Competitive vendor-landscape maps ("Datadog vs New Relic vs Honeycomb adoption across the Built In ecosystem")
Investor portfolio dashboards tracking technical maturity signals (DevOps + CI/CD adoption == scaling)
Conference / sponsor outreach lists targeting users of a specific category-tagged technology

Every record returned includes the company's identity, industry, location, employee band, founded year, multi-office presence, full perks list, social links, recently posted jobs, and the full category-grouped tech stack — ready to drop into Postgres, Snowflake, BigQuery, Salesforce, HubSpot, or Google Sheets.

Entity types returned per company

Identity — name, slug, profileUrl, website, logoUrl, description
Firmographics — industry, industries[], location, offices[], employeeCount, foundedYear
Tech stack — techStack[] (flat list with {name, category, iconUrl}), techStackByCategory (grouped object: { LANGUAGES: [...], FRAMEWORKS: [...], DATABASES: [...] }), techStackSize
Talent signals — jobs[] (recent postings: title, jobId, slug, jobUrl), jobCount, perks[]
Social presence — socialLinks[] (LinkedIn, Twitter/X, Facebook, Instagram, YouTube, TikTok)
Provenance — sourceUrl, scrapedAt

Why scrape Built In yourself when this exists?

Built In's public HTML looks deceptively simple — until you actually try to parse 500 company profiles in a row. Teams who try the DIY route consistently hit the same wall:

The Technology We Use panel is rendered inside repeating <div class="tech-icon-container"> blocks with no JSON-LD or microdata fallback — every tech item is an <img alt="..."> plus two sibling <div> elements, one for the tech name and one for the UPPERCASE category label. Naïve scrapers grab the names and lose the category entirely.
Category labels are positional, not semantic — you cannot rely on a CSS class to know whether MongoDB belongs to DATABASES or to LANGUAGES; you have to walk the sibling structure and decide, for each child, whether it is an UPPERCASE category label or a tech name.
Tabs split the stack by department (Engineering / Data / Design / Marketing) — a single-pass scraper that doesn't traverse all tab containers will under-count tech stack by 40–70%.
Industry, employee count, founded year, and HQ live inside a fact list with inconsistent labels — sometimes Total Employees, sometimes Team Size, sometimes Employees; same for Headquarters vs Location vs HQ.
Multi-office companies (Stripe, Snowflake, MongoDB) list 3–10 office locations inside <address> tags scattered across the page — manual parsers usually capture only the first.
Job links use the /job/{slug}/{numericId} pattern, and the same job link can appear multiple times on the page (hero banner + recent jobs list + footer) — deduplication by jobId is required.
Social links are mixed into a generic outbound-anchor blob alongside affiliate pixels, support links, and Built In's own internal https://builtin.com/... URLs — you need a filter on (facebook|linkedin|twitter|x|youtube|instagram|tiktok).com and an explicit exclusion of the Built In domain.
404 slugs return a styled "company not found" page (which still returns HTTP 200 in some cases) — your parser must validate that an <h1> and a tech-icon-container actually exist before pushing a row.
Header generation matters — Built In aggressively serves a minimalist fallback page to bot-fingerprinted requests (no User-Agent rotation, no Accept-Language, no Sec-CH-UA hints).
Politeness matters more — 50+ requests per minute from a single IP will trigger a soft block within a few minutes.

This actor solves all of that: realistic browser headers via got-scraping, 3-attempt retries with increasing backoff, a polite per-request delay plus a concurrency limiter, full category-grouped tech-stack normalization, dedup by jobId, a social-link filter, and Actor.fail() when zero records come back, so you never silently ship an empty dataset to your downstream warehouse.
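
The social-link filtering described above can be sketched with hostname matching instead of a substring pattern, so that a domain such as netflix.com is never mistaken for x.com. This is a minimal illustration, not the actor's code: the function name `filter_social_links` and the `SOCIAL_DOMAINS` set are assumptions made for the example.

```python
from urllib.parse import urlparse

# Illustrative assumption: the social domains named in the list above.
SOCIAL_DOMAINS = {
    "facebook.com", "linkedin.com", "twitter.com", "x.com",
    "youtube.com", "instagram.com", "tiktok.com",
}


def filter_social_links(urls):
    """Keep social-profile URLs, preserve order, drop duplicates."""
    kept, seen = [], set()
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        # Exact host or subdomain match only; builtin.com links never qualify
        # because their host is not in SOCIAL_DOMAINS.
        is_social = any(host == d or host.endswith("." + d) for d in SOCIAL_DOMAINS)
        if is_social and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept


links = filter_social_links([
    "https://www.linkedin.com/company/mongodb",
    "https://builtin.com/jobs",
    "https://www.netflix.com/",
    "https://twitter.com/MongoDB",
])
print(links)
```

Matching on the parsed hostname also handles relative links, which have no host and are skipped.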

Quick Start

One-Click Run

Click "Try for free" on the Apify Store page
Add a handful of company slugs (e.g. mongodb, datadog, stripe, plaid)
Hit Start — typical run is under 30 seconds for 10 companies
Download the dataset as JSON, CSV, Excel, HTML, XML, or RSS directly from the Apify dataset view

API Run (Python)

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("haketa/builtin-tech-companies-scraper").call(run_input={
    "companySlugs": ["mongodb", "datadog", "stripe", "snowflake-computing-inc", "plaid"],
    "scrapeJobs": True,
    "maxRecords": 100,
    "requestDelay": 1000,
    "maxConcurrency": 3
})

for company in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(company["name"], "→", company["techStackSize"], "tech items")
    if company.get("techStackByCategory"):
        for category, items in company["techStackByCategory"].items():
            print(f"  {category}: {', '.join(items)}")

API Run (Node.js / TypeScript)

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('haketa/builtin-tech-companies-scraper').call({
    startUrls: [
        { url: 'https://builtin.com/company/mongodb' },
        { url: 'https://builtin.com/company/datadog' }
    ],
    scrapeJobs: true,
    maxRecords: 50
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();

// Find every company running Kubernetes
const k8sUsers = items.filter(c =>
    c.techStack?.some(t => t.name === 'Kubernetes')
);
console.log(`${k8sUsers.length} companies on Kubernetes`);

API Run (cURL)

curl -X POST "https://api.apify.com/v2/acts/haketa~builtin-tech-companies-scraper/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "companySlugs": ["mongodb", "datadog", "stripe"],
    "scrapeJobs": true,
    "maxRecords": 100
  }'

How It Works

Built In serves fully server-rendered HTML for every /company/{slug} URL — no SPA, no GraphQL, no client-side hydration required to see the data. That makes this actor HTTP-only via got-scraping + cheerio, which keeps cost and runtime an order of magnitude lower than browser-based equivalents.

Source endpoints

Source URL pattern	Purpose	Example
`https://builtin.com/company/{slug}`	Single company profile page	`https://builtin.com/company/mongodb`
`https://builtin.com/job/{slug}/{numericId}`	Individual job posting (extracted as link only)	`https://builtin.com/job/senior-backend-engineer/123456`

Architecture

Direct HTTPS GET with realistic Chrome 120+ desktop headers (User-Agent, Accept-Language, Sec-CH-UA), generated per request by got-scraping's header generator
No headless browser — no Puppeteer, no Playwright, no Chrome — keeps runtime ~1–2 seconds per company and memory < 256 MB
Cheerio HTML parsing with targeted selectors per data section
3-attempt retry with linearly increasing backoff + jitter (2s × attempt + random 0–1500ms) on HTTP failures or thin-body responses
Polite request pacing — configurable requestDelay (default 1000 ms) per fetch + jitter to avoid burst-pattern detection
Concurrency limiter — async worker pool (maxConcurrency, default 3) processes the slug queue in parallel without overwhelming Built In
Tech stack normalizer — walks .tech-icon-container blocks, extracts each <img alt="..."> as the tech name, finds the sibling UPPERCASE <div> as the category, and emits both a flat techStack[] array and a grouped techStackByCategory object
Job deduplication — multiple appearances of the same /job/{slug}/{id} link on a page collapse to a single entry keyed on jobId
Social-link filter — matches (facebook|instagram|twitter|x|linkedin|youtube|tiktok).com while excluding any builtin.com URL
Office multi-extraction — captures every <address> element within the Offices section so multi-location companies (Stripe, Snowflake, MongoDB) come through fully
Hard fail on empty — Actor.fail() triggers if zero rows are written, so downstream pipelines never silently consume an empty dataset
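
The retry timing listed above (2 s × attempt plus 0 to 1500 ms of random jitter) can be sketched as follows. The actor itself runs on Node.js, so this Python version is only an illustration of the schedule; `retry_delay_ms`, `fetch_with_retries`, and the injectable `fetch`, `rng`, and `sleep` arguments are hypothetical names.

```python
import random
import time


def retry_delay_ms(attempt, rng=random.random):
    """Delay before the next try: 2 s x attempt plus 0-1500 ms of jitter."""
    return 2000 * attempt + int(rng() * 1500)


def fetch_with_retries(fetch, url, max_attempts=3, sleep=time.sleep):
    """Call fetch(url) up to max_attempts times, pausing between failures.

    `fetch` is a hypothetical callable that raises on an HTTP failure or a
    thin response body; it stands in for a real HTTP client.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception as error:  # sketch only: narrow this in real code
            last_error = error
            if attempt < max_attempts:
                sleep(retry_delay_ms(attempt) / 1000)
    raise last_error
```

With three attempts the waits are roughly 2 to 3.5 s after the first failure and 4 to 5.5 s after the second; the third failure is re-raised to the caller.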

Proxy

Proxy is optional and disabled by default. Built In's anti-bot is light enough that direct datacenter IPs work for moderate loads. If you scale to thousands of companies per run or your IP ends up rate-limited, enable Apify Residential US through the standard proxy configuration block.
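
As a rough illustration, a run input that turns the proxy on for a large batch might look like the dictionary below. The keys `useApifyProxy`, `apifyProxyGroups`, and `apifyProxyCountry` belong to Apify's standard proxy input format; that this actor forwards them unchanged is an assumption, as are the specific delay and concurrency values.

```python
# Hypothetical run input for a large, rate-limit-sensitive batch.
# The proxy keys follow Apify's standard proxy input format; whether this
# actor forwards them unchanged is an assumption.
run_input = {
    "companySlugs": ["mongodb", "datadog", "stripe"],
    "requestDelay": 1500,
    "maxConcurrency": 2,
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
        "apifyProxyCountry": "US",
    },
}
```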

Input Parameters

{
  "startUrls": [
    { "url": "https://builtin.com/company/mongodb" },
    { "url": "https://builtin.com/company/datadog" }
  ],
  "companySlugs": ["stripe", "snowflake-computing-inc", "plaid"],
  "scrapeJobs": true,
  "maxRecords": 100,
  "requestDelay": 1000,
  "maxConcurrency": 3,
  "proxyConfiguration": { "useApifyProxy": false }
}

Parameter reference

Parameter	Type	Default	Description
`startUrls`	`array<object\|string>`	`[]`	Paste any Built In company URL (`https://builtin.com/company/{slug}`). The actor parses the slug out of each URL. Can be combined with `companySlugs`; the two lists are merged and de-duplicated.
`companySlugs`	`array<string>`	`["mongodb"]`	Built In company URL slugs. Examples: `mongodb`, `datadog`, `stripe`, `snowflake-computing-inc`, `plaid`. Each runs as a separate task. Slugs are normalized to lower-case and stripped of any leading `/company/` prefix.
`scrapeJobs`	`boolean`	`true`	When `true`, the parser extracts up to 30 unique recent job postings per company (title, jobId, slug, jobUrl). Disable to shave a small amount of parsing time when you only care about firmographics + tech stack.
`maxRecords`	`integer`	`100`	Hard cap on total companies saved. Set `0` for unlimited. Useful for sampling.
`requestDelay`	`integer` (ms)	`1000`	Delay between company-page fetches (plus 0–500 ms jitter). 800–2000 ms is the polite zone.
`maxConcurrency`	`integer`	`3`	Parallel company-page fetches. 2–4 is safe; values > 5 risk triggering soft blocks.
`proxyConfiguration`	`object`	`{ "useApifyProxy": false }`	Optional. Built In has light anti-bot — proxy is generally not required for moderate use. Enable Apify Residential US if you scale to thousands of companies per run.

Tip: If you provide both startUrls and companySlugs, the actor merges and de-duplicates the queue, so you can mix paste-in URLs from a teammate with a programmatic list from your data warehouse without writing dedup logic yourself.
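
The merge-and-de-duplicate behaviour in the tip above can be approximated client-side, for example to preview the queue before starting a run. This is a sketch under assumptions: `normalize_slug` and `build_queue` are hypothetical helpers, and the actor's own normalization may differ in edge cases.

```python
from urllib.parse import urlparse


def normalize_slug(value):
    """Turn a slug, a '/company/{slug}' path, or a full URL into a bare slug."""
    text = value.strip()
    if "://" in text:
        text = urlparse(text).path
    text = text.strip("/")
    if text.lower().startswith("company/"):
        text = text[len("company/"):]
    # Keep only the first path segment so '/company/mongodb/jobs' -> 'mongodb'.
    return text.split("/")[0].lower()


def build_queue(start_urls, company_slugs):
    """Merge both inputs into one ordered, de-duplicated slug list."""
    candidates = []
    for entry in start_urls:
        # startUrls entries may be plain strings or {"url": ...} objects.
        candidates.append(entry["url"] if isinstance(entry, dict) else entry)
    candidates.extend(company_slugs)
    queue = []
    for candidate in candidates:
        slug = normalize_slug(candidate)
        if slug and slug not in queue:
            queue.append(slug)
    return queue
```

For example, `build_queue([{"url": "https://builtin.com/company/mongodb"}], ["Stripe", "mongodb"])` yields `["mongodb", "stripe"]`.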

Output Schema

Every row is one company. Every field is always present, and a value that could not be extracted is null, so you can ingest the dataset into a strict schema (Postgres, BigQuery) without per-record branching.

Identity & firmographics

Field	Type	Description
`name`	`string`	Company name as displayed in the `<h1>` of the profile page (e.g. `MongoDB`, `Datadog`)
`slug`	`string`	URL slug (`mongodb`, `snowflake-computing-inc`)
`profileUrl`	`string`	Canonical Built In URL: `https://builtin.com/company/{slug}`
`website`	`string`	Company's own website (first external non-Built In link on the page)
`description`	`string`	One-paragraph company description from `meta[name="description"]`
`industry`	`string`	Primary industry (e.g. `Cloud · Information Technology · Software`)
`industries`	`array<string>`	All industries when more than one is listed
`location`	`string`	Headquarters string (e.g. `New York, NY`)
`offices`	`array<string>`	All listed office locations (multi-office companies)
`employeeCount`	`string`	Employee band as Built In reports it (e.g. `1,000-5,000`, `5000+`)
`foundedYear`	`string`	Four-digit founding year
`logoUrl`	`string`	URL of the company logo from `meta[property="og:image"]`
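
Because `employeeCount` arrives as a band string rather than a number, downstream code that sorts or filters by size needs to parse it. A minimal sketch, assuming bands look like the two examples in the table (`1,000-5,000` and `5000+`); `parse_employee_band` is a hypothetical helper and other formats may appear on live pages.

```python
import re


def parse_employee_band(band):
    """Return (low, high) for a band string; high is None for open-ended bands."""
    if not band:
        return None
    numbers = [int(n.replace(",", "")) for n in re.findall(r"\d[\d,]*", band)]
    if not numbers:
        return None
    if len(numbers) == 1:
        # '5000+' is open-ended; a bare number is treated as an exact count.
        return (numbers[0], None) if "+" in band else (numbers[0], numbers[0])
    return (numbers[0], numbers[1])
```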

Tech stack (the unique field)

Field	Type	Description
`techStack`	`array<object>`	Flat list of every technology, each item: `{ "name": "MongoDB", "category": "DATABASES", "iconUrl": "https://..." }`
`techStackByCategory`	`object`	Grouped object: `{ "LANGUAGES": ["JavaScript", "Java"], "DATABASES": ["MongoDB"], "FRAMEWORKS": ["Django", "Kubernetes"] }`
`techStackSize`	`integer`	Total count of tech items extracted across all categories
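
The three fields carry the same information in different shapes. The sketch below shows one way the grouped view and the size could be derived from the flat list, which is handy for re-deriving them after filtering. `group_by_category` is a hypothetical helper, not the actor's implementation.

```python
def group_by_category(tech_stack):
    """Build the grouped view from the flat list of {name, category, iconUrl}."""
    grouped = {}
    for item in tech_stack or []:
        # Items without a category label fall back to "OTHER".
        category = item.get("category") or "OTHER"
        names = grouped.setdefault(category, [])
        if item["name"] not in names:
            names.append(item["name"])
    return grouped


stack = [
    {"name": "Java", "category": "LANGUAGES"},
    {"name": "MongoDB", "category": "DATABASES"},
    {"name": "Go", "category": "LANGUAGES"},
]
grouped = group_by_category(stack)
tech_stack_size = sum(len(names) for names in grouped.values())
```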

Talent & culture

Field	Type	Description
`jobs`	`array<object>`	Up to 30 recent job postings, each: `{ "title": "Senior Backend Engineer", "jobId": "123456", "slug": "senior-backend-engineer", "jobUrl": "https://builtin.com/job/..." }`
`jobCount`	`integer`	Total unique jobs extracted (post-dedup by `jobId`)
`perks`	`array<string>`	Up to 50 perks/benefits as Built In lists them (e.g. `Unlimited PTO`, `401(k) matching`, `Remote-friendly`)
`socialLinks`	`array<string>`	LinkedIn, Twitter/X, Facebook, Instagram, YouTube, TikTok URLs
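
De-duplication by `jobId` can be illustrated with a small sketch. It assumes job links follow the `/job/{slug}/{numericId}` pattern and that the caller has already collected `(href, title)` pairs from the page; `dedupe_jobs` and `JOB_URL` are hypothetical names.

```python
import re

# Assumed shape of a Built In job link: /job/{slug}/{numericId}
JOB_URL = re.compile(r"/job/(?P<slug>[^/?#]+)/(?P<job_id>\d+)")


def dedupe_jobs(links, limit=30):
    """links: iterable of (href, title) pairs; returns unique jobs keyed on jobId."""
    jobs = {}
    for href, title in links:
        match = JOB_URL.search(href)
        if not match or match["job_id"] in jobs:
            continue
        jobs[match["job_id"]] = {
            "title": title.strip(),
            "jobId": match["job_id"],
            "slug": match["slug"],
            "jobUrl": "https://builtin.com" + match.group(0),
        }
        if len(jobs) == limit:
            break
    return list(jobs.values())
```

The first occurrence of each `jobId` wins, so a link repeated in the hero banner and the recent-jobs list produces a single entry.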

Provenance

Field	Type	Description
`sourceUrl`	`string`	The exact URL fetched (matches `profileUrl`)
`scrapedAt`	`string`	ISO-8601 timestamp captured at the start of the run

Example: MongoDB record (verified against live page)

{
  "name": "MongoDB",
  "slug": "mongodb",
  "profileUrl": "https://builtin.com/company/mongodb",
  "website": "https://www.mongodb.com",
  "description": "MongoDB is the world's leading modern database platform...",
  "industry": "Big Data · Cloud · Database · Software",
  "industries": ["Big Data", "Cloud", "Database", "Software"],
  "location": "New York, NY",
  "offices": ["New York, NY", "Palo Alto, CA", "Austin, TX", "Dublin, Ireland", "Sydney, Australia"],
  "employeeCount": "1,000-5,000",
  "foundedYear": "2007",
  "logoUrl": "https://cdn.builtin.com/logos/mongodb.png",
  "techStack": [
    { "name": "C++",         "category": "LANGUAGES",   "iconUrl": "https://cdn.builtin.com/tech/cpp.svg" },
    { "name": "Java",        "category": "LANGUAGES",   "iconUrl": "https://cdn.builtin.com/tech/java.svg" },
    { "name": "JavaScript",  "category": "LANGUAGES",   "iconUrl": "https://cdn.builtin.com/tech/javascript.svg" },
    { "name": "Golang",      "category": "LANGUAGES",   "iconUrl": "https://cdn.builtin.com/tech/golang.svg" },
    { "name": "Django",      "category": "FRAMEWORKS",  "iconUrl": "https://cdn.builtin.com/tech/django.svg" },
    { "name": "GraphQL",     "category": "FRAMEWORKS",  "iconUrl": "https://cdn.builtin.com/tech/graphql.svg" },
    { "name": "Kubernetes",  "category": "FRAMEWORKS",  "iconUrl": "https://cdn.builtin.com/tech/kubernetes.svg" },
    { "name": "MongoDB",     "category": "DATABASES",   "iconUrl": "https://cdn.builtin.com/tech/mongodb.svg" }
  ],
  "techStackByCategory": {
    "LANGUAGES":  ["C++", "Java", "JavaScript", "Golang"],
    "FRAMEWORKS": ["Django", "GraphQL", "Kubernetes"],
    "DATABASES":  ["MongoDB"]
  },
  "techStackSize": 8,
  "perks": ["Unlimited PTO", "401(k) matching", "Equity", "Remote-friendly", "Health insurance"],
  "jobs": [
    {
      "title": "Senior Backend Engineer, Atlas",
      "jobId": "234567",
      "slug": "senior-backend-engineer-atlas",
      "jobUrl": "https://builtin.com/job/senior-backend-engineer-atlas/234567"
    },
    {
      "title": "Staff Site Reliability Engineer",
      "jobId": "234568",
      "slug": "staff-site-reliability-engineer",
      "jobUrl": "https://builtin.com/job/staff-site-reliability-engineer/234568"
    }
  ],
  "jobCount": 2,
  "socialLinks": [
    "https://www.linkedin.com/company/mongodb",
    "https://twitter.com/MongoDB",
    "https://www.youtube.com/user/MongoDB"
  ],
  "sourceUrl": "https://builtin.com/company/mongodb",
  "scrapedAt": "2026-05-18T10:00:00.000Z"
}

Example: Slim record (small startup with no tech stack listed)

{
  "name": "ExampleCo",
  "slug": "exampleco",
  "profileUrl": "https://builtin.com/company/exampleco",
  "website": "https://www.example.co",
  "description": "Series A fintech startup focused on consumer credit.",
  "industry": "Fintech",
  "industries": null,
  "location": "Austin, TX",
  "offices": null,
  "employeeCount": "11-50",
  "foundedYear": "2022",
  "logoUrl": "https://cdn.builtin.com/logos/exampleco.png",
  "techStack": null,
  "techStackByCategory": null,
  "techStackSize": null,
  "perks": ["Equity", "Remote-friendly"],
  "jobs": [
    {
      "title": "Founding Engineer",
      "jobId": "99999",
      "slug": "founding-engineer",
      "jobUrl": "https://builtin.com/job/founding-engineer/99999"
    }
  ],
  "jobCount": 1,
  "socialLinks": ["https://www.linkedin.com/company/exampleco"],
  "sourceUrl": "https://builtin.com/company/exampleco",
  "scrapedAt": "2026-05-18T10:00:00.000Z"
}

Tech Stack Category Reference

Built In groups technologies under a closed set of UPPERCASE category labels. Knowing the canonical set helps you write downstream WHERE category IN (...) queries with confidence:

Category	What goes there	Examples (verified on live company pages)
`LANGUAGES`	Programming languages	C++, Java, JavaScript, TypeScript, Python, Go (Golang), Ruby, Scala, Kotlin, Swift, Rust, PHP, C#, R
`FRAMEWORKS`	App / web / orchestration frameworks	React, Vue, Angular, Django, Flask, Rails, Spring, Express, Next.js, Node.js, Kubernetes, GraphQL, gRPC
`DATABASES`	OLTP / OLAP / NoSQL / cache	MongoDB, Postgres, MySQL, Redis, Cassandra, DynamoDB, Elasticsearch, Snowflake, BigQuery, Redshift
`DEVOPS`	CI/CD, observability, IaC	Terraform, Ansible, Jenkins, CircleCI, GitHub Actions, Datadog, New Relic, Splunk, PagerDuty
`CLOUD SERVICES`	Hyperscalers + managed services	AWS, GCP, Azure, Heroku, Vercel, Cloudflare, DigitalOcean
`SALESFORCE`	Salesforce ecosystem	Salesforce Sales Cloud, Service Cloud, Marketing Cloud, Pardot, Apex
`ANALYTICS`	BI / product analytics	Looker, Tableau, Mixpanel, Amplitude, Heap, Segment
`DESIGN`	Design tooling	Figma, Sketch, Adobe XD, InVision
`SEARCH ENGINES`	Dedicated search	Elasticsearch, Algolia, Solr, OpenSearch
`COLLABORATION`	Internal team tools	Slack, Asana, Jira, Linear, Notion, Confluence
`OTHER`	Anything not category-tagged on the page	Catch-all fallback (the scraper emits `category: "OTHER"` rather than dropping the item)

The exact set of categories rendered for a given company depends on what its recruiting team chose to publish. Empty categories are not emitted to techStackByCategory.

Use Cases

B2B SaaS Prospecting & Tech-Stack-Triggered Outbound

The single highest-ROI use of category-tagged tech stack data is technology-trigger sales. Sales teams use this actor to:

Build "every company on technology X" lists — pull all 100+ Built In companies using Snowflake to pitch dbt Cloud, all companies on Kubernetes to pitch managed K8s, all Datadog users to pitch a Datadog alternative
Build "every company NOT on technology Y" lists — every company whose DATABASES category lists Postgres but not Snowflake is a candidate for a Snowflake migration pipeline
Score accounts by tech maturity — companies with Terraform + Kubernetes + Datadog signal a mature DevOps practice and qualify for higher-ACV plans
Auto-segment your CRM — pipe scraped data into Salesforce/HubSpot and tag accounts with custom tech-stack-derived fields (uses_mongodb=true, cloud_provider=aws)
Time outreach to job-posting signals — pair tech-stack triggers with the jobs[] field ("posted a Senior Platform Engineer role and uses Kubernetes" == active buyer)
Replace expensive sources — BuiltWith / HG Insights / TheirStack subscriptions start at $5K/year for the same fingerprint data Built In publishes openly
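
The "on X but not on Y" lists above reduce to set operations over each record's `techStack`. A minimal sketch, assuming the records have already been downloaded from the dataset; `tech_names` and `uses_but_not` are hypothetical helpers.

```python
def tech_names(company):
    """Set of technology names for a record; empty when techStack is null."""
    return {item["name"] for item in (company.get("techStack") or [])}


def uses_but_not(companies, required, excluded):
    """Companies whose stack contains every `required` name and no `excluded` name."""
    required, excluded = set(required), set(excluded)
    return [
        c for c in companies
        if required <= tech_names(c) and not (excluded & tech_names(c))
    ]
```

For instance, `uses_but_not(items, ["Postgres"], ["Snowflake"])` returns the Postgres shops that do not list Snowflake.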

Technical Recruiting & Talent Pipeline Intel

Recruiters and talent acquisition teams use the dataset to:

Find every company using a target stack — "all Built In companies running PyTorch + AWS + Kubernetes" for ML-engineer placement
Source competitor talent — when placing a candidate from Stripe, pull every Built In company on Stripe's stack as a target shortlist
Build candidate-to-company matchmaking — given a candidate's resume tech list, rank Built In companies by stack-overlap percentage
Monitor competitor hiring velocity — jobCount per company per week is a leading indicator of growth / layoffs
Identify niche-stack employers — companies with Elixir, Rust, Clojure, or Haskell in LANGUAGES for hard-to-source talent
Benchmark perks by stack — compare perks[] arrays across companies with comparable tech stacks for offer-package research
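
The stack-overlap ranking mentioned in the list above could be computed as the share of a candidate's skills that appear in a company's stack. This is one possible definition chosen for illustration (the text does not specify a formula), and `stack_overlap` and `rank_companies` are hypothetical names.

```python
def stack_overlap(candidate_skills, company):
    """Share of the candidate's skills found in the company's stack (0.0 to 1.0)."""
    skills = {s.lower() for s in candidate_skills}
    if not skills:
        return 0.0
    stack = {item["name"].lower() for item in (company.get("techStack") or [])}
    return len(skills & stack) / len(skills)


def rank_companies(candidate_skills, companies):
    """Return (name, overlap) pairs, best match first."""
    scored = [(c["name"], stack_overlap(candidate_skills, c)) for c in companies]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Names are compared case-insensitively but otherwise literally, so aliases such as Go and Golang would need a mapping step first.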

Competitive Vendor-Landscape Mapping

Product, marketing, and CI teams use this to map adoption across categories:

Observability landscape — Datadog vs New Relic vs Honeycomb vs Splunk vs Grafana share-of-stack across the Built In universe
Data warehouse landscape — Snowflake vs Databricks vs BigQuery vs Redshift adoption by company size
CI/CD landscape — CircleCI vs GitHub Actions vs Jenkins vs GitLab CI penetration
Frontend framework landscape — React vs Vue vs Angular vs Svelte across new vs established companies
Quarterly trend reports — re-run the actor against a fixed slug list every quarter and diff results to spot adoption swings
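
A share-of-stack comparison like the ones above can be computed by counting, for each technology, the fraction of companies that list it. The sketch below uses only companies with a non-empty `techStack` as the denominator, which is a design choice for this example rather than something the actor prescribes; `adoption_share` is a hypothetical helper.

```python
from collections import Counter


def adoption_share(companies, technologies):
    """Fraction of companies with a non-empty stack that list each technology."""
    with_stack = [c for c in companies if c.get("techStack")]
    counts = Counter()
    for company in with_stack:
        names = {item["name"] for item in company["techStack"]}
        counts.update(names & set(technologies))
    total = len(with_stack)
    return {t: (counts[t] / total if total else 0.0) for t in technologies}
```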

Investor / VC Portfolio Analytics

Venture capital firms and growth-equity investors use tech-stack data as a leading indicator:

Technical-maturity scoring — portfolio companies adopting Terraform (IaC) and observability stacks show a scaling-readiness signal
Stack-overlap analysis — for buy-side deals, compare a target's stack to the firm's existing portfolio to estimate integration cost
Trend-spot emerging tooling — track quarter-over-quarter adoption of new dev tools across 200 portfolio companies
Comp-set construction — build a peer comp set from Built In companies with the same employee band + same primary tech category
Sector-thesis validation — confirm "AI infra spend is up" by counting growth in PyTorch / Triton / vLLM adoption across the dataset

Conference & Event Sponsor Outreach

Conference organizers and B2B events teams use this to target outreach:

KubeCon — pull every company with Kubernetes + Istio in their stack as a sponsor / attendee target list
Snowflake Summit — every Built In company with Snowflake in DATABASES becomes a registrant CRM record
AWS re:Invent / Google Cloud Next — segment by CLOUD SERVICES adoption for hyperscaler-specific events
DevOps Days — pull every company with DEVOPS category items for regional event marketing
Stack-specific meetups — Rust meetup organizers can pull every Built In company with Rust in LANGUAGES

DevRel & Open-Source Community Building

Developer-relations teams use tech stack data to grow communities:

Find new adopters — schedule the actor weekly and diff techStack arrays to detect companies that newly adopted your OSS framework (LangChain, Next.js, Drizzle ORM, etc.)
Build customer-story pipelines — identify companies running your stack at scale as case-study candidates
Power developer-marketing campaigns — generate "X companies are building on [your framework]" landing-page social proof, auto-updated weekly
Find prospective contributors — companies running your project commercially are the most likely source of upstream contributions
Targeted ad creative — Built In employer logos make for high-trust social-proof ad units

Executive Search & Headhunting

Headhunters and executive-search firms use the stack + jobs fingerprint to:

Identify CTO targets by exact tech-stack overlap with the hiring brief
Map VP of Engineering candidates at companies in a comparable employee band running similar infrastructure
Build longlist + shortlist for Director-level technical roles in days, not weeks
Track when an exec is hiring — jobCount spikes correlate with leadership-team expansion windows
Cross-reference with perks[] to identify companies offering equity / remote / unlimited PTO as part of the pitch

M&A Scouting & Acquisition Due Diligence

Corporate development and M&A advisors use stack data for build-vs-buy and integration planning:

Tech-stack compatibility filter — score acquisition targets by stack overlap with the acquirer
Integration cost modelling — incompatibility-heavy targets (Python shop acquiring a Java shop) increase post-deal integration timelines
Talent-retention modelling — engineering teams with niche stacks (Erlang, OCaml) are higher flight risks post-acquisition
Tuck-in candidate sourcing — find small Built In companies in DATABASES with a complementary stack to your portfolio company
Counter-bid intelligence — when a competitor acquires, scrape every comparable Built In company to identify likely next targets

Market Research & Analyst Reports

Industry analysts, equity research, and tech-trade journalists use the dataset to:

Quantify adoption trends — produce "what % of Built In companies run Snowflake in 2026" data points
Year-over-year shift reporting — re-run quarterly and produce category-level migration narratives ("share of companies running Postgres dropped 8% as Aurora adoption rose")
Geographic adoption maps — group techStackByCategory results by location to map regional stack preferences
Employee-band-segmented reports — compare stack composition for 11–50 vs 1,000+ employee companies
Source quotable data for white papers, investor decks, and trade-publication articles

Programmatic Ad Targeting & Custom Audiences

Marketing teams use stack data to build hyper-relevant ad audiences:

LinkedIn Matched Audiences — upload Built In companies + their LinkedIn URLs (from socialLinks[]) as company-targeting audiences
Facebook Custom Audiences — same approach for B2B awareness campaigns
Programmatic display retargeting — segment ad creative by detected tech stack (different ad to AWS shops vs Azure shops)
Account-based-marketing (ABM) — power the top of an ABM funnel with stack-defined target accounts that map exactly to your ICP
Email-warming sequences — pair the company list with tech-stack-personalized subject lines and lead-magnet content

Sample Queries & Recipes

Recipe 1: Single-company deep pull

{
  "companySlugs": ["mongodb"],
  "scrapeJobs": true,
  "maxRecords": 1
}

Goal: get one perfect record for schema validation, dashboard prototyping, or demo screenshots.

Recipe 2: Snowflake-adoption hit list

{
  "companySlugs": [
    "stripe", "plaid", "discord", "instacart",
    "doordash", "robinhood", "coinbase", "airtable",
    "notion", "linear-app", "figma", "vercel"
  ],
  "scrapeJobs": false,
  "maxRecords": 50
}

Then filter downstream:

# items: the dataset records from the run (a list of dicts)
snowflake_users = [
    c for c in items
    # techStackByCategory can be null, so fall back to an empty dict
    if "Snowflake" in (c.get("techStackByCategory") or {}).get("DATABASES", [])
]

Goal: a Snowflake sales rep pulls their territory of warm tech-trigger leads in one run.

Recipe 3: Kubernetes + DevOps maturity scoring

{
  "startUrls": [
    { "url": "https://builtin.com/company/datadog" },
    { "url": "https://builtin.com/company/snyk" },
    { "url": "https://builtin.com/company/honeycomb-io" }
  ],
  "maxRecords": 20
}

Goal: every company on K8s + a DEVOPS category entry gets a +1 maturity score in a DevOps-tooling vendor's lead-scoring model.
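
The scoring rule in this recipe can be written as a one-function sketch. It assumes Kubernetes is matched by exact name and that DevOps tooling carries the `DEVOPS` category label; `devops_maturity_score` is a hypothetical helper for a downstream lead-scoring model.

```python
def devops_maturity_score(company):
    """+1 when the stack lists Kubernetes and has at least one DEVOPS entry."""
    stack = company.get("techStack") or []
    has_k8s = any(item["name"] == "Kubernetes" for item in stack)
    has_devops = any(item.get("category") == "DEVOPS" for item in stack)
    return 1 if has_k8s and has_devops else 0
```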

Recipe 4: ML / AI infrastructure prospects

{
  "companySlugs": [
    "openai", "anthropic", "hugging-face", "scale-ai",
    "weights-biases", "databricks", "pinecone", "weaviate"
  ],
  "scrapeJobs": true,
  "maxRecords": 30
}

Goal: identify ML platform spend signals — PyTorch + AWS + Kubernetes in the stack PLUS active ML-engineer roles in jobs[] = active buyer for vector DBs, GPU clouds, or MLOps tooling.

Recipe 5: Multi-office expansion targets

{
  "companySlugs": ["mongodb", "stripe", "snowflake-computing-inc"],
  "scrapeJobs": false,
  "maxRecords": 50
}

Goal: any company with 3+ entries in offices[] qualifies as a multi-region target for enterprise contract software (HRIS, payroll, global benefits).

Recipe 6: New-adopter tracking (weekly diff)

{
  "companySlugs": [
    "vercel", "linear-app", "supabase",
    "planetscale", "neon-tech", "railway-app"
  ],
  "scrapeJobs": false,
  "maxRecords": 50
}

Schedule weekly. Diff techStack[] arrays week-over-week to find companies that newly adopted your DevRel team's framework.
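
The week-over-week diff can be sketched as a set difference per slug. This assumes both runs' records are available as lists of dicts; `newly_adopted` is a hypothetical helper, and skipping companies that were absent from the previous run is a choice made here to avoid counting a first sighting as an adoption.

```python
def newly_adopted(previous_run, current_run):
    """Map slug -> sorted list of technologies present now but absent last run."""
    def names_by_slug(records):
        return {
            r["slug"]: {item["name"] for item in (r.get("techStack") or [])}
            for r in records
        }

    before, after = names_by_slug(previous_run), names_by_slug(current_run)
    changes = {}
    for slug, names in after.items():
        # Companies missing from the previous run are skipped, not counted as adopters.
        if slug in before:
            added = names - before[slug]
            if added:
                changes[slug] = sorted(added)
    return changes
```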

Recipe 7: High-throughput crawl with politeness

{
  "companySlugs": ["mongodb", "datadog", "stripe", "snowflake-computing-inc",
                    "plaid", "robinhood", "coinbase", "airbnb", "doordash"],
  "scrapeJobs": true,
  "maxRecords": 500,
  "requestDelay": 1500,
  "maxConcurrency": 2
}

Goal: large-batch overnight run, optimized for "never get blocked" rather than for raw speed.

Integration Examples

Google Sheets (via Apify Integration)

Schedule the actor (e.g. weekly Sunday 22:00 UTC) with a fixed slug list
Add the "Export to Google Sheets" integration to the schedule
Receive a fresh Built In company sheet every week, including the JSON-stringified techStackByCategory field for spreadsheet-side filtering

Make.com / Zapier / n8n

Use the Apify connector on any major automation platform. Trigger downstream workflows on:

New companies appearing in your watch list
New tech-stack items added since last run (auto-detect adoption events)
New jobs posted since last run (auto-route to sales / recruiter Slack channels)
Office count changes (geographic-expansion signal)
Employee-count band changes (growth signal)

Postgres / Snowflake / BigQuery

Recommended schema for warehouse ingestion:

CREATE TABLE builtin_companies (
  scraped_at         TIMESTAMP,
  slug               TEXT PRIMARY KEY,
  name               TEXT,
  industry           TEXT,
  location           TEXT,
  employee_count     TEXT,
  founded_year       TEXT,
  tech_stack_size    INTEGER,
  job_count          INTEGER,
  tech_stack         JSONB,        -- raw array of {name, category, iconUrl}
  tech_by_category   JSONB,        -- grouped object
  perks              JSONB,
  offices            JSONB,
  social_links       JSONB,
  source_url         TEXT
);

Use the Apify webhook to POST run results to a small ingest endpoint after every scheduled run.
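
An ingest endpoint has to flatten each record into the column order of the table above. A minimal sketch of that mapping, with JSON columns serialized to strings; `COLUMNS` and `to_row` are hypothetical names, and the database driver call that would consume the tuple is left out.

```python
import json

COLUMNS = [
    "scraped_at", "slug", "name", "industry", "location", "employee_count",
    "founded_year", "tech_stack_size", "job_count", "tech_stack",
    "tech_by_category", "perks", "offices", "social_links", "source_url",
]


def to_row(record):
    """Flatten one dataset record into a tuple ordered like COLUMNS."""
    def as_json(value):
        # JSONB columns take a JSON string; null fields stay NULL.
        return None if value is None else json.dumps(value)

    return (
        record.get("scrapedAt"), record["slug"], record.get("name"),
        record.get("industry"), record.get("location"),
        record.get("employeeCount"), record.get("foundedYear"),
        record.get("techStackSize"), record.get("jobCount"),
        as_json(record.get("techStack")),
        as_json(record.get("techStackByCategory")),
        as_json(record.get("perks")), as_json(record.get("offices")),
        as_json(record.get("socialLinks")), record.get("sourceUrl"),
    )
```

Because `slug` is the primary key, repeated runs would need an upsert rather than a plain insert.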

Power BI / Tableau / Looker

Connect the Apify REST API as a data source. Build dashboards covering:

Top 20 most-adopted technologies across the dataset
Category share-of-stack (LANGUAGES mix, DATABASES mix, CLOUD SERVICES mix)
Adoption growth quarter-over-quarter for a target technology
Geographic heat map of company HQs
Employee-band × stack-category cross-tab

Salesforce / HubSpot CRM Enrichment

Run the actor nightly against a map from CRM account domain to Built In slug, then upsert the results into Account records keyed on slug. Custom-field examples:

uses_mongodb__c (boolean) — derived from techStack containing MongoDB
cloud_provider__c (picklist) — derived from techStackByCategory["CLOUD SERVICES"]
engineering_team_size__c (number) — proxy via employeeCount band
active_engineering_roles__c (number) — from jobCount
tech_stack_fingerprint__c (longtext) — JSON-stringified techStackByCategory
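The derivations above could be sketched as follows. This assumes techStackByCategory maps each category label to an array whose entries are either names or objects with a name property (the README does not specify which), and it picks the first cloud service as the picklist value, which is a simplification. The employeeCount proxy is left out because the band format is not documented here.

```javascript
// Accept either a plain name or a { name } object (entry shape is an assumption).
const nameOf = (entry) => (typeof entry === 'string' ? entry : entry?.name ?? null);

// Derive the example CRM custom fields from one scraped company record.
function toCrmFields(company) {
  const stack = company.techStack ?? []; // null means no stack published
  const byCategory = company.techStackByCategory ?? null;
  // Bracket notation is required because the key contains a space.
  const cloud = byCategory?.['CLOUD SERVICES'] ?? [];
  return {
    uses_mongodb__c: stack.some((t) => t.name === 'MongoDB'),
    cloud_provider__c: cloud.length > 0 ? nameOf(cloud[0]) : null,
    active_engineering_roles__c: company.jobCount ?? 0,
    tech_stack_fingerprint__c: byCategory ? JSON.stringify(byCategory) : null,
  };
}
```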

Webhooks for Real-Time Triggers

Wire Apify run-complete webhooks into your internal automation:

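// In your webhook handler. `newItems` is the array of dataset items fetched
// for the finished run; `notifySalesRep` is your own notification helper.
for (const company of newItems) {
  // techStack is null when a company publishes no stack, hence the optional chaining.
  const usesSnowflake = company.techStack?.some((t) => t.name === 'Snowflake') ?? false;
  if (usesSnowflake && company.jobCount > 0) {
    notifySalesRep('snowflake-team', company);
  }
}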
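Before a handler like this can loop over items, it has to locate the finished run's dataset. The sketch below assumes the webhook uses Apify's default payload template, in which eventType names the event and resource is the run object carrying defaultDatasetId; if you customized the payload, adjust the property paths. The function name is illustrative.

```javascript
// Build the dataset-items URL for a successful run, or return null for anything else.
function datasetItemsUrl(payload) {
  if (payload?.eventType !== 'ACTOR.RUN.SUCCEEDED') return null;
  const datasetId = payload.resource?.defaultDatasetId;
  if (!datasetId) return null;
  const url = new URL(
    `https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items`,
  );
  url.searchParams.set('format', 'json');
  url.searchParams.set('clean', 'true'); // skip empty items and hidden fields
  return url.toString();
}
```

Fetch the returned URL with your Apify token (for example in an Authorization: Bearer header) and pass the parsed array to the loop as newItems.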

Major Markets & Tech Hubs at a Glance

Built In's company directory skews toward US tech hubs plus selected international cities. The actor returns whatever HQ a company self-reports, so you get a global footprint when scraping global companies:

Tech Hub	Built In Presence	Notes
San Francisco / Bay Area	Very high	Headquarters for OpenAI, Stripe, Airbnb, Databricks, Anthropic and most YC-funded growth-stage startups
New York, NY	Very high	Finance + media-tech: MongoDB, Datadog, Peloton, Squarespace, Etsy
Austin, TX	Very high	Crypto + B2B SaaS: Indeed, Bumble, RetailMeNot
Seattle, WA	High	Cloud-adjacent: Amazon-orbit + Smartsheet, Outreach, Highspot
Boston, MA	High	Biotech + enterprise: HubSpot, Wayfair, DraftKings
Los Angeles, CA	High	Entertainment-tech + creator economy
Chicago, IL	High	Built In's birthplace — Sprout Social, Coinbase Chicago, Tock
Denver / Boulder, CO	High	Climate-tech + SaaS
Atlanta, GA	Medium-high	Fintech + supply-chain
Miami, FL	Medium	Crypto + LATAM-facing tech
Toronto, Canada	Medium	Shopify-orbit
London, UK	Medium	European tech HQs
Dublin, Ireland	Medium	European hubs of US companies (MongoDB Dublin, Stripe Dublin)
Berlin, Germany	Medium	European startup ecosystem
Sydney, Australia	Medium	APAC offices
Singapore	Low-medium	APAC regional offices

The dataset is as global as Built In is — coverage depth follows where Built In's editorial team and recruiting customer base focus.

Cost & Performance

Metric	Value
Engine	HTTP-only — `got-scraping` + `cheerio`
Runtime per company	~1–2 seconds (default delay + parsing)
Runtime for 100 companies	~3–5 minutes (with default 1000 ms delay, concurrency 3)
Runtime for 1,000 companies	~30–50 minutes (recommended overnight)
Cost per company	Fractions of a cent (listed from $3.00 per 1,000 results)
Pricing model	Pay-per-event — only pay when you run
Data freshness	Live at run time — exactly what Built In is serving the public right now
Auth required	None
Proxy required	Optional — disabled by default
Concurrency	Default 3; safe range 2–4; ceiling 10
Memory footprint	256 MB ample; 512 MB if scraping 1,000+ companies in one run
Failure mode	`Actor.fail()` on zero records — no silent empty datasets

Compliance, Privacy & Legal Notes

Public data only — every field returned is published openly by Built In at https://builtin.com/company/{slug} and rendered to any unauthenticated visitor
No PII — the dataset contains no personal identifiers beyond what companies voluntarily publish on their own recruiting pages (company name, office addresses, jobs)
No emails or phone numbers of individuals are extracted
No login / no scraping behind paywalls — the actor never authenticates to Built In
Polite by default — a per-request delay, low concurrency, realistic browser headers, and a 3-attempt retry with backoff keep the load on Built In's infrastructure low
robots.txt deference — operators using this actor should review Built In's current robots.txt and the site's Terms of Service; the actor itself does not embed any robots-bypass logic
GDPR / CCPA — compliance with downstream data-protection regulation is the responsibility of the data consumer. Company-level firmographic data is generally outside GDPR's personal-data scope but always confirm with counsel
CAN-SPAM, TCPA — if you use scraped data for outbound marketing, compliance with anti-spam and call-restriction laws is your responsibility

Important: Built In data may not be used for unlawful purposes. Read Built In's Terms of Service and use the data only for the legitimate business, research, recruiting, and journalism purposes for which the company publishes it.

Frequently Asked Questions

How fresh is the data?

Live at run time. Every run fetches the current HTML directly from builtin.com/company/{slug}. There is no caching layer between Built In's web server and the data you receive. If a company updated its tech stack 10 minutes before your run, you will see the new entries.

How many companies can I scrape in one run?

The actor itself imposes no hard cap; the maxRecords parameter is your limit. In practice, 100 companies take ~3–5 minutes and 1,000 take ~30–50 minutes at default politeness settings. For larger crawls, schedule multiple runs or increase maxConcurrency modestly.

Does this require a Built In account or API key?

No. Built In does not require authentication to view company profiles. The actor only needs your Apify token.

Why is the tech stack the key feature?

Because no other major company-data provider publishes the per-company tech stack labelled by category in a structured, openly-scrapable form. LinkedIn, Crunchbase, AngelList, and PitchBook either omit it entirely or hide it behind enterprise contracts costing tens of thousands of dollars per year. Built In publishes it openly because companies use Built In to recruit engineers, and engineers want to know what stack they'd be working on.

Which categories does Built In use?

The common set we have verified across live profiles: LANGUAGES, FRAMEWORKS, DATABASES, DEVOPS, CLOUD SERVICES, SALESFORCE, ANALYTICS, DESIGN, SEARCH ENGINES, COLLABORATION. The actor preserves whatever UPPERCASE label Built In renders. Anything without an explicit category label is grouped under OTHER so you never lose data.
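The grouping rule can be illustrated with a short sketch. It is not the actor's own code: it assumes techStack items look like {name, category} and that the grouped object maps each label to an array of names, which may differ from the actor's real output shape.

```javascript
// Group tech-stack items by their category label, falling back to OTHER
// when an item has no (or an empty) category.
function groupByCategory(techStack) {
  const grouped = {};
  for (const item of techStack ?? []) {
    const category = item.category?.trim() || 'OTHER';
    if (!grouped[category]) grouped[category] = [];
    grouped[category].push(item.name);
  }
  return grouped;
}
```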

How do I find a company's slug?

The slug is the last path segment of the Built In URL. For https://builtin.com/company/snowflake-computing-inc, the slug is snowflake-computing-inc. You can also paste the full URL into startUrls and the actor will parse the slug automatically.
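If you want to normalize inputs yourself before a run, slug extraction could look like the sketch below. It is an illustration rather than the actor's parser: it accepts either a bare slug or a full URL and takes the path segment that follows /company/.

```javascript
// Return the company slug from a bare slug or a Built In company URL, or null.
function slugFromInput(input) {
  const value = String(input).trim();
  if (!/^https?:\/\//i.test(value)) {
    // Not a URL: treat it as a slug, minus any stray slashes.
    return value.replace(/^\/+|\/+$/g, '') || null;
  }
  let url;
  try {
    url = new URL(value);
  } catch {
    return null; // malformed URL
  }
  const segments = url.pathname.split('/').filter(Boolean);
  const index = segments.indexOf('company');
  return index >= 0 && segments[index + 1] ? segments[index + 1] : null;
}
```

Reading the segment after company, rather than simply the last one, keeps the result stable when a URL carries a trailing slash, a query string, or a sub-page path.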

What happens if a slug is wrong / 404?

The fetch returns either no HTML or a thin "not found" page; the actor's body-size check (> 1000 bytes) and selector validation (must find <h1> and tech section) drop empty rows rather than push garbage. If every slug 404s, the actor calls Actor.fail() so your pipeline sees an explicit failure.

Can I run this against startUrls and companySlugs at the same time?

Yes. Both inputs are merged into a single queue with URL-level deduplication. startUrls win when both refer to the same slug.
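The merge behaviour described here can be pictured with the following sketch. It is an assumption-laden illustration, not the actor's implementation: startUrls entries are taken to be either strings or {url} objects, the canonical key ignores a leading www. and trailing slashes, and the source field exists only to make the precedence visible.

```javascript
// Reduce a URL to a comparison key: host (without www.) plus path, lowercased.
function canonicalKey(rawUrl) {
  const url = new URL(rawUrl);
  const host = url.hostname.replace(/^www\./, '');
  const path = url.pathname.replace(/\/+$/, '');
  return `${host}${path}`.toLowerCase();
}

// Merge both inputs into one de-duplicated queue; startUrls are added first,
// so they win whenever a slug refers to the same profile.
function buildQueue({ startUrls = [], companySlugs = [] }) {
  const queue = new Map();
  for (const entry of startUrls) {
    const url = typeof entry === 'string' ? entry : entry.url;
    const key = canonicalKey(url);
    if (!queue.has(key)) queue.set(key, { url, source: 'startUrls' });
  }
  for (const slug of companySlugs) {
    const url = `https://builtin.com/company/${slug}`;
    const key = canonicalKey(url);
    if (!queue.has(key)) queue.set(key, { url, source: 'companySlugs' });
  }
  return [...queue.values()];
}
```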

Does the actor handle multi-office companies?

Yes. The offices[] array captures every <address> element under the Offices section, so global companies like MongoDB, Stripe, and Snowflake come back with all their regional offices intact.

Can I disable job scraping?

Yes — set scrapeJobs: false. The actor skips the /job/... link extraction loop, shaving a small amount of parsing time when you only care about firmographics + tech stack.

Does this work for international companies?

Yes, anywhere Built In has a profile. While Built In's coverage skews to US tech hubs, it lists companies headquartered in Canada, the UK, Ireland, Germany, Australia, Singapore, and beyond. The actor's parsing logic is country-agnostic.

Is a residential proxy required?

No, not by default. Built In's anti-bot protection is light enough that direct datacenter IPs work for moderate loads. Enable Apify Residential US if you crawl thousands of companies in a single run or hit rate-limit responses.

Does this work on the Apify Free Plan?

Yes — full functionality. Small runs (10–50 companies) typically cost cents rather than dollars at the listed pay-per-event rate (from $3.00 per 1,000 results), well within the free monthly allowance.

How is this different from BuiltWith / Wappalyzer / TheirStack?

Those products infer the tech stack: they look at HTTP headers, JS fingerprints, and DNS records to guess what a company runs. This actor extracts what companies explicitly publish about their stack to recruit engineers. It is far more accurate for backend services (databases, frameworks, languages) that don't leak to the public web, but it only covers companies that have a Built In profile.

Can I schedule this to run automatically?

Yes — Apify's built-in Scheduler supports hourly, daily, weekly, or arbitrary-cron schedules. A weekly run is the sweet spot for tracking tech-stack adoption changes; daily for active sales/recruiting watchlists.

What output formats are supported?

JSON, CSV, Excel (XLSX), HTML, XML, RSS, and JSON Lines — directly from the Apify dataset view or via the dataset items API.

How do I detect when a company adopts a new technology?

Schedule the actor weekly on a fixed slug list. After each run, compute the set difference between this run's techStack[] names and last run's. Any new entry is a freshly-adopted technology — these are gold for sales/DevRel outreach.
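The set difference could be computed along these lines. This is a sketch that assumes both runs are arrays of dataset items keyed by a slug property (an assumed key) and that techStack is an array of {name} objects or null; the function name is illustrative.

```javascript
// Compare two runs and list, per company, the technology names that are new.
function newlyAdopted(previousRun, currentRun) {
  const previousBySlug = new Map(previousRun.map((c) => [c.slug, c]));
  const events = [];
  for (const company of currentRun) {
    const before = previousBySlug.get(company.slug);
    if (!before) continue; // new to the watch list: no baseline to diff against
    const known = new Set((before.techStack ?? []).map((t) => t.name));
    const current = new Set((company.techStack ?? []).map((t) => t.name));
    const added = [...current].filter((name) => !known.has(name));
    if (added.length > 0) events.push({ slug: company.slug, added });
  }
  return events;
}
```

When the previous run had techStack set to null, every current entry is reported as added, so treat that case as a newly published stack rather than a set of fresh adoptions.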

Why might `techStack` come back null for some companies?

Smaller companies, recently-claimed profiles, or non-engineering companies often choose not to publish a tech stack. The actor returns null rather than an empty array so you can distinguish "no data published" from "we tried and got zero items."

How do I report a bug or request a feature?

Use the Issues tab on the Apify Store actor page, or contact the developer directly through the Apify Console.

If you're building a tech-talent / B2B SaaS intel stack, these complementary actors plug in cleanly:

H1B Visa Database Scraper — every US visa-friendly employer (perfect cross-reference: which Built In companies actively H1B-sponsor?)
YCombinator Companies Scraper — 5,900+ funded startups (overlay YC × Built In to find well-funded tech-stack-disclosed targets)
Levels.fyi Scraper — compensation data at top tech companies (pair tech stack + comp for recruiter pitches)
Salary.com Scraper — US salary benchmarks across roles and metros
SEEK Scraper (Australia / NZ) — APAC tech-job listings for cross-region pipelines
BBB Business Scraper — US business directory for firmographic enrichment
SAM.gov Federal Contractor Scraper — government contractors for public-sector tech-sales pipelines
ProductHunt Launches & Makers Scraper — daily startup launches, makers, votes & reviews — VC/founder/recruiter intel
Texas Pharmacy License Scraper — TSBP — example of a high-volume regulated-data extraction
Arizona ROC Contractor License Scraper — same pattern for contractor licensing
California DCA Professional License Scraper — California professional licensing data
Ohio eLicense Scraper — Ohio-wide professional license database

Comparison vs. Alternatives

Approach	Setup time	Tech-stack granularity	Category labels	Updated	Cost (1,000 companies)
This actor	< 5 minutes	Per-company exact stack	Yes — `LANGUAGES`, `DATABASES`, etc.	Live at run time	From $3.00 (pay-per-event)
Manual copy-paste from Built In	Hours/days	High	Yes (manual)	Stale by hour 1	Free + analyst hours
Custom Cheerio scraper (DIY)	8–20 hours dev	Same	Same (if you debug it)	Live	Free + ongoing maintenance
BuiltWith / HG Insights	Days (sales cycle)	Inferred (fingerprint-based)	Some	Daily-ish	$5K–30K/year
TheirStack	Days	Inferred	Some	Daily-ish	$3K–15K/year
LinkedIn Sales Navigator	Hours	Partial — no stack	None	Live	$1K+/year/seat
Crunchbase Enterprise	Days	Minimal	None	Daily	$30K+/year

Why Pay-Per-Event Pricing?

Most data scrapers either charge a flat monthly subscription (you pay even on weeks you don't run) or per-Compute-Unit (unpredictable on long jobs). This actor uses pay-per-event, which means:

You only pay when the actor actually runs
Charges scale with how many companies you actually consume
Transparent, line-item billing inside Apify
No monthly minimums and no annual commitments
Free to evaluate — sample with maxRecords: 5 for pennies before any larger crawl
Predictable per-record cost makes ROI modelling easy
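For rough budgeting, per-record cost is simple arithmetic. The sketch below assumes the Store listing's advertised rate of "from $3.00 per 1,000 results" as a default and ignores any other event charges that may apply; treat the figure as a floor, not a quote.

```javascript
// Estimate the result-based charge in USD for a planned run.
// pricePerThousandUsd defaults to the listing's "from" rate (an assumption).
function estimateCostUsd(records, pricePerThousandUsd = 3.0) {
  const raw = (records * pricePerThousandUsd) / 1000;
  return Math.round(raw * 10000) / 10000; // round to a hundredth of a cent
}
```

At that rate a maxRecords: 5 sample comes to about one and a half cents, and a 1,000-company crawl to about three dollars.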

Changelog

Version	Date	Notes
1.0.0	2026-05	Initial public release — HTTP-only `got-scraping` + `cheerio`, full category-tagged tech stack, recent jobs, perks, offices, multi-social-link extraction, configurable politeness, optional proxy, `Actor.fail()` on zero records

Keywords

BuiltIn scraper · BuiltIn tech stack scraper · builtin.com company scraper · builtin.com tech stack · tech stack database · company tech stack lookup · category-tagged tech stack · developer recruiting database · technical recruiter intel · B2B SaaS prospecting · tech vendor landscape · competitive intel scraper · tech company directory scraper · company technology fingerprint · BuiltIn company profile scraper · tech stack fingerprint · technology adoption signals · tech-trigger sales leads · LANGUAGES FRAMEWORKS DATABASES scraper · MongoDB user list · Datadog user list · Snowflake adopter list · Kubernetes user database · AWS GCP Azure adopter lookup · React Django Rails framework adoption · ML engineer recruiting database · DevOps tooling vendor landscape · CTO target sourcing · M&A tech-stack compatibility · YC startup tech stack · investor portfolio technical maturity · KubeCon sponsor list · Snowflake Summit prospect list · DevRel adoption tracking · OSS framework new-adopter detection · ABM target accounts · technology-triggered outbound · tech company directory API · BuiltIn Apify actor · Built In tech company scraper · BuiltIn job listings scraper · BuiltIn perks scraper · BuiltIn office locations · startup technology fingerprint · scale-up tech stack database · BuiltIn data extraction · BuiltIn HTML scraper · BuiltIn cheerio scraper · BuiltIn HTTP scraper

Support

Bug reports: Use the Issues tab on the Apify Store page
Feature requests: Same place — please describe your use case so we can prioritize correctly
Direct contact: Through the Apify developer profile

If this actor saves your sales / recruiting / DevRel team hours of manual research, a 5-star rating on the Apify Store helps other tech-go-to-market teams discover it. Thank you!

📅 Changelog & Maintenance

Last updated: 2026-07-02 — Actor verified and maintained. Data pipeline tested for quality, structure and freshness; selectors/endpoints confirmed against the live site.
