BuiltIn.com Tech Companies & Tech Stack Scraper

Pricing

from $3.00 / 1,000 results

Scrape Built In company profiles — name, industry, location, recent jobs, full tech stack with category labels (LANGUAGES / FRAMEWORKS / DATABASES / etc.). Unique technology-fingerprint data for B2B SaaS prospecting, recruiter intel and competitive analysis. HTTP-only.

Developer

Haketa

Maintained by Community

BuiltIn.com Tech Companies & Tech Stack Scraper — Category-Tagged Technology Fingerprints for Every Startup on Built In

The fastest way to extract category-labelled tech stacks (LANGUAGES / FRAMEWORKS / DATABASES / DEVOPS / CLOUD SERVICES / SALESFORCE) from every company profile on builtin.com. Pull the unique Technology We Use fingerprint that LinkedIn, Crunchbase, AngelList and PitchBook do not publish — and turn it into a B2B SaaS prospecting list, recruiter pipeline, or competitive-vendor landscape in minutes. HTTP-only, no browser, no login.


What This Actor Does

The BuiltIn.com Tech Companies & Tech Stack Scraper is a production-grade Apify Actor that turns any public Built In company profile (builtin.com/company/{slug}) into a clean, structured JSON record — including the single most valuable field Built In publishes that no other major company database carries in structured form: the per-company Technology We Use panel, tagged by category (LANGUAGES, FRAMEWORKS, DATABASES, DEVOPS, CLOUD SERVICES, SALESFORCE, ANALYTICS, DESIGN, etc.).

Built In is a tech-focused career platform used by thousands of startups, growth-stage scale-ups, and public tech companies to recruit engineering talent. To attract developers, these companies voluntarily publish a granular breakdown of the languages, frameworks, databases, cloud providers, observability stacks, and design tools they actually run in production — the kind of fingerprint that competitive-intelligence platforms typically gate behind $30,000-a-year contracts.

This actor extracts that fingerprint at HTTP speed (~1–2 seconds per company) so you can build:

  • B2B SaaS prospect lists segmented by exact technology adoption ("every Built In company using MongoDB but not Snowflake")
  • Recruiter pipelines filtered by stack ("every startup running PyTorch + AWS + Kubernetes")
  • Competitive vendor-landscape maps ("Datadog vs New Relic vs Honeycomb adoption across the Built In ecosystem")
  • Investor portfolio dashboards tracking technical maturity signals (DevOps + CI/CD adoption == scaling)
  • Conference / sponsor outreach lists targeting users of a specific category-tagged technology

Every record returned includes the company's identity, industry, location, employee band, founded year, multi-office presence, full perks list, social links, recently posted jobs, and the full category-grouped tech stack — ready to drop into Postgres, Snowflake, BigQuery, Salesforce, HubSpot, or Google Sheets.

Entity types returned per company

  • Identity: name, slug, profileUrl, website, logoUrl, description
  • Firmographics: industry, industries[], location, offices[], employeeCount, foundedYear
  • Tech stack: techStack[] (flat list with {name, category, iconUrl}), techStackByCategory (grouped object: { LANGUAGES: [...], FRAMEWORKS: [...], DATABASES: [...] }), techStackSize
  • Talent signals: jobs[] (recent postings: title, jobId, slug, jobUrl), jobCount, perks[]
  • Social presence: socialLinks[] (LinkedIn, Twitter/X, Facebook, Instagram, YouTube, TikTok)
  • Provenance: sourceUrl, scrapedAt

Why scrape Built In yourself when this exists?

Built In's public HTML looks deceptively simple — until you actually try to parse 500 company profiles in a row. Teams who try the DIY route consistently hit the same wall:

  • The Technology We Use panel is rendered inside repeating <div class="tech-icon-container"> blocks with no JSON-LD or microdata fallback — every tech item is an <img alt="..."> plus two sibling <div> elements, one for the tech name and one for the UPPERCASE category label. Naïve scrapers grab the names and lose the category entirely.
  • Category labels are positional, not semantic — you cannot rely on a CSS class to know whether MongoDB belongs to DATABASES or to LANGUAGES; you have to walk the sibling structure and detect "is this child an UPPERCASE category vs a tech name?" logic.
  • Tabs split the stack by department (Engineering / Data / Design / Marketing) — a single-pass scraper that doesn't traverse all tab containers will under-count tech stack by 40–70%.
  • Industry, employee count, founded year, and HQ live inside a fact list with inconsistent labels — sometimes Total Employees, sometimes Team Size, sometimes Employees; same for Headquarters vs Location vs HQ.
  • Multi-office companies (Stripe, Snowflake, MongoDB) list 3–10 office locations inside <address> tags scattered across the page — manual parsers usually capture only the first.
  • Job links use the /job/{slug}/{numericId} pattern, and the same job link can appear multiple times on the page (hero banner + recent jobs list + footer) — deduplication by jobId is required.
  • Social links are mixed into a generic outbound-anchor blob alongside affiliate pixels, support links, and Built In's own internal https://builtin.com/... URLs — you need a filter on (facebook|linkedin|twitter|x|youtube|instagram|tiktok).com and an explicit exclusion of the Built In domain.
  • 404 slugs return a styled "company not found" page (which still returns HTTP 200 in some cases) — your parser must validate that an <h1> and a tech-icon-container actually exist before pushing a row.
  • Header generation matters — Built In aggressively serves a minimalist fallback page to bot-fingerprinted requests (no User-Agent rotation, no Accept-Language, no Sec-CH-UA hints).
  • Politeness matters more — 50+ requests per minute from a single IP will trigger a soft block within a few minutes.

This actor solves all of that: realistic browser headers via got-scraping, 3-attempt retries with increasing backoff and jitter, a polite per-request delay plus a concurrency limiter, full category-grouped tech-stack normalization, dedup by jobId, a social-link filter, and Actor.fail() when zero records come back so you never silently ship an empty dataset to your downstream warehouse.
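
To make the category-detection problem concrete, here is a minimal Python sketch of the "UPPERCASE sibling" heuristic described in the list above. It is an illustration under stated assumptions, not the actor's source code: it assumes each tech-icon-container has already been reduced to its image alt text plus the text of its child divs, and the helper names (looks_like_category, normalize_stack) are hypothetical.

```python
from collections import defaultdict

def looks_like_category(text: str) -> bool:
    """Heuristic: category labels are all-uppercase, e.g. 'CLOUD SERVICES'."""
    letters = [ch for ch in text if ch.isalpha()]
    return len(letters) >= 2 and all(ch.isupper() for ch in letters)

def normalize_stack(containers):
    """containers: iterable of (img_alt, [child_div_texts]) pairs."""
    flat, grouped, seen = [], defaultdict(list), set()
    for alt, child_texts in containers:
        name = alt.strip()
        if not name:
            continue
        # The name itself can be uppercase (AWS, PHP), so the category is the
        # uppercase child that is NOT the name; fall back to OTHER if absent.
        category = next(
            (t.strip() for t in child_texts
             if t.strip() != name and looks_like_category(t)),
            "OTHER",
        )
        if (name, category) in seen:  # assumption: items may repeat across tabs
            continue
        seen.add((name, category))
        flat.append({"name": name, "category": category})
        grouped[category].append(name)
    return flat, dict(grouped)

sample = [
    ("MongoDB", ["MongoDB", "DATABASES"]),
    ("AWS", ["AWS", "CLOUD SERVICES"]),
    ("Figma", ["Figma"]),                   # no category label on the page
    ("MongoDB", ["MongoDB", "DATABASES"]),  # duplicate from a second tab
]
flat, grouped = normalize_stack(sample)
```

Taking the name from the image alt text rather than from the child divs is what keeps all-caps technology names such as AWS from being mistaken for a category.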


Quick Start

One-Click Run

  1. Click "Try for free" on the Apify Store page
  2. Add a handful of company slugs (e.g. mongodb, datadog, stripe, plaid)
  3. Hit Start — typical run is under 30 seconds for 10 companies
  4. Download the dataset as JSON, CSV, Excel, HTML, XML, or RSS directly from the Apify dataset view

API Run (Python)

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("haketa/builtin-tech-companies-scraper").call(run_input={
    "companySlugs": ["mongodb", "datadog", "stripe", "snowflake-computing-inc", "plaid"],
    "scrapeJobs": True,
    "maxRecords": 100,
    "requestDelay": 1000,
    "maxConcurrency": 3,
})

# Print each company's tech stack, grouped by category
for company in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(company["name"], "→", company["techStackSize"], "tech items")
    if company.get("techStackByCategory"):
        for category, items in company["techStackByCategory"].items():
            print(f"  {category}: {', '.join(items)}")

API Run (Node.js / TypeScript)

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('haketa/builtin-tech-companies-scraper').call({
    startUrls: [
        { url: 'https://builtin.com/company/mongodb' },
        { url: 'https://builtin.com/company/datadog' }
    ],
    scrapeJobs: true,
    maxRecords: 50
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();

// Find every company running Kubernetes
const k8sUsers = items.filter(c =>
    c.techStack?.some(t => t.name === 'Kubernetes')
);
console.log(`${k8sUsers.length} companies on Kubernetes`);

API Run (cURL)

curl -X POST "https://api.apify.com/v2/acts/haketa~builtin-tech-companies-scraper/runs?token=YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"companySlugs": ["mongodb", "datadog", "stripe"],
"scrapeJobs": true,
"maxRecords": 100
}'

How It Works

Built In serves fully server-rendered HTML for every /company/{slug} URL — no SPA, no GraphQL, no client-side hydration required to see the data. That makes this actor HTTP-only via got-scraping + cheerio, which keeps cost and runtime an order of magnitude lower than browser-based equivalents.

Source endpoints

| Source URL pattern | Purpose | Example |
| --- | --- | --- |
| https://builtin.com/company/{slug} | Single company profile page | https://builtin.com/company/mongodb |
| https://builtin.com/job/{slug}/{numericId} | Individual job posting (extracted as link only) | https://builtin.com/job/senior-backend-engineer/123456 |
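
As a rough illustration of how the /job/{slug}/{numericId} pattern can be turned into de-duplicated job entries, here is a minimal Python sketch. The regex and the dedupe_jobs helper are assumptions made for this example, not the actor's internal code; the output keys mirror the jobs[] fields documented for this actor.

```python
import re

# Matches /job/{slug}/{numericId} in relative or absolute links.
JOB_LINK = re.compile(r"/job/(?P<slug>[^/?#]+)/(?P<job_id>\d+)")

def dedupe_jobs(links):
    """links: iterable of (href, anchor_text) pairs collected from one page."""
    jobs = {}
    for href, title in links:
        match = JOB_LINK.search(href)
        if not match:
            continue  # not a job link
        job_id = match["job_id"]
        if job_id in jobs:
            continue  # same job already seen (hero banner, list, footer...)
        jobs[job_id] = {
            "title": title.strip(),
            "jobId": job_id,
            "slug": match["slug"],
            "jobUrl": f"https://builtin.com/job/{match['slug']}/{job_id}",
        }
    return list(jobs.values())

sample_links = [
    ("/job/senior-backend-engineer/123456", " Senior Backend Engineer "),
    ("https://builtin.com/job/senior-backend-engineer/123456?ref=hero", "Senior Backend Engineer"),
    ("/company/mongodb", "MongoDB"),
    ("/job/staff-sre/234568", "Staff SRE"),
]
jobs = dedupe_jobs(sample_links)
```

Keying on the numeric ID rather than the full URL means tracking parameters or relative-versus-absolute links do not produce duplicate entries.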

Architecture

  • Direct HTTPS GET with realistic Chrome 120+ desktop headers (User-Agent, Accept-Language, Sec-CH-UA), generated per request by got-scraping's header generator
  • No headless browser — no Puppeteer, no Playwright, no Chrome — keeps runtime ~1–2 seconds per company and memory < 256 MB
  • Cheerio HTML parsing with targeted selectors per data section
  • 3-attempt retry with backoff + jitter (2 s × attempt number + random 0–1500 ms, so the wait grows linearly with each attempt) on HTTP failures or thin-body responses
  • Polite request pacing — configurable requestDelay (default 1000 ms) per fetch + jitter to avoid burst-pattern detection
  • Concurrency limiter — async worker pool (maxConcurrency, default 3) processes the slug queue in parallel without overwhelming Built In
  • Tech stack normalizer — walks .tech-icon-container blocks, extracts each <img alt="..."> as the tech name, finds the sibling UPPERCASE <div> as the category, and emits both a flat techStack[] array and a grouped techStackByCategory object
  • Job deduplication — multiple appearances of the same /job/{slug}/{id} link on a page collapse to a single entry keyed on jobId
  • Social-link filter — matches (facebook|instagram|twitter|x|linkedin|youtube|tiktok).com while excluding any builtin.com URL
  • Office multi-extraction — captures every <address> element within the Offices section so multi-location companies (Stripe, Snowflake, MongoDB) come through fully
  • Hard fail on empty: Actor.fail() triggers if zero rows are written, so downstream pipelines never silently consume an empty dataset
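
The retry timing quoted in the list above (2 s × attempt plus 0–1500 ms of jitter, three attempts, retrying on failures or thin bodies) can be sketched in Python as follows. This is a hedged illustration only: the actor itself runs on got-scraping, and MIN_BODY_CHARS, retry_delay_ms and fetch_with_retries are hypothetical names chosen for this example.

```python
import random
import time

MIN_BODY_CHARS = 2000  # hypothetical threshold for a "thin body" response

def retry_delay_ms(attempt: int, rng=random) -> float:
    """Wait before the next try: 2 s x attempt number + 0-1500 ms jitter."""
    return 2000 * attempt + rng.uniform(0, 1500)

def fetch_with_retries(fetch, url, max_attempts=3, sleep=time.sleep):
    """Call fetch(url) up to max_attempts times; fetch is any callable returning text."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            body = fetch(url)
            if len(body) >= MIN_BODY_CHARS:
                return body
            last_error = ValueError(f"thin body ({len(body)} chars)")
        except OSError as exc:  # connection resets, timeouts, etc.
            last_error = exc
        if attempt < max_attempts:
            sleep(retry_delay_ms(attempt) / 1000)
    raise RuntimeError(f"gave up on {url} after {max_attempts} attempts") from last_error
```

Passing the fetch and sleep functions in as arguments keeps the sketch independent of any particular HTTP library and makes the timing easy to check in isolation.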

Proxy

Proxy is optional and disabled by default. Built In's anti-bot is light enough that direct datacenter IPs work for moderate loads. If you scale to thousands of companies per run or your IP ends up rate-limited, enable Apify Residential US through the standard proxy configuration block.


Input Parameters

{
  "startUrls": [
    { "url": "https://builtin.com/company/mongodb" },
    { "url": "https://builtin.com/company/datadog" }
  ],
  "companySlugs": ["stripe", "snowflake-computing-inc", "plaid"],
  "scrapeJobs": true,
  "maxRecords": 100,
  "requestDelay": 1000,
  "maxConcurrency": 3,
  "proxyConfiguration": { "useApifyProxy": false }
}

Parameter reference

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| startUrls | array<object or string> | [] | Built In company URLs (https://builtin.com/company/{slug}); the actor parses the slug out of each URL. Can be combined with companySlugs: the two lists are merged and de-duplicated. |
| companySlugs | array<string> | ["mongodb"] | Built In company URL slugs. Examples: mongodb, datadog, stripe, snowflake-computing-inc, plaid. Each runs as a separate task. Slugs are normalized to lower-case and stripped of any leading /company/ prefix. |
| scrapeJobs | boolean | true | When true, the parser extracts up to 30 unique recent job postings per company (title, jobId, slug, jobUrl). Disable to shave a small amount of parsing time when you only care about firmographics + tech stack. |
| maxRecords | integer | 100 | Hard cap on total companies saved. Set 0 for unlimited. Useful for sampling. |
| requestDelay | integer (ms) | 1000 | Delay between company-page fetches (plus 0–500 ms jitter). 800–2000 ms is the polite zone. |
| maxConcurrency | integer | 3 | Parallel company-page fetches. 2–4 is safe; values > 5 risk triggering soft blocks. |
| proxyConfiguration | object | { "useApifyProxy": false } | Optional. Built In has light anti-bot protection, so a proxy is generally not required for moderate use. Enable Apify Residential US if you scale to thousands of companies per run. |

Tip: If you provide both startUrls and companySlugs, the actor merges and de-duplicates the queue, so you can mix paste-in URLs from a teammate with a programmatic list from your data warehouse without writing dedup logic yourself.
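
The merge-and-de-duplicate behaviour described in the tip, together with the slug normalization from the parameter reference (lower-casing, stripping a leading /company/ prefix), might look roughly like this in Python. It is a sketch of the idea under those stated rules, not the actor's code; slug_from and build_queue are hypothetical helper names.

```python
from typing import Optional
from urllib.parse import urlparse

def slug_from(value: str) -> Optional[str]:
    """Normalize a company URL, a /company/{slug} path, or a bare slug."""
    value = value.strip()
    is_url = "://" in value
    path = urlparse(value).path if is_url else value
    path = path.strip("/").lower()
    if path.startswith("company/"):
        path = path[len("company/"):]
    elif is_url:
        return None  # a URL that is not a /company/ page
    return path.split("/")[0] or None

def build_queue(start_urls, company_slugs):
    """Merge both inputs into one ordered, de-duplicated slug list."""
    candidates = [u["url"] if isinstance(u, dict) else u for u in start_urls]
    candidates += list(company_slugs)
    queue, seen = [], set()
    for raw in candidates:
        slug = slug_from(raw)
        if slug and slug not in seen:
            seen.add(slug)
            queue.append(slug)
    return queue

queue = build_queue(
    [{"url": "https://builtin.com/company/mongodb"}, "https://builtin.com/company/Datadog/"],
    ["/company/Stripe", "mongodb", "plaid"],
)
```

In this example the second "mongodb" is dropped because the slug was already queued from the URL list.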


Output Schema

Every row is one company. All fields are nullable so you can ingest the dataset into a strict schema (Postgres, BigQuery) without per-record branching.

Identity & firmographics

| Field | Type | Description |
| --- | --- | --- |
| name | string | Company name as displayed in the <h1> of the profile page (e.g. MongoDB, Datadog) |
| slug | string | URL slug (mongodb, snowflake-computing-inc) |
| profileUrl | string | Canonical Built In URL: https://builtin.com/company/{slug} |
| website | string | Company's own website (first external non-Built In link on the page) |
| description | string | One-paragraph company description from meta[name="description"] |
| industry | string | Primary industry (e.g. Cloud · Information Technology · Software) |
| industries | array<string> | All industries when more than one is listed |
| location | string | Headquarters string (e.g. New York, NY) |
| offices | array<string> | All listed office locations (multi-office companies) |
| employeeCount | string | Employee band as Built In reports it (e.g. 1,000-5,000, 5000+) |
| foundedYear | string | Four-digit founding year |
| logoUrl | string | URL of the company logo from meta[property="og:image"] |

Tech stack (the unique field)

| Field | Type | Description |
| --- | --- | --- |
| techStack | array<object> | Flat list of every technology, each item: { "name": "MongoDB", "category": "DATABASES", "iconUrl": "https://..." } |
| techStackByCategory | object | Grouped object: { "LANGUAGES": ["JavaScript", "Java"], "DATABASES": ["MongoDB"], "FRAMEWORKS": ["Django", "Kubernetes"] } |
| techStackSize | integer | Total count of tech items extracted across all categories |

Talent & culture

| Field | Type | Description |
| --- | --- | --- |
| jobs | array<object> | Up to 30 recent job postings, each: { "title": "Senior Backend Engineer", "jobId": "123456", "slug": "senior-backend-engineer", "jobUrl": "https://builtin.com/job/..." } |
| jobCount | integer | Total unique jobs extracted (post-dedup by jobId) |
| perks | array<string> | Up to 50 perks/benefits as Built In lists them (e.g. Unlimited PTO, 401(k) matching, Remote-friendly) |
| socialLinks | array<string> | LinkedIn, Twitter/X, Facebook, Instagram, YouTube, TikTok URLs |
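
For readers who want to reproduce the socialLinks filter downstream, here is a minimal Python sketch. It compares parsed hostnames instead of applying a loose regex over the whole URL, so that hosts such as netflix.com are not mistaken for x.com. The domain list follows the platforms named in this document; the function names are hypothetical and the actor's own implementation may differ.

```python
from urllib.parse import urlparse

SOCIAL_DOMAINS = {
    "facebook.com", "instagram.com", "twitter.com", "x.com",
    "linkedin.com", "youtube.com", "tiktok.com",
}

def is_social_link(href: str) -> bool:
    host = (urlparse(href).hostname or "").lower()
    if host == "builtin.com" or host.endswith(".builtin.com"):
        return False  # explicit exclusion of Built In's own URLs
    # Accept the bare domain or any subdomain of it (www., m., etc.).
    return any(host == d or host.endswith("." + d) for d in SOCIAL_DOMAINS)

def social_links(hrefs):
    """Keep social links only, preserving order and dropping duplicates."""
    return list(dict.fromkeys(h for h in hrefs if is_social_link(h)))

links = social_links([
    "https://www.linkedin.com/company/mongodb",
    "https://builtin.com/share?to=twitter.com",
    "https://www.netflix.com/jobs",
    "https://x.com/MongoDB",
    "https://www.linkedin.com/company/mongodb",
])
```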

Provenance

| Field | Type | Description |
| --- | --- | --- |
| sourceUrl | string | The exact URL fetched (matches profileUrl) |
| scrapedAt | string | ISO-8601 timestamp captured at the start of the run |

Example: MongoDB record (verified against live page)

{
"name": "MongoDB",
"slug": "mongodb",
"profileUrl": "https://builtin.com/company/mongodb",
"website": "https://www.mongodb.com",
"description": "MongoDB is the world's leading modern database platform...",
"industry": "Big Data · Cloud · Database · Software",
"industries": ["Big Data", "Cloud", "Database", "Software"],
"location": "New York, NY",
"offices": ["New York, NY", "Palo Alto, CA", "Austin, TX", "Dublin, Ireland", "Sydney, Australia"],
"employeeCount": "1,000-5,000",
"foundedYear": "2007",
"logoUrl": "https://cdn.builtin.com/logos/mongodb.png",
"techStack": [
{ "name": "C++", "category": "LANGUAGES", "iconUrl": "https://cdn.builtin.com/tech/cpp.svg" },
{ "name": "Java", "category": "LANGUAGES", "iconUrl": "https://cdn.builtin.com/tech/java.svg" },
{ "name": "JavaScript", "category": "LANGUAGES", "iconUrl": "https://cdn.builtin.com/tech/javascript.svg" },
{ "name": "Golang", "category": "LANGUAGES", "iconUrl": "https://cdn.builtin.com/tech/golang.svg" },
{ "name": "Django", "category": "FRAMEWORKS", "iconUrl": "https://cdn.builtin.com/tech/django.svg" },
{ "name": "GraphQL", "category": "FRAMEWORKS", "iconUrl": "https://cdn.builtin.com/tech/graphql.svg" },
{ "name": "Kubernetes", "category": "FRAMEWORKS", "iconUrl": "https://cdn.builtin.com/tech/kubernetes.svg" },
{ "name": "MongoDB", "category": "DATABASES", "iconUrl": "https://cdn.builtin.com/tech/mongodb.svg" }
],
"techStackByCategory": {
"LANGUAGES": ["C++", "Java", "JavaScript", "Golang"],
"FRAMEWORKS": ["Django", "GraphQL", "Kubernetes"],
"DATABASES": ["MongoDB"]
},
"techStackSize": 8,
"perks": ["Unlimited PTO", "401(k) matching", "Equity", "Remote-friendly", "Health insurance"],
"jobs": [
{
"title": "Senior Backend Engineer, Atlas",
"jobId": "234567",
"slug": "senior-backend-engineer-atlas",
"jobUrl": "https://builtin.com/job/senior-backend-engineer-atlas/234567"
},
{
"title": "Staff Site Reliability Engineer",
"jobId": "234568",
"slug": "staff-site-reliability-engineer",
"jobUrl": "https://builtin.com/job/staff-site-reliability-engineer/234568"
}
],
"jobCount": 12,
"socialLinks": [
"https://www.linkedin.com/company/mongodb",
"https://twitter.com/MongoDB",
"https://www.youtube.com/user/MongoDB"
],
"sourceUrl": "https://builtin.com/company/mongodb",
"scrapedAt": "2026-05-18T10:00:00.000Z"
}

Example: Slim record (small startup with no tech stack listed)

{
"name": "ExampleCo",
"slug": "exampleco",
"profileUrl": "https://builtin.com/company/exampleco",
"website": "https://www.example.co",
"description": "Series A fintech startup focused on consumer credit.",
"industry": "Fintech",
"industries": null,
"location": "Austin, TX",
"offices": null,
"employeeCount": "11-50",
"foundedYear": "2022",
"logoUrl": "https://cdn.builtin.com/logos/exampleco.png",
"techStack": null,
"techStackByCategory": null,
"techStackSize": null,
"perks": ["Equity", "Remote-friendly"],
"jobs": [
{
"title": "Founding Engineer",
"jobId": "99999",
"slug": "founding-engineer",
"jobUrl": "https://builtin.com/job/founding-engineer/99999"
}
],
"jobCount": 1,
"socialLinks": ["https://www.linkedin.com/company/exampleco"],
"sourceUrl": "https://builtin.com/company/exampleco",
"scrapedAt": "2026-05-18T10:00:00.000Z"
}

Tech Stack Category Reference

Built In groups technologies under a closed set of UPPERCASE category labels. Knowing the canonical set helps you write downstream WHERE category IN (...) queries with confidence:

| Category | What goes there | Examples (verified on live company pages) |
| --- | --- | --- |
| LANGUAGES | Programming languages | C++, Java, JavaScript, TypeScript, Python, Go (Golang), Ruby, Scala, Kotlin, Swift, Rust, PHP, C#, R |
| FRAMEWORKS | App / web / orchestration frameworks | React, Vue, Angular, Django, Flask, Rails, Spring, Express, Next.js, Node.js, Kubernetes, GraphQL, gRPC |
| DATABASES | OLTP / OLAP / NoSQL / cache | MongoDB, Postgres, MySQL, Redis, Cassandra, DynamoDB, Elasticsearch, Snowflake, BigQuery, Redshift |
| DEVOPS | CI/CD, observability, IaC | Terraform, Ansible, Jenkins, CircleCI, GitHub Actions, Datadog, New Relic, Splunk, PagerDuty |
| CLOUD SERVICES | Hyperscalers + managed services | AWS, GCP, Azure, Heroku, Vercel, Cloudflare, DigitalOcean |
| SALESFORCE | Salesforce ecosystem | Salesforce Sales Cloud, Service Cloud, Marketing Cloud, Pardot, Apex |
| ANALYTICS | BI / product analytics | Looker, Tableau, Mixpanel, Amplitude, Heap, Segment |
| DESIGN | Design tooling | Figma, Sketch, Adobe XD, InVision |
| SEARCH ENGINES | Dedicated search | Elasticsearch, Algolia, Solr, OpenSearch |
| COLLABORATION | Internal team tools | Slack, Asana, Jira, Linear, Notion, Confluence |
| OTHER | Anything not category-tagged on the page | Catch-all fallback (the scraper emits category: "OTHER" rather than dropping the item) |

The exact set of categories rendered for a given company depends on what its recruiting team chose to publish. Empty categories are not emitted to techStackByCategory.


Use Cases

B2B SaaS Prospecting & Tech-Stack-Triggered Outbound

The single highest-ROI use of category-tagged tech stack data is technology-trigger sales. Sales teams use this actor to:

  • Build "every company on technology X" lists: pull every Built In company using Snowflake to pitch dbt Cloud, every company on Kubernetes to pitch managed K8s, every Datadog user to pitch a Datadog alternative
  • Build "every company NOT on technology Y" lists — every company in DATABASES that uses Postgres but not Snowflake = a perfect Snowflake migration pipeline
  • Score accounts by tech maturity — companies with Terraform + Kubernetes + Datadog signal a mature DevOps practice and qualify for higher-ACV plans
  • Auto-segment your CRM — pipe scraped data into Salesforce/HubSpot and tag accounts with custom tech-stack-derived fields (uses_mongodb=true, cloud_provider=aws)
  • Time outreach to job-posting signals — pair tech-stack triggers with the jobs[] field ("posted a Senior Platform Engineer role and uses Kubernetes" == active buyer)
  • Replace expensive sources — BuiltWith / HG Insights / TheirStack subscriptions start at $5K/year for the same fingerprint data Built In publishes openly
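
The "uses X but not Y" segmentation from the list above can be expressed as a small filter over the dataset items. The sketch below is illustrative: uses_but_not is a hypothetical helper, and it assumes technology names match exactly apart from letter case (so "Postgres" and "PostgreSQL" would need to be reconciled separately).

```python
def tech_names(company):
    """Lower-cased technology names; tolerates techStack being null."""
    return {item["name"].lower() for item in (company.get("techStack") or [])}

def uses_but_not(companies, must_have, must_lack):
    have = {name.lower() for name in must_have}
    lack = {name.lower() for name in must_lack}
    return [
        c for c in companies
        if have <= tech_names(c) and not (lack & tech_names(c))
    ]

items = [
    {"slug": "a", "techStack": [{"name": "MongoDB", "category": "DATABASES"}]},
    {"slug": "b", "techStack": [{"name": "MongoDB", "category": "DATABASES"},
                                {"name": "Snowflake", "category": "DATABASES"}]},
    {"slug": "c", "techStack": None},  # profile with no stack published
]
prospects = uses_but_not(items, must_have=["MongoDB"], must_lack=["Snowflake"])
```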

Technical Recruiting & Talent Pipeline Intel

Recruiters and talent acquisition teams use the dataset to:

  • Find every company using a target stack — "all Built In companies running PyTorch + AWS + Kubernetes" for ML-engineer placement
  • Source competitor talent — when placing a candidate from Stripe, pull every Built In company on Stripe's stack as a target shortlist
  • Build candidate-to-company matchmaking — given a candidate's resume tech list, rank Built In companies by stack-overlap percentage
  • Monitor competitor hiring velocity: jobCount per company per week is a leading indicator of growth / layoffs
  • Identify niche-stack employers — companies with Elixir, Rust, Clojure, or Haskell in LANGUAGES for hard-to-source talent
  • Benchmark perks by stack — compare perks[] arrays across companies with comparable tech stacks for offer-package research
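
One way to compute the stack-overlap percentage mentioned above is the share of a candidate's skills that appear in a company's techStack. The Python sketch below uses that definition as an assumption (other definitions, such as Jaccard similarity, are equally reasonable), and the function names are hypothetical.

```python
def stack_overlap_pct(candidate_skills, company) -> float:
    """Percentage of the candidate's skills found in the company's stack."""
    wanted = {s.lower() for s in candidate_skills}
    if not wanted:
        return 0.0
    stack = {t["name"].lower() for t in (company.get("techStack") or [])}
    return 100.0 * len(wanted & stack) / len(wanted)

def rank_companies(candidate_skills, companies):
    """Return (overlap_pct, slug) pairs, best match first."""
    scored = [(stack_overlap_pct(candidate_skills, c), c["slug"]) for c in companies]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))

def _stack(*names):
    return [{"name": n, "category": "OTHER"} for n in names]

companies = [
    {"slug": "a", "techStack": _stack("Python", "AWS")},
    {"slug": "b", "techStack": _stack("Python", "AWS", "Kubernetes", "PyTorch", "Go")},
    {"slug": "c", "techStack": None},
]
ranking = rank_companies(["Python", "AWS", "Kubernetes", "PyTorch"], companies)
```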

Competitive Vendor-Landscape Mapping

Product, marketing, and CI teams use this to map adoption across categories:

  • Observability landscape — Datadog vs New Relic vs Honeycomb vs Splunk vs Grafana share-of-stack across the Built In universe
  • Data warehouse landscape — Snowflake vs Databricks vs BigQuery vs Redshift adoption by company size
  • CI/CD landscape — CircleCI vs GitHub Actions vs Jenkins vs GitLab CI penetration
  • Frontend framework landscape — React vs Vue vs Angular vs Svelte across new vs established companies
  • Quarterly trend reports — re-run the actor against a fixed slug list every quarter and diff results to spot adoption swings
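
A share-of-stack figure for a vendor comparison can be derived from the dataset with a simple count. In the sketch below, share is defined as the fraction of companies with a non-empty techStack that list the vendor; that denominator is an assumption chosen for the example, and share_of_stack is a hypothetical helper.

```python
from collections import Counter

def share_of_stack(companies, vendors):
    """Fraction of companies (with a published stack) that list each vendor."""
    wanted = {v.lower(): v for v in vendors}
    counts, total = Counter(), 0
    for company in companies:
        names = {t["name"].lower() for t in (company.get("techStack") or [])}
        if not names:
            continue  # skip profiles without a published stack
        total += 1
        counts.update(wanted[n] for n in names & set(wanted))
    return {v: (counts[v] / total if total else 0.0) for v in vendors}

def _stack(*names):
    return [{"name": n, "category": "DEVOPS"} for n in names]

companies = [
    {"slug": "a", "techStack": _stack("Datadog", "AWS")},
    {"slug": "b", "techStack": _stack("New Relic", "Datadog")},
    {"slug": "c", "techStack": None},
    {"slug": "d", "techStack": _stack("Splunk")},
]
shares = share_of_stack(companies, ["Datadog", "New Relic", "Honeycomb"])
```

Running the same function on snapshots from two different quarters and comparing the resulting dictionaries gives the trend figures described in the last bullet.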

Investor / VC Portfolio Analytics

Venture capital firms and growth-equity investors use tech-stack data as a leading indicator:

  • Technical-maturity scoring: portfolio companies adopting infrastructure-as-code (e.g. Terraform) and observability tooling show a scaling-readiness signal
  • Stack-overlap analysis — for buy-side deals, compare a target's stack to the firm's existing portfolio to estimate integration cost
  • Trend-spot emerging tooling — track quarter-over-quarter adoption of new dev tools across 200 portfolio companies
  • Comp-set construction — build a peer comp set from Built In companies with the same employee band + same primary tech category
  • Sector-thesis validation — confirm "AI infra spend is up" by counting growth in PyTorch / Triton / vLLM adoption across the dataset

Conference, Event & Sponsor Sales

Conference organizers and B2B events teams use this to target outreach:

  • KubeCon — pull every company with Kubernetes + Istio in their stack as a sponsor / attendee target list
  • Snowflake Summit — every Built In company with Snowflake in DATABASES becomes a registrant CRM record
  • AWS re:Invent / Google Cloud Next: segment by CLOUD SERVICES adoption for hyperscaler-specific events
  • DevOpsDays: pull every company with DEVOPS category items for regional event marketing
  • Stack-specific meetups — Rust meetup organizers can pull every Built In company with Rust in LANGUAGES

DevRel & Open-Source Community Building

Developer-relations teams use tech stack data to grow communities:

  • Find new adopters — schedule the actor weekly and diff techStack arrays to detect companies that newly adopted your OSS framework (LangChain, Next.js, Drizzle ORM, etc.)
  • Build customer-story pipelines — identify companies running your stack at scale as case-study candidates
  • Power developer-marketing campaigns — generate "X companies are building on [your framework]" landing-page social proof, auto-updated weekly
  • Find prospective contributors — companies running your project commercially are the most likely source of upstream contributions
  • Targeted ad creative — Built In employer logos make for high-trust social-proof ad units

Executive Search & Headhunting

Headhunters and executive-search firms use the stack + jobs fingerprint to:

  • Identify CTO targets by exact tech-stack overlap with the hiring brief
  • Map VP of Engineering candidates at companies in a comparable employee band running similar infrastructure
  • Build longlist + shortlist for Director-level technical roles in days, not weeks
  • Track when an exec is hiring: jobCount spikes correlate with leadership-team expansion windows
  • Cross-reference with perks[] to identify companies offering equity / remote / unlimited PTO as part of the pitch

M&A Scouting & Acquisition Due Diligence

Corporate development and M&A advisors use stack data for build-vs-buy and integration planning:

  • Tech-stack compatibility filter — score acquisition targets by stack overlap with the acquirer
  • Integration cost modelling — incompatibility-heavy targets (Python shop acquiring a Java shop) increase post-deal integration timelines
  • Talent-retention modelling — engineering teams with niche stacks (Erlang, OCaml) are higher flight risks post-acquisition
  • Tuck-in candidate sourcing — find small Built In companies in DATABASES with a complementary stack to your portfolio company
  • Counter-bid intelligence — when a competitor acquires, scrape every comparable Built In company to identify likely next targets

Market Research & Analyst Reports

Industry analysts, equity research, and tech-trade journalists use the dataset to:

  • Quantify adoption trends — produce "what % of Built In companies run Snowflake in 2026" data points
  • Year-over-year shift reporting — re-run quarterly and produce category-level migration narratives ("share of companies running Postgres dropped 8% as Aurora adoption rose")
  • Geographic adoption maps — group techStackByCategory results by location to map regional stack preferences
  • Employee-band-segmented reports — compare stack composition for 11–50 vs 1,000+ employee companies
  • Source quotable data for white papers, investor decks, and trade-publication articles

Programmatic Ad Targeting & Custom Audiences

Marketing teams use stack data to build hyper-relevant ad audiences:

  • LinkedIn Matched Audiences — upload Built In companies + their LinkedIn URLs (from socialLinks[]) as company-targeting audiences
  • Facebook Custom Audiences — same approach for B2B awareness campaigns
  • Programmatic display retargeting — segment ad creative by detected tech stack (different ad to AWS shops vs Azure shops)
  • Account-based-marketing (ABM) — power the top of an ABM funnel with stack-defined target accounts that map exactly to your ICP
  • Email-warming sequences — pair the company list with tech-stack-personalized subject lines and lead-magnet content

Sample Queries & Recipes

Recipe 1: Single-company deep pull

{
"companySlugs": ["mongodb"],
"scrapeJobs": true,
"maxRecords": 1
}

Goal: get one perfect record for schema validation, dashboard prototyping, or demo screenshots.

Recipe 2: Snowflake-adoption hit list

{
"companySlugs": [
"stripe", "plaid", "discord", "instacart",
"doordash", "robinhood", "coinbase", "airtable",
"notion", "linear-app", "figma", "vercel"
],
"scrapeJobs": false,
"maxRecords": 50
}

Then filter downstream:

# `or {}` / `or []` guard against records where the field is null
snowflake_users = [
    c for c in items
    if "Snowflake" in ((c.get("techStackByCategory") or {}).get("DATABASES") or [])
]

Goal: a Snowflake sales rep pulls their territory of warm tech-trigger leads in one run.

Recipe 3: Kubernetes + DevOps maturity scoring

{
"startUrls": [
{ "url": "https://builtin.com/company/datadog" },
{ "url": "https://builtin.com/company/snyk" },
{ "url": "https://builtin.com/company/honeycomb-io" }
],
"maxRecords": 20
}

Goal: every company on K8s + a DEVOPS category entry gets a +1 maturity score in a DevOps-tooling vendor's lead-scoring model.
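
The scoring rule in this recipe could be applied to the dataset items with a few lines of Python. This is a minimal sketch of that single rule; maturity_bonus is a hypothetical name and a real lead-scoring model would combine more signals.

```python
def maturity_bonus(company) -> int:
    """+1 when the stack lists Kubernetes AND at least one DEVOPS item."""
    by_category = company.get("techStackByCategory") or {}
    names = {t["name"] for t in (company.get("techStack") or [])}
    return 1 if "Kubernetes" in names and by_category.get("DEVOPS") else 0

mature = {
    "techStack": [
        {"name": "Kubernetes", "category": "FRAMEWORKS"},
        {"name": "Terraform", "category": "DEVOPS"},
    ],
    "techStackByCategory": {"FRAMEWORKS": ["Kubernetes"], "DEVOPS": ["Terraform"]},
}
k8s_only = {
    "techStack": [{"name": "Kubernetes", "category": "FRAMEWORKS"}],
    "techStackByCategory": {"FRAMEWORKS": ["Kubernetes"]},
}
no_stack = {"techStack": None, "techStackByCategory": None}
```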

Recipe 4: ML / AI infrastructure prospects

{
"companySlugs": [
"openai", "anthropic", "hugging-face", "scale-ai",
"weights-biases", "databricks", "pinecone", "weaviate"
],
"scrapeJobs": true,
"maxRecords": 30
}

Goal: identify ML platform spend signals — PyTorch + AWS + Kubernetes in the stack PLUS active ML-engineer roles in jobs[] = active buyer for vector DBs, GPU clouds, or MLOps tooling.

Recipe 5: Multi-office expansion targets

{
"companySlugs": ["mongodb", "stripe", "snowflake-computing-inc"],
"scrapeJobs": false,
"maxRecords": 50
}

Goal: any company with 3+ entries in offices[] qualifies as a multi-region target for enterprise contract software (HRIS, payroll, global benefits).

Recipe 6: New-adopter tracking (weekly diff)

{
"companySlugs": [
"vercel", "linear-app", "supabase",
"planetscale", "neon-tech", "railway-app"
],
"scrapeJobs": false,
"maxRecords": 50
}

Schedule weekly. Diff techStack[] arrays week-over-week to find companies that newly adopted your DevRel team's framework.
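
The week-over-week diff can be done by comparing technology names per slug between two dataset snapshots. The sketch below assumes both snapshots are lists of records as returned by this actor; new_adoptions is a hypothetical helper, and it deliberately ignores companies that were absent from the previous run, since a first appearance is not an adoption event.

```python
def _names(company):
    return {t["name"] for t in (company.get("techStack") or [])}

def new_adoptions(previous, current):
    """Map slug -> sorted list of technologies added since the previous run."""
    before = {c["slug"]: _names(c) for c in previous}
    changes = {}
    for company in current:
        slug = company["slug"]
        if slug not in before:
            continue  # first time we see this company: no baseline to diff
        added = _names(company) - before[slug]
        if added:
            changes[slug] = sorted(added)
    return changes

last_week = [
    {"slug": "vercel", "techStack": [{"name": "Next.js", "category": "FRAMEWORKS"}]},
    {"slug": "supabase", "techStack": None},
]
this_week = [
    {"slug": "vercel", "techStack": [{"name": "Next.js", "category": "FRAMEWORKS"},
                                     {"name": "Drizzle ORM", "category": "FRAMEWORKS"}]},
    {"slug": "supabase", "techStack": [{"name": "Postgres", "category": "DATABASES"}]},
    {"slug": "neon-tech", "techStack": [{"name": "Postgres", "category": "DATABASES"}]},
]
changes = new_adoptions(last_week, this_week)
```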

Recipe 7: High-throughput crawl with politeness

{
"companySlugs": ["mongodb", "datadog", "stripe", "snowflake-computing-inc",
"plaid", "robinhood", "coinbase", "airbnb", "doordash"],
"scrapeJobs": true,
"maxRecords": 500,
"requestDelay": 1500,
"maxConcurrency": 2
}

Goal: large-batch overnight run, optimized for "never get blocked" rather than for raw speed.


Integration Examples

Google Sheets (via Apify Integration)

  1. Schedule the actor (e.g. weekly Sunday 22:00 UTC) with a fixed slug list
  2. Add the "Export to Google Sheets" integration to the schedule
  3. Receive a fresh Built In company sheet every week, including the JSON-stringified techStackByCategory field for spreadsheet-side filtering

Make.com / Zapier / n8n

Use the Apify connector on any major automation platform. Trigger downstream workflows on:

  • New companies appearing in your watch list
  • New tech-stack items added since last run (auto-detect adoption events)
  • New jobs posted since last run (auto-route to sales / recruiter Slack channels)
  • Office count changes (geographic-expansion signal)
  • Employee-count band changes (growth signal)

Postgres / Snowflake / BigQuery

Recommended schema for warehouse ingestion:

-- Postgres syntax. JSONB is Postgres-specific: use VARIANT on Snowflake or JSON on BigQuery.
CREATE TABLE builtin_companies (
    scraped_at       TIMESTAMP,
    slug             TEXT PRIMARY KEY,
    name             TEXT,
    industry         TEXT,
    location         TEXT,
    employee_count   TEXT,
    founded_year     TEXT,
    tech_stack_size  INTEGER,
    job_count        INTEGER,
    tech_stack       JSONB,  -- raw array of {name, category, iconUrl}
    tech_by_category JSONB,  -- grouped object
    perks            JSONB,
    offices          JSONB,
    social_links     JSONB,
    source_url       TEXT
);

Use the Apify webhook to POST run results to a small ingest endpoint after every scheduled run.
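
An ingest endpoint needs to map each camelCase record onto the snake_case columns above. The following Python sketch shows one possible mapping. It assumes Postgres with psycopg-style %s placeholders, and COLUMNS, to_row and UPSERT_SQL are hypothetical names for this example; the upsert clause is included because slug is the primary key and scheduled re-runs would otherwise collide.

```python
import json

COLUMNS = (
    "scraped_at", "slug", "name", "industry", "location", "employee_count",
    "founded_year", "tech_stack_size", "job_count", "tech_stack",
    "tech_by_category", "perks", "offices", "social_links", "source_url",
)

def _json_or_none(value):
    """Serialize lists/objects for JSONB columns; keep SQL NULL for null fields."""
    return None if value is None else json.dumps(value)

def to_row(record):
    """Return values in the same order as COLUMNS."""
    return (
        record.get("scrapedAt"), record.get("slug"), record.get("name"),
        record.get("industry"), record.get("location"), record.get("employeeCount"),
        record.get("foundedYear"), record.get("techStackSize"), record.get("jobCount"),
        _json_or_none(record.get("techStack")),
        _json_or_none(record.get("techStackByCategory")),
        _json_or_none(record.get("perks")),
        _json_or_none(record.get("offices")),
        _json_or_none(record.get("socialLinks")),
        record.get("sourceUrl"),
    )

# Assumption: psycopg-style placeholders; re-runs update the existing row.
UPSERT_SQL = (
    f"INSERT INTO builtin_companies ({', '.join(COLUMNS)}) "
    f"VALUES ({', '.join(['%s'] * len(COLUMNS))}) "
    "ON CONFLICT (slug) DO UPDATE SET "
    + ", ".join(f"{col} = EXCLUDED.{col}" for col in COLUMNS if col != "slug")
)

slim = {"slug": "exampleco", "name": "ExampleCo", "techStack": None,
        "perks": ["Equity", "Remote-friendly"], "jobCount": 1}
row = to_row(slim)
```

With a database driver in place, each record would be written with something like cursor.execute(UPSERT_SQL, to_row(record)).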

Power BI / Tableau / Looker

Connect the Apify REST API as a data source. Build dashboards covering:

  • Top 20 most-adopted technologies across the dataset
  • Category share-of-wallet (LANGUAGES mix, DATABASES mix, CLOUD SERVICES mix)
  • Adoption growth quarter-over-quarter for a target technology
  • Geographic heat map of company HQs
  • Employee-band × stack-category cross-tab

Salesforce / HubSpot CRM Enrichment

Run the actor nightly against the Built In slugs mapped from your CRM account domains, then upsert the results into Account records keyed on slug. Custom-field examples:

  • uses_mongodb__c (boolean) — derived from techStack containing MongoDB
  • cloud_provider__c (picklist): derived from techStackByCategory["CLOUD SERVICES"]
  • engineering_team_size__c (number) — proxy via employeeCount band
  • active_engineering_roles__c (number) — from jobCount
  • tech_stack_fingerprint__c (longtext) — JSON-stringified techStackByCategory
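
A minimal sketch of deriving these fields from one record follows. It assumes techStackByCategory values are arrays of either plain names or {name} objects (this README does not pin down that shape), and toCrmFields is a hypothetical helper. engineering_team_size__c is left out because the employeeCount band format is not specified here.

```javascript
// Normalize category entries that may be plain strings or {name} objects
// (the exact shape of techStackByCategory values is an assumption).
const entryName = (entry) => (typeof entry === 'string' ? entry : entry?.name);

function toCrmFields(company) {
  const stack = company.techStack ?? [];
  const byCategory = company.techStackByCategory ?? {};
  const cloud = (byCategory['CLOUD SERVICES'] ?? []).map(entryName).filter(Boolean);
  return {
    uses_mongodb__c: stack.some((t) => /mongodb/i.test(t.name ?? '')),
    // First listed provider; multi-cloud accounts may need a multi-select field.
    cloud_provider__c: cloud[0] ?? null,
    active_engineering_roles__c: company.jobCount ?? 0,
    tech_stack_fingerprint__c: JSON.stringify(byCategory),
  };
}
```

Note the bracket notation for the CLOUD SERVICES key: the label contains a space, so dot access does not work.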

Webhooks for Real-Time Triggers

Wire Apify run-complete webhooks into your internal automation:

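// In your webhook handler. `newItems` is the array of dataset items from the
// finished run; `notifySalesRep` is your own notification helper.
for (const company of newItems) {
  // techStack can be null, so optional chaining guards the lookup.
  const usesSnowflake = company.techStack?.some((t) => t.name === 'Snowflake');
  if (usesSnowflake && company.jobCount > 0) {
    notifySalesRep('snowflake-team', company);
  }
}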
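
The handler above leaves open where newItems comes from. One possible source, assuming the webhook uses Apify's default payload template (where resource is the run object and carries defaultDatasetId), is to read the run's dataset through the Apify API. datasetItemsUrl and loadNewItems are hypothetical helper names, and the sketch relies on the global fetch available in Node 18+.

```javascript
// Build the dataset-items URL from a webhook payload. Assumes the default
// payload template, where `resource` is the run object.
function datasetItemsUrl(payload) {
  const datasetId = payload?.resource?.defaultDatasetId;
  if (!datasetId) throw new Error('Webhook payload has no resource.defaultDatasetId');
  const url = new URL(`https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items`);
  url.searchParams.set('format', 'json');
  url.searchParams.set('clean', 'true');
  return url.toString();
}

// Fetch the items of the finished run, authenticating with a bearer token.
async function loadNewItems(payload, apifyToken) {
  const response = await fetch(datasetItemsUrl(payload), {
    headers: { Authorization: `Bearer ${apifyToken}` },
  });
  if (!response.ok) throw new Error(`Dataset request failed: ${response.status}`);
  return response.json();
}
```

If you customize the webhook payload template, adjust the path to the dataset ID accordingly.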

Major Markets & Tech Hubs at a Glance

Built In's company directory skews toward US tech hubs plus selected international cities. The actor returns whatever HQ a company self-reports, so you get a global footprint when scraping global companies:

| Tech Hub | Built In Presence | Notes |
| --- | --- | --- |
| San Francisco / Bay Area | Very high | Headquarters for OpenAI, Stripe, Airbnb, Databricks, Anthropic and most YC-funded growth-stage startups |
| New York, NY | Very high | Finance + media-tech: MongoDB, Datadog, Peloton, Squarespace, Etsy |
| Austin, TX | Very high | Crypto + B2B SaaS: Indeed, Bumble, RetailMeNot |
| Seattle, WA | High | Cloud-adjacent: Amazon-orbit + Smartsheet, Outreach, Highspot |
| Boston, MA | High | Biotech + enterprise: HubSpot, Wayfair, DraftKings |
| Los Angeles, CA | High | Entertainment-tech + creator economy |
| Chicago, IL | High | Built In's birthplace: Sprout Social, Coinbase Chicago, Tock |
| Denver / Boulder, CO | High | Climate-tech + SaaS |
| Atlanta, GA | Medium-high | Fintech + supply-chain |
| Miami, FL | Medium | Crypto + LATAM-facing tech |
| Toronto, Canada | Medium | Shopify-orbit |
| London, UK | Medium | European tech HQs |
| Dublin, Ireland | Medium | European hubs of US companies (MongoDB Dublin, Stripe Dublin) |
| Berlin, Germany | Medium | European startup ecosystem |
| Sydney, Australia | Medium | APAC offices |
| Singapore | Low-medium | APAC regional offices |

The dataset is as global as Built In is — coverage depth follows where Built In's editorial team and recruiting customer base focus.


Cost & Performance

| Metric | Value |
| --- | --- |
| Engine | HTTP-only (got-scraping + cheerio) |
| Runtime per company | ~1–2 seconds (default delay + parsing) |
| Runtime for 100 companies | ~3–5 minutes (with default 1000 ms delay, concurrency 3) |
| Runtime for 1,000 companies | ~30–50 minutes (recommended overnight) |
| Cost per company | Fractions of a cent (listed price: from $3.00 per 1,000 results) |
| Pricing model | Pay-per-event: you pay only when you run |
| Data freshness | Live at run time: exactly what Built In is serving the public at that moment |
| Auth required | None |
| Proxy required | Optional; disabled by default |
| Concurrency | Default 3; safe range 2–4; ceiling 10 |
| Memory footprint | 256 MB ample; 512 MB if scraping 1,000+ companies in one run |
| Failure mode | Actor.fail() on zero records, so no silent empty datasets |

  • Public data only: every field returned is published openly by Built In at https://builtin.com/company/{slug} and rendered to any unauthenticated visitor
  • No PII / no PHI: the dataset contains no protected health information and no personal identifiers beyond what companies voluntarily publish on their own recruiting pages (company name, office addresses, jobs)
  • No emails or phone numbers of individuals are extracted
  • No login / no scraping behind paywalls: the actor never authenticates to Built In
  • Polite by default: a per-request delay, low concurrency, realistic browser headers, and a 3-attempt retry with backoff keep the load on Built In's infrastructure low
  • robots.txt deference: operators using this actor should review Built In's current robots.txt and the site's Terms of Service; the actor itself does not embed any robots-bypass logic
  • GDPR / CCPA: compliance with downstream data-protection regulation is the responsibility of the data consumer. Company-level firmographic data is generally outside GDPR's personal-data scope, but always confirm with counsel
  • CAN-SPAM, TCPA: if you use scraped data for outbound marketing, compliance with anti-spam and call-restriction laws is your responsibility

Important: Built In data may not be used for unlawful purposes. Read Built In's Terms of Service and use the data only for the legitimate business, research, recruiting, and journalism purposes for which the company publishes it.


Frequently Asked Questions

How fresh is the data?

Live at run time. Every run fetches the current HTML directly from builtin.com/company/{slug}. There is no caching layer between Built In's web server and the data you receive. If a company updated its tech stack 10 minutes before your run, you will see the new entries.

How many companies can I scrape in one run?

The actor itself imposes no hard cap; the maxRecords parameter is your limit. In practice, 100 companies take ~3–5 minutes and 1,000 take ~30–50 minutes at default politeness settings. For larger crawls, schedule multiple runs or increase maxConcurrency modestly.

Does this require a Built In account or API key?

No. Built In does not require authentication to view company profiles. The actor only needs your Apify token.

Why is the tech stack the key feature?

Because no other major company-data provider publishes the per-company tech stack labelled by category in a structured, openly-scrapable form. LinkedIn, Crunchbase, AngelList, and PitchBook either omit it entirely or hide it behind enterprise contracts costing tens of thousands of dollars per year. Built In publishes it openly because companies use Built In to recruit engineers, and engineers want to know what stack they'd be working on.

Which categories does Built In use?

The common set we have verified across live profiles: LANGUAGES, FRAMEWORKS, DATABASES, DEVOPS, CLOUD SERVICES, SALESFORCE, ANALYTICS, DESIGN, SEARCH ENGINES, COLLABORATION. The actor preserves whatever UPPERCASE label Built In renders. Anything without an explicit category label is grouped under OTHER so you never lose data.
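
If you need to rebuild the grouped view from the flat techStack array on the consumer side, the same rule can be sketched as below. This is an illustration of the grouping described above, not the actor's own code, and groupByCategory is a hypothetical helper name.

```javascript
// Group technology names by their category label; entries with a missing
// or blank label fall back to OTHER so nothing is dropped.
function groupByCategory(techStack) {
  const grouped = {};
  for (const item of techStack ?? []) {
    const label = (item.category ?? '').trim() || 'OTHER';
    (grouped[label] ??= []).push(item.name);
  }
  return grouped;
}
```

Passing null (a profile with no published stack) yields an empty object rather than throwing.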

How do I find a company's slug?

The slug is the last path segment of the Built In URL. For https://builtin.com/company/snowflake-computing-inc, the slug is snowflake-computing-inc. You can also paste the full URL into startUrls and the actor will parse the slug automatically.
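
To pre-validate a slug list before a run, the same parsing can be sketched on the client side. slugFromInput is a hypothetical helper and only approximates what the actor does internally. It takes the segment after /company/, which matches the last path segment for plain profile URLs.

```javascript
// Accept either a bare slug or a full Built In company URL.
function slugFromInput(input) {
  const value = String(input).trim();
  // Bare slugs contain no slash and pass through unchanged.
  if (!value.includes('/')) return value || null;
  let url;
  try {
    url = new URL(value);
  } catch {
    return null; // not a parseable absolute URL
  }
  const segments = url.pathname.split('/').filter(Boolean);
  // Expect /company/{slug}; query strings and trailing slashes are ignored.
  return segments[0] === 'company' && segments[1] ? segments[1] : null;
}
```

Inputs that are neither a bare slug nor a /company/ URL return null, so they can be filtered out before the run starts.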

What happens if a slug is wrong / 404?

The fetch returns either no HTML or a thin "not found" page; the actor's body-size check (> 1000 bytes) and selector validation (must find <h1> and tech section) drop empty rows rather than push garbage. If every slug 404s, the actor calls Actor.fail() so your pipeline sees an explicit failure.

Can I run this against startUrls and companySlugs at the same time?

Yes. Both inputs are merged into a single queue with URL-level deduplication. startUrls win when both refer to the same slug.
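
The merge rule can be pictured with the sketch below. It is an illustration rather than the actor's source: mergeInputs is a hypothetical name, startUrls entries are assumed to be either strings or {url} objects, and deduplication here is keyed on the slug, which is a simplification of the URL-level deduplication described above.

```javascript
// Merge both inputs into one queue; the first entry for a slug wins.
function mergeInputs(startUrls = [], companySlugs = []) {
  const queue = new Map(); // slug -> { slug, url, source }
  const add = (slug, url, source) => {
    if (slug && !queue.has(slug)) queue.set(slug, { slug, url, source });
  };
  // startUrls are added first so they take precedence over companySlugs.
  for (const entry of startUrls) {
    const url = typeof entry === 'string' ? entry : entry.url;
    const match = /\/company\/([^/?#]+)/.exec(url ?? '');
    add(match?.[1], url, 'startUrls');
  }
  for (const raw of companySlugs) {
    const slug = raw.trim();
    add(slug, `https://builtin.com/company/${slug}`, 'companySlugs');
  }
  return [...queue.values()];
}
```

Because startUrls are processed first, a URL with tracking parameters is kept as given while the duplicate slug from companySlugs is skipped.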

Does the actor handle multi-office companies?

Yes. The offices[] array captures every <address> element under the Offices section, so global companies like MongoDB, Stripe, and Snowflake come back with all their regional offices intact.

Can I disable job scraping?

Yes — set scrapeJobs: false. The actor skips the /job/... link extraction loop, shaving a small amount of parsing time when you only care about firmographics + tech stack.

Does this work for international companies?

Yes, anywhere Built In has a profile. While Built In's coverage skews to US tech hubs, it lists companies headquartered in Canada, the UK, Ireland, Germany, Australia, Singapore, and beyond. The actor's parsing logic is country-agnostic.

Is a residential proxy required?

No, not by default. Built In's anti-bot protection is light enough that direct datacenter IPs work for moderate loads. Enable Apify's US residential proxy if you crawl thousands of companies in a single run or hit rate-limit responses.

Does this work on the Apify Free Plan?

Yes — full functionality. Small runs (10–50 companies) typically cost only fractions of a cent in Compute Units, well within the free monthly allowance.

How is this different from BuiltWith / Wappalyzer / TheirStack?

Those products infer the tech stack: they look at HTTP headers, JS fingerprints, and DNS records to guess what technology a company runs. This actor extracts what companies explicitly publish about their stack to recruit engineers. That makes it far more accurate for backend services (databases, frameworks, languages) that don't leak to the public web, but it only covers companies that have a Built In profile.

Can I schedule this to run automatically?

Yes — Apify's built-in Scheduler supports hourly, daily, weekly, or arbitrary-cron schedules. A weekly run is the sweet spot for tracking tech-stack adoption changes; daily for active sales/recruiting watchlists.

What output formats are supported?

JSON, CSV, Excel (XLSX), HTML, XML, RSS, and JSON Lines — directly from the Apify dataset view or via the dataset items API.

How do I detect when a company adopts a new technology?

Schedule the actor weekly on a fixed slug list. After each run, compute the set difference between this run's techStack[] names and the previous run's. Any new entry is a technology the company has newly listed on its profile, which is a strong trigger for sales/DevRel outreach.
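
The set difference can be sketched as follows, assuming each record carries a slug field to join the two runs on (an assumed key name) and techStack entries with a name. newlyAdopted is a hypothetical helper.

```javascript
// Compare two runs and report technologies that appear only in the newer one.
function newlyAdopted(previousRun, currentRun) {
  const namesBySlug = new Map(
    previousRun.map((c) => [c.slug, new Set((c.techStack ?? []).map((t) => t.name))]),
  );
  const events = [];
  for (const company of currentRun) {
    const before = namesBySlug.get(company.slug);
    // Skip companies missing from the previous run: with no baseline,
    // every technology would look newly adopted.
    if (!before) continue;
    const added = (company.techStack ?? [])
      .map((t) => t.name)
      .filter((name) => !before.has(name));
    if (added.length > 0) events.push({ slug: company.slug, added });
  }
  return events;
}
```

A company whose techStack was null last run and populated this run reports its whole stack as added, which you may want to treat as a separate "profile published" signal.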

Why might techStack come back null for some companies?

Smaller companies, recently-claimed profiles, or non-engineering companies often choose not to publish a tech stack. The actor returns null rather than an empty array so you can distinguish "no data published" from "we tried and got zero items."

How do I report a bug or request a feature?

Use the Issues tab on the Apify Store actor page, or contact the developer directly through the Apify Console.




Comparison vs. Alternatives

| Approach | Setup time | Tech-stack granularity | Category labels | Updated | Cost (1,000 companies) |
| --- | --- | --- | --- | --- | --- |
| This actor | < 5 minutes | Per-company exact stack | Yes (LANGUAGES, DATABASES, etc.) | Live at run time | From $3.00 (listed pay-per-event price) |
| Manual copy-paste from Built In | Hours/days | High | Yes (manual) | Stale by hour 1 | Free + analyst hours |
| Custom Cheerio scraper (DIY) | 8–20 hours dev | Same | Same (if you debug it) | Live | Free + ongoing maintenance |
| BuiltWith / HG Insights | Days (sales cycle) | Inferred (fingerprint-based) | Some | Daily-ish | $5K–30K/year |
| TheirStack | Days | Inferred | Some | Daily-ish | $3K–15K/year |
| LinkedIn Sales Navigator | Hours | Partial (no stack) | None | Live | $1K+/year/seat |
| Crunchbase Enterprise | Days | Minimal | None | Daily | $30K+/year |

Why Pay-Per-Event Pricing?

Most data scrapers either charge a flat monthly subscription (you pay even on weeks you don't run) or per-Compute-Unit (unpredictable on long jobs). This actor uses pay-per-event, which means:

  • You only pay when the actor actually runs
  • Charges scale with how many companies you actually consume
  • Transparent, line-item billing inside Apify
  • No monthly minimums and no annual commitments
  • Cheap to evaluate: sample with maxRecords: 5 for pennies before any larger crawl
  • Predictable per-record cost makes ROI modelling easy

Changelog

| Version | Date | Notes |
| --- | --- | --- |
| 1.0.0 | 2026-05 | Initial public release: HTTP-only got-scraping + cheerio, full category-tagged tech stack, recent jobs, perks, offices, multi-social-link extraction, configurable politeness, optional proxy, Actor.fail() on zero records |

Keywords

BuiltIn scraper · BuiltIn tech stack scraper · builtin.com company scraper · builtin.com tech stack · tech stack database · company tech stack lookup · category-tagged tech stack · developer recruiting database · technical recruiter intel · B2B SaaS prospecting · tech vendor landscape · competitive intel scraper · tech company directory scraper · company technology fingerprint · BuiltIn company profile scraper · tech stack fingerprint · technology adoption signals · tech-trigger sales leads · LANGUAGES FRAMEWORKS DATABASES scraper · MongoDB user list · Datadog user list · Snowflake adopter list · Kubernetes user database · AWS GCP Azure adopter lookup · React Django Rails framework adoption · ML engineer recruiting database · DevOps tooling vendor landscape · CTO target sourcing · M&A tech-stack compatibility · YC startup tech stack · investor portfolio technical maturity · KubeCon sponsor list · Snowflake Summit prospect list · DevRel adoption tracking · OSS framework new-adopter detection · ABM target accounts · technology-triggered outbound · tech company directory API · BuiltIn Apify actor · Built In tech company scraper · BuiltIn job listings scraper · BuiltIn perks scraper · BuiltIn office locations · startup technology fingerprint · scale-up tech stack database · BuiltIn data extraction · BuiltIn HTML scraper · BuiltIn cheerio scraper · BuiltIn HTTP scraper


Support

  • Bug reports: Use the Issues tab on the Apify Store page
  • Feature requests: Same place — please describe your use case so we can prioritize correctly
  • Direct contact: Through the Apify developer profile

If this actor saves your sales / recruiting / DevRel team hours of manual research, a 5-star rating on the Apify Store helps other tech-go-to-market teams discover it. Thank you!