BuiltIn.com Tech Companies & Tech Stack Scraper
Pricing
from $3.00 / 1,000 results
Scrape Built In company profiles — name, industry, location, recent jobs, full tech stack with category labels (LANGUAGES / FRAMEWORKS / DATABASES / etc.). Unique technology-fingerprint data for B2B SaaS prospecting, recruiter intel and competitive analysis. HTTP-only.
Developer
Haketa
BuiltIn.com Tech Companies & Tech Stack Scraper — Category-Tagged Technology Fingerprints for Every Startup on Built In
The fastest way to extract category-labelled tech stacks (LANGUAGES / FRAMEWORKS / DATABASES / DEVOPS / CLOUD SERVICES / SALESFORCE) from every company profile on builtin.com. Pull the unique Technology We Use fingerprint that LinkedIn, Crunchbase, AngelList and PitchBook do not publish — and turn it into a B2B SaaS prospecting list, recruiter pipeline, or competitive-vendor landscape in minutes. HTTP-only, no browser, no login.
What This Actor Does
The BuiltIn.com Tech Companies & Tech Stack Scraper is a production-grade Apify Actor that turns any public Built In company profile (builtin.com/company/{slug}) into a clean, structured JSON record — including the single most valuable field Built In publishes that no other major company database carries in structured form: the per-company Technology We Use panel, tagged by category (LANGUAGES, FRAMEWORKS, DATABASES, DEVOPS, CLOUD SERVICES, SALESFORCE, ANALYTICS, DESIGN, etc.).
Built In is a tech-focused career platform used by thousands of startups, growth-stage scale-ups, and public tech companies to recruit engineering talent. To attract developers, these companies voluntarily publish a granular breakdown of the languages, frameworks, databases, cloud providers, observability stacks, and design tools they actually run in production — the kind of fingerprint that competitive-intelligence platforms typically gate behind $30,000-a-year contracts.
This actor extracts that fingerprint at HTTP speed (~1–2 seconds per company) so you can build:
- B2B SaaS prospect lists segmented by exact technology adoption ("every Built In company using MongoDB but not Snowflake")
- Recruiter pipelines filtered by stack ("every startup running PyTorch + AWS + Kubernetes")
- Competitive vendor-landscape maps ("Datadog vs New Relic vs Honeycomb adoption across the Built In ecosystem")
- Investor portfolio dashboards tracking technical maturity signals (DevOps + CI/CD adoption == scaling)
- Conference / sponsor outreach lists targeting users of a specific category-tagged technology
Every record returned includes the company's identity, industry, location, employee band, founded year, multi-office presence, full perks list, social links, recently posted jobs, and the full category-grouped tech stack — ready to drop into Postgres, Snowflake, BigQuery, Salesforce, HubSpot, or Google Sheets.
Entity types returned per company
- Identity: `name`, `slug`, `profileUrl`, `website`, `logoUrl`, `description`
- Firmographics: `industry`, `industries[]`, `location`, `offices[]`, `employeeCount`, `foundedYear`
- Tech stack: `techStack[]` (flat list with `{name, category, iconUrl}`), `techStackByCategory` (grouped object: `{ LANGUAGES: [...], FRAMEWORKS: [...], DATABASES: [...] }`), `techStackSize`
- Talent signals: `jobs[]` (recent postings: title, jobId, slug, jobUrl), `jobCount`, `perks[]`
- Social presence: `socialLinks[]` (LinkedIn, Twitter/X, Facebook, Instagram, YouTube, TikTok)
- Provenance: `sourceUrl`, `scrapedAt`
Why scrape Built In yourself when this exists?
Built In's public HTML looks deceptively simple — until you actually try to parse 500 company profiles in a row. Teams who try the DIY route consistently hit the same wall:
- The Technology We Use panel is rendered inside repeating `<div class="tech-icon-container">` blocks with no JSON-LD or microdata fallback. Every tech item is an `<img alt="...">` plus two sibling `<div>` elements, one for the tech name and one for the UPPERCASE category label. Naïve scrapers grab the names and lose the category entirely.
- Category labels are positional, not semantic. You cannot rely on a CSS class to know whether `MongoDB` belongs to DATABASES or to LANGUAGES; you have to walk the sibling structure and decide, for each child, whether it is an UPPERCASE category label or a tech name.
- Tabs split the stack by department (Engineering / Data / Design / Marketing). A single-pass scraper that doesn't traverse all tab containers will under-count the tech stack by 40–70%.
- Industry, employee count, founded year, and HQ live inside a fact list with inconsistent labels: sometimes `Total Employees`, sometimes `Team Size`, sometimes `Employees`; likewise `Headquarters` vs `Location` vs `HQ`.
- Multi-office companies (Stripe, Snowflake, MongoDB) list 3–10 office locations inside `<address>` tags scattered across the page. Manual parsers usually capture only the first.
- Job links use the `/job/{slug}/{numericId}` pattern, and the same job link can appear multiple times on the page (hero banner + recent jobs list + footer), so deduplication by `jobId` is required.
- Social links are mixed into a generic outbound-anchor blob alongside affiliate pixels, support links, and Built In's own internal `https://builtin.com/...` URLs. You need a filter on `(facebook|linkedin|twitter|x|youtube|instagram|tiktok).com` and an explicit exclusion of the Built In domain.
- 404 slugs return a styled "company not found" page (which still returns HTTP 200 in some cases), so your parser must validate that an `<h1>` and a `tech-icon-container` actually exist before pushing a row.
- Header generation matters. Built In aggressively serves a minimalist fallback page to bot-fingerprinted requests (no User-Agent rotation, no Accept-Language, no Sec-CH-UA hints).
- Politeness matters more. 50+ requests per minute from a single IP will trigger a soft block within a few minutes.
This actor solves all of that: realistic browser headers via got-scraping, 3-attempt retries with increasing backoff and jitter, polite per-request delay + concurrency limiter, full category-grouped tech-stack normalization, dedup by jobId, social-link filter, and Actor.fail() when zero records come back so you never silently ship an empty dataset to your downstream warehouse.
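The social-link filter and the job dedup mentioned above can be sketched in a few lines of Python. This is an illustration only, not the actor's code (the actor itself runs on got-scraping + cheerio); the function names `filter_social_links` and `dedupe_jobs` are hypothetical, and matching on the parsed hostname is an assumption standing in for the regex quoted in the list.

```python
import re
from urllib.parse import urlparse

# Hypothetical helper names; domains taken from the filter described above.
SOCIAL_DOMAINS = ("facebook.com", "linkedin.com", "twitter.com", "x.com",
                  "youtube.com", "instagram.com", "tiktok.com")
JOB_PATH = re.compile(r"^/job/([^/]+)/(\d+)/?$")


def _matches(host, domain):
    """True when host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def filter_social_links(urls):
    """Keep social-network URLs, drop builtin.com links, preserve order."""
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if _matches(host, "builtin.com"):
            continue  # explicit exclusion of Built In's own links
        if any(_matches(host, d) for d in SOCIAL_DOMAINS) and url not in kept:
            kept.append(url)
    return kept


def dedupe_jobs(hrefs):
    """Collapse repeated /job/{slug}/{numericId} links to one entry per jobId."""
    jobs = {}
    for href in hrefs:
        match = JOB_PATH.match(urlparse(href).path)
        if match and match.group(2) not in jobs:
            slug, job_id = match.groups()
            jobs[job_id] = {
                "jobId": job_id,
                "slug": slug,
                "jobUrl": f"https://builtin.com/job/{slug}/{job_id}",
            }
    return list(jobs.values())
```

Matching on the hostname rather than on the raw URL string avoids false positives such as a redirect link that merely carries `x.com` in its query string.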
Quick Start
One-Click Run
- Click "Try for free" on the Apify Store page
- Add a handful of company slugs (e.g. `mongodb`, `datadog`, `stripe`, `plaid`)
- Hit Start. A typical run takes under 30 seconds for 10 companies
- Download the dataset as JSON, CSV, Excel, HTML, XML, or RSS directly from the Apify dataset view
API Run (Python)
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("haketa/builtin-tech-companies-scraper").call(run_input={
    "companySlugs": ["mongodb", "datadog", "stripe", "snowflake-computing-inc", "plaid"],
    "scrapeJobs": True,
    "maxRecords": 100,
    "requestDelay": 1000,
    "maxConcurrency": 3,
})

# Print each company's stack, grouped by category
for company in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(company["name"], "→", company["techStackSize"], "tech items")
    if company.get("techStackByCategory"):
        for category, items in company["techStackByCategory"].items():
            print(f"  {category}: {', '.join(items)}")
```
API Run (Node.js / TypeScript)
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('haketa/builtin-tech-companies-scraper').call({
    startUrls: [
        { url: 'https://builtin.com/company/mongodb' },
        { url: 'https://builtin.com/company/datadog' },
    ],
    scrapeJobs: true,
    maxRecords: 50,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();

// Find every company running Kubernetes
const k8sUsers = items.filter(c =>
    c.techStack?.some(t => t.name === 'Kubernetes')
);
console.log(`${k8sUsers.length} companies on Kubernetes`);
```
API Run (cURL)
```bash
curl -X POST "https://api.apify.com/v2/acts/haketa~builtin-tech-companies-scraper/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "companySlugs": ["mongodb", "datadog", "stripe"],
    "scrapeJobs": true,
    "maxRecords": 100
  }'
```
How It Works
Built In serves fully server-rendered HTML for every /company/{slug} URL — no SPA, no GraphQL, no client-side hydration required to see the data. That makes this actor HTTP-only via got-scraping + cheerio, which keeps cost and runtime an order of magnitude lower than browser-based equivalents.
Source endpoints
| Source URL pattern | Purpose | Example |
|---|---|---|
| `https://builtin.com/company/{slug}` | Single company profile page | `https://builtin.com/company/mongodb` |
| `https://builtin.com/job/{slug}/{numericId}` | Individual job posting (extracted as link only) | `https://builtin.com/job/senior-backend-engineer/123456` |
Architecture
- Direct HTTPS GET with realistic Chrome 120+ desktop headers (User-Agent, Accept-Language, Sec-CH-UA), generated per request by `got-scraping`'s header generator
- No headless browser (no Puppeteer, no Playwright, no Chrome), which keeps runtime at ~1–2 seconds per company and memory < 256 MB
- Cheerio HTML parsing with targeted selectors per data section
- 3-attempt retry with increasing backoff + jitter (2 s × attempt + random 0–1500 ms) on HTTP failures or thin-body responses
- Polite request pacing: configurable `requestDelay` (default 1000 ms) per fetch + jitter to avoid burst-pattern detection
- Concurrency limiter: an async worker pool (`maxConcurrency`, default 3) processes the slug queue in parallel without overwhelming Built In
- Tech stack normalizer: walks `.tech-icon-container` blocks, extracts each `<img alt="...">` as the tech name, finds the sibling UPPERCASE `<div>` as the category, and emits both a flat `techStack[]` array and a grouped `techStackByCategory` object
- Job deduplication: multiple appearances of the same `/job/{slug}/{id}` link on a page collapse to a single entry keyed on `jobId`
- Social-link filter: matches `(facebook|instagram|twitter|x|linkedin|youtube|tiktok).com` while excluding any `builtin.com` URL
- Office multi-extraction: captures every `<address>` element within the Offices section so multi-location companies (Stripe, Snowflake, MongoDB) come through fully
- Hard fail on empty: `Actor.fail()` triggers if zero rows are written, so downstream pipelines never silently consume an empty dataset
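To make the tech stack normalizer step concrete, here is a minimal Python sketch using only the standard library. It is not the actor's implementation (which uses cheerio); the class and function names are hypothetical, and the markup shape is assumed from the description above: an `<img alt>` followed by sibling `<div>` elements inside each `tech-icon-container`. The live page may differ in detail.

```python
from html.parser import HTMLParser


class TechStackParser(HTMLParser):
    """Collects {name, category, iconUrl} from tech-icon-container blocks."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._depth = 0       # <div> nesting depth inside the current container
        self._current = None  # item being assembled, or None outside a container

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._depth > 0:
                self._depth += 1
            elif "tech-icon-container" in (attrs.get("class") or "").split():
                self._depth = 1
                self._current = {"name": None, "category": "OTHER", "iconUrl": None}
        elif tag == "img" and self._depth > 0:
            self._current["name"] = attrs.get("alt") or None
            self._current["iconUrl"] = attrs.get("src")

    def handle_endtag(self, tag):
        if tag != "div" or self._depth == 0:
            return
        self._depth -= 1
        if self._depth == 0:  # container closed: emit the item
            if self._current["name"]:
                self.items.append(self._current)
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if self._depth == 0 or not text or text == self._current["name"]:
            return
        if text.isupper():    # assumption: an UPPERCASE sibling is the category
            self._current["category"] = text
        elif self._current["name"] is None:
            self._current["name"] = text


def parse_tech_stack(html):
    """Return the flat techStack list for one page of HTML."""
    parser = TechStackParser()
    parser.feed(html)
    parser.close()
    return parser.items
```

Because the `alt` text is read first, an all-caps tech name such as AWS is not mistaken for a category: text equal to the already-known name is skipped. Items without a category label fall back to `OTHER`.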
Proxy
Proxy is optional and disabled by default. Built In's anti-bot is light enough that direct datacenter IPs work for moderate loads. If you scale to thousands of companies per run or your IP ends up rate-limited, enable Apify Residential US through the standard proxy configuration block.
Input Parameters
```json
{
    "startUrls": [
        { "url": "https://builtin.com/company/mongodb" },
        { "url": "https://builtin.com/company/datadog" }
    ],
    "companySlugs": ["stripe", "snowflake-computing-inc", "plaid"],
    "scrapeJobs": true,
    "maxRecords": 100,
    "requestDelay": 1000,
    "maxConcurrency": 3,
    "proxyConfiguration": { "useApifyProxy": false }
}
```
Parameter reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| `startUrls` | `array<object\|string>` | `[]` | Paste any Built In company URL (`https://builtin.com/company/{slug}`). The actor parses the slug out of each URL. Can be combined with `companySlugs`; the two lists are merged and de-duplicated. |
| `companySlugs` | `array<string>` | `["mongodb"]` | Built In company URL slugs. Examples: `mongodb`, `datadog`, `stripe`, `snowflake-computing-inc`, `plaid`. Each runs as a separate task. Slugs are normalized to lower-case and stripped of any leading `/company/` prefix. |
| `scrapeJobs` | `boolean` | `true` | When true, the parser extracts up to 30 unique recent job postings per company (title, jobId, slug, jobUrl). Disable to shave a small amount of parsing time when you only care about firmographics + tech stack. |
| `maxRecords` | `integer` | `100` | Hard cap on total companies saved. Set `0` for unlimited. Useful for sampling. |
| `requestDelay` | `integer` (ms) | `1000` | Delay between company-page fetches (plus 0–500 ms jitter). 800–2000 ms is the polite zone. |
| `maxConcurrency` | `integer` | `3` | Parallel company-page fetches. 2–4 is safe; values > 5 risk triggering soft blocks. |
| `proxyConfiguration` | `object` | `{ "useApifyProxy": false }` | Optional. Built In has light anti-bot protection, so a proxy is generally not required for moderate use. Enable Apify Residential US if you scale to thousands of companies per run. |
Tip: If you provide both `startUrls` and `companySlugs`, the actor merges and de-duplicates the queue, so you can mix paste-in URLs from a teammate with a programmatic list from your data warehouse without writing dedup logic yourself.
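As a rough picture of that merge step, the sketch below normalizes both inputs to slugs and de-duplicates them while preserving order. The helper names `normalize_slug` and `build_queue` are hypothetical, and the exact normalization rules the actor applies may differ from the ones assumed here (lower-casing and stripping a leading `/company/` prefix, as described in the parameter table).

```python
import re

# Hypothetical helpers; the regex is an assumption about the URL shape.
SLUG_FROM_URL = re.compile(r"builtin\.com/company/([^/?#]+)", re.IGNORECASE)


def normalize_slug(value):
    """Turn a profile URL or a raw slug into a bare lower-case slug."""
    value = value.strip()
    match = SLUG_FROM_URL.search(value)
    if match:
        value = match.group(1)
    value = value.lower().strip("/")
    if value.startswith("company/"):
        value = value[len("company/"):]
    return value.strip("/")


def build_queue(start_urls, company_slugs):
    """Merge both inputs into one ordered, de-duplicated slug queue."""
    queue = []
    for entry in list(start_urls) + list(company_slugs):
        # startUrls entries may be {"url": ...} objects or plain strings
        raw = entry.get("url", "") if isinstance(entry, dict) else entry
        slug = normalize_slug(raw)
        if slug and slug not in queue:
            queue.append(slug)
    return queue
```

For example, a `startUrls` entry for `https://builtin.com/company/MongoDB` and a `companySlugs` entry of `mongodb` end up as a single `mongodb` task.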
Output Schema
Every row is one company. All fields are nullable so you can ingest the dataset into a strict schema (Postgres, BigQuery) without per-record branching.
Identity & firmographics
| Field | Type | Description |
|---|---|---|
| `name` | string | Company name as displayed in the `<h1>` of the profile page (e.g. MongoDB, Datadog) |
| `slug` | string | URL slug (`mongodb`, `snowflake-computing-inc`) |
| `profileUrl` | string | Canonical Built In URL: `https://builtin.com/company/{slug}` |
| `website` | string | Company's own website (first external non-Built In link on the page) |
| `description` | string | One-paragraph company description from `meta[name="description"]` |
| `industry` | string | Primary industry (e.g. Cloud · Information Technology · Software) |
| `industries` | array<string> | All industries when more than one is listed |
| `location` | string | Headquarters string (e.g. New York, NY) |
| `offices` | array<string> | All listed office locations (multi-office companies) |
| `employeeCount` | string | Employee band as Built In reports it (e.g. 1,000-5,000, 5000+) |
| `foundedYear` | string | Four-digit founding year |
| `logoUrl` | string | URL of the company logo from `meta[property="og:image"]` |
Tech stack (the unique field)
| Field | Type | Description |
|---|---|---|
| `techStack` | array<object> | Flat list of every technology, each item: `{ "name": "MongoDB", "category": "DATABASES", "iconUrl": "https://..." }` |
| `techStackByCategory` | object | Grouped object: `{ "LANGUAGES": ["JavaScript", "Java"], "DATABASES": ["MongoDB"], "FRAMEWORKS": ["Django", "Kubernetes"] }` |
| `techStackSize` | integer | Total count of tech items extracted across all categories |
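The three fields are views of the same data. The following sketch shows one way `techStackByCategory` and `techStackSize` could be derived from the flat `techStack` list; the function name `group_by_category` is hypothetical, and returning `None` for an empty stack is an assumption chosen to mirror the nullable schema.

```python
def group_by_category(tech_stack):
    """Group a flat techStack list into {CATEGORY: [names...]}."""
    grouped = {}
    for item in tech_stack or []:
        category = item.get("category") or "OTHER"  # catch-all fallback
        names = grouped.setdefault(category, [])
        if item["name"] not in names:
            names.append(item["name"])
    return grouped or None  # None when the company lists no stack


tech_stack = [
    {"name": "Java", "category": "LANGUAGES", "iconUrl": None},
    {"name": "Django", "category": "FRAMEWORKS", "iconUrl": None},
    {"name": "MongoDB", "category": "DATABASES", "iconUrl": None},
    {"name": "JavaScript", "category": "LANGUAGES", "iconUrl": None},
]
by_category = group_by_category(tech_stack)
tech_stack_size = len(tech_stack)
```

Categories appear in the order they are first seen, and categories with no items never appear as keys.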
Talent & culture
| Field | Type | Description |
|---|---|---|
| `jobs` | array<object> | Up to 30 recent job postings, each: `{ "title": "Senior Backend Engineer", "jobId": "123456", "slug": "senior-backend-engineer", "jobUrl": "https://builtin.com/job/..." }` |
| `jobCount` | integer | Total unique jobs extracted (post-dedup by `jobId`) |
| `perks` | array<string> | Up to 50 perks/benefits as Built In lists them (e.g. Unlimited PTO, 401(k) matching, Remote-friendly) |
| `socialLinks` | array<string> | LinkedIn, Twitter/X, Facebook, Instagram, YouTube, TikTok URLs |
Provenance
| Field | Type | Description |
|---|---|---|
| `sourceUrl` | string | The exact URL fetched (matches `profileUrl`) |
| `scrapedAt` | string | ISO-8601 timestamp captured at the start of the run |
Example: MongoDB record (verified against live page)
```json
{
    "name": "MongoDB",
    "slug": "mongodb",
    "profileUrl": "https://builtin.com/company/mongodb",
    "website": "https://www.mongodb.com",
    "description": "MongoDB is the world's leading modern database platform...",
    "industry": "Big Data · Cloud · Database · Software",
    "industries": ["Big Data", "Cloud", "Database", "Software"],
    "location": "New York, NY",
    "offices": ["New York, NY", "Palo Alto, CA", "Austin, TX", "Dublin, Ireland", "Sydney, Australia"],
    "employeeCount": "1,000-5,000",
    "foundedYear": "2007",
    "logoUrl": "https://cdn.builtin.com/logos/mongodb.png",
    "techStack": [
        { "name": "C++", "category": "LANGUAGES", "iconUrl": "https://cdn.builtin.com/tech/cpp.svg" },
        { "name": "Java", "category": "LANGUAGES", "iconUrl": "https://cdn.builtin.com/tech/java.svg" },
        { "name": "JavaScript", "category": "LANGUAGES", "iconUrl": "https://cdn.builtin.com/tech/javascript.svg" },
        { "name": "Golang", "category": "LANGUAGES", "iconUrl": "https://cdn.builtin.com/tech/golang.svg" },
        { "name": "Django", "category": "FRAMEWORKS", "iconUrl": "https://cdn.builtin.com/tech/django.svg" },
        { "name": "GraphQL", "category": "FRAMEWORKS", "iconUrl": "https://cdn.builtin.com/tech/graphql.svg" },
        { "name": "Kubernetes", "category": "FRAMEWORKS", "iconUrl": "https://cdn.builtin.com/tech/kubernetes.svg" },
        { "name": "MongoDB", "category": "DATABASES", "iconUrl": "https://cdn.builtin.com/tech/mongodb.svg" }
    ],
    "techStackByCategory": {
        "LANGUAGES": ["C++", "Java", "JavaScript", "Golang"],
        "FRAMEWORKS": ["Django", "GraphQL", "Kubernetes"],
        "DATABASES": ["MongoDB"]
    },
    "techStackSize": 8,
    "perks": ["Unlimited PTO", "401(k) matching", "Equity", "Remote-friendly", "Health insurance"],
    "jobs": [
        {
            "title": "Senior Backend Engineer, Atlas",
            "jobId": "234567",
            "slug": "senior-backend-engineer-atlas",
            "jobUrl": "https://builtin.com/job/senior-backend-engineer-atlas/234567"
        },
        {
            "title": "Staff Site Reliability Engineer",
            "jobId": "234568",
            "slug": "staff-site-reliability-engineer",
            "jobUrl": "https://builtin.com/job/staff-site-reliability-engineer/234568"
        }
    ],
    "jobCount": 12,
    "socialLinks": [
        "https://www.linkedin.com/company/mongodb",
        "https://twitter.com/MongoDB",
        "https://www.youtube.com/user/MongoDB"
    ],
    "sourceUrl": "https://builtin.com/company/mongodb",
    "scrapedAt": "2026-05-18T10:00:00.000Z"
}
```
Example: Slim record (small startup with no tech stack listed)
```json
{
    "name": "ExampleCo",
    "slug": "exampleco",
    "profileUrl": "https://builtin.com/company/exampleco",
    "website": "https://www.example.co",
    "description": "Series A fintech startup focused on consumer credit.",
    "industry": "Fintech",
    "industries": null,
    "location": "Austin, TX",
    "offices": null,
    "employeeCount": "11-50",
    "foundedYear": "2022",
    "logoUrl": "https://cdn.builtin.com/logos/exampleco.png",
    "techStack": null,
    "techStackByCategory": null,
    "techStackSize": null,
    "perks": ["Equity", "Remote-friendly"],
    "jobs": [
        {
            "title": "Founding Engineer",
            "jobId": "99999",
            "slug": "founding-engineer",
            "jobUrl": "https://builtin.com/job/founding-engineer/99999"
        }
    ],
    "jobCount": 1,
    "socialLinks": ["https://www.linkedin.com/company/exampleco"],
    "sourceUrl": "https://builtin.com/company/exampleco",
    "scrapedAt": "2026-05-18T10:00:00.000Z"
}
```
Tech Stack Category Reference
Built In groups technologies under a closed set of UPPERCASE category labels. Knowing the canonical set helps you write downstream WHERE category IN (...) queries with confidence:
| Category | What goes there | Examples (verified on live company pages) |
|---|---|---|
| `LANGUAGES` | Programming languages | C++, Java, JavaScript, TypeScript, Python, Go (Golang), Ruby, Scala, Kotlin, Swift, Rust, PHP, C#, R |
| `FRAMEWORKS` | App / web / orchestration frameworks | React, Vue, Angular, Django, Flask, Rails, Spring, Express, Next.js, Node.js, Kubernetes, GraphQL, gRPC |
| `DATABASES` | OLTP / OLAP / NoSQL / cache | MongoDB, Postgres, MySQL, Redis, Cassandra, DynamoDB, Elasticsearch, Snowflake, BigQuery, Redshift |
| `DEVOPS` | CI/CD, observability, IaC | Terraform, Ansible, Jenkins, CircleCI, GitHub Actions, Datadog, New Relic, Splunk, PagerDuty |
| `CLOUD SERVICES` | Hyperscalers + managed services | AWS, GCP, Azure, Heroku, Vercel, Cloudflare, DigitalOcean |
| `SALESFORCE` | Salesforce ecosystem | Salesforce Sales Cloud, Service Cloud, Marketing Cloud, Pardot, Apex |
| `ANALYTICS` | BI / product analytics | Looker, Tableau, Mixpanel, Amplitude, Heap, Segment |
| `DESIGN` | Design tooling | Figma, Sketch, Adobe XD, InVision |
| `SEARCH ENGINES` | Dedicated search | Elasticsearch, Algolia, Solr, OpenSearch |
| `COLLABORATION` | Internal team tools | Slack, Asana, Jira, Linear, Notion, Confluence |
| `OTHER` | Anything not category-tagged on the page | Catch-all fallback (the scraper emits `category: "OTHER"` rather than dropping the item) |
The exact set of categories rendered for a given company depends on what its recruiting team chose to publish. Empty categories are not emitted to `techStackByCategory`.
Use Cases
B2B SaaS Prospecting & Tech-Stack-Triggered Outbound
The single highest-ROI use of category-tagged tech stack data is technology-trigger sales. Sales teams use this actor to:
- Build "every company on technology X" lists: pull all 100+ Built In companies using Snowflake to pitch dbt Cloud, all companies on Kubernetes to pitch managed K8s, all Datadog users to pitch a Datadog alternative
- Build "every company NOT on technology Y" lists: every company whose DATABASES entry includes Postgres but not Snowflake is a candidate for a Snowflake migration pipeline
- Score accounts by tech maturity: companies with Terraform + Kubernetes + Datadog signal a mature DevOps practice and qualify for higher-ACV plans
- Auto-segment your CRM: pipe scraped data into Salesforce/HubSpot and tag accounts with custom tech-stack-derived fields (`uses_mongodb=true`, `cloud_provider=aws`)
- Time outreach to job-posting signals: pair tech-stack triggers with the `jobs[]` field ("posted a Senior Platform Engineer role and uses Kubernetes" == active buyer)
- Replace expensive sources: BuiltWith / HG Insights / TheirStack subscriptions start at $5K/year for the same fingerprint data Built In publishes openly
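The "uses X but not Y" lists above reduce to a small set filter over the dataset items. The sketch below is one way to write it; the helper names are hypothetical, and case-insensitive name matching is an assumption made to tolerate spelling variants such as `Postgres` vs `postgres`.

```python
def tech_names(company):
    """Lower-cased set of technology names; empty when techStack is null."""
    return {t["name"].lower() for t in (company.get("techStack") or [])}


def uses_but_not(companies, include, exclude):
    """Companies whose stack contains `include` but not `exclude`."""
    include, exclude = include.lower(), exclude.lower()
    return [
        c for c in companies
        if include in tech_names(c) and exclude not in tech_names(c)
    ]


# Illustrative records, not real scraped data
companies = [
    {"name": "A", "techStack": [{"name": "Postgres"}, {"name": "Snowflake"}]},
    {"name": "B", "techStack": [{"name": "Postgres"}]},
    {"name": "C", "techStack": None},
]
targets = uses_but_not(companies, "Postgres", "Snowflake")
print([c["name"] for c in targets])
```

Records with a null `techStack` simply never match, so slim records need no special handling.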
Technical Recruiting & Talent Pipeline Intel
Recruiters and talent acquisition teams use the dataset to:
- Find every company using a target stack: "all Built In companies running PyTorch + AWS + Kubernetes" for ML-engineer placement
- Source competitor talent: when placing a candidate from Stripe, pull every Built In company on Stripe's stack as a target shortlist
- Build candidate-to-company matchmaking: given a candidate's resume tech list, rank Built In companies by stack-overlap percentage
- Monitor competitor hiring velocity: `jobCount` per company per week is a leading indicator of growth / layoffs
- Identify niche-stack employers: companies with Elixir, Rust, Clojure, or Haskell in `LANGUAGES` for hard-to-source talent
- Benchmark perks by stack: compare `perks[]` arrays across companies with comparable tech stacks for offer-package research
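For the candidate-to-company matchmaking idea, one possible definition of "stack-overlap percentage" is the share of a candidate's skills that appear in a company's stack. That definition, and the function names below, are assumptions for illustration; other definitions (for example Jaccard similarity) would work equally well.

```python
def stack_overlap(candidate_skills, company):
    """Percentage of the candidate's skills found in the company's techStack."""
    skills = {s.lower() for s in candidate_skills}
    stack = {t["name"].lower() for t in (company.get("techStack") or [])}
    if not skills:
        return 0.0
    return 100.0 * len(skills & stack) / len(skills)


def rank_companies(candidate_skills, companies):
    """(score, name) pairs, best match first; ties broken alphabetically."""
    scored = [(stack_overlap(candidate_skills, c), c["name"]) for c in companies]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Illustrative records, not real scraped data
skills = ["Python", "AWS", "Kubernetes", "PyTorch"]
companies = [
    {"name": "Alpha", "techStack": [{"name": "AWS"}, {"name": "Kubernetes"}]},
    {"name": "Beta", "techStack": [{"name": "Python"}, {"name": "AWS"},
                                   {"name": "Kubernetes"}, {"name": "PyTorch"}]},
    {"name": "Gamma", "techStack": None},
]
ranking = rank_companies(skills, companies)
```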
Competitive Vendor-Landscape Mapping
Product, marketing, and CI teams use this to map adoption across categories:
- Observability landscape — Datadog vs New Relic vs Honeycomb vs Splunk vs Grafana share-of-stack across the Built In universe
- Data warehouse landscape — Snowflake vs Databricks vs BigQuery vs Redshift adoption by company size
- CI/CD landscape — CircleCI vs GitHub Actions vs Jenkins vs GitLab CI penetration
- Frontend framework landscape — React vs Vue vs Angular vs Svelte across new vs established companies
- Quarterly trend reports — re-run the actor against a fixed slug list every quarter and diff results to spot adoption swings
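A share-of-stack figure like the ones above can be computed by counting, for each vendor, the fraction of companies that list it. In this sketch the function name is hypothetical, and companies with no published stack are excluded from the denominator, which is a modelling assumption you may want to change.

```python
from collections import Counter


def adoption_share(companies, vendors):
    """Fraction of companies (with a published stack) that list each vendor."""
    counts = Counter()
    with_stack = 0
    for company in companies:
        stack = {t["name"].lower() for t in (company.get("techStack") or [])}
        if not stack:
            continue  # no published stack: excluded from the denominator
        with_stack += 1
        for vendor in vendors:
            if vendor.lower() in stack:
                counts[vendor] += 1
    return {v: (counts[v] / with_stack if with_stack else 0.0) for v in vendors}


# Illustrative records, not real scraped data
companies = [
    {"slug": "a", "techStack": [{"name": "Datadog"}, {"name": "AWS"}]},
    {"slug": "b", "techStack": [{"name": "Datadog"}, {"name": "New Relic"}]},
    {"slug": "c", "techStack": [{"name": "Splunk"}]},
    {"slug": "d", "techStack": None},
]
share = adoption_share(companies, ["Datadog", "New Relic", "Honeycomb"])
```

Running the same function on datasets from two different quarters and subtracting the results gives the adoption swing per vendor.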
Investor / VC Portfolio Analytics
Venture capital firms and growth-equity investors use tech-stack data as a leading indicator:
- Technical-maturity scoring: adoption of infrastructure-as-code (e.g. Terraform) and observability stacks by portfolio companies is a scaling-readiness signal
- Stack-overlap analysis: for buy-side deals, compare a target's stack to the firm's existing portfolio to estimate integration cost
- Trend-spot emerging tooling: track quarter-over-quarter adoption of new dev tools across 200 portfolio companies
- Comp-set construction: build a peer comp set from Built In companies with the same employee band + same primary tech category
- Sector-thesis validation: confirm "AI infra spend is up" by counting growth in PyTorch / Triton / vLLM adoption across the dataset
Conference, Event & Sponsor Sales
Conference organizers and B2B events teams use this to target outreach:
- KubeCon: pull every company with Kubernetes + Istio in their stack as a sponsor / attendee target list
- Snowflake Summit: every Built In company with Snowflake in DATABASES becomes a registrant CRM record
- Re:Invent / Google Cloud Next: segment by `CLOUD SERVICES` adoption for hyperscaler-specific events
- DevOps Days: pull every company with `DEVOPS` category items for regional event marketing
- Stack-specific meetups: Rust meetup organizers can pull every Built In company with Rust in `LANGUAGES`
DevRel & Open-Source Community Building
Developer-relations teams use tech stack data to grow communities:
- Find new adopters: schedule the actor weekly and diff `techStack` arrays to detect companies that newly adopted your OSS framework (LangChain, Next.js, Drizzle ORM, etc.)
- Build customer-story pipelines: identify companies running your stack at scale as case-study candidates
- Power developer-marketing campaigns: generate "X companies are building on [your framework]" landing-page social proof, auto-updated weekly
- Find prospective contributors: companies running your project commercially are the most likely source of upstream contributions
- Targeted ad creative: Built In employer logos make for high-trust social-proof ad units
Executive Search & Headhunting
Headhunters and executive-search firms use the stack + jobs fingerprint to:
- Identify CTO targets by exact tech-stack overlap with the hiring brief
- Map VP of Engineering candidates at companies in a comparable employee band running similar infrastructure
- Build longlist + shortlist for Director-level technical roles in days, not weeks
- Track when an exec is hiring: `jobCount` spikes correlate with leadership-team expansion windows
- Cross-reference with `perks[]` to identify companies offering equity / remote / unlimited PTO as part of the pitch
M&A Scouting & Acquisition Due Diligence
Corporate development and M&A advisors use stack data for build-vs-buy and integration planning:
- Tech-stack compatibility filter — score acquisition targets by stack overlap with the acquirer
- Integration cost modelling — incompatibility-heavy targets (Python shop acquiring a Java shop) increase post-deal integration timelines
- Talent-retention modelling — engineering teams with niche stacks (Erlang, OCaml) are higher flight risks post-acquisition
- Tuck-in candidate sourcing — find small Built In companies in DATABASES with a complementary stack to your portfolio company
- Counter-bid intelligence — when a competitor acquires, scrape every comparable Built In company to identify likely next targets
Market Research & Analyst Reports
Industry analysts, equity research, and tech-trade journalists use the dataset to:
- Quantify adoption trends: produce "what % of Built In companies run Snowflake in 2026" data points
- Year-over-year shift reporting: re-run quarterly and produce category-level migration narratives ("share of companies running Postgres dropped 8% as Aurora adoption rose")
- Geographic adoption maps: group `techStackByCategory` results by `location` to map regional stack preferences
- Employee-band-segmented reports: compare stack composition for 11–50 vs 1,000+ employee companies
- Source quotable data for white papers, investor decks, and trade-publication articles
Programmatic Ad Targeting & Custom Audiences
Marketing teams use stack data to build hyper-relevant ad audiences:
- LinkedIn Matched Audiences: upload Built In companies + their LinkedIn URLs (from `socialLinks[]`) as company-targeting audiences
- Facebook Custom Audiences: same approach for B2B awareness campaigns
- Programmatic display retargeting: segment ad creative by detected tech stack (different ad to AWS shops vs Azure shops)
- Account-based-marketing (ABM): power the top of an ABM funnel with stack-defined target accounts that map exactly to your ICP
- Email-warming sequences: pair the company list with tech-stack-personalized subject lines and lead-magnet content
Sample Queries & Recipes
Recipe 1: Single-company deep pull
```json
{
    "companySlugs": ["mongodb"],
    "scrapeJobs": true,
    "maxRecords": 1
}
```
Goal: get one perfect record for schema validation, dashboard prototyping, or demo screenshots.
Recipe 2: Snowflake-adoption hit list
```json
{
    "companySlugs": [
        "stripe", "plaid", "discord", "instacart",
        "doordash", "robinhood", "coinbase", "airtable",
        "notion", "linear-app", "figma", "vercel"
    ],
    "scrapeJobs": false,
    "maxRecords": 50
}
```
Then filter downstream:
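```python
snowflake_users = [
    c for c in items
    # techStackByCategory is null for companies with no published stack,
    # so fall back to empty containers before looking up DATABASES
    if "Snowflake" in ((c.get("techStackByCategory") or {}).get("DATABASES") or [])
]
```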
Goal: a Snowflake sales rep pulls their territory of warm tech-trigger leads in one run.
Recipe 3: Kubernetes + DevOps maturity scoring
```json
{
    "startUrls": [
        { "url": "https://builtin.com/company/datadog" },
        { "url": "https://builtin.com/company/snyk" },
        { "url": "https://builtin.com/company/honeycomb-io" }
    ],
    "maxRecords": 20
}
```
Goal: every company on K8s + a DEVOPS category entry gets a +1 maturity score in a DevOps-tooling vendor's lead-scoring model.
Recipe 4: ML / AI infrastructure prospects
```json
{
    "companySlugs": [
        "openai", "anthropic", "hugging-face", "scale-ai",
        "weights-biases", "databricks", "pinecone", "weaviate"
    ],
    "scrapeJobs": true,
    "maxRecords": 30
}
```
Goal: identify ML platform spend signals — PyTorch + AWS + Kubernetes in the stack PLUS active ML-engineer roles in jobs[] = active buyer for vector DBs, GPU clouds, or MLOps tooling.
Recipe 5: Multi-office expansion targets
```json
{
    "companySlugs": ["mongodb", "stripe", "snowflake-computing-inc"],
    "scrapeJobs": false,
    "maxRecords": 50
}
```
Goal: any company with 3+ entries in offices[] qualifies as a multi-region target for enterprise contract software (HRIS, payroll, global benefits).
Recipe 6: New-adopter tracking (weekly diff)
```json
{
    "companySlugs": [
        "vercel", "linear-app", "supabase",
        "planetscale", "neon-tech", "railway-app"
    ],
    "scrapeJobs": false,
    "maxRecords": 50
}
```
Schedule weekly. Diff techStack[] arrays week-over-week to find companies that newly adopted your DevRel team's framework.
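The week-over-week diff can be as simple as a set difference per slug. The sketch below assumes you have loaded last week's and this week's dataset items into two lists; the function name `new_adoptions` is hypothetical, and skipping companies that have no baseline from the previous run is a design choice, not something the actor prescribes.

```python
def new_adoptions(previous, current):
    """Map slug -> sorted list of tech names present now but not last run."""
    baseline = {
        c["slug"]: {t["name"] for t in (c.get("techStack") or [])}
        for c in previous
    }
    added = {}
    for company in current:
        slug = company["slug"]
        if slug not in baseline:
            continue  # first sighting: nothing to compare against
        names = {t["name"] for t in (company.get("techStack") or [])}
        diff = sorted(names - baseline[slug])
        if diff:
            added[slug] = diff
    return added


# Illustrative records, not real scraped data
last_week = [
    {"slug": "vercel", "techStack": [{"name": "Next.js"}]},
    {"slug": "neon-tech", "techStack": None},
]
this_week = [
    {"slug": "vercel", "techStack": [{"name": "Next.js"}, {"name": "Rust"}]},
    {"slug": "neon-tech", "techStack": [{"name": "Postgres"}]},
    {"slug": "supabase", "techStack": [{"name": "Postgres"}]},
]
changes = new_adoptions(last_week, this_week)
```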
Recipe 7: High-throughput crawl with politeness
```json
{
    "companySlugs": [
        "mongodb", "datadog", "stripe", "snowflake-computing-inc",
        "plaid", "robinhood", "coinbase", "airbnb", "doordash"
    ],
    "scrapeJobs": true,
    "maxRecords": 500,
    "requestDelay": 1500,
    "maxConcurrency": 2
}
```
Goal: large-batch overnight run, optimized for "never get blocked" rather than for raw speed.
Integration Examples
Google Sheets (via Apify Integration)
- Schedule the actor (e.g. weekly Sunday 22:00 UTC) with a fixed slug list
- Add the "Export to Google Sheets" integration to the schedule
- Receive a fresh Built In company sheet every week, including the JSON-stringified `techStackByCategory` field for spreadsheet-side filtering
Make.com / Zapier / n8n
Use the Apify connector on any major automation platform. Trigger downstream workflows on:
- New companies appearing in your watch list
- New tech-stack items added since last run (auto-detect adoption events)
- New jobs posted since last run (auto-route to sales / recruiter Slack channels)
- Office count changes (geographic-expansion signal)
- Employee-count band changes (growth signal)
Postgres / Snowflake / BigQuery
Recommended schema for warehouse ingestion:
```sql
-- Postgres syntax shown; adapt column types for other warehouses
-- (e.g. VARIANT instead of JSONB on Snowflake)
CREATE TABLE builtin_companies (
    scraped_at        TIMESTAMP,
    slug              TEXT PRIMARY KEY,
    name              TEXT,
    industry          TEXT,
    location          TEXT,
    employee_count    TEXT,
    founded_year      TEXT,
    tech_stack_size   INTEGER,
    job_count         INTEGER,
    tech_stack        JSONB,  -- raw array of {name, category, iconUrl}
    tech_by_category  JSONB,  -- grouped object
    perks             JSONB,
    offices           JSONB,
    social_links      JSONB,
    source_url        TEXT
);
```
Use the Apify webhook to POST run results to a small ingest endpoint after every scheduled run.
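On the ingest side, each dataset item has to be flattened into the columns of the recommended table. The sketch below shows one possible mapping; the `COLUMNS` list and the `to_row` helper are hypothetical, and the database write itself is left out because it depends on your driver and warehouse.

```python
import json

# (column, dataset field) pairs in the order of the recommended table
COLUMNS = [
    ("scraped_at", "scrapedAt"), ("slug", "slug"), ("name", "name"),
    ("industry", "industry"), ("location", "location"),
    ("employee_count", "employeeCount"), ("founded_year", "foundedYear"),
    ("tech_stack_size", "techStackSize"), ("job_count", "jobCount"),
    ("tech_stack", "techStack"), ("tech_by_category", "techStackByCategory"),
    ("perks", "perks"), ("offices", "offices"),
    ("social_links", "socialLinks"), ("source_url", "sourceUrl"),
]
JSON_COLUMNS = {"tech_stack", "tech_by_category", "perks", "offices", "social_links"}


def to_row(record):
    """Flatten one dataset item into a column -> value dict for insertion."""
    row = {}
    for column, field in COLUMNS:
        value = record.get(field)
        if column in JSON_COLUMNS and value is not None:
            value = json.dumps(value)  # serialize for a JSON-typed column
        row[column] = value            # missing or null fields stay None
    return row
```

Because every field is nullable, a slim record produces a row with `None` in the JSON columns rather than raising, which maps directly to SQL `NULL`.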
Power BI / Tableau / Looker
Connect the Apify REST API as a data source. Build dashboards covering:
- Top 20 most-adopted technologies across the dataset
- Category share-of-wallet (`LANGUAGES` mix, `DATABASES` mix, `CLOUD SERVICES` mix)
- Adoption growth quarter-over-quarter for a target technology
- Geographic heat map of company HQs
- Employee-band × stack-category cross-tab
Salesforce / HubSpot CRM Enrichment
Run the actor on your CRM-account-domain → Built In-slug map nightly. Upsert against Account records keyed on slug. Custom-field examples:
- `uses_mongodb__c` (boolean): derived from `techStack` containing `MongoDB`
- `cloud_provider__c` (picklist): derived from `techStackByCategory["CLOUD SERVICES"]`
- `engineering_team_size__c` (number): proxy via `employeeCount` band
- `active_engineering_roles__c` (number): from `jobCount`
- `tech_stack_fingerprint__c` (longtext): JSON-stringified `techStackByCategory`
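A sketch of how those custom fields might be derived from one dataset item follows. The `crm_fields` helper is hypothetical, taking the first `CLOUD SERVICES` entry as the picklist value is an assumption, and `engineering_team_size__c` is omitted because the mapping from an employee band to a number is a business decision this sketch does not try to guess.

```python
import json


def crm_fields(record):
    """Derive CRM custom-field values from one scraped company record."""
    names = {t["name"] for t in (record.get("techStack") or [])}
    by_category = record.get("techStackByCategory") or {}
    clouds = by_category.get("CLOUD SERVICES") or []
    return {
        "uses_mongodb__c": "MongoDB" in names,
        # assumption: the first listed cloud service is the primary provider
        "cloud_provider__c": clouds[0] if clouds else None,
        "active_engineering_roles__c": record.get("jobCount") or 0,
        "tech_stack_fingerprint__c": (
            json.dumps(by_category, sort_keys=True) if by_category else None
        ),
    }
```

The resulting dict can be passed to whatever upsert call your CRM client exposes, keyed on the company slug as described above.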
Webhooks for Real-Time Triggers
Wire Apify run-complete webhooks into your internal automation:
```javascript
// In your webhook handler
for (const company of newItems) {
    if (company.techStack?.some(t => t.name === 'Snowflake')
        && company.jobCount > 0) {
        notifySalesRep('snowflake-team', company);
    }
}
```
Major Markets & Tech Hubs at a Glance
Built In's company directory skews toward US tech hubs plus selected international cities. The actor returns whatever HQ a company self-reports, so you get a global footprint when scraping global companies:
| Tech Hub | Built In Presence | Notes |
|---|---|---|
| San Francisco / Bay Area | Very high | Headquarters for OpenAI, Stripe, Airbnb, Databricks, Anthropic and most YC-funded growth-stage startups |
| New York, NY | Very high | Finance + media-tech: MongoDB, Datadog, Peloton, Squarespace, Etsy |
| Austin, TX | Very high | Crypto + B2B SaaS: Indeed, Bumble, RetailMeNot |
| Seattle, WA | High | Cloud-adjacent: Amazon-orbit + Smartsheet, Outreach, Highspot |
| Boston, MA | High | Biotech + enterprise: HubSpot, Wayfair, DraftKings |
| Los Angeles, CA | High | Entertainment-tech + creator economy |
| Chicago, IL | High | Built In's birthplace — Sprout Social, Coinbase Chicago, Tock |
| Denver / Boulder, CO | High | Climate-tech + SaaS |
| Atlanta, GA | Medium-high | Fintech + supply-chain |
| Miami, FL | Medium | Crypto + LATAM-facing tech |
| Toronto, Canada | Medium | Shopify-orbit |
| London, UK | Medium | European tech HQs |
| Dublin, Ireland | Medium | European hubs of US companies (MongoDB Dublin, Stripe Dublin) |
| Berlin, Germany | Medium | European startup ecosystem |
| Sydney, Australia | Medium | APAC offices |
| Singapore | Low-medium | APAC regional offices |
The dataset is as global as Built In is — coverage depth follows where Built In's editorial team and recruiting customer base focus.
Cost & Performance
| Metric | Value |
|---|---|
| Engine | HTTP-only — got-scraping + cheerio |
| Runtime per company | ~1–2 seconds (default delay + parsing) |
| Runtime for 100 companies | ~3–5 minutes (with default 1000 ms delay, concurrency 3) |
| Runtime for 1,000 companies | ~30–50 minutes (recommended overnight) |
| Cost per company | Fractions of a cent (listed price: from $3.00 per 1,000 results) |
| Pricing model | Pay-per-event — only pay when you run |
| Data freshness | Live at run time — exactly what Built In is serving the public right now |
| Auth required | None |
| Proxy required | Optional — disabled by default |
| Concurrency | Default 3; safe range 2–4; ceiling 10 |
| Memory footprint | 256 MB ample; 512 MB if scraping 1,000+ companies in one run |
| Failure mode | Actor.fail() on zero records — no silent empty datasets |
Compliance, Privacy & Legal Notes
- Public data only — every field returned is published openly by Built In at https://builtin.com/company/{slug} and rendered to any unauthenticated visitor
- No PII / no PHI — the dataset contains no patient health information and no personal identifiers beyond what companies voluntarily publish on their own recruiting pages (company name, office addresses, jobs)
- No emails or phone numbers of individuals are extracted
- No login / no scraping behind paywalls — the actor never authenticates to Built In
- Polite by default — per-request delay + low concurrency + realistic browser headers + 3-attempt retry with backoff respect Built In's infrastructure
- robots.txt deference — operators using this actor should review Built In's current robots.txt and the site's Terms of Service; the actor itself does not embed any robots-bypass logic
- GDPR / CCPA — compliance with downstream data-protection regulation is the responsibility of the data consumer. Company-level firmographic data is generally outside GDPR's personal-data scope, but always confirm with counsel
- CAN-SPAM, TCPA — if you use scraped data for outbound marketing, compliance with anti-spam and call-restriction laws is your responsibility
Important: Built In data may not be used for unlawful purposes. Read Built In's Terms of Service and use the data only for the legitimate business, research, recruiting, and journalism purposes for which the company publishes it.
Frequently Asked Questions
How fresh is the data?
Live at run time. Every run fetches the current HTML directly from builtin.com/company/{slug}. There is no caching layer between Built In's web server and the data you receive. If a company updated its tech stack 10 minutes before your run, you will see the new entries.
How many companies can I scrape in one run?
There is no hard cap from the actor itself — the maxRecords parameter is your limit. Practically, 100 companies takes ~3–5 minutes, 1,000 takes ~30–50 minutes at default politeness settings. For larger crawls, schedule multiple runs or increase maxConcurrency modestly.
Does this require a Built In account or API key?
No. Built In does not require authentication to view company profiles. The actor only needs your Apify token.
Why is the tech stack the key feature?
Because no other major company-data provider publishes the per-company tech stack labelled by category in a structured, openly-scrapable form. LinkedIn, Crunchbase, AngelList, and PitchBook either omit it entirely or hide it behind enterprise contracts costing tens of thousands of dollars per year. Built In publishes it openly because companies use Built In to recruit engineers, and engineers want to know what stack they'd be working on.
Which categories does Built In use?
The common set we have verified across live profiles: LANGUAGES, FRAMEWORKS, DATABASES, DEVOPS, CLOUD SERVICES, SALESFORCE, ANALYTICS, DESIGN, SEARCH ENGINES, COLLABORATION. The actor preserves whatever UPPERCASE label Built In renders. Anything without an explicit category label is grouped under OTHER so you never lose data.
How do I find a company's slug?
The slug is the last path segment of the Built In URL. For https://builtin.com/company/snowflake-computing-inc, the slug is snowflake-computing-inc. You can also paste the full URL into startUrls and the actor will parse the slug automatically.
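For pre-processing your own input lists, slug extraction can be sketched as below. `slugFromInput` is a hypothetical helper, not the actor's internal parser. Taking the segment right after `/company/` gives the same result as "last path segment" for plain profile URLs and also tolerates trailing slashes, query strings, and sub-pages.

```javascript
// Sketch: accept either a bare slug or a full Built In company URL
// and return the slug, or null when none can be found.
function slugFromInput(input) {
  const value = String(input).trim();
  // Anything that is not an http(s) URL is treated as a bare slug.
  if (!/^https?:\/\//i.test(value)) {
    return value.replace(/^\/+|\/+$/g, '') || null;
  }
  const segments = new URL(value).pathname.split('/').filter(Boolean);
  // Expect [..., 'company', '<slug>', ...] and take the segment after 'company'.
  const i = segments.indexOf('company');
  return i >= 0 && segments[i + 1] ? segments[i + 1] : null;
}

console.log(slugFromInput('https://builtin.com/company/snowflake-computing-inc'));
// prints "snowflake-computing-inc"
```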
What happens if a slug is wrong / 404?
The fetch returns either no HTML or a thin "not found" page; the actor's body-size check (> 1000 bytes) and selector validation (must find <h1> and tech section) drop empty rows rather than push garbage. If every slug 404s, the actor calls Actor.fail() so your pipeline sees an explicit failure.
Can I run this against startUrls and companySlugs at the same time?
Yes. Both inputs are merged into a single queue with URL-level deduplication. startUrls win when both refer to the same slug.
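The merge rule can be pictured with the following sketch. It is an illustration of the described behaviour rather than the actor's source; it assumes `startUrls` entries are either URL strings or objects with a `url` property, and `mergeInputs` is a hypothetical name.

```javascript
// Sketch: merge startUrls and companySlugs into one deduplicated queue.
function mergeInputs(startUrls = [], companySlugs = []) {
  const queue = new Map(); // slug -> request
  // startUrls are processed first, so they win on duplicates.
  for (const entry of startUrls) {
    const url = typeof entry === 'string' ? entry : entry.url;
    const match = /\/company\/([^/?#]+)/.exec(url ?? '');
    if (match && !queue.has(match[1])) {
      queue.set(match[1], { slug: match[1], url, source: 'startUrls' });
    }
  }
  // Slugs are only added when no start URL already covers them.
  for (const slug of companySlugs) {
    if (!queue.has(slug)) {
      queue.set(slug, { slug, url: `https://builtin.com/company/${slug}`, source: 'companySlugs' });
    }
  }
  return [...queue.values()];
}
```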
Does the actor handle multi-office companies?
Yes. The offices[] array captures every <address> element under the Offices section, so global companies like MongoDB, Stripe, and Snowflake come back with all their regional offices intact.
Can I disable job scraping?
Yes — set scrapeJobs: false. The actor skips the /job/... link extraction loop, shaving a small amount of parsing time when you only care about firmographics + tech stack.
Does this work for international companies?
Yes, anywhere Built In has a profile. While Built In's coverage skews to US tech hubs, it lists companies headquartered in Canada, the UK, Ireland, Germany, Australia, Singapore, and beyond. The actor's parsing logic is country-agnostic.
Is a residential proxy required?
No, not by default. Built In's anti-bot is light enough that direct datacenter IPs work for moderate loads. Enable Apify Residential US if you crawl thousands of companies in a single run or hit rate-limit responses.
Does this work on the Apify Free Plan?
Yes — full functionality. At the listed price of from $3.00 per 1,000 results, a small run of 10–50 companies costs on the order of a few cents, well within the free monthly allowance.
How is this different from BuiltWith / Wappalyzer / TheirStack?
Those products infer the tech stack: they look at HTTP headers, JS fingerprints, and DNS records to guess what tech a company runs. This actor extracts what companies explicitly publish about their stack to recruit engineers. Self-reported data tends to be more accurate for backend services (databases, frameworks, languages) that don't leak to the public web, but it only covers companies that have a Built In profile.
Can I schedule this to run automatically?
Yes — Apify's built-in Scheduler supports hourly, daily, weekly, or arbitrary-cron schedules. A weekly run is the sweet spot for tracking tech-stack adoption changes; daily for active sales/recruiting watchlists.
What output formats are supported?
JSON, CSV, Excel (XLSX), HTML, XML, RSS, and JSON Lines — directly from the Apify dataset view or via the dataset items API.
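When pulling results programmatically, the export format is chosen with the `format` query parameter of the Apify dataset items endpoint. The sketch below only builds the URL; `datasetItemsUrl` is a hypothetical helper and the dataset ID shown is a placeholder.

```javascript
// Format identifiers accepted by the dataset items endpoint.
const EXPORT_FORMATS = ['json', 'jsonl', 'csv', 'xlsx', 'html', 'xml', 'rss'];

// Sketch: build the download URL for a run's dataset in a given format.
function datasetItemsUrl(datasetId, format = 'json') {
  if (!EXPORT_FORMATS.includes(format)) {
    throw new Error(`Unsupported format: ${format}`);
  }
  const url = new URL(`https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items`);
  url.searchParams.set('format', format);
  return url.toString();
}

// 'YOUR_DATASET_ID' is a placeholder, not a real dataset.
console.log(datasetItemsUrl('YOUR_DATASET_ID', 'csv'));
```

Private datasets additionally need your Apify API token, sent as a Bearer token in the `Authorization` header of the request.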
How do I detect when a company adopts a new technology?
Schedule the actor weekly on a fixed slug list. After each run, compute the set difference between this run's techStack[] names and last run's. Any new entry is a freshly-adopted technology — these are gold for sales/DevRel outreach.
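The set difference described above can be sketched in a few lines. `newlyAdoptedTech` is a hypothetical helper, and it assumes each `techStack` entry carries a `name`, as elsewhere in this README.

```javascript
// Sketch: names present in this run's techStack but absent from last run's.
function newlyAdoptedTech(previous, current) {
  // techStack may be null when nothing is published, so default to [].
  const before = new Set((previous?.techStack ?? []).map((t) => t.name));
  const after = (current?.techStack ?? []).map((t) => t.name);
  // De-duplicate current names, then keep only those not seen before.
  return [...new Set(after)].filter((name) => !before.has(name));
}

// Made-up snapshots for illustration only.
const lastWeek = { slug: 'acme', techStack: [{ name: 'Python' }, { name: 'PostgreSQL' }] };
const thisWeek = { slug: 'acme', techStack: [{ name: 'Python' }, { name: 'PostgreSQL' }, { name: 'Snowflake' }] };
console.log(newlyAdoptedTech(lastWeek, thisWeek)); // prints [ 'Snowflake' ]
```

If last run's `techStack` was null, every current entry is reported as new. That may simply mean the company published its stack for the first time, so it is worth treating that case separately from a genuine adoption.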
Why might techStack come back null for some companies?
Smaller companies, recently-claimed profiles, or non-engineering companies often choose not to publish a tech stack. The actor returns null rather than an empty array so you can distinguish "no data published" from "we tried and got zero items."
How do I report a bug or request a feature?
Use the Issues tab on the Apify Store actor page, or contact the developer directly through the Apify Console.
Related Apify Actors by Haketa
If you're building a tech-talent / B2B SaaS intel stack, these complementary actors plug in cleanly:
- H1B Visa Database Scraper — every US visa-friendly employer (perfect cross-reference: which Built In companies actively H1B-sponsor?)
- YCombinator Companies Scraper — 5,900+ funded startups (overlay YC × Built In to find well-funded tech-stack-disclosed targets)
- Levels.fyi Scraper — compensation data at top tech companies (pair tech stack + comp for recruiter pitches)
- Salary.com Scraper — US salary benchmarks across roles and metros
- SEEK Scraper (Australia / NZ) — APAC tech-job listings for cross-region pipelines
- BBB Business Scraper — US business directory for firmographic enrichment
- SAM.gov Federal Contractor Scraper — government contractors for public-sector tech-sales pipelines
- ProductHunt Launches & Makers Scraper — daily startup launches, makers, votes & reviews — VC/founder/recruiter intel
- Texas Pharmacy License Scraper — TSBP — example of a high-volume regulated-data extraction
- Arizona ROC Contractor License Scraper — same pattern for contractor licensing
- California DCA Professional License Scraper — California professional licensing data
- Ohio eLicense Scraper — Ohio-wide professional license database
Comparison vs. Alternatives
| Approach | Setup time | Tech-stack granularity | Category labels | Updated | Cost (1,000 companies) |
|---|---|---|---|---|---|
| This actor | < 5 minutes | Per-company exact stack | Yes — LANGUAGES, DATABASES, etc. | Live at run time | From $3.00 at the listed pay-per-event price |
| Manual copy-paste from Built In | Hours/days | High | Yes (manual) | Stale by hour 1 | Free + analyst hours |
| Custom Cheerio scraper (DIY) | 8–20 hours dev | Same | Same (if you debug it) | Live | Free + ongoing maintenance |
| BuiltWith / HG Insights | Days (sales cycle) | Inferred (fingerprint-based) | Some | Daily-ish | $5K–30K/year |
| TheirStack | Days | Inferred | Some | Daily-ish | $3K–15K/year |
| LinkedIn Sales Navigator | Hours | Partial — no stack | None | Live | $1K+/year/seat |
| Crunchbase Enterprise | Days | Minimal | None | Daily | $30K+/year |
Why Pay-Per-Event Pricing?
Most data scrapers either charge a flat monthly subscription (you pay even on weeks you don't run) or per-Compute-Unit (unpredictable on long jobs). This actor uses pay-per-event, which means:
- You only pay when the actor actually runs
- Charges scale with how many companies you actually consume
- Transparent, line-item billing inside Apify
- No monthly minimums and no annual commitments
- Free to evaluate — sample with maxRecords: 5 for pennies before any larger crawl
- Predictable per-record cost makes ROI modelling easy
Changelog
| Version | Date | Notes |
|---|---|---|
| 1.0.0 | 2026-05 | Initial public release — HTTP-only got-scraping + cheerio, full category-tagged tech stack, recent jobs, perks, offices, multi-social-link extraction, configurable politeness, optional proxy, Actor.fail() on zero records |
Keywords
BuiltIn scraper · BuiltIn tech stack scraper · builtin.com company scraper · builtin.com tech stack · tech stack database · company tech stack lookup · category-tagged tech stack · developer recruiting database · technical recruiter intel · B2B SaaS prospecting · tech vendor landscape · competitive intel scraper · tech company directory scraper · company technology fingerprint · BuiltIn company profile scraper · tech stack fingerprint · technology adoption signals · tech-trigger sales leads · LANGUAGES FRAMEWORKS DATABASES scraper · MongoDB user list · Datadog user list · Snowflake adopter list · Kubernetes user database · AWS GCP Azure adopter lookup · React Django Rails framework adoption · ML engineer recruiting database · DevOps tooling vendor landscape · CTO target sourcing · M&A tech-stack compatibility · YC startup tech stack · investor portfolio technical maturity · KubeCon sponsor list · Snowflake Summit prospect list · DevRel adoption tracking · OSS framework new-adopter detection · ABM target accounts · technology-triggered outbound · tech company directory API · BuiltIn Apify actor · Built In tech company scraper · BuiltIn job listings scraper · BuiltIn perks scraper · BuiltIn office locations · startup technology fingerprint · scale-up tech stack database · BuiltIn data extraction · BuiltIn HTML scraper · BuiltIn cheerio scraper · BuiltIn HTTP scraper
Support
- Bug reports: Use the Issues tab on the Apify Store page
- Feature requests: Same place — please describe your use case so we can prioritize correctly
- Direct contact: Through the Apify developer profile
If this actor saves your sales / recruiting / DevRel team hours of manual research, a 5-star rating on the Apify Store helps other tech-go-to-market teams discover it. Thank you!