Pricing

from $3.50 / 1,000 results

H1B Visa Database Scraper | DOL Disclosure Salaries

H-1B visa database scraper & API: search US H-1B (LCA) filings by employer, job title, year or location and export employer, title, wage, worksite, case status, dates and visa class. Salary benchmarking, immigration and labor-market data — fast, no login.

Developer

Haketa

7 days ago

Last modified

What This Actor Does

The H1B Visa Database Scraper is a production-grade Apify Actor that turns the entire public US H1B disclosure dataset into structured, filterable JSON. It queries h1bdata.info — a community-maintained mirror of the US Department of Labor's (DOL) Office of Foreign Labor Certification public disclosure feed — and returns one row per CERTIFIED Labor Condition Application (LCA) / H1B petition.

Under federal law (Title 20 CFR §655.760), the DOL must publish every approved foreign-worker petition with the petitioning employer, the job title, the base wage offered, the city/state of work, the application submit date, and the intended start date. h1bdata.info ingests the DOL's quarterly disclosure files and exposes them via a fast, paginated, ungated search UI. This actor turns that UI into an API.

A single popular search (e.g. employer=google, job=software engineer, year=2024) can return 30,000+ rows. Across the full back catalog (FY 2014 → present) the underlying dataset contains 8M+ certified petitions.

Each record returned includes:

Employer — petitioning company exactly as filed with the DOL (e.g. GOOGLE LLC, META PLATFORMS INC, JPMORGAN CHASE & CO)
Job title — the title on the LCA (e.g. SOFTWARE ENGINEER, DATA SCIENTIST, INVESTMENT BANKING ANALYST)
Base salary — the annual wage offered to the foreign worker (USD, the wage the DOL approved)
Work location — city + 2-letter state of the intended job site
Submit date — when the LCA was filed with the DOL
Start date — the proposed employment start date
Year — DOL fiscal year of the petition
Case status — always CERTIFIED (the DOL only publishes approved petitions; denied/withdrawn cases are not disclosed)
Provenance — exact search URL the row came from + ISO scrape timestamp

The dataset powers immigration-attorney case research, visa-dependent job-seeker discovery, recruiter intelligence, comp benchmarking, investigative journalism, and labor-economics research — all from a source no closed API can match for breadth or cost.

Why scrape h1bdata.info yourself when this exists?

The H1B disclosure feed is public, but turning it into a usable dataset is non-trivial. Teams that try the DIY route run into these obstacles fast:

The raw DOL files are quarterly Excel/CSV dumps with 100+ columns, inconsistent headers per quarter, and a 12-week publication lag
h1bdata.info pages return up to ~18 MB of HTML per query (38k+ rows in a single response); HTTP clients with short timeouts can abort mid-download, so allow roughly 60 seconds per request
The site's HTML table structure is unlabeled — you have to parse column order positionally, not by header
Salary strings arrive as $112,000 style text and must be normalized to numeric USD
Dates arrive as MM/DD/YYYY (US format) and need ISO conversion for SQL/BI tools
Location is a combined CITY, ST string that needs splitting for analytics
Empty queries are rejected silently: at least one of employer/job/city is required
Different searches return wildly different row counts (10 rows for a niche role, 30k+ for Google + Software Engineer) — your scraper must tolerate both extremes
The DOL's own performance.dol.gov portal cannot be queried — it only offers full-quarter downloads
Building a per-employer search loop (1,000 employers × 5 job titles × 10 years = 50,000 queries) requires retry/backoff/dedup logic that's tedious to maintain

This actor handles those obstacles for you: it generates the cross-product of your filters as separate tasks, tries each request up to 3 times with increasing backoff and a 60-second timeout, normalizes salary/date/location fields, deduplicates rows, and emits clean JSON ready for SQL, Pandas, Sheets, or your BI tool.

Quick Start

One-Click Run

Click "Try for free" on the Apify Store page
Enter at least one employer (e.g. google), one job title (e.g. software engineer), and a year (e.g. 2024)
Hit Start — petitions stream into the dataset in seconds
Download as JSON, CSV, Excel, JSONL, HTML, XML, or RSS directly from the Apify dataset view

API Run (Python)

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("haketa/h1b-visa-database-scraper").call(run_input={
    "employers": ["google", "meta", "amazon", "microsoft", "apple"],
    "jobTitles": ["software engineer", "machine learning engineer", "data scientist"],
    "cities": [],                  # nationwide
    "year": "2024",
    "minSalary": 150000,           # only show $150k+ offers
    "maxRecords": 5000,
})

for row in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(
        f"{row['employer']:30s} | {row['jobTitle']:35s} | "
        f"${row['baseSalary']:>8,} | {row['city']}, {row['state']} | "
        f"{row['submitDate']}"
    )

API Run (Node.js / TypeScript)

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('haketa/h1b-visa-database-scraper').call({
    employers: ['goldman sachs', 'morgan stanley', 'jpmorgan chase'],
    jobTitles: ['quantitative analyst', 'investment banking analyst'],
    cities: ['new york'],
    year: '2024',
    maxRecords: 1000,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Pulled ${items.length} certified H1B petitions on Wall Street`);

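// Average only rows with a numeric salary; guard against an empty result set
const wages = items.map((r) => r.baseSalary).filter((w) => typeof w === 'number');
const avgWage = wages.length ? wages.reduce((s, w) => s + w, 0) / wages.length : 0;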
console.log(`Average base wage: $${Math.round(avgWage).toLocaleString()}`);

API Run (cURL)

curl -X POST "https://api.apify.com/v2/acts/haketa~h1b-visa-database-scraper/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "employers": ["openai"],
    "jobTitles": ["research engineer", "software engineer"],
    "year": "2024",
    "maxRecords": 500
  }'

API Run (paste a raw search URL)

run = client.actor("haketa/h1b-visa-database-scraper").call(run_input={
    "startUrls": [
        "https://h1bdata.info/index.php?em=nvidia&job=ai+engineer&city=&year=2024"
    ],
    "maxRecords": 1000,
})

When startUrls is provided it overrides the structured filters — useful when you have a search you've already built interactively on h1bdata.info and want to replay it programmatically.

How It Works

The actor takes a Cartesian product of employers × jobTitles × cities (plus the chosen year) and turns each combination into a single GET request against h1bdata.info/index.php. Each response is parsed with cheerio, normalized, salary-band-filtered, deduplicated, and pushed into the Apify Dataset.

Source endpoint

Endpoint	Method	Pagination	Notes
`https://h1bdata.info/index.php?em={emp}&job={job}&city={city}&year={year}`	GET	None — one request returns every match	Server may return 18MB+ HTML for popular queries

The query string accepts any combination of the four parameters; at least one of em, job, or city is required (empty queries are rejected). year accepts a 4-digit DOL fiscal year (2014 → present) or All Years.
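The cross-product and query-string rules above can be sketched in a few lines of Python. This is an illustration rather than the actor's source: the helper name build_search_urls is hypothetical, and treating an empty list as a single blank value (no filter on that field) is an assumption.

```python
from itertools import product
from urllib.parse import urlencode

BASE_URL = "https://h1bdata.info/index.php"


def build_search_urls(employers, job_titles, cities, year="All Years"):
    """Return one search URL per employer x job title x city combination.

    Hypothetical helper. Assumption: an empty list means "no filter on this
    field" and is expanded to a single blank value.
    """
    urls = []
    for em, job, city in product(employers or [""], job_titles or [""], cities or [""]):
        if not (em.strip() or job.strip() or city.strip()):
            continue  # the site needs at least one of em / job / city
        query = urlencode({"em": em, "job": job, "city": city, "year": year})
        urls.append(f"{BASE_URL}?{query}")
    return urls


urls = build_search_urls(["google", "meta"], ["software engineer"], [], "2024")
for url in urls:
    print(url)
```

Two employers, one job title, and no city filter produce two URLs, each in the same shape as the sourceUrl values in the output examples.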

Engine

HTTP-only via got-scraping — no Playwright, no Puppeteer, no headless Chromium overhead
Realistic browser headers auto-generated by got-scraping's headerGeneratorOptions (Chrome 120+, desktop, US locale, Windows/macOS)
60-second per-request timeout — needed because popular queries return 18MB+ payloads
3-attempt retry with a backoff that grows with each attempt (2s × attempt + jitter); survives intermittent network blips
Response sanity check — rejects empty bodies and non-200 statuses
Polite delay between requests (requestDelay, default 1000ms + 0-500ms jitter)
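If you are reproducing the retry behaviour in your own client, a minimal sketch might look like the following. It is not the actor's code (the actor runs on got-scraping in Node.js). The function names are hypothetical, the jitter range of 0 to 0.5 seconds is an assumption, and the fetch callable is injected so that the sketch does not depend on any particular HTTP library.

```python
import random
import time


def backoff_delay(attempt, base_seconds=2.0, max_jitter=0.5):
    """Delay before the next try: base x attempt number, plus random jitter.

    The jitter range is an assumption; the listing only says "+ jitter".
    """
    return base_seconds * attempt + random.uniform(0, max_jitter)


def fetch_with_retry(fetch, url, attempts=3, sleep=time.sleep):
    """Call fetch(url) up to `attempts` times and return the first good body.

    `fetch` is any callable returning (status_code, body). Empty bodies and
    non-200 statuses count as failures, mirroring the sanity check above.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            status, body = fetch(url)
            if status == 200 and body:
                return body
            last_error = RuntimeError(f"bad response: status={status}")
        except OSError as exc:  # network-level failures
            last_error = exc
        if attempt < attempts:
            sleep(backoff_delay(attempt))
    raise last_error
```

With a real HTTP client, the fetch callable is also where the 60-second timeout would be set, for example requests.get(url, timeout=60) if you use the requests library.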

Parsing

cheerio loads the HTML and walks table#myTable tbody tr (with a positional fallback for any other table tbody tr shape)
Columns are parsed by position: EMPLOYER | JOB TITLE | BASE SALARY | LOCATION | SUBMIT DATE | START DATE
Salary normalization — $112,000 → 112000 (integer); the original display string is preserved as baseSalaryDisplay
Date normalization — MM/DD/YYYY → YYYY-MM-DD (ISO-8601, sort-safe)
Location split — "MOUNTAIN VIEW, CA" → {city: "MOUNTAIN VIEW", state: "CA"}; the raw string is preserved as location
Year inference — derived from submitDate when present
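The three normalization steps are simple enough to sketch in Python. The function names below are hypothetical and the edge-case handling (returning None for blank or malformed cells) is an assumption about reasonable behaviour, not a description of the actor's internals.

```python
import re
from datetime import datetime


def parse_salary(text):
    """'$112,000' -> 112000; returns None when the cell has no usable number."""
    digits = re.sub(r"[^\d.]", "", text or "")
    try:
        return int(round(float(digits)))
    except ValueError:
        return None


def parse_us_date(text):
    """'03/04/2024' -> '2024-03-04'; returns None for blank or malformed input."""
    try:
        return datetime.strptime((text or "").strip(), "%m/%d/%Y").date().isoformat()
    except ValueError:
        return None


def split_location(text):
    """'MOUNTAIN VIEW, CA' -> ('MOUNTAIN VIEW', 'CA'); state is None if absent."""
    raw = (text or "").strip()
    if not raw:
        return None, None
    city, sep, state = raw.rpartition(",")
    if not sep:
        return raw, None
    return city.strip(), state.strip() or None


print(parse_salary("$112,000"), parse_us_date("03/04/2024"), split_location("MOUNTAIN VIEW, CA"))
```

Splitting on the last comma keeps city names that themselves contain a comma intact.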

Output pipeline

Salary-band post-filter — minSalary / maxSalary applied before push
Deduplication — (employer | jobTitle | baseSalary | submitDate | city) is hashed; duplicates within a run are dropped
Hard cap — maxRecords stops the loop early (set 0 for unlimited)
Empty-result safety net — if zero records are scraped after all tasks complete, the run is marked FAILED via Actor.fail() so monitoring/scheduling integrations can alert
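The filter, dedup, and cap steps can be reproduced downstream if you merge several runs. The sketch below is an illustration under stated assumptions: the function names are hypothetical, SHA-1 is an arbitrary choice of hash, and dropping rows with a null salary whenever a salary bound is set is an assumption about the intended behaviour.

```python
import hashlib


def dedup_key(row):
    """Hash of the five fields the dedup step is described as using."""
    parts = (row.get("employer"), row.get("jobTitle"), row.get("baseSalary"),
             row.get("submitDate"), row.get("city"))
    joined = "|".join("" if p is None else str(p) for p in parts)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def filter_rows(rows, min_salary=0, max_salary=0, max_records=0):
    """Apply the salary band, drop duplicates, and stop at max_records (0 = no cap)."""
    seen, kept = set(), []
    for row in rows:
        salary = row.get("baseSalary")
        # Assumption: a row without a salary cannot satisfy a salary bound.
        if min_salary and (salary is None or salary < min_salary):
            continue
        if max_salary and (salary is None or salary > max_salary):
            continue
        key = dedup_key(row)
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
        if max_records and len(kept) >= max_records:
            break
    return kept
```

Note that two genuinely distinct petitions with the same employer, title, salary, submit date, and city collapse into one row under this key, which is worth remembering when you count petitions.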

Proxy

Proxy is disabled by default. h1bdata.info has no rate-limit, no CAPTCHA, and no IP-based throttle. You only need to enable Apify Proxy if you're running massive parallel jobs (e.g. 10,000+ employer queries in one run) and want to be courteous about IP diversity.

Input Parameters

{
  "employers": ["google", "meta", "amazon"],
  "jobTitles": ["software engineer", "machine learning engineer"],
  "cities": ["mountain view", "menlo park", "seattle"],
  "year": "2024",
  "minSalary": 150000,
  "maxSalary": 0,
  "maxRecords": 5000,
  "requestDelay": 1000,
  "startUrls": [],
  "proxyConfiguration": { "useApifyProxy": false }
}

Parameter reference

Parameter	Type	Default	Description
`startUrls`	`array<string\|object>`	`[]`	Paste any `h1bdata.info` search URL directly (e.g. from your browser). When non-empty, overrides all other filter fields.
`employers`	`array<string>`	`["google"]`	Free-text employer names, partial match, case-insensitive. Examples: `"google"`, `"goldman sachs"`, `"infosys"`. Each entry runs as its own task.
`jobTitles`	`array<string>`	`["software engineer"]`	Free-text job-title queries, partial match. Each combines with each employer × city as a separate task.
`cities`	`array<string>`	`[]`	US-city filter, case-insensitive. Empty array = nationwide. Cross-products with employers and job titles.
`year`	`enum<string>`	`"All Years"`	DOL fiscal year. `"All Years"` for full 2014-present history or `"2026"` → `"2014"` for a single year.
`minSalary`	`integer`	`0`	Post-filter: drop rows where `baseSalary < minSalary`. `0` = no lower bound.
`maxSalary`	`integer`	`0`	Post-filter: drop rows where `baseSalary > maxSalary`. `0` = no upper bound.
`maxRecords`	`integer`	`500`	Hard cap across all tasks. `0` = unlimited. Popular queries can yield 30,000+ rows — set generously.
`requestDelay`	`integer`	`1000`	Milliseconds between requests. h1bdata has no rate-limit but 500–2000 ms is polite.
`proxyConfiguration`	`object`	`{ useApifyProxy: false }`	Optional. h1bdata.info has no anti-bot, so proxy is not required. Only enable for very large multi-thousand-task runs.

Tip: Provide an empty employers array + a non-empty jobTitles or cities to search across all employers for that role/city. Just remember the URL needs at least one of em / job / city to return results.

Output Schema

Every row in the dataset uses the same flat shape, so it loads directly into a relational table, Google Sheet, or DataFrame.

Core petition fields

Field	Type	Description
`employer`	`string`	Petitioning employer as filed with DOL, exactly as published (often ALL CAPS, includes legal suffix — e.g. `GOOGLE LLC`, `META PLATFORMS INC`)
`jobTitle`	`string`	Job title on the LCA (e.g. `SOFTWARE ENGINEER`, `DATA SCIENTIST III`, `INVESTMENT BANKING ANALYST`)
`baseSalary`	`number\|null`	Annual base wage offered, in USD, normalized to an integer
`baseSalaryDisplay`	`string\|null`	Original salary string as shown on h1bdata.info (e.g. `$112,000`)
`city`	`string\|null`	City of the work location
`state`	`string\|null`	Two-letter US state code
`location`	`string\|null`	Raw combined location string (e.g. `MOUNTAIN VIEW, CA`)
`submitDate`	`string\|null`	LCA submit date in `YYYY-MM-DD`
`startDate`	`string\|null`	Intended employment start date in `YYYY-MM-DD`
`year`	`integer\|null`	DOL fiscal year inferred from `submitDate`
`caseStatus`	`string`	Always `"CERTIFIED"` — the DOL only publishes approved petitions

Provenance / search-echo fields

Field	Type	Description
`searchEmployer`	`string\|null`	Employer query that produced this row (or `null` if `startUrls` was used)
`searchJobTitle`	`string\|null`	Job-title query that produced this row
`searchCity`	`string\|null`	City query that produced this row
`searchYear`	`string`	Year query that produced this row (e.g. `"2024"`, `"All Years"`)
`sourceUrl`	`string`	Exact `h1bdata.info` URL the row was scraped from — fully reproducible
`scrapedAt`	`string`	ISO-8601 timestamp of extraction (UTC)

Example: Google software engineer petition

{
  "employer": "GOOGLE LLC",
  "jobTitle": "SOFTWARE ENGINEER",
  "baseSalary": 112000,
  "baseSalaryDisplay": "$112,000",
  "city": "DURHAM",
  "state": "NC",
  "location": "DURHAM, NC",
  "submitDate": "2024-03-04",
  "startDate": "2024-08-15",
  "year": 2024,
  "caseStatus": "CERTIFIED",
  "searchEmployer": "google",
  "searchJobTitle": "software engineer",
  "searchCity": null,
  "searchYear": "2024",
  "sourceUrl": "https://h1bdata.info/index.php?em=google&job=software+engineer&city=&year=2024",
  "scrapedAt": "2026-05-18T09:14:22.318Z"
}

Example: Goldman Sachs investment-banking analyst petition

{
  "employer": "GOLDMAN SACHS & CO. LLC",
  "jobTitle": "INVESTMENT BANKING ANALYST",
  "baseSalary": 110000,
  "baseSalaryDisplay": "$110,000",
  "city": "NEW YORK",
  "state": "NY",
  "location": "NEW YORK, NY",
  "submitDate": "2024-01-22",
  "startDate": "2024-07-08",
  "year": 2024,
  "caseStatus": "CERTIFIED",
  "searchEmployer": "goldman sachs",
  "searchJobTitle": "investment banking analyst",
  "searchCity": "new york",
  "searchYear": "2024",
  "sourceUrl": "https://h1bdata.info/index.php?em=goldman+sachs&job=investment+banking+analyst&city=new+york&year=2024",
  "scrapedAt": "2026-05-18T09:14:24.402Z"
}

Case Status & Visa Type Reference

The DOL only publishes approved petitions in the public disclosure file, so every row this actor returns has caseStatus = "CERTIFIED". Denied, withdrawn, returned, or under-review cases are not disclosed and therefore cannot appear in this dataset.

Visa categories covered

Visa	Description	In dataset?
H1B	Specialty occupation worker (Bachelor's+ degree role)	Yes — majority
H1B1	Singapore / Chile free-trade-agreement specialty worker	Yes
E3	Australian specialty occupation worker	Yes
H2A / H2B	Seasonal agricultural / non-agricultural workers	No (separate DOL feed)
L1A / L1B	Intra-company transferee	No (USCIS-only, not DOL)
O1 / O3	Extraordinary ability	No (USCIS-only)
Green Card (PERM)	Permanent labor cert	No (separate DOL feed — see roadmap below)

DOL fiscal-year coverage

Year	Status
2014	Earliest year in h1bdata.info index
2015 → 2023	Full coverage
2024	Full coverage
2025	Rolling (DOL publishes quarterly)
2026	Partial — Q1 / Q2 typically available by mid-year
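When bucketing rows yourself, keep in mind that the US federal fiscal year runs from 1 October to 30 September, so a fiscal year is not the same as a calendar year. The helpers below are a sketch for your own analysis; the names are hypothetical, and the actor's year field may be computed differently, so compare a few rows before relying on either.

```python
from datetime import date


def federal_fiscal_year(day):
    """US federal fiscal year: FY N runs from 1 Oct of year N-1 to 30 Sep of year N."""
    return day.year + 1 if day.month >= 10 else day.year


def fiscal_quarter(day):
    """Fiscal quarter 1-4, where Q1 is October through December."""
    return ((day.month - 10) % 12) // 3 + 1


submit = date.fromisoformat("2024-10-15")
print(federal_fiscal_year(submit), fiscal_quarter(submit))
```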

Use Cases

Immigration Law & LCA Case Research

Immigration attorneys, paralegals, and corporate immigration teams use this dataset to:

Pull employer petition history for prevailing-wage attack arguments and RFE responses
Benchmark Level 1 vs Level 4 wage offers for a given SOC code and metro to support LCA filings
Document an employer's H1B sponsorship pattern for I-140 / I-485 case files
Track concurrent / amended petitions by tracing repeated submit dates for one employer + job
Build evidence packets for Department of Labor audits and Wage & Hour investigations
Cross-reference an employer's claimed wage vs. what they've previously filed with DOL

Visa-Dependent Job Seekers (Students, F1, OPT, H1B Holders)

International students on F1/OPT and current H1B holders use the dataset to:

Identify visa-friendly employers by ranking who actually sponsors in their target city + role
Set realistic salary expectations by looking up the median LCA wage for their job title at the company they're interviewing with
Discover smaller sponsors beyond the headline FAANG names — most petitions are filed by mid-cap firms
Time job changes around H1B transfer windows using the start-date column
Avoid serial wage-suppressor employers by flagging companies whose LCA wages sit consistently below the DOL Level 1 prevailing wage
Negotiate offers with real-world bid data from the same employer in the same role and metro

Recruiter & Talent-Intel Teams

Internal recruiters, RPO firms, and exec-search teams use the dataset for:

Competitor sponsorship analysis — who in your industry is bringing in foreign talent, at what scale, in which functions?
Hot-title detection — track quarter-over-quarter growth in titles like AI Engineer, ML Researcher, Prompt Engineer to spot category shifts
Sponsor-friendliness scorecards for candidate-facing materials
Sourcing pools — every employer in the dataset has, by definition, hired internationally before and may have a current opening profile
Pay-band benchmarking against direct competitors using actual DOL filings (not survey medians)

HR Total-Rewards & Compensation Benchmarking

Comp & ben teams blend H1B disclosure data into compensation studies because the LCA base wage is the offered base wage — not a self-reported survey response:

Calibrate base salary structures against peer companies in your metro
Build a free, real-world alternative to Radford, Mercer, and WTW surveys for tech / finance / pharma roles
Quantify metro premiums — same role in Bay Area vs Austin vs Atlanta, sourced from the same employer
Validate offer competitiveness in retention reviews
Detect inadvertent wage compression between long-tenured employees and incoming H1B hires
Inform pay-equity audits by comparing internal salaries to external LCA-filed wages for matching titles

Investigative Journalism & Wage-Suppression Reporting

Reporters and data desks use H1B disclosure as a primary source for:

Wage-suppression investigations — flag employers whose LCA wages cluster at DOL Level 1 even for senior titles
Body-shop / consultancy exposés — identify outsourcing firms filing thousands of low-wage petitions per year
Geographic-arbitrage reporting — companies headquartering filings in low-prevailing-wage metros while work is performed elsewhere
Tech-layoff coverage — track post-layoff sponsorship pivots
Policy-impact stories — quantify the real on-the-ground effect of every USCIS rule change

Government, Think Tank & Labor-Economics Research

Academic economists, policy shops, and public-sector analysts use this data to:

Estimate H1B labor-market effects at MSA granularity
Inform STEM-workforce policy with empirical wage and headcount data
Track industry shifts — banking → big tech → AI → fintech sponsor mix evolution
Model wage-elasticity of H1B supply by SOC code
Support immigration-policy testimony with real disclosure data
Build replication datasets for peer-reviewed labor-economics papers

Tech-Employer Ranking & Industry-Trend Newsletters

Trade publications and data-newsletter operators use the dataset to:

Publish "Top H1B Sponsors of {YEAR}" rankings by total petitions and median wage
Run year-over-year comparisons for FAANG, MAANG, AI labs, fintech, big pharma, and consulting
Build interactive dashboards for paid subscribers (e.g. company search, role search, metro search)
Detect emerging hiring centers — petitions in Austin, Miami, Raleigh, Bellevue growing faster than NYC/SF
Publish quarterly "AI Engineer wage tracker" style data drops

University Career Services & International Student Advising

College career centers and graduate-school career offices use the dataset to:

Show students which employers in their field have historically sponsored international hires
Benchmark offered salaries for graduating MS/PhD students by program and metro
Build alumni-employer connections by surfacing alumni-heavy sponsors
Justify program ROI with concrete post-graduation sponsorship outcomes
Coach students on which employers are realistic sponsorship targets vs. long shots

Compliance, Audit & M&A Due Diligence

Corporate-development teams and external auditors use H1B disclosure as a verifiable third-party data point:

Verify an acquisition target's sponsorship history during M&A due diligence — undisclosed H1B obligations are post-close liabilities
Detect undisclosed offshore-staffing arrangements by cross-checking petition volume vs. headcount disclosures
Validate prevailing-wage compliance when the target is a federal contractor
Audit subcontractor labor practices in supply-chain due diligence
Support post-acquisition integration planning by mapping H1B-dependent talent that needs visa transfers

Real Estate & Location Intelligence

Site-selection analysts and CRE teams treat H1B petition density as a leading indicator of high-income knowledge-worker housing demand:

Forecast luxury-rental demand in metros with rising H1B petition counts
Validate corporate-relocation rumors before HQ announcements (filings shift months ahead of press releases)
Build neighborhood comps that account for international-hire population growth
Inform retail / hospitality investment in emerging tech-talent corridors

Sample Queries & Recipes

Recipe 1: All Google software-engineer petitions for FY 2024 (the verified smoke test)

{
  "employers": ["google"],
  "jobTitles": ["software engineer"],
  "year": "2024",
  "maxRecords": 1000
}

Recipe 2: Top FAANG ML / AI hiring across all years

{
  "employers": ["google", "meta", "amazon", "apple", "microsoft", "nvidia", "openai", "anthropic"],
  "jobTitles": ["machine learning engineer", "research engineer", "applied scientist", "ai engineer"],
  "year": "All Years",
  "minSalary": 200000,
  "maxRecords": 10000
}

Recipe 3: Wall Street quant & banking analyst comp benchmark, NYC only

{
  "employers": ["goldman sachs", "morgan stanley", "jpmorgan chase", "citi", "bank of america", "jane street", "citadel", "two sigma"],
  "jobTitles": ["quantitative analyst", "investment banking analyst", "software engineer"],
  "cities": ["new york"],
  "year": "2024"
}

Recipe 4: Indian-IT body-shop volume tracker

{
  "employers": ["infosys", "tata consultancy services", "wipro", "cognizant", "hcl", "tech mahindra", "capgemini"],
  "jobTitles": ["programmer analyst", "systems analyst", "consultant"],
  "year": "All Years",
  "maxRecords": 50000
}

Recipe 5: Data scientist petitions across all employers, FY 2024, $120k+

{
  "employers": [],
  "jobTitles": ["data scientist"],
  "year": "2024",
  "minSalary": 120000,
  "maxRecords": 20000
}

Recipe 6: Pharma & biotech R&D petitions

{
  "employers": ["pfizer", "moderna", "merck", "genentech", "regeneron", "vertex", "eli lilly"],
  "jobTitles": ["scientist", "research associate", "bioinformatics scientist"],
  "year": "All Years"
}

Recipe 7: Tiny test run — 10 rows to validate your pipeline before a big scrape

{
  "employers": ["amazon"],
  "jobTitles": ["software development engineer"],
  "year": "2024",
  "maxRecords": 10
}

Recipe 8: Direct URL replay — paste a search you built in your browser

{
  "startUrls": [
    "https://h1bdata.info/index.php?em=stripe&job=&city=san+francisco&year=2024"
  ]
}

Integration Examples

Google Sheets

Schedule the actor daily, attach Apify's "Save to Google Sheets" integration, and your team has a living view of (for example) every petition your competitors filed last quarter — refreshed without anyone touching a spreadsheet.

Make.com / Zapier / n8n

Trigger downstream workflows on each new run:

New rows where baseSalary > $250,000 → send to Slack #comp-intel
New petitions from any competitor in your tracked list → create a HubSpot deal task
New employer first-time-sponsor detected → send to your sales / recruiter pipeline

Power BI / Tableau / Looker / Mode

Pull Apify's run results into your BI tool of choice via the Apify REST API and build:

Top-100 H1B sponsors by year league tables
Median LCA wage by SOC + metro heat maps
Year-over-year petition growth for any company
Wage-band distribution by employer + role
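Before wiring up a BI tool, the first of these views can be prototyped in plain Python. The sketch below builds a sponsor league table (petition count and median base salary per employer) from a list of dataset rows; the function name and output keys are hypothetical.

```python
from collections import defaultdict
from statistics import median


def sponsor_league_table(rows, top_n=100):
    """Rank employers by petition count, with the median base salary for each."""
    counts = defaultdict(int)
    wages = defaultdict(list)
    for row in rows:
        employer = row.get("employer")
        if not employer:
            continue
        counts[employer] += 1
        # Rows with a null salary still count as petitions but are left
        # out of the median.
        if isinstance(row.get("baseSalary"), (int, float)):
            wages[employer].append(row["baseSalary"])
    table = [
        {
            "employer": employer,
            "petitions": count,
            "medianBaseSalary": median(wages[employer]) if wages[employer] else None,
        }
        for employer, count in counts.items()
    ]
    # Most petitions first; ties are broken alphabetically for stable output.
    table.sort(key=lambda entry: (-entry["petitions"], entry["employer"]))
    return table[:top_n]
```

Because employer names are published as filed, variants such as GOOGLE LLC and GOOGLE INC. rank as separate sponsors unless you normalize names first.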

Postgres / Snowflake / BigQuery / Databricks

POST run results to your warehouse via Apify's webhook integration. Suggested schema:

CREATE TABLE h1b_petitions (
    id                BIGSERIAL PRIMARY KEY,
    employer          TEXT NOT NULL,
    job_title         TEXT,
    base_salary       INTEGER,
    city              TEXT,
    state             CHAR(2),
    submit_date       DATE,
    start_date        DATE,
    fiscal_year       SMALLINT,
    case_status       TEXT DEFAULT 'CERTIFIED',
    source_url        TEXT,
    scraped_at        TIMESTAMPTZ,
    UNIQUE (employer, job_title, base_salary, submit_date, city)
);

CREATE INDEX idx_h1b_employer    ON h1b_petitions (employer);
CREATE INDEX idx_h1b_city_state  ON h1b_petitions (city, state);
CREATE INDEX idx_h1b_year_title  ON h1b_petitions (fiscal_year, job_title);
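To show how dataset rows map onto this schema, here is a small loading sketch. It uses an in-memory SQLite database as a stand-in so that it runs anywhere, which means the column types are simplified and INSERT OR IGNORE takes the place of the Postgres INSERT ... ON CONFLICT DO NOTHING. The COLUMN_MAP and load_rows names are hypothetical.

```python
import sqlite3

COLUMN_MAP = {  # dataset field -> table column
    "employer": "employer", "jobTitle": "job_title", "baseSalary": "base_salary",
    "city": "city", "state": "state", "submitDate": "submit_date",
    "startDate": "start_date", "year": "fiscal_year", "caseStatus": "case_status",
    "sourceUrl": "source_url", "scrapedAt": "scraped_at",
}


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM h1b_petitions").fetchone()[0]


def load_rows(conn, rows):
    """Insert rows, skipping any that hit the UNIQUE constraint; returns rows added."""
    columns = list(COLUMN_MAP.values())
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT OR IGNORE INTO h1b_petitions ({', '.join(columns)}) VALUES ({placeholders})"
    before = count_rows(conn)
    conn.executemany(sql, [[row.get(field) for field in COLUMN_MAP] for row in rows])
    conn.commit()
    return count_rows(conn) - before


# SQLite stand-in for the suggested table (types simplified).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE h1b_petitions (
        id INTEGER PRIMARY KEY, employer TEXT NOT NULL, job_title TEXT,
        base_salary INTEGER, city TEXT, state TEXT, submit_date TEXT,
        start_date TEXT, fiscal_year INTEGER, case_status TEXT,
        source_url TEXT, scraped_at TEXT,
        UNIQUE (employer, job_title, base_salary, submit_date, city)
    )
""")
```

Calling load_rows(conn, items) twice with the same items adds nothing the second time. One caveat: SQL treats NULLs as distinct in a UNIQUE constraint, so rows with a null salary or city are not deduplicated by the database.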

Salesforce / HubSpot CRM Enrichment

For staffing, recruiting, and corporate immigration firms: run the actor nightly against your tracked-employer list, then upsert against Account records so that H1B_Petitions_Last_12mo__c becomes a high-signal lead-scoring field.

Webhook → Slack / Discord / Email

Trigger a Make/Zapier webhook on Apify's ACTOR.RUN.SUCCEEDED event, parse the dataset, and post highlights:

"Stripe filed 47 new H1B petitions this quarter — 31 are SF, 12 are NYC, median base $182k. Top role: Software Engineer."

Major US Metros for H1B Activity

Metro	State	Why it matters for H1B data
San Francisco / Bay Area	CA	Highest median LCA wage in the country; FAANG, AI labs, fintech
New York / NYC	NY	Wall Street, consulting (McKinsey/BCG/Bain), big-law support roles
Seattle / Bellevue	WA	Amazon, Microsoft: historically among the largest H1B sponsors by volume
Austin	TX	Fastest-growing tech metro; Apple, Oracle, Tesla expansions
Boston / Cambridge	MA	Pharma + biotech (Moderna, Vertex), Big Tech Cambridge campuses
Chicago	IL	Trading firms (Citadel, Jump, IMC, DRW), consulting back offices
Atlanta	GA	Coca-Cola, Delta, Truist, fintech (NCR, Equifax)
Dallas / Plano	TX	JPMorgan, AT&T, Toyota, healthcare IT
Houston	TX	Energy majors (ExxonMobil, Chevron, Shell), healthcare
Washington DC / NoVa	DC / VA	Federal contractors, AWS GovCloud HQ, defense primes
Raleigh-Durham	NC	RTP corridor — IBM, Cisco, Apple expansion
Phoenix / Tempe	AZ	TSMC fab, semiconductor expansion, financial services
Miami	FL	Crypto, hedge funds relocating from NY/SF
Mountain View / Sunnyvale / Menlo Park	CA	Google, Meta, LinkedIn HQ campuses
San Jose / Santa Clara	CA	NVIDIA, Cisco, Adobe, Intel

Cost & Performance

Metric	Value
Engine	HTTP-only (`got-scraping` + `cheerio`) — no browser
Runtime, single small query (10 rows)	2 – 5 seconds
Runtime, single popular query (30,000 rows / 18MB)	10 – 30 seconds
Runtime, 50 employer × 5 job × 1 year cross-product	1 – 5 minutes (with default 1s polite delay)
Cost per typical run	a few cents (pay-per-event)
Pricing model	Pay-per-event — actor start + per dataset item
Data freshness	Live at run time — h1bdata.info refreshes with each DOL disclosure release
Auth required	None
Proxy required	No (optional, disabled by default)
Concurrency	Safe to run many parallel filtered configurations
Memory footprint	256 MB sufficient for most runs; 1024 MB for huge multi-thousand-task jobs
Retry	3 attempts per request, backoff grows with each attempt (`2s × attempt + jitter`)
Timeout	60-second per-request HTTP timeout
Failure mode	`Actor.fail()` if zero records scraped (alerts your monitoring)

Compliance, Privacy & Legal Notes

Public-record data only. Every field this actor returns is published by the US Department of Labor under the public-disclosure requirements of Title 20 CFR §655.760. The DOL publishes the data; h1bdata.info re-publishes it; this actor structures it. Nothing in the output is private, leaked, or non-public.
No PII beyond what the DOL already published. Employer and job title are corporate identifiers, not personal. The disclosure file does not include the foreign worker's name, passport number, or contact info.
No PHI. Pharma / biotech petitions are listed by employer + job title only; there is no patient data anywhere in the dataset.
No SSNs, passport numbers, or visa-petition USCIS receipt numbers.
Source attribution is preserved in every row (sourceUrl) — useful for journalists and academics who need to cite primary sources.
Respect h1bdata.info's terms of service and load profile. The default 1-second polite delay between requests exists for that reason — do not lower it unnecessarily.
GDPR / CCPA are not implicated for the petitioning employer (corporate entity); the foreign worker is not personally identified in the public file.
Permissible uses include: research, journalism, recruiting / sourcing intelligence, immigration-law case work, comp benchmarking, policy analysis, and competitive intelligence.
Do not use this data for: harassment, doxxing, discriminatory employment decisions targeting visa status (which would violate 8 USC §1324b), or any deceptive marketing claim that misrepresents data freshness or origin.

Important: Disclosure data shows the wage offered on the LCA — not necessarily the wage actually paid, not signing bonuses, not RSUs, not deferred comp. Treat it as a floor / benchmark, not as a complete-compensation figure.

Frequently Asked Questions

How fresh is the data?

The actor scrapes h1bdata.info live at run time. h1bdata.info ingests new petitions each time the DOL publishes a quarterly disclosure file (typically 8–12 weeks after the quarter closes). DOL fiscal years run October through September, so FY 2024 Q4 petitions (July–September 2024) should become visible around December 2024, FY 2025 Q1 petitions (October–December 2024) around March 2025, and so on. There is no faster public source for this data.

How many records exist in total?

The DOL has certified 8 million+ H1B / H1B1 / E3 petitions since fiscal year 2014. The exact number visible on any given day depends on which quarters h1bdata.info has ingested. A single popular query like Google + Software Engineer (all years) can return 30,000+ rows on its own.

Why is `caseStatus` always `CERTIFIED`?

The US Department of Labor only publishes approved petitions in the public disclosure file. Denied, withdrawn, returned-for-correction, and under-review cases are not disclosed and therefore cannot appear in this dataset.

Do I need an h1bdata.info account or API key?

No. h1bdata.info is fully public: it has no login, no CAPTCHA, and no anti-bot system. You only need an Apify account to run the actor.

Do I need to use a proxy?

No. Proxy is disabled by default. h1bdata.info has no rate-limit or IP-based throttle. The proxy option exists only for very large parallel runs where IP diversity is desirable for politeness.

Why does my run sometimes take 20–30 seconds for a single query?

Popular queries (a famous employer + common title across all years) return massive HTML payloads — sometimes 18MB+ with 30,000+ rows. The 60-second per-request timeout is calibrated for exactly this case. Smaller queries return in 2–5 seconds.

What happens if I supply an empty query?

h1bdata.info rejects empty queries (returns no results). The actor builds its task list from the cross-product of employers × jobTitles × cities and skips combinations where all three are empty. If zero tasks survive, the run fails fast with a clear message; if every task returns zero rows, Actor.fail() is called so your monitoring/scheduling integration can alert.

Does the actor return denied / withdrawn / pending petitions?

No — see above. Only DOL-certified petitions are in the public file.

Does the dataset include the foreign worker's name?

No. The DOL public-disclosure file deliberately omits the foreign worker's personal identity. The published fields are: petitioning employer, job title, base wage offered, work location, submit date, and start date.

Does it include SOC codes, prevailing-wage level, or worksite ZIP?

Not in the h1bdata.info interface. h1bdata.info publishes the user-friendly subset of the DOL file. For the full raw file (with SOC code, prevailing-wage level, full address, agent attorney, etc.) download the quarterly DOL disclosure files directly from dol.gov/agencies/eta/foreign-labor/performance and join on employer + submit date.
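
A join on employer + submit date could look roughly like the sketch below. The actor-side field names `employer` and `submitDate` and the DOL-side column names `EMPLOYER_NAME`, `RECEIVED_DATE`, and `SOC_CODE` are assumptions (DOL column names vary between fiscal years), the sample rows are made up, and both date columns are assumed to already be in the same `YYYY-MM-DD` format. The key is not guaranteed to be unique, so the sketch keeps a list of candidate matches per row.

```python
from collections import defaultdict


def normalize(name):
    """Upper-case and collapse whitespace so employer strings compare equal."""
    return " ".join(name.upper().split())


def join_on_employer_and_date(actor_rows, dol_rows):
    """Attach every raw DOL row sharing (employer, date) to each actor row."""
    index = defaultdict(list)
    for raw in dol_rows:
        # Assumed DOL column names; check the record layout for your fiscal year.
        index[(normalize(raw["EMPLOYER_NAME"]), raw["RECEIVED_DATE"])].append(raw)
    joined = []
    for row in actor_rows:
        # Assumed actor field names; adjust to the actual output schema.
        key = (normalize(row["employer"]), row["submitDate"])
        joined.append({**row, "dolMatches": index.get(key, [])})
    return joined


# Made-up rows for illustration only.
actor_rows = [{"employer": "Example Corp", "submitDate": "2024-03-01", "baseSalary": 150000}]
dol_rows = [
    {"EMPLOYER_NAME": "EXAMPLE  CORP", "RECEIVED_DATE": "2024-03-01", "SOC_CODE": "15-1252"},
    {"EMPLOYER_NAME": "EXAMPLE CORP", "RECEIVED_DATE": "2024-04-15", "SOC_CODE": "15-2051"},
]
joined = join_on_employer_and_date(actor_rows, dol_rows)
print(len(joined[0]["dolMatches"]))
```

When a row has several candidate matches, job title or wage can serve as a tie-breaker.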

Can I get green-card / PERM disclosure data?

PERM (permanent labor certification) is a separate DOL disclosure feed, not on h1bdata.info. It is on the roadmap as a separate Apify actor — open a feature request if you need it sooner.

Can I filter to only H1B (excluding H1B1 / E3)?

Not directly — h1bdata.info does not expose visa subtype on the result row. The vast majority of petitions in the file are H1B; H1B1 (Singapore/Chile) and E3 (Australia) together are <2% of volume.

How do I get every petition from a specific employer across all years?

{ "employers": ["openai"], "jobTitles": [""], "year": "All Years", "maxRecords": 0 }

A blank job title combined with an employer will return every role that employer has ever sponsored. Set maxRecords: 0 for unlimited.

What about employer-name variations (e.g. "Google" vs "Google LLC" vs "Alphabet Inc")?

h1bdata.info does case-insensitive partial matching on the employer string. "google" will match GOOGLE LLC, GOOGLE INC., GOOGLE PAYMENT CORP, etc. For maximum recall, also try the parent name (alphabet) and any DBA / acquired-subsidiary names you know.
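
The matching behaviour and the "try several names" advice can be illustrated with a small Python sketch. It assumes that "partial matching" means a plain substring test, which is an interpretation rather than a documented rule of the site; `matches` is a hypothetical helper.

```python
def matches(query, employer):
    """Case-insensitive substring test (assumed meaning of 'partial matching')."""
    return query.casefold() in employer.casefold()


filed_names = [
    "GOOGLE LLC",
    "GOOGLE INC.",
    "GOOGLE PAYMENT CORP",
    "ALPHABET INC",
    "META PLATFORMS INC",
]
queries = ["google", "alphabet"]

# Union of hits across all query variants, keeping the filed order.
hits = [name for name in filed_names if any(matches(q, name) for q in queries)]
print(hits)
```

Querying `google` alone would miss `ALPHABET INC`, which is why adding the parent name improves recall.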

Why are some salary cells null?

Very rarely a row on h1bdata.info has a missing or malformed salary cell (an edge case in older 2014–2015 data). The actor parses what it can and emits baseSalary: null rather than dropping the row, so you can decide downstream how to handle them.
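
One possible way to handle null salaries downstream is to exclude them from statistics while still counting them. The Python sketch below uses the `baseSalary` field named above; `salary_summary` is a hypothetical helper and the sample rows are made up.

```python
from statistics import median


def salary_summary(rows):
    """Median over rows with a salary, plus a count of rows that lack one."""
    salaries = [r["baseSalary"] for r in rows if r.get("baseSalary") is not None]
    return {
        "median": median(salaries) if salaries else None,
        "used": len(salaries),
        "skipped": len(rows) - len(salaries),
    }


# Made-up rows for illustration only.
rows = [
    {"employer": "EXAMPLE CORP", "baseSalary": 120000},
    {"employer": "EXAMPLE CORP", "baseSalary": None},
    {"employer": "EXAMPLE CORP", "baseSalary": 150000},
]
summary = salary_summary(rows)
print(summary)
```

Reporting the skipped count alongside the median makes it visible when missing salaries start to affect a benchmark.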

Pivot history — what was this actor before?

This actor was previously published as salary-com-scraper. After a usefulness audit it became clear that the underlying Salary.com data largely duplicated freely-available BLS Occupational Employment Statistics and Glassdoor content — not a niche worth maintaining. The actor was repivoted to the H1B disclosure niche in May 2026 because the source data is genuinely unique, primary-source, federally-mandated, and very high-value for immigration law, recruiting, journalism, and policy research. The actor ID is unchanged; only the source target, schema, and engine differ.

Does this work on the Apify Free Plan?

Yes. Full functionality is available on the free tier. Charges follow the pay-per-event pricing, so a small filtered run costs a few cents.

Can I schedule this to run daily / weekly / monthly?

Yes — Apify's built-in Scheduler lets you trigger this actor on any cron expression. Combine with webhook outputs for fully automated H1B-intel pipelines.

What formats can I export the data in?

JSON, JSONL (streaming), CSV, Excel (XLSX), HTML, XML, RSS — directly from the Apify dataset view, or via the Apify REST API for programmatic consumers.
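
For programmatic consumers, the export URL can be built along these lines. This Python sketch targets the Apify API's dataset items endpoint with its `format` and `limit` query parameters; `dataset_items_url` is a hypothetical helper and `YOUR_DATASET_ID` is a placeholder for the dataset ID of your own run.

```python
from urllib.parse import urlencode

SUPPORTED_FORMATS = {"json", "jsonl", "csv", "xlsx", "html", "xml", "rss"}


def dataset_items_url(dataset_id, fmt="csv", limit=None):
    """Build the download URL for a dataset's items in the given format."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported export format: {fmt}")
    params = {"format": fmt}
    if limit is not None:
        params["limit"] = limit  # cap the number of exported rows
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"


print(dataset_items_url("YOUR_DATASET_ID", "xlsx", limit=100))
```

Datasets that are not publicly shared also need an API token, typically sent as an `Authorization: Bearer` header.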

Are there competing data sources?

The two main competing surfaces are MyVisaJobs and H1BGrader. Both ultimately source from the same DOL disclosure file; h1bdata.info is the longest-running, fastest-querying, and least-monetized of the three, which is why it was chosen as the scrape target.

How do I report a bug or request a feature?

Open an issue on the Apify Store page or contact the developer directly through the Apify Console profile.

If you're building a US labor-market, jobs, or federal-disclosure intelligence stack, these companion actors pair well with the H1B Visa Database Scraper:

SEEK Scraper (Australia / NZ) — live job listings from APAC's largest job board
Levels.fyi Scraper — self-reported tech compensation (base + equity + bonus) — the perfect complement to LCA wage data
ProductHunt Launches & Makers Scraper — daily startup launches, makers, votes & reviews — VC/founder/recruiter intel
TTB Alcohol Permittee Scraper — another federal public-disclosure dataset (Treasury / TTB) in the same legal family
SAM.gov Federal Contractor Entity Scraper — every entity registered to do business with the US federal government
Texas Pharmacy License Scraper — TSBP — state-licensed pharmacist / pharmacy directory
Ohio eLicense Scraper — Ohio professional licenses
Illinois IDFPR License Scraper — Illinois licensed professionals
California DCA Professional License Scraper — California consumer-affairs licensees
Colorado Professional License Scraper — Colorado DORA licenses
BBB Business Scraper — Better Business Bureau company profiles

Comparison vs. Alternatives

| Approach | Setup time | Data freshness | Cost (10k rows) | Schema normalization | Filtering | Provenance |
| --- | --- | --- | --- | --- | --- | --- |
| This actor | < 1 minute | Live at run | From $35 (at $3.50 / 1,000 results) | Yes (built-in) | Employer × Job × City × Year + salary band | Per-row source URL |
| Manual h1bdata.info browsing | Hours / days | Live | Free | None | UI only | None |
| DOL raw quarterly CSV download | 4–8 hours dev | 8–12 weeks lagged | Free + infra | DIY | DIY | Manual |
| MyVisaJobs paid subscription | Minutes | Live | $50–500+/mo | Yes | Limited UI | None |
| Custom Python + requests + BeautifulSoup | 1–2 days dev | Live | Free + infra | DIY | DIY | DIY |
| Hand-built per-employer cron + S3 + Athena | 1–2 weeks dev | Quarterly | $$$ | DIY | SQL | Manual |

Why Pay-Per-Event Pricing?

Most data products either lock you into a monthly seat license (you pay even when idle) or charge per Compute Unit (unpredictable bills). This actor uses Apify's pay-per-event model:

You only pay when the actor actually runs
Charges scale linearly with how many rows you actually consume
Transparent line-item billing in the Apify console
No monthly minimums, no annual contracts
Free to evaluate — set maxRecords: 10 and validate the schema before scaling up
Perfect for both one-off research projects and high-frequency production scrapers
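
Because charges scale linearly with rows, a rough budget check is simple arithmetic. The Python sketch below assumes a flat rate equal to the listed "from $3.50 / 1,000 results" price and no other billable events; actual pricing may differ by plan, and `estimate_cost` is a hypothetical helper.

```python
from decimal import ROUND_HALF_UP, Decimal

# Assumption: flat rate taken from the listed "from" price.
PRICE_PER_1000_RESULTS = Decimal("3.50")


def estimate_cost(rows):
    """Estimated charge in USD for a given number of result rows."""
    cost = Decimal(rows) / 1000 * PRICE_PER_1000_RESULTS
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for n in (10, 1_000, 30_000):
    print(f"{n:>6} rows -> ${estimate_cost(n)}")
```

Setting `maxRecords` to the row count you budgeted for keeps a run within the estimate.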

Changelog

| Version | Date | Notes |
| --- | --- | --- |
| 1.0.4 | 2026-07-02 | Maintenance release: actor verified and maintained; data pipeline tested for quality, structure and freshness; selectors/endpoints confirmed against the live site |
| 1.0.0 | 2026-05-18 | Initial public release of the H1B Visa Database Scraper: HTTP-only via `got-scraping` + `cheerio`, full h1bdata.info filter parity, salary-band post-filter, deduplication, `Actor.fail()` on empty results |
| (pre-1.0) | 2024–2026 | Same actor ID was previously published as `salary-com-scraper`; repivoted to H1B disclosure data because Salary.com largely duplicated freely-available BLS/Glassdoor content while the H1B niche is genuinely unique and high-value |

Keywords

H1B visa scraper · H1B salary database · H1B sponsor lookup · h1bdata.info scraper · US DOL H1B disclosure · H1B salary by employer · H1B salary by job title · H1B prevailing wage · H1B sponsorship history · immigration salary data · H1B visa API · LCA database scraper · Labor Condition Application data · DOL Office of Foreign Labor Certification scraper · H1B sponsor search · H1B sponsor history lookup · H1B base wage scraper · H1B job title salary · H1B petition data · H1B disclosure data extraction · H1B FAANG salaries · Google H1B salary · Meta H1B salary · Amazon H1B salary · Microsoft H1B salary · Apple H1B salary · NVIDIA H1B salary · OpenAI H1B salary · Infosys H1B petitions · TCS H1B petitions · Goldman Sachs H1B salary · JPMorgan H1B salary · H1B Bay Area salary · H1B NYC salary · H1B Seattle salary · H1B Austin salary · H1B Boston salary · immigration attorney data scraping · recruiter intel scraper · compensation benchmarking API · H1B prevailing wage compliance · H1B journalism data · H1B policy research dataset · H1B M&A due diligence · Apify H1B actor · H1B1 visa data · E3 visa data · Title 20 CFR §655.760 disclosure

Support

Bug reports: Use the Issues tab on the Apify Store page
Feature requests: Same place — please describe your use case so we can prioritize realistically
Direct contact: Through the Apify developer profile (haketa)
Roadmap requests welcome: PERM / green-card disclosure scraper, H2B seasonal-worker scraper, USCIS receipt-number enrichment, prevailing-wage Level 1–4 inference

If this actor saves you time on immigration research, recruiter intel, comp benchmarking, or policy reporting, a 5-star rating on the Apify Store helps other professionals discover it. Thank you.
