# H1B Visa Database Scraper | DOL Disclosure Salaries (`haketa/h1b-visa-database-scraper`) Actor

Scrape the US H1B Visa database (h1bdata.info) — public Department of Labor disclosure data. Per-employer, per-job, per-city, per-year salary records with submit/start dates. 8M+ approved cases since 2014. Critical for immigration attorneys, job seekers, recruiter intel.

- **URL**: https://apify.com/haketa/h1b-visa-database-scraper.md
- **Developed by:** [Haketa](https://apify.com/haketa) (community)
- **Categories:** Developer tools, Automation, Jobs
- **Stats:** 3 total users, 2 monthly users, 91.3% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $3.50 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## H1B Visa Database Scraper — DOL Disclosure Salaries, Sponsor History & Prevailing Wage Lookup

> **The fastest way to query the entire US H1B Visa public disclosure database.** This Apify Actor scrapes [h1bdata.info](https://h1bdata.info) — a long-running independent aggregator that re-publishes the **US Department of Labor's** mandatory H1B / H1B1 / E3 disclosure files (Title 20 CFR §655.760). Every CERTIFIED petition since fiscal year 2014 is in here: **employer, job title, base wage, work location, submit date, intended start date**. **8M+ approved cases** across thousands of sponsors and job titles, **no auth, no captcha, no anti-bot, no proxy required**.

[![Apify Actor](https://img.shields.io/badge/Apify-Actor-blue)](https://apify.com/haketa/h1b-visa-database-scraper)
[![Live Data](https://img.shields.io/badge/Data-Live%20at%20Run%20Time-orange)]()
[![Engine](https://img.shields.io/badge/Engine-HTTP%20%2B%20Cheerio-green)]()
[![No Auth](https://img.shields.io/badge/Authentication-None%20Required-success)]()
[![Coverage](https://img.shields.io/badge/Records-8M%2B%20since%202014-purple)]()
[![Pay Per Event](https://img.shields.io/badge/Pricing-Pay%20Per%20Event-yellow)]()
[![Region](https://img.shields.io/badge/Region-United%20States-red)]()
[![Source](https://img.shields.io/badge/Source-US%20DOL%20Disclosure-lightgrey)]()

---

### What This Actor Does

The **H1B Visa Database Scraper** is a production-grade Apify Actor that turns the entire public US H1B disclosure dataset into structured, filterable JSON. It queries **h1bdata.info** — a community-maintained mirror of the **US Department of Labor's (DOL) Office of Foreign Labor Certification** public disclosure feed — and returns one row per **CERTIFIED** Labor Condition Application (LCA) / H1B petition.

Under federal law (**Title 20 CFR §655.760**), the DOL must publish every approved foreign-worker petition with the petitioning employer, the job title, the **base wage offered**, the city/state of work, the application **submit date**, and the **intended start date**. h1bdata.info ingests the DOL's quarterly disclosure files and exposes them via a fast, paginated, ungated search UI. This actor turns that UI into an API.

A single popular search (e.g. `employer=google, job=software engineer, year=2024`) can return **30,000+ rows**. Across the full back catalog (FY 2014 → present), the underlying dataset contains **8M+ certified petitions**.

Each record returned includes:

- **Employer** — petitioning company exactly as filed with the DOL (e.g. `GOOGLE LLC`, `META PLATFORMS INC`, `JPMORGAN CHASE & CO`)
- **Job title** — the title on the LCA (e.g. `SOFTWARE ENGINEER`, `DATA SCIENTIST`, `INVESTMENT BANKING ANALYST`)
- **Base salary** — the **annual wage offered** to the foreign worker (USD, the wage the DOL approved)
- **Work location** — city + 2-letter state of the intended job site
- **Submit date** — when the LCA was filed with the DOL
- **Start date** — the proposed employment start date
- **Year** — DOL fiscal year of the petition
- **Case status** — always `CERTIFIED` (the DOL only publishes approved petitions; denied/withdrawn cases are not disclosed)
- **Provenance** — exact search URL the row came from + ISO scrape timestamp

The dataset powers **immigration-attorney case research, visa-dependent job-seeker discovery, recruiter intelligence, comp benchmarking, investigative journalism, and labor-economics research** — all from a source no closed API can match for breadth or cost.

#### Why scrape h1bdata.info yourself when this exists?

The H1B disclosure feed *is* public, but turning it into a usable dataset is non-trivial. Teams that try the DIY route run into these obstacles fast:

- The **raw DOL files** are quarterly Excel/CSV dumps with 100+ columns, inconsistent headers per quarter, and a 12-week publication lag
- **h1bdata.info pages** return up to ~18 MB of HTML per query (38k+ rows in a single response) — naive `requests.get()` calls without a 60-second timeout will silently truncate
- The site's **HTML table structure** is unlabeled — you have to parse column order positionally, not by header
- **Salary strings** arrive as `$112,000` style text and must be normalized to numeric USD
- **Dates** arrive as `MM/DD/YYYY` (US format) and need ISO conversion for SQL/BI tools
- **Location** is a combined `CITY, ST` string that needs splitting for analytics
- **Empty queries** are rejected silently — at least one of employer/job/city is required
- Different searches return **wildly different row counts** (10 rows for a niche role, 30k+ for `Google + Software Engineer`) — your scraper must tolerate both extremes
- The DOL's own [performance.dol.gov](https://www.dol.gov/agencies/eta/foreign-labor/performance) portal **cannot be queried** — it only offers full-quarter downloads
- Building a **per-employer search loop** (1,000 employers × 5 job titles × 10 years = 50,000 queries) requires retry/backoff/dedup logic that's tedious to maintain

This actor solves every one of those: it generates the cross-product of your filters as separate tasks, retries each request up to 3× with an increasing backoff and a 60-second timeout, normalizes salary/date/location fields, deduplicates rows, and emits clean JSON ready for SQL, Pandas, Sheets, or your BI tool.

---

### Quick Start

#### One-Click Run

1. Click **"Try for free"** on the [Apify Store page](https://apify.com/haketa/h1b-visa-database-scraper)
2. Enter at least one employer (e.g. `google`), one job title (e.g. `software engineer`), and a year (e.g. `2024`)
3. Hit **Start** — petitions stream into the dataset in seconds
4. Download as JSON, CSV, Excel, JSONL, HTML, XML, or RSS directly from the Apify dataset view

#### API Run (Python)

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("haketa/h1b-visa-database-scraper").call(run_input={
    "employers": ["google", "meta", "amazon", "microsoft", "apple"],
    "jobTitles": ["software engineer", "machine learning engineer", "data scientist"],
    "cities": [],                  # nationwide
    "year": "2024",
    "minSalary": 150000,           # only show $150k+ offers
    "maxRecords": 5000,
})

for row in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(
        f"{row['employer']:30s} | {row['jobTitle']:35s} | "
        f"${row['baseSalary']:>8,} | {row['city']}, {row['state']} | "
        f"{row['submitDate']}"
    )
````

#### API Run (Node.js / TypeScript)

```typescript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('haketa/h1b-visa-database-scraper').call({
    employers: ['goldman sachs', 'morgan stanley', 'jpmorgan chase'],
    jobTitles: ['quantitative analyst', 'investment banking analyst'],
    cities: ['new york'],
    year: '2024',
    maxRecords: 1000,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Pulled ${items.length} certified H1B petitions on Wall Street`);

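// Average only over rows that carry a numeric salary, and avoid dividing by zero.
const salaries = items
    .map((r) => r.baseSalary)
    .filter((s): s is number => typeof s === 'number');
const avgWage = salaries.length
    ? salaries.reduce((sum, s) => sum + s, 0) / salaries.length
    : 0;
console.log(`Average base wage: $${Math.round(avgWage).toLocaleString()}`);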
```

#### API Run (cURL)

```bash
curl -X POST "https://api.apify.com/v2/acts/haketa~h1b-visa-database-scraper/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "employers": ["openai"],
    "jobTitles": ["research engineer", "software engineer"],
    "year": "2024",
    "maxRecords": 500
  }'
```

#### API Run (paste a raw search URL)

```python
run = client.actor("haketa/h1b-visa-database-scraper").call(run_input={
    "startUrls": [
        "https://h1bdata.info/index.php?em=nvidia&job=ai+engineer&city=&year=2024"
    ],
    "maxRecords": 1000,
})
```

When `startUrls` is provided it **overrides** the structured filters — useful when you have a search you've already built interactively on h1bdata.info and want to replay it programmatically.

***

### How It Works

The actor takes a Cartesian product of `employers × jobTitles × cities` (plus the chosen `year`) and turns each combination into a single GET request against `h1bdata.info/index.php`. Each response is parsed with **cheerio**, normalized, salary-band-filtered, deduplicated, and pushed into the Apify Dataset.

#### Source endpoint

| Endpoint | Method | Pagination | Notes |
|---|---|---|---|
| `https://h1bdata.info/index.php?em={emp}&job={job}&city={city}&year={year}` | GET | None — one request returns every match | Server may return 18MB+ HTML for popular queries |

The query string accepts any combination of the four parameters; at least one of `em`, `job`, or `city` is required (empty queries are rejected). `year` accepts a 4-digit DOL fiscal year (2014 → present) or `All Years`.
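
The cross-product described above can be sketched in a few lines of Python. This is a minimal illustration rather than the Actor's own code: `build_search_urls` is a hypothetical helper, and it assumes that an empty filter list means "no constraint" and that spaces are encoded as `+`, as in the example URLs in this README.

```python
from itertools import product
from urllib.parse import urlencode

BASE_URL = "https://h1bdata.info/index.php"


def build_search_urls(employers, job_titles, cities, year="All Years"):
    """Return one search URL per employer x job title x city combination."""
    # An empty filter list contributes a single blank value, so it does not
    # collapse the whole product to zero tasks (assumption for this sketch).
    combos = product(employers or [""], job_titles or [""], cities or [""])
    urls = []
    for employer, job, city in combos:
        if not (employer or job or city):
            # The site rejects queries with no employer, job, or city.
            continue
        query = urlencode({"em": employer, "job": job, "city": city, "year": year})
        urls.append(f"{BASE_URL}?{query}")
    return urls


urls = build_search_urls(["google", "meta"], ["software engineer"], [], "2024")
print(len(urls))
print(urls[0])
```

Counting `len(urls)` before a run is a cheap way to see how many requests a given input will generate.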

#### Engine

- **HTTP-only via `got-scraping`** — no Playwright, no Puppeteer, no headless Chromium overhead
- **Realistic browser headers** auto-generated by `got-scraping`'s `headerGeneratorOptions` (Chrome 120+, desktop, US locale, Windows/macOS)
- **60-second per-request timeout** — needed because popular queries return 18MB+ payloads
- **3-attempt retry with increasing backoff** (`2s × attempt + jitter`, so the wait grows linearly with the attempt number) — survives intermittent network blips
- **Response sanity check** — rejects empty bodies and non-200 statuses
- **Polite delay** between requests (`requestDelay`, default 1000ms + 0-500ms jitter)
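
The retry and sanity-check behavior listed above can be approximated as follows. This is a sketch under stated assumptions, not the Actor's implementation: `fetch_with_retry` is a hypothetical name, the `fetch` callable is supplied by the caller (and would be responsible for the 60-second timeout), and the 0 to 0.5 second jitter range is a guess.

```python
import random
import time


def fetch_with_retry(fetch, url, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url) up to `attempts` times, pausing between failures.

    `fetch` must return a (status_code, body) tuple.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            status, body = fetch(url)
            # Sanity check: treat non-200 statuses and empty bodies as failures.
            if status != 200 or not body:
                raise ValueError(f"bad response (status={status})")
            return body
        except Exception as error:
            last_error = error
            if attempt < attempts:
                # 2s x attempt, plus up to 0.5s of random jitter (assumed range).
                sleep(base_delay * attempt + random.uniform(0, 0.5))
    raise RuntimeError(f"giving up on {url} after {attempts} attempts") from last_error


# Demo with a fake fetcher that fails twice, then succeeds.
calls = []


def flaky_fetch(url):
    calls.append(url)
    return (200, "<html>ok</html>") if len(calls) == 3 else (503, "")


body = fetch_with_retry(flaky_fetch, "https://h1bdata.info/index.php?em=google",
                        sleep=lambda seconds: None)
print(body, len(calls))
```

Passing `sleep` as a parameter keeps the function testable without real waiting.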

#### Parsing

- **cheerio** loads the HTML and walks `table#myTable tbody tr` (with a positional fallback for any other `table tbody tr` shape)
- Columns are parsed by **position**: `EMPLOYER | JOB TITLE | BASE SALARY | LOCATION | SUBMIT DATE | START DATE`
- **Salary normalization** — `$112,000` → `112000` (integer); the original display string is preserved as `baseSalaryDisplay`
- **Date normalization** — `MM/DD/YYYY` → `YYYY-MM-DD` (ISO-8601, sort-safe)
- **Location split** — `"MOUNTAIN VIEW, CA"` → `{city: "MOUNTAIN VIEW", state: "CA"}`; the raw string is preserved as `location`
- **Year inference** — derived from `submitDate` when present
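
The normalization rules above translate into small pure functions. The helper names below are hypothetical, and taking the calendar year of `submitDate` as the `year` value is an assumption of this sketch; the Actor may derive the fiscal year differently.

```python
from datetime import datetime


def parse_salary(text):
    """'$112,000' -> 112000; returns None when the text is not a number."""
    cleaned = (text or "").replace("$", "").replace(",", "").strip()
    try:
        return int(float(cleaned))
    except (ValueError, OverflowError):
        return None


def parse_us_date(text):
    """'03/04/2024' -> '2024-03-04'; returns None for blank or malformed input."""
    try:
        return datetime.strptime((text or "").strip(), "%m/%d/%Y").date().isoformat()
    except ValueError:
        return None


def split_location(text):
    """'MOUNTAIN VIEW, CA' -> ('MOUNTAIN VIEW', 'CA'); state is None if absent."""
    raw = (text or "").strip()
    if not raw:
        return None, None
    city, sep, state = raw.rpartition(",")
    if not sep:
        return raw, None
    return city.strip(), state.strip() or None


def normalize_row(cells):
    """Map the six positional table cells onto the flat output shape."""
    employer, job_title, salary, location, submit, start = cells
    city, state = split_location(location)
    submit_date = parse_us_date(submit)
    return {
        "employer": employer.strip(),
        "jobTitle": job_title.strip(),
        "baseSalary": parse_salary(salary),
        "baseSalaryDisplay": salary.strip() or None,
        "city": city,
        "state": state,
        "location": location.strip() or None,
        "submitDate": submit_date,
        "startDate": parse_us_date(start),
        # Assumption: calendar year of the submit date.
        "year": int(submit_date[:4]) if submit_date else None,
    }


row = normalize_row(["GOOGLE LLC", "SOFTWARE ENGINEER", "$112,000",
                     "DURHAM, NC", "03/04/2024", "08/15/2024"])
print(row)
```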

#### Output pipeline

- **Salary-band post-filter** — `minSalary` / `maxSalary` applied before push
- **Deduplication** — `(employer | jobTitle | baseSalary | submitDate | city)` is hashed; duplicates within a run are dropped
- **Hard cap** — `maxRecords` stops the loop early (set `0` for unlimited)
- **Empty-result safety net** — if zero records are scraped after all tasks complete, the run is marked `FAILED` via `Actor.fail()` so monitoring/scheduling integrations can alert
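
As a rough sketch, the salary filter, deduplication, and record cap can be combined in one pass. The names `dedup_key` and `filter_rows` are hypothetical, SHA-1 stands in for whatever hash the Actor uses, and dropping rows with a missing salary whenever a bound is set is an assumption.

```python
import hashlib


def dedup_key(row):
    """Hash the (employer | jobTitle | baseSalary | submitDate | city) tuple."""
    parts = (row.get("employer"), row.get("jobTitle"), row.get("baseSalary"),
             row.get("submitDate"), row.get("city"))
    joined = "|".join("" if part is None else str(part) for part in parts)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def filter_rows(rows, min_salary=0, max_salary=0, max_records=0):
    """Apply the salary band, drop duplicates, and stop at max_records (0 = unlimited)."""
    seen = set()
    kept = []
    for row in rows:
        salary = row.get("baseSalary")
        # Assumption: a row without a salary fails any active bound.
        if min_salary and (salary is None or salary < min_salary):
            continue
        if max_salary and (salary is None or salary > max_salary):
            continue
        key = dedup_key(row)
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
        if max_records and len(kept) >= max_records:
            break
    return kept


sample = [
    {"employer": "GOOGLE LLC", "jobTitle": "SOFTWARE ENGINEER", "baseSalary": 180000,
     "submitDate": "2024-03-04", "city": "DURHAM"},
    {"employer": "GOOGLE LLC", "jobTitle": "SOFTWARE ENGINEER", "baseSalary": 180000,
     "submitDate": "2024-03-04", "city": "DURHAM"},
    {"employer": "GOOGLE LLC", "jobTitle": "SOFTWARE ENGINEER", "baseSalary": 112000,
     "submitDate": "2024-03-05", "city": "DURHAM"},
    {"employer": "META PLATFORMS INC", "jobTitle": "DATA SCIENTIST", "baseSalary": 200000,
     "submitDate": "2024-02-01", "city": "MENLO PARK"},
]
kept = filter_rows(sample, min_salary=150000)
print(len(kept))
```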

#### Proxy

Proxy is **disabled by default**. h1bdata.info has no rate-limit, no CAPTCHA, and no IP-based throttle. You only need to enable Apify Proxy if you're running massive parallel jobs (e.g. 10,000+ employer queries in one run) and want to be courteous about IP diversity.

***

### Input Parameters

```json
{
  "employers": ["google", "meta", "amazon"],
  "jobTitles": ["software engineer", "machine learning engineer"],
  "cities": ["mountain view", "menlo park", "seattle"],
  "year": "2024",
  "minSalary": 150000,
  "maxSalary": 0,
  "maxRecords": 5000,
  "requestDelay": 1000,
  "startUrls": [],
  "proxyConfiguration": { "useApifyProxy": false }
}
```

#### Parameter reference

| Parameter | Type | Default | Description |
|---|---|---|---|
| `startUrls` | `array<string\|object>` | `[]` | Paste any `h1bdata.info` search URL directly (e.g. from your browser). When non-empty, **overrides** all other filter fields. |
| `employers` | `array<string>` | `["google"]` | Free-text employer names, partial match, case-insensitive. Examples: `"google"`, `"goldman sachs"`, `"infosys"`. Each entry runs as its own task. |
| `jobTitles` | `array<string>` | `["software engineer"]` | Free-text job-title queries, partial match. Each combines with each employer × city as a separate task. |
| `cities` | `array<string>` | `[]` | US-city filter, case-insensitive. Empty array = nationwide. Cross-products with employers and job titles. |
| `year` | `enum<string>` | `"All Years"` | DOL fiscal year. Use `"All Years"` for the full 2014-present history, or a single year from `"2014"` to `"2026"`. |
| `minSalary` | `integer` | `0` | Post-filter: drop rows where `baseSalary < minSalary`. `0` = no lower bound. |
| `maxSalary` | `integer` | `0` | Post-filter: drop rows where `baseSalary > maxSalary`. `0` = no upper bound. |
| `maxRecords` | `integer` | `500` | Hard cap across all tasks. `0` = unlimited. Popular queries can yield 30,000+ rows — set generously. |
| `requestDelay` | `integer` | `1000` | Milliseconds between requests. h1bdata has no rate-limit but 500–2000 ms is polite. |
| `proxyConfiguration` | `object` | `{ useApifyProxy: false }` | Optional. h1bdata.info has no anti-bot, so proxy is **not required**. Only enable for very large multi-thousand-task runs. |

> **Tip:** Provide an empty `employers` array + a non-empty `jobTitles` or `cities` to search **across all employers** for that role/city. Just remember the URL needs at least one of `em` / `job` / `city` to return results.

***

### Output Schema

Every row in the dataset uses the same flat shape — easy to flatten into a relational table, Google Sheet, or DataFrame.

#### Core petition fields

| Field | Type | Description |
|---|---|---|
| `employer` | `string` | Petitioning employer as filed with DOL, exactly as published (often ALL CAPS, includes legal suffix — e.g. `GOOGLE LLC`, `META PLATFORMS INC`) |
| `jobTitle` | `string` | Job title on the LCA (e.g. `SOFTWARE ENGINEER`, `DATA SCIENTIST III`, `INVESTMENT BANKING ANALYST`) |
| `baseSalary` | `number\|null` | Annual base wage offered, in USD, normalized to an integer |
| `baseSalaryDisplay` | `string\|null` | Original salary string as shown on h1bdata.info (e.g. `$112,000`) |
| `city` | `string\|null` | City of the work location |
| `state` | `string\|null` | Two-letter US state code |
| `location` | `string\|null` | Raw combined location string (e.g. `MOUNTAIN VIEW, CA`) |
| `submitDate` | `string\|null` | LCA submit date in `YYYY-MM-DD` |
| `startDate` | `string\|null` | Intended employment start date in `YYYY-MM-DD` |
| `year` | `integer\|null` | DOL fiscal year inferred from `submitDate` |
| `caseStatus` | `string` | Always `"CERTIFIED"` — the DOL only publishes approved petitions |

#### Provenance / search-echo fields

| Field | Type | Description |
|---|---|---|
| `searchEmployer` | `string\|null` | Employer query that produced this row (or `null` if `startUrls` was used) |
| `searchJobTitle` | `string\|null` | Job-title query that produced this row |
| `searchCity` | `string\|null` | City query that produced this row |
| `searchYear` | `string` | Year query that produced this row (e.g. `"2024"`, `"All Years"`) |
| `sourceUrl` | `string` | Exact `h1bdata.info` URL the row was scraped from — fully reproducible |
| `scrapedAt` | `string` | ISO-8601 timestamp of extraction (UTC) |

#### Example: Google software engineer petition

```json
{
  "employer": "GOOGLE LLC",
  "jobTitle": "SOFTWARE ENGINEER",
  "baseSalary": 112000,
  "baseSalaryDisplay": "$112,000",
  "city": "DURHAM",
  "state": "NC",
  "location": "DURHAM, NC",
  "submitDate": "2024-03-04",
  "startDate": "2024-08-15",
  "year": 2024,
  "caseStatus": "CERTIFIED",
  "searchEmployer": "google",
  "searchJobTitle": "software engineer",
  "searchCity": null,
  "searchYear": "2024",
  "sourceUrl": "https://h1bdata.info/index.php?em=google&job=software+engineer&city=&year=2024",
  "scrapedAt": "2026-05-18T09:14:22.318Z"
}
```

#### Example: Goldman Sachs investment-banking analyst petition

```json
{
  "employer": "GOLDMAN SACHS & CO. LLC",
  "jobTitle": "INVESTMENT BANKING ANALYST",
  "baseSalary": 110000,
  "baseSalaryDisplay": "$110,000",
  "city": "NEW YORK",
  "state": "NY",
  "location": "NEW YORK, NY",
  "submitDate": "2024-01-22",
  "startDate": "2024-07-08",
  "year": 2024,
  "caseStatus": "CERTIFIED",
  "searchEmployer": "goldman sachs",
  "searchJobTitle": "investment banking analyst",
  "searchCity": "new york",
  "searchYear": "2024",
  "sourceUrl": "https://h1bdata.info/index.php?em=goldman+sachs&job=investment+banking+analyst&city=new+york&year=2024",
  "scrapedAt": "2026-05-18T09:14:24.402Z"
}
```

***

### Case Status & Visa Type Reference

The DOL only publishes **approved** petitions in the public disclosure file, so every row this actor returns has `caseStatus = "CERTIFIED"`. Denied, withdrawn, returned, or under-review cases are **not** disclosed and therefore **cannot** appear in this dataset.

#### Visa categories covered

| Visa | Description | In dataset? |
|---|---|---|
| **H1B** | Specialty occupation worker (Bachelor's+ degree role) | Yes — majority |
| **H1B1** | Singapore / Chile free-trade-agreement specialty worker | Yes |
| **E3** | Australian specialty occupation worker | Yes |
| H2A / H2B | Seasonal agricultural / non-agricultural workers | No (separate DOL feed) |
| L1A / L1B | Intra-company transferee | No (USCIS-only, not DOL) |
| O1 / O3 | Extraordinary ability | No (USCIS-only) |
| Green Card (PERM) | Permanent labor cert | No (separate DOL feed — see roadmap below) |

#### DOL fiscal-year coverage

| Year | Status |
|---|---|
| 2014 | Earliest year in h1bdata.info index |
| 2015 → 2023 | Full coverage |
| 2024 | Full coverage |
| 2025 | Rolling (DOL publishes quarterly) |
| 2026 | Partial — Q1 / Q2 typically available by mid-year |

***

### Use Cases

#### Immigration Law & LCA Case Research

Immigration attorneys, paralegals, and corporate immigration teams use this dataset to:

- **Pull employer petition history** for prevailing-wage attack arguments and RFE responses
- **Benchmark Level 1 vs Level 4 wage offers** for a given SOC code and metro to support LCA filings
- **Document an employer's H1B sponsorship pattern** for I-140 / I-485 case files
- **Track concurrent / amended petitions** by tracing repeated submit dates for one employer + job
- **Build evidence packets** for Department of Labor audits and Wage & Hour investigations
- **Cross-reference an employer's claimed wage** vs. what they've previously filed with DOL

#### Visa-Dependent Job Seekers (Students, F1, OPT, H1B Holders)

International students on F1/OPT and current H1B holders use the dataset to:

- **Identify visa-friendly employers** by ranking who actually sponsors in their target city + role
- **Set realistic salary expectations** by looking up the median LCA wage for their job title at the company they're interviewing with
- **Discover smaller sponsors** beyond the headline FAANG names — most petitions are filed by mid-cap firms
- **Time job changes** around H1B transfer windows using the start-date column
- **Avoid serial wage-suppressor employers** by flagging companies whose LCA wages sit consistently below the BLS Level 1
- **Negotiate offers** with real-world bid data from the same employer in the same role and metro

#### Recruiter & Talent-Intel Teams

Internal recruiters, RPO firms, and exec-search teams use the dataset for:

- **Competitor sponsorship analysis** — who in your industry is bringing in foreign talent, at what scale, in which functions?
- **Hot-title detection** — track quarter-over-quarter growth in titles like `AI Engineer`, `ML Researcher`, `Prompt Engineer` to spot category shifts
- **Sponsor-friendliness scorecards** for candidate-facing materials
- **Sourcing pools** — every employer in the dataset has, by definition, hired internationally before and may have a current opening profile
- **Pay-band benchmarking** against direct competitors using actual DOL filings (not survey medians)

#### HR Total-Rewards & Compensation Benchmarking

Comp & ben teams blend H1B disclosure data into compensation studies because **the LCA base wage is the offered base wage** — not a self-reported survey response:

- **Calibrate base salary structures** against peer companies in your metro
- **Build a free, real-world alternative** to Radford, Mercer, and WTW surveys for tech / finance / pharma roles
- **Quantify metro premiums** — same role in Bay Area vs Austin vs Atlanta, sourced from the same employer
- **Validate offer competitiveness** in retention reviews
- **Detect inadvertent wage compression** between long-tenured employees and incoming H1B hires
- **Inform pay-equity audits** by comparing internal salaries to external LCA-filed wages for matching titles

#### Investigative Journalism & Wage-Suppression Reporting

Reporters and data desks use H1B disclosure as a primary source for:

- **Wage-suppression investigations** — flag employers whose LCA wages cluster at DOL Level 1 even for senior titles
- **Body-shop / consultancy exposés** — identify outsourcing firms filing thousands of low-wage petitions per year
- **Geographic-arbitrage reporting** — companies headquartering filings in low-prevailing-wage metros while work is performed elsewhere
- **Tech-layoff coverage** — track post-layoff sponsorship pivots
- **Policy-impact stories** — quantify the real on-the-ground effect of every USCIS rule change

#### Government, Think Tank & Labor-Economics Research

Academic economists, policy shops, and public-sector analysts use this data to:

- **Estimate H1B labor-market effects** at MSA granularity
- **Inform STEM-workforce policy** with empirical wage and headcount data
- **Track industry shifts** — banking → big tech → AI → fintech sponsor mix evolution
- **Model wage-elasticity** of H1B supply by SOC code
- **Support immigration-policy testimony** with real disclosure data
- **Build replication datasets** for peer-reviewed labor-economics papers

#### Tech-Employer Ranking & Industry-Trend Newsletters

Trade publications and data-newsletter operators use the dataset to:

- **Publish "Top H1B Sponsors of {YEAR}"** rankings by total petitions and median wage
- **Run year-over-year comparisons** for FAANG, MAANG, AI labs, fintech, big pharma, and consulting
- **Build interactive dashboards** for paid subscribers (e.g. company search, role search, metro search)
- **Detect emerging hiring centers** — petitions in Austin, Miami, Raleigh, Bellevue growing faster than NYC/SF
- **Publish quarterly "AI Engineer wage tracker"** style data drops

#### University Career Services & International Student Advising

College career centers and graduate-school career offices use the dataset to:

- **Show students** which employers in their field have historically sponsored international hires
- **Benchmark offered salaries** for graduating MS/PhD students by program and metro
- **Build alumni-employer connections** by surfacing alumni-heavy sponsors
- **Justify program ROI** with concrete post-graduation sponsorship outcomes
- **Coach students** on which employers are realistic sponsorship targets vs. long shots

#### Compliance, Audit & M&A Due Diligence

Corporate-development teams and external auditors use H1B disclosure as a verifiable third-party data point:

- **Verify an acquisition target's sponsorship history** during M&A due diligence — undisclosed H1B obligations are post-close liabilities
- **Detect undisclosed offshore-staffing arrangements** by cross-checking petition volume vs. headcount disclosures
- **Validate prevailing-wage compliance** when the target is a federal contractor
- **Audit subcontractor labor practices** in supply-chain due diligence
- **Support post-acquisition integration planning** by mapping H1B-dependent talent that needs visa transfers

#### Real Estate & Location Intelligence

Site-selection analysts and CRE teams treat H1B petition density as a leading indicator of high-income knowledge-worker housing demand:

- **Forecast luxury-rental demand** in metros with rising H1B petition counts
- **Validate corporate-relocation rumors** before HQ announcements (filings shift months ahead of press releases)
- **Build neighborhood comps** that account for international-hire population growth
- **Inform retail / hospitality investment** in emerging tech-talent corridors

***

### Sample Queries & Recipes

#### Recipe 1: All Google software-engineer petitions for FY 2024 (the verified smoke test)

```json
{
  "employers": ["google"],
  "jobTitles": ["software engineer"],
  "year": "2024",
  "maxRecords": 1000
}
```

This is the live smoke-test query — 30 records returned with 100% field coverage on the verification run, including rows like `GOOGLE LLC | SOFTWARE ENGINEER | $112,000 | DURHAM, NC | 2024-03-04 | CERTIFIED`.

#### Recipe 2: Top FAANG ML / AI hiring, all years (filter to 2024–2025 downstream)

```json
{
  "employers": ["google", "meta", "amazon", "apple", "microsoft", "nvidia", "openai", "anthropic"],
  "jobTitles": ["machine learning engineer", "research engineer", "applied scientist", "ai engineer"],
  "year": "All Years",
  "minSalary": 200000,
  "maxRecords": 10000
}
```

#### Recipe 3: Wall Street quant & banking analyst comp benchmark, NYC only

```json
{
  "employers": ["goldman sachs", "morgan stanley", "jpmorgan chase", "citi", "bank of america", "jane street", "citadel", "two sigma"],
  "jobTitles": ["quantitative analyst", "investment banking analyst", "software engineer"],
  "cities": ["new york"],
  "year": "2024"
}
```

#### Recipe 4: Indian-IT body-shop volume tracker

```json
{
  "employers": ["infosys", "tata consultancy services", "wipro", "cognizant", "hcl", "tech mahindra", "capgemini"],
  "jobTitles": ["programmer analyst", "systems analyst", "consultant"],
  "year": "All Years",
  "maxRecords": 50000
}
```

#### Recipe 5: Sponsor-friendliness scan for a specific role across every US metro

```json
{
  "employers": [],
  "jobTitles": ["data scientist"],
  "year": "2024",
  "minSalary": 120000,
  "maxRecords": 20000
}
```

#### Recipe 6: Pharma & biotech R&D petitions

```json
{
  "employers": ["pfizer", "moderna", "merck", "genentech", "regeneron", "vertex", "eli lilly"],
  "jobTitles": ["scientist", "research associate", "bioinformatics scientist"],
  "year": "All Years"
}
```

#### Recipe 7: Tiny test run — 10 rows to validate your pipeline before a big scrape

```json
{
  "employers": ["amazon"],
  "jobTitles": ["software development engineer"],
  "year": "2024",
  "maxRecords": 10
}
```

#### Recipe 8: Direct URL replay — paste a search you built in your browser

```json
{
  "startUrls": [
    "https://h1bdata.info/index.php?em=stripe&job=&city=san+francisco&year=2024"
  ]
}
```

***

### Integration Examples

#### Google Sheets

Schedule the actor daily, attach Apify's **"Save to Google Sheets"** integration, and your team has a living view of (for example) every petition your competitors filed last quarter — refreshed without anyone touching a spreadsheet.

#### Make.com / Zapier / n8n

Trigger downstream workflows on each new run:

- New rows where `baseSalary > $250,000` → send to Slack `#comp-intel`
- New petitions from any competitor in your tracked list → create a HubSpot deal task
- New employer first-time-sponsor detected → send to your sales / recruiter pipeline

#### Power BI / Tableau / Looker / Mode

Pull Apify's run results into your BI tool of choice via the Apify REST API and build:

- **Top-100 H1B sponsors by year** league tables
- **Median LCA wage by SOC + metro** heat maps
- **Year-over-year petition growth** for any company
- **Wage-band distribution** by employer + role

#### Postgres / Snowflake / BigQuery / Databricks

POST run results to your warehouse via Apify's [webhook integration](https://docs.apify.com/platform/integrations/webhooks). Suggested schema:

```sql
CREATE TABLE h1b_petitions (
    id                BIGSERIAL PRIMARY KEY,
    employer          TEXT NOT NULL,
    job_title         TEXT,
    base_salary       INTEGER,
    city              TEXT,
    state             CHAR(2),
    submit_date       DATE,
    start_date        DATE,
    fiscal_year       SMALLINT,
    case_status       TEXT DEFAULT 'CERTIFIED',
    source_url        TEXT,
    scraped_at        TIMESTAMPTZ,
    UNIQUE (employer, job_title, base_salary, submit_date, city)
);

CREATE INDEX idx_h1b_employer    ON h1b_petitions (employer);
CREATE INDEX idx_h1b_city_state  ON h1b_petitions (city, state);
CREATE INDEX idx_h1b_year_title  ON h1b_petitions (fiscal_year, job_title);
```
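
To try the schema locally before wiring up a warehouse, the following sketch loads dataset items into an in-memory SQLite table. SQLite is only a stand-in here: column types are simplified from the Postgres DDL above, `load_items` is a hypothetical helper, and `INSERT OR IGNORE` plays the role that `ON CONFLICT DO NOTHING` would play in Postgres.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS h1b_petitions (
    id          INTEGER PRIMARY KEY,
    employer    TEXT NOT NULL,
    job_title   TEXT,
    base_salary INTEGER,
    city        TEXT,
    state       TEXT,
    submit_date TEXT,
    start_date  TEXT,
    fiscal_year INTEGER,
    case_status TEXT DEFAULT 'CERTIFIED',
    source_url  TEXT,
    scraped_at  TEXT,
    UNIQUE (employer, job_title, base_salary, submit_date, city)
)
"""

# Named placeholders use the Actor's output field names as keys.
INSERT = """
INSERT OR IGNORE INTO h1b_petitions
    (employer, job_title, base_salary, city, state, submit_date,
     start_date, fiscal_year, case_status, source_url, scraped_at)
VALUES
    (:employer, :jobTitle, :baseSalary, :city, :state, :submitDate,
     :startDate, :year, :caseStatus, :sourceUrl, :scrapedAt)
"""


def load_items(conn, items):
    """Insert dataset items, skipping duplicates; return the number of new rows."""
    conn.execute(SCHEMA)
    count_sql = "SELECT COUNT(*) FROM h1b_petitions"
    before = conn.execute(count_sql).fetchone()[0]
    conn.executemany(INSERT, items)
    conn.commit()
    return conn.execute(count_sql).fetchone()[0] - before


item = {
    "employer": "GOOGLE LLC", "jobTitle": "SOFTWARE ENGINEER", "baseSalary": 112000,
    "city": "DURHAM", "state": "NC", "submitDate": "2024-03-04",
    "startDate": "2024-08-15", "year": 2024, "caseStatus": "CERTIFIED",
    "sourceUrl": "https://h1bdata.info/index.php?em=google&job=software+engineer&city=&year=2024",
    "scrapedAt": "2026-05-18T09:14:22.318Z",
}
conn = sqlite3.connect(":memory:")
inserted = load_items(conn, [item, dict(item)])
print(inserted)
```

Note that both SQLite and Postgres treat `NULL` values as distinct inside a `UNIQUE` constraint by default, so rows with a missing `city` or `submit_date` are not deduplicated by this constraint.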

#### Salesforce / HubSpot CRM Enrichment

For staffing, recruiting, and corporate immigration firms: run the actor nightly against your tracked-employer list, then upsert the results into Account records. A custom field such as `H1B_Petitions_Last_12mo__c` becomes a high-signal lead-scoring input.

#### Webhook → Slack / Discord / Email

Trigger a Make/Zapier webhook on Apify's `ACTOR.RUN.SUCCEEDED` event, parse the dataset, and post highlights:

> *"Stripe filed 47 new H1B petitions this quarter — 31 are SF, 12 are NYC, median base $182k. Top role: Software Engineer."*
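
A highlight message like the one above can be assembled from the dataset items with the standard library alone. This is a minimal sketch: `summarize` is a hypothetical helper, the message layout is illustrative, and posting the text to Slack, Discord, or email is left to the webhook tool.

```python
from collections import Counter
from statistics import median


def summarize(items, label):
    """Build a short plain-text summary of one run's dataset items."""
    if not items:
        return f"{label}: no certified petitions in this run."
    salaries = [r["baseSalary"] for r in items if r.get("baseSalary") is not None]
    cities = Counter(r["city"] for r in items if r.get("city"))
    roles = Counter(r["jobTitle"] for r in items if r.get("jobTitle"))
    lines = [f"{label}: {len(items)} certified petitions in this run"]
    if cities:
        top = ", ".join(f"{city} ({n})" for city, n in cities.most_common(3))
        lines.append(f"Top cities: {top}")
    if salaries:
        lines.append(f"Median base: ${median(salaries):,.0f}")
    if roles:
        lines.append(f"Top role: {roles.most_common(1)[0][0]}")
    return "\n".join(lines)


demo_items = [
    {"city": "SAN FRANCISCO", "jobTitle": "SOFTWARE ENGINEER", "baseSalary": 180000},
    {"city": "SAN FRANCISCO", "jobTitle": "SOFTWARE ENGINEER", "baseSalary": 200000},
    {"city": "NEW YORK", "jobTitle": "DATA SCIENTIST", "baseSalary": 190000},
]
message = summarize(demo_items, "Stripe")
print(message)
```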

***

### Major US Metros for H1B Activity

| Metro | State | Why it matters for H1B data |
|---|---|---|
| San Francisco / Bay Area | CA | Highest median LCA wage in the country; FAANG, AI labs, fintech |
| New York / NYC | NY | Wall Street, consulting (McKinsey/BCG/Bain), big-law support roles |
| Seattle / Bellevue | WA | Amazon, Microsoft — two largest H1B sponsors by volume historically |
| Austin | TX | Fastest-growing tech metro; Apple, Oracle, Tesla expansions |
| Boston / Cambridge | MA | Pharma + biotech (Moderna, Vertex), Big Tech Cambridge campuses |
| Chicago | IL | Trading firms (Citadel, Jump, IMC, DRW), consulting back offices |
| Atlanta | GA | Coca-Cola, Delta, Truist, fintech (NCR, Equifax) |
| Dallas / Plano | TX | JPMorgan, AT\&T, Toyota, healthcare IT |
| Houston | TX | Energy majors (ExxonMobil, Chevron, Shell), healthcare |
| Washington DC / NoVa | DC / VA | Federal contractors, AWS GovCloud HQ, defense primes |
| Raleigh-Durham | NC | RTP corridor — IBM, Cisco, Apple expansion |
| Phoenix / Tempe | AZ | TSMC fab, semiconductor expansion, financial services |
| Miami | FL | Crypto, hedge funds relocating from NY/SF |
| Mountain View / Sunnyvale / Menlo Park | CA | Google, Meta, LinkedIn HQ campuses |
| San Jose / Santa Clara | CA | NVIDIA, Cisco, Adobe, Intel |

***

### Cost & Performance

| Metric | Value |
|---|---|
| Engine | HTTP-only (`got-scraping` + `cheerio`) — no browser |
| Runtime, single small query (10 rows) | 2 – 5 seconds |
| Runtime, single popular query (30,000 rows / 18MB) | 10 – 30 seconds |
| Runtime, 50 employer × 5 job × 1 year cross-product | 1 – 5 minutes (with default 1s polite delay) |
| Cost per typical run | roughly $0.35 per 100 rows at the listed $3.50 / 1,000 results (pay-per-event) |
| Pricing model | Pay-per-event — actor start + per dataset item |
| Data freshness | Live at run time — h1bdata.info refreshes with each DOL disclosure release |
| Auth required | None |
| Proxy required | No (optional, disabled by default) |
| Concurrency | Safe to run many parallel filtered configurations |
| Memory footprint | 256 MB sufficient for most runs; 1024 MB for huge multi-thousand-task jobs |
| Retry | 3 attempts per request, linearly increasing backoff (`2s × attempt + jitter`) |
| Timeout | 60-second per-request HTTP timeout |
| Failure mode | `Actor.fail()` if zero records scraped (alerts your monitoring) |
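
The retry row above gives the delay formula as `2s × attempt + jitter`. A minimal Python sketch of that policy follows; it is not the actor's own code (the actor is built on `got-scraping`), the 0 to 0.5 second jitter range is an assumption, and `backoff_delay` and `with_retry` are hypothetical names. As written, the delay grows linearly with the attempt number.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 2.0, max_jitter: float = 0.5) -> float:
    """Delay before the next try: base seconds x attempt number, plus random jitter."""
    return base * attempt + random.uniform(0.0, max_jitter)


def with_retry(fetch: Callable[[], T], attempts: int = 3,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch() up to `attempts` times, sleeping between failed tries."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt))
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.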

***

### Compliance, Privacy & Legal Notes

- **Public-record data only.** Every field this actor returns is published by the **US Department of Labor** under the public-disclosure requirements of **Title 20 CFR §655.760**. The DOL publishes the data; h1bdata.info re-publishes it; this actor structures it. Nothing in the output is private, leaked, or non-public.
- **No PII beyond what the DOL already published.** Employer and job title are corporate identifiers, not personal. The disclosure file does **not** include the foreign worker's name, passport number, or contact info.
- **No PHI.** Pharma / biotech petitions are listed by employer + job title only; there is no patient data anywhere in the dataset.
- **No SSNs, passport numbers, or visa-petition USCIS receipt numbers.**
- **Source attribution is preserved** in every row (`sourceUrl`) — useful for journalists and academics who need to cite primary sources.
- **Respect h1bdata.info's terms of service** and load profile. The default 1-second polite delay between requests exists for that reason — do not lower it unnecessarily.
- **GDPR / CCPA** are not implicated for the petitioning *employer* (corporate entity); the foreign worker is not personally identified in the public file.
- **Permissible uses include**: research, journalism, recruiting / sourcing intelligence, immigration-law case work, comp benchmarking, policy analysis, and competitive intelligence.
- **Do not use this data** for: harassment, doxxing, discriminatory employment decisions targeting visa status (which would violate **8 USC §1324b**), or any deceptive marketing claim that misrepresents data freshness or origin.

> **Important:** Disclosure data shows the **wage offered** on the LCA — not necessarily the wage **actually paid**, not signing bonuses, not RSUs, not deferred comp. Treat it as a floor / benchmark, not as a complete-compensation figure.

***

### Frequently Asked Questions

#### How fresh is the data?

The actor scrapes h1bdata.info **live at run time**. h1bdata.info ingests new petitions each time the DOL publishes a quarterly disclosure file, typically 8–12 weeks after the quarter closes. At that lag, the FY 2024 Q4 file (quarter ending September 30, 2024) would be due around December 2024 and the FY 2025 Q1 file around March 2025; allow extra time for h1bdata.info to ingest each release. The only earlier public source is the DOL disclosure file itself.
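
To see what the 8–12 week lag means for a given quarter, the sketch below computes the DOL publication window from the fiscal quarter end. It assumes the US federal fiscal calendar (a fiscal year starts on October 1 of the prior calendar year) and does not model any additional time h1bdata.info needs to ingest a release. `quarter_end` and `dol_release_window` are hypothetical helper names.

```python
from datetime import date, timedelta


def quarter_end(fiscal_year: int, quarter: int) -> date:
    """Last day of a US federal fiscal quarter (Q1 = October to December)."""
    ends = {
        1: date(fiscal_year - 1, 12, 31),
        2: date(fiscal_year, 3, 31),
        3: date(fiscal_year, 6, 30),
        4: date(fiscal_year, 9, 30),
    }
    return ends[quarter]


def dol_release_window(fiscal_year: int, quarter: int) -> tuple[date, date]:
    """Earliest and latest expected DOL release, 8 and 12 weeks after quarter end."""
    end = quarter_end(fiscal_year, quarter)
    return end + timedelta(weeks=8), end + timedelta(weeks=12)
```

For example, `dol_release_window(2024, 4)` spans November 25 to December 23, 2024.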

#### How many records exist in total?

The DOL has certified **8 million+** H1B / H1B1 / E3 petitions since fiscal year 2014. The exact number visible on any given day depends on which quarters h1bdata.info has ingested. A single popular query like `Google + Software Engineer` (all years) can return **30,000+ rows on its own**.

#### Why is `caseStatus` always `CERTIFIED`?

Every row this actor returns is a **certified** case, because that is what h1bdata.info lists. Denied, withdrawn, returned-for-correction, and under-review cases do not appear in its search results and therefore cannot appear in this dataset.

#### Does this scraper require login, API key, or CAPTCHA solving?

No. h1bdata.info is fully public, has no login, no CAPTCHA, no anti-bot system. You only need an Apify account to run the actor.

#### Do I need to use a proxy?

No. Proxy is disabled by default. h1bdata.info has no rate-limit or IP-based throttle. The proxy option exists only for very large parallel runs where IP diversity is desirable for politeness.

#### Why does my run sometimes take 20–30 seconds for a single query?

Popular queries (a famous employer + common title across all years) return **massive HTML payloads** — sometimes 18MB+ with 30,000+ rows. The 60-second per-request timeout is calibrated for exactly this case. Smaller queries return in 2–5 seconds.

#### What happens if I supply an empty query?

h1bdata.info rejects empty queries (returns no results). The actor builds its task list from the cross-product of `employers × jobTitles × cities` and **skips** combinations where all three are empty. If zero tasks survive, the run fails fast with a clear message; if every task returns zero rows, `Actor.fail()` is called so your monitoring/scheduling integration can alert.
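
The task-building rule described above can be sketched as follows. This is an illustration rather than the actor's source: `build_tasks` is a hypothetical name, and treating an empty input list as a single blank filter is an assumption.

```python
from itertools import product


def build_tasks(employers: list[str], job_titles: list[str], cities: list[str]) -> list[dict]:
    """Cross-product of the three filters, skipping combinations that are entirely blank."""
    tasks = []
    # Assumption: an empty list behaves like one blank filter value.
    for em, job, city in product(employers or [""], job_titles or [""], cities or [""]):
        em, job, city = em.strip(), job.strip(), city.strip()
        if not (em or job or city):
            continue  # all three blank: the site would return nothing
        tasks.append({"employer": em, "jobTitle": job, "city": city})
    return tasks
```

An empty result from `build_tasks` corresponds to the fail-fast case; two employers combined with three job titles and no city yield six tasks.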

#### Does the actor return denied / withdrawn / pending petitions?

No, for the reason given under the `caseStatus` question above: only certified cases are returned.

#### Does the dataset include the foreign worker's name?

**No.** The DOL public-disclosure file deliberately omits the foreign worker's personal identity. The published fields are: petitioning **employer**, **job title**, **base wage offered**, **work location**, **submit date**, and **start date**.

#### Does it include SOC codes, prevailing-wage level, or worksite ZIP?

Not in the h1bdata.info interface. h1bdata.info publishes the user-friendly subset of the DOL file. For the full raw file (with SOC code, prevailing-wage level, full address, agent attorney, etc.) download the quarterly DOL disclosure files directly from [dol.gov/agencies/eta/foreign-labor/performance](https://www.dol.gov/agencies/eta/foreign-labor/performance) and join on employer + submit date.

#### Can I get green-card / PERM disclosure data?

PERM (permanent labor certification) is a separate DOL disclosure feed, not on h1bdata.info. It is on the roadmap as a separate Apify actor — open a feature request if you need it sooner.

#### Can I filter to only H1B (excluding H1B1 / E3)?

Not directly — h1bdata.info does not expose visa subtype on the result row. The vast majority of petitions in the file are H1B; H1B1 (Singapore/Chile) and E3 (Australia) together are <2% of volume.

#### How do I get every petition from a specific employer across all years?

```json
{ "employers": ["openai"], "jobTitles": [""], "year": "All Years", "maxRecords": 0 }
```

A blank job title combined with an employer will return every role that employer has ever sponsored. Set `maxRecords: 0` for unlimited.

#### What about employer-name variations (e.g. "Google" vs "Google LLC" vs "Alphabet Inc")?

h1bdata.info does **case-insensitive partial matching** on the employer string. `"google"` will match `GOOGLE LLC`, `GOOGLE INC.`, `GOOGLE PAYMENT CORP`, etc. For maximum recall, also try the parent name (`alphabet`) and any DBA / acquired-subsidiary names you know.
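
After a run, it can help to see which legal-entity names each query actually pulled in. The sketch below mirrors the case-insensitive substring matching described above on the client side; the site's exact matching rules are not verified here, and `employer_matches` and `distinct_employers` are hypothetical helpers.

```python
def employer_matches(query: str, employer: str) -> bool:
    """Case-insensitive substring test, mirroring the matching described above."""
    return query.strip().lower() in employer.lower()


def distinct_employers(items: list[dict], queries: list[str]) -> dict[str, list[str]]:
    """Group the distinct employer strings in the results under each matching query."""
    found: dict[str, set[str]] = {q: set() for q in queries}
    for item in items:
        name = item.get("employer") or ""
        for q in queries:
            if employer_matches(q, name):
                found[q].add(name)
    return {q: sorted(names) for q, names in found.items()}
```

Reviewing this mapping is a quick way to spot unrelated companies that happen to contain the query string.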

#### Why are some salary cells null?

Very rarely a row on h1bdata.info has a missing or malformed salary cell (an edge case in older 2014–2015 data). The actor parses what it can and emits `baseSalary: null` rather than dropping the row, so you can decide downstream how to handle them.

#### Pivot history — what was this actor before?

This actor was previously published as `salary-com-scraper`. After a usefulness audit it became clear that the underlying **Salary.com** data largely duplicated freely-available **BLS Occupational Employment Statistics** and **Glassdoor** content — not a niche worth maintaining. The actor was **repivoted to the H1B disclosure niche** in May 2026 because the source data is **genuinely unique, primary-source, federally-mandated, and very high-value** for immigration law, recruiting, journalism, and policy research. The actor ID is unchanged; only the source target, schema, and engine differ.

#### Does this work on the Apify Free Plan?

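Yes, with full functionality on the free tier. Billing is pay-per-event rather than per compute unit: at the listed rate of $3.50 per 1,000 results, a run capped at the default `maxRecords` of 100 costs roughly $0.35 in result events, plus the actor-start event.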

#### Can I schedule this to run daily / weekly / monthly?

Yes — Apify's built-in Scheduler lets you trigger this actor on any cron expression. Combine with webhook outputs for fully automated H1B-intel pipelines.

#### What formats can I export the data in?

JSON, JSONL (streaming), CSV, Excel (XLSX), HTML, XML, RSS — directly from the Apify dataset view, or via the Apify REST API for programmatic consumers.
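
For programmatic consumers, the export format is selected with the `format` query parameter on the dataset items endpoint of the Apify REST API. A minimal sketch that builds such a URL; `dataset_export_url` is a hypothetical helper, and the current parameter list should be checked against the Apify API reference.

```python
from urllib.parse import urlencode

# Format identifiers corresponding to the export options listed above.
EXPORT_FORMATS = {"json", "jsonl", "csv", "xlsx", "html", "xml", "rss"}


def dataset_export_url(dataset_id: str, fmt: str = "csv", token: str = "") -> str:
    """Build a dataset items URL for the given export format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    params = {"format": fmt}
    if token:
        params["token"] = token  # needed when the dataset is not publicly readable
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"
```

The `dataset_id` is the `defaultDatasetId` of a finished run.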

#### Are there competing data sources?

The two main competing surfaces are **MyVisaJobs** and **H1BGrader**. Both ultimately source from the same DOL disclosure file; h1bdata.info is the longest-running, fastest-querying, and least-monetized of the three, which is why it was chosen as the scrape target.

#### How do I report a bug or request a feature?

Open an issue on the Apify Store page or contact the developer directly through the Apify Console profile.

***

### Related Apify Actors by Haketa

If you're building a US labor-market, jobs, or federal-disclosure intelligence stack, these companion actors pair well with the H1B Visa Database Scraper:

- [SEEK Scraper (Australia / NZ)](https://apify.com/haketa/seek-scraper) — live job listings from APAC's largest job board
- [Levels.fyi Scraper](https://apify.com/haketa/levels-fyi-scraper) — self-reported tech compensation (base + equity + bonus) — the perfect complement to LCA wage data
- [ProductHunt Launches & Makers Scraper](https://apify.com/haketa/producthunt-launches-scraper) — daily startup launches, makers, votes & reviews — VC/founder/recruiter intel
- [TTB Alcohol Permittee Scraper](https://apify.com/haketa/ttb-alcohol-permittee-scraper) — another federal public-disclosure dataset (Treasury / TTB) in the same legal family
- [SAM.gov Federal Contractor Entity Scraper](https://apify.com/haketa/sam-gov-federal-contractor-scraper) — every entity registered to do business with the US federal government
- [Texas Pharmacy License Scraper — TSBP](https://apify.com/haketa/tsbp-license-scraper) — state-licensed pharmacist / pharmacy directory
- [Ohio eLicense Scraper](https://apify.com/haketa/ohio-elicense-scraper) — Ohio professional licenses
- [Illinois IDFPR License Scraper](https://apify.com/haketa/illinois-idfpr-license-scraper) — Illinois licensed professionals
- [California DCA Professional License Scraper](https://apify.com/haketa/california-dca-license-scraper) — California consumer-affairs licensees
- [Colorado Professional License Scraper](https://apify.com/haketa/colorado-professional-license-scraper) — Colorado DORA licenses
- [BBB Business Scraper](https://apify.com/haketa/bbb-scraper) — Better Business Bureau company profiles

***

### Comparison vs. Alternatives

| Approach | Setup time | Data freshness | Cost (10k rows) | Schema normalization | Filtering | Provenance |
|---|---|---|---|---|---|---|
| **This actor** | < 1 minute | Live at run | about $35 at the listed $3.50 / 1,000 results | Yes, built-in | Employer × Job × City × Year + salary band | Per-row source URL |
| Manual h1bdata.info browsing | Hours / days | Live | Free | None | UI only | None |
| DOL raw quarterly CSV download | 4–8 hours dev | 8–12 weeks lagged | Free + infra | DIY | DIY | Manual |
| MyVisaJobs paid subscription | Minutes | Live | $50–500+/mo | Yes | Limited UI | None |
| Custom Python + requests + BeautifulSoup | 1–2 days dev | Live | Free + infra | DIY | DIY | DIY |
| Hand-built per-employer cron + S3 + Athena | 1–2 weeks dev | Quarterly | $$$ | DIY | SQL | Manual |

***

### Why Pay-Per-Event Pricing?

Most data products either lock you into a monthly seat license (you pay even when idle) or charge per Compute Unit (unpredictable bills). This actor uses Apify's **pay-per-event** model:

- You only pay when the actor actually runs
- Charges scale linearly with how many rows you actually consume
- Transparent line-item billing in the Apify console
- No monthly minimums, no annual contracts
- Cheap to evaluate: set `maxRecords: 10` and validate the schema before scaling up
- Perfect for both one-off research projects and high-frequency production scrapers
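
Because charges scale linearly with rows, a run's cost can be estimated up front. The sketch below uses the listed rate of $3.50 per 1,000 results; `estimate_cost` is a hypothetical helper, the `start_fee` parameter is a placeholder because the actor-start price is not stated here, and plan discounts are ignored.

```python
def estimate_cost(rows: int, price_per_1000: float = 3.50, start_fee: float = 0.0) -> float:
    """Estimated USD charge for a run returning `rows` dataset items."""
    return round(start_fee + rows / 1000 * price_per_1000, 4)
```

For instance, `estimate_cost(100)` returns `0.35`, which corresponds to the default `maxRecords` cap.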

***

### Changelog

| Version | Date | Notes |
|---|---|---|
| 1.0.0 | 2026-05-18 | Initial public release of the H1B Visa Database Scraper — HTTP-only via `got-scraping` + `cheerio`, full h1bdata.info filter parity, salary-band post-filter, deduplication, `Actor.fail()` on empty results |
| (pre-1.0) | 2024–2026 | Same actor ID was previously published as `salary-com-scraper`; repivoted to H1B disclosure data because Salary.com largely duplicated freely-available BLS/Glassdoor content while the H1B niche is genuinely unique and high-value |

***

### Keywords

H1B visa scraper · H1B salary database · H1B sponsor lookup · h1bdata.info scraper · US DOL H1B disclosure · H1B salary by employer · H1B salary by job title · H1B prevailing wage · H1B sponsorship history · immigration salary data · H1B visa API · LCA database scraper · Labor Condition Application data · DOL Office of Foreign Labor Certification scraper · H1B sponsor search · H1B sponsor history lookup · H1B base wage scraper · H1B job title salary · H1B petition data · H1B disclosure data extraction · H1B FAANG salaries · Google H1B salary · Meta H1B salary · Amazon H1B salary · Microsoft H1B salary · Apple H1B salary · NVIDIA H1B salary · OpenAI H1B salary · Infosys H1B petitions · TCS H1B petitions · Goldman Sachs H1B salary · JPMorgan H1B salary · H1B Bay Area salary · H1B NYC salary · H1B Seattle salary · H1B Austin salary · H1B Boston salary · immigration attorney data scraping · recruiter intel scraper · compensation benchmarking API · H1B prevailing wage compliance · H1B journalism data · H1B policy research dataset · H1B M\&A due diligence · Apify H1B actor · H1B1 visa data · E3 visa data · Title 20 CFR §655.760 disclosure

***

### Support

- **Bug reports:** Use the **Issues** tab on the [Apify Store page](https://apify.com/haketa/h1b-visa-database-scraper)
- **Feature requests:** Same place — please describe your use case so we can prioritize realistically
- **Direct contact:** Through the Apify developer profile (haketa)
- **Roadmap requests welcome:** PERM / green-card disclosure scraper, H2B seasonal-worker scraper, USCIS receipt-number enrichment, prevailing-wage Level 1–4 inference

If this actor saves you time on immigration research, recruiter intel, comp benchmarking, or policy reporting, a **5-star rating** on the Apify Store helps other professionals discover it. Thank you.

# Actor input Schema

## `startUrls` (type: `array`):

Paste any h1bdata.info search URL directly. Example: `https://h1bdata.info/index.php?em=google&job=software+engineer&city=&year=2024`. When provided, overrides the filter fields below.
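
A search URL in the shape of the example above can also be generated in code. In this sketch the query parameter names (`em`, `job`, `city`, `year`) are taken from that example; `build_search_url` and `as_start_urls` are hypothetical helpers, and how the site encodes the "All Years" option is not verified, so pass an explicit year when in doubt.

```python
from urllib.parse import urlencode

BASE_URL = "https://h1bdata.info/index.php"


def build_search_url(employer: str = "", job: str = "", city: str = "", year: str = "") -> str:
    """Build a search URL; spaces are encoded as '+' and blank filters are kept."""
    params = {"em": employer, "job": job, "city": city, "year": year}
    return f"{BASE_URL}?{urlencode(params)}"


def as_start_urls(urls: list[str]) -> list[dict]:
    """Wrap plain URLs in the {"url": ...} objects that `startUrls` expects."""
    return [{"url": u} for u in urls]
```

`build_search_url("google", "software engineer", "", "2024")` reproduces the example URL exactly.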

## `employers` (type: `array`):

Free-text employer names (case-insensitive partial match). Examples: 'google', 'meta', 'amazon', 'goldman sachs'. Each runs as a separate task. Leave empty + use other filters for any-employer search.

## `jobTitles` (type: `array`):

Free-text job-title queries (partial match). Examples: 'software engineer', 'data scientist', 'product manager', 'machine learning engineer'. Each combines with each employer as a separate task.

## `cities` (type: `array`):

Filter results by city (US only, case-insensitive). Examples: 'san francisco', 'new york', 'austin', 'seattle'. Leave empty for nationwide.

## `year` (type: `string`):

DOL fiscal year of the H1B petition. 'All Years' returns the full history (2014-present). Specific years return that year only.

## `minSalary` (type: `integer`):

Post-filter: only return rows with base salary >= this value. 0 = no filter.

## `maxSalary` (type: `integer`):

Post-filter: only return rows with base salary <= this value. 0 = no filter.
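
The two salary post-filters combine into a simple band check in which 0 disables a bound. A sketch of that logic, not the actor's source: `in_salary_band` is a hypothetical name, and the handling of rows with a null salary is an assumption.

```python
from typing import Optional


def in_salary_band(salary: Optional[int], min_salary: int = 0, max_salary: int = 0) -> bool:
    """True if salary passes both bounds; a bound of 0 disables that side of the filter."""
    if salary is None:
        # Assumption: rows without a salary pass only when no bound is active.
        return min_salary == 0 and max_salary == 0
    if min_salary and salary < min_salary:
        return False
    if max_salary and salary > max_salary:
        return False
    return True
```

Both bounds are inclusive, matching the `>=` and `<=` wording of the two fields.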

## `maxRecords` (type: `integer`):

Hard cap on total records returned across all tasks. Set 0 for unlimited. Single popular employer+job searches can return 30,000+ rows.

## `requestDelay` (type: `integer`):

Delay between HTTP requests. h1bdata.info has no rate-limit, but 500-2000ms is polite.

## `proxyConfiguration` (type: `object`):

Optional. h1bdata.info has no anti-bot, so proxy is NOT required. Only use if you need IP rotation for very large jobs.

## Actor input object example

```json
{
  "startUrls": [],
  "employers": [
    "google"
  ],
  "jobTitles": [
    "software engineer"
  ],
  "cities": [],
  "year": "All Years",
  "minSalary": 0,
  "maxSalary": 0,
  "maxRecords": 100,
  "requestDelay": 1000,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

# Actor output Schema

## `employer` (type: `string`):

Petitioning employer name

## `jobTitle` (type: `string`):

Petitioned job title

## `baseSalary` (type: `string`):

Annual base salary in USD

## `city` (type: `string`):

Work location city

## `state` (type: `string`):

Work location state

## `submitDate` (type: `string`):

Petition submission date

## `startDate` (type: `string`):

Petitioned employment start date

## `year` (type: `string`):

Fiscal year

## `caseStatus` (type: `string`):

DOL case status (CERTIFIED, etc.)

## `sourceUrl` (type: `string`):

h1bdata.info search URL

## `scrapedAt` (type: `string`):

ISO timestamp

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [],
    "employers": [
        "google"
    ],
    "jobTitles": [
        "software engineer"
    ],
    "cities": [],
    "maxRecords": 100,
    "proxyConfiguration": {
        "useApifyProxy": false
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("haketa/h1b-visa-database-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [],
    "employers": ["google"],
    "jobTitles": ["software engineer"],
    "cities": [],
    "maxRecords": 100,
    "proxyConfiguration": { "useApifyProxy": False },
}

# Run the Actor and wait for it to finish
run = client.actor("haketa/h1b-visa-database-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [],
  "employers": [
    "google"
  ],
  "jobTitles": [
    "software engineer"
  ],
  "cities": [],
  "maxRecords": 100,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}' |
apify call haketa/h1b-visa-database-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=haketa/h1b-visa-database-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "H1B Visa Database Scraper | DOL Disclosure Salaries",
        "description": "Scrape the US H1B Visa database (h1bdata.info) — public Department of Labor disclosure data. Per-employer, per-job, per-city, per-year salary records with submit/start dates. 8M+ approved cases since 2014. Critical for immigration attorneys, job seekers, recruiter intel.",
        "version": "0.0",
        "x-build-id": "4EHWyUlquTKIW6erB"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/haketa~h1b-visa-database-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-haketa-h1b-visa-database-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/haketa~h1b-visa-database-scraper/runs": {
            "post": {
                "operationId": "runs-sync-haketa-h1b-visa-database-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/haketa~h1b-visa-database-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-haketa-h1b-visa-database-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "Paste any h1bdata.info search URL directly. Example: 'https://h1bdata.info/index.php?em=google&job=software+engineer&city=&year=2024'. When provided, overrides the filter fields below.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "employers": {
                        "title": "Employers",
                        "type": "array",
                        "description": "Free-text employer names (case-insensitive partial match). Examples: 'google', 'meta', 'amazon', 'goldman sachs'. Each runs as a separate task. Leave empty + use other filters for any-employer search.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "jobTitles": {
                        "title": "Job Titles",
                        "type": "array",
                        "description": "Free-text job-title queries (partial match). Examples: 'software engineer', 'data scientist', 'product manager', 'machine learning engineer'. Each combines with each employer as a separate task.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "cities": {
                        "title": "Cities",
                        "type": "array",
                        "description": "Filter results by city (US only, case-insensitive). Examples: 'san francisco', 'new york', 'austin', 'seattle'. Leave empty for nationwide.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "year": {
                        "title": "Fiscal Year",
                        "enum": [
                            "All Years",
                            "2026",
                            "2025",
                            "2024",
                            "2023",
                            "2022",
                            "2021",
                            "2020",
                            "2019",
                            "2018",
                            "2017",
                            "2016",
                            "2015",
                            "2014"
                        ],
                        "type": "string",
                        "description": "DOL fiscal year of the H1B petition. 'All Years' returns the full history (2014-present). Specific years return that year only.",
                        "default": "All Years"
                    },
                    "minSalary": {
                        "title": "Minimum Salary (USD)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Post-filter: only return rows with base salary >= this value. 0 = no filter.",
                        "default": 0
                    },
                    "maxSalary": {
                        "title": "Maximum Salary (USD)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Post-filter: only return rows with base salary <= this value. 0 = no filter.",
                        "default": 0
                    },
                    "maxRecords": {
                        "title": "Max Records",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Hard cap on total records returned across all tasks. Set 0 for unlimited. Single popular employer+job searches can return 30,000+ rows.",
                        "default": 100
                    },
                    "requestDelay": {
                        "title": "Request Delay (ms)",
                        "minimum": 100,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Delay between HTTP requests. h1bdata.info has no rate-limit, but 500-2000ms is polite.",
                        "default": 1000
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Optional. h1bdata.info has no anti-bot, so proxy is NOT required. Only use if you need IP rotation for very large jobs."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
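
The input limits in the schema above (`maxRecords` with a minimum of 0 and a default of 100, `requestDelay` between 100 and 10000 ms with a default of 1000, and an optional `proxyConfiguration` object) can be checked on the client before a run is started. The following is a minimal sketch under that reading of the schema. The helper name `build_input` is hypothetical; it is not part of the Actor or of any Apify client, and it covers only the three properties shown here.

```python
def build_input(max_records=100, request_delay=1000, proxy_configuration=None):
    """Hypothetical helper: check values against the schema limits, return an input dict."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_records, bool) or not isinstance(max_records, int):
        raise ValueError("maxRecords must be an integer")
    if max_records < 0:
        raise ValueError("maxRecords must be >= 0 (0 means unlimited)")

    if isinstance(request_delay, bool) or not isinstance(request_delay, int):
        raise ValueError("requestDelay must be an integer")
    if not 100 <= request_delay <= 10000:
        raise ValueError("requestDelay must be between 100 and 10000 ms")

    run_input = {"maxRecords": max_records, "requestDelay": request_delay}

    # proxyConfiguration is optional, so include it only when provided.
    if proxy_configuration is not None:
        if not isinstance(proxy_configuration, dict):
            raise ValueError("proxyConfiguration must be an object (dict)")
        run_input["proxyConfiguration"] = proxy_configuration

    return run_input
```

The returned dict can be passed as the run input to whichever client you use. Since the default `maxRecords` is 100 and a single search can return far more rows, raising the cap deliberately (for example `build_input(max_records=5000)`) makes the expected volume explicit before a paid run starts.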

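The `runsResponseSchema` above wraps the run object in a top-level `data` property. As a rough illustration of reading such a response, the sketch below pulls out `status`, `defaultDatasetId`, and `usageTotalUsd`, and derives a duration from `startedAt` and `finishedAt`. The function name `summarize_run` and the keys of the returned dict are hypothetical, chosen for this example only.

```python
from datetime import datetime


def _parse_timestamp(value):
    # Timestamps in the schema look like "2025-01-08T00:00:00.000Z".
    # Older Python versions do not accept the trailing "Z", so swap it for an offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_run(response):
    """Hypothetical helper: pick a few fields out of a run response dict."""
    run = response.get("data", {})

    started = run.get("startedAt")
    finished = run.get("finishedAt")
    duration_secs = None
    if started and finished:
        delta = _parse_timestamp(finished) - _parse_timestamp(started)
        duration_secs = delta.total_seconds()

    return {
        "status": run.get("status"),
        "datasetId": run.get("defaultDatasetId"),
        "usageTotalUsd": run.get("usageTotalUsd"),
        "durationSecs": duration_secs,
    }
```

Missing fields come back as `None` rather than raising, which suits a run that has not finished yet and therefore may not carry a `finishedAt` value.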