MSHA Mine Data Retrieval Scraper - US Mines Production + Safety

Pricing: Pay per event

Extract US mine records from MSHA Open Government Data CSVs. Covers coal, metal, and nonmetal mines: mine ID, operator, controller, state/county, geocoordinates, mine type, commodity, status, employees, quarterly production tons, violations, and accidents.

Developer: BowTiedRaccoon (maintained by Community)

MSHA Mine Data Retrieval Scraper

Extracts US mine records from the MSHA Open Government Data portal. Returns mine identity, operator and controller info, geocoordinates, commodity classification, status, employee counts, and — optionally — quarterly production tons, ownership history, violations, and accident counts for all 91,000+ mines in the MSHA registry.

MSHA Mine Scraper Features

  • Extracts 26+ fields per mine record — mine ID, name, operator, controller (ultimate parent), state, county, FIPS code, lat/lon, mine type, mine class, commodity, SIC code, status, employees, operating days, nearest town, MSHA district and office
  • Filters by mine class — coal, metal/nonmetal, or all
  • Filters by commodity — substring match against SIC description (lithium, copper, iron, bituminous, anthracite, stone, sand, gravel, and any other MSHA commodity label)
  • Filters by status — Active, Temporarily Idled, NonProducing, Abandoned, New Mine, Intermittent, or all
  • Joins quarterly production history — all historical production tons, hours worked, and average employee counts by quarter and subunit, back to the first recorded year
  • Joins operator/controller history — the full M&A ownership chain with start/end dates for each controller and operator
  • Joins violations — citation count for the past 12 months plus a structured history of recent violations with section-of-act and violation type
  • Joins accident records — injury and accident count for the past 12 months
  • No proxy required — MSHA open data is public and does not block bulk downloads

What Can You Do With MSHA Mine Data?

  • Utility and coal analysts — pull quarterly production tons by mine, subunit, and commodity to track output trends without manual MSHA portal navigation
  • Critical-minerals researchers — filter active metal/nonmetal mines by commodity (lithium, copper, cobalt, rare earths) and get operator, location, and production context in one run
  • Mining M&A advisory — join the controller history dataset to reconstruct the full ownership chain for a target mine or portfolio
  • Environmental NGOs — identify active surface mines by state and county, then enrich with violation counts to prioritize investigation targets
  • Compliance teams — pull violation and accident histories to benchmark a mine's safety record against peers in the same district
  • Data journalists — map every active mine in a given state with geocoordinates and production figures, without downloading and parsing multiple MSHA ZIP files by hand
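
For the mapping use case, the output records can be turned into GeoJSON for most mapping tools. The following is a minimal sketch, assuming the records have already been downloaded from the dataset as a list of dicts with the `latitude` and `longitude` fields described in the output section; `mines_to_geojson` is a hypothetical helper, not part of the actor, and the second sample record is invented to show the skip path.

```python
import json


def mines_to_geojson(records):
    """Convert actor output records into a GeoJSON FeatureCollection."""
    features = []
    for rec in records:
        lat, lon = rec.get("latitude"), rec.get("longitude")
        # Skip mines without usable coordinates.
        if lat is None or lon is None:
            continue
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {
                "mine_id": rec.get("mine_id"),
                "mine_name": rec.get("mine_name"),
                "primary_commodity": rec.get("primary_commodity"),
                "mine_status": rec.get("mine_status"),
            },
        })
    return {"type": "FeatureCollection", "features": features}


sample = [
    {"mine_id": "4200017", "mine_name": "EMERALD MINE NO 1",
     "latitude": 39.8219, "longitude": -80.1781,
     "primary_commodity": "Bituminous Coal", "mine_status": "Active"},
    # Invented record with missing coordinates; it is dropped from the output.
    {"mine_id": "0000000", "mine_name": "NO COORDS",
     "latitude": None, "longitude": None},
]
geojson = mines_to_geojson(sample)
print(json.dumps(geojson, indent=2))
```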

How It Works

  1. Downloads the MSHA Mines registry — the master registry ZIP (7.3 MB compressed, ~91,000 rows) is pulled from the MSHA open data endpoint and parsed in memory
  2. Applies your filters — coal/metal class, status, and commodity filters run against the registry before any joins, so enrichment datasets only load for mines that match
  3. Joins optional datasets in parallel — if you request quarterly production, controller history, violations, or accidents, those ZIPs are downloaded concurrently and indexed by MINE_ID
  4. Returns structured records — each mine record is written to the dataset with flat fields for registry data and JSON strings for the array joins (production, history, violations)

The quarterly production dataset is 56 MB compressed and covers 35,000 unique mines. The controller history dataset is ~119 MB. Expect longer run times when you enable either option.
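
The filter-then-join order in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the actor's source: the two inline tables stand in for the downloaded files, the `|` delimiter, the mine IDs, and every column name other than `MINE_ID` are invented for the example, and `run` is a hypothetical function.

```python
import csv
import io
import json
from collections import defaultdict

# Tiny stand-ins for the registry and quarterly production files.
# The "|" delimiter and all column names except MINE_ID are assumptions.
REGISTRY = """MINE_ID|MINE_NAME|COAL_METAL_IND|MINE_STATUS|COMMODITY
0000001|ALPHA PIT|M|Active|Copper Ore NEC
0000002|BETA SHAFT|C|Active|Coal (Bituminous)
0000003|GAMMA QUARRY|M|Abandoned|Crushed, Broken Stone NEC
"""
PRODUCTION = """MINE_ID|CAL_YR|CAL_QTR|HOURS_WORKED
0000001|2024|1|1200
0000001|2024|2|1350
0000002|2024|1|9000
"""


def read_rows(text, delimiter="|"):
    """Parse delimited text into a list of dicts, entirely in memory."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def run(coal_or_metal="all", mine_status="all", commodity_filter=""):
    # Step 2: filter the registry before touching any enrichment data.
    matches = [
        row for row in read_rows(REGISTRY)
        if coal_or_metal in ("all", row["COAL_METAL_IND"])
        and mine_status in ("all", row["MINE_STATUS"])
        and commodity_filter.lower() in row["COMMODITY"].lower()
    ]
    wanted = {row["MINE_ID"] for row in matches}

    # Step 3: index enrichment rows by MINE_ID, keeping only matched mines.
    by_mine = defaultdict(list)
    for row in read_rows(PRODUCTION):
        if row["MINE_ID"] in wanted:
            by_mine[row["MINE_ID"]].append(row)

    # Step 4: flat registry fields plus a JSON string for the array join.
    return [
        dict(row, production_quarterly=json.dumps(by_mine.get(row["MINE_ID"], [])))
        for row in matches
    ]


result = run(coal_or_metal="M", mine_status="Active", commodity_filter="copper")
print(result[0]["MINE_NAME"], result[0]["production_quarterly"])
```

Filtering first keeps the join index small: only rows whose `MINE_ID` survived the registry filters are held in memory.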

MSHA Mine Scraper Input

{
  "coalOrMetal": "M",
  "commodityFilter": "copper",
  "mineStatus": "Active",
  "includeProductionHistory": true,
  "includeControllerHistory": false,
  "includeViolations": false,
  "includeAccidents": false,
  "maxItems": 100,
  "sp_intended_usage": "critical minerals research",
  "sp_improvement_suggestions": "none"
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| coalOrMetal | string | "all" | Mine class filter: "all", "C" (coal only), or "M" (metal/nonmetal only) |
| commodityFilter | string | "" | Substring match against SIC description (case-insensitive). Leave blank for all commodities. |
| mineStatus | string | "Active" | Status filter: Active, Temporarily Idled, NonProducing, Abandoned, Abandoned and Sealed, New Mine, Intermittent, or all |
| includeProductionHistory | boolean | true | Join quarterly production CSV to add production_quarterly |
| includeControllerHistory | boolean | false | Join controller/operator history CSV to add operator_history |
| includeViolations | boolean | false | Join violations CSV to add violations_count_12mo and violations_history |
| includeAccidents | boolean | false | Join accidents CSV to add accidents_12mo |
| maxItems | integer | 10 | Maximum records to return. Set to 0 for unlimited. |
| sp_intended_usage | string | | Required. Describe your intended use of this data. |
| sp_improvement_suggestions | string | | Required. Share any suggestions for improving the actor. |
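
Before starting a run, it can help to assemble the input from the documented defaults and catch obvious mistakes locally. The sketch below assumes the defaults and allowed values listed in the input table; `build_input` is a hypothetical helper, and the actor's own validation may be stricter or looser than this.

```python
# Defaults and allowed values copied from the input table above.
DEFAULTS = {
    "coalOrMetal": "all",
    "commodityFilter": "",
    "mineStatus": "Active",
    "includeProductionHistory": True,
    "includeControllerHistory": False,
    "includeViolations": False,
    "includeAccidents": False,
    "maxItems": 10,
}
REQUIRED = ("sp_intended_usage", "sp_improvement_suggestions")
MINE_CLASSES = {"all", "C", "M"}
STATUSES = {
    "Active", "Temporarily Idled", "NonProducing", "Abandoned",
    "Abandoned and Sealed", "New Mine", "Intermittent", "all",
}


def build_input(**overrides):
    """Merge overrides into the documented defaults and run basic checks."""
    unknown = set(overrides) - set(DEFAULTS) - set(REQUIRED)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")

    run_input = {**DEFAULTS, **overrides}

    missing = [key for key in REQUIRED if not run_input.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if run_input["coalOrMetal"] not in MINE_CLASSES:
        raise ValueError("coalOrMetal must be 'all', 'C', or 'M'")
    if run_input["mineStatus"] not in STATUSES:
        raise ValueError(f"unsupported mineStatus: {run_input['mineStatus']}")
    if not isinstance(run_input["maxItems"], int) or run_input["maxItems"] < 0:
        raise ValueError("maxItems must be a non-negative integer (0 = unlimited)")
    return run_input


run_input = build_input(
    coalOrMetal="M",
    commodityFilter="copper",
    maxItems=100,
    sp_intended_usage="critical minerals research",
    sp_improvement_suggestions="none",
)
print(run_input)
```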

MSHA Mine Scraper Output Fields

{
  "mine_id": "4200017",
  "mine_name": "EMERALD MINE NO 1",
  "coal_metal_ind": "C",
  "mine_type": "Underground",
  "mine_status": "Active",
  "status_date": "1978-12-15",
  "controller_id": "0000055",
  "controller_name": "CONSOL ENERGY INC",
  "controller_start_date": "2020-01-01",
  "operator_id": "0218869",
  "operator_name": "CONSOL PENNSYLVANIA COAL COMPANY LLC",
  "state": "PA",
  "county": "GREENE",
  "fips_county_code": "059",
  "latitude": 39.8219,
  "longitude": -80.1781,
  "primary_sic_code": "1220",
  "primary_commodity": "Bituminous Coal",
  "primary_canvass": "Coal(Bituminous)",
  "secondary_commodity": "",
  "num_employees": 350,
  "days_per_week": 5,
  "nearest_town": "Wind Ridge",
  "district": "3",
  "office_name": "Waynesburg District",
  "portable_operation": "N",
  "production_quarterly": "[{\"cal_yr\":2024,\"cal_qtr\":3,\"subunit\":\"UNDERGROUND\",\"avg_employees\":341,\"hours_worked\":148560,\"coal_production\":1247000},{\"cal_yr\":2024,\"cal_qtr\":2,...}]",
  "operator_history": null,
  "violations_count_12mo": null,
  "violations_history": null,
  "accidents_12mo": null,
  "source_url": "https://arlweb.msha.gov/OpenGovernmentData/OGIMSHA.asp"
}

| Field | Type | Description |
| --- | --- | --- |
| mine_id | string | MSHA Mine ID (7-digit, primary key) |
| mine_name | string | Current mine name from Legal ID Form |
| coal_metal_ind | string | Mine class: C=Coal, M=Metal/NonMetal |
| mine_type | string | Mine type: Surface, Underground, Facility, or Other |
| mine_status | string | Current status: Active, Temporarily Idled, NonProducing, Abandoned, etc. |
| status_date | string | Date mine entered current status (YYYY-MM-DD) |
| controller_id | string | MSHA controller ID for the ultimate parent entity |
| controller_name | string | Name of the controller (ultimate parent of operator) |
| controller_start_date | string | Date current controller took control (YYYY-MM-DD) |
| operator_id | string | MSHA operator ID |
| operator_name | string | Current operator name |
| state | string | 2-letter state abbreviation |
| county | string | County name (FIPS county name) |
| fips_county_code | string | 3-digit FIPS county code |
| latitude | number | Mine latitude (decimal degrees) |
| longitude | number | Mine longitude (decimal degrees) |
| primary_sic_code | string | Primary SIC code |
| primary_commodity | string | Primary commodity description (SIC description) |
| primary_canvass | string | Primary industry group (e.g., Coal(Bituminous), M/NM (Stone), Metal) |
| secondary_commodity | string | Secondary commodity description |
| num_employees | number | Number of workers at mine |
| days_per_week | number | Operating days per week |
| nearest_town | string | Nearest town or city |
| district | string | MSHA district code |
| office_name | string | MSHA office responsible for inspections |
| portable_operation | string | Y/N portable mine indicator |
| production_quarterly | string | JSON array of quarterly production records: cal_yr, cal_qtr, subunit, avg_employees, hours_worked, coal_production (requires includeProductionHistory: true) |
| operator_history | string | JSON array of controller/operator history records: controller_name, operator_name, operator_start_dt, operator_end_dt, controller_start_dt, controller_end_dt, mine_status (requires includeControllerHistory: true) |
| violations_count_12mo | number | Violations issued in the past 12 months (requires includeViolations: true) |
| violations_history | string | JSON array of recent violations: violation_no, inspection_begin_dt, violation_issue_dt, cal_yr, violator_name, section_of_act, violation_type (requires includeViolations: true) |
| accidents_12mo | number | Accident/injury records in the past 12 months (requires includeAccidents: true) |
| source_url | string | URL of the source MSHA open data page |
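
Because the array joins arrive as JSON strings rather than nested objects, they need one decoding step before analysis. The sketch below assumes a record shaped like the sample output and sums `coal_production` per calendar year; `annual_production` is a hypothetical helper, and the second and third quarterly rows are invented for illustration.

```python
import json
from collections import defaultdict


def annual_production(record):
    """Decode production_quarterly and total coal_production per calendar year."""
    raw = record.get("production_quarterly")
    # The field is null when includeProductionHistory was not enabled.
    if not raw:
        return {}
    totals = defaultdict(int)
    for row in json.loads(raw):
        totals[row["cal_yr"]] += row.get("coal_production") or 0
    return dict(totals)


record = {
    "mine_id": "4200017",
    # Serialized the same way the actor returns it: a JSON string, not a list.
    "production_quarterly": json.dumps([
        {"cal_yr": 2024, "cal_qtr": 3, "subunit": "UNDERGROUND", "coal_production": 1247000},
        # The two rows below are invented sample values.
        {"cal_yr": 2024, "cal_qtr": 2, "subunit": "UNDERGROUND", "coal_production": 1000000},
        {"cal_yr": 2023, "cal_qtr": 4, "subunit": "UNDERGROUND", "coal_production": 900000},
    ]),
}
totals = annual_production(record)
print(totals)
```

The same `json.loads` step applies to `operator_history` and `violations_history` when those joins are enabled.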

🔍 FAQ

How do I extract MSHA mine data?

MSHA Mine Data Retrieval Scraper pulls directly from the MSHA Open Government Data bulk CSV exports. Configure your filters in the input, run the actor, and download the dataset — no MSHA portal account or manual CSV downloads required.

What does MSHA Mine Data Retrieval Scraper cost to run?

The actor charges $0.10 per run plus $0.001 per record. Pulling 1,000 active coal mines with quarterly production history runs roughly $1.10 total. Enabling the controller history dataset (119 MB download) adds compute time but not additional per-record cost.
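
The pricing above reduces to a fixed fee plus a per-record fee. A minimal sketch of that arithmetic, using only the two rates quoted in this answer (`estimate_cost` is a hypothetical helper, and platform compute charges are not modeled):

```python
RUN_FEE = 0.10       # USD per run, as quoted above
RECORD_FEE = 0.001   # USD per returned record, as quoted above


def estimate_cost(records):
    """Estimate the event charges for a run that returns `records` records."""
    if records < 0:
        raise ValueError("records must be non-negative")
    return round(RUN_FEE + RECORD_FEE * records, 3)


for n in (10, 1000, 91000):
    print(f"{n} records: ${estimate_cost(n):.2f}")
```

For 1,000 records this gives $0.10 + 1,000 × $0.001 = $1.10, matching the figure above.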

Can I filter by specific commodities like lithium or copper?

Yes. Set commodityFilter to any commodity keyword and the actor does a case-insensitive substring match against the MSHA SIC description. "copper" returns copper mines, "lithium" returns lithium mines, "stone" returns crushed stone operations. Leave it blank to get all commodities.
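
The matching rule described above can be sketched in a few lines. This is an approximation of the documented behavior, not the actor's code: `matches_commodity` is a hypothetical name, the whitespace trimming is an assumption, and the labels are illustrative rather than an authoritative list of MSHA SIC descriptions.

```python
def matches_commodity(sic_description, commodity_filter):
    """Case-insensitive substring match; a blank filter matches everything."""
    needle = commodity_filter.strip().lower()  # trimming is an assumption
    return needle in (sic_description or "").lower()


# Illustrative labels only.
labels = ["Copper Ore NEC", "Crushed, Broken Limestone NEC", "Dimension Stone NEC"]

print([label for label in labels if matches_commodity(label, "COPPER")])
print([label for label in labels if matches_commodity(label, "stone")])
print([label for label in labels if matches_commodity(label, "")])
```

Since this is a substring match, a short keyword such as "stone" also matches labels that merely contain it, for example "Limestone". Use a longer keyword when you need a narrower result.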

Does MSHA Mine Data Retrieval Scraper need proxies?

No. MSHA open data is publicly available without authentication or rate limits. The actor downloads ZIP files directly from the MSHA server — no proxy configuration needed.

How current is the mine data?

MSHA updates the open data CSVs regularly. The actor always pulls the latest published version at run time. The status_date field tells you when each mine's current status was last changed by MSHA.


Need More Features?

Need custom filters, additional MSHA datasets, or scheduled runs? File an issue or get in touch.

Why Use MSHA Mine Data Retrieval Scraper?

  • Covers the full registry — all 91,000+ mines across coal, metal, and nonmetal classes, with optional joins for production history, ownership chain, violations, and accidents in a single run
  • No proxies, no auth, no scraping fragility — pulls from official government bulk exports, so it doesn't break when MSHA updates their web UI
  • Clean structured output — flat JSON records with consistent field names, ready for a spreadsheet, database, or downstream pipeline without reformatting