MSHA Mine Data Retrieval Scraper - US Mines Production + Safety
Pricing
Pay per event
Extract US mine records from MSHA Open Government Data CSVs. Covers coal, metal, and nonmetal mines: mine ID, operator, controller, state/county, geocoordinates, mine type, commodity, status, employees, quarterly production tons, violations, and accidents.
Developer
BowTiedRaccoon
MSHA Mine Data Retrieval Scraper
Extracts US mine records from the MSHA Open Government Data portal. Returns mine identity, operator and controller info, geocoordinates, commodity classification, status, employee counts, and — optionally — quarterly production tons, ownership history, violations, and accident counts for all 91,000+ mines in the MSHA registry.
MSHA Mine Scraper Features
- Extracts 26+ fields per mine record — mine ID, name, operator, controller (ultimate parent), state, county, FIPS code, lat/lon, mine type, mine class, commodity, SIC code, status, employees, operating days, nearest town, MSHA district and office
- Filters by mine class — coal, metal/nonmetal, or all
- Filters by commodity — substring match against SIC description (lithium, copper, iron, bituminous, anthracite, stone, sand, gravel, and any other MSHA commodity label)
- Filters by status: Active, Temporarily Idled, NonProducing, Abandoned, Abandoned and Sealed, New Mine, Intermittent, or all
- Joins quarterly production history — all historical production tons, hours worked, and average employee counts by quarter and subunit, back to the first recorded year
- Joins operator/controller history — the full M&A ownership chain with start/end dates for each controller and operator
- Joins violations — citation count for the past 12 months plus a structured history of recent violations with section-of-act and violation type
- Joins accident records — injury and accident count for the past 12 months
- No proxy required — MSHA open data is public and does not block bulk downloads
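To make the "past 12 months" counts concrete, here is a minimal Python sketch of how such a count could be derived from a list of violation records. It is an illustration, not the actor's code: the function name `count_last_12_months` is hypothetical, the window is assumed to be 365 days ending on a given date, and `violation_issue_dt` is assumed to be an ISO `YYYY-MM-DD` string.

```python
from datetime import date, timedelta


def count_last_12_months(violations, today):
    """Count records issued in the 365 days ending on `today` (assumed window)."""
    cutoff = today - timedelta(days=365)
    count = 0
    for violation in violations:
        # Assumption: dates are ISO-formatted strings such as "2024-09-10".
        issued = date.fromisoformat(violation["violation_issue_dt"])
        if cutoff < issued <= today:
            count += 1
    return count


# Made-up sample records, for illustration only.
sample = [
    {"violation_no": "A1", "violation_issue_dt": "2024-09-10"},
    {"violation_no": "A2", "violation_issue_dt": "2023-01-05"},
    {"violation_no": "A3", "violation_issue_dt": "2024-02-20"},
]

recent = count_last_12_months(sample, today=date(2024, 12, 31))
print(recent)
```

Passing `today` explicitly keeps the count reproducible, which helps when comparing two runs made on different days.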
What Can You Do With MSHA Mine Data?
- Utility and coal analysts — pull quarterly production tons by mine, subunit, and commodity to track output trends without manual MSHA portal navigation
- Critical-minerals researchers — filter active metal/nonmetal mines by commodity (lithium, copper, cobalt, rare earths) and get operator, location, and production context in one run
- Mining M&A advisory — join the controller history dataset to reconstruct the full ownership chain for a target mine or portfolio
- Environmental NGOs — identify active surface mines by state and county, then enrich with violation counts to prioritize investigation targets
- Compliance teams — pull violation and accident histories to benchmark a mine's safety record against peers in the same district
- Data journalists — map every active mine in a given state with geocoordinates and production figures, without downloading and parsing multiple MSHA ZIP files by hand
How It Works
- Downloads the MSHA Mines registry — the master registry ZIP (7.3 MB compressed, ~91,000 rows) is pulled from the MSHA open data endpoint and parsed in memory
- Applies your filters — coal/metal class, status, and commodity filters run against the registry before any joins, so enrichment datasets only load for mines that match
- Joins optional datasets in parallel — if you request quarterly production, controller history, violations, or accidents, those ZIPs are downloaded concurrently and indexed by MINE_ID
- Returns structured records: each mine is written to the dataset with flat fields for registry data and JSON-encoded strings for the array joins (production_quarterly, operator_history, violations_history)
The quarterly production dataset is 56 MB compressed and covers 35,000 unique mines. Controller history is ~119 MB. Enabling either option adds download and parse time, so factor that into your expected run time.
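The filter-then-join order described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the actor's implementation: the helper names are hypothetical, and apart from `MINE_ID` the column names (`COAL_METAL_IND`, `CURRENT_MINE_STATUS`, `CAL_YR`, `CAL_QTR`) are assumed for the example.

```python
from collections import defaultdict


def filter_registry(mines, mine_class="all", status="all"):
    """Keep registry rows that match the class and status filters."""
    kept = []
    for mine in mines:
        if mine_class != "all" and mine["COAL_METAL_IND"] != mine_class:
            continue
        if status != "all" and mine["CURRENT_MINE_STATUS"] != status:
            continue
        kept.append(mine)
    return kept


def index_by_mine_id(rows, wanted_ids):
    """Group enrichment rows by MINE_ID, skipping mines that were filtered out."""
    index = defaultdict(list)
    for row in rows:
        if row["MINE_ID"] in wanted_ids:
            index[row["MINE_ID"]].append(row)
    return index


# Made-up rows, for illustration only.
registry = [
    {"MINE_ID": "0000001", "COAL_METAL_IND": "C", "CURRENT_MINE_STATUS": "Active"},
    {"MINE_ID": "0000002", "COAL_METAL_IND": "M", "CURRENT_MINE_STATUS": "Active"},
    {"MINE_ID": "0000003", "COAL_METAL_IND": "M", "CURRENT_MINE_STATUS": "Abandoned"},
]
production = [
    {"MINE_ID": "0000002", "CAL_YR": 2024, "CAL_QTR": 1},
    {"MINE_ID": "0000002", "CAL_YR": 2024, "CAL_QTR": 2},
    {"MINE_ID": "0000003", "CAL_YR": 2024, "CAL_QTR": 1},
]

# Filter first, then index only the enrichment rows for surviving mines.
matched = filter_registry(registry, mine_class="M", status="Active")
by_mine = index_by_mine_id(production, {m["MINE_ID"] for m in matched})
joined = [
    {**m, "production_quarterly": by_mine.get(m["MINE_ID"], [])} for m in matched
]
print(joined)
```

Filtering before indexing keeps the in-memory index proportional to the number of matching mines rather than to the size of the enrichment file.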
MSHA Mine Scraper Input
{
  "coalOrMetal": "M",
  "commodityFilter": "copper",
  "mineStatus": "Active",
  "includeProductionHistory": true,
  "includeControllerHistory": false,
  "includeViolations": false,
  "includeAccidents": false,
  "maxItems": 100,
  "sp_intended_usage": "critical minerals research",
  "sp_improvement_suggestions": "none"
}
| Field | Type | Default | Description |
|---|---|---|---|
| coalOrMetal | string | "all" | Mine class filter: "all", "C" (coal only), or "M" (metal/nonmetal only) |
| commodityFilter | string | "" | Substring match against SIC description (case-insensitive). Leave blank for all commodities. |
| mineStatus | string | "Active" | Status filter: Active, Temporarily Idled, NonProducing, Abandoned, Abandoned and Sealed, New Mine, Intermittent, or all |
| includeProductionHistory | boolean | true | Join quarterly production CSV to add production_quarterly |
| includeControllerHistory | boolean | false | Join controller/operator history CSV to add operator_history |
| includeViolations | boolean | false | Join violations CSV to add violations_count_12mo and violations_history |
| includeAccidents | boolean | false | Join accidents CSV to add accidents_12mo |
| maxItems | integer | 10 | Maximum records to return. Set to 0 for unlimited. |
| sp_intended_usage | string | none | Required. Describe your intended use of this data. |
| sp_improvement_suggestions | string | none | Required. Share any suggestions for improving the actor. |
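If you build the input programmatically, a small helper can apply the defaults from the table and catch missing required fields before a run starts. The sketch below is a hypothetical client-side helper (`build_input` is not part of the actor), and its validation rules are assumptions derived from the table.

```python
import json

# Defaults copied from the input table above.
DEFAULTS = {
    "coalOrMetal": "all",
    "commodityFilter": "",
    "mineStatus": "Active",
    "includeProductionHistory": True,
    "includeControllerHistory": False,
    "includeViolations": False,
    "includeAccidents": False,
    "maxItems": 10,
}
REQUIRED = ("sp_intended_usage", "sp_improvement_suggestions")


def build_input(**overrides):
    """Merge overrides onto the documented defaults and validate the result."""
    unknown = set(overrides) - set(DEFAULTS) - set(REQUIRED)
    if unknown:
        raise ValueError(f"Unknown input fields: {sorted(unknown)}")
    run_input = {**DEFAULTS, **overrides}
    missing = [name for name in REQUIRED if not run_input.get(name)]
    if missing:
        raise ValueError(f"Missing required fields: {missing}")
    if run_input["coalOrMetal"] not in ("all", "C", "M"):
        raise ValueError('coalOrMetal must be "all", "C", or "M"')
    if run_input["maxItems"] < 0:
        raise ValueError("maxItems must be 0 (unlimited) or a positive integer")
    return run_input


run_input = build_input(
    coalOrMetal="M",
    commodityFilter="copper",
    sp_intended_usage="critical minerals research",
    sp_improvement_suggestions="none",
)
print(json.dumps(run_input, indent=2))
```

The resulting dictionary serializes to the same JSON shape as the sample input, with any field you did not set falling back to its documented default.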
MSHA Mine Scraper Output Fields
{
  "mine_id": "4200017",
  "mine_name": "EMERALD MINE NO 1",
  "coal_metal_ind": "C",
  "mine_type": "Underground",
  "mine_status": "Active",
  "status_date": "1978-12-15",
  "controller_id": "0000055",
  "controller_name": "CONSOL ENERGY INC",
  "controller_start_date": "2020-01-01",
  "operator_id": "0218869",
  "operator_name": "CONSOL PENNSYLVANIA COAL COMPANY LLC",
  "state": "PA",
  "county": "GREENE",
  "fips_county_code": "059",
  "latitude": 39.8219,
  "longitude": -80.1781,
  "primary_sic_code": "1220",
  "primary_commodity": "Bituminous Coal",
  "primary_canvass": "Coal(Bituminous)",
  "secondary_commodity": "",
  "num_employees": 350,
  "days_per_week": 5,
  "nearest_town": "Wind Ridge",
  "district": "3",
  "office_name": "Waynesburg District",
  "portable_operation": "N",
  "production_quarterly": "[{\"cal_yr\":2024,\"cal_qtr\":3,\"subunit\":\"UNDERGROUND\",\"avg_employees\":341,\"hours_worked\":148560,\"coal_production\":1247000},{\"cal_yr\":2024,\"cal_qtr\":2,...}]",
  "operator_history": null,
  "violations_count_12mo": null,
  "violations_history": null,
  "accidents_12mo": null,
  "source_url": "https://arlweb.msha.gov/OpenGovernmentData/OGIMSHA.asp"
}
| Field | Type | Description |
|---|---|---|
| mine_id | string | MSHA Mine ID (7-digit, primary key) |
| mine_name | string | Current mine name from Legal ID Form |
| coal_metal_ind | string | Mine class: C=Coal, M=Metal/NonMetal |
| mine_type | string | Mine type: Surface, Underground, Facility, or Other |
| mine_status | string | Current status: Active, Temporarily Idled, NonProducing, Abandoned, etc. |
| status_date | string | Date mine entered current status (YYYY-MM-DD) |
| controller_id | string | MSHA controller ID for the ultimate parent entity |
| controller_name | string | Name of the controller (ultimate parent of operator) |
| controller_start_date | string | Date current controller took control (YYYY-MM-DD) |
| operator_id | string | MSHA operator ID |
| operator_name | string | Current operator name |
| state | string | 2-letter state abbreviation |
| county | string | County name (FIPS county name) |
| fips_county_code | string | 3-digit FIPS county code |
| latitude | number | Mine latitude (decimal degrees) |
| longitude | number | Mine longitude (decimal degrees) |
| primary_sic_code | string | Primary SIC code |
| primary_commodity | string | Primary commodity description (SIC description) |
| primary_canvass | string | Primary industry group (e.g., Coal(Bituminous), M/NM (Stone), Metal) |
| secondary_commodity | string | Secondary commodity description |
| num_employees | number | Number of workers at mine |
| days_per_week | number | Operating days per week |
| nearest_town | string | Nearest town or city |
| district | string | MSHA district code |
| office_name | string | MSHA office responsible for inspections |
| portable_operation | string | Y/N portable mine indicator |
| production_quarterly | string | JSON array of quarterly production records: cal_yr, cal_qtr, subunit, avg_employees, hours_worked, coal_production (requires includeProductionHistory: true) |
| operator_history | string | JSON array of controller/operator history records: controller_name, operator_name, operator_start_dt, operator_end_dt, controller_start_dt, controller_end_dt, mine_status (requires includeControllerHistory: true) |
| violations_count_12mo | number | Violations issued in the past 12 months (requires includeViolations: true) |
| violations_history | string | JSON array of recent violations: violation_no, inspection_begin_dt, violation_issue_dt, cal_yr, violator_name, section_of_act, violation_type (requires includeViolations: true) |
| accidents_12mo | number | Accident/injury records in the past 12 months (requires includeAccidents: true) |
| source_url | string | URL of the source MSHA open data page |
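Because the array joins arrive as JSON strings rather than nested objects, downstream code needs to decode them before use. The following Python sketch shows one way to do that for `production_quarterly`, summing `coal_production` per calendar year. The function name `annual_production` and the sample record are made up for illustration, and a field left `null` is assumed to mean the join was not requested.

```python
import json
from collections import defaultdict


def annual_production(record):
    """Decode production_quarterly and total coal_production per calendar year."""
    raw = record.get("production_quarterly")
    if not raw:
        # Assumption: null or empty means the production join was not enabled.
        return {}
    totals = defaultdict(int)
    for quarter in json.loads(raw):
        totals[quarter["cal_yr"]] += quarter.get("coal_production") or 0
    return dict(totals)


# Made-up record, for illustration only.
record = {
    "mine_id": "0000001",
    "production_quarterly": json.dumps(
        [
            {"cal_yr": 2024, "cal_qtr": 1, "coal_production": 100},
            {"cal_yr": 2024, "cal_qtr": 2, "coal_production": 150},
            {"cal_yr": 2023, "cal_qtr": 4, "coal_production": 90},
        ]
    ),
    "operator_history": None,
}

totals = annual_production(record)
print(totals)
```

The same `json.loads` step applies to `operator_history` and `violations_history` when those joins are enabled.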
🔍 FAQ
How do I extract MSHA mine data?
MSHA Mine Data Retrieval Scraper pulls directly from the MSHA Open Government Data bulk CSV exports. Configure your filters in the input, run the actor, and download the dataset — no MSHA portal account or manual CSV downloads required.
What does MSHA Mine Data Retrieval Scraper cost to run?
The actor charges $0.10 per run plus $0.001 per record. Pulling 1,000 active coal mines with quarterly production history costs about $1.10 in total ($0.10 for the run plus 1,000 × $0.001 for the records). Enabling the controller history dataset (119 MB download) adds compute time but no additional per-record cost.
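For budgeting larger pulls, the pricing arithmetic can be written as a short Python helper. The rates are the ones quoted in this answer. The function name `estimate_cost` is hypothetical, and actual billing is determined by the platform, so treat the result as an estimate.

```python
from decimal import Decimal

# Rates quoted in the FAQ answer above.
RUN_FEE = Decimal("0.10")
RECORD_FEE = Decimal("0.001")


def estimate_cost(records):
    """Estimate the charge in USD for one run returning `records` records."""
    if records < 0:
        raise ValueError("records must be non-negative")
    return RUN_FEE + RECORD_FEE * records


cost = estimate_cost(1000)
print(f"${cost:.2f}")
```

`Decimal` is used instead of floats so that small per-record fees add up without binary rounding noise.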
Can I filter by specific commodities like lithium or copper?
Yes. Set commodityFilter to any commodity keyword and the actor does a case-insensitive substring match against the MSHA SIC description. "copper" returns copper mines, "lithium" returns lithium mines, "stone" returns crushed stone operations. Leave it blank to get all commodities.
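The matching rule can be pictured with a few lines of Python. This is a sketch of a case-insensitive substring test, not the actor's source. The function name `matches_commodity` is hypothetical and the commodity labels are illustrative rather than taken from the MSHA files.

```python
def matches_commodity(sic_description, commodity_filter):
    """Return True when the filter is blank or appears in the description."""
    if not commodity_filter:
        return True  # a blank filter keeps every commodity
    return commodity_filter.casefold() in sic_description.casefold()


# Illustrative labels only; real MSHA SIC descriptions may be worded differently.
labels = [
    "Copper Ore NEC",
    "Crushed, Broken Limestone NEC",
    "Dimension Stone NEC",
    "Construction Sand and Gravel",
]

copper = [label for label in labels if matches_commodity(label, "COPPER")]
stone = [label for label in labels if matches_commodity(label, "stone")]
print(copper)
print(stone)
```

Because the test is a plain substring match, a short keyword such as "stone" also matches labels that merely contain it, such as "Limestone". Use a longer keyword if you need a narrower result set.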
Does MSHA Mine Data Retrieval Scraper need proxies?
No. MSHA open data is publicly available without authentication or rate limits. The actor downloads ZIP files directly from the MSHA server — no proxy configuration needed.
How current is the mine data?
MSHA updates the open data CSVs regularly. The actor always pulls the latest published version at run time. The status_date field tells you when each mine's current status was last changed by MSHA.
Need More Features?
Need custom filters, additional MSHA datasets, or scheduled runs? File an issue or get in touch.
Why Use MSHA Mine Data Retrieval Scraper?
- Covers the full registry — all 91,000+ mines across coal, metal, and nonmetal classes, with optional joins for production history, ownership chain, violations, and accidents in a single run
- No proxies, no auth, no scraping fragility — pulls from official government bulk exports, so it doesn't break when MSHA updates their web UI
- Clean structured output — flat JSON records with consistent field names, ready for a spreadsheet, database, or downstream pipeline without reformatting