# OSHA Citation Intelligence Scraper (`belcaidsaad/osha-citation-scraper`) Actor

Pull OSHA-cited manufacturers & warehouses (DOL API v4), aggregated per company and scored into ready-to-reach demand signals.

- **URL**: https://apify.com/belcaidsaad/osha-citation-scraper.md
- **Developed by:** [Saad Belcaid](https://apify.com/belcaidsaad) (community)
- **Categories:** Lead generation, News, AI
- **Stats:** 3 total users, 3 monthly users, 61.1% runs succeeded, 1 bookmark
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is free to use; you pay only for the Apify platform usage it consumes, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## OSHA Citation Finder

Find manufacturers and warehouses that just got cited by OSHA — with the serious, willful, and repeat violations spelled out, the open penalty, and a one-line "why call them" signal for each company. Ready to drop straight into outreach.

---

### What you get

One row per company, already cleaned up and sorted with the hottest first. Each row tells you:

- The company name, city, state, and address
- How many **serious / willful / repeat** citations they have
- The total penalty dollars still on the books
- When the latest citation was issued
- How many fix-it items are still open
- Roughly how many people work at the site, and a size band
- Whether they have more than one site
- A **signal line** — one plain sentence you can read out loud
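
The "hottest first" ordering can be pictured with a small sketch. The Actor's real scoring formula is not documented here, so the weights, the `CompanyRow` fields, and the two sample rows below are all made up for illustration; they only encode the stated idea that willful and repeat citations count for more than serious ones.

```python
from dataclasses import dataclass

# Hypothetical weights: NOT the Actor's documented formula.
WEIGHTS = {"serious": 1, "repeat": 3, "willful": 5}


@dataclass
class CompanyRow:
    # Hypothetical field names; the real dataset columns may differ.
    name: str
    serious: int
    willful: int
    repeat: int
    open_penalty_usd: float


def severity_score(row: CompanyRow) -> int:
    """Weighted citation count; a higher score means a 'hotter' company."""
    return (
        row.serious * WEIGHTS["serious"]
        + row.repeat * WEIGHTS["repeat"]
        + row.willful * WEIGHTS["willful"]
    )


def hottest_first(rows: list[CompanyRow]) -> list[CompanyRow]:
    """Sort by score, breaking ties on the open penalty amount."""
    return sorted(rows, key=lambda r: (severity_score(r), r.open_penalty_usd), reverse=True)


# Made-up example rows, not real OSHA data.
rows = [
    CompanyRow("Acme Stamping", serious=4, willful=0, repeat=0, open_penalty_usd=32000.0),
    CompanyRow("Lakeside Warehousing", serious=1, willful=1, repeat=0, open_penalty_usd=80000.0),
]
for row in hottest_first(rows):
    print(row.name, severity_score(row))
```

With these assumed weights, one willful citation outranks four serious ones, so the second row sorts to the top.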

---

### Get started (5 steps, ~3 minutes)

1. **Register** for a free key at **https://dataportal.dol.gov**. It takes about 2 minutes.
2. **Copy** the key they give you.
3. **Paste** the key into the **DOL API key** box in the input form.
4. **Press** Start (or Save & Run).
5. **Wait.** When it finishes, open the **Dataset** tab to see your companies.

That's it. The defaults already target manufacturers and warehouses in Ohio, Indiana, Michigan, Illinois, and Wisconsin. Change the boxes below if you want a different region or industry.

---

### The boxes you can change

- **States** — two-letter codes. Default: OH, IN, MI, IL, WI.
- **NAICS industry prefixes** — `31`, `32`, `33` are manufacturing; `493` is warehousing. Add or remove to widen or narrow.
- **Window: days back (start)** — how far back to look. Default 300.
- **Window: days back (floor / end)** — the fresh edge. Default 60. Citations take about 30 days to show up publicly, so very recent inspections have nothing to find yet. Leave this alone unless you know why you're changing it.
- **Violation types** — `S` serious, `W` willful, `R` repeat. Default: all three.
- **Dry run** — flip this on to do a quick test with built-in sample data. No key needed, no real calls made. Use it to see the output shape before a real run.
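
The two window settings are both counted back from the day of the run, which is why the window moves forward on its own. Here is a minimal sketch of that arithmetic. The function name is hypothetical, and treating both edges as inclusive is an assumption, not something the Actor documents.

```python
from datetime import date, timedelta


def citation_window(today: date, days_back: int = 300, days_floor: int = 60) -> tuple[date, date]:
    """Return the (oldest, newest) dates of the lookback window.

    Assumption: both edges are treated as inclusive.
    """
    if days_floor >= days_back:
        raise ValueError("days_floor must be smaller than days_back")
    return today - timedelta(days=days_back), today - timedelta(days=days_floor)


start, end = citation_window(date(2025, 1, 8))
print(start, end)  # 2024-03-14 2024-11-09

# One week later the whole window has moved forward by seven days.
next_start, next_end = citation_window(date(2025, 1, 15))
print((next_start - start).days, (next_end - end).days)  # 7 7
```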

---

### Try it without a key first

Turn **Dry run** on and press Start. The Actor runs the whole pipeline on a few built-in sample companies and shows you exactly what the output looks like. When you're happy, turn Dry run off, paste your key, and run for real.
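
If you would rather trigger the dry run from code, something like the sketch below should work with the official Python client. The helper names are hypothetical. One caveat: the input schema lists `apiKey` as required, so if the platform rejects a key-less input, pass any placeholder string.

```python
def dry_run_input(api_key: str = "") -> dict:
    """Build a run input that asks the Actor to use its built-in sample data."""
    run_input = {"dryRun": True}
    if api_key:
        # Only needed if input validation insists on the required apiKey field.
        run_input["apiKey"] = api_key
    return run_input


def start_dry_run(apify_token: str) -> list:
    """Run the Actor in dry-run mode and return the sample dataset rows."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(apify_token)
    run = client.actor("belcaidsaad/osha-citation-scraper").call(run_input=dry_run_input())
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling `start_dry_run("<YOUR_API_TOKEN>")` still uses Apify platform resources, even though no DOL calls are made.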

---

### Weekly cadence

This is built to run **every Monday**.

1. Open the Actor on Apify.
2. Click **Schedule** (or **Schedules**).
3. Set it to run weekly, Monday morning.
4. Done. Leave the inputs as they are.
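
The same schedule can probably be created from code. The sketch below assumes the Python client's `schedules().create()` keyword names and the `RUN_ACTOR` action shape from the Apify API; check both against the current client reference before relying on them. `<ACTOR_ID>`, the schedule name, the 8 a.m. hour, and the timezone are placeholders, not values from this README.

```python
import json

ACTOR_ID = "<ACTOR_ID>"  # placeholder: copy the Actor's ID from Apify Console
MONDAY_8AM = "0 8 * * 1"  # cron: minute 0, hour 8, any day of month, any month, Monday


def weekly_schedule_args(actor_id: str, run_input: dict) -> dict:
    """Keyword arguments for ApifyClient.schedules().create() (assumed signature)."""
    return {
        "name": "osha-citations-weekly",  # hypothetical schedule name
        "cron_expression": MONDAY_8AM,
        "timezone": "America/Chicago",  # assumption: pick your own timezone
        "is_enabled": True,
        "is_exclusive": True,
        "actions": [
            {
                "type": "RUN_ACTOR",
                "actorId": actor_id,
                "runInput": {
                    "body": json.dumps(run_input),
                    "contentType": "application/json; charset=utf-8",
                },
            }
        ],
    }


def create_weekly_schedule(apify_token: str, run_input: dict) -> dict:
    """Create the schedule on the platform (network call, not run here)."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(apify_token)
    return client.schedules().create(**weekly_schedule_args(ACTOR_ID, run_input))
```

Note that the run input, including your DOL key, is saved with the schedule in your Apify account.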

The date window **slides itself** — every run looks at the same rolling stretch of the recent past, so each Monday you get the newly-posted citations without touching anything.

**A thin week is information, not failure.** Some weeks there just aren't many fresh citations in your region. That's a real fact about the market that week, not a broken run. The Actor reports the honest count every time; read it as a gauge, not a scoreboard.

---

### A few honest notes

- **Your key is yours.** It is never stored in the code. You paste it; it stays with your run.
- **Citations lag ~30 days.** A company inspected last week won't appear yet. That's why the window stops 60 days back by default.
- **The count is whatever is real.** This tool never pads the list to hit a number. If your region is quiet, it says so.


### Want contacts, not just companies? (optional)

The list works as-is: company, address, citation details, severity score.

If you also want **who to call**, add two optional keys in the input form:

1. **Apollo key** → adds each company's website and main phone number. Get one at app.apollo.io (Settings → Integrations → API).
2. **AnyMailFinder key** → adds a verified decision-maker email (CEO, then Operations, then Finance). Get one at anymailfinder.com. You are only charged for emails that are actually found.

No keys = no problem. The citation list still comes out complete. Add keys later and re-run when you want the contacts.
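
When you build the input in code, one way to handle the optional keys is to send only the ones you have. The helper below is a hypothetical sketch. It also enforces the pairing described in the input schema, where the AnyMailFinder key is used together with the Apollo key; whether the Actor itself rejects that combination is not documented, so the error is this sketch's own choice.

```python
def enrichment_keys(apollo_key: str = "", anymailfinder_key: str = "") -> dict:
    """Return only the optional key fields that should be sent with the run input."""
    if anymailfinder_key and not apollo_key:
        # Assumption drawn from the schema text: emails are looked up for
        # companies that Apollo has already resolved.
        raise ValueError("anymailfinderKey is documented as working together with apolloKey")
    fields = {}
    if apollo_key:
        fields["apolloKey"] = apollo_key
    if anymailfinder_key:
        fields["anymailfinderKey"] = anymailfinder_key
    return fields


run_input = {"apiKey": "<YOUR_DOL_API_KEY>", **enrichment_keys(apollo_key="<YOUR_APOLLO_KEY>")}
print(sorted(run_input))  # ['apiKey', 'apolloKey']
```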

# Actor input Schema

## `apiKey` (type: `string`):

Your free DOL Data Portal key. Register at https://dataportal.dol.gov (takes ~2 minutes) and paste the key here. The key is yours and is never stored in the code.
## `apolloKey` (type: `string`):

Your own Apollo.io API key. When provided, each company is resolved to its website domain and main phone number. Without it, results ship with the citation signal only. Get a key at app.apollo.io → Settings → Integrations → API.
## `anymailfinderKey` (type: `string`):

Your own AnyMailFinder key. When provided (together with the Apollo key), each resolved company gets a verified decision-maker email (CEO, then Operations, then Finance). AnyMailFinder only charges for emails it actually finds. Get a key at anymailfinder.com → API.
## `states` (type: `array`):

Two-letter state codes to pull. Default is the upper Midwest manufacturing belt.
## `naicsPrefixes` (type: `array`):

Industry code prefixes (matched as wildcards). 31/32/33 = manufacturing, 493 = warehousing & storage.
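
As a rough picture of what "matched as wildcards" means, a prefix such as `33` behaves like `33*`: it matches any NAICS code that starts with those digits. The function below is a hypothetical client-side sketch; the Actor may apply the wildcard inside its DOL API query instead.

```python
def matches_naics(naics_code: str, prefixes: list[str]) -> bool:
    """True when the code starts with any of the configured prefixes."""
    return naics_code.startswith(tuple(prefixes))


defaults = ["31", "32", "33", "493"]
print(matches_naics("332710", defaults))  # True: machine shops (manufacturing)
print(matches_naics("493110", defaults))  # True: general warehousing and storage
print(matches_naics("484110", defaults))  # False: trucking is not in the default list
```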
## `windowDaysBack` (type: `integer`):

How far back the date window starts. Inspections opened before this are ignored. Default 300.
## `windowDaysFloor` (type: `integer`):

The fresh edge of the window. Inspections newer than this are skipped because citations post ~30 days after issuance and very recent inspections show nothing public yet. Default 60.
## `violTypes` (type: `array`):

Which OSHA violation types to count. S = Serious, W = Willful, R = Repeat. Willful and repeat carry the most weight.
## `dryRun` (type: `boolean`):

Run the full pipeline against a few built-in fixture rows instead of the live DOL API. Use this to confirm the actor works without spending your API budget or needing a key.

## Actor input object example

```json
{
  "states": [
    "OH",
    "IN",
    "MI",
    "IL",
    "WI"
  ],
  "naicsPrefixes": [
    "31",
    "32",
    "33",
    "493"
  ],
  "windowDaysBack": 300,
  "windowDaysFloor": 60,
  "violTypes": [
    "S",
    "W",
    "R"
  ],
  "dryRun": false
}
````
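
Before sending an input like the one above, a quick local check can catch obvious mistakes. This is a sketch, not the platform's validator. The minimums (30 and 0) come from the OpenAPI schema in this document; the rules that the floor must be smaller than the start, and that `apiKey` can be skipped in dry-run mode, are assumptions based on the README text.

```python
def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks usable."""
    problems = []
    if not run_input.get("apiKey") and not run_input.get("dryRun", False):
        problems.append("apiKey is required unless dryRun is true")
    for state in run_input.get("states", []):
        if not (isinstance(state, str) and len(state) == 2 and state.isalpha()):
            problems.append(f"state code {state!r} is not a two-letter code")
    back = run_input.get("windowDaysBack", 300)
    floor = run_input.get("windowDaysFloor", 60)
    if back < 30:
        problems.append("windowDaysBack must be at least 30")
    if floor < 0:
        problems.append("windowDaysFloor must not be negative")
    if floor >= back:
        problems.append("windowDaysFloor must be smaller than windowDaysBack")
    unknown = set(run_input.get("violTypes", [])) - {"S", "W", "R"}
    if unknown:
        problems.append(f"unknown violation types: {sorted(unknown)}")
    return problems


print(validate_input({"states": ["OH", "Ohio"], "dryRun": True}))
# ["state code 'Ohio' is not a two-letter code"]
```

Run against the example object above, this check reports the missing `apiKey`, since that example has `dryRun` set to `false`.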

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

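// Prepare Actor input (the input schema requires apiKey; other fields are optional)
const input = {
    apiKey: '<YOUR_DOL_API_KEY>',
    states: ['OH', 'IN', 'MI', 'IL', 'WI'],
};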

// Run the Actor and wait for it to finish
const run = await client.actor("belcaidsaad/osha-citation-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

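# Prepare the Actor input (the input schema requires apiKey; other fields are optional)
run_input = {
    "apiKey": "<YOUR_DOL_API_KEY>",
    "states": ["OH", "IN", "MI", "IL", "WI"],
}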

# Run the Actor and wait for it to finish
run = client.actor("belcaidsaad/osha-citation-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{"apiKey": "<YOUR_DOL_API_KEY>"}' |
apify call belcaidsaad/osha-citation-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=belcaidsaad/osha-citation-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "OSHA Citation Intelligence Scraper",
        "description": "Pull OSHA-cited manufacturers & warehouses (DOL API v4), aggregated per company and scored into ready-to-reach demand signals.",
        "version": "1.0",
        "x-build-id": "iBo50haKHtpAwquW6"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/belcaidsaad~osha-citation-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-belcaidsaad-osha-citation-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/belcaidsaad~osha-citation-scraper/runs": {
            "post": {
                "operationId": "runs-sync-belcaidsaad-osha-citation-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/belcaidsaad~osha-citation-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-belcaidsaad-osha-citation-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "apiKey"
                ],
                "properties": {
                    "apiKey": {
                        "title": "DOL API key (X-API-KEY)",
                        "type": "string",
                        "description": "Your free DOL Data Portal key. Register at https://dataportal.dol.gov (takes ~2 minutes) and paste the key here. The key is yours and is never stored in the code."
                    },
                    "apolloKey": {
                        "title": "Apollo API key (optional — adds company domains & phones)",
                        "type": "string",
                        "description": "Your own Apollo.io API key. When provided, each company is resolved to its website domain and main phone number. Without it, results ship with the citation signal only. Get a key at app.apollo.io → Settings → Integrations → API."
                    },
                    "anymailfinderKey": {
                        "title": "AnyMailFinder API key (optional — adds decision-maker emails)",
                        "type": "string",
                        "description": "Your own AnyMailFinder key. When provided (together with the Apollo key), each resolved company gets a verified decision-maker email (CEO, then Operations, then Finance). AnyMailFinder only charges for emails it actually finds. Get a key at anymailfinder.com → API."
                    },
                    "states": {
                        "title": "States",
                        "type": "array",
                        "description": "Two-letter state codes to pull. Default is the upper Midwest manufacturing belt.",
                        "default": [
                            "OH",
                            "IN",
                            "MI",
                            "IL",
                            "WI"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "naicsPrefixes": {
                        "title": "NAICS industry prefixes",
                        "type": "array",
                        "description": "Industry code prefixes (matched as wildcards). 31/32/33 = manufacturing, 493 = warehousing & storage.",
                        "default": [
                            "31",
                            "32",
                            "33",
                            "493"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "windowDaysBack": {
                        "title": "Window: days back (start)",
                        "minimum": 30,
                        "type": "integer",
                        "description": "How far back the date window starts. Inspections opened before this are ignored. Default 300.",
                        "default": 300
                    },
                    "windowDaysFloor": {
                        "title": "Window: days back (floor / end)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "The fresh edge of the window. Inspections newer than this are skipped because citations post ~30 days after issuance and very recent inspections show nothing public yet. Default 60.",
                        "default": 60
                    },
                    "violTypes": {
                        "title": "Violation types",
                        "type": "array",
                        "description": "Which OSHA violation types to count. S = Serious, W = Willful, R = Repeat. Willful and repeat carry the most weight.",
                        "default": [
                            "S",
                            "W",
                            "R"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "dryRun": {
                        "title": "Dry run (test mode, no API calls)",
                        "type": "boolean",
                        "description": "Run the full pipeline against a few built-in fixture rows instead of the live DOL API. Use this to confirm the actor works without spending your API budget or needing a key.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
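
If you are not using a client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. The sketch below uses only the Python standard library; the helper names are hypothetical. It builds the request separately from sending it, so you can inspect the URL and body first.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.apify.com/v2"
ACTOR_PATH = "belcaidsaad~osha-citation-scraper"


def build_sync_request(apify_token: str, run_input: dict) -> urllib.request.Request:
    """Prepare a POST to run-sync-get-dataset-items without sending it."""
    query = urllib.parse.urlencode({"token": apify_token})
    url = f"{BASE_URL}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_companies(apify_token: str, run_input: dict) -> list:
    """Send the request and decode the dataset items returned in the body."""
    with urllib.request.urlopen(build_sync_request(apify_token, run_input)) as response:
        return json.load(response)


request = build_sync_request("<YOUR_API_TOKEN>", {"apiKey": "<YOUR_DOL_API_KEY>"})
print(request.get_method(), request.full_url)
```

The synchronous path keeps the HTTP connection open until the run finishes, so for long runs the `/runs` path plus a later dataset fetch may be the safer choice.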
