# NYS DOCCS Basic Snapshot (`jippylong12/nys-doccs-basic-snapshot`) Actor

Scrapes the NYS DOCCS public incarcerated person lookup by last-name prefixes and outputs a resumable, structured basic snapshot with DIN, name, status, facility, age, race, DOB, source, page, and scrape provenance fields.

- **URL**: https://apify.com/jippylong12/nys-doccs-basic-snapshot.md
- **Developed by:** [Marcus Salinas](https://apify.com/jippylong12) (community)
- **Categories:** Automation, Other, Developer tools
- **Stats:** 2 total users, 1 monthly user, 0.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $0.20 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
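
As a rough illustration of the arithmetic, the sketch below multiplies a result count by the listed starting price. It assumes a flat $0.20 per 1,000 results and ignores any other billable events, so treat the figure as a lower-bound estimate rather than a quote. The function name `estimate_result_cost` is hypothetical.

```python
def estimate_result_cost(result_count: int, usd_per_1000: float = 0.20) -> float:
    """Estimate the per-result charge in USD at a flat price per 1,000 results."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    return result_count / 1000 * usd_per_1000


# Example result counts; substitute the size of your own run.
for count in (75_000, 85_000):
    print(f"{count:,} results: about ${estimate_result_cost(count):.2f}")
```

At this starting price, a run of 75,000 to 85,000 results works out to roughly $15 to $17.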

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## NYS DOCCS Basic Snapshot

NYS DOCCS Basic Snapshot collects a broad current snapshot from the public New York State Department of Corrections and Community Supervision incarcerated person lookup. It is the first step in the two-Actor NYS DOCCS workflow.

Use this Actor when you need the broad searchable corpus with DIN, name, status, facility, DOB, age, race, source page, and scrape provenance. If you also need sentence, parole, crime, county, admission, and other per-DIN custody details, run the companion enrichment Actor after this one succeeds: [NYS DOCCS In-Custody Details](https://apify.com/jippylong12/nys-doccs-in-custody-details).

### Output

The default dataset is an array of person records. A full run typically returns about `75,000` to `85,000` records. Counts change as DOCCS updates the public lookup.

```json
[
  {
    "din": "23R1580",
    "name": "AALIL, MICHAEL",
    "dateOfBirth": "08/09/1996",
    "age": "29 years old",
    "race": "OTHER",
    "status": "RELEASED",
    "facility": "QUEENSBORO",
    "searchPrefix": "AA",
    "pageNumber": 1,
    "clickNextDinUsed": "",
    "scrapedAt": "2026-04-29T00:11:13.287Z",
    "sourceEndpoint": "https://nysdoccslookup.doccs.ny.gov/IncarceratedPerson/SearchByName"
  }
]
````

#### Dataset Schema

```json
{
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "din": { "type": "string" },
      "nysid": { "type": "string" },
      "name": { "type": "string" },
      "dateOfBirth": { "type": "string" },
      "age": { "type": "string" },
      "race": { "type": "string" },
      "releaseDate": { "type": "string" },
      "status": { "type": "string" },
      "facility": { "type": "string" },
      "searchPrefix": { "type": "string" },
      "pageNumber": { "type": "integer" },
      "clickNextDinUsed": { "type": "string" },
      "scrapedAt": { "type": "string", "format": "date-time" },
      "sourceEndpoint": { "type": "string" }
    }
  }
}
```

Fields with no source value may be omitted or returned empty depending on export format. The sample above removes null fields so the shape is easier to read.
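
Because `age` and `dateOfBirth` arrive as display strings, downstream code may want typed values. The following is a minimal sketch, not part of the Actor: it assumes `dateOfBirth` uses US `MM/DD/YYYY` ordering and that `age` starts with an integer, as in the sample record. The helper names and the added `ageYears` and `dateOfBirthIso` keys are hypothetical.

```python
import re
from datetime import date, datetime
from typing import Optional


def parse_age_years(age_text: Optional[str]) -> Optional[int]:
    """Extract the leading integer from strings like '29 years old'."""
    match = re.match(r"\s*(\d+)", age_text or "")
    return int(match.group(1)) if match else None


def parse_dob(dob_text: Optional[str]) -> Optional[date]:
    """Parse 'MM/DD/YYYY' (assumed ordering); return None if missing or malformed."""
    try:
        return datetime.strptime(dob_text, "%m/%d/%Y").date()
    except (TypeError, ValueError):
        return None


def normalize_record(record: dict) -> dict:
    """Return a copy with typed helper fields; the original fields are untouched."""
    dob = parse_dob(record.get("dateOfBirth"))
    normalized = dict(record)
    normalized["ageYears"] = parse_age_years(record.get("age"))
    normalized["dateOfBirthIso"] = dob.isoformat() if dob else None
    return normalized


sample = {"din": "23R1580", "dateOfBirth": "08/09/1996", "age": "29 years old"}
print(normalize_record(sample))
```

Missing or empty fields map to `None`, which matches the note that fields with no source value may be omitted or empty.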

The Actor also writes an `OUTPUT` key-value store record with run status, logical run ID, runtime settings, counts, transport usage, and sample rows.

### How It Works

This Actor automates the public NYS DOCCS lookup workflow and turns the results into a structured Apify dataset:

1. Breaks the name-search space into many small prefix-based searches.
2. Runs those searches with conservative pacing and retry handling.
3. Collects each visible person record returned by the public lookup.
4. Tracks progress so interrupted runs can resume instead of starting over.
5. Validates and stages records while the run is active.
6. On completion, publishes a clean default dataset for export and downstream use.

The internal staging data is not the product output. Use the completed run's default dataset as the final export and as the source dataset for the Details Actor.
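
The per-prefix pagination in steps 2 and 3 can be pictured roughly as below. This is an illustrative sketch, not the Actor's source code: `fetch_page` is a hypothetical callable standing in for the lookup request, and the sketch assumes that `clickNextDinUsed` records the DIN cursor used to request a page. The stop conditions follow the `maxPagesPerPrefix` description: stop on an empty page, on a repeated DIN cursor, or at the page cap (`0` meaning no cap).

```python
from typing import Callable, Iterator, List


def paginate_prefix(
    fetch_page: Callable[[str, str], List[dict]],
    prefix: str,
    max_pages: int = 0,
) -> Iterator[dict]:
    """Yield records for one prefix, page by page.

    fetch_page(prefix, cursor_din) is a hypothetical callable returning the
    person records that follow cursor_din ('' requests the first page).
    """
    cursor = ""
    page_number = 0
    while max_pages == 0 or page_number < max_pages:
        persons = fetch_page(prefix, cursor)
        if not persons:
            break  # the endpoint returned no persons
        last_din = persons[-1]["din"]
        if last_din == cursor:
            break  # the page repeats the last DIN cursor
        page_number += 1
        for person in persons:
            yield {
                **person,
                "searchPrefix": prefix,
                "pageNumber": page_number,
                "clickNextDinUsed": cursor,
            }
        cursor = last_din


# Fake in-memory pages keyed by cursor, used only to exercise the loop.
fake_pages = {"": [{"din": "A1"}, {"din": "A2"}], "A2": [{"din": "A3"}], "A3": []}
for row in paginate_prefix(lambda prefix, cursor: fake_pages[cursor], "AA"):
    print(row)
```

Passing the fetch function in as a parameter keeps the loop testable without network access.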

### Expected Runtime

A full Basic Snapshot run usually takes about `2` to `3` hours with the default production settings. Runtime can change based on DOCCS response time, retry volume, network conditions, and Apify platform conditions.

### Using It With In-Custody Details

The Basic Snapshot can run by itself, but it is also the required upstream source for [NYS DOCCS In-Custody Details](https://apify.com/jippylong12/nys-doccs-in-custody-details).

Recommended workflow:

1. Run NYS DOCCS Basic Snapshot.
2. Wait for the run to finish successfully.
3. Open the successful run's default dataset.
4. Run NYS DOCCS In-Custody Details the same day and provide that dataset as `sourceDatasetId`.
5. If using Apify Actor-to-Actor integration, configure Basic Snapshot to start the Details Actor on successful completion and pass:
   - `sourceDatasetId = {{resource.defaultDatasetId}}`
   - `sourceRunId = {{resource.id}}`

The same-day recommendation matters because the Details Actor enriches the in-custody rows found in this snapshot. Running Details against an older snapshot can produce stale custody detail data.
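
When chaining the two Actors from your own code rather than through an integration, the hand-off might look like the sketch below. It assumes a run object shaped like the Apify API run response (`id`, `status`, `defaultDatasetId`) and interprets "same day" as the same UTC calendar day, which is an assumption; the helper names are hypothetical.

```python
from datetime import datetime, timezone


def build_details_input(run: dict) -> dict:
    """Map a finished Basic Snapshot run object to the Details Actor input fields."""
    if run.get("status") != "SUCCEEDED":
        raise ValueError(
            f"run {run.get('id')} has status {run.get('status')!r}, expected 'SUCCEEDED'"
        )
    return {
        "sourceDatasetId": run["defaultDatasetId"],
        "sourceRunId": run["id"],
    }


def is_same_utc_day(finished_at: datetime, now: datetime) -> bool:
    """Compare two timezone-aware datetimes by UTC calendar day (assumed definition)."""
    if finished_at.tzinfo is None or now.tzinfo is None:
        raise ValueError("both datetimes must be timezone-aware")
    return finished_at.astimezone(timezone.utc).date() == now.astimezone(timezone.utc).date()


# Placeholder run object; real values come from the finished Basic Snapshot run.
example_run = {"id": "run123", "status": "SUCCEEDED", "defaultDatasetId": "ds456"}
print(build_details_input(example_run))
```

The returned dictionary can then be passed as the input of the Details Actor run.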

### Input

- `prefixes`: optional explicit last-name prefixes to scrape.
- `prefixDepth`: when `prefixes` is empty, auto-generates `A-Z` for `1` or `AA-ZZ` for `2`.
- `requestDelayMs`: delay between requests per worker.
- `maxPagesPerPrefix`: page cap per prefix; use `0` for no cap.
- `workerCount`: number of parallel prefix workers.
- `proxyMode`: `none`, `datacenter`, or `residential`.
- `proxyCountryCode`: optional 2-letter country code for residential proxy only.
- `resumeMode`: `auto` resumes unfinished logical runs; `forceNew` starts fresh.
- `logicalRunId`: optional stable logical run identifier for manual recovery or testing.
- `retainedCompletedRuns`: number of completed named final datasets to keep.
- `sampleRowLimit`: number of sample rows copied into the final `OUTPUT` summary.
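
To make the interaction between `prefixes` and `prefixDepth` concrete, here is a small sketch of how the prefix list could be resolved. It is an assumption about the behavior described above, not the Actor's actual code, and `resolve_prefixes` is a hypothetical name.

```python
import itertools
import string
from typing import List, Optional, Sequence


def resolve_prefixes(
    prefixes: Optional[Sequence[str]] = None,
    prefix_depth: int = 2,
) -> List[str]:
    """Use explicit prefixes when given; otherwise generate every uppercase
    letter combination of length prefix_depth (A-Z for 1, AA-ZZ for 2)."""
    if prefixes:
        return list(prefixes)
    if prefix_depth < 1:
        raise ValueError("prefix_depth must be at least 1")
    return [
        "".join(letters)
        for letters in itertools.product(string.ascii_uppercase, repeat=prefix_depth)
    ]


print(len(resolve_prefixes(prefix_depth=1)))  # 26 single-letter prefixes
print(len(resolve_prefixes(prefix_depth=2)))  # 676 two-letter prefixes
```

Depth `2` issues 26 times as many searches as depth `1`, but each search covers a much smaller slice of names.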

### Reliability And Resume Behavior

- The Actor checkpoints prefix and pagination progress in a named key-value store.
- Failed or interrupted runs resume automatically by default.
- Graceful aborts publish partial output from already staged rows.
- Successful runs rebuild the clean final default dataset from staging.
- The newest retained named final datasets are kept for recovery/history.
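
The checkpoint format is not documented here, so the following is only a conceptual sketch of prefix-level resume. The dictionary layout (`page`, `cursor`, `done`) and both function names are hypothetical and chosen for illustration.

```python
from typing import Dict, List


def record_progress(
    checkpoint: Dict[str, dict],
    prefix: str,
    page_number: int,
    last_din: str,
    done: bool = False,
) -> None:
    """Store the latest page and DIN cursor reached for a prefix (hypothetical layout)."""
    checkpoint[prefix] = {"page": page_number, "cursor": last_din, "done": done}


def pending_prefixes(all_prefixes: List[str], checkpoint: Dict[str, dict]) -> List[str]:
    """Return prefixes that are not marked done, preserving their original order."""
    return [p for p in all_prefixes if not checkpoint.get(p, {}).get("done", False)]


state: Dict[str, dict] = {}
record_progress(state, "AA", page_number=3, last_din="23R1580", done=True)
record_progress(state, "AB", page_number=1, last_din="19A0001")
print(pending_prefixes(["AA", "AB", "AC"], state))
```

On resume, a worker would skip `AA`, continue `AB` from its stored cursor, and start `AC` from the first page.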

### Notes

- Proxy use depends on the selected proxy mode.
- The public DOCCS lookup does not expose a reliable total-result count, so `prefixDepth: 2` is the safer full-snapshot mode.

# Actor input Schema

## `prefixes` (type: `array`):

Optional explicit prefixes to send as lastName in the SearchByName JSON request. Leave empty to auto-generate AA-ZZ from prefixDepth by default.

## `prefixDepth` (type: `integer`):

Used only when prefixes is empty. Set 1 for A-Z. Set 2 for AA-ZZ. Because the API does not expose a total-result count, 2 is the safer default for broader recurring scrapes.

## `requestDelayMs` (type: `integer`):

Conservative pacing between the homepage session GET and the JSON POST requests. This delay is applied independently inside each worker session.

## `maxPagesPerPrefix` (type: `integer`):

Maximum pages to fetch for each prefix. Use 0 to paginate until the endpoint returns no persons or repeats the last DIN cursor.

## `workerCount` (type: `integer`):

Number of parallel prefix workers to run inside one Actor. Each worker uses its own cookie-backed session and pacing window.

## `proxyMode` (type: `string`):

Use Apify datacenter proxy first for the production basic scrape. Switch to residential only if the target starts throttling or blocking too aggressively.

## `proxyCountryCode` (type: `string`):

Optional 2-letter country code used only when proxyMode is residential.

## `resumeMode` (type: `string`):

Auto resumes the current unfinished logical scrape run from named storage. Force new abandons the previous unfinished logical run and starts a fresh one.

## `logicalRunId` (type: `string`):

Optional stable logical run identifier for manual recovery or testing. Leave empty for an automatic timestamp-based run ID.

## `retainedCompletedRuns` (type: `integer`):

How many completed named final datasets to keep after cleanup. Older retained copies are deleted automatically.

## `sampleRowLimit` (type: `integer`):

How many sample dataset rows to include in the final OUTPUT summary record.

## Actor input object example

```json
{
  "prefixDepth": 2,
  "requestDelayMs": 3000,
  "maxPagesPerPrefix": 0,
  "workerCount": 4,
  "proxyMode": "datacenter",
  "resumeMode": "auto",
  "retainedCompletedRuns": 2,
  "sampleRowLimit": 10
}
```
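
Before starting a multi-hour run, it can help to check an input object locally. The sketch below mirrors the ranges and enums listed in the OpenAPI `inputSchema` at the end of this document; it is a convenience check written for illustration, not the platform's own validation, and `validate_input` is a hypothetical name.

```python
import re
from typing import Any, Dict, List

# Integer bounds copied from the OpenAPI inputSchema (None means no upper bound).
INT_RANGES = {
    "prefixDepth": (1, 3),
    "requestDelayMs": (0, None),
    "maxPagesPerPrefix": (0, None),
    "workerCount": (1, 12),
    "retainedCompletedRuns": (1, 20),
    "sampleRowLimit": (1, 50),
}
ENUMS = {
    "proxyMode": {"none", "datacenter", "residential"},
    "resumeMode": {"auto", "forceNew"},
}


def validate_input(actor_input: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors: List[str] = []
    for key, (low, high) in INT_RANGES.items():
        if key not in actor_input:
            continue
        value = actor_input[key]
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{key} must be an integer")
        elif value < low or (high is not None and value > high):
            errors.append(f"{key}={value} is outside the allowed range")
    for key, allowed in ENUMS.items():
        value = actor_input.get(key)
        if key in actor_input and (not isinstance(value, str) or value not in allowed):
            errors.append(f"{key} must be one of {sorted(allowed)}")
    code = actor_input.get("proxyCountryCode")
    if code is not None and not re.fullmatch(r"[A-Za-z]{2}", str(code)):
        errors.append("proxyCountryCode must be exactly two letters")
    prefixes = actor_input.get("prefixes") or []
    if any(not isinstance(p, str) or not p for p in prefixes):
        errors.append("prefixes must be non-empty strings")
    elif len(set(prefixes)) != len(prefixes):
        errors.append("prefixes must be unique")
    return errors


print(validate_input({"prefixDepth": 2, "workerCount": 4, "proxyMode": "datacenter"}))
```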

# Actor output Schema

## `results` (type: `string`):

Clean final default dataset of unique NYS DOCCS lookup records with DIN, name, status, facility, date of birth, age, race, search prefix, page, scrape timestamp, and source endpoint.

## `summary` (type: `string`):

JSON run summary stored as OUTPUT in the default key-value store, including logical run ID, completion status, prefix settings, counts, retained dataset name, transport usage, and sample rows.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {};

// Run the Actor and wait for it to finish
const run = await client.actor("jippylong12/nys-doccs-basic-snapshot").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {}

# Run the Actor and wait for it to finish
run = client.actor("jippylong12/nys-doccs-basic-snapshot").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{}' |
apify call jippylong12/nys-doccs-basic-snapshot --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=jippylong12/nys-doccs-basic-snapshot",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "NYS DOCCS Basic Snapshot",
        "description": "Scrapes the NYS DOCCS public incarcerated person lookup by last-name prefixes and outputs a resumable, structured basic snapshot with DIN, name, status, facility, age, race, DOB, source, page, and scrape provenance fields.",
        "version": "0.3",
        "x-build-id": "C3NfCWejratLvLOc1"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/jippylong12~nys-doccs-basic-snapshot/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-jippylong12-nys-doccs-basic-snapshot",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/jippylong12~nys-doccs-basic-snapshot/runs": {
            "post": {
                "operationId": "runs-sync-jippylong12-nys-doccs-basic-snapshot",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/jippylong12~nys-doccs-basic-snapshot/run-sync": {
            "post": {
                "operationId": "run-sync-jippylong12-nys-doccs-basic-snapshot",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "prefixes": {
                        "title": "Last-name prefixes",
                        "uniqueItems": true,
                        "type": "array",
                        "description": "Optional explicit prefixes to send as lastName in the SearchByName JSON request. Leave empty to auto-generate AA-ZZ from prefixDepth by default.",
                        "items": {
                            "type": "string",
                            "minLength": 1
                        }
                    },
                    "prefixDepth": {
                        "title": "Auto-generated prefix depth",
                        "minimum": 1,
                        "maximum": 3,
                        "type": "integer",
                        "description": "Used only when prefixes is empty. Set 1 for A-Z. Set 2 for AA-ZZ. Because the API does not expose a total-result count, 2 is the safer default for broader recurring scrapes.",
                        "default": 2
                    },
                    "requestDelayMs": {
                        "title": "Delay between requests per worker (ms)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Conservative pacing between the homepage session GET and the JSON POST requests. This delay is applied independently inside each worker session.",
                        "default": 3000
                    },
                    "maxPagesPerPrefix": {
                        "title": "Max pages per prefix",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum pages to fetch for each prefix. Use 0 to paginate until the endpoint returns no persons or repeats the last DIN cursor.",
                        "default": 0
                    },
                    "workerCount": {
                        "title": "Worker count",
                        "minimum": 1,
                        "maximum": 12,
                        "type": "integer",
                        "description": "Number of parallel prefix workers to run inside one Actor. Each worker uses its own cookie-backed session and pacing window.",
                        "default": 4
                    },
                    "proxyMode": {
                        "title": "Proxy mode",
                        "enum": [
                            "none",
                            "datacenter",
                            "residential"
                        ],
                        "type": "string",
                        "description": "Use Apify datacenter proxy first for the production basic scrape. Switch to residential only if the target starts throttling or blocking too aggressively.",
                        "default": "datacenter"
                    },
                    "proxyCountryCode": {
                        "title": "Residential proxy country code",
                        "pattern": "^[A-Za-z]{2}$",
                        "minLength": 2,
                        "maxLength": 2,
                        "type": "string",
                        "description": "Optional 2-letter country code used only when proxyMode is residential."
                    },
                    "resumeMode": {
                        "title": "Resume mode",
                        "enum": [
                            "auto",
                            "forceNew"
                        ],
                        "type": "string",
                        "description": "Auto resumes the current unfinished logical scrape run from named storage. Force new abandons the previous unfinished logical run and starts a fresh one.",
                        "default": "auto"
                    },
                    "logicalRunId": {
                        "title": "Logical run ID",
                        "minLength": 1,
                        "maxLength": 100,
                        "type": "string",
                        "description": "Optional stable logical run identifier for manual recovery or testing. Leave empty for an automatic timestamp-based run ID."
                    },
                    "retainedCompletedRuns": {
                        "title": "Retained completed runs",
                        "minimum": 1,
                        "maximum": 20,
                        "type": "integer",
                        "description": "How many completed named final datasets to keep after cleanup. Older retained copies are deleted automatically.",
                        "default": 2
                    },
                    "sampleRowLimit": {
                        "title": "Sample rows in OUTPUT",
                        "minimum": 1,
                        "maximum": 50,
                        "type": "integer",
                        "description": "How many sample dataset rows to include in the final OUTPUT summary record.",
                        "default": 10
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
