NYS DOCCS Basic Snapshot

Under maintenance

Pricing

from $0.20 / 1,000 results

Scrapes the NYS DOCCS public incarcerated person lookup by last-name prefixes and outputs a resumable, structured basic snapshot with DIN, name, status, facility, age, race, DOB, source, page, and scrape provenance fields.

Rating

0.0

(0)

Developer

Marcus Salinas

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

19 days ago

Last modified

NYS DOCCS Basic Snapshot collects a broad current snapshot from the public New York State Department of Corrections and Community Supervision incarcerated person lookup. It is the first step in the two-Actor NYS DOCCS workflow.

Use this Actor when you need the broad searchable corpus with DIN, name, status, facility, DOB, age, race, source page, and scrape provenance. If you also need sentence, parole, crime, county, admission, and other per-DIN custody details, run the companion enrichment Actor after this one succeeds: NYS DOCCS In-Custody Details.

Output

The default dataset is an array of person records. A full run typically returns about 75,000 to 85,000 records. Counts change as DOCCS updates the public lookup.

[
  {
    "din": "23R1580",
    "name": "AALIL, MICHAEL",
    "dateOfBirth": "08/09/1996",
    "age": "29 years old",
    "race": "OTHER",
    "status": "RELEASED",
    "facility": "QUEENSBORO",
    "searchPrefix": "AA",
    "pageNumber": 1,
    "clickNextDinUsed": "",
    "scrapedAt": "2026-04-29T00:11:13.287Z",
    "sourceEndpoint": "https://nysdoccslookup.doccs.ny.gov/IncarceratedPerson/SearchByName"
  }
]
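Several fields arrive as display strings rather than typed values. The following is a minimal Python sketch for normalizing them downstream. It assumes dateOfBirth is formatted month/day/year and that age always starts with a number, as in the sample above; the helper names parse_age and parse_dob are illustrative and not part of the Actor.

```python
import re
from datetime import date, datetime
from typing import Optional


def parse_age(age_text: Optional[str]) -> Optional[int]:
    """Extract the leading number from strings such as '29 years old'."""
    match = re.match(r"\s*(\d+)", age_text or "")
    return int(match.group(1)) if match else None


def parse_dob(dob_text: Optional[str]) -> Optional[date]:
    """Parse an MM/DD/YYYY string (assumed format); return None if it does not parse."""
    try:
        return datetime.strptime(dob_text, "%m/%d/%Y").date()
    except (TypeError, ValueError):
        return None


record = {"din": "23R1580", "dateOfBirth": "08/09/1996", "age": "29 years old"}
normalized = {
    **record,
    "ageYears": parse_age(record.get("age")),
    "dateOfBirthIso": (
        parse_dob(record.get("dateOfBirth")).isoformat()
        if parse_dob(record.get("dateOfBirth"))
        else None
    ),
}
```

Both helpers return None for missing or unparseable input, so they can be applied to rows where the source value was omitted.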

Dataset Schema

{
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "din": { "type": "string" },
      "nysid": { "type": "string" },
      "name": { "type": "string" },
      "dateOfBirth": { "type": "string" },
      "age": { "type": "string" },
      "race": { "type": "string" },
      "releaseDate": { "type": "string" },
      "status": { "type": "string" },
      "facility": { "type": "string" },
      "searchPrefix": { "type": "string" },
      "pageNumber": { "type": "integer" },
      "clickNextDinUsed": { "type": "string" },
      "scrapedAt": { "type": "string", "format": "date-time" },
      "sourceEndpoint": { "type": "string" }
    }
  }
}
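If you want a quick sanity check on exported rows without pulling in a JSON Schema library, a standard-library type check along these lines may be enough. This is a sketch only: it mirrors the property types listed in the schema above, ignores unknown fields, and treats missing or null values as acceptable. The function name check_record is illustrative.

```python
# Expected Python types, mirroring the "properties" in the dataset schema.
EXPECTED_TYPES = {
    "din": str, "nysid": str, "name": str, "dateOfBirth": str,
    "age": str, "race": str, "releaseDate": str, "status": str,
    "facility": str, "searchPrefix": str, "pageNumber": int,
    "clickNextDinUsed": str, "scrapedAt": str, "sourceEndpoint": str,
}


def check_record(record: dict) -> list:
    """Return a list of type problems; an empty list means the row looks fine."""
    problems = []
    for field, value in record.items():
        expected = EXPECTED_TYPES.get(field)
        if expected is None or value is None:
            continue  # unknown fields and nulls are not treated as errors here
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems
```

For stricter validation, including the date-time format on scrapedAt, a full JSON Schema validator would be the better tool.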

Fields with no source value may be omitted or returned empty depending on export format. The sample above removes null fields so the shape is easier to read.

The Actor also writes an OUTPUT key-value store record with run status, logical run ID, runtime settings, counts, transport usage, and sample rows.

How It Works

This Actor automates the public NYS DOCCS lookup workflow and turns the results into a structured Apify dataset:

  1. Breaks the name-search space into many small prefix-based searches.
  2. Runs those searches with conservative pacing and retry handling.
  3. Collects each visible person record returned by the public lookup.
  4. Tracks progress so interrupted runs can resume instead of starting over.
  5. Validates and stages records while the run is active.
  6. On completion, publishes a clean default dataset for export and downstream use.

The internal staging data is not the product output. Use the completed run's default dataset as the final export and as the source dataset for the Details Actor.
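Step 2 above mentions conservative pacing and retry handling. The Actor's actual retry policy is not documented here, so the following Python sketch only illustrates the general pattern, using exponential backoff as an assumed strategy. The name with_retries and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds or max_attempts is reached.

    Waits base_delay_s, then 2x, then 4x, and so on between attempts.
    The sleep function is injectable so the pacing can be tested without waiting.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as error:  # a real scraper would catch narrower errors
            last_error = error
            if attempt < max_attempts:
                sleep(base_delay_s * 2 ** (attempt - 1))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

A fixed per-request delay, like the requestDelayMs input, would sit alongside this as a separate pause between successful requests.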

Expected Runtime

A full Basic Snapshot run usually takes about 2 to 3 hours with the default production settings. Runtime can change based on DOCCS response time, retry volume, network conditions, and Apify platform conditions.

Using It With In-Custody Details

The Basic Snapshot can run by itself, but it is also the required upstream source for NYS DOCCS In-Custody Details.

Recommended workflow:

  1. Run NYS DOCCS Basic Snapshot.
  2. Wait for the run to finish successfully.
  3. Open the successful run's default dataset.
  4. Run NYS DOCCS In-Custody Details the same day and provide that dataset as sourceDatasetId.
  5. If using Apify Actor-to-Actor integration, configure Basic Snapshot to start the Details Actor on successful completion and pass:
    • sourceDatasetId = {{resource.defaultDatasetId}}
    • sourceRunId = {{resource.id}}

The same-day recommendation matters because the Details Actor enriches the in-custody rows found in this snapshot. Running Details against an older snapshot can produce stale custody detail data.
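If you chain the two Actors from your own script instead of an integration, the hand-off reduces to mapping two fields of the finished run onto the Details Actor input. The sketch below assumes the run is available as a dictionary with the standard Apify run fields id, status, and defaultDatasetId. The function name build_details_input is illustrative, and the Actor IDs in the comments are placeholders.

```python
def build_details_input(run: dict) -> dict:
    """Map a finished Basic Snapshot run onto the Details Actor input fields."""
    if run.get("status") != "SUCCEEDED":
        # Details should only consume the clean dataset of a successful run.
        raise ValueError(f"run {run.get('id')} did not succeed: {run.get('status')}")
    return {
        "sourceDatasetId": run["defaultDatasetId"],
        "sourceRunId": run["id"],
    }


# Possible usage with the apify-client package (not executed here; the
# Actor IDs are placeholders you would replace with the real ones):
#
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_API_TOKEN>")
#   snapshot_run = client.actor("<basic-snapshot-actor-id>").call(run_input={})
#   details_input = build_details_input(snapshot_run)
#   client.actor("<details-actor-id>").call(run_input=details_input)
```

Because a full snapshot can take hours, starting the Details run from the same script right after the snapshot finishes also helps keep both runs on the same day.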

Input

  • prefixes: optional explicit last-name prefixes to scrape.
  • prefixDepth: used when prefixes is empty; 1 auto-generates the prefixes A-Z and 2 auto-generates AA-ZZ.
  • requestDelayMs: delay between requests per worker.
  • maxPagesPerPrefix: page cap per prefix; use 0 for no cap.
  • workerCount: number of parallel prefix workers.
  • proxyMode: none, datacenter, or residential.
  • proxyCountryCode: optional 2-letter country code for residential proxy only.
  • resumeMode: "auto" resumes unfinished logical runs; "forceNew" starts fresh.
  • logicalRunId: optional stable logical run identifier for manual recovery or testing.
  • retainedCompletedRuns: number of completed named final datasets to keep.
  • sampleRowLimit: number of sample rows copied into the final OUTPUT summary.
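The interaction between prefixes and prefixDepth can be sketched in Python as follows. This is an illustration of the documented behavior, not the Actor's source: generate_prefixes is a hypothetical name, and the trimming and uppercasing of explicit prefixes is an assumption.

```python
from itertools import product
from string import ascii_uppercase


def generate_prefixes(prefixes: list, prefix_depth: int) -> list:
    """Return explicit prefixes if given, else A-Z (depth 1) or AA-ZZ (depth 2)."""
    explicit = [p.strip().upper() for p in prefixes or [] if p.strip()]
    if explicit:
        return explicit
    if prefix_depth not in (1, 2):
        raise ValueError("prefixDepth must be 1 or 2")
    return ["".join(letters) for letters in product(ascii_uppercase, repeat=prefix_depth)]


# Illustrative input for a small smoke test. The values are examples only,
# not the Actor's defaults.
smoke_test_input = {
    "prefixes": ["AA"],
    "maxPagesPerPrefix": 1,
    "proxyMode": "none",
    "resumeMode": "forceNew",
}
```

Depth 1 yields 26 searches and depth 2 yields 676, which is why depth 2 takes longer but keeps each individual search small.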

Reliability And Resume Behavior

  • The Actor checkpoints prefix and pagination progress in a named key-value store.
  • Failed or interrupted runs resume automatically by default.
  • Graceful aborts publish partial output from already staged rows.
  • Successful runs rebuild the clean final default dataset from staging.
  • The newest named final datasets, up to the retainedCompletedRuns limit, are kept for recovery and history.
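The checkpoint-and-resume idea above can be sketched in Python with a plain dictionary standing in for the named key-value store. The key names completedPrefixes and stagedRows, and the function run_prefixes, are hypothetical. The real Actor also checkpoints pagination within a prefix, which this sketch omits.

```python
from typing import Callable


def run_prefixes(prefixes: list, scrape_prefix: Callable[[str], list], checkpoint: dict) -> list:
    """Scrape each prefix once, recording progress so a later call can resume."""
    done = set(checkpoint.get("completedPrefixes", []))
    for prefix in prefixes:
        if prefix in done:
            continue  # finished in an earlier, interrupted run
        rows = scrape_prefix(prefix)  # may raise; the checkpoint stays consistent
        checkpoint.setdefault("stagedRows", []).extend(rows)
        done.add(prefix)
        checkpoint["completedPrefixes"] = sorted(done)
    return checkpoint.get("stagedRows", [])
```

Calling run_prefixes again with the same checkpoint after a failure skips the prefixes already recorded and continues with the rest, so rows staged before the interruption are not scraped twice.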

Notes

  • Proxy use depends on the selected proxyMode (none, datacenter, or residential).
  • The public DOCCS lookup does not expose a reliable total-result count, so prefixDepth: 2 is the safer full-snapshot mode.