PII Masking

Pricing

Pay per event

Identify, mark, and replace PII.

Developer

Canadesk Support

Maintained by Community

14 days ago

Last modified

PII Redaction (PPE)

This Apify Actor redacts PII/PHI/PCI from text using a remote redaction API.

What it does

  • Takes either:
    • INPUT.text (a string or array of strings), or
    • INPUT.datasetId (reads items from an existing dataset and redacts selected fields)
  • Sends text to a redaction API for entity detection + redaction (marker or mask)
  • Makes one API call per input text
  • Writes multiple output records:
    • one record per sentence (type: "sentence")
    • one record per entity (type: "entity")
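
The flow above can be sketched roughly as follows. This is an illustration, not the Actor's code: `redact` is a hypothetical stand-in for the remote API call, the response keys (`redacted_text`, `entities`) are assumed, and the sentence splitter is deliberately naive.

```python
import re

def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def build_records(texts, redact):
    records = []
    for text in texts:
        result = redact(text)  # one API call per input text
        sentences = split_sentences(result["redacted_text"])
        for i, sentence in enumerate(sentences):
            records.append({
                "type": "sentence",
                "sentenceIndex": i,
                "sentencesTotal": len(sentences),
                "text": sentence,
            })
        entities = result["entities"]
        for i, entity in enumerate(entities):
            records.append({
                "type": "entity",
                "entityIndex": i,
                "entitiesTotal": len(entities),
                "entity": entity,
            })
    return records

def fake_redact(text):
    # Hypothetical canned response standing in for the remote redaction API.
    return {
        "redacted_text": "Call [NAME_1] today. Use [PHONE_NUMBER_1].",
        "entities": [{"type": "NAME"}, {"type": "PHONE_NUMBER"}],
    }

records = build_records(["Call Ann today. Use 555-0100."], fake_redact)
```

With this canned response, a single input text yields two sentence records followed by two entity records.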

Pricing (PPE)

The Actor uses Pay-Per-Event.

  • Event: REDACTION_REQUEST (configurable via chargeEventName)
  • Charged per output entry: One PPE event per dataset record pushed.
    • each type: "sentence" record charges once
    • each type: "entity" record charges once

This matches Apify's PPE design, where events correspond to output units.
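
To estimate how many events a run would charge, counting the pushed records by type is enough. The helper below is a hypothetical sketch; it only assumes that every record carries a `type` field as described above.

```python
from collections import Counter

def count_charge_events(records):
    # One PPE event per pushed record, whatever its type.
    by_type = Counter(record["type"] for record in records)
    return {
        "sentence": by_type["sentence"],
        "entity": by_type["entity"],
        "total": sum(by_type.values()),
    }

sample = [{"type": "sentence"}] * 3 + [{"type": "entity"}] * 2
events = count_charge_events(sample)
```

For a text that produces three sentences and two entities, the total is five events.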

Input

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey | String | (optional) | API key (sent as x-api-key). Use an Apify Secret in production. |
| text | String or String[] | | Text(s) to redact. |
| datasetId | String | | Dataset to read items from. |
| fieldPaths | String[] | ["text"] | Field paths (dot/bracket notation) to redact inside each dataset item, e.g. "text", "customer.message", "items[0].note". |
| split.enabled | Boolean | true | Whether to split the redacted output into sentences for dataset records. |
| hideit.endpoint | String | (default set in code) | Override the redaction endpoint URL. |
| hideit.entityTypes | String[] | (sane default list) | Entity types to enable. |
| hideit.outputMode | String | MARKER | Redaction output mode: MARKER (replace with typed markers like [NAME_1]) or MASK (replace with a fixed token). |
| hideit.maskToken | String | [REDACTED] | Only used when hideit.outputMode="MASK". |
| hideit.markerPattern | String | [UNIQUE_NUMBERED_ENTITY_TYPE] | Marker pattern for redaction output. |
| hideit.markerLanguage | String | en | Marker language. |
| hideit.coreferenceResolution | String | heuristics | Coreference setting supported by the endpoint. |
| proxy | Object | | Optional Apify proxy configuration. |
| chargeEventName | String | REDACTION_REQUEST | PPE event name to charge. |
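
As an illustration of how `fieldPaths` entries in dot/bracket notation might be resolved against a dataset item, here is a minimal sketch. The `get_path` helper and the dataset ID are hypothetical, and the Actor's real resolver may treat edge cases (missing keys, negative indexes) differently.

```python
import re

# Matches either a plain key ("customer") or a bracketed index ("[0]").
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")

def get_path(item, path):
    """Return the value at `path` inside `item`, or None if it is missing."""
    current = item
    for key, index in _TOKEN.findall(path):
        if key:
            if not isinstance(current, dict) or key not in current:
                return None
            current = current[key]
        else:
            i = int(index)
            if not isinstance(current, list) or i >= len(current):
                return None
            current = current[i]
    return current

# Example input using the paths from the table; the dataset ID is a placeholder.
run_input = {
    "datasetId": "YOUR_DATASET_ID",
    "fieldPaths": ["text", "customer.message", "items[0].note"],
}

item = {
    "text": "a",
    "customer": {"message": "b"},
    "items": [{"note": "c"}],
}
values = [get_path(item, path) for path in run_input["fieldPaths"]]
```

Paths that do not exist in an item resolve to `None` in this sketch, so a caller can skip them instead of failing.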

Output modes

  • MARKER: returns structured placeholders like [NAME_1], [PHONE_NUMBER_1].
  • MASK: returns a uniform token like [REDACTED].
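
The difference between the two modes can be shown with a local sketch. The remote API performs the actual detection and replacement; here the entity spans are supplied by hand, and the per-type numbering (each occurrence gets the next number) is an assumption that ignores coreference resolution.

```python
from collections import defaultdict

def apply_redaction(text, spans, mode="MARKER", mask_token="[REDACTED]"):
    """spans: list of (start, end, entity_type) tuples, end exclusive."""
    counters = defaultdict(int)
    replacements = []
    # Assign labels left to right so numbering follows reading order.
    for start, end, entity_type in sorted(spans):
        if mode == "MASK":
            label = mask_token
        else:
            counters[entity_type] += 1
            label = f"[{entity_type}_{counters[entity_type]}]"
        replacements.append((start, end, label))
    # Replace right to left so earlier offsets stay valid.
    for start, end, label in reversed(replacements):
        text = text[:start] + label + text[end:]
    return text

source = "Ann called 555-0100."
spans = [(0, 3, "NAME"), (11, 19, "PHONE_NUMBER")]
marked = apply_redaction(source, spans, mode="MARKER")
masked = apply_redaction(source, spans, mode="MASK")
```

MARKER output keeps the entity type visible, which helps downstream analysis; MASK output hides both the value and its type.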

Output dataset schema

Two kinds of output records are written:

Sentence records (type: "sentence")

  • type: "sentence"
  • source: "input.text" or "dataset"
  • sourceMeta: metadata like index, or { datasetId, itemIndex, fieldPath }
  • sentenceIndex, sentencesTotal
  • text (one redacted sentence)
  • hideit request metadata (endpoint + mode)
  • createdAt

Entity records (type: "entity")

  • type: "entity"
  • source, sourceMeta
  • entityIndex, entitiesTotal
  • entity (as returned by the endpoint)
  • createdAt
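
On the consuming side, sentence records can be grouped by their origin and joined back into one redacted text per input. The sketch below assumes `sourceMeta` is a JSON-serializable object such as `{"index": 0}`; the `reassemble` helper is hypothetical.

```python
import json
from collections import defaultdict

def reassemble(records):
    groups = defaultdict(list)
    for record in records:
        if record["type"] != "sentence":
            continue  # entity records are not part of the text
        key = (record["source"], json.dumps(record["sourceMeta"], sort_keys=True))
        groups[key].append(record)
    return {
        key: " ".join(
            r["text"] for r in sorted(items, key=lambda r: r["sentenceIndex"])
        )
        for key, items in groups.items()
    }

dataset_items = [
    {"type": "sentence", "source": "input.text", "sourceMeta": {"index": 0},
     "sentenceIndex": 1, "text": "Use [PHONE_NUMBER_1]."},
    {"type": "entity", "source": "input.text", "sourceMeta": {"index": 0},
     "entityIndex": 0, "entity": {"type": "NAME"}},
    {"type": "sentence", "source": "input.text", "sourceMeta": {"index": 0},
     "sentenceIndex": 0, "text": "Call [NAME_1] today."},
]
texts = reassemble(dataset_items)
```

Sorting by `sentenceIndex` makes the result independent of the order in which records were read from the dataset.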

Notes

  • Keep your API keys out of git. Prefer Apify Secrets.
  • For very large datasets, you may want to implement pagination and concurrency. The current implementation reads only the first page of the dataset.
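
One way to add pagination is an offset/limit loop like the sketch below. `fetch_page` is a hypothetical callable that returns a list of items for a given offset and limit (for example, a thin wrapper around a dataset client); concurrency is not covered here.

```python
def iter_all_items(fetch_page, page_size=1000):
    """Yield every item by requesting consecutive pages until one comes back short."""
    offset = 0
    while True:
        items = fetch_page(offset, page_size)
        if not items:
            break
        yield from items
        if len(items) < page_size:
            break  # last page reached
        offset += len(items)

# In-memory stand-in for a dataset with 25 items.
fake_dataset = [{"text": f"item {i}"} for i in range(25)]

def fake_fetch_page(offset, limit):
    return fake_dataset[offset:offset + limit]

all_items = list(iter_all_items(fake_fetch_page, page_size=10))
```

The generator keeps only one page in memory at a time, which matters when each item is later sent to the redaction API.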

Length limits

  • limits.maxInputChars (default: 10000): if an input text exceeds this length, the Actor logs that the maximum was reached, truncates the text, and continues processing.
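
The length limit behaves roughly like the following sketch. It assumes truncation keeps the first `max_input_chars` characters; the function name, logger name, and log message are illustrative, not taken from the Actor.

```python
import logging

logger = logging.getLogger("pii-redaction-sketch")

def enforce_max_chars(text, max_input_chars=10000):
    if len(text) <= max_input_chars:
        return text
    # Log and continue instead of failing the whole run.
    logger.warning(
        "max reached: input has %d chars, truncating to %d",
        len(text),
        max_input_chars,
    )
    return text[:max_input_chars]

short = enforce_max_chars("hello")
clipped = enforce_max_chars("x" * 12000)
```

Note that anything past the limit is never sent for redaction, so it will not appear in the output records.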