PII Masking
Developer: Canadesk Support (maintained by Community)
PII Redaction (PPE)
This Apify Actor redacts PII/PHI/PCI from text using a remote redaction API.
What it does
- Takes either:
  - INPUT.text (a string or array of strings), or
  - INPUT.datasetId (reads items from an existing dataset and redacts selected fields)
- Sends text to a redaction API for entity detection + redaction (marker or mask)
- Makes one API call per input text
- Writes multiple output records:
  - one record per sentence (type: "sentence")
  - one record per entity (type: "entity")
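The two accepted shapes of INPUT.text can be normalized into a single list before any API call is made. The following is a minimal sketch, not the Actor's actual code; the helper name collect_texts is hypothetical.

```python
from typing import Any


def collect_texts(actor_input: dict[str, Any]) -> list[str]:
    """Return the texts to redact from the `text` input field.

    Accepts a single string or a list of strings (hypothetical helper).
    """
    raw = actor_input.get("text")
    if raw is None:
        return []
    if isinstance(raw, str):
        return [raw]
    if isinstance(raw, list) and all(isinstance(t, str) for t in raw):
        return list(raw)
    raise ValueError("text must be a string or an array of strings")


texts = collect_texts({"text": ["Call Ana.", "Email Bob."]})
# With one API call per input text, this input would lead to two calls.
```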
Pricing (PPE)
The Actor uses Pay-Per-Event.
- Event: REDACTION_REQUEST (configurable via chargeEventName)
- Charged per output entry: one PPE event per dataset record pushed.
  - each type: "sentence" record charges once
  - each type: "entity" record charges once
This matches Apify's PPC/PPE design where events correspond to output units.
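To estimate the cost of a run, count the records that would be pushed. This sketch only illustrates the arithmetic described above; the function name and the sample records are made up for the example.

```python
def count_charge_events(records: list[dict]) -> int:
    """Count one PPE event per pushed record of type sentence or entity."""
    return sum(1 for r in records if r.get("type") in ("sentence", "entity"))


# Illustrative records: one input text that produced 3 sentences and 2 entities.
sample_records = (
    [{"type": "sentence"} for _ in range(3)]
    + [{"type": "entity"} for _ in range(2)]
)
events = count_charge_events(sample_records)  # 3 + 2 = 5 events
```

A single input text can therefore trigger several events, even though it causes only one API call.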
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| apiKey | String | (optional) | API key (sent as x-api-key). Use an Apify Secret in production. |
| text | String or String[] | | Text(s) to redact. |
| datasetId | String | | Dataset to read items from. |
| fieldPaths | String[] | ["text"] | Field paths (dot/bracket notation) to redact inside each dataset item, e.g. "text", "customer.message", "items[0].note". |
| split.enabled | Boolean | true | Whether to split the redacted output into sentences for dataset records. |
| hideit.endpoint | String | (default set in code) | Override the redaction endpoint URL. |
| hideit.entityTypes | String[] | (sane default list) | Entity types to enable. |
| hideit.outputMode | String | MARKER | Redaction output mode: MARKER (replace with typed markers like [NAME_1]) or MASK (replace with a fixed token). |
| hideit.maskToken | String | [REDACTED] | Only used when hideit.outputMode="MASK". |
| hideit.markerPattern | String | [UNIQUE_NUMBERED_ENTITY_TYPE] | Marker pattern for redaction output. |
| hideit.markerLanguage | String | en | Marker language. |
| hideit.coreferenceResolution | String | heuristics | Coreference setting supported by the endpoint. |
| proxy | Object | | Optional Apify proxy configuration. |
| chargeEventName | String | REDACTION_REQUEST | PPE event name to charge. |
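The dot/bracket notation used by fieldPaths can be illustrated with a small resolver. This is a sketch of how such paths are commonly interpreted, not the Actor's own parser; get_by_path is a hypothetical helper, and it only supports plain keys and numeric indexes.

```python
import re
from typing import Any

# Matches either a plain key (no dots or brackets) or a numeric index like [0].
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def get_by_path(item: Any, path: str) -> Any:
    """Resolve a path such as "items[0].note"; return None if any step is missing."""
    current = item
    for key, index in _TOKEN.findall(path):
        try:
            current = current[int(index)] if index else current[key]
        except (KeyError, IndexError, TypeError):
            return None
    return current


item = {
    "text": "Hi, I am Ana.",
    "customer": {"message": "Call me at 555-0100."},
    "items": [{"note": "Lives in Ottawa."}],
}
values = [get_by_path(item, p) for p in ("text", "customer.message", "items[0].note")]
```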
Output modes
- MARKER: returns structured placeholders like [NAME_1], [PHONE_NUMBER_1].
- MASK: returns a uniform token like [REDACTED].
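The difference between the two modes can be shown with a local sketch. The real redaction happens in the remote API. Here the entity shape (type, start, end) and the numbering rule (a repeated value of the same type reuses its number) are assumptions made for illustration only.

```python
def redact(text: str, entities: list[dict], output_mode: str = "MARKER",
           mask_token: str = "[REDACTED]") -> str:
    """Replace entity spans with typed markers (MARKER) or a fixed token (MASK)."""
    numbers: dict[tuple[str, str], int] = {}  # (type, surface text) -> marker number
    per_type: dict[str, int] = {}             # how many distinct values seen per type
    pieces: list[str] = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        pieces.append(text[cursor:ent["start"]])
        if output_mode == "MASK":
            pieces.append(mask_token)
        else:
            key = (ent["type"], text[ent["start"]:ent["end"]])
            if key not in numbers:
                per_type[ent["type"]] = per_type.get(ent["type"], 0) + 1
                numbers[key] = per_type[ent["type"]]
            pieces.append(f"[{ent['type']}_{numbers[key]}]")
        cursor = ent["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)


sample = "Call Ana at 555-0100."
name_start = sample.index("Ana")
phone_start = sample.index("555-0100")
spans = [
    {"type": "NAME", "start": name_start, "end": name_start + len("Ana")},
    {"type": "PHONE_NUMBER", "start": phone_start, "end": phone_start + len("555-0100")},
]
marker_text = redact(sample, spans, "MARKER")  # "Call [NAME_1] at [PHONE_NUMBER_1]."
mask_text = redact(sample, spans, "MASK")      # "Call [REDACTED] at [REDACTED]."
```

MARKER output keeps entity types distinguishable for downstream processing, while MASK output reveals nothing about what kind of value was removed.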
Output dataset schema
Two kinds of output records are written:
Sentence records (type: "sentence")
- type: "sentence"
- source: "input.text" or "dataset"
- sourceMeta: metadata like index, or { datasetId, itemIndex, fieldPath }
- sentenceIndex, sentencesTotal
- text (one redacted sentence)
- hideit request metadata (endpoint + mode)
- createdAt
Entity records (type: "entity")
- type: "entity"
- source, sourceMeta
- entityIndex, entitiesTotal
- entity (as returned by the endpoint)
- createdAt
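As a rough illustration of the two record shapes, the sketch below builds them as plain dictionaries. The field names come from the lists above; zero-based indexes, the ISO 8601 timestamp, the shape of the hideit metadata, and the sample entity object are assumptions, and build_records is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Any


def build_records(sentences: list[str], entities: list[Any], source: str,
                  source_meta: dict, hideit_meta: dict) -> list[dict]:
    """Build one sentence record per sentence and one entity record per entity."""
    created_at = datetime.now(timezone.utc).isoformat()  # timestamp format is assumed
    records: list[dict] = []
    for i, sentence in enumerate(sentences):
        records.append({
            "type": "sentence",
            "source": source,
            "sourceMeta": source_meta,
            "sentenceIndex": i,  # zero-based indexing is assumed
            "sentencesTotal": len(sentences),
            "text": sentence,
            "hideit": hideit_meta,
            "createdAt": created_at,
        })
    for i, entity in enumerate(entities):
        records.append({
            "type": "entity",
            "source": source,
            "sourceMeta": source_meta,
            "entityIndex": i,
            "entitiesTotal": len(entities),
            "entity": entity,  # passed through as returned by the endpoint
            "createdAt": created_at,
        })
    return records


records = build_records(
    sentences=["Call [NAME_1].", "Thanks."],
    entities=[{"type": "NAME"}],  # illustrative entity object
    source="input.text",
    source_meta={"index": 0},
    hideit_meta={"endpoint": "https://example.invalid/redact", "outputMode": "MARKER"},
)
```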
Notes
- Keep your API keys out of git. Prefer Apify Secrets.
- For very large datasets, you may want to implement pagination and concurrency. The current implementation reads only the first page of the dataset.
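If you add pagination yourself, an offset/limit loop is one common approach. The sketch below is independent of any client library: fetch_page is a hypothetical callable that you would wrap around whatever dataset client you use, returning up to limit items starting at offset.

```python
from typing import Any, Callable, Iterator


def iter_all_items(fetch_page: Callable[[int, int], list[Any]],
                   page_size: int = 1000) -> Iterator[Any]:
    """Yield every item by requesting pages until a short or empty page is returned."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # last page reached
        offset += len(page)


# Stand-in for a real dataset: 25 items served 10 at a time.
fake_dataset = list(range(25))
all_items = list(iter_all_items(lambda offset, limit: fake_dataset[offset:offset + limit],
                                page_size=10))
```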
Length limits
limits.maxInputChars (default: 10000): if an input text exceeds this length, the Actor logs "max reached", truncates the text, and continues processing.
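The truncation behavior can be sketched as follows. The function name, logger name, and exact log wording are hypothetical; only the default of 10000 characters and the log, truncate, and continue sequence come from the description above.

```python
import logging

logger = logging.getLogger("pii_redaction_sketch")


def enforce_max_chars(text: str, max_input_chars: int = 10000) -> str:
    """Truncate over-long input and log a warning instead of failing."""
    if len(text) > max_input_chars:
        logger.warning("max reached: truncating input from %d to %d characters",
                       len(text), max_input_chars)
        return text[:max_input_chars]
    return text


truncated = enforce_max_chars("x" * 10001)
```

Note that any PII beyond the limit is dropped along with the rest of the text rather than redacted, so split long inputs beforehand if you need the full content processed.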