Encrypted Data Integration

Pricing

from $4.00 / 1,000 encrypted records


Encrypted Data Integration encrypts sensitive Apify data before export or automation. It supports selected fields, full records, or full payload encryption with AES-GCM, manifests, fingerprints, and dataset or key-value store output.

Developer

Sovanza

Maintained by Community

🔐 Encrypted Data Integration Tool – Secure API & Data Sync Automation

Securely integrate and transfer encrypted data across your automation stack. Built for developers, SaaS teams, and data engineers who need safe handling of sensitive fields inside Apify runs—without pasting plaintext into exports, logs, or downstream webhooks.

What it actually does: this actor reads structured records from an Apify dataset, key-value store item, pasted JSON, or inline JSON records, applies authenticated encryption, and writes encrypted output back to a dataset and/or KV store. Use it as the cryptographic step in larger sync workflows (schedules + API exports + your own integrations).


Optimized positioning

Securely protect data before it leaves your Apify storage boundary. Choose field-level encryption (most common), encrypt an entire record, or encrypt a full batch payload for handoff. Pair with Apify schedules and your own HTTP/database connectors for end-to-end pipelines.


🚀 Start secure data integration

Automate pipelines with encryption and operational control:

👉 Protect sensitive columns before export or handoff
👉 Deterministic fingerprints for matching workflows (hashes—not encryption by themselves)
👉 Structured manifests so decryptors know algorithm, KDF, and scope
👉 Runs on demand or on schedule via Apify


🧠 What this tool actually does

This actor is a secure transformation engine inside Apify—not a turnkey “reverse ETL” to every SaaS vendor out of the box. It connects to your existing Apify data and produces encrypted derivatives:

| Capability | Detail |
| --- | --- |
| Read from | dataset, kv_store, json_records, json_text |
| Encrypt | selected_fields, full_record, or full_payload |
| Algorithms | AES-256-GCM (recommended) or Fernet compatibility mode |
| Key material | Passphrase (PBKDF2 or scrypt) or raw symmetric key (base64) |
| Write to | default dataset, KV item, or both |

Use it upstream of integrations: encrypt first, then move ciphertext through your downstream systems—never commit passphrases to git, and rotate secrets responsibly.

Unlike basic “encrypt this string” snippets, outputs include manifests, ciphertext metadata, fingerprints (optional), and summary/error rows suited for audited workflows.


🔄 Supported data sources & outputs (within Apify)

| Source (sourceMode) | When to use |
| --- | --- |
| dataset | Encrypt items from another actor’s dataset (sourceDatasetId or name) |
| kv_store | Encrypt a blob already stored under a KV key |
| json_records / json_text | Quick tests or pasted uploads |

Outputs land in outputMode = dataset, kv_store, or both. External CRMs/analytics/database targets are reached through your HTTP workers, exporters, warehouses, or other actors—not inside this codebase.


🔐 Security & encryption

Security is foundational:

  • Uses authenticated encryption (AES-GCM default) via the cryptography library
  • Derives symmetric keys via PBKDF2-HMAC-SHA256 or scrypt when using passphrases
  • Never logs plaintext secrets (passphrase, raw keys); keep redactLogs enabled as the default posture
  • Can strip plaintext sensitive fields after writing *_encrypted counterparts
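
To make the passphrase-to-key step concrete, here is a minimal sketch of deriving a 32-byte key using only Python's standard library. It is not the actor's code: the function name `derive_key`, the `"scrypt"` mode label, the 16-byte salt, and the scrypt cost parameters are assumptions for illustration. The `pbkdf2_sha256` label and the 200000 iteration count follow the input examples in this README.

```python
import base64
import hashlib
import os


def derive_key(passphrase: str, salt: bytes, kdf: str = "pbkdf2_sha256",
               iterations: int = 200_000) -> bytes:
    """Derive a 32-byte key (the AES-256 key size) from a passphrase."""
    secret = passphrase.encode("utf-8")
    if kdf == "pbkdf2_sha256":
        return hashlib.pbkdf2_hmac("sha256", secret, salt, iterations, dklen=32)
    if kdf == "scrypt":
        # Illustrative cost parameters, not the actor's actual settings.
        return hashlib.scrypt(secret, salt=salt, n=2**14, r=8, p=1, dklen=32)
    raise ValueError(f"unsupported KDF: {kdf}")


# A fresh random salt per derivation; it is stored next to the ciphertext,
# because a decryptor needs the same salt to rebuild the key.
salt = os.urandom(16)
key = derive_key("example-passphrase", salt)
print(len(key), base64.b64encode(salt).decode("ascii"))
```

The same passphrase, salt, and iteration count always yield the same key, which is why the salt and KDF settings travel with the ciphertext metadata while the passphrase never does.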

Operational reality: ciphertext is safer than plaintext—but key management remains your responsibility (vaults, KMS, rotating passphrases, least-privilege access on Apify runs).

HTTPS applies to browser/API traffic toward Apify; encryption-at-rest semantics depend on Apify Storage configuration and enterprise controls.


⚡ Key features

  • Multiple input sources (dataset/KV/direct JSON text)
  • Three encryption scopes: selected fields / full record / full payload batch
  • preserveFields to keep indexing columns readable
  • removePlaintextAfterEncryption to minimize accidental leakage
  • Manifests: algorithm, cryptoVersion, KDF iterations, scopes
  • Optional SHA-256 fingerprints for deterministic matching workflows
  • Batched concurrency with chunkSize + maxConcurrency
  • Structured __summary__ and __error__ dataset rows

🎯 Real-world use cases

| Scenario | Why this fits |
| --- | --- |
| PII masking before BI export | Encrypt email, phone, nested paths before spreadsheets get shared |
| CRM / lead payloads | Field-level ciphertext + fingerprints for deterministic joins |
| Data pipeline QA | Use failOnMissingFields / invalid record toggles to harden ingestion |
| Secure archival bundles | full_payload mode for handing off encrypted batch blobs |
| Automation hygiene | Run on schedule whenever upstream scrapers finish |

🛠️ How to use (Apify Console)

  1. Choose sourceMode (dataset, kv_store, json_records, or json_text).
  2. Pick encryptionScope and declare fieldsToEncrypt for selected_fields mode.
  3. Provide passphrase or rawKeyBase64 (keyMode).
  4. Decide removePlaintextAfterEncryption + optional preserveFields.
  5. Set outputMode (dataset, kv_store, or both) and KV key names if applicable.
  6. Click Run → inspect dataset rows (__summary__).
  7. Optional: Schedule runs for repeatable encryption jobs after upstream actors finish.

Quick input snippet

{
  "sourceMode": "json_records",
  "sourceJsonRecords": [
    { "id": "1", "email": "user@company.com", "notes": "Confidential memo" }
  ],
  "encryptionScope": "selected_fields",
  "fieldsToEncrypt": ["email", "notes"],
  "preserveFields": ["id"],
  "removePlaintextAfterEncryption": true,
  "algorithm": "aes_gcm",
  "keyMode": "passphrase",
  "passphrase": "USE_A_ROTATED_SECRET_FROM_A_VAULT",
  "keyDerivation": "pbkdf2_sha256",
  "iterations": 200000,
  "outputMode": "dataset",
  "includeManifest": true,
  "includeHashFingerprint": true
}

Full schema lives in INPUT_SCHEMA.json (shown in Console).
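
When the input is assembled by a script rather than typed into the Console, one way to keep the passphrase out of source files is to read it from the environment. The sketch below assumes a hypothetical environment variable `EDI_PASSPHRASE` and a hypothetical helper `build_input`; the input keys themselves are the ones used in the quick snippet.

```python
import os


def build_input(records: list[dict], fields: list[str]) -> dict:
    """Assemble actor input, taking the passphrase from the environment."""
    # EDI_PASSPHRASE is a made-up variable name for this example.
    passphrase = os.environ.get("EDI_PASSPHRASE")
    if not passphrase:
        raise RuntimeError("EDI_PASSPHRASE is not set")
    return {
        "sourceMode": "json_records",
        "sourceJsonRecords": records,
        "encryptionScope": "selected_fields",
        "fieldsToEncrypt": fields,
        "removePlaintextAfterEncryption": True,
        "algorithm": "aes_gcm",
        "keyMode": "passphrase",
        "passphrase": passphrase,
        "keyDerivation": "pbkdf2_sha256",
        "iterations": 200000,
        "outputMode": "dataset",
    }
```

Calling `build_input([{"id": "1", "email": "user@company.com"}], ["email"])` returns a dict that can be serialized with `json.dumps` and supplied as the run input; the call fails early if the variable is missing, so an empty passphrase is never submitted.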


📦 Output & results

Depending on configuration, you receive encrypted dataset rows, an optional KV payload, fingerprints, manifests, __summary__ counters, and structured __error__ diagnostics. Dataset output can be exported through Apify as JSON, CSV, or Excel; KV items can be exported as well when you need the ciphertext outside Apify.

Example selected_fields output shape:

{
  "recordId": "lead-1001",
  "contact": {
    "email_encrypted": {
      "algorithm": "aes_gcm",
      "version": "1",
      "nonce": "BASE64_NONCE",
      "salt": "BASE64_SALT",
      "kdf": "pbkdf2_sha256",
      "iterations": 200000,
      "ciphertext": "BASE64_CIPHERTEXT"
    }
  },
  "manifest": {
    "cryptoVersion": "1",
    "encryptionScope": "selected_fields",
    "encryptedFields": ["contact.email"]
  }
}

Final summary rows look like:

{
  "type": "__summary__",
  "inputRecords": 100,
  "processedRecords": 100,
  "failedRecords": 0,
  "encryptedFieldsCount": 230,
  "encryptionScope": "selected_fields",
  "algorithm": "aes_gcm"
}
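
Because summary and error rows share the dataset with encrypted records, a downstream consumer usually has to separate them first. The following sketch shows one way to do that on a list of already-downloaded items. The helper names `split_rows` and `run_succeeded` are hypothetical, and the success rule (no failures, every input processed) is an assumption rather than the actor's own definition.

```python
def split_rows(items):
    """Separate encrypted records from __summary__ and __error__ rows."""
    records, errors, summary = [], [], None
    for item in items:
        kind = item.get("type")
        if kind == "__summary__":
            summary = item
        elif kind == "__error__":
            errors.append(item)
        else:
            records.append(item)
    return records, errors, summary


def run_succeeded(summary):
    """Treat a run as clean when nothing failed and every input was processed."""
    if summary is None:
        return False
    return (summary.get("failedRecords", 0) == 0
            and summary.get("processedRecords") == summary.get("inputRecords"))


items = [
    {"recordId": "lead-1001", "contact": {"email_encrypted": {"ciphertext": "..."}}},
    {"type": "__summary__", "inputRecords": 1, "processedRecords": 1, "failedRecords": 0},
]
records, errors, summary = split_rows(items)
print(len(records), len(errors), run_succeeded(summary))
```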

📊 Performance & scalability

Throughput scales with chunkSize, maxItems, concurrency, ciphertext size, and Apify Storage API limits—tune thoughtfully for heavy datasets.
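
To illustrate how chunkSize and maxConcurrency interact, here is a generic sketch of chunked processing with a bounded thread pool. The actor's internal scheduling is not documented here, so the helpers `chunked` and `process_in_chunks` and the stand-in worker are assumptions, not the actor's code.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(items, chunk_size):
    """Yield consecutive lists of at most chunk_size items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


def process_in_chunks(items, worker, chunk_size=100, max_concurrency=5):
    """Run worker(chunk) on up to max_concurrency chunks at once, keeping order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        results = pool.map(worker, chunked(items, chunk_size))
        return [row for chunk_result in results for row in chunk_result]


# Stand-in worker: a real one would encrypt each record in the chunk.
doubled = process_in_chunks(list(range(250)), lambda chunk: [x * 2 for x in chunk])
print(len(doubled))
```

With 250 items and a chunk size of 100, the worker runs on three chunks (100, 100, 50). Larger chunks mean fewer storage round trips per record, while higher concurrency raises the number of chunks in flight at any moment.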


❓ Frequently asked questions

| Question | Answer |
| --- | --- |
| Different from Zapier/low-code tools? | This actor focuses on cryptographic correctness plus Apify-native IO, not generic SaaS adapters. Plug it into outbound automation with your own exporters. |
| Multiple integrations in one run? | This actor handles encryption. Chain other actors/workflows for multi-hop sync across vendors. |
| Sensitive workloads? | Yes, provided you manage secrets, storage access, rotation, and auditing. Use enterprise policies where required. |
| Coding skills? | Console-friendly; YAML/JSON input only. You do need to know which fields contain secrets. |
| Recurring runs? | Yes. Use Apify schedules to run it after upstream jobs finish. |
| Failure handling? | Controlled via failOnMissingFields and failOnInvalidRecords; errors surface as __error__ rows plus logs. |
| Stores data permanently? | Ciphertext persists in whichever output storage you configured (datasets/KV) until you purge it. Plaintext stripping helps reduce exposure. |

Security disclaimer

This actor helps protect confidentiality of configured fields—but fingerprints ≠ encryption. Never treat hashes as secrecy. Maintain strong passphrases, avoid committing secrets to git, rotate keys, restrict dataset access.
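
The difference between a fingerprint and ciphertext is easier to see in code. The sketch below computes a salted, deterministic fingerprint with HMAC-SHA256 from the standard library. The actor's exact construction is not documented here, so the use of HMAC, the lowercase/trim normalization, and the function name `fingerprint` are all assumptions.

```python
import hashlib
import hmac


def fingerprint(value: str, salt: str) -> str:
    """Return a deterministic hex fingerprint of a normalized value."""
    # Normalization (trim + lowercase) is an assumption for this example.
    normalized = value.strip().lower()
    return hmac.new(salt.encode("utf-8"), normalized.encode("utf-8"),
                    hashlib.sha256).hexdigest()


a = fingerprint("User@Company.com ", "example-salt")
b = fingerprint("user@company.com", "example-salt")
print(a == b)  # equal inputs always map to equal fingerprints
```

That determinism is what makes fingerprints useful for joins and deduplication, and also why they are not secrecy: anyone holding the salt can test guesses for low-entropy values such as emails or phone numbers. Treat the salt as a secret and the fingerprint as an identifier.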


Input configuration (reference)

Full schema: INPUT_SCHEMA.json. Main groups:

  • Sources: sourceMode, sourceDatasetId / sourceDatasetName, sourceKvStoreKey, sourceJsonRecords, sourceJsonText, maxItems
  • Encryption: encryptionScope, fieldsToEncrypt, preserveFields, removePlaintextAfterEncryption, outputEncryptedFieldSuffix
  • Crypto: algorithm, keyMode, passphrase, rawKeyBase64, keyDerivation, iterations
  • Manifests / fingerprints: includeManifest, includeHashFingerprint, fingerprintFields, deterministicFingerprintSalt (secret)
  • Output: outputMode, outputKvStoreKey, includeRecordId, recordIdField
  • Execution: chunkSize, maxConcurrency, failOnMissingFields, failOnInvalidRecords, includeDebugFields, redactLogs

Encryption modes

selected_fields

Encrypt only listed fields (supports dotted paths like contact.email). With removePlaintextAfterEncryption: true, originals are dropped after sibling *_encrypted fields exist.
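
The dotted-path behavior can be sketched as a small transformation over nested dicts. This is an illustration only: `encrypt_selected` is a hypothetical helper, the `encrypt` callable is a placeholder that performs no real encryption, and skipping missing paths silently is an assumption (the actor's failOnMissingFields setting governs the real behavior).

```python
import copy
from typing import Any, Callable


def encrypt_selected(record: dict, paths: list[str],
                     encrypt: Callable[[Any], Any],
                     suffix: str = "_encrypted",
                     remove_plaintext: bool = True) -> dict:
    """Return a copy of record with a <field><suffix> sibling per dotted path."""
    out = copy.deepcopy(record)
    for path in paths:
        *parents, leaf = path.split(".")
        node: Any = out
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if not isinstance(node, dict) or leaf not in node:
            continue  # path not present in this record
        node[leaf + suffix] = encrypt(node[leaf])
        if remove_plaintext:
            del node[leaf]
    return out


record = {"recordId": "lead-1001", "contact": {"email": "user@company.com"}}
placeholder = lambda value: {"ciphertext": "PLACEHOLDER"}  # not real encryption
print(encrypt_selected(record, ["contact.email"], placeholder))
```

The result keeps `recordId` readable, adds `contact.email_encrypted`, and drops `contact.email`, which matches the output shape shown in the output section.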

full_record

Encrypt the entire JSON object as one ciphertext payload; typically keeps identifiers + fingerprints + manifests.

full_payload

Encrypt the whole batch as one blob—useful for secure handoffs or archiving.


Algorithms

AES-256-GCM (aes_gcm) — default

Each value gets its own random nonce and is encrypted with authenticated encryption (AEAD). Manifests store only safe metadata (nonce, salt, KDF iterations, algorithm version), never plaintext.

Fernet (fernet)

Compatibility option when downstream tooling expects Fernet tokens; AES-GCM remains the primary recommendation.


Example: encrypt from another dataset

{
  "sourceMode": "dataset",
  "sourceDatasetId": "YOUR_SOURCE_DATASET_ID",
  "maxItems": 250,
  "encryptionScope": "selected_fields",
  "fieldsToEncrypt": ["email", "phone", "notes"],
  "preserveFields": ["id", "name", "company"],
  "removePlaintextAfterEncryption": true,
  "algorithm": "aes_gcm",
  "keyMode": "passphrase",
  "passphrase": "USE_A_ROTATED_SECRET",
  "keyDerivation": "pbkdf2_sha256",
  "iterations": 200000,
  "includeManifest": true,
  "includeHashFingerprint": true,
  "fingerprintFields": ["email", "phone"],
  "outputMode": "dataset",
  "chunkSize": 100,
  "maxConcurrency": 5
}

Keep passphrase / raw key material only in Apify secret input fields—not in repos.


Apify run notes

  • Use sourceDatasetId or sourceDatasetName when reading sibling actor output.
  • KV input/output keys must differ when both read/write KV to avoid overwriting.
  • Prefer aes_gcm unless you have a downstream compatibility constraint for Fernet.
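
The notes above lend themselves to a quick pre-flight check before a run is started. The sketch below is a hypothetical client-side helper (`check_input` is not part of the actor). It encodes only rules stated in this README and makes no claim about what the actor itself validates.

```python
def check_input(cfg: dict) -> list[str]:
    """Return human-readable problems found in an input dict."""
    problems = []
    if cfg.get("encryptionScope") == "selected_fields" and not cfg.get("fieldsToEncrypt"):
        problems.append("selected_fields scope needs a non-empty fieldsToEncrypt")
    if cfg.get("sourceMode") == "dataset" and not (
            cfg.get("sourceDatasetId") or cfg.get("sourceDatasetName")):
        problems.append("dataset source needs sourceDatasetId or sourceDatasetName")
    reads_kv = cfg.get("sourceMode") == "kv_store"
    writes_kv = cfg.get("outputMode") in ("kv_store", "both")
    if reads_kv and writes_kv and cfg.get("sourceKvStoreKey") == cfg.get("outputKvStoreKey"):
        problems.append("sourceKvStoreKey and outputKvStoreKey must differ")
    return problems


print(check_input({"sourceMode": "kv_store", "outputMode": "both",
                   "sourceKvStoreKey": "payload", "outputKvStoreKey": "payload"}))
```

An empty list means none of these particular rules were violated; it does not guarantee that the run configuration is otherwise valid.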

Error handling

Invalid configuration fails fast (ActorConfigurationError). Per-record failures can emit type="__error__" rows depending on modes; summaries still report aggregates.


Local run & validation

cd encrypted-data-integration
python -m pip install -r requirements.txt
python main.py

When platform input storage is absent, the actor loads INPUT.json.

Round-trip cryptography checks:

python scripts/roundtrip_validation.py

Uses fixtures under testdata/ — validates decrypt round-trips for scopes and nested dotted paths.


Limitations

  • Encrypted output is not plaintext-searchable inside datasets.
  • Key management stays outside the actor—you bring passphrases/keys securely.
  • Fingerprints support matching workflows; they do not provide secrecy.
  • Dotted-path selection targets nested objects; complex array gymnastics may need preprocessing.

📈 Why use this?

Manual handling of sensitive payloads is risky. This actor provides automated, explainable cryptography with manifests, structured errors, fingerprints, and dedupe-ready metadata, all inside reproducible runs.


🚀 Start now

Configure input, encrypt your dataset batches, inspect __summary__, and orchestrate downstream secure sync from there.