Encrypted Data Integration
Pricing
from $4.00 / 1,000 encrypted records
Encrypted Data Integration encrypts sensitive Apify data before export or automation. It supports selected fields, full records, or full payload encryption with AES-GCM, manifests, fingerprints, and dataset or key-value store output.
Developer
Sovanza
Maintained by Community
🔐 Encrypted Data Integration Tool – Secure API & Data Sync Automation
Securely integrate and transfer encrypted data across your automation stack. Built for developers, SaaS teams, and data engineers who need safe handling of sensitive fields inside Apify runs—without pasting plaintext into exports, logs, or downstream webhooks.
What it actually does: this actor reads structured records from an Apify dataset, key-value store item, pasted JSON, or inline JSON records, applies authenticated encryption, and writes encrypted output back to a dataset and/or KV store. Use it as the cryptographic step in larger sync workflows (schedules + API exports + your own integrations).
Optimized positioning
Securely protect data before it leaves your Apify storage boundary. Choose field-level encryption (most common), encrypt an entire record, or encrypt a full batch payload for handoff. Pair with Apify schedules and your own HTTP/database connectors for end-to-end pipelines.
🚀 Start secure data integration
Automate pipelines with encryption and operational control:
👉 Protect sensitive columns before export or handoff
👉 Deterministic fingerprints for matching workflows (hashes—not encryption by themselves)
👉 Structured manifests so decryptors know algorithm, KDF, and scope
👉 Runs on demand or on schedule via Apify
🧠 What this tool actually does
This actor is a secure transformation engine inside Apify—not a turnkey “reverse ETL” to every SaaS vendor out of the box. It connects to your existing Apify data and produces encrypted derivatives:
| Capability | Detail |
|---|---|
| Read from | dataset, kv_store, json_records, json_text |
| Encrypt | selected_fields, full_record, or full_payload |
| Algorithms | AES-256-GCM (recommended) or Fernet compatibility mode |
| Key material | Passphrase (PBKDF2 or scrypt) or raw symmetric key (base64) |
| Write to | default dataset, KV item, or both |
Use it upstream of integrations: encrypt first, then move ciphertext through your downstream systems—never commit passphrases to git, and rotate secrets responsibly.
Unlike basic “encrypt this string” snippets, outputs include manifests, ciphertext metadata, fingerprints (optional), and summary/error rows suited for audited workflows.
🔄 Supported data sources & outputs (within Apify)
| Source (sourceMode) | When to use |
|---|---|
| dataset | Encrypt items from another actor’s dataset (sourceDatasetId or name) |
| kv_store | Encrypt a blob already stored under a KV key |
| json_records / json_text | Quick tests or pasted uploads |
Outputs land in outputMode = dataset, kv_store, or both. External CRMs/analytics/database targets are reached through your HTTP workers, exporters, warehouses, or other actors—not inside this codebase.
🔐 Security & encryption
Security is foundational:
- Uses authenticated encryption (AES-GCM default) via the `cryptography` library
- Derives symmetric keys via PBKDF2-HMAC-SHA256 or scrypt when using passphrases
- Never logs plaintext secrets (`passphrase`, raw keys); keep `redactLogs` enabled as the default posture
- Can strip plaintext sensitive fields after writing `*_encrypted` counterparts
Operational reality: ciphertext is safer than plaintext—but key management remains your responsibility (vaults, KMS, rotating passphrases, least-privilege access on Apify runs).
HTTPS applies to browser/API traffic toward Apify; encryption-at-rest semantics depend on Apify Storage configuration and enterprise controls.
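To make the passphrase path concrete, here is a minimal sketch of PBKDF2-HMAC-SHA256 key derivation using only the Python standard library. It is an illustration, not the actor's code: the 16-byte salt, the function name `derive_key`, and the example passphrase are assumptions, while the 200,000 iteration count mirrors the sample inputs in this README.

```python
import hashlib
import os


def derive_key(passphrase: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a 32-byte (256-bit) key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


# Assumption: a fresh random 16-byte salt; the actor's salt size may differ.
salt = os.urandom(16)
key = derive_key("example-passphrase-not-for-production", salt)
```

The same passphrase, salt, and iteration count always yield the same key, which is why the salt and iteration count have to travel with the ciphertext while the passphrase must not.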
⚡ Key features
- Multiple input sources (dataset/KV/direct JSON text)
- Three encryption scopes: selected fields / full record / full payload batch
- `preserveFields` to keep indexing columns readable
- `removePlaintextAfterEncryption` to minimize accidental leakage
- Manifests: algorithm, `cryptoVersion`, KDF iterations, scopes
- Optional SHA-256 fingerprints for deterministic matching workflows
- Batched concurrency with `chunkSize` + `maxConcurrency`
- Structured `__summary__` and `__error__` dataset rows
🎯 Real-world use cases
| Scenario | Why this fits |
|---|---|
| PII masking before BI export | Encrypt email, phone, nested paths before spreadsheets get shared |
| CRM / lead payloads | Field-level ciphertext + fingerprints for deterministic joins |
| Data pipeline QA | Use failOnMissingFields / invalid record toggles to harden ingestion |
| Secure archival bundles | full_payload mode for handing off encrypted batch blobs |
| Automation hygiene | Run on schedule whenever upstream scrapers finish |
🛠️ How to use (Apify Console)
- Choose `sourceMode` (`dataset`, `kv_store`, `json_records`, or `json_text`).
- Pick `encryptionScope` and declare `fieldsToEncrypt` for `selected_fields` mode.
- Provide `passphrase` or `rawKeyBase64` (`keyMode`).
- Decide `removePlaintextAfterEncryption` + optional `preserveFields`.
- Set `outputMode` (`dataset`, `kv_store`, or `both`) and KV key names if applicable.
- Click Run → inspect dataset rows (`__summary__`).
- Optional: Schedule runs for repeatable encryption jobs after upstream actors finish.
Quick input snippet
    {
      "sourceMode": "json_records",
      "sourceJsonRecords": [
        { "id": "1", "email": "user@company.com", "notes": "Confidential memo" }
      ],
      "encryptionScope": "selected_fields",
      "fieldsToEncrypt": ["email", "notes"],
      "preserveFields": ["id"],
      "removePlaintextAfterEncryption": true,
      "algorithm": "aes_gcm",
      "keyMode": "passphrase",
      "passphrase": "USE_A_ROTATED_SECRET_FROM_A_VAULT",
      "keyDerivation": "pbkdf2_sha256",
      "iterations": 200000,
      "outputMode": "dataset",
      "includeManifest": true,
      "includeHashFingerprint": true
    }
Full schema lives in INPUT_SCHEMA.json (shown in Console).
📦 Output & results
Depending on configuration you receive encrypted dataset rows, an optional KV payload, fingerprints, manifests, `__summary__` counters, and structured `__error__` diagnostics. Results can be exported via Apify as JSON, CSV, or Excel, plus KV exports when you materialize ciphertext externally.
Example selected_fields output shape:
    {
      "recordId": "lead-1001",
      "contact": {
        "email_encrypted": {
          "algorithm": "aes_gcm",
          "version": "1",
          "nonce": "BASE64_NONCE",
          "salt": "BASE64_SALT",
          "kdf": "pbkdf2_sha256",
          "iterations": 200000,
          "ciphertext": "BASE64_CIPHERTEXT"
        }
      },
      "manifest": {
        "cryptoVersion": "1",
        "encryptionScope": "selected_fields",
        "encryptedFields": ["contact.email"]
      }
    }
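A consumer usually has to validate and decode such an envelope before it can attempt decryption. The sketch below shows one way to do that with the standard library. It assumes the passphrase-mode envelope keys shown in the example above and standard (not URL-safe) base64; the `Envelope` class and `parse_envelope` function are hypothetical names, not part of the actor.

```python
import base64
from dataclasses import dataclass

# Keys taken from the example envelope above (passphrase mode).
REQUIRED_KEYS = ("algorithm", "version", "nonce", "salt", "kdf", "iterations", "ciphertext")


@dataclass(frozen=True)
class Envelope:
    algorithm: str
    version: str
    kdf: str
    iterations: int
    nonce: bytes
    salt: bytes
    ciphertext: bytes


def parse_envelope(raw: dict) -> Envelope:
    """Check that all expected keys exist and decode the base64 fields to bytes."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"envelope is missing keys: {missing}")

    def decode(value: str) -> bytes:
        # Assumption: standard base64 alphabet; invalid input raises an error.
        return base64.b64decode(value, validate=True)

    return Envelope(
        algorithm=raw["algorithm"],
        version=raw["version"],
        kdf=raw["kdf"],
        iterations=int(raw["iterations"]),
        nonce=decode(raw["nonce"]),
        salt=decode(raw["salt"]),
        ciphertext=decode(raw["ciphertext"]),
    )


# Illustrative values only; real envelopes come from the actor's output.
sample = {
    "algorithm": "aes_gcm",
    "version": "1",
    "nonce": base64.b64encode(bytes(12)).decode(),
    "salt": base64.b64encode(bytes(16)).decode(),
    "kdf": "pbkdf2_sha256",
    "iterations": 200000,
    "ciphertext": base64.b64encode(b"opaque-bytes").decode(),
}
envelope = parse_envelope(sample)
```

Failing early on a malformed envelope keeps decryption errors (wrong key, tampered ciphertext) distinguishable from plain data-shape problems.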
Final summary rows look like:
    {
      "type": "__summary__",
      "inputRecords": 100,
      "processedRecords": 100,
      "failedRecords": 0,
      "encryptedFieldsCount": 230,
      "encryptionScope": "selected_fields",
      "algorithm": "aes_gcm"
    }
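Because summary and error rows share the dataset with encrypted records, downstream code typically separates them first. The following sketch assumes the rows have already been loaded as a list of dictionaries and that the `type` values are `__summary__` and `__error__` as described in this README; `split_rows` and `run_is_clean` are hypothetical helper names.

```python
def split_rows(items):
    """Separate encrypted records from __summary__ and __error__ rows."""
    records, errors, summary = [], [], None
    for item in items:
        kind = item.get("type")
        if kind == "__summary__":
            summary = item
        elif kind == "__error__":
            errors.append(item)
        else:
            records.append(item)
    return records, errors, summary


def run_is_clean(summary):
    """True when a summary exists and every input record was processed without failure."""
    return (
        summary is not None
        and summary["failedRecords"] == 0
        and summary["processedRecords"] == summary["inputRecords"]
    )


# Illustrative rows shaped like the examples in this README.
rows = [
    {"recordId": "lead-1001", "contact": {"email_encrypted": {"algorithm": "aes_gcm"}}},
    {"type": "__summary__", "inputRecords": 1, "processedRecords": 1, "failedRecords": 0},
]
records, errors, summary = split_rows(rows)
```

A gate like `run_is_clean(summary)` is one way to stop a pipeline before partially encrypted data moves downstream.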
📊 Performance & scalability
Throughput depends on `chunkSize`, `maxConcurrency`, the number of records read (bounded by `maxItems`), ciphertext size, and Apify Storage API limits. Tune these together for heavy datasets.
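As a rough mental model of how `chunkSize` and `maxConcurrency` interact, the sketch below splits records into fixed-size chunks and processes a bounded number of chunks in parallel. This is an assumption about the general pattern, not the actor's implementation; `chunks`, `process_in_chunks`, and the placeholder worker are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunks(items, chunk_size):
    """Yield consecutive lists of at most chunk_size items."""
    iterator = iter(items)
    while batch := list(islice(iterator, chunk_size)):
        yield batch


def process_in_chunks(items, worker, chunk_size=100, max_concurrency=5):
    """Run worker on each chunk with at most max_concurrency chunks in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map returns results in submission order, so record order is kept.
        results = pool.map(worker, chunks(items, chunk_size))
        return [row for batch in results for row in batch]


def placeholder_worker(batch):
    """Stand-in for per-chunk encryption; here it only tags each record."""
    return [{"id": record, "processed": True} for record in batch]


output = process_in_chunks(range(250), placeholder_worker, chunk_size=100, max_concurrency=5)
```

Larger chunks reduce per-call overhead but increase memory per worker, while higher concurrency helps only until storage API limits become the bottleneck.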
❓ Frequently asked questions
| Question | Answer |
|---|---|
| Different from Zapier/low-code tools? | This actor focuses on cryptographic correctness plus Apify-native IO—not generic SaaS adapters. Plug it into outbound automation with your own exporters. |
| Multiple integrations in one run? | This actor handles encryption. Chain other actors/workflows for multi-hop sync across vendors. |
| Sensitive workloads? | Yes—provided you manage secrets, storage access, rotation, and auditing. Use enterprise policies where required. |
| Coding skills? | Console-friendly; YAML/JSON input only. Understand what fields contain secrets. |
| Recurring runs? | Yes—Apify scheduler triggers after upstream jobs. |
| Failure handling? | Controlled via failOnMissingFields, failOnInvalidRecords; errors surface as `__error__` rows plus logs. |
| Stores data permanently? | Ciphertext persists in whichever output storage you configured (datasets/KV) until you purge it. Plaintext stripping helps reduce exposure. |
Security disclaimer
This actor helps protect confidentiality of configured fields—but fingerprints ≠ encryption. Never treat hashes as secrecy. Maintain strong passphrases, avoid committing secrets to git, rotate keys, restrict dataset access.
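To illustrate why a fingerprint supports matching but not secrecy, here is a minimal sketch of a salted SHA-256 fingerprint over selected fields. The normalisation step (trim and lowercase), the canonical JSON encoding, and the function name `fingerprint` are assumptions for illustration; the actor's exact fingerprint format is not documented here.

```python
import hashlib
import json


def fingerprint(record: dict, fields: list, salt: str) -> str:
    """Return a hex SHA-256 digest over the salt plus normalised field values."""
    # Assumption: trimming and lowercasing so trivially different spellings match.
    values = {name: str(record.get(name, "")).strip().lower() for name in fields}
    canonical = json.dumps(values, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((salt + canonical).encode("utf-8")).hexdigest()


a = fingerprint({"email": "User@Company.com "}, ["email"], salt="example-salt")
b = fingerprint({"email": "user@company.com"}, ["email"], salt="example-salt")
c = fingerprint({"email": "user@company.com"}, ["email"], salt="other-salt")
```

Here `a` and `b` are equal, which is what makes joins possible, and anyone who holds the salt can recompute the digest for a guessed value, which is exactly why a fingerprint must not be treated as encryption.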
Input configuration (reference)
Full schema: INPUT_SCHEMA.json. Main groups:
- Sources: `sourceMode`, `sourceDatasetId` / `sourceDatasetName`, `sourceKvStoreKey`, `sourceJsonRecords`, `sourceJsonText`, `maxItems`
- Encryption: `encryptionScope`, `fieldsToEncrypt`, `preserveFields`, `removePlaintextAfterEncryption`, `outputEncryptedFieldSuffix`
- Crypto: `algorithm`, `keyMode`, `passphrase`, `rawKeyBase64`, `keyDerivation`, `iterations`
- Manifests / fingerprints: `includeManifest`, `includeHashFingerprint`, `fingerprintFields`, `deterministicFingerprintSalt` (secret)
- Output: `outputMode`, `outputKvStoreKey`, `includeRecordId`, `recordIdField`
- Execution: `chunkSize`, `maxConcurrency`, `failOnMissingFields`, `failOnInvalidRecords`, `includeDebugFields`, `redactLogs`
Encryption modes
selected_fields
Encrypt only listed fields (supports dotted paths like contact.email). With removePlaintextAfterEncryption: true, originals are dropped after sibling *_encrypted fields exist.
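The dotted-path behaviour can be pictured with the sketch below, which walks each path, writes a sibling field with the `_encrypted` suffix, and optionally drops the original. It is a structural illustration only: `encrypt_selected` is a hypothetical helper, the `placeholder_encrypt` callable does not encrypt anything, and silently skipping missing paths is an assumption (the actor exposes `failOnMissingFields` for that case).

```python
import copy


def encrypt_selected(record, paths, encrypt_value, suffix="_encrypted", remove_plaintext=True):
    """Return a copy of record with each dotted path replaced by an encrypted sibling."""
    out = copy.deepcopy(record)  # never mutate the caller's record
    for path in paths:
        *parents, leaf = path.split(".")
        node = out
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if not isinstance(node, dict) or leaf not in node:
            continue  # assumption: missing paths are skipped in this sketch
        node[leaf + suffix] = encrypt_value(node[leaf])
        if remove_plaintext:
            del node[leaf]
    return out


def placeholder_encrypt(value):
    """Stand-in for real authenticated encryption; returns a fixed marker."""
    return {"ciphertext": "<placeholder>"}


source = {"recordId": "lead-1001", "contact": {"email": "user@company.com"}}
result = encrypt_selected(source, ["contact.email", "contact.phone"], placeholder_encrypt)
```

Working on a deep copy keeps the plaintext record intact in memory for the caller, so whether it is discarded afterwards stays an explicit decision.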
full_record
Encrypt the entire JSON object as one ciphertext payload; typically keeps identifiers + fingerprints + manifests.
full_payload
Encrypt the whole batch as one blob—useful for secure handoffs or archiving.
Algorithms
AES-256-GCM (aes_gcm) — default
Each value gets its own random nonce and is protected with authenticated (AEAD) encryption. Manifests store only safe metadata (nonce, salt, KDF iterations, algorithm version), never plaintext.
Fernet (fernet)
Compatibility option when downstream tooling expects Fernet tokens; AES-GCM remains the primary recommendation.
Example: encrypt from another dataset
    {
      "sourceMode": "dataset",
      "sourceDatasetId": "YOUR_SOURCE_DATASET_ID",
      "maxItems": 250,
      "encryptionScope": "selected_fields",
      "fieldsToEncrypt": ["email", "phone", "notes"],
      "preserveFields": ["id", "name", "company"],
      "removePlaintextAfterEncryption": true,
      "algorithm": "aes_gcm",
      "keyMode": "passphrase",
      "passphrase": "USE_A_ROTATED_SECRET",
      "keyDerivation": "pbkdf2_sha256",
      "iterations": 200000,
      "includeManifest": true,
      "includeHashFingerprint": true,
      "fingerprintFields": ["email", "phone"],
      "outputMode": "dataset",
      "chunkSize": 100,
      "maxConcurrency": 5
    }
Keep passphrase / raw key material only in Apify secret input fields—not in repos.
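When an input object is assembled by a script (for local runs or API-driven runs), one way to keep the secret out of the repository is to inject it from the environment at run time. The sketch below assumes a hypothetical environment variable name, `ENCRYPTION_PASSPHRASE`, and a hypothetical helper, `build_input`; neither is defined by the actor.

```python
import os


def build_input(base: dict, env_var: str = "ENCRYPTION_PASSPHRASE") -> dict:
    """Return a copy of base with the passphrase read from the environment."""
    passphrase = os.environ.get(env_var)
    if not passphrase:
        # Fail loudly instead of falling back to a hard-coded secret.
        raise RuntimeError(f"{env_var} is not set")
    return {**base, "keyMode": "passphrase", "passphrase": passphrase}


# The committed part of the configuration contains no secret material.
base_config = {
    "sourceMode": "dataset",
    "sourceDatasetId": "YOUR_SOURCE_DATASET_ID",
    "encryptionScope": "selected_fields",
    "fieldsToEncrypt": ["email", "phone", "notes"],
}
```

Calling `build_input(base_config)` then produces the full input only in memory, so the file checked into version control never holds the passphrase.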
Apify run notes
- Use `sourceDatasetId` or `sourceDatasetName` when reading sibling actor output.
- KV input/output keys must differ when both read/write KV to avoid overwriting.
- Prefer `aes_gcm` unless you have a downstream compatibility constraint for Fernet.
Error handling
Invalid configuration fails fast (`ActorConfigurationError`). Per-record failures can emit `type="__error__"` rows depending on modes; summaries still report aggregates.
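Some configuration mistakes can be caught before a run is even started. The sketch below is a local pre-flight check based only on constraints stated in this README (fields are required for `selected_fields`, a passphrase is required in passphrase mode, and KV input and output keys must differ). The function name `preflight` and the messages are hypothetical, and the actor's own validation may cover more cases.

```python
def preflight(cfg: dict) -> list:
    """Return a list of human-readable problems found in an input object."""
    problems = []

    if cfg.get("encryptionScope") == "selected_fields" and not cfg.get("fieldsToEncrypt"):
        problems.append("selected_fields scope needs a non-empty fieldsToEncrypt")

    if cfg.get("keyMode") == "passphrase" and not cfg.get("passphrase"):
        problems.append("keyMode is passphrase but no passphrase was provided")

    source_key = cfg.get("sourceKvStoreKey")
    reads_kv = cfg.get("sourceMode") == "kv_store"
    writes_kv = cfg.get("outputMode") in ("kv_store", "both")
    if reads_kv and writes_kv and source_key and source_key == cfg.get("outputKvStoreKey"):
        problems.append("sourceKvStoreKey and outputKvStoreKey must differ")

    return problems


good = {
    "sourceMode": "json_records",
    "encryptionScope": "selected_fields",
    "fieldsToEncrypt": ["email"],
    "keyMode": "passphrase",
    "passphrase": "example-only",
    "outputMode": "dataset",
}
bad = {
    "sourceMode": "kv_store",
    "sourceKvStoreKey": "LEADS",
    "outputKvStoreKey": "LEADS",
    "outputMode": "both",
    "encryptionScope": "selected_fields",
    "keyMode": "passphrase",
}
```

Returning a list rather than raising on the first problem lets a caller report every issue in one pass.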
Local run & validation
    cd encrypted-data-integration
    python -m pip install -r requirements.txt
    python main.py
Loads INPUT.json when platform input storage is absent.
Round-trip cryptography checks:
    python scripts/roundtrip_validation.py
Uses fixtures under testdata/ — validates decrypt round-trips for scopes and nested dotted paths.
Limitations
- Encrypted output is not plaintext-searchable inside datasets.
- Key management stays outside the actor—you bring passphrases/keys securely.
- Fingerprints are for matching diagnostics; not secrecy.
- Dotted-path selection targets nested objects; complex array gymnastics may need preprocessing.
📈 Why use this?
Manual handling of sensitive payloads is risky. This actor provides automated, explainable cryptography with manifests, structured errors, fingerprints, dedupe-ready metadata—all inside reproducible runs.
🚀 Start now
Configure input, encrypt your dataset batches, inspect `__summary__`, and orchestrate downstream secure sync from there.