7-Zip Recursive Archive Extractor: Enterprise-Grade Archive Automation
Extract, process, and index 100GB+ archives with recursive nesting, incremental updates, and CRC-based change detection - all powered by 7-Zip on Apify's cloud infrastructure. Configure download/file limits and Actor memory to match your target sizes.
The Powerhouse Archive Solution You've Been Looking For
Most unzip tools choke on nested archives, can't skip unchanged files, or force you to download everything locally. This Apify Actor is different: it's a high-performance 7-Zip extractor designed for automated data pipelines, legacy archive migration, and security-first filtering at cloud scale.
What makes this a "powerhouse"?
- 30+ Format Support: ZIP, RAR, 7Z, TAR, GZIP, BZIP2, XZ, ISO, CAB, ARC, ZIPX, and every format 7-Zip recognizes
- Recursive Extraction: Automatically detects and processes archives inside archives (up to configurable depth)
- Incremental Intelligence: CRC32+size signatures let you skip unchanged files between runs - saving up to 90% of compute costs
- Dual-Storage Architecture: Raw files go to KV Store (direct download links), metadata goes to Dataset (structured queries)
- Security Hardened: Blocks executables by default, validates paths against traversal attacks, enforces size limits
How It Works: The Data Flow
URL Input  /  URL List  /  Dataset (URL field)
                    |
                    v
          7-Zip Extractor Engine
          - Download with guards
          - Detect format
          - Incremental check
          - Extract files
          - Recurse if nested
                    |
        +-----------+-----------+
        v                       v
  KV Store (Raw Files)    Dataset (Structured Index)
  - files                 - archiveUrl
  - SUMMARY               - path
  - INCR::{sha1}          - kvKey
  - OUTPUT_...            - status
                          - incrementalStatus
                          - pointer rows
Result: Both downloadable files (KV Store) and a searchable index (Dataset) are provided with per-file metadata including extraction status, size, extension, and incremental comparison results.
The Format Wall: Every Extension Supported
7-Zip's comprehensive format support is leveraged to handle virtually any compressed or archived file:
Primary Archive Formats
ZIP • RAR • 7Z • TAR • GZIP • BZIP2 • XZ • LZMA
Disk Image & Container Formats
ISO • VHD • VHDX • WIM • SWM • DMG • HFS • NTFS • FAT • SquashFS • UDF
Legacy & Specialty Formats
CAB • CHM • MSI • ARJ • LZH • CPIO • RPM • DEB • ARC • ZIPX • SWF/SWFC • NSIS
Compressed TAR Variants
TGZ (tar.gz) • TBZ2 (tar.bz2) • TXZ (tar.xz) • TAR.LZMA • TAR.Z
Single-File Compression
Z • LZW • LZIP • LZOP • ZSTD • BROTLI • BASE64 • HASH
Can't find your format? If 7-Zip can list it with 7z l, it can be extracted. Check the debug logs for supported format detection.
Key Advantages: The Technical Moat
1. Incremental Extraction: The Compute Cost Killer
Traditional extractors re-process every file on every run. This Actor tracks file signatures (CRC32 + size) per archive URL and builds an incremental index:
How it works:
- First run: All files are extracted, CRC32 checksums computed, and signatures stored in KV Store under INCR::{sha1(archiveUrl)}
- Subsequent runs: Current archive contents are compared against stored signatures
- Unchanged files are skipped (marked SKIPPED_UNCHANGED in the dataset)
- Only new or modified files are re-extracted (marked new or changed)
Real-world impact: A daily job processing a 10GB archive with 5,000 files where only 100 files change will:
- Original approach: Extract 5,000 files every day
- Incremental approach: Extract 100 files after first run (98% reduction)
- Cost savings: ~90% reduction in compute units
Configuration:
{"incremental": {"enabled": true,"strategy": "crc+size","onlyNewOrChanged": true}}
Note: sizeOnly strategy can be used for massive archives where CRC computation overhead outweighs accuracy benefits.
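To see what the incremental index actually stores, the state record for a given archive URL can be fetched straight from the Key-Value Store. A minimal sketch, assuming the key is literally INCR:: followed by the hex SHA1 of the archive URL as described above; ARCHIVE_URL and STORE_ID are placeholders you supply:

# Sketch: fetch the stored incremental index for one archive URL.
# Assumes the key format "INCR::" + hex SHA1 of the URL, per the description above.
ARCHIVE_URL="https://example.com/data.zip"   # placeholder
STORE_ID="your-kv-store-id"                  # placeholder

KEY="INCR::$(printf '%s' "$ARCHIVE_URL" | sha1sum | cut -d' ' -f1)"
curl -s "https://api.apify.com/v2/key-value-stores/${STORE_ID}/records/${KEY}" | jq .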
2. Recursive Nested Archive Extraction
Legacy systems often store "archives within archives" (e.g., daily backups as backup-2025-01-15.zip inside monthly-archives.tar.gz inside year-2025.7z). Most tools require manual multi-pass extraction.
Automatic nesting handling:
- Nested archives are detected by extension (zip, rar, 7z, tar, gz, bz2, xz, tgz, iso, cab, arc, zipx)
- Parent is extracted → contents scanned → child archives extracted → repeated up to the configured depth
- Nesting level is tracked in the dataset (nestedDepth: 0, 1, 2, ...)
- archiveUrl includes a #path suffix to trace file origins: https://ex.com/data.zip#backup/2025/jan.tar.gz#reports/ (see the query sketch at the end of this section)
Configuration:
{"formats": {"extractNestedArchives": true,"nestedArchiveDepth": 2,"nestedBeyondDepthBehavior": "skip"}}
Use cases:
- Legacy tape backups: Multi-level TAR archives from enterprise backup systems
- Software distributions: Installers containing compressed packages containing archives
- Data dumps: Database exports compressed multiple times for size reduction
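To trace nested provenance in the output, the dataset index can be queried on the nestedDepth and #-suffixed archiveUrl fields described above. A minimal sketch with jq; DATASET_ID is a placeholder and the field names follow the schema documented under Outputs below:

# Sketch: list files that came out of nested archives, with their provenance chain.
DATASET_ID="your-dataset-id"   # placeholder

curl -s "https://api.apify.com/v2/datasets/${DATASET_ID}/items?format=json" \
  | jq -r '.[] | select(.nestedDepth > 0) | "\(.nestedDepth)\t\(.archiveUrl)\t\(.path)"'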
3. Data Cleaning & UTF-8 Normalization
International datasets often contain mixed encodings (Windows-1252, ISO-8859-1, Shift-JIS, etc.), causing "mojibake" (corrupt characters) when parsed.
Text normalization features:
- Common text extensions are treated as candidates for UTF-8 verification (.txt, .csv, .json, .xml, .html, .md, .log, .ini, .yaml)
- When enabled, files are decoded as UTF-8 and re-stored in UTF-8 form
- If decoding fails (file is not valid UTF-8), original bytes are stored to avoid corruption
- Normalized content is stored in KV Store when outputMode includes KV
Configuration:
{"textOptions": {"convertTextToUtf8": true,"textExtensions": [".txt", ".csv", ".json", ".xml", ".html", ".log"]}}
Result: Best-effort UTF-8 normalization for files already encoded in UTF-8; other encodings are preserved unchanged to avoid data loss.
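The check this best-effort behaviour relies on can be reproduced locally on any downloaded text record. The sketch below only illustrates what "decoding fails" means; it is not the Actor's internal code, and STORE_ID plus the record key are placeholders:

# Sketch: test whether a downloaded text record is valid UTF-8.
# iconv exits with a non-zero status on the first invalid byte sequence.
STORE_ID="your-kv-store-id"   # placeholder
curl -s "https://api.apify.com/v2/key-value-stores/${STORE_ID}/records/reports/2025/january.csv" \
  -o january.csv
if iconv -f UTF-8 -t UTF-8 january.csv > /dev/null 2>&1; then
  echo "valid UTF-8"
else
  echo "not valid UTF-8 (the Actor would keep the original bytes unchanged)"
fi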
4. Security-First Architecture
Archives from untrusted sources can contain malware, path traversal exploits, or resource exhaustion attacks. Multiple defensive layers are implemented:
Security features:
- Extension blocklist: Executables are rejected by default (.exe, .dll, .bat, .cmd, .sh, .ps1, .msi)
- Path validation: Absolute paths, drive letters, .. segments, and self-referential entries are blocked
- Size guards: maxDownloadBytes (500 MB default) and maxFileSizeBytes (500 MB default) are enforced
- File count limits: Processing stops after maxFiles (1000 default) to prevent zip bombs
- Timeout protection: 7-Zip operations exceeding listTimeoutMillis are aborted
Customization:
{"filters": {"blockedExtensions": [".exe", ".dll", ".bat", ".sh", ".ps1", ".msi", ".scr"],"excludedPatterns": ["__MACOSX/", ".DS_Store", "Thumbs.db"]},"limits": {"maxDownloadBytes": 524288000,"maxFileSizeBytes": 524288000,"maxFiles": 1000}}
Use Cases: Real-World Applications
1. Automated Data Pipeline: Daily Archive Ingestion
Scenario: Your company receives daily data exports as ZIP files on an FTP server. CSVs need extraction, schema validation, and loading into a data warehouse.
Implementation:
{"url": "https://ftp.partner.com/exports/daily-2025-01-15.zip","filters": {"allowedExtensions": [".csv", ".json"],"blockedExtensions": [".exe", ".dll"]},"incremental": {"enabled": true,"onlyNewOrChanged": true},"outputOptions": {"datasetName": "daily-extracts","storeName": "raw-files"},"webhook": {"url": "https://pipeline.yourcompany.com/archive-complete","secret": "your-hmac-secret"}}
Workflow:
- CSVs are extracted to the raw-files KV Store
- The file index is written to the daily-extracts Dataset
- Webhook triggers a downstream validation Actor
- Unchanged files are skipped on subsequent runs (incremental mode)
Benefits: Fully automated, cost-optimized, with webhook integration for pipeline orchestration.
2. Legacy Archive Migration: Enterprise Data Modernization
Scenario: 20 years of legacy backups (nested TARs and RARs) need migration from on-premise storage to cloud object storage. Archives are deeply nested (3-4 levels) with mixed compression.
Implementation:
{"datasetId": "legacy-archive-inventory","urlField": "backupUrls","formats": {"extractNestedArchives": true,"nestedArchiveDepth": 4,"archiveTypes": ["tar", "rar", "zip", "7z", "gz", "bz2"]},"limits": {"maxDownloadBytes": 2000000000,"maxFileSizeBytes": 2000000000,"maxFiles": 0},"concurrency": 5,"errorHandling": {"mode": "lenient","maxPerArchiveErrors": 100}}
Workflow:
- Archive URLs are read from inventory dataset
- 4 levels deep recursive extraction is performed
- Errors are logged without stopping (lenient mode)
- Per-archive success rates are shown in final summary
Benefits: Corrupted/incomplete backups are handled gracefully, folder hierarchy is preserved, audit trail is provided via dataset.
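The per-archive success rates mentioned in the workflow come from the SUMMARY record. A sketch of deriving them client-side, assuming the byArchive structure shown later under Summary Payload; STORE_ID is a placeholder:

# Sketch: derive per-archive extraction rates from the SUMMARY record.
STORE_ID="your-kv-store-id"   # placeholder

curl -s "https://api.apify.com/v2/key-value-stores/${STORE_ID}/records/SUMMARY" \
  | jq -r '.byArchive | to_entries[]
           | "\(.key): \(.value.filesExtracted)/\(.value.filesDiscovered) files extracted, \(.value.filesErrored) errors"'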
3. Security-First Filtering: Malware-Free Document Extraction
Scenario: Documents (PDFs, DOCs) need extraction from user-uploaded archives while blocking executables and scripts. Archives may come from untrusted sources.
Implementation:
{"url": "https://uploads.example.com/user-123/documents.zip","filters": {"allowedExtensions": [".pdf", ".doc", ".docx", ".txt", ".md"],"blockedExtensions": [".exe", ".dll", ".bat", ".cmd", ".sh", ".ps1", ".msi", ".scr", ".vbs", ".js"],"excludedPatterns": ["__MACOSX/", ".DS_Store", "desktop.ini"]},"limits": {"maxDownloadBytes": 104857600,"maxFileSizeBytes": 524288000,"maxFiles": 500},"formats": {"extractNestedArchives": false},"errorHandling": {"mode": "strict"}}
Workflow:
- User archive is downloaded with size guard
- Executables and scripts are rejected
- Only documents are extracted to KV Store
- Processing aborts if malicious content is detected (strict mode)
Benefits: Downstream systems are protected from malware, attack surface is reduced, audit trail is provided.
Outputs: Understanding the Dual-Storage Model
Apify's two-storage architecture is used to provide maximum flexibility:
KV Store (Key-Value Store)
Purpose: Raw binary content storage for direct file downloads
Contents:
- Extracted files: Stored with archive paths as keys (e.g., reports/2025/january.pdf)
- Flattened mode: Optional deterministic keys like january.pdf-a1b2c3d4 to avoid collisions
- Summary record: JSON at the SUMMARY key (or a custom key) with run statistics
- Incremental indexes: State records at INCR::{sha1(archiveUrl)} for change tracking
- Output pointers: When custom names are used, OUTPUT_POINTERS contains destination metadata
Access: Apify Console → Storage → Key-Value Stores → (your store name) → Records
API endpoint: {{links.apiDefaultKeyValueStoreUrl}}/records/{KEY}
Dataset (Structured Index)
Purpose: Queryable per-file metadata for filtering, searching, and analytics
Schema:
{"archiveUrl": "https://example.com/data.zip#nested.tar.gz","path": "reports/2025/january.csv","kvKey": "reports/2025/january.csv","sizeBytes": 1048576,"extension": ".csv","status": "EXTRACTED","nestedDepth": 1,"incrementalStatus": "changed","errorCode": null,"errorMessage": null}
Statuses:
- EXTRACTED: Successfully extracted and stored
- SKIPPED_UNCHANGED: Incremental mode detected no changes
- SKIPPED_FILTERED: Blocked by extension/path filters
- SKIPPED_MAX_FILES: Exceeded file count limit
- TOO_LARGE: File size exceeds maxFileSizeBytes
- ERROR: Extraction failed (see errorCode/errorMessage)
- DOWNLOAD_ERROR: Archive download failed
- ARCHIVE_ERROR: Archive-level error (corrupt, unsupported)
- SKIPPED_NESTED_TOO_DEEP: Nested beyond nestedArchiveDepth
Incremental statuses: new, changed, unchanged, or null (when incremental disabled)
Access: Apify Console → Storage → Datasets → (your dataset name) → Items
API endpoint: {{links.apiDefaultDatasetUrl}}/items
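A quick way to summarize a run from this index is to aggregate items by status. A minimal sketch; DATASET_ID is a placeholder:

# Sketch: count dataset index items per extraction status.
DATASET_ID="your-dataset-id"   # placeholder

curl -s "https://api.apify.com/v2/datasets/${DATASET_ID}/items?format=json" \
  | jq 'group_by(.status) | map({status: .[0].status, count: length})'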
Custom Output Destinations
When custom storeName or datasetName is specified in outputOptions, a pointer row is written to help locate the data:
Dataset pointer row:
{"type": "pointer","kvStoreName": "my-custom-store","kvStoreId": "abc123","kvUrl": "https://api.apify.com/v2/key-value-stores/abc123/records","datasetName": "my-custom-dataset","datasetId": "xyz789","datasetUrl": "https://api.apify.com/v2/datasets/xyz789/items","summaryKey": "SUMMARY","summaryUrl": "https://api.apify.com/v2/key-value-stores/abc123/records/SUMMARY","incrementalIndexPrefix": "INCR::"}
KV Store pointer (OUTPUT_POINTERS key): Same metadata is contained in the default KV store even when custom destinations are used.
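When custom destinations are used, downstream jobs can resolve them from the pointer row instead of hard-coding store or dataset IDs. A minimal sketch that reads the default dataset and follows the datasetUrl field; DEFAULT_DATASET_ID is a placeholder:

# Sketch: resolve a custom output dataset from the pointer row, then fetch its items.
DEFAULT_DATASET_ID="your-default-dataset-id"   # placeholder

CUSTOM_DATASET_URL=$(curl -s "https://api.apify.com/v2/datasets/${DEFAULT_DATASET_ID}/items?format=json" \
  | jq -r '[.[] | select(.type == "pointer")][0].datasetUrl')

curl -s "${CUSTOM_DATASET_URL}?format=json" | jq 'length'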
Summary Payload: Run Statistics & Webhook Format
Stored in KV Store at SUMMARY key (or custom key) and optionally POSTed to webhook URL:
{"startedAt": "2025-01-15T10:00:00.000Z","finishedAt": "2025-01-15T10:15:32.128Z","totals": {"archivesProcessed": 25,"archivesFailed": 1,"downloadsFailed": 0,"unsupportedArchives": 0,"archiveErrors": 1,"filesDiscovered": 12483,"filesExtracted": 11250,"filesSkipped": 1180,"filesErrored": 53,"skippedTooLarge": 45,"skippedFiltered": 320,"skippedUnchanged": 815,"skippedMaxFiles": 0,"skippedNestedTooDeep": 0,"incrementalNew": 10200,"incrementalChanged": 1050,"incrementalUnchanged": 815,"nestedArchivesProcessed": 78},"byExtension": {".csv": { "files": 3500, "extracted": 3450 },".json": { "files": 2800, "extracted": 2700 },".pdf": { "files": 1950, "extracted": 1925 },".xml": { "files": 1200, "extracted": 1180 },".txt": { "files": 800, "extracted": 780 }},"byArchive": {"https://example.com/data-2025-01-15.zip": {"filesDiscovered": 523,"filesExtracted": 480,"filesSkipped": 38,"filesErrored": 5,"error": null}},"incremental": {"enabled": true,"strategy": "crc+size","indexKeyPrefix": "INCR::"},"nestedArchives": {"enabled": true,"maxDepth": 2}}
Webhook signature: If webhook.secret is provided, the POST includes header:
x-universal-archive-signature: HMAC_HEX
HMAC-SHA256 of request body is computed using the secret for authenticity verification.
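On the receiving end, the signature can be checked by recomputing the HMAC over the raw POST body and comparing it to the header. A minimal sketch with openssl; body.json stands for the raw request body exactly as received, and SIGNATURE for the x-universal-archive-signature header value:

# Sketch: verify the webhook signature on the receiver side.
WEBHOOK_SECRET="your-hmac-secret"   # placeholder, the same value as webhook.secret
SIGNATURE="received-header-value"   # placeholder

EXPECTED=$(openssl dgst -sha256 -hmac "$WEBHOOK_SECRET" -r body.json | cut -d' ' -f1)
if [ "$EXPECTED" = "$SIGNATURE" ]; then
  echo "signature valid"
else
  echo "signature mismatch, reject the payload"
fi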
Input Configuration: Quick Reference
Required (choose exactly one)
// Option 1: Single URL
{ "url": "https://example.com/data.zip" }

// Option 2: Multiple URLs
{ "urls": ["https://ex.com/a.zip", "https://ex.com/b.tar.gz"] }

// Option 3: Dataset source
{ "datasetId": "abc123", "urlField": "archiveUrl" }
Recommended Settings
{"outputMode": "both","concurrency": 10,"incremental": {"enabled": true,"strategy": "crc+size"},"formats": {"extractNestedArchives": true,"nestedArchiveDepth": 2},"filters": {"blockedExtensions": [".exe", ".dll", ".bat", ".sh"]},"limits": {"maxDownloadBytes": 524288000,"maxFiles": 1000},"errorHandling": {"mode": "lenient"}}
Advanced: High-Volume Processing
{"urls": ["https://archives.example.com/dump-01.tar.gz","https://archives.example.com/dump-02.tar.gz","https://archives.example.com/dump-03.tar.gz"],"concurrency": 25,"limits": {"maxDownloadBytes": 2000000000,"maxFileSizeBytes": 0,"maxFiles": 0},"incremental": {"enabled": true,"strategy": "sizeOnly"},"httpOptions": {"useApifyProxy": true,"apifyProxyGroups": ["RESIDENTIAL"]}}
Running on Apify Platform
Getting Started:
- Create an Apify account (free tier available)
- Find this Actor in the Apify Store
Running the Actor:
- Navigate to Apify Console → Actors → (your actor)
- Configure input via UI or JSON
- Click "Start"
- Monitor run logs in real-time
Accessing Outputs:
- Go to Storage → Key-Value Stores / Datasets
- Or use API endpoints from output schema
Performance & Cost Optimization
Compute Unit Usage
- Base cost: Depends on archive type, nesting, and memory; CU usage can be monitored per run in Apify Console
- Incremental savings: 70-90% reduction on subsequent runs
- Nested archives: +20% overhead per nesting level (recursive extraction)
Optimization Strategies
- Incremental mode should be enabled for recurring jobs
- sizeOnly strategy can be used for archives >1 GB with frequent updates
- maxFiles and maxFileSizeBytes should be set to prevent runaway costs
- concurrency should be reduced if memory limits are hit (default: 10)
- allowedExtensions should be used instead of blockedExtensions for targeted extraction
Memory Considerations
- Default limits: Suitable for archives up to 500 MB with 1000 files
- Large archives: Actor memory can be increased in Apify Console (1 GB, 2 GB, 4 GB)
- Nested archives: Each nesting level requires temporary disk space (factor 2-3x archive size)
Troubleshooting
"UNSUPPORTED_ARCHIVE_TYPE" errors
Cause: Archive format not in formats.archiveTypes allowlist
Solution:
- Enable debug mode: "debug": true
- Check logs for "Supported formats: ..."
- Add the detected format to the allowlist, or use ["auto"]
"DOWNLOAD_ERROR: Size limit exceeded"
Cause: Archive larger than maxDownloadBytes (500 MB default)
Solution:
{"limits": {"maxDownloadBytes": 2000000000}}
Incremental mode not skipping files
Cause: Archive URL changed (different domain/path/query params)
Solution: The incremental index is keyed by the SHA1 of the archive URL. Ensure URLs stay consistent between runs, or manually copy the INCR:: keys between runs.
Out of memory errors
Cause: Archive too large for available Actor memory
Solutions:
- Actor memory can be increased in Apify Console
- concurrency can be reduced to free memory per archive
- maxFiles can be set to limit extraction scope
- Nested archive extraction can be disabled: "extractNestedArchives": false
Webhook not receiving POST
Cause: Webhook URL unreachable or HMAC signature validation failing
Debug:
- Actor logs should be checked for webhook POST details
- Webhook URL accessibility should be verified
- HMAC signature should be validated: HMAC-SHA256(requestBody, secret)
- Reverse proxy/firewall rules blocking Apify IPs should be checked
API Integration Examples
Trigger run via Apify API
curl -X POST "https://api.apify.com/v2/acts/{ACTOR_ID}/runs" \
  -H "Authorization: Bearer {APIFY_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/data.zip", "incremental": { "enabled": true }}'
Fetch extracted files
# Get the file index
curl "https://api.apify.com/v2/datasets/{DATASET_ID}/items?format=json"

# Download a specific file
curl "https://api.apify.com/v2/key-value-stores/{STORE_ID}/records/reports/2025/data.csv" \
  -o data.csv
Check run status
curl "https://api.apify.com/v2/actor-runs/{RUN_ID}" \-H "Authorization: Bearer {APIFY_TOKEN}"
Frequently Asked Questions
Q: Can password-protected archives be extracted?
A: Not currently supported.
Q: What's the maximum archive size?
A: Default 500 MB via maxDownloadBytes. Higher values can be configured, but Actor memory/disk should be sufficient and timeouts adjusted for very large archives.
Q: How long are extracted files stored?
A: KV Store/Dataset records are retained according to data retention policy (default: 7 days for free tier, configurable for paid plans).
Q: Can specific files be extracted without downloading the entire archive?
A: Not directly. Full archive download is required by 7-Zip for extraction. allowedExtensions or excludedPatterns can be used to filter post-download.
Q: Is streaming extraction supported?
A: No. Archives are downloaded to disk, then extracted. Streaming extraction is incompatible with nested archive detection and incremental logic.
Feature Requests & Roadmap
Feedback is actively collected to improve this Actor. Features under consideration:
Potential Future Enhancements:
- Password-protected archive support (with secure credential management)
- Streaming extraction for extremely large archives
- Direct S3/Azure Blob storage integration (skip KV Store for huge datasets)
- Archive repair/recovery for corrupted files
- Custom extraction callbacks for advanced filtering logic
- Multi-part archive support (.zip.001, .zip.002, etc.)
- Archive metadata extraction (comments, timestamps, permissions)
Submit Your Request:
Bug reports and feature requests can be submitted via the Issues tab on this Actor's page in the Apify Console. Navigate to the Actor → Issues to view existing requests or create new ones.
Features are prioritized based on user demand and technical feasibility.
License & Support
This Actor is available on the Apify Platform.
Support Channels:
- Actor Issues: Use the Issues tab on this Actor's page for bug reports and feature requests
- Apify Community Forum: https://community.apify.com for general questions and discussions
- Apify Support: support@apify.com for platform-related issues (Apify customers)
Keywords for Search Optimization
7-Zip extractor, recursive archive extraction, incremental unzip, nested archive processor, bulk archive downloader, RAR extractor API, archive automation, 7z batch extractor, ISO file extractor, archive change detection, TAR extractor, GZIP extractor, BZIP2 extractor, XZ extractor, automated archive processing, cloud archive extraction, Apify archive actor, unzip API, zip extractor, archive decompression, legacy archive migration, archive pipeline automation, CRC-based incremental extraction, nested ZIP extractor, recursive TAR extraction, multi-level archive processing, enterprise archive solution, archive format converter, batch unzip tool, automated file extraction, archive metadata indexing, secure archive extraction, malware-free extraction, archive filtering, compression format support, archive validation, incremental file processing, archival data extraction, backup archive extraction, data pipeline automation, archive security filtering, compute cost optimization, cloud-native extraction