7-Zip Recursive Archive Extractor: Enterprise-Grade Archive Automation
Extract, process, and index 100GB+ archives with recursive nesting, incremental updates, and CRC-based change detection - all powered by 7-Zip on Apify's cloud infrastructure. Configure download/file limits and Actor memory to match your target sizes.
The Powerhouse Archive Solution You've Been Looking For
Most unzip tools choke on nested archives, can't skip unchanged files, or force you to download everything locally. This Apify Actor is different: it's a high-performance 7-Zip extractor designed for automated data pipelines, legacy archive migration, and security-first filtering at cloud scale.
What makes this a "powerhouse"?
- 30+ Format Support: ZIP, RAR, 7Z, TAR, GZIP, BZIP2, XZ, ISO, CAB, ARC, ZIPX, and every format 7-Zip recognizes
- Recursive Extraction: Automatically detects and processes archives inside archives (up to configurable depth)
- Incremental Intelligence: CRC32+size signatures let you skip unchanged files between runs - saving up to 90% of compute costs
- Dual-Storage Architecture: Raw files go to KV Store (direct download links), metadata goes to Dataset (structured queries)
- Security Hardened: Blocks executables by default, validates paths against traversal attacks, enforces size limits
How It Works: The Data Flow
URL Input  /  URL List  /  Dataset (URL field)
                    |
                    v
          7-Zip Extractor Engine
          - Download with guards
          - Detect format
          - Incremental check
          - Extract files
          - Recurse if nested
                    |
        +-----------+-----------+
        v                       v
  KV Store (Raw Files)    Dataset (Structured Index)
  - files                 - archiveUrl
  - SUMMARY               - path
  - INCR::{sha1}          - kvKey
  - OUTPUT_...            - status
                          - incrementalStatus
                          - pointer rows
Result: Both downloadable files (KV Store) and a searchable index (Dataset) are provided with per-file metadata including extraction status, size, extension, and incremental comparison results.
The Format Wall: Every Extension Supported
7-Zip's comprehensive format support is leveraged to handle virtually any compressed or archived file:
Primary Archive Formats
ZIP • RAR • 7Z • TAR • GZIP • BZIP2 • XZ • LZMA
Disk Image & Container Formats
ISO • VHD • VHDX • WIM • SWM • DMG • HFS • NTFS • FAT • SquashFS • UDF
Legacy & Specialty Formats
CAB • CHM • MSI • ARJ • LZH • CPIO • RPM • DEB • ARC • ZIPX • SWF/SWFC • NSIS
Compressed TAR Variants
TGZ (tar.gz) • TBZ2 (tar.bz2) • TXZ (tar.xz) • TAR.LZMA • TAR.Z
Single-File Compression
Z • LZW • LZIP • LZOP • ZSTD • BROTLI • BASE64 • HASH
Can't find your format? If 7-Zip can list it with 7z l, it can be extracted. Check the debug logs for supported format detection.
Key Advantages: The Technical Moat
1. Incremental Extraction: The Compute Cost Killer
Traditional extractors re-process every file on every run. This Actor tracks file signatures (CRC32 + size) per archive URL and builds an incremental index:
How it works:
- First run: All files are extracted, CRC32 checksums computed, and signatures stored in KV Store under INCR::{sha1(archiveUrl)}
- Subsequent runs: Current archive contents are compared against stored signatures
- Unchanged files are skipped (marked SKIPPED_UNCHANGED in the dataset)
- Only new or modified files are re-extracted (marked new or changed)
Real-world impact: A daily job processing a 10GB archive with 5,000 files where only 100 files change will:
- Original approach: Extract 5,000 files every day
- Incremental approach: Extract 100 files after first run (98% reduction)
- Cost savings: ~90% reduction in compute units
Configuration:
{"incremental": {"enabled": true,"strategy": "crc+size","onlyNewOrChanged": true}}
Note: sizeOnly strategy can be used for massive archives where CRC computation overhead outweighs accuracy benefits.
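To see what the incremental index actually stores, the state record for a given archive URL can be fetched straight from the Key-Value Store. A minimal sketch, assuming the key is literally INCR:: followed by the hex SHA1 of the archive URL as described above; ARCHIVE_URL and STORE_ID are placeholders you supply:

# Sketch: fetch the stored incremental index for one archive URL.
# Assumes the key format "INCR::" + hex SHA1 of the URL, per the description above.
ARCHIVE_URL="https://example.com/data.zip"   # placeholder
STORE_ID="your-kv-store-id"                  # placeholder

KEY="INCR::$(printf '%s' "$ARCHIVE_URL" | sha1sum | cut -d' ' -f1)"
curl -s "https://api.apify.com/v2/key-value-stores/${STORE_ID}/records/${KEY}" | jq .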
2. Recursive Nested Archive Extraction
Legacy systems often store "archives within archives" (e.g., daily backups as backup-2025-01-15.zip inside monthly-archives.tar.gz inside year-2025.7z). Most tools require manual multi-pass extraction.
Automatic nesting handling:
- Nested archives are detected by extension (zip, rar, 7z, tar, gz, bz2, xz, tgz, iso, cab, arc, zipx)
- Parent is extracted → contents scanned → child archives extracted → repeated up to the configured depth
- Nesting level is tracked in the dataset (nestedDepth: 0, 1, 2, ...)
- archiveUrl includes a #path suffix to trace file origins: https://ex.com/data.zip#backup/2025/jan.tar.gz#reports/ (see the query sketch at the end of this section)
Configuration:
{"formats": {"extractNestedArchives": true,"nestedArchiveDepth": 2,"nestedBeyondDepthBehavior": "skip"}}
Use cases:
- Legacy tape backups: Multi-level TAR archives from enterprise backup systems
- Software distributions: Installers containing compressed packages containing archives
- Data dumps: Database exports compressed multiple times for size reduction
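To trace nested provenance in the output, the dataset index can be queried on the nestedDepth and #-suffixed archiveUrl fields described above. A minimal sketch with jq; DATASET_ID is a placeholder and the field names follow the schema documented under Outputs below:

# Sketch: list files that came out of nested archives, with their provenance chain.
DATASET_ID="your-dataset-id"   # placeholder

curl -s "https://api.apify.com/v2/datasets/${DATASET_ID}/items?format=json" \
  | jq -r '.[] | select(.nestedDepth > 0) | "\(.nestedDepth)\t\(.archiveUrl)\t\(.path)"'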
3. Data Cleaning & UTF-8 Normalization
International datasets often contain mixed encodings (Windows-1252, ISO-8859-1, Shift-JIS, etc.), causing "mojibake" (corrupt characters) when parsed.
Text normalization features:
- Common text extensions are treated as candidates for UTF-8 verification (.txt, .csv, .json, .xml, .html, .md, .log, .ini, .yaml)
- When enabled, files are decoded as UTF-8 and re-stored in UTF-8 form
- If decoding fails (file is not valid UTF-8), original bytes are stored to avoid corruption
- Normalized content is stored in KV Store when outputMode includes KV
Configuration:
{"textOptions": {"convertTextToUtf8": true,"textExtensions": [".txt", ".csv", ".json", ".xml", ".html", ".log"]}}
Result: Best-effort UTF-8 normalization for files already encoded in UTF-8; other encodings are preserved unchanged to avoid data loss.
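The check this best-effort behaviour relies on can be reproduced locally on any downloaded text record. The sketch below only illustrates what "decoding fails" means; it is not the Actor's internal code, and STORE_ID plus the record key are placeholders:

# Sketch: test whether a downloaded text record is valid UTF-8.
# iconv exits with a non-zero status on the first invalid byte sequence.
STORE_ID="your-kv-store-id"   # placeholder
curl -s "https://api.apify.com/v2/key-value-stores/${STORE_ID}/records/reports/2025/january.csv" \
  -o january.csv
if iconv -f UTF-8 -t UTF-8 january.csv > /dev/null 2>&1; then
  echo "valid UTF-8"
else
  echo "not valid UTF-8 (the Actor would keep the original bytes unchanged)"
fi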
4. Security-First Architecture
Archives from untrusted sources can contain malware, path traversal exploits, or resource exhaustion attacks. Multiple defensive layers are implemented:
Security features:
- Extension blocklist: Executables are rejected by default (.exe, .dll, .bat, .cmd, .sh, .ps1, .msi)
- Path validation: Absolute paths, drive letters, .. segments, and self-referential entries are blocked
- Size guards: maxDownloadBytes (500 MB default) and maxFileSizeBytes (500 MB default) are enforced
- File count limits: Processing stops after maxFiles (1000 default) to prevent zip bombs
- Timeout protection: 7-Zip operations exceeding listTimeoutMillis are aborted
Customization:
{"filters": {"blockedExtensions": [".exe", ".dll", ".bat", ".sh", ".ps1", ".msi", ".scr"],"excludedPatterns": ["__MACOSX/", ".DS_Store", "Thumbs.db"]},"limits": {"maxDownloadBytes": 524288000,"maxFileSizeBytes": 524288000,"maxFiles": 1000}}
Use Cases: Real-World Applications
1. Automated Data Pipeline: Daily Archive Ingestion
Scenario: Your company receives daily data exports as ZIP files on an FTP server. CSVs need extraction, schema validation, and loading into a data warehouse.
Implementation:
{"url": "https://ftp.partner.com/exports/daily-2025-01-15.zip","filters": {"allowedExtensions": [".csv", ".json"],"blockedExtensions": [".exe", ".dll"]},"incremental": {"enabled": true,"onlyNewOrChanged": true},"outputOptions": {"datasetName": "daily-extracts","storeName": "raw-files"},"webhook": {"url": "https://pipeline.yourcompany.com/archive-complete","secret": "your-hmac-secret"}}
Workflow:
- CSVs are extracted to the raw-files KV Store
- The file index is written to the daily-extracts Dataset
- Webhook triggers a downstream validation Actor
- Unchanged files are skipped on subsequent runs (incremental mode)
Benefits: Fully automated, cost-optimized, with webhook integration for pipeline orchestration.
2. Legacy Archive Migration: Enterprise Data Modernization
Scenario: 20 years of legacy backups (nested TARs and RARs) need migration from on-premise storage to cloud object storage. Archives are deeply nested (3-4 levels) with mixed compression.
Implementation:
{"datasetId": "legacy-archive-inventory","urlField": "backupUrls","formats": {"extractNestedArchives": true,"nestedArchiveDepth": 4,"archiveTypes": ["tar", "rar", "zip", "7z", "gz", "bz2"]},"limits": {"maxDownloadBytes": 2000000000,"maxFileSizeBytes": 2000000000,"maxFiles": 0},"concurrency": 5,"errorHandling": {"mode": "lenient","maxPerArchiveErrors": 100}}
Workflow:
- Archive URLs are read from inventory dataset
- 4 levels deep recursive extraction is performed
- Errors are logged without stopping (lenient mode)
- Per-archive success rates are shown in final summary
Benefits: Corrupted/incomplete backups are handled gracefully, folder hierarchy is preserved, audit trail is provided via dataset.
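The per-archive success rates mentioned in the workflow come from the SUMMARY record. A sketch of deriving them client-side, assuming the byArchive structure shown later under Summary Payload; STORE_ID is a placeholder:

# Sketch: derive per-archive extraction rates from the SUMMARY record.
STORE_ID="your-kv-store-id"   # placeholder

curl -s "https://api.apify.com/v2/key-value-stores/${STORE_ID}/records/SUMMARY" \
  | jq -r '.byArchive | to_entries[]
           | "\(.key): \(.value.filesExtracted)/\(.value.filesDiscovered) files extracted, \(.value.filesErrored) errors"'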
3. Security-First Filtering: Malware-Free Document Extraction
Scenario: Documents (PDFs, DOCs) need extraction from user-uploaded archives while blocking executables and scripts. Archives may come from untrusted sources.
Implementation:
{"url": "https://uploads.example.com/user-123/documents.zip","filters": {"allowedExtensions": [".pdf", ".doc", ".docx", ".txt", ".md"],"blockedExtensions": [".exe", ".dll", ".bat", ".cmd", ".sh", ".ps1", ".msi", ".scr", ".vbs", ".js"],"excludedPatterns": ["__MACOSX/", ".DS_Store", "desktop.ini"]},"limits": {"maxDownloadBytes": 104857600,"maxFileSizeBytes": 524288000,"maxFiles": 500},"formats": {"extractNestedArchives": false},"errorHandling": {"mode": "strict"}}
Workflow:
- User archive is downloaded with size guard
- Executables and scripts are rejected
- Only documents are extracted to KV Store
- Processing aborts if malicious content is detected (strict mode)
Benefits: Downstream systems are protected from malware, attack surface is reduced, audit trail is provided.
Outputs: Understanding the Dual-Storage Model
Apify's two-storage architecture is used to provide maximum flexibility:
KV Store (Key-Value Store)
Purpose: Raw binary content storage for direct file downloads
Contents:
- Extracted files: Stored with archive paths as keys (e.g., reports/2025/january.pdf)
- Flattened mode: Optional deterministic keys like january.pdf-a1b2c3d4 to avoid collisions
- Summary record: JSON at the SUMMARY key (or a custom key) with run statistics
- Incremental indexes: State records at INCR::{sha1(archiveUrl)} for change tracking
- Output pointers: When custom names are used, OUTPUT_POINTERS contains destination metadata
Access: Apify Console → Storage → Key-Value Stores → (your store name) → Records
API endpoint: {{links.apiDefaultKeyValueStoreUrl}}/records/{KEY}
Dataset (Structured Index)
Purpose: Queryable per-file metadata for filtering, searching, and analytics
Schema:
{"archiveUrl": "https://example.com/data.zip#nested.tar.gz","path": "reports/2025/january.csv","kvKey": "reports/2025/january.csv","sizeBytes": 1048576,"extension": ".csv","status": "EXTRACTED","nestedDepth": 1,"incrementalStatus": "changed","errorCode": null,"errorMessage": null}
Statuses:
- EXTRACTED: Successfully extracted and stored
- SKIPPED_UNCHANGED: Incremental mode detected no changes
- SKIPPED_FILTERED: Blocked by extension/path filters
- SKIPPED_MAX_FILES: Exceeded file count limit
- TOO_LARGE: File size exceeds maxFileSizeBytes
- ERROR: Extraction failed (see errorCode/errorMessage)
- DOWNLOAD_ERROR: Archive download failed
- ARCHIVE_ERROR: Archive-level error (corrupt, unsupported)
- SKIPPED_NESTED_TOO_DEEP: Nested beyond nestedArchiveDepth
Incremental statuses: new, changed, unchanged, or null (when incremental disabled)
Access: Apify Console → Storage → Datasets → (your dataset name) → Items
API endpoint: {{links.apiDefaultDatasetUrl}}/items
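A quick way to summarize a run from this index is to aggregate items by status. A minimal sketch; DATASET_ID is a placeholder:

# Sketch: count dataset index items per extraction status.
DATASET_ID="your-dataset-id"   # placeholder

curl -s "https://api.apify.com/v2/datasets/${DATASET_ID}/items?format=json" \
  | jq 'group_by(.status) | map({status: .[0].status, count: length})'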
Custom Output Destinations
When custom storeName or datasetName is specified in outputOptions, a pointer row is written to help locate the data:
Dataset pointer row:
{"type": "pointer","kvStoreName": "my-custom-store","kvStoreId": "abc123","kvUrl": "https://api.apify.com/v2/key-value-stores/abc123/records","datasetName": "my-custom-dataset","datasetId": "xyz789","datasetUrl": "https://api.apify.com/v2/datasets/xyz789/items","summaryKey": "SUMMARY","summaryUrl": "https://api.apify.com/v2/key-value-stores/abc123/records/SUMMARY","incrementalIndexPrefix": "INCR::"}
KV Store pointer (OUTPUT_POINTERS key): Same metadata is contained in the default KV store even when custom destinations are used.
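When custom destinations are used, downstream jobs can resolve them from the pointer row instead of hard-coding store or dataset IDs. A minimal sketch that reads the default dataset and follows the datasetUrl field; DEFAULT_DATASET_ID is a placeholder:

# Sketch: resolve a custom output dataset from the pointer row, then fetch its items.
DEFAULT_DATASET_ID="your-default-dataset-id"   # placeholder

CUSTOM_DATASET_URL=$(curl -s "https://api.apify.com/v2/datasets/${DEFAULT_DATASET_ID}/items?format=json" \
  | jq -r '[.[] | select(.type == "pointer")][0].datasetUrl')

curl -s "${CUSTOM_DATASET_URL}?format=json" | jq 'length'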
Summary Payload: Run Statistics & Webhook Format
Stored in KV Store at SUMMARY key (or custom key) and optionally POSTed to webhook URL:
{"startedAt": "2025-01-15T10:00:00.000Z","finishedAt": "2025-01-15T10:15:32.128Z","totals": {"archivesProcessed": 25,"archivesFailed": 1,"downloadsFailed": 0,"unsupportedArchives": 0,"archiveErrors": 1,"filesDiscovered": 12483,"filesExtracted": 11250,"filesSkipped": 1180,"filesErrored": 53,"skippedTooLarge": 45,"skippedFiltered": 320,"skippedUnchanged": 815,"skippedMaxFiles": 0,"skippedNestedTooDeep": 0,"incrementalNew": 10200,"incrementalChanged": 1050,"incrementalUnchanged": 815,"nestedArchivesProcessed": 78},"byExtension": {".csv": { "files": 3500, "extracted": 3450 },".json": { "files": 2800, "extracted": 2700 },".pdf": { "files": 1950, "extracted": 1925 },".xml": { "files": 1200, "extracted": 1180 },".txt": { "files": 800, "extracted": 780 }},"byArchive": {"https://example.com/data-2025-01-15.zip": {"filesDiscovered": 523,"filesExtracted": 480,"filesSkipped": 38,"filesErrored": 5,"error": null}},"incremental": {"enabled": true,"strategy": "crc+size","indexKeyPrefix": "INCR::"},"nestedArchives": {"enabled": true,"maxDepth": 2}}
Webhook signature: If webhook.secret is provided, the POST includes header:
x-universal-archive-signature: HMAC_HEX
HMAC-SHA256 of request body is computed using the secret for authenticity verification.
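On the receiving end, the signature can be checked by recomputing the HMAC over the raw POST body and comparing it to the header. A minimal sketch with openssl; body.json stands for the raw request body exactly as received, and SIGNATURE for the x-universal-archive-signature header value:

# Sketch: verify the webhook signature on the receiver side.
WEBHOOK_SECRET="your-hmac-secret"   # placeholder, the same value as webhook.secret
SIGNATURE="received-header-value"   # placeholder

EXPECTED=$(openssl dgst -sha256 -hmac "$WEBHOOK_SECRET" -r body.json | cut -d' ' -f1)
if [ "$EXPECTED" = "$SIGNATURE" ]; then
  echo "signature valid"
else
  echo "signature mismatch, reject the payload"
fi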
Input Configuration: Quick Reference
Required (choose exactly one)
// Option 1: Single URL
{ "url": "https://example.com/data.zip" }

// Option 2: Multiple URLs
{ "urls": ["https://ex.com/a.zip", "https://ex.com/b.tar.gz"] }

// Option 3: Dataset source
{ "datasetId": "abc123", "urlField": "archiveUrl" }
Recommended Settings
{"outputMode": "both","concurrency": 10,"incremental": {"enabled": true,"strategy": "crc+size"},"formats": {"extractNestedArchives": true,"nestedArchiveDepth": 2},"filters": {"blockedExtensions": [".exe", ".dll", ".bat", ".sh"]},"limits": {"maxDownloadBytes": 524288000,"maxFiles": 1000},"errorHandling": {"mode": "lenient"}}
Advanced: High-Volume Processing
{"urls": ["https://archives.example.com/dump-01.tar.gz","https://archives.example.com/dump-02.tar.gz","https://archives.example.com/dump-03.tar.gz"],"concurrency": 25,"limits": {"maxDownloadBytes": 2000000000,"maxFileSizeBytes": 0,"maxFiles": 0},"incremental": {"enabled": true,"strategy": "sizeOnly"},"httpOptions": {"useApifyProxy": true,"apifyProxyGroups": ["RESIDENTIAL"]}}
Running on Apify Platform
Getting Started:
- Create an Apify account (free tier available)
- Find this Actor in the Apify Store
Running the Actor:
- Navigate to Apify Console → Actors → (your actor)
- Configure input via UI or JSON
- Click "Start"
- Monitor run logs in real-time
Accessing Outputs:
- Go to Storage → Key-Value Stores / Datasets
- Or use API endpoints from output schema
Performance & Cost Optimization
Compute Unit Usage
- Base cost: Depends on archive type, nesting, and memory; CU usage can be monitored per run in Apify Console
- Incremental savings: 70-90% reduction on subsequent runs
- Nested archives: +20% overhead per nesting level (recursive extraction)
Optimization Strategies
- Incremental mode should be enabled for recurring jobs
- sizeOnly strategy can be used for archives >1 GB with frequent updates
- maxFiles and maxFileSizeBytes should be set to prevent runaway costs
- concurrency should be reduced if memory limits are hit (default: 10)
- allowedExtensions should be used instead of blockedExtensions for targeted extraction
Memory Considerations
- Default limits: Suitable for archives up to 500 MB with 1000 files
- Large archives: Actor memory can be increased in Apify Console (1 GB, 2 GB, 4 GB)
- Nested archives: Each nesting level requires temporary disk space (factor 2-3x archive size)
Troubleshooting
"UNSUPPORTED_ARCHIVE_TYPE" errors
Cause: Archive format not in formats.archiveTypes allowlist
Solution:
- Enable debug mode: "debug": true
- Check logs for "Supported formats: ..."
- Add the detected format to the allowlist, or use ["auto"]
"DOWNLOAD_ERROR: Size limit exceeded"
Cause: Archive larger than maxDownloadBytes (500 MB default)
Solution:
{"limits": {"maxDownloadBytes": 2000000000}}
Incremental mode not skipping files
Cause: Archive URL changed (different domain/path/query params)
Solution: The incremental index is keyed by the SHA1 of the archive URL. Ensure URLs stay consistent between runs, or manually copy the INCR:: keys between runs.
Out of memory errors
Cause: Archive too large for available Actor memory
Solutions:
- Actor memory can be increased in Apify Console
- concurrency can be reduced to free memory per archive
- maxFiles can be set to limit extraction scope
- Nested archive extraction can be disabled: "extractNestedArchives": false
Webhook not receiving POST
Cause: Webhook URL unreachable or HMAC signature validation failing
Debug:
- Actor logs should be checked for webhook POST details
- Webhook URL accessibility should be verified
- HMAC signature should be validated: HMAC-SHA256(requestBody, secret)
- Reverse proxy/firewall rules blocking Apify IPs should be checked
API Integration Examples
Trigger run via Apify API
curl -X POST "https://api.apify.com/v2/acts/{ACTOR_ID}/runs" \
  -H "Authorization: Bearer {APIFY_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/data.zip", "incremental": { "enabled": true }}'
Fetch extracted files
# Get the file index
curl "https://api.apify.com/v2/datasets/{DATASET_ID}/items?format=json"

# Download a specific file
curl "https://api.apify.com/v2/key-value-stores/{STORE_ID}/records/reports/2025/data.csv" \
  -o data.csv
Check run status
curl "https://api.apify.com/v2/actor-runs/{RUN_ID}" \-H "Authorization: Bearer {APIFY_TOKEN}"
Frequently Asked Questions
Q: Can password-protected archives be extracted?
A: Not currently supported.
Q: What's the maximum archive size?
A: Default 500 MB via maxDownloadBytes. Higher values can be configured, but Actor memory/disk should be sufficient and timeouts adjusted for very large archives.
Q: How long are extracted files stored?
A: KV Store/Dataset records are retained according to data retention policy (default: 7 days for free tier, configurable for paid plans).
Q: Can specific files be extracted without downloading the entire archive?
A: Not directly. Full archive download is required by 7-Zip for extraction. allowedExtensions or excludedPatterns can be used to filter post-download.
Q: Is streaming extraction supported?
A: No. Archives are downloaded to disk, then extracted. Streaming extraction is incompatible with nested archive detection and incremental logic.
Feature Requests & Roadmap
Feedback is actively collected to improve this Actor. Features under consideration:
Potential Future Enhancements:
- Password-protected archive support (with secure credential management)
- Streaming extraction for extremely large archives
- Direct S3/Azure Blob storage integration (skip KV Store for huge datasets)
- Archive repair/recovery for corrupted files
- Custom extraction callbacks for advanced filtering logic
- Multi-part archive support (.zip.001, .zip.002, etc.)
- Archive metadata extraction (comments, timestamps, permissions)
Submit Your Request:
Bug reports and feature requests can be submitted via the Issues tab on this Actor's page in the Apify Console. Navigate to the Actor → Issues to view existing requests or create new ones.
Features are prioritized based on user demand and technical feasibility.
License & Support
This Actor is available on the Apify Platform.
Support Channels:
- Actor Issues: Use the Issues tab on this Actor's page for bug reports and feature requests
- Apify Community Forum: https://community.apify.com for general questions and discussions
- Apify Support: support@apify.com for platform-related issues (Apify customers)
Keywords for Search Optimization
7-Zip extractor, recursive archive extraction, incremental unzip, nested archive processor, bulk archive downloader, RAR extractor API, archive automation, 7z batch extractor, ISO file extractor, archive change detection, TAR extractor, GZIP extractor, BZIP2 extractor, XZ extractor, automated archive processing, cloud archive extraction, Apify archive actor, unzip API, zip extractor, archive decompression, legacy archive migration, archive pipeline automation, CRC-based incremental extraction, nested ZIP extractor, recursive TAR extraction, multi-level archive processing, enterprise archive solution, archive format converter, batch unzip tool, automated file extraction, archive metadata indexing, secure archive extraction, malware-free extraction, archive filtering, compression format support, archive validation, incremental file processing, archival data extraction, backup archive extraction, data pipeline automation, archive security filtering, compute cost optimization, cloud-native extraction