Zip Download And Extraction Scraper

Pricing

from $0.20 / 1,000 results

Download a zipped file and extract it right away, no extra moves required.

Developer

Sameer Pun

Maintained by Community

Apify Actor that:

  1. Downloads ZIP archives from direct URLs.
  2. Validates the ZIP signature (PK\x03\x04) and enforces the size limit.
  3. Extracts files safely (Zip Slip protection).
  4. Stores the ZIP and the extracted files in the key-value store (unless listOnly=true).
  5. Pushes file metadata to the Dataset.
  6. Writes a run summary to the KV key OUTPUT.
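
Steps 2 and 3 can be illustrated with a minimal sketch. It is not the Actor's actual source: the helper names hasZipSignature and safeEntryPath are hypothetical, and the path rules shown are one common way to guard against Zip Slip, assumed here for illustration.

```javascript
const path = require('node:path');

// Local file header signature at the start of a non-empty ZIP: "PK\x03\x04".
const ZIP_SIGNATURE = Buffer.from([0x50, 0x4b, 0x03, 0x04]);

// Hypothetical helper: true when the buffer begins with the ZIP signature.
function hasZipSignature(buffer) {
  return buffer.length >= 4 && buffer.subarray(0, 4).equals(ZIP_SIGNATURE);
}

// Hypothetical Zip Slip guard: normalize the entry path and reject anything
// that is absolute or climbs out of the extraction root. Returns the
// normalized relative path, or null when the entry must be skipped.
function safeEntryPath(entryPath) {
  const normalized = path.posix.normalize(entryPath.replace(/\\/g, '/'));
  if (
    normalized === '..' ||
    normalized.startsWith('../') ||
    path.posix.isAbsolute(normalized) ||
    /^[a-zA-Z]:/.test(normalized) // Windows drive prefix such as C:
  ) {
    return null;
  }
  return normalized;
}
```

An entry such as ../../etc/passwd normalizes to a path that starts with ../ and is rejected, while folder/file.txt passes through unchanged.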

Quick start (Apify Console)

  1. Open your Actor.
  2. Go to the Input tab.
  3. Fill in the ZIP URLs field with one or more direct .zip links.
  4. Keep defaults or adjust optional fields.
  5. Click Start.

After the run:

  • Dataset items are in Storage -> Dataset.
  • Stored files are in Storage -> Key-value store.
  • Summary is in KV key: OUTPUT.

Quick start (local)

npm install
apify run

If you need a local input file:

{
  "urls": [
    "https://github.com/githubtraining/hellogitworld/archive/refs/heads/master.zip"
  ],
  "timeoutSecs": 120,
  "maxRetries": 3,
  "overwrite": false,
  "kvPrefix": "extracted/",
  "maxZipSizeMB": 200,
  "listOnly": false
}

Save it as storage/key_value_stores/default/INPUT.json before running apify run.

Input fields

  • urls (required): Array of direct ZIP URLs.
  • timeoutSecs (default 120): HTTP timeout per request attempt.
  • maxRetries (default 3): Retry count with exponential backoff.
  • overwrite (default false): If false, keys already written during the same run are skipped instead of overwritten.
  • kvPrefix (default "extracted/"): Logical prefix used for ZIP/file keys.
  • maxZipSizeMB (default 200): Reject ZIPs above this size.
  • listOnly (default false): If true, only list entries to Dataset and do not save ZIP/files to KV.
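
To show how these fields and their defaults might combine, here is a minimal sketch. The function names resolveInput and backoffDelayMs are hypothetical, and the 1000 ms base delay with doubling per attempt is an assumption; the README only states that retries use exponential backoff.

```javascript
// Defaults exactly as listed in the input fields above.
const DEFAULTS = {
  timeoutSecs: 120,
  maxRetries: 3,
  overwrite: false,
  kvPrefix: 'extracted/',
  maxZipSizeMB: 200,
  listOnly: false,
};

// Hypothetical helper: merge user input over the defaults and check the
// single required field.
function resolveInput(input) {
  const resolved = { ...DEFAULTS, ...input };
  if (!Array.isArray(resolved.urls) || resolved.urls.length === 0) {
    throw new Error('"urls" must be a non-empty array of direct ZIP URLs');
  }
  return resolved;
}

// Hypothetical backoff schedule: the delay doubles with each retry attempt
// (attempt 0 is the first retry). The base delay is an assumed value.
function backoffDelayMs(attempt, baseMs = 1000) {
  return baseMs * 2 ** attempt;
}
```

With maxRetries set to 3 and the assumed base, the waits between attempts would be 1000, 2000, and 4000 ms.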

Output

Dataset item (listOnly=false)

{
  "sourceUrl": "https://example.com/archive.zip",
  "zipKey": "extracted!zips!archive-a1b2c3d4.zip",
  "entryPath": "folder/file.txt",
  "kvKey": "extracted!files!archive-a1b2c3d4!folder!file.txt",
  "sizeBytes": 1234,
  "sha256": "abc123...",
  "mimeType": "text/plain"
}
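
The sizeBytes and sha256 fields can be derived from an extracted file's bytes. The sketch below uses Node's built-in crypto module; describeEntry is a hypothetical name, and the other fields (sourceUrl, zipKey, kvKey, mimeType) are left out because the README does not describe how the Actor derives them.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: build the size and hash portion of a dataset item
// from an entry's path and its content as a Buffer.
function describeEntry(entryPath, content) {
  return {
    entryPath,
    sizeBytes: content.length,
    sha256: crypto.createHash('sha256').update(content).digest('hex'),
  };
}
```

The hex digest lets a consumer of the Dataset verify a stored file against its metadata without downloading the original ZIP again.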

Dataset item (listOnly=true)

{
  "entryPath": "folder/file.txt",
  "sizeBytes": 1234
}

Summary key

  • KV key: OUTPUT
  • Includes totals, per-URL status, and errors.

Notes

  • Apify KV keys cannot contain /, so stored keys are converted to a safe format that uses separators such as !.
  • The Content-Type header is checked when available, but ZIP validation relies on the file signature.
  • The ZIP size limit is enforced via Content-Length when present, and by a streaming byte limit otherwise.
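
The key conversion and the two-stage size check from the notes above can be sketched as follows. The names toKvKey and createSizeGuard are hypothetical. Replacing other unsupported characters with - and treating 1 MB as 1024 × 1024 bytes are assumptions of this sketch, not confirmed behavior of the Actor.

```javascript
// Characters assumed to be allowed in Apify key-value store keys;
// anything else is replaced with "-".
const UNSAFE_KEY_CHARS = /[^a-zA-Z0-9!\-_.'()]/g;

// Hypothetical helper: join path-like parts into a key that uses "!"
// where a path would use "/".
function toKvKey(...parts) {
  return parts
    .join('/')
    .split('/')
    .filter((segment) => segment.length > 0)
    .map((segment) => segment.replace(UNSAFE_KEY_CHARS, '-'))
    .join('!');
}

// Hypothetical size guard: reject early on a declared Content-Length,
// and count streamed bytes for responses that omit or understate it.
function createSizeGuard(maxZipSizeMB) {
  const maxBytes = maxZipSizeMB * 1024 * 1024;
  let received = 0;
  return {
    checkContentLength(headerValue) {
      const declared = Number.parseInt(headerValue, 10);
      if (Number.isFinite(declared) && declared > maxBytes) {
        throw new Error(`Declared size ${declared} exceeds ${maxBytes} bytes`);
      }
    },
    onChunk(chunk) {
      received += chunk.length;
      if (received > maxBytes) {
        throw new Error(`Download exceeded ${maxBytes} bytes`);
      }
      return received;
    },
  };
}
```

For example, toKvKey('extracted/', 'files', 'archive-a1b2c3d4', 'folder/file.txt') yields extracted!files!archive-a1b2c3d4!folder!file.txt, matching the kvKey shape shown in the Output section. An entry name that itself contains ! would make the mapping ambiguous, which this sketch does not handle.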

Troubleshooting

  • "Input schema is not valid" error: pull the latest source and make sure .actor/input_schema.json matches this repo.
  • A "not logged in" warning during local runs is normal; it matters only if you need Apify account features.
  • If a run fails, open the KV key OUTPUT and check the errors and the per-URL status.