# Apify Actors Development Guide

Important: Before you begin, fill in the `generatedBy` property in the meta section of `.actor/actor.json`. Set it to the model you are currently using. This helps Apify monitor and improve AGENTS.md for specific LLM models.

## What are Apify Actors?

- Actors are serverless programs that run in the cloud. They're inspired by the UNIX philosophy - programs that do one thing well and can be easily combined to build complex systems.
- Actors are programs packaged as Docker images that run in isolated containers.

## Core Concepts

- Accept well-defined JSON input
- Perform isolated tasks (web scraping, automation, data processing)
- Produce structured JSON output to datasets and/or store data in key-value stores
- Can run from seconds to hours, or even indefinitely
- Persist state and can be restarted (see the state-persistence sketch below)
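
A minimal sketch of the state-persistence idea, using only `Actor.get_value()` and `Actor.set_value()` from the Python SDK. The `STATE` key name and the shape of the state object are arbitrary choices for this illustration, not a platform convention.

```python
# State-persistence sketch (Python SDK). The 'STATE' key and the structure
# of the state dict are illustrative choices, not a platform requirement.
import asyncio
from apify import Actor

async def main():
    await Actor.init()

    # Resume from previously persisted state, or start fresh.
    state = await Actor.get_value('STATE') or {'processed': 0}

    for i in range(state['processed'], 10):
        Actor.log.info(f'Processing item {i}')
        state['processed'] = i + 1
        # Persist progress so a restarted run can pick up where it left off.
        await Actor.set_value('STATE', state)

    await Actor.exit()

if __name__ == '__main__':
    asyncio.run(main())
```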

## Do

- accept well-defined JSON input and produce structured JSON output
- use the Apify SDK (`apify`) for code running ON the Apify platform
- validate input early with proper error handling and fail gracefully (see the sketch after this list)
- use CheerioCrawler for static HTML content (10x faster than browsers)
- use PlaywrightCrawler only for JavaScript-heavy sites and dynamic content
- use the router pattern (`createCheerioRouter`/`createPlaywrightRouter`) for complex crawls
- implement retry strategies with exponential backoff for failed requests
- use proper concurrency settings (HTTP: 10-50, browser: 1-5)
- set sensible defaults in `.actor/input_schema.json` for all optional fields
- set up the output schema in `.actor/output_schema.json`
- clean and validate data before pushing it to the dataset
- use semantic CSS selectors and fallback strategies for missing elements
- respect robots.txt and ToS, and implement rate limiting with delays
- check which tools (cheerio/playwright/crawlee) are installed before applying guidance
- use `Actor.log` for logging (it censors sensitive data)
- implement a readiness probe handler for standby Actors
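
For the input-validation item above, here is a minimal sketch using `Actor.get_input()` from the Python SDK. The field names (`startUrls`, `maxRequestsPerCrawl`) mirror the example input schema later in this guide, and the exact checks are illustrative rather than prescriptive.

```python
# Input-validation sketch (Python SDK). Field names follow the example input
# schema shown later in this guide; the checks themselves are illustrative.
import asyncio
from apify import Actor

async def main():
    await Actor.init()

    actor_input = await Actor.get_input() or {}
    start_urls = actor_input.get('startUrls', [])
    max_requests = actor_input.get('maxRequestsPerCrawl', 1000)

    # Fail fast with a clear message instead of crashing mid-crawl.
    if not start_urls:
        Actor.log.error('Input is missing "startUrls" - nothing to scrape.')
        await Actor.fail()
        return

    Actor.log.info(f'Starting with {len(start_urls)} start URL(s), request limit {max_requests}.')
    # ... crawling logic goes here ...

    await Actor.exit()

if __name__ == '__main__':
    asyncio.run(main())
```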

## Don't

- do not rely on `Dataset.getInfo()` for final counts on the Cloud platform
- do not use browser crawlers when HTTP/Cheerio works (massive performance gains with plain HTTP)
- do not hard-code values that should live in the input schema or environment variables
- do not skip input validation or error handling
- do not overload servers - use appropriate concurrency and delays
- do not scrape prohibited content or ignore Terms of Service
- do not store personal/sensitive data unless explicitly permitted
- do not use deprecated options like `requestHandlerTimeoutMillis` on CheerioCrawler (v3.x)
- do not use `additionalHttpHeaders` - use `preNavigationHooks` instead
- do not disable standby mode (`usesStandbyMode: false`) without explicit permission

## Logging

- **ALWAYS use `Actor.log` for logging** - this logger contains critical security logic, including censoring sensitive data (Apify tokens, API keys, credentials), to prevent accidental exposure in logs

### Available Log Levels

The Apify Actor logger provides the following methods:

- `Actor.log.debug()` - debug-level logs (detailed diagnostic information)
- `Actor.log.info()` - info-level logs (general informational messages)
- `Actor.log.warning()` - warning-level logs (potentially problematic situations)
- `Actor.log.error()` - error-level logs (failures)
- `Actor.log.exception()` - exception-level logs (exceptions with stack traces)

**Best practices** (see the sketch after this list):

- Use `Actor.log.debug()` for detailed operation-level diagnostics (inside functions)
- Use `Actor.log.info()` for general informational messages (API requests, successful operations)
- Use `Actor.log.warning()` for potentially problematic situations (validation failures, unexpected states)
- Use `Actor.log.error()` for actual errors and failures
- Use `Actor.log.exception()` for caught exceptions with stack traces
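
A short sketch of these levels in practice with the Python SDK; the function, messages, and the simulated failure are made up for illustration:

```python
# Illustrative use of the Actor logger levels (Python SDK).
from apify import Actor

async def process_page(url: str, attempt: int) -> None:
    Actor.log.debug(f'Fetching {url} (attempt {attempt})')  # detailed diagnostics
    Actor.log.info(f'Processing {url}')                     # general progress

    if attempt > 1:
        # Unexpected but recoverable situation.
        Actor.log.warning(f'Retrying {url}, the previous attempt failed')

    try:
        raise RuntimeError('simulated failure')  # placeholder for real page-processing work
    except RuntimeError:
        Actor.log.error(f'Failed to process {url}')    # plain error message
        Actor.log.exception('Details of the failure')  # same event, with stack trace
```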

## Standby Mode

- **NEVER disable standby mode (`usesStandbyMode: false`) in `.actor/actor.json` without explicit permission** - Standby mode keeps the Actor running in the background, ready for incoming HTTP requests, so it behaves like a real-time web server or standard API server instead of running its logic once to process everything in a batch. Always keep `usesStandbyMode: true` unless there is a specific, documented reason to disable it
- **ALWAYS implement a readiness probe handler for standby Actors** - handle the `x-apify-container-server-readiness-probe` header on the GET / endpoint to ensure proper Actor lifecycle management

You can recognize a standby Actor by checking the `usesStandbyMode` property in `.actor/actor.json`. Only implement the readiness probe if this property is set to `true`.

### Readiness Probe Implementation Example

```python
# Apify standby readiness probe
import os
from http.server import HTTPServer, SimpleHTTPRequestHandler

class GetHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        # Handle the Apify standby readiness probe
        if 'x-apify-container-server-readiness-probe' in self.headers:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'Readiness probe OK')
            return

        # Normal request handling
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'Actor is ready')

if __name__ == '__main__':
    # On the platform, the standby port is provided via the ACTOR_STANDBY_PORT env var.
    port = int(os.environ.get('ACTOR_STANDBY_PORT', '8080'))
    HTTPServer(('', port), GetHandler).serve_forever()
```

Key points:

- Detect the `x-apify-container-server-readiness-probe` header in incoming requests
- Respond with an HTTP 200 status code for both the readiness probe and normal requests
- This enables proper Actor lifecycle management in standby mode

## Commands

```bash
# Local development
apify run    # Run Actor locally

# Authentication & deployment
apify login  # Authenticate account
apify push   # Deploy to Apify platform

# Help
apify help   # List all commands
```

## Safety and Permissions

Allowed without prompt:

- read files with `Actor.get_value()`
- push data with `Actor.push_data()`
- set values with `Actor.set_value()`
- enqueue requests to the RequestQueue
- run locally with `apify run`

Ask first:

- npm/pip package installations
- `apify push` (deployment to the cloud)
- proxy configuration changes (requires a paid plan)
- Dockerfile changes affecting builds
- deleting datasets or key-value stores

## Project Structure

```text
.actor/
├── actor.json            # Actor config: name, version, env vars, runtime settings
├── input_schema.json     # Input validation & Console form definition
└── output_schema.json    # Specifies where an Actor stores its output
src/
└── main.js               # Actor entry point and orchestrator
storage/                  # Local storage (mirrors Cloud during development)
├── datasets/             # Output items (JSON objects)
├── key_value_stores/     # Files, config, INPUT
└── request_queues/       # Pending crawl requests
Dockerfile                # Container image definition
AGENTS.md                 # AI agent instructions (this file)
```

## Actor Input Schema

The input schema defines the input parameters for an Actor. It's a JSON object comprising the various field types supported by the Apify platform.

### Structure

```json
{
  "title": "<INPUT-SCHEMA-TITLE>",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    /* define input fields here */
  },
  "required": []
}
```

### Example

```json
{
  "title": "E-commerce Product Scraper Input",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrls": {
      "title": "Start URLs",
      "type": "array",
      "description": "URLs to start scraping from (category pages or product pages)",
      "editor": "requestListSources",
      "default": [{ "url": "https://example.com/category" }],
      "prefill": [{ "url": "https://example.com/category" }]
    },
    "followVariants": {
      "title": "Follow Product Variants",
      "type": "boolean",
      "description": "Whether to scrape product variants (different colors, sizes)",
      "default": true
    },
    "maxRequestsPerCrawl": {
      "title": "Max Requests per Crawl",
      "type": "integer",
      "description": "Maximum number of pages to scrape (0 = unlimited)",
      "default": 1000,
      "minimum": 0
    },
    "proxyConfiguration": {
      "title": "Proxy Configuration",
      "type": "object",
      "description": "Proxy settings for anti-bot protection",
      "editor": "proxy",
      "default": { "useApifyProxy": false }
    },
    "locale": {
      "title": "Locale",
      "type": "string",
      "description": "Language/country code for localized content",
      "default": "cs",
      "enum": ["cs", "en", "de", "sk"],
      "enumTitles": ["Czech", "English", "German", "Slovak"]
    }
  },
  "required": ["startUrls"]
}
```
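
For reference, this is roughly how the fields defined above arrive in the Actor at runtime. The sketch uses `Actor.get_input()` from the Python SDK; the fallback values simply repeat the schema defaults in case a field is absent (for example on local runs), and the `proxy` editor's output is passed to the SDK's proxy helper, `Actor.create_proxy_configuration()`.

```python
# Sketch: reading the input defined by the example schema above (Python SDK).
import asyncio
from apify import Actor

async def main():
    await Actor.init()

    actor_input = await Actor.get_input() or {}

    # Fall back to the schema defaults in case a field is missing.
    start_urls = actor_input.get('startUrls', [])
    follow_variants = actor_input.get('followVariants', True)
    max_requests = actor_input.get('maxRequestsPerCrawl', 1000)
    locale = actor_input.get('locale', 'cs')

    # Turn the 'proxy' editor object into a usable proxy configuration.
    proxy_configuration = await Actor.create_proxy_configuration(
        actor_proxy_input=actor_input.get('proxyConfiguration'),
    )

    Actor.log.info(f'Scraping {len(start_urls)} start URL(s), locale {locale}, limit {max_requests}')
    await Actor.exit()

if __name__ == '__main__':
    asyncio.run(main())
```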

## Actor Output Schema

The Actor output schema builds upon the schemas for the dataset and key-value store. It specifies where an Actor stores its output and defines templates for accessing that output. Apify Console uses these output definitions to display run results.

### Structure

```json
{
  "actorOutputSchemaVersion": 1,
  "title": "<OUTPUT-SCHEMA-TITLE>",
  "properties": {
    /* define your outputs here */
  }
}
```

### Example

```json
{
  "actorOutputSchemaVersion": 1,
  "title": "Output schema of the files scraper",
  "properties": {
    "files": {
      "type": "string",
      "title": "Files",
      "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys"
    },
    "dataset": {
      "type": "string",
      "title": "Dataset",
      "template": "{{links.apiDefaultDatasetUrl}}/items"
    }
  }
}
```

### Output Schema Template Variables

- `links` (object) - Contains quick links to the most commonly used URLs
- `links.publicRunUrl` (string) - Public run URL in the format `https://console.apify.com/view/runs/:runId`
- `links.consoleRunUrl` (string) - Console run URL in the format `https://console.apify.com/actors/runs/:runId`
- `links.apiRunUrl` (string) - API run URL in the format `https://api.apify.com/v2/actor-runs/:runId`
- `links.apiDefaultDatasetUrl` (string) - API URL of the default dataset in the format `https://api.apify.com/v2/datasets/:defaultDatasetId` (see the sketch after this list)
- `links.apiDefaultKeyValueStoreUrl` (string) - API URL of the default key-value store in the format `https://api.apify.com/v2/key-value-stores/:defaultKeyValueStoreId`
- `links.containerRunUrl` (string) - URL of a web server running inside the run, in the format `https://<containerId>.runs.apify.net/`
- `run` (object) - Contains the same information about the run as is returned from the `GET Run` API endpoint
- `run.defaultDatasetId` (string) - ID of the default dataset
- `run.defaultKeyValueStoreId` (string) - ID of the default key-value store
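
To make the templates concrete: `links.apiDefaultDatasetUrl` and `links.apiDefaultKeyValueStoreUrl` resolve to API URLs built from the run's default storage IDs. A sketch using the `apify-client` package; `RUN_ID` and `APIFY_TOKEN` are placeholders you would supply yourself.

```python
# Sketch: resolving what the output-schema templates point to, via apify-client.
from apify_client import ApifyClient

client = ApifyClient('APIFY_TOKEN')  # placeholder token
run = client.run('RUN_ID').get()     # same data the {{run.*}} variables draw from

dataset_items_url = f"https://api.apify.com/v2/datasets/{run['defaultDatasetId']}/items"
kv_store_keys_url = f"https://api.apify.com/v2/key-value-stores/{run['defaultKeyValueStoreId']}/keys"

print(dataset_items_url)   # what {{links.apiDefaultDatasetUrl}}/items expands to
print(kv_store_keys_url)   # what {{links.apiDefaultKeyValueStoreUrl}}/keys expands to
```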

## Dataset Schema Specification

The dataset schema defines how your Actor's output data is structured, transformed, and displayed in the Output tab in Apify Console.

### Example

Consider an example Actor that calls `Actor.push_data()` to store data into the dataset:

```python
# Dataset push example (Python)
import asyncio
from datetime import datetime
from apify import Actor

async def main():
    await Actor.init()

    # Actor code
    await Actor.push_data({
        'numericField': 10,
        'pictureUrl': 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png',
        'linkUrl': 'https://google.com',
        'textField': 'Google',
        'booleanField': True,
        'dateField': datetime.now().isoformat(),
        'arrayField': ['#hello', '#world'],
        'objectField': {},
    })

    # Exit successfully
    await Actor.exit()

if __name__ == '__main__':
    asyncio.run(main())
```

To set up the Actor's output tab UI, reference a dataset schema file in `.actor/actor.json`:

```json
{
  "actorSpecification": 1,
  "name": "book-library-scraper",
  "title": "Book Library Scraper",
  "version": "1.0.0",
  "storages": {
    "dataset": "./dataset_schema.json"
  }
}
```

Then create the dataset schema in `.actor/dataset_schema.json`:

```json
{
  "actorSpecification": 1,
  "fields": {},
  "views": {
    "overview": {
      "title": "Overview",
      "transformation": {
        "fields": [
          "pictureUrl",
          "linkUrl",
          "textField",
          "booleanField",
          "arrayField",
          "objectField",
          "dateField",
          "numericField"
        ]
      },
      "display": {
        "component": "table",
        "properties": {
          "pictureUrl": {
            "label": "Image",
            "format": "image"
          },
          "linkUrl": {
            "label": "Link",
            "format": "link"
          },
          "textField": {
            "label": "Text",
            "format": "text"
          },
          "booleanField": {
            "label": "Boolean",
            "format": "boolean"
          },
          "arrayField": {
            "label": "Array",
            "format": "array"
          },
          "objectField": {
            "label": "Object",
            "format": "object"
          },
          "dateField": {
            "label": "Date",
            "format": "date"
          },
          "numericField": {
            "label": "Number",
            "format": "number"
          }
        }
      }
    }
  }
}
```

### Structure

```json
{
  "actorSpecification": 1,
  "fields": {},
  "views": {
    "<VIEW_NAME>": {
      "title": "string (required)",
      "description": "string (optional)",
      "transformation": {
        "fields": ["string (required)"],
        "unwind": ["string (optional)"],
        "flatten": ["string (optional)"],
        "omit": ["string (optional)"],
        "limit": "integer (optional)",
        "desc": "boolean (optional)"
      },
      "display": {
        "component": "table (required)",
        "properties": {
          "<FIELD_NAME>": {
            "label": "string (optional)",
            "format": "text|number|date|link|boolean|image|array|object (optional)"
          }
        }
      }
    }
  }
}
```

**Dataset Schema Properties:**

- `actorSpecification` (integer, required) - Specifies the version of the dataset schema structure document (currently only version 1)
- `fields` (JSONSchema object, required) - Schema of one dataset object (use JSON Schema Draft 2020-12 or compatible)
- `views` (DatasetView object, required) - Object with API and UI views description

**DatasetView Properties:**

- `title` (string, required) - Visible in the UI Output tab and in the API
- `description` (string, optional) - Only available in the API response
- `transformation` (ViewTransformation object, required) - Data transformation applied when loading from the Dataset API
- `display` (ViewDisplay object, required) - Output tab UI visualization definition

**ViewTransformation Properties:**

- `fields` (string[], required) - Fields to present in the output (order matches column order)
- `unwind` (string[], optional) - Deconstructs nested children into the parent object (see the sketch after this list)
- `flatten` (string[], optional) - Transforms a nested object into a flat structure
- `omit` (string[], optional) - Removes the specified fields from the output
- `limit` (integer, optional) - Maximum number of results (default: all)
- `desc` (boolean, optional) - Sort order (true = newest first)
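
The `unwind` transformation is the least self-explanatory of these. Roughly, unwinding an array field yields one record per array element, merged with the parent object. The following is a local approximation of that behavior for the simple array case, intended purely as an illustration of the concept rather than the platform's implementation.

```python
# Rough, local approximation of unwinding an array field; illustration only.
def unwind_field(records: list[dict], field: str) -> list[dict]:
    result = []
    for record in records:
        children = record.get(field)
        if not isinstance(children, list):
            result.append(record)
            continue
        for child in children:
            merged = {key: value for key, value in record.items() if key != field}
            merged.update(child if isinstance(child, dict) else {field: child})
            result.append(merged)
    return result

# One product with two variants becomes two flat records.
products = [{'name': 'Mug', 'variants': [{'color': 'red'}, {'color': 'blue'}]}]
print(unwind_field(products, 'variants'))
# [{'name': 'Mug', 'color': 'red'}, {'name': 'Mug', 'color': 'blue'}]
```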

**ViewDisplay Properties:**

- `component` (string, required) - Only `table` is available
- `properties` (Object, optional) - Keys matching `transformation.fields` with ViewDisplayProperty values

**ViewDisplayProperty Properties:**

- `label` (string, optional) - Table column header
- `format` (string, optional) - One of: `text`, `number`, `date`, `link`, `boolean`, `image`, `array`, `object`

## Key-Value Store Schema Specification

The key-value store schema organizes keys into logical groups called collections for easier data management.

### Example

Consider an example Actor that calls `Actor.set_value()` to save records into the key-value store:

```python
# Key-Value Store set example (Python)
import asyncio
from apify import Actor

async def main():
    await Actor.init()

    # Actor code
    await Actor.set_value('document-1', 'my text data', content_type='text/plain')

    image_id = '123'       # example placeholder
    image_buffer = b'...'  # bytes buffer with image data
    await Actor.set_value(f'image-{image_id}', image_buffer, content_type='image/jpeg')

    # Exit successfully
    await Actor.exit()

if __name__ == '__main__':
    asyncio.run(main())
```

To configure the key-value store schema, reference a schema file in `.actor/actor.json`:

```json
{
  "actorSpecification": 1,
  "name": "data-collector",
  "title": "Data Collector",
  "version": "1.0.0",
  "storages": {
    "keyValueStore": "./key_value_store_schema.json"
  }
}
```

Then create the key-value store schema in `.actor/key_value_store_schema.json`:

```json
{
  "actorKeyValueStoreSchemaVersion": 1,
  "title": "Key-Value Store Schema",
  "collections": {
    "documents": {
      "title": "Documents",
      "description": "Text documents stored by the Actor",
      "keyPrefix": "document-"
    },
    "images": {
      "title": "Images",
      "description": "Images stored by the Actor",
      "keyPrefix": "image-",
      "contentTypes": ["image/jpeg"]
    }
  }
}
```

### Structure

```json
{
  "actorKeyValueStoreSchemaVersion": 1,
  "title": "string (required)",
  "description": "string (optional)",
  "collections": {
    "<COLLECTION_NAME>": {
      "title": "string (required)",
      "description": "string (optional)",
      "key": "string (conditional - use key OR keyPrefix)",
      "keyPrefix": "string (conditional - use key OR keyPrefix)",
      "contentTypes": ["string (optional)"],
      "jsonSchema": "object (optional)"
    }
  }
}
```

**Key-Value Store Schema Properties:**

- `actorKeyValueStoreSchemaVersion` (integer, required) - Version of the key-value store schema structure document (currently only version 1)
- `title` (string, required) - Title of the schema
- `description` (string, optional) - Description of the schema
- `collections` (Object, required) - Object where each key is a collection ID and each value is a Collection object

**Collection Properties:**

- `title` (string, required) - Collection title shown in UI tabs
- `description` (string, optional) - Description appearing in UI tooltips
- `key` (string, conditional\*) - Single specific key for this collection
- `keyPrefix` (string, conditional\*) - Prefix for keys included in this collection
- `contentTypes` (string[], optional) - Allowed content types for validation
- `jsonSchema` (object, optional) - JSON Schema Draft 07 format for `application/json` content type validation

\*Either `key` or `keyPrefix` must be specified for each collection, but not both.

## Apify MCP Tools

If the MCP server is configured, use these tools for documentation:

- `search-apify-docs` - Search the documentation
- `fetch-apify-docs` - Fetch full documentation pages

Otherwise, reference: `@https://mcp.apify.com/`

## Resources

- [docs.apify.com/llms.txt](https://docs.apify.com/llms.txt) - Quick reference
- [docs.apify.com/llms-full.txt](https://docs.apify.com/llms-full.txt) - Complete docs
- [crawlee.dev](https://crawlee.dev) - Crawlee documentation
- [Actor whitepaper](https://raw.githubusercontent.com/apify/actor-whitepaper/refs/heads/master/README.md) - Complete Actor specification