Under maintenance

Pricing

from $200.01 / 1,000 results

Instagram Post Scraper – Fast, Proxy-Based, No Login

Scrape public Instagram posts from any profile. Extract captions, likes, comments, media URLs, and timestamps. Supports Residential, Datacenter, and custom proxies. No login required.

Rating

5.0

(1)

Developer

koushik Biswas

Actor stats

Last modified: 2 months ago

.actor/actor.json

{
    "actorSpecification": 1,
    "name": "my-actor",
    "title": "Project Playwright Crawler JavaScript",
    "description": "Crawlee and Playwright project in JavaScript.",
    "version": "0.0",
    "meta": {
        "templateId": "js-crawlee-playwright-chrome",
        "generatedBy": "<FILL-IN-MODEL>"
    },
    "input": "./input_schema.json",
    "output": "./output_schema.json",
    "storages": {
        "dataset": "./dataset_schema.json"
    },
    "dockerfile": "../Dockerfile"
}

.actor/dataset_schema.json

{
  "actorSpecification": 1,
  "fields": {},
  "views": {
    "overview": {
      "title": "Instagram Posts",
      "transformation": {
        "fields": [
          "type",
          "contentUrl",
          "engagement.likes",
          "engagement.comments",
          "engagement.views",
          "media.downloadUrl",
          "timestamp"
        ]
      },
      "display": {
        "component": "table",
        "properties": {
          "type": {
            "label": "Type"
          },
          "contentUrl": {
            "label": "Instagram URL",
            "format": "link"
          },
          "engagement.likes": {
            "label": "Likes"
          },
          "engagement.comments": {
            "label": "Comments"
          },
          "engagement.views": {
            "label": "Views"
          },
          "media.downloadUrl": {
            "label": "Download Media",
            "format": "link"
          },
          "timestamp": {
            "label": "Published At",
            "format": "datetime"
          }
        }
      }
    }
  }
}
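
The view above lists dotted field names such as `engagement.likes`, which point into nested objects on each dataset item. The following is a minimal sketch of how such a path could be resolved when building a table row. The helper names `getByPath` and `toRow` and the sample item are hypothetical, and this is not the platform's own implementation.

```javascript
// Hypothetical helper: resolve a dotted path such as "engagement.likes" against a nested item.
function getByPath(item, path) {
    return path.split('.').reduce((value, key) => (value == null ? undefined : value[key]), item);
}

// Hypothetical helper: build one flat row from the view's field list.
function toRow(item, fields) {
    return Object.fromEntries(fields.map((field) => [field, getByPath(item, field)]));
}

// Field list copied from the "overview" view.
const overviewFields = [
    'type',
    'contentUrl',
    'engagement.likes',
    'engagement.comments',
    'engagement.views',
    'media.downloadUrl',
    'timestamp',
];

// Made-up sample item in the shape the scraper pushes to the dataset.
const sampleItem = {
    type: 'image',
    contentUrl: 'https://www.instagram.com/p/ABC123/',
    engagement: { likes: 12, comments: 3, views: null },
    media: { downloadUrl: 'https://example.com/image.jpg' },
    timestamp: '2023-11-14T22:13:20.000Z',
};

console.log(toRow(sampleItem, overviewFields));
```

A path whose parent object is missing resolves to `undefined` instead of throwing, so items with partial data still produce a row.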

.actor/input_schema.json

{
  "title": "Instagram Post Scraper",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrls": {
      "title": "Instagram Profile URLs",
      "type": "array",
      "description": "Instagram profile URLs to scrape posts from.",
      "editor": "requestListSources",
      "prefill": [
        { "url": "https://www.instagram.com/nike/" }
      ]
    },
    "proxyType": {
      "title": "Proxy Type",
      "type": "string",
      "description": "Choose which proxy type to use for scraping.",
      "default": "datacenter",
      "enum": ["datacenter", "residential", "custom"]
    },
    "customProxyUrls": {
      "title": "Custom Proxy URLs",
      "type": "array",
      "description": "List of custom proxy URLs. Used only if proxyType is set to custom.",
      "editor": "stringList",
      "prefill": []
    },
    "maxPosts": {
      "title": "Maximum Posts",
      "type": "integer",
      "description": "Maximum number of Instagram posts to collect per profile.",
      "default": 30
    }
  }
}
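
The schema above gives defaults for `proxyType` and `maxPosts` and restricts `proxyType` to three values. A minimal sketch of applying the same defaults and checks in code follows. The function name `normalizeInput` is hypothetical, and the extra rules (a non-empty custom proxy list, `maxPosts` of at least 1) are assumptions that the schema itself does not state.

```javascript
// Allowed values copied from the schema's `proxyType` enum.
const PROXY_TYPES = ['datacenter', 'residential', 'custom'];

// Hypothetical helper: apply schema defaults and fail early on invalid input.
function normalizeInput(rawInput) {
    const {
        startUrls = [],
        proxyType = 'datacenter',
        customProxyUrls = [],
        maxPosts = 30,
    } = rawInput ?? {};

    if (!PROXY_TYPES.includes(proxyType)) {
        throw new Error(`Unsupported proxyType: ${proxyType}`);
    }
    // Assumption: a custom proxy setup without any URL is treated as an error.
    if (proxyType === 'custom' && customProxyUrls.length === 0) {
        throw new Error('customProxyUrls must not be empty when proxyType is "custom".');
    }
    // Assumption: maxPosts must be a positive integer.
    if (!Number.isInteger(maxPosts) || maxPosts < 1) {
        throw new Error('maxPosts must be a positive integer.');
    }

    // The requestListSources editor produces objects with a `url` key; plain strings are accepted too.
    const urls = startUrls.map((source) => (typeof source === 'string' ? source : source.url));

    return { startUrls: urls, proxyType, customProxyUrls, maxPosts };
}

console.log(normalizeInput({ startUrls: [{ url: 'https://www.instagram.com/nike/' }] }));
```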

.actor/output_schema.json

{
  "actorOutputSchemaVersion": 1,
  "title": "Instagram Post Scraper Output",
  "description": "Direct access to scraped Instagram posts dataset.",
  "properties": {
    "dataset": {
      "type": "string",
      "title": "Scraped Posts Dataset",
      "template": "{{links.apiDefaultDatasetUrl}}/items?view=overview"
    }
  }
}

src/main.js

/**
 * This template is a production-ready boilerplate for developing with `PlaywrightCrawler`.
 * Use this to bootstrap your projects using the most up-to-date code.
 */

// For more information, see https://crawlee.dev
import { PlaywrightCrawler } from '@crawlee/playwright';
// For more information, see https://docs.apify.com/sdk/js
import { Actor } from 'apify';

// ESM import
import { router } from './routes.js';

// Initialize the Apify SDK
await Actor.init();

/**
 * READ INPUT ONCE
 */
const input = (await Actor.getInput()) ?? {};

const {
    startUrls = ['https://apify.com'],
    proxyType = 'datacenter',
    customProxyUrls = [],
} = input;

/**
 * PROXY CONFIGURATION (BEFORE CRAWLER)
 */
let proxyConfiguration;

switch (proxyType) {
    case 'residential':
        proxyConfiguration = await Actor.createProxyConfiguration({
            groups: ['RESIDENTIAL'],
            checkAccess: true,
        });
        break;

    case 'datacenter':
        proxyConfiguration = await Actor.createProxyConfiguration({
            groups: ['SHARED'],
            checkAccess: true,
        });
        break;

    // Note: 'buyproxies' is not listed in the proxyType enum of .actor/input_schema.json
    case 'buyproxies':
        proxyConfiguration = await Actor.createProxyConfiguration({
            groups: ['BUYPROXIES94952'],
            checkAccess: true,
        });
        break;

    case 'custom':
        proxyConfiguration = await Actor.createProxyConfiguration({
            proxyUrls: customProxyUrls,
            checkAccess: true,
        });
        break;

    default:
        // Safe fallback
        proxyConfiguration = await Actor.createProxyConfiguration({
            groups: ['RESIDENTIAL'],
            checkAccess: true,
        });
}

/**
 * CREATE CRAWLER
 */
const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    requestHandler: router,
    launchContext: {
        launchOptions: {
            args: [
                '--disable-gpu', // Docker stability
            ],
        },
    },
});

/**
 * RUN
 */
await crawler.run(startUrls);

// Exit successfully
await Actor.exit();
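
The `switch` in `src/main.js` repeats the same call with different options. One possible refactor, shown here as a sketch only, moves the mapping into a pure function that can be unit-tested without the platform. The name `proxyOptionsFor` is hypothetical, and the `buyproxies` branch is deliberately left out because it is not part of the input schema.

```javascript
// Proxy groups per input value, copied from the switch in src/main.js.
const PROXY_GROUPS = new Map([
    ['residential', ['RESIDENTIAL']],
    ['datacenter', ['SHARED']],
]);

// Hypothetical helper: map the `proxyType` input to proxy configuration options.
function proxyOptionsFor(proxyType, customProxyUrls = []) {
    if (proxyType === 'custom') {
        return { proxyUrls: customProxyUrls, checkAccess: true };
    }
    // Unknown types fall back to residential, like the `default` branch in main.js.
    const groups = PROXY_GROUPS.get(proxyType) ?? PROXY_GROUPS.get('residential');
    return { groups, checkAccess: true };
}

console.log(proxyOptionsFor('datacenter'));
```

In `main.js` the result would then be passed on in a single call: `await Actor.createProxyConfiguration(proxyOptionsFor(proxyType, customProxyUrls))`.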

src/routes.js

import { createPlaywrightRouter, Dataset } from '@crawlee/playwright';
import { Actor } from 'apify';

export const router = createPlaywrightRouter();

router.addDefaultHandler(async ({ page, request, log }) => {
    // Fall back to an empty object so destructuring does not throw when no input is provided
    const input = (await Actor.getInput()) ?? {};
    const { cookies, maxPosts = 30 } = input;

    if (!cookies) {
        throw new Error('Instagram cookies are required.');
    }

    /* ----------------------------------
       1. APPLY LOGIN COOKIES (EARLY)
    ---------------------------------- */
    const parsedCookies = cookies
        .split(';')
        .map((c) => c.trim())
        .filter(Boolean) // skip empty segments, e.g. after a trailing ';'
        .map((c) => {
            const [name, ...rest] = c.split('=');
            return {
                name,
                value: rest.join('='),
                domain: '.instagram.com',
                path: '/',
            };
        });

    await page.context().addCookies(parsedCookies);

    /* ----------------------------------
       2. MOBILE HEADERS + VIEWPORT
    ---------------------------------- */
    await page.setViewportSize({ width: 390, height: 844 });
    await page.setExtraHTTPHeaders({
        'user-agent':
            'Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) ' +
            'AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile',
        'accept-language': 'en-US,en;q=0.9',
        'x-ig-app-id': '936619743392459',
    });

    /* ----------------------------------
       3. CLEAN PROFILE URL
    ---------------------------------- */
    const u = new URL(request.url);
    u.search = '';
    const profileUrl = u.toString();

    log.info('Opening Instagram profile (logged-in)', { profileUrl });

    // Collected posts keyed by item ID, which also de-duplicates repeated responses
    const posts = new Map();

    /* ----------------------------------
       4. CAPTURE ALL FEED RESPONSES
    ---------------------------------- */
    page.on('response', async (response) => {
        const resUrl = response.url();

        if (
            !resUrl.includes('/api/v1/feed/user') &&
            !resUrl.includes('/api/v1/feed/reels')
        ) return;

        try {
            const json = await response.json();
            const items = json?.items ?? json?.reels_media?.[0]?.items ?? [];

            for (const item of items) {
                if (!item?.id || posts.has(item.id)) continue;

                const isVideo = item.media_type === 2;
                const caption = item.caption?.text ?? '';

                const imageUrl =
                    item.image_versions2?.candidates?.[0]?.url ?? null;

                const videoUrl =
                    item.video_versions?.[0]?.url ?? null;

                posts.set(item.id, {
                    postId: item.id,
                    shortcode: item.code,
                    type: isVideo ? 'video' : 'image',

                    contentUrl: isVideo
                        ? `https://www.instagram.com/reel/${item.code}/`
                        : `https://www.instagram.com/p/${item.code}/`,

                    caption,

                    hashtags:
                        caption.match(/#([\w]+)/g)?.map((t) => t.slice(1)) ?? [],

                    mentions:
                        caption.match(/@([a-zA-Z0-9_.]+)/g)?.map((m) => m.slice(1)) ?? [],

                    // `tag` avoids shadowing the outer URL variable `u`
                    taggedUsers:
                        item.usertags?.in?.map((tag) => tag.user.username) ?? [],

                    engagement: {
                        likes: item.like_count ?? 0,
                        comments: item.comment_count ?? 0,
                        views: isVideo ? item.view_count ?? null : null,
                    },

                    media: {
                        imageUrl,
                        videoUrl,
                        downloadUrl: isVideo ? videoUrl : imageUrl,
                        width: item.original_width ?? null,
                        height: item.original_height ?? null,
                        durationSeconds: isVideo ? item.video_duration ?? null : null,
                    },

                    // taken_at is in seconds; Date expects milliseconds
                    timestamp: new Date(item.taken_at * 1000).toISOString(),
                });

                if (posts.size >= maxPosts) break;
            }
        } catch (e) {
            log.debug('Failed to parse Instagram feed JSON');
        }
    });

    /* ----------------------------------
       5. NAVIGATE + SCROLL
    ---------------------------------- */
    await page.goto(profileUrl, {
        waitUntil: 'domcontentloaded',
        timeout: 60000,
    });

    await page.waitForTimeout(6000);

    for (let i = 0; i < 6 && posts.size < maxPosts; i++) {
        await page.mouse.wheel(0, 5000);
        await page.waitForTimeout(3000);
    }

    /* ----------------------------------
       6. SAVE DATA
    ---------------------------------- */
    log.info(`Collected ${posts.size} posts`);

    for (const post of posts.values()) {
        await Dataset.pushData(post);
    }
});
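
The response listener in `src/routes.js` builds each dataset record inline. As a sketch of how that mapping could be isolated and checked without a browser, the core of it is reproduced below as a pure function. The name `toPostRecord` is hypothetical, the sample item is made up for illustration, and a few fields from the original record are omitted for brevity.

```javascript
// Hypothetical helper: map one feed item to a dataset record (subset of the fields in routes.js).
function toPostRecord(item) {
    const isVideo = item.media_type === 2;
    const caption = item.caption?.text ?? '';
    const imageUrl = item.image_versions2?.candidates?.[0]?.url ?? null;
    const videoUrl = item.video_versions?.[0]?.url ?? null;

    return {
        postId: item.id,
        shortcode: item.code,
        type: isVideo ? 'video' : 'image',
        contentUrl: `https://www.instagram.com/${isVideo ? 'reel' : 'p'}/${item.code}/`,
        caption,
        // Strip the leading '#' or '@' from every match; no match yields an empty list.
        hashtags: caption.match(/#(\w+)/g)?.map((tag) => tag.slice(1)) ?? [],
        mentions: caption.match(/@([a-zA-Z0-9_.]+)/g)?.map((m) => m.slice(1)) ?? [],
        engagement: {
            likes: item.like_count ?? 0,
            comments: item.comment_count ?? 0,
            views: isVideo ? (item.view_count ?? null) : null,
        },
        media: { imageUrl, videoUrl, downloadUrl: isVideo ? videoUrl : imageUrl },
        // taken_at is a Unix timestamp in seconds.
        timestamp: new Date(item.taken_at * 1000).toISOString(),
    };
}

// Made-up sample item; real responses carry many more fields.
const sampleFeedItem = {
    id: '1_2',
    code: 'ABC123',
    media_type: 2,
    caption: { text: 'Morning miles #run #nike with @nike.running' },
    video_versions: [{ url: 'https://example.com/video.mp4' }],
    like_count: 10,
    comment_count: 2,
    view_count: 100,
    taken_at: 1700000000,
};

console.log(toPostRecord(sampleFeedItem));
```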

.dockerignore

# configurations
.idea
.vscode
.zed

# crawlee and apify storage folders
apify_storage
crawlee_storage
storage

# installed files
node_modules

# git folder
.git

.editorconfig

root = true

[*]
indent_style = space
indent_size = 4
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
end_of_line = lf
quote_type = single

.gitignore

# This file tells Git which files shouldn't be added to source control

.DS_Store
.idea
.vscode
.zed
dist
node_modules
apify_storage
storage

.prettierrc

{
    "printWidth": 120,
    "tabWidth": 4,
    "singleQuote": true
}

AGENTS.md

# Apify Actors Development Guide

Important: Before you begin, fill in the `generatedBy` property in the meta section of `.actor/actor.json`. Replace it with the model you're currently using. This helps Apify monitor and improve AGENTS.md for specific LLM models.

## What are Apify Actors?

- Actors are serverless programs that run in the cloud. They're inspired by the UNIX philosophy - programs that do one thing well and can be easily combined to build complex systems.
- Actors are programs packaged as Docker images that run in isolated containers.

## Core Concepts

- Accept well-defined JSON input
- Perform isolated tasks (web scraping, automation, data processing)
- Produce structured JSON output to datasets and/or store data in key-value stores
- Can run from seconds to hours or even indefinitely
- Persist state and can be restarted

## Do

- accept well-defined JSON input and produce structured JSON output
- use Apify SDK (`apify`) for code running ON Apify platform
- validate input early with proper error handling and fail gracefully
- use CheerioCrawler for static HTML content (10x faster than browsers)
- use PlaywrightCrawler only for JavaScript-heavy sites and dynamic content
- use router pattern (createCheerioRouter/createPlaywrightRouter) for complex crawls
- implement retry strategies with exponential backoff for failed requests
- use proper concurrency settings (HTTP: 10-50, Browser: 1-5)
- set sensible defaults in `.actor/input_schema.json` for all optional fields
- set up output schema in `.actor/output_schema.json`
- clean and validate data before pushing to dataset
- use semantic CSS selectors and fallback strategies for missing elements
- respect robots.txt, ToS, and implement rate limiting with delays
- check which tools (cheerio/playwright/crawlee) are installed before applying guidance
- use `apify/log` package for logging (censors sensitive data)
- implement readiness probe handler for standby Actors

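The list above recommends retry strategies with exponential backoff. A minimal sketch of that idea for code that runs outside the crawler's own retry handling is shown below. The names `backoffDelayMs` and `withRetries`, and the default base delay, cap, and attempt count, are assumptions chosen for illustration.

```javascript
// Hypothetical helper: delay doubles with every attempt and is capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
    return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical helper: run an async task, retrying with exponential backoff on failure.
async function withRetries(task, options = {}) {
    const {
        maxAttempts = 4,
        baseMs = 500,
        maxMs = 30000,
        // The sleep function is injectable so tests do not have to wait.
        sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    } = options;

    let lastError;
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        try {
            return await task(attempt);
        } catch (error) {
            lastError = error;
            // No point in waiting after the final attempt.
            if (attempt < maxAttempts - 1) {
                await sleep(backoffDelayMs(attempt, baseMs, maxMs));
            }
        }
    }
    throw lastError;
}

// Delays for the first four attempts with the defaults: 500, 1000, 2000, 4000 ms.
console.log([0, 1, 2, 3].map((attempt) => backoffDelayMs(attempt)));
```

Adding random jitter to the delay is a common extension when many clients retry at once; it is left out here to keep the sketch short.
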
## Don't

- do not rely on `Dataset.getInfo()` for final counts on Cloud platform
- do not use browser crawlers when HTTP/Cheerio works (massive performance gains with HTTP)
- do not hard code values that should be in input schema or environment variables
- do not skip input validation or error handling
- do not overload servers - use appropriate concurrency and delays
- do not scrape prohibited content or ignore Terms of Service
- do not store personal/sensitive data unless explicitly permitted
- do not use deprecated options like `requestHandlerTimeoutMillis` on CheerioCrawler (v3.x)
- do not use `additionalHttpHeaders` - use `preNavigationHooks` instead
- do not disable standby mode (`usesStandbyMode: false`) without explicit permission

## Logging

- **ALWAYS use the `apify/log` package for logging** - This package contains critical security logic including censoring sensitive data (Apify tokens, API keys, credentials) to prevent accidental exposure in logs

### Available Log Levels in `apify/log`

The Apify log package provides the following methods for logging:

- `log.debug()` - Debug level logs (detailed diagnostic information)
- `log.info()` - Info level logs (general informational messages)
- `log.warning()` - Warning level logs (warning messages for potentially problematic situations)
- `log.warningOnce()` - Warning level logs (same warning message logged only once)
- `log.error()` - Error level logs (error messages for failures)
- `log.exception()` - Exception level logs (for exceptions with stack traces)
- `log.perf()` - Performance level logs (performance metrics and timing information)
- `log.deprecated()` - Deprecation level logs (warnings about deprecated code)
- `log.softFail()` - Soft failure logs (non-critical failures that don't stop execution, e.g., input validation errors, skipped items)
- `log.internal()` - Internal level logs (internal/system messages)

**Best practices:**

- Use `log.debug()` for detailed operation-level diagnostics (inside functions)
- Use `log.info()` for general informational messages (API requests, successful operations)
- Use `log.warning()` for potentially problematic situations (validation failures, unexpected states)
- Use `log.error()` for actual errors and failures
- Use `log.exception()` for caught exceptions with stack traces

## Standby Mode

- **NEVER disable standby mode (`usesStandbyMode: false`) in `.actor/actor.json` without explicit permission** - Actor Standby mode keeps the Actor ready in the background, waiting for incoming HTTP requests. In a sense, the Actor behaves like a real-time web server or standard API server instead of running the logic once to process everything in batch. Always keep `usesStandbyMode: true` unless there is a specific documented reason to disable it
- **ALWAYS implement readiness probe handler for standby Actors** - Handle the `x-apify-container-server-readiness-probe` header at GET / endpoint to ensure proper Actor lifecycle management

You can recognize a standby Actor by checking the `usesStandbyMode` property in `.actor/actor.json`. Only implement the readiness probe if this property is set to `true`.

### Readiness Probe Implementation Example

```javascript
// Apify standby readiness probe at root path
// `app` is assumed to be an Express-style application created elsewhere
app.get('/', (req, res) => {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    if (req.headers['x-apify-container-server-readiness-probe']) {
        res.end('Readiness probe OK\n');
    } else {
        res.end('Actor is ready\n');
    }
});
```

Key points:

- Detect the `x-apify-container-server-readiness-probe` header in incoming requests
- Respond with HTTP 200 status code for both readiness probe and normal requests
- This enables proper Actor lifecycle management in standby mode

## Commands

```bash
# Local development
apify run                              # Run Actor locally

# Authentication & deployment
apify login                            # Authenticate account
apify push                             # Deploy to Apify platform

# Help
apify help                             # List all commands
```

## Safety and Permissions

Allowed without prompt:

- read files with `Actor.getValue()`
- push data with `Actor.pushData()`
- set values with `Actor.setValue()`
- enqueue requests to RequestQueue
- run locally with `apify run`

Ask first:

- npm/pip package installations
- apify push (deployment to cloud)
- proxy configuration changes (requires paid plan)
- Dockerfile changes affecting builds
- deleting datasets or key-value stores

## Project Structure

.actor/
├── actor.json # Actor config: name, version, env vars, runtime settings
├── input_schema.json # Input validation & Console form definition
└── output_schema.json # Specifies where an Actor stores its output
src/
└── main.js # Actor entry point and orchestrator
storage/ # Local storage (mirrors Cloud during development)
├── datasets/ # Output items (JSON objects)
├── key_value_stores/ # Files, config, INPUT
└── request_queues/ # Pending crawl requests
Dockerfile # Container image definition
AGENTS.md # AI agent instructions (this file)

## Actor Input Schema

The input schema defines the input parameters for an Actor. It's a JSON object comprising various field types supported by the Apify platform.

### Structure

```json
{
    "title": "<INPUT-SCHEMA-TITLE>",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        /* define input fields here */
    },
    "required": []
}
```

### Example

```json
{
    "title": "E-commerce Product Scraper Input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "startUrls": {
            "title": "Start URLs",
            "type": "array",
            "description": "URLs to start scraping from (category pages or product pages)",
            "editor": "requestListSources",
            "default": [{ "url": "https://example.com/category" }],
            "prefill": [{ "url": "https://example.com/category" }]
        },
        "followVariants": {
            "title": "Follow Product Variants",
            "type": "boolean",
            "description": "Whether to scrape product variants (different colors, sizes)",
            "default": true
        },
        "maxRequestsPerCrawl": {
            "title": "Max Requests per Crawl",
            "type": "integer",
            "description": "Maximum number of pages to scrape (0 = unlimited)",
            "default": 1000,
            "minimum": 0
        },
        "proxyConfiguration": {
            "title": "Proxy Configuration",
            "type": "object",
            "description": "Proxy settings for anti-bot protection",
            "editor": "proxy",
            "default": { "useApifyProxy": false }
        },
        "locale": {
            "title": "Locale",
            "type": "string",
            "description": "Language/country code for localized content",
            "default": "cs",
            "enum": ["cs", "en", "de", "sk"],
            "enumTitles": ["Czech", "English", "German", "Slovak"]
        }
    },
    "required": ["startUrls"]
}
```

## Actor Output Schema

The Actor output schema builds upon the schemas for the dataset and key-value store. It specifies where an Actor stores its output and defines templates for accessing that output. Apify Console uses these output definitions to display run results.

### Structure

```json
{
    "actorOutputSchemaVersion": 1,
    "title": "<OUTPUT-SCHEMA-TITLE>",
    "properties": {
        /* define your outputs here */
    }
}
```

### Example

```json
{
    "actorOutputSchemaVersion": 1,
    "title": "Output schema of the files scraper",
    "properties": {
        "files": {
            "type": "string",
            "title": "Files",
            "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys"
        },
        "dataset": {
            "type": "string",
            "title": "Dataset",
            "template": "{{links.apiDefaultDatasetUrl}}/items"
        }
    }
}
```

### Output Schema Template Variables

- `links` (object) - Contains quick links to most commonly used URLs
- `links.publicRunUrl` (string) - Public run URL in format `https://console.apify.com/view/runs/:runId`
- `links.consoleRunUrl` (string) - Console run URL in format `https://console.apify.com/actors/runs/:runId`
- `links.apiRunUrl` (string) - API run URL in format `https://api.apify.com/v2/actor-runs/:runId`
- `links.apiDefaultDatasetUrl` (string) - API URL of default dataset in format `https://api.apify.com/v2/datasets/:defaultDatasetId`
- `links.apiDefaultKeyValueStoreUrl` (string) - API URL of default key-value store in format `https://api.apify.com/v2/key-value-stores/:defaultKeyValueStoreId`
- `links.containerRunUrl` (string) - URL of a webserver running inside the run in format `https://<containerId>.runs.apify.net/`
- `run` (object) - Contains information about the run, the same as returned from the `GET Run` API endpoint
- `run.defaultDatasetId` (string) - ID of the default dataset
- `run.defaultKeyValueStoreId` (string) - ID of the default key-value store

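To make the template variables above concrete, here is a minimal sketch of how a template such as `{{links.apiDefaultDatasetUrl}}/items` could be expanded. The function `renderTemplate` is hypothetical and is not the platform's implementation, and `EXAMPLE_DATASET_ID` is a placeholder rather than a real dataset ID.

```javascript
// Hypothetical helper: replace {{dotted.path}} placeholders with values from a context object.
function renderTemplate(template, context) {
    return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match, path) => {
        const value = path
            .split('.')
            .reduce((current, key) => (current == null ? undefined : current[key]), context);
        if (value === undefined) {
            throw new Error(`Unknown template variable: ${path}`);
        }
        return String(value);
    });
}

// Placeholder context following the URL format listed above.
const exampleContext = {
    links: {
        apiDefaultDatasetUrl: 'https://api.apify.com/v2/datasets/EXAMPLE_DATASET_ID',
    },
};

console.log(renderTemplate('{{links.apiDefaultDatasetUrl}}/items?view=overview', exampleContext));
```

Throwing on an unknown variable is a design choice of this sketch; it surfaces typos in a template early instead of producing a broken URL.
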
## Dataset Schema Specification

The dataset schema defines how your Actor's output data is structured, transformed, and displayed in the Output tab in the Apify Console.

### Example

Consider an example Actor that calls `Actor.pushData()` to store data in the dataset:

```javascript
import { Actor } from 'apify';
// Initialize the JavaScript SDK
await Actor.init();

/**
 * Actor code
 */
await Actor.pushData({
    numericField: 10,
    pictureUrl: 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png',
    linkUrl: 'https://google.com',
    textField: 'Google',
    booleanField: true,
    dateField: new Date(),
    arrayField: ['#hello', '#world'],
    objectField: {},
});

// Exit successfully
await Actor.exit();
```

To set up the Actor's output tab UI, reference a dataset schema file in `.actor/actor.json`:

```json
{
    "actorSpecification": 1,
    "name": "book-library-scraper",
    "title": "Book Library Scraper",
    "version": "1.0.0",
    "storages": {
        "dataset": "./dataset_schema.json"
    }
}
```

Then create the dataset schema in `.actor/dataset_schema.json`:

```json
{
    "actorSpecification": 1,
    "fields": {},
    "views": {
        "overview": {
            "title": "Overview",
            "transformation": {
                "fields": [
                    "pictureUrl",
                    "linkUrl",
                    "textField",
                    "booleanField",
                    "arrayField",
                    "objectField",
                    "dateField",
                    "numericField"
                ]
            },
            "display": {
                "component": "table",
                "properties": {
                    "pictureUrl": {
                        "label": "Image",
                        "format": "image"
                    },
                    "linkUrl": {
                        "label": "Link",
                        "format": "link"
                    },
                    "textField": {
                        "label": "Text",
                        "format": "text"
                    },
                    "booleanField": {
                        "label": "Boolean",
                        "format": "boolean"
                    },
                    "arrayField": {
                        "label": "Array",
                        "format": "array"
                    },
                    "objectField": {
                        "label": "Object",
                        "format": "object"
                    },
                    "dateField": {
                        "label": "Date",
                        "format": "date"
                    },
                    "numericField": {
                        "label": "Number",
                        "format": "number"
                    }
                }
            }
        }
    }
}
```

### Structure

```json
{
    "actorSpecification": 1,
    "fields": {},
    "views": {
        "<VIEW_NAME>": {
            "title": "string (required)",
            "description": "string (optional)",
            "transformation": {
                "fields": ["string (required)"],
                "unwind": ["string (optional)"],
                "flatten": ["string (optional)"],
                "omit": ["string (optional)"],
                "limit": "integer (optional)",
                "desc": "boolean (optional)"
            },
            "display": {
                "component": "table (required)",
                "properties": {
                    "<FIELD_NAME>": {
                        "label": "string (optional)",
                        "format": "text|number|date|link|boolean|image|array|object (optional)"
                    }
                }
            }
        }
    }
}
```

**Dataset Schema Properties:**

- `actorSpecification` (integer, required) - Specifies the version of dataset schema structure document (currently only version 1)
- `fields` (JSONSchema object, required) - Schema of one dataset object (use JsonSchema Draft 2020-12 or compatible)
- `views` (DatasetView object, required) - Object with API and UI views description

**DatasetView Properties:**

- `title` (string, required) - Visible in UI Output tab and API
- `description` (string, optional) - Only available in API response
- `transformation` (ViewTransformation object, required) - Data transformation applied when loading from Dataset API
- `display` (ViewDisplay object, required) - Output tab UI visualization definition

**ViewTransformation Properties:**

- `fields` (string[], required) - Fields to present in output (order matches column order)
- `unwind` (string[], optional) - Deconstructs nested children into parent object
- `flatten` (string[], optional) - Transforms nested object into flat structure
- `omit` (string[], optional) - Removes specified fields from output
- `limit` (integer, optional) - Maximum number of results (default: all)
- `desc` (boolean, optional) - Sort order (true = newest first)

**ViewDisplay Properties:**

- `component` (string, required) - Only `table` is available
- `properties` (Object, optional) - Keys matching `transformation.fields` with ViewDisplayProperty values

**ViewDisplayProperty Properties:**

- `label` (string, optional) - Table column header
- `format` (string, optional) - One of: `text`, `number`, `date`, `link`, `boolean`, `image`, `array`, `object`

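The `flatten` transformation listed above turns a nested object into a flat structure. The sketch below illustrates the idea on a single item by turning `{ engagement: { likes: 1 } }` into `{ 'engagement.likes': 1 }`. The function name `flattenFields` is hypothetical, and both the dotted key naming and the recursive handling of deeper nesting are assumptions of this sketch, not a description of the platform's exact behavior.

```javascript
// True only for plain object literals, so arrays, dates, and null are left untouched.
function isPlainObject(value) {
    return value !== null && typeof value === 'object' && Object.getPrototypeOf(value) === Object.prototype;
}

// Copy every leaf under `value` into `result`, using dotted keys that start with `prefix`.
function flattenInto(result, prefix, value) {
    for (const [key, child] of Object.entries(value)) {
        const path = `${prefix}.${key}`;
        if (isPlainObject(child)) {
            flattenInto(result, path, child);
        } else {
            result[path] = child;
        }
    }
}

// Hypothetical helper: flatten only the listed top-level fields of one item.
function flattenFields(item, fieldsToFlatten) {
    const result = {};
    for (const [key, value] of Object.entries(item)) {
        if (fieldsToFlatten.includes(key) && isPlainObject(value)) {
            flattenInto(result, key, value);
        } else {
            result[key] = value;
        }
    }
    return result;
}

// Made-up item for illustration.
const nestedItem = {
    type: 'image',
    engagement: { likes: 1, comments: 2 },
    media: { downloadUrl: 'https://example.com/image.jpg' },
};

console.log(flattenFields(nestedItem, ['engagement']));
```

Fields that are not listed, such as `media` in the usage above, keep their nested shape.
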
440## Key-Value Store Schema Specification
441
442The key-value store schema organizes keys into logical groups called collections for easier data management.
443
444### Example
445
446Consider an example Actor that calls `Actor.setValue()` to save records into the key-value store:
447
448```javascript
449import { Actor } from 'apify';
450// Initialize the JavaScript SDK
451await Actor.init();
452
453/**
454 * Actor code
455 */
456await Actor.setValue('document-1', 'my text data', { contentType: 'text/plain' });
457
458await Actor.setValue(`image-${imageID}`, imageBuffer, { contentType: 'image/jpeg' });
459
460// Exit successfully
461await Actor.exit();
462```
463
464To configure the key-value store schema, reference a schema file in `.actor/actor.json`:
465
466```json
467{
468    "actorSpecification": 1,
469    "name": "data-collector",
470    "title": "Data Collector",
471    "version": "1.0.0",
472    "storages": {
473        "keyValueStore": "./key_value_store_schema.json"
474    }
475}
476```
477
478Then create the key-value store schema in `.actor/key_value_store_schema.json`:
479
480```json
481{
482    "actorKeyValueStoreSchemaVersion": 1,
483    "title": "Key-Value Store Schema",
484    "collections": {
485        "documents": {
486            "title": "Documents",
487            "description": "Text documents stored by the Actor",
488            "keyPrefix": "document-"
489        },
490        "images": {
491            "title": "Images",
492            "description": "Images stored by the Actor",
493            "keyPrefix": "image-",
494            "contentTypes": ["image/jpeg"]
495        }
496    }
497}
498```

### Structure

```json
{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "string (required)",
    "description": "string (optional)",
    "collections": {
        "<COLLECTION_NAME>": {
            "title": "string (required)",
            "description": "string (optional)",
            "key": "string (conditional - use key OR keyPrefix)",
            "keyPrefix": "string (conditional - use key OR keyPrefix)",
            "contentTypes": ["string (optional)"],
            "jsonSchema": "object (optional)"
        }
    }
}
```

**Key-Value Store Schema Properties:**

- `actorKeyValueStoreSchemaVersion` (integer, required) - Version of key-value store schema structure document (currently only version 1)
- `title` (string, required) - Title of the schema
- `description` (string, optional) - Description of the schema
- `collections` (object, required) - Object where each key is a collection ID and value is a Collection object

**Collection Properties:**

- `title` (string, required) - Collection title shown in UI tabs
- `description` (string, optional) - Description appearing in UI tooltips
- `key` (string, conditional) - Single specific key for this collection
- `keyPrefix` (string, conditional) - Prefix for keys included in this collection
- `contentTypes` (string[], optional) - Allowed content types for validation
- `jsonSchema` (object, optional) - JSON Schema Draft 07 format for `application/json` content type validation

Either `key` or `keyPrefix` must be specified for each collection, but not both.
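
The property rules above can be checked before deploying with a small script. The following is a minimal sketch only: `validateKeyValueStoreSchema` is a hypothetical function written for this example (it is not part of the Apify SDK or CLI), and it covers just the required fields and the `key` / `keyPrefix` exclusivity rule, not `contentTypes` or `jsonSchema`.

```javascript
// Hypothetical validator for the rules listed above; returns a list of error messages.
function validateKeyValueStoreSchema(schema) {
    const errors = [];
    if (schema.actorKeyValueStoreSchemaVersion !== 1) {
        errors.push('actorKeyValueStoreSchemaVersion must be 1');
    }
    if (typeof schema.title !== 'string') {
        errors.push('title is required');
    }
    const { collections } = schema;
    if (collections === null || typeof collections !== 'object' || Array.isArray(collections)) {
        errors.push('collections is required and must be an object');
        return errors;
    }
    for (const [id, collection] of Object.entries(collections)) {
        if (typeof collection.title !== 'string') {
            errors.push(`${id}: title is required`);
        }
        const hasKey = typeof collection.key === 'string';
        const hasPrefix = typeof collection.keyPrefix === 'string';
        // Exactly one of key / keyPrefix must be present.
        if (hasKey === hasPrefix) {
            errors.push(`${id}: specify exactly one of key or keyPrefix`);
        }
    }
    return errors;
}

const candidate = {
    actorKeyValueStoreSchemaVersion: 1,
    title: 'Key-Value Store Schema',
    collections: {
        documents: { title: 'Documents', keyPrefix: 'document-' },
        // Invalid on purpose: both key and keyPrefix are set.
        report: { title: 'Report', key: 'REPORT', keyPrefix: 'report-' },
    },
};

console.log(validateKeyValueStoreSchema(candidate));
```

An empty array means the schema passed these checks. Here the `report` collection is flagged because it sets both `key` and `keyPrefix`.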

## Apify MCP Tools

If an MCP server is configured, use these tools to look up documentation:

- `search-apify-docs` - Search documentation
- `fetch-apify-docs` - Get full doc pages

Otherwise, reference: `@https://mcp.apify.com/`

## Resources

- [docs.apify.com/llms.txt](https://docs.apify.com/llms.txt) - Quick reference
- [docs.apify.com/llms-full.txt](https://docs.apify.com/llms-full.txt) - Complete docs
- [crawlee.dev](https://crawlee.dev) - Crawlee documentation
- [whitepaper.actor](https://raw.githubusercontent.com/apify/actor-whitepaper/refs/heads/master/README.md) - Complete Actor specification

Dockerfile

# Specify the base Docker image. You can read more about
# the available images at https://crawlee.dev/docs/guides/docker-images
# You can also use any other image from Docker Hub.
FROM apify/actor-node-playwright-chrome:22-1.56.1

# Check preinstalled packages
RUN npm ls @crawlee/core apify puppeteer playwright

# Copy just package.json and package-lock.json
# to speed up the build using Docker layer cache.
COPY --chown=myuser:myuser package*.json Dockerfile ./

# Check Playwright version is the same as the one from base image.
RUN node check-playwright-version.mjs

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
    && npm install --omit=dev --omit=optional \
    && echo "Installed NPM packages:" \
    && (npm list --omit=dev --all || true) \
    && echo "Node.js version:" \
    && node --version \
    && echo "NPM version:" \
    && npm --version \
    && rm -r ~/.npm

# Next, copy the remaining files and directories with the source code.
# Since we do this after NPM install, quick build will be really fast
# for most source file changes.
COPY --chown=myuser:myuser . ./

CMD ["node", "src/main.js"]

eslint.config.mjs

import prettier from 'eslint-config-prettier';

import apify from '@apify/eslint-config/js.js';

// eslint-disable-next-line import/no-default-export
export default [{ ignores: ['**/dist'] }, ...apify, prettier];

package.json

{
    "name": "crawlee-playwright-javascript",
    "version": "0.0.1",
    "type": "module",
    "description": "This is an example of an Apify Actor.",
    "dependencies": {
        "apify": "^3.5.2",
        "@crawlee/playwright": "^3.15.3",
        "playwright": "1.56.1"
    },
    "devDependencies": {
        "@apify/eslint-config": "^1.0.0",
        "eslint": "^9.29.0",
        "eslint-config-prettier": "^10.1.5",
        "prettier": "^3.5.3"
    },
    "scripts": {
        "start": "node src/main.js",
        "format": "prettier --write .",
        "format:check": "prettier --check .",
        "lint": "eslint",
        "lint:fix": "eslint --fix",
        "test": "echo \"Error: oops, the Actor has no tests yet, sad!\" && exit 1",
        "postinstall": "npx crawlee install-playwright-browsers"
    },
    "author": "It's not you it's me",
    "license": "ISC"
}

Instagram Posts Scraper

instaprism/instagram-posts-scraper

Extract posts from any public Instagram profile. Get likes, comments, captions, media URLs. Auto-saves progress. No login required. Export JSON/CSV/Excel.

red

💬 Instagram Comments Scraper (No Login)

louisdeconinck/instagram-comments-scraper

Scrape all comments from any Instagram post in seconds. No login required. Get text, likes, timestamps, and usernames — even from large posts. Fast, affordable, and easy to use. Ideal for lead gen, research, and trend spotting. Just paste the URL and go.

Louis Deconinck

1.4K

4.3

Instagram Post Scraper

hello.datawizards/Instagram-post-scraper

Scrape public Instagram posts by username. Get captions, likes, comments, media URLs, timestamps, and more. Perfect for research, trend analysis, and media monitoring. Fast, reliable, and proxy-supported with structured JSON output.

datawizards

1.1

🚀 Instagram Posts Scraper ⚡ No Login Required

vulnv/instagram-posts-scraper

Extract detailed data from Instagram posts including captions, likes, comments, images, videos, hashtags, tagged users, and creator details. Bulk processing supported. No Instagram authentication needed - just provide post URLs and get structured JSON data.

VulnV

Instagram Profile Post Scraper

scrapier/instagram-profile-post-scraper

Scrape posts from individual Instagram profiles with the Instagram Profile Post Scraper. Extract images, videos, captions, hashtags, likes, comments, and timestamps. Perfect for content analysis, engagement tracking, and social media research.

Scrapier

Instagram Profile Post Scraper

scrapio/instagram-profile-post-scraper

Scrapes posts from any public Instagram profile, capturing captions, images, videos, timestamps, likes, comments, hashtags, and post URLs. Ideal for content research, competitor tracking, influencer analysis, and automated extraction of Instagram profile posts at scale

Scrapio

Instagram Profile Posts Scraper

futurizerush/instagram-profile-posts-scraper

Get posts, reels, and carousels from any public Instagram profile. Includes download links, likes, comments, captions, and more. Works with up to 10 profiles at once. No login required.

Rush

5.0

Instagram Posts Scraper

api-empire/instagram-posts-scraper

Instagram Posts Scraper extracts posts from any public Instagram profile. Get captions, media URLs, hashtags, timestamps, and engagement metrics fast. Ideal for research, content analysis, trend tracking, and automation workflows needing structured Instagram post data.

API Empire

Instagram Posts Scraper

scrapio/instagram-posts-scraper

Scrapes posts from any public Instagram profile or hashtag, capturing captions, images, videos, timestamps, likes, comments, hashtags, and post URLs. Ideal for content analysis, competitor research, influencer insights, and large-scale Instagram post extraction

Scrapio

Instagram User Reels Scraper

powerai/instagram-user-reels-scraper

Extract Instagram reels from any public user profile with detailed media and metadata. Supports auto-pagination and custom result limits. No login required.

PowerAI

129

