# Instagram Scraper (`spry_headset/instagram-page-post-scraper`) Actor

Standalone Instagram scraper for profile feeds, direct post/reel URLs, profile details, and visible comments. Uses Apify Proxy and supports lower-bandwidth scraping by blocking heavy media resources.

- **URL**: https://apify.com/spry_headset/instagram-page-post-scraper.md
- **Developed by:** [insomniac dev](https://apify.com/spry_headset) (community)
- **Categories:** Automation, Social media, Developer tools
- **Stats:** 5 total users, 3 monthly users, 30.4% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $0.50 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
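
As a rough aid for budgeting, the listed starting price can be turned into a lower-bound estimate. This is a minimal sketch that assumes every result is billed at the advertised "from" rate of $0.50 per 1,000 results; the function name `estimate_min_cost_usd` is hypothetical, and the actual event prices may be higher.

```python
from decimal import Decimal

# Listed starting price: $0.50 per 1,000 results.
# Assumption: the "from" wording makes this a lower bound, not an exact rate.
PRICE_PER_1000_RESULTS_USD = Decimal("0.50")


def estimate_min_cost_usd(result_count: int) -> Decimal:
    """Return the minimum expected charge in USD for `result_count` results."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    return PRICE_PER_1000_RESULTS_USD * result_count / Decimal(1000)


# Example: three profiles with up to 1,000 results each.
print(estimate_min_cost_usd(3000))
```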

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Instagram Standalone Post/Profile Scraper

Standalone Apify Actor for scraping Instagram **profile feeds**, **direct post/reel links**, **profile details**, and **visible/latest comments**.

This Actor is designed to be compatible with the **post/profile scraping workflow** used in `fabulate-dagster` while avoiding the wrapper approach.

### Key points

- standalone scraper, not a wrapper around `apify/instagram-scraper`
- uses **Apify Proxy** by default
- blocks **images, media, fonts, and stylesheets** during page load to reduce bandwidth and proxy spend
- supports profile URLs and direct post/reel URLs
- uses an API-first profile feed path and only uses DOM comment extraction on direct post/reel pages
- keeps output compatible with the fields used by the Dagster Instagram post pipeline

---

### Supported input modes

Currently supported:

- `resultsType: "posts"`
- `resultsType: "details"`
- `resultsType: "comments"`
- `resultsType: "reels"`
- `directUrls`

Currently **not** supported in standalone mode:

- search-only scraping
- `mentions`
- `stories`
- hashtag/place discovery

If no `directUrls` are provided, the Actor fails with a clear error.
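
Because unsupported modes and missing `directUrls` cause the run to fail, it can be worth checking input on the caller's side before starting a run. The following is a minimal sketch of such a pre-check based only on the lists above; `validate_run_input` is a hypothetical helper and not part of the Actor or the Apify client.

```python
# Modes listed as supported and unsupported in standalone mode.
SUPPORTED_RESULTS_TYPES = {"posts", "details", "comments", "reels"}
UNSUPPORTED_RESULTS_TYPES = {"mentions", "stories"}


def validate_run_input(run_input: dict) -> None:
    """Raise ValueError if the input cannot work in standalone mode."""
    # The input schema declares "posts" as the default resultsType.
    results_type = run_input.get("resultsType", "posts")
    if results_type in UNSUPPORTED_RESULTS_TYPES:
        raise ValueError(f"resultsType {results_type!r} is not supported in standalone mode")
    if results_type not in SUPPORTED_RESULTS_TYPES:
        raise ValueError(f"unknown resultsType {results_type!r}")
    # Search-only scraping is not supported, so at least one URL is needed.
    if not run_input.get("directUrls"):
        raise ValueError("directUrls must contain at least one Instagram URL")


validate_run_input({
    "resultsType": "posts",
    "directUrls": ["https://www.instagram.com/instagram/"],
})
```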

---

### Input example

```json
{
  "resultsType": "posts",
  "directUrls": [
    "https://www.instagram.com/instagram/"
  ],
  "resultsLimit": 3,
  "addParentData": true,
  "skipPinnedPosts": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
````

### Proxy behavior

If you do not pass `proxyConfiguration`, the Actor defaults to:

```json
{
  "useApifyProxy": true,
  "apifyProxyGroups": ["RESIDENTIAL"]
}
```

This is recommended for Instagram on Apify cloud. Datacenter IPs are more likely to get redirected to the Instagram login page.
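
Callers who prefer not to rely on the implicit default can make it explicit in their own input. This is a sketch of that idea, assuming the default shown above; `with_explicit_proxy` is a hypothetical helper name, and it leaves any `proxyConfiguration` the caller already set untouched.

```python
import copy

# The default the README documents for a missing proxyConfiguration.
DEFAULT_PROXY_CONFIGURATION = {
    "useApifyProxy": True,
    "apifyProxyGroups": ["RESIDENTIAL"],
}


def with_explicit_proxy(run_input: dict) -> dict:
    """Return a copy of run_input with the documented proxy default filled in."""
    merged = copy.deepcopy(run_input)
    # setdefault keeps a caller-supplied proxyConfiguration as-is.
    merged.setdefault("proxyConfiguration", copy.deepcopy(DEFAULT_PROXY_CONFIGURATION))
    return merged


run_input = with_explicit_proxy({
    "resultsType": "posts",
    "directUrls": ["https://www.instagram.com/instagram/"],
})
print(run_input["proxyConfiguration"])
```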

***

### Output

The Actor emits Instagram-like result objects for:

- profile feed posts
- direct post/reel URLs
- profile detail objects
- comment objects

For profile feeds, `latestComments` comes from timeline/preview data when available. For direct post/reel URLs and `resultsType: "comments"`, the Actor extracts the visible comments rendered on the page.

Reference schemas are documented here:

- `docs/schemas/instagram-output.schema.json`
- `docs/schemas/instagram-post.schema.json`
- `docs/schemas/instagram-comment.schema.json`
- `docs/schemas/instagram-profile.schema.json`
- `docs/schemas/instagram-hashtag.schema.json`
- `docs/schemas/instagram-place.schema.json`

These schema files are documentation aids. The actual Actor dataset remains permissive.

***

### Why page loading is cheaper

This standalone Actor blocks these resource types during navigation:

- images
- media
- fonts
- stylesheets

It also blocks common Instagram CDN media hosts.

That means page HTML and JavaScript still load, but heavy media payloads do not.
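
The blocking rule described above can be expressed as a small predicate. This is an illustrative sketch, not the Actor's actual code: the resource-type names follow the convention used by browser automation libraries such as Playwright, and the CDN host suffixes are assumptions, since the README does not list the hosts it blocks.

```python
from urllib.parse import urlsplit

# Resource types the README says are blocked during navigation.
BLOCKED_RESOURCE_TYPES = {"image", "media", "font", "stylesheet"}
# Assumed Instagram CDN host suffixes (hypothetical; not taken from the Actor).
BLOCKED_HOST_SUFFIXES = ("cdninstagram.com", "fbcdn.net")


def should_block(resource_type: str, url: str) -> bool:
    """Return True if a request should be aborted to save bandwidth."""
    if resource_type in BLOCKED_RESOURCE_TYPES:
        return True
    host = (urlsplit(url).hostname or "").lower()
    # Match the suffix itself or any subdomain of it.
    return any(host == suffix or host.endswith("." + suffix)
               for suffix in BLOCKED_HOST_SUFFIXES)


# HTML documents and scripts from instagram.com still load.
print(should_block("script", "https://www.instagram.com/static/app.js"))
print(should_block("image", "https://www.instagram.com/logo.png"))
```

In a browser automation setup, a predicate like this would typically be called from a request interception hook that aborts matching requests and lets the rest continue.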

***

### Local run

```bash
npm install
apify run --input-file=input.json
```

***

### Dagster compatibility scope

This Actor is intended to replace the **post/profile scraping** use of `apify/instagram-scraper` in `fabulate_dagster/ops/scrape_insta.py`.

It is **not** intended to replace the separate Instagram story actor.

***

### Testing

See:

- `docs/TESTING.md`

***

### Technical details

See:

- `docs/IMPLEMENTATION.md`

***

### License

ISC

# Actor input Schema

## `resultsType` (type: `string`):

**Posts** returns a feed of content. **Profile, hashtag, or place details** returns metadata about the page: follower count, bio, post count, and profile picture.

Comments only work with post URLs, e.g. `instagram.com/p/ABC123xyz/`, not `instagram.com/username/`.

## `directUrls` (type: `array`):

Add one or more Instagram URLs to scrape.

URL format must match your content type:

- `/p/` for posts and comments
- `/reel/` for reels
- `/username/` for profile details

Leave blank if using the search query instead.
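
To check that a URL matches the intended content type before submitting it, the path rules above can be applied with a small classifier. This is a sketch under the assumption that only the first path segment matters; `classify_instagram_url` is a hypothetical helper and does not verify that the URL actually exists.

```python
from urllib.parse import urlsplit

# First path segment -> content kind, per the URL format rules above.
_PREFIX_KINDS = {"p": "post", "reel": "reel"}


def classify_instagram_url(url: str) -> str:
    """Return "post", "reel", or "profile" based on the URL path."""
    segments = [part for part in urlsplit(url).path.split("/") if part]
    if not segments:
        raise ValueError(f"no path in URL: {url!r}")
    kind = _PREFIX_KINDS.get(segments[0])
    if kind is not None:
        if len(segments) < 2:
            raise ValueError(f"missing shortcode in URL: {url!r}")
        return kind
    if len(segments) == 1:
        return "profile"  # a bare /username/ path
    raise ValueError(f"unrecognized Instagram URL shape: {url!r}")


print(classify_instagram_url("https://www.instagram.com/p/ABC123xyz/"))
```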

## `resultsLimit` (type: `integer`):

Set how many posts or comments to scrape per Instagram URL. Higher values increase compute usage and cost.

Maximum 50 comments per post. Instagram may return fewer than 15 comments on some posts.

Use `1` to retrieve a single post per page.

## `onlyPostsNewerThan` (type: `string`):

Limit how far back to scrape. Enter a date in `YYYY-MM-DD`, ISO format, or as a relative value, e.g. `1 day`, `2 months`, `3 years`.

Times are in UTC, not local time. New York is UTC-5 in winter, UTC-4 in summer. Pinned posts may still appear even with this filter set.
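
To see which UTC cutoff a given value implies, the accepted formats can be interpreted locally. This is a minimal sketch, not the Actor's own parser: it assumes a month is 30 days and a year is 365 days, treats absolute values as UTC, and `parse_cutoff` is a hypothetical helper name.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: months and years are approximated as 30 and 365 days.
_UNIT_SECONDS = {
    "minute": 60, "hour": 3600, "day": 86400,
    "week": 7 * 86400, "month": 30 * 86400, "year": 365 * 86400,
}
_RELATIVE = re.compile(r"^(\d+)\s*(minute|hour|day|week|month|year)s?$")


def parse_cutoff(value: str, now: Optional[datetime] = None) -> datetime:
    """Convert an onlyPostsNewerThan value into a UTC cutoff datetime."""
    now = now or datetime.now(timezone.utc)
    text = value.strip()
    match = _RELATIVE.match(text)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return now - timedelta(seconds=amount * _UNIT_SECONDS[unit])
    # Absolute date or timestamp; a trailing "Z" is dropped and UTC is assumed.
    if text.endswith("Z"):
        text = text[:-1]
    return datetime.fromisoformat(text).replace(tzinfo=timezone.utc)


print(parse_cutoff("2024-01-15"))
```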

## `search` (type: `string`):

Search for Instagram hashtags, profiles, or places. Use `#` for hashtags, e.g. `#travel`.

## `searchType` (type: `string`):

Hashtag search returns posts tagged with that term. Profile search returns matching accounts. Place search returns matching locations.

## `searchLimit` (type: `integer`):

Set how many profiles, places, or hashtags to find. To limit posts per result, use **Results limit per URL** (`resultsLimit`). Higher values increase compute usage and cost.

## `addParentData` (type: `boolean`):

This applies only to feed items. Adds a `dataSource` field to each result: profile posts are labeled `profile`, and tag posts are labeled `hashtag`.
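
When `addParentData` is enabled, results can be split by their origin after the run. The sketch below assumes `dataSource` is a plain string label, as the description suggests; the helper `group_by_data_source` and the `"unknown"` fallback key are hypothetical.

```python
from collections import defaultdict


def group_by_data_source(items: list) -> dict:
    """Group dataset items by their dataSource label."""
    groups = defaultdict(list)
    for item in items:
        # Items without the field (e.g. addParentData disabled) go to "unknown".
        groups[item.get("dataSource", "unknown")].append(item)
    return dict(groups)


# Made-up sample items for illustration only.
sample = [
    {"id": "1", "dataSource": "profile"},
    {"id": "2", "dataSource": "profile"},
    {"id": "3"},
]
grouped = group_by_data_source(sample)
print({label: len(entries) for label, entries in grouped.items()})
```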

## `proxyConfiguration` (type: `object`):

Apify Proxy configuration. If omitted, this Actor defaults to `{ "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }`, which is recommended for Instagram on Apify cloud.

## Actor input object example

```json
{
  "resultsType": "posts",
  "directUrls": [
    "https://www.instagram.com/humansofny/"
  ],
  "resultsLimit": 100,
  "searchType": "hashtag",
  "searchLimit": 10,
  "addParentData": false,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}
```

# Actor output Schema

## `datasetItems` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "resultsType": "posts",
    "directUrls": [
        "https://www.instagram.com/humansofny/"
    ],
    "resultsLimit": 100,
    "searchType": "hashtag",
    "searchLimit": 10
};

// Run the Actor and wait for it to finish
const run = await client.actor("spry_headset/instagram-page-post-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "resultsType": "posts",
    "directUrls": ["https://www.instagram.com/humansofny/"],
    "resultsLimit": 100,
    "searchType": "hashtag",
    "searchLimit": 10,
}

# Run the Actor and wait for it to finish
run = client.actor("spry_headset/instagram-page-post-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
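
Given that the listed stats show only a fraction of runs succeeding, callers may want to check the run status and retry. The following is a sketch of one way to do that, not an official pattern: `call_with_retry` is a hypothetical helper that takes any zero-argument callable returning a run dictionary (or `None`), and it treats only the `SUCCEEDED` status as success. Note that each retry starts a new run, which may incur additional charges.

```python
import time
from typing import Callable, Optional


def call_with_retry(
    start_run: Callable[[], Optional[dict]],
    attempts: int = 3,
    delay_secs: float = 5.0,
) -> dict:
    """Call start_run until a run reports status SUCCEEDED or attempts run out."""
    last_status = None
    for attempt in range(1, attempts + 1):
        run = start_run()
        last_status = run.get("status") if run else None
        if last_status == "SUCCEEDED":
            return run
        if attempt < attempts:
            time.sleep(delay_secs)  # simple fixed pause between attempts
    raise RuntimeError(
        f"Actor run did not succeed after {attempts} attempts "
        f"(last status: {last_status})"
    )
```

With the client from the example above, this could be used as `run = call_with_retry(lambda: client.actor("spry_headset/instagram-page-post-scraper").call(run_input=run_input))`.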

## CLI example

```bash
echo '{
  "resultsType": "posts",
  "directUrls": [
    "https://www.instagram.com/humansofny/"
  ],
  "resultsLimit": 100,
  "searchType": "hashtag",
  "searchLimit": 10
}' |
apify call spry_headset/instagram-page-post-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=spry_headset/instagram-page-post-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Instagram Scraper",
        "description": "Standalone Instagram scraper for profile feeds, direct post/reel URLs, profile details, and visible comments. Uses Apify Proxy, supports lower-bandwidth scraping by blocking heavy media resources",
        "version": "0.1",
        "x-build-id": "wIJmL3Orhshb15O7D"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/spry_headset~instagram-page-post-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-spry_headset-instagram-page-post-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/spry_headset~instagram-page-post-scraper/runs": {
            "post": {
                "operationId": "runs-sync-spry_headset-instagram-page-post-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/spry_headset~instagram-page-post-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-spry_headset-instagram-page-post-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "resultsType": {
                        "title": "Content to scrape",
                        "enum": [
                            "posts",
                            "details",
                            "comments",
                            "reels",
                            "mentions",
                            "stories"
                        ],
                        "type": "string",
                        "description": "**Posts** returns a feed of content. **Profile, hashtag, or place details** returns metadata about the page: follower count, bio, post count, and profile picture.<br><br>Comments only work with post URLs, e.g. `instagram.com/p/ABC123xyz/` not `instagram.com/username/`.",
                        "default": "posts"
                    },
                    "directUrls": {
                        "title": "Instagram URLs based on content type",
                        "uniqueItems": true,
                        "type": "array",
                        "description": "Add one or more Instagram URLs to scrape.<br><br>URL format must match your content type:<ul><li><code>/p/</code> for posts and comments</li><li><code>/reel/</code> for reels</li><li><code>/username/</code> for profile details</li></ul><br>Leave blank if using the search query instead.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "resultsLimit": {
                        "title": "Results limit per URL",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Set how many posts or comments to scrape per Instagram URL. Higher values increase compute usage and cost.<br><br>Maximum 50 comments per post. Instagram may return fewer than 15 comments on some posts.<br><br>Use `1` to retrieve a single post per page."
                    },
                    "onlyPostsNewerThan": {
                        "title": "Filter by date",
                        "pattern": "^(\\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])(T[0-2]\\d:[0-5]\\d(:[0-5]\\d)?(\\.\\d+)?Z?)?$|^(\\d+)\\s*(minute|hour|day|week|month|year)s?$",
                        "type": "string",
                        "description": "Limit how far back to scrape. Enter a date in `YYYY-MM-DD`, ISO format, or as a relative value, e.g. `1 day`, `2 months`, `3 years`.<br><br>Times are in UTC, not local time. New York is UTC-5 in winter, UTC-4 in summer. Pinned posts may still appear even with this filter set."
                    },
                    "search": {
                        "title": "Search query",
                        "type": "string",
                        "description": "Search for Instagram hashtags, profiles, or places.<br>Use `#` for hashtags, e.g. `#travel`."
                    },
                    "searchType": {
                        "title": "Search type",
                        "enum": [
                            "hashtag",
                            "profile",
                            "place",
                            "user"
                        ],
                        "type": "string",
                        "description": "Hashtag search returns posts tagged with that term. Profile search returns matching accounts. Place search returns matching locations.",
                        "default": "hashtag"
                    },
                    "searchLimit": {
                        "title": "Search results limit",
                        "minimum": 1,
                        "maximum": 250,
                        "type": "integer",
                        "description": "Set how many profiles, places, or hashtags to find.<br>To limit posts per result, use Results limit per URL.<br>Higher values increase compute usage and cost."
                    },
                    "addParentData": {
                        "title": "Add metadata",
                        "type": "boolean",
                        "description": "This applies only to feed items. Adds a dataSource field to each result: profile posts are labeled `profile`, and tag posts are labeled `hashtag`.",
                        "default": false
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Apify Proxy configuration. If omitted, this actor defaults to `{ \"useApifyProxy\": true, \"apifyProxyGroups\": [\"RESIDENTIAL\"] }`, which is recommended for Instagram on Apify cloud.",
                        "default": {
                            "useApifyProxy": true,
                            "apifyProxyGroups": [
                                "RESIDENTIAL"
                            ]
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
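
For projects that do not use a client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. This is a minimal sketch using only the Python standard library; `build_sync_request` is a hypothetical helper that constructs the request without sending it, and it passes the token as the `token` query parameter as the specification describes.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
# Actor ID as it appears in the OpenAPI paths (tilde instead of slash).
ACTOR_PATH_ID = "spry_headset~instagram-page-post-scraper"


def build_sync_request(token: str, run_input: dict) -> Request:
    """Build a POST request that runs the Actor and returns dataset items."""
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH_ID}/run-sync-get-dataset-items?{query}"
    return Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sync_request("<YOUR_API_TOKEN>", {
    "resultsType": "posts",
    "directUrls": ["https://www.instagram.com/humansofny/"],
    "resultsLimit": 3,
})
print(request.get_method(), request.full_url.split("?")[0])
```

Sending the request with `urllib.request.urlopen(request)` blocks until the run finishes, so a generous timeout is advisable for larger inputs.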
