Instagram Tracker
Under maintenance

Pricing

from $0.01 / 1,000 results

This actor collects public Instagram content with a browser-based approach that mirrors real user behaviour. It works with profiles, posts, reels, and stories, and returns a consistent JSON structure you can feed into automation, analytics, or reporting workflows.

Developer

vivid travelogue

Maintained by Community

Last modified: 7 days ago

The Instagram Tracker is a powerful and reliable Instagram scraper built as an Apify Actor. It automates the collection of public Instagram data, including posts, reels, stories, and profile insights. Designed for accuracy and long-term reliability, it uses real browser automation to reduce blocking and adapt to Instagram’s changing structure.

This actor uses a Business Account, which makes it suitable for use cases that require authenticated access. It supports multiple fallback scraping modes that combine official web endpoints, mobile API endpoints, and JSON structures embedded in Instagram pages. This helps keep output stable even when Instagram changes its API or HTML.

You can customize how many posts, reels, and stories to collect per profile. Each media item is normalized into a clean, consistent dataset with the essential fields: media ID, shortcode, URL, thumbnail, video source, caption, timestamps, and more. The original Instagram payload is preserved inside a raw object for advanced analysis.

Optional raw filtering lets you target items with specific characteristics such as video duration, play count, audio presence, engagement metrics, or any available raw field. You can also enable media downloading to save images and videos directly to the Key-Value Store.

The actor exports results in JSON, CSV, and the Apify Dataset, making it easy to integrate with automation tools, dashboards, or machine-learning workflows.

Ideal for social media analytics, competitor research, influencer monitoring, content archiving, and Instagram automation, the Instagram Tracker is a scalable, SEO-friendly solution for structured Instagram data extraction.

Use it to streamline research, power content strategies, or build automated social intelligence systems.

Features

  • Scrape Instagram posts, profiles, tags, and places.
  • Extract metadata from og:* meta tags and embedded JSON (__NEXT_DATA__, ld+json).
  • Optionally download media to Key-Value store ({outputFileName}/media/...).
  • Export results as JSON and/or CSV and push items to the dataset.

Input (high-level)

Configure runtime options in storage\key_value_stores\default\INPUT.json. Example values below are taken from the repository's sample INPUT.json.

  • startUrls (array): array of Instagram URLs to crawl. Default: [].
  • usernames (array): list of Instagram usernames to explicitly fetch (e.g. ["nick_saraev"]). Default: [].
  • queries (array): optional array of search queries (unused by default). Default: [].
  • maxItems (integer): maximum number of total items to collect. Sample: 5.
  • numberOfPosts (integer): per-user number of posts to fetch. Sample: 3.
  • numberOfReels (integer): per-user number of reels to fetch. Sample: 3.
  • numberOfStories (integer): per-user number of stories to fetch. Sample: 3.
  • downloadMedia (boolean): whether to download media into the Key-Value store. Sample: true.
  • mediaTypes (string): which media to download — image, video, or both. Sample: both.
  • useApifyProxy (boolean): use Apify Proxy for requests (recommended when scraping at scale). Sample: false.
  • apifyProxyCountry (string): two-letter country code for Apify Proxy routing (when enabled). Sample: US.
  • maxConcurrency (integer): Playwright concurrency/workers. Sample: 2.
  • requestTimeoutSecs (integer): per-request timeout in seconds. Sample: 60.
  • outputFormat (string): which outputs to write: json, csv, or both. Sample: both.
  • outputFileName (string): base filename/key for outputs. Sample: instagram-output.
  • rawFilter (array or object): optional filter applied to the raw node data.
    • The actor supports two filter shapes:
      • Object map: { "path.to.field": <value>, ... } (strings may be prefixed with re: for regex or contains: for substring).
      • Array of rules: [{ "path": "path.to.field", "op": "eq|contains|re|gt|lt|exists", "value": ... }, ...].
    • The sample INPUT.json in this repo uses a simple array-of-keys/paths (e.g. ["has_audio","video_duration","play_count",...]) meaning "only include items that contain these raw keys".

When present, rawFilter is applied to each item's raw object and only matching items are included in the final dataset and exported outputs.
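
The three filter shapes described above could be evaluated roughly as follows. This is a minimal sketch, not the actor's own code: `get_path`, `match_rule`, and `matches` are hypothetical names, paths are assumed to be relative to the raw object, only plain dot paths are handled (no bracket indexes), and combining all conditions with AND is an assumption.

```python
import re
from typing import Any

_MISSING = object()  # sentinel: distinguishes "absent" from a stored None


def get_path(raw: dict, path: str) -> Any:
    """Follow a plain dot path such as "caption.text" into nested dicts."""
    node: Any = raw
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def match_rule(raw: dict, path: str, op: str, value: Any = None) -> bool:
    """Evaluate one (path, op, value) rule against a raw object."""
    found = get_path(raw, path)
    if op == "exists":
        return found is not _MISSING
    if found is _MISSING:
        return False
    if op == "eq":
        return found == value
    if op == "contains":
        return str(value) in str(found)
    if op == "re":
        return re.search(str(value), str(found)) is not None
    if op in ("gt", "lt"):
        try:
            a, b = float(found), float(value)
        except (TypeError, ValueError):
            return False
        return a > b if op == "gt" else a < b
    raise ValueError(f"unknown op: {op}")


def matches(raw: dict, raw_filter: Any) -> bool:
    """Return True when raw satisfies every condition (AND is assumed)."""
    if not raw_filter:
        return True
    rules = []
    if isinstance(raw_filter, dict):  # object map shape
        for path, expected in raw_filter.items():
            if isinstance(expected, str) and expected.startswith("re:"):
                rules.append((path, "re", expected[3:]))
            elif isinstance(expected, str) and expected.startswith("contains:"):
                rules.append((path, "contains", expected[9:]))
            else:
                rules.append((path, "eq", expected))
    else:  # array of rule objects, or array of bare key names
        for rule in raw_filter:
            if isinstance(rule, str):
                rules.append((rule, "exists", None))
            else:
                rules.append((rule["path"], rule["op"], rule.get("value")))
    return all(match_rule(raw, p, op, v) for p, op, v in rules)
```

For example, `matches(item["raw"], [{"path": "play_count", "op": "gt", "value": 1000}])` would keep only items whose raw payload reports more than 1,000 plays.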

See .actor/input_schema.json for the authoritative schema used by the Apify Console.

Output

  • JSON: saved to the Key-Value store as {outputFileName}.json.
  • CSV: saved as {outputFileName}.csv when requested.
  • Media: saved under the Key-Value store prefix {outputFileName}/media/ when downloadMedia is enabled.
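
To make the naming rules above concrete, the following sketch derives the Key-Value store keys a run might produce from its input. `expected_keys` is a hypothetical helper, the fallback values are simply the sample values from this README, and skipping the JSON file when `outputFormat` is `csv` is an assumption.

```python
from typing import Any, Dict, List


def expected_keys(cfg: Dict[str, Any]) -> List[str]:
    """List the Key-Value store keys (and media prefix) implied by an input."""
    base = cfg.get("outputFileName", "instagram-output")  # sample value as fallback
    fmt = cfg.get("outputFormat", "both")                 # sample value as fallback
    keys: List[str] = []
    if fmt in ("json", "both"):
        keys.append(f"{base}.json")
    if fmt in ("csv", "both"):
        keys.append(f"{base}.csv")
    if cfg.get("downloadMedia"):
        keys.append(f"{base}/media/")  # a prefix shared by all media records
    return keys
```

With the sample input, this yields `instagram-output.json`, `instagram-output.csv`, and the `instagram-output/media/` prefix.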

Normalized item fields

Each dataset item contains a small set of normalized top-level fields plus the original platform payload under raw.

  • id: internal item id (usually {mediaId}_{ownerId})
  • shortcode: Instagram shortcode (post code)
  • url: canonical web URL for the post
  • is_video: boolean flag if the post contains video
  • display_url: preferred image thumbnail URL
  • video_url: best-effort direct MP4 URL (when available)
  • sourceTab: source type (e.g. reels, profile)
  • sourceProfile: source username
  • crawledAt: ISO timestamp when item was crawled
  • raw: full original JSON object returned by Instagram endpoints (very large, nested)

Common raw paths (useful with rawFilter)

Use dot paths into the raw object for rawFilter rules. Common keys present in dataset examples:

  • raw.id — internal media id string
  • raw.pk — numeric PK for the media
  • raw.fbid — platform FB id (when present)
  • raw.code — shortcode string
  • raw.is_dash_eligible — numeric/video eligibility flag
  • raw.media_type — numeric media type (1=image, 2=video, ...)
  • raw.is_video — (sometimes provided at normalized level; check raw.media_type)
  • raw.play_count / raw.ig_play_count — view counts for videos
  • raw.video_duration — video duration in seconds
  • raw.number_of_qualities — available quality count
  • raw.taken_at — Unix timestamp when media was taken
  • raw.like_count — number of likes
  • raw.comment_count — comment count
  • raw.caption.text — caption text
  • raw.caption.user.username — caption author username
  • raw.user.username — owner's username
  • raw.user.pk / raw.user.id — owner id fields
  • raw.owner.username — owner username (duplicate of user in many responses)
  • raw.image_versions2.candidates[].url — candidate image URLs (array)
  • raw.image_versions2.additional_candidates.first_frame.url — first-frame preview URL
  • raw.video_versions[].url — array of direct video URLs (may be multiple)
  • raw.video_dash_manifest — DASH MPD XML (string)
  • raw.original_sound_info.dash_manifest — audio DASH MPD XML
  • raw.scrubber_spritesheet_info_candidates.default.sprite_urls[] — thumbnail sprite URLs

Notes:

  • Many fields are nested and arrays use canonical names like candidates[] or video_versions[]. Use bracket notation for index-specific checks in rawFilter (e.g. image_versions2.candidates[0].url).
  • Not all responses contain every field. Prefer checking for existence (or using op: "exists") when building filters.
  • The raw object is large and can include platform-specific manifests, metrics, and nested user objects. The rawFilter engine supports equality, contains, regex, and basic numeric ops.
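
Bracketed paths such as `image_versions2.candidates[0].url` or `video_versions[].url` could be resolved along these lines. This is a sketch under stated assumptions, not the actor's resolver: `resolve` is a hypothetical name, paths are taken relative to the raw object, and `[]` is treated as "every element of the array".

```python
import re
from typing import Any, List

# One path segment: a key, optionally followed by [] or [<index>].
_SEGMENT = re.compile(r"^([^\[\]]+)(?:\[(\d*)\])?$")


def resolve(raw: Any, path: str) -> List[Any]:
    """Return every value reachable at path; an empty list means "absent"."""
    nodes = [raw]
    for segment in path.split("."):
        m = _SEGMENT.match(segment)
        if m is None:
            raise ValueError(f"bad path segment: {segment!r}")
        key, index = m.group(1), m.group(2)
        next_nodes: List[Any] = []
        for node in nodes:
            if not isinstance(node, dict) or key not in node:
                continue
            child = node[key]
            if index is None:              # plain key
                next_nodes.append(child)
            elif not isinstance(child, list):
                continue                   # brackets used on a non-array
            elif index == "":              # key[]  -> fan out over all elements
                next_nodes.extend(child)
            elif int(index) < len(child):  # key[3] -> a single element
                next_nodes.append(child[int(index)])
        nodes = next_nodes
    return nodes
```

Returning a list keeps existence checks simple: an empty result means the field is missing, which fits the advice above to test for existence before comparing values.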

Notes

  • Instagram actively blocks automated access. Use low concurrency and consider using Apify Proxy for reliable runs.
  • This actor is intended as a general-purpose starter; deep pagination of comments or API-level scraping is out-of-scope in this initial version.

Example Input

{
  "maxItems": 5,
  "downloadMedia": true,
  "mediaTypes": "both",
  "useApifyProxy": false,
  "apifyProxyCountry": "US",
  "maxConcurrency": 2,
  "requestTimeoutSecs": 60,
  "outputFormat": "both",
  "outputFileName": "instagram-output",
  "startUrls": [],
  "queries": [],
  "numberOfPosts": 3,
  "numberOfReels": 3,
  "numberOfStories": 3,
  "usernames": [
    "nick_saraev"
  ],
  "rawFilter": ["has_audio", "video_duration", "play_count", "has_liked", "commerciality_status", "like_count", "comment_count"]
}

Configuration Summary

This section documents the Instagram actor's configuration options (sample defaults taken from storage\key_value_stores\default\INPUT.json). These are the primary inputs you can set when running the Actor locally or in the Apify Console.

| Field | Description |
| --- | --- |
| startUrls | Array of Instagram URLs to crawl (e.g. profile/post/tag pages). Default: [] |
| usernames | Array of Instagram usernames to fetch directly (e.g. ["nick_saraev"]). Default: [] |
| queries | Optional free-text queries (not used by default). Default: [] |
| maxItems | Maximum number of total items to collect (across users). Sample: 5 |
| numberOfPosts | Per-user number of posts to fetch. Sample: 3 |
| numberOfReels | Per-user number of reels to fetch. Sample: 3 |
| numberOfStories | Per-user number of stories to fetch. Sample: 3 |
| downloadMedia | When true, downloads media into the Key-Value store under {outputFileName}/media/. Sample: true |
| mediaTypes | Which media to download: image, video, or both. Sample: both |
| useApifyProxy | When true, use Apify Proxy for requests (recommended for large runs). Sample: false |
| apifyProxyCountry | Two-letter country code for Apify Proxy routing (when enabled). Sample: US |
| maxConcurrency | Playwright concurrency/workers. Sample: 2 |
| requestTimeoutSecs | Per-request timeout in seconds. Sample: 60 |
| outputFormat | Which outputs to write: json, csv, or both. Sample: both |
| outputFileName | Base filename/key for outputs. JSON and CSV are written using this base (sample: instagram-output) |
| rawFilter | Optional filter applied to the raw node data. Can be an object map or an array of rules (see README rawFilter docs). Sample: array of keys ["has_audio","video_duration",...] |

Notes:

  • Prefer low maxConcurrency and useApifyProxy: true for larger or long-running runs to reduce blocking.
  • rawFilter can be used to include only items that match the provided raw-field checks (exists, equality, contains, regex, and numeric ops).
  • See .actor/input_schema.json for the Console schema and field validations.
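
Before starting a run, an input object can be sanity-checked against the fields documented in this README. The sketch below is illustrative only: `check_input`, `FIELD_TYPES`, and `ALLOWED` are hypothetical names, the types are transcribed from this README rather than from the schema file, and `.actor/input_schema.json` remains the authoritative source.

```python
from typing import Any, Dict, List, Tuple

# Field -> accepted Python types, transcribed from this README's field list.
FIELD_TYPES: Dict[str, Tuple[type, ...]] = {
    "startUrls": (list,),
    "usernames": (list,),
    "queries": (list,),
    "maxItems": (int,),
    "numberOfPosts": (int,),
    "numberOfReels": (int,),
    "numberOfStories": (int,),
    "downloadMedia": (bool,),
    "mediaTypes": (str,),
    "useApifyProxy": (bool,),
    "apifyProxyCountry": (str,),
    "apifyProxyGroups": (list,),  # appears only under Proxy Notes
    "maxConcurrency": (int,),
    "requestTimeoutSecs": (int,),
    "outputFormat": (str,),
    "outputFileName": (str,),
    "rawFilter": (list, dict),
}
ALLOWED = {
    "mediaTypes": {"image", "video", "both"},
    "outputFormat": {"json", "csv", "both"},
}


def check_input(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems: List[str] = []
    for name, value in cfg.items():
        expected = FIELD_TYPES.get(name)
        if expected is None:
            problems.append(f"{name}: not a documented field")
            continue
        wanted = " or ".join(t.__name__ for t in expected)
        # bool is a subclass of int, so reject True/False for integer fields.
        is_stray_bool = isinstance(value, bool) and bool not in expected
        if is_stray_bool or not isinstance(value, expected):
            problems.append(f"{name}: expected {wanted}, got {type(value).__name__}")
        elif name in ALLOWED and value not in ALLOWED[name]:
            problems.append(f"{name}: {value!r} is not one of {sorted(ALLOWED[name])}")
    return problems
```

Running `check_input` on the Example Input from this README returns an empty list, while `{"maxItems": "5"}` is reported as `maxItems: expected int, got str`.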

Output

  • Dataset (OUTPUT): structured JSON objects
  • {outputFileName}.csv: exported CSV

Each dataset object includes the normalized Instagram fields plus the original platform payload under raw. Example item:

{
  "id": "1234567890",
  "shortcode": "Ck1aB2XhYz",
  "url": "https://www.instagram.com/p/Ck1aB2XhYz/",
  "is_video": true,
  "display_url": "https://scontent.cdninstagram.com/v/t51.2885-15/......jpg",
  "video_url": "https://r2---sn-xxx.googlevideo.com/videoplayback?....mp4",
  "sourceTab": "posts",
  "sourceProfile": "nick_saraev",
  "raw": {
    "id": "1234567890_0987654321",
    "pk": 1234567890,
    "code": "Ck1aB2XhYz",
    "is_video": 1,
    "media_type": 2,
    "video_versions": [{ "url": "https://...mp4" }],
    "image_versions2": { "candidates": [{ "url": "https://...jpg" }] },
    "play_count": 45231,
    "video_duration": 38.5,
    "like_count": 1024,
    "comment_count": 12,
    "caption": { "text": "Check out our new product!", "user": { "username": "brand_xyz" } },
    "user": { "username": "nick_saraev", "pk": 987654321 }
  },
  "crawledAt": "2025-11-26T07:32:47.389Z"
}
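
For downstream analysis, an item shaped like the example above can be flattened into a single CSV row. This is a sketch only: `to_row` and `to_csv` are hypothetical helpers, and the column selection is an assumption, since the layout of the actor's own CSV export is not documented here.

```python
import csv
import io
from typing import Any, Dict, Iterable, List

# Normalized top-level fields and a few raw metrics to carry into the row.
NORMALIZED = ["id", "shortcode", "url", "is_video", "sourceTab", "sourceProfile", "crawledAt"]
RAW_KEYS = ["play_count", "like_count", "comment_count", "video_duration"]
COLUMNS: List[str] = NORMALIZED + [f"raw.{k}" for k in RAW_KEYS]


def to_row(item: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the normalized fields plus selected raw metrics; absent values become None."""
    raw = item.get("raw") or {}
    row = {key: item.get(key) for key in NORMALIZED}
    for key in RAW_KEYS:
        row[f"raw.{key}"] = raw.get(key)
    return row


def to_csv(items: Iterable[Dict[str, Any]]) -> str:
    """Serialize items to CSV text with a header row; None is written as an empty cell."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for item in items:
        writer.writerow(to_row(item))
    return buf.getvalue()
```

Because not every response contains every raw field, missing metrics are left blank rather than raising an error.
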
---
## Proxy Notes
Instagram frequently blocks bots.
For stable scraping:
```json
"useApifyProxy": true,
"apifyProxyGroups": ["RESIDENTIAL"]
```

Set token locally:

    export APIFY_TOKEN="your-token"

License

No license specified.