πŸ“Έ Instagram Post Scraper Pro avatar

πŸ“Έ Instagram Post Scraper Pro

Pricing

from $1.77 / 1,000 results

Go to Apify Store
πŸ“Έ Instagram Post Scraper Pro

πŸ“Έ Instagram Post Scraper Pro

Instagram posts with captions, hashtags, engagement, media URLs, and MCP-ready metadata. Desktop+embed fallback chain. Hashtag categorization. 3 modes.

Pricing

from $1.77 / 1,000 results

Rating

0.0

(0)

Developer

Virtual Footprint LLC

Virtual Footprint LLC

Maintained by Community

Actor stats

0

Bookmarked

1

Total users

0

Monthly active users

2 days ago

Last modified

Share

Instagram Post Scraper Pro

Apify Version Pricing Platform Modes MCP-ready

Instagram post intelligence: captions, hashtags, engagement, media URLs, hashtag categorization, and MCP-ready providerHealth metadata. Desktop+embed fallback chain. No login cookies required.


Why This Actor Is Better

Competitor comparison

FeatureThis ActorApify IG Post ScraperFree scrapersSocial analytics SaaS
No login cookies requiredβœ…βŒβŒn/a
Desktop+embed fallbackβœ…βŒβŒn/a
Hashtag categorizationβœ… 7 categoriesβŒβŒβœ… paid
Engagement-rate-per-postβœ…βŒβŒβœ… paid
Caption sentimentβœ…βŒβŒβœ… paid
Optional Google Vision tagsβœ… user keyβŒβŒβœ… paid
Confidence score (0-1)βœ…βŒβŒβŒ
MCP-ready metadataβœ… providerHealth❌❌❌
Price / 1K posts$1.77~$2.50free (rate-limited)~$5-20

Key Features

  • πŸ›‘οΈ Multi-API fallback chain β€” Instagram desktop (Playwright) primary with automatic embed HTML fallback.
  • 🏷️ Hashtag categorization β€” auto-maps hashtags to 7 categories (fitness, food, travel, fashion, beauty, business, tech).
  • πŸ“Š Engagement scoring β€” engagement-rate-per-post + caption sentiment on every result.
  • 🎬 Media detection β€” image/video/carousel classification + media URL extraction.
  • 🎯 Confidence scoring β€” 0.0–1.0 reliability score.
  • πŸ”— Source attribution β€” know which providers contributed each field.
  • ⚑ Cache-first mode β€” fast_lookup hits KVS cache (1h TTL).
  • πŸ€– MCP-ready β€” providerHealth{} on every result.
  • πŸ”Œ Optional Google Vision β€” drop in GOOGLE_VISION_API_KEY for image labeling.
  • πŸ’° Transparent PPE pricing β€” pay only for successful posts.

Architecture

flowchart TD
A[Input: usernames + mode] --> B{Cache hit?}
B -- yes --> C[Return cached base posts]
B -- no --> D[Primary: Instagram desktop Playwright]
D -- fails --> E[Fallback: Instagram embed HTML httpx]
D --> F[Normalize: caption/media/likes/comments]
E --> F
F --> G[Enrichment layer]
G --> G1[Hashtag extraction + categorization]
G --> G2[Engagement-rate scoring]
G --> G3[Caption sentiment]
G --> G4[Email/URL extraction]
G --> G5[Optional: Google Vision image tags]
G1 --> H[Confidence scoring + source attribution]
G2 --> H
G3 --> H
G4 --> H
G5 --> H
H --> I[Progressive dataset push]
I --> J[Webhook + MCP-ready metadata]
C --> J

Modes

ModeDescriptionTarget latencyUse case
fast_lookupCache-first, base posts only<800ms cachedQuick counts, dedup
enrichHashtags + engagement + sentiment + optional Vision~2-4s/postContent analysis, trend research
batchQueue-based, full enrichment, per-item isolationvariesLarge username lists (100+)

Input

ParameterTypeRequiredDefaultDescription
modestringβ€”enrichfast_lookup | enrich | batch
queriesarrayβœ…["cristiano"]Usernames or profile URLs
maxResultsintegerβ€”25Max posts per user (1–1000)
webhookUrlstringβ€”β€”Webhook for completion notification

Example input

{
"mode": "enrich",
"queries": ["cristiano", "leomessi"],
"maxResults": 50,
"webhookUrl": "https://your-app.com/webhook"
}

Output

FieldTypeDescription
querystringInput query
usernamestringProfile handle
postUrlstringPost URL
authorstringAuthor handle
captionstringCaption text (truncated to 1000 chars)
imageUrlstringMedia thumbnail URL
mediaTypestringimage | video
timestampstringPost timestamp
likesintegerLike count
commentsintegerComment count
hashtagsarrayHashtags in caption
hashtagCategoriesarrayMapped categories (fitness/food/travel/fashion/beauty/business/tech)
captionSentimentstringpositive | negative | neutral
engagementRatenumberLikes-to-comments ratio
imageLabelsarray | nullGoogle Vision labels (if key provided)
emailsarrayEmails in caption
urlsarrayURLs in caption
confidenceScorenumber0.0–1.0 reliability
sourcesarrayProvider attribution
providerHealthobjectPer-provider status/latency
cacheStatusstringhit | miss | degraded
modestringExecution mode
extractedAtstringISO timestamp

Example output

{
"query": "cristiano",
"username": "cristiano",
"postUrl": "https://www.instagram.com/p/Cxxx/",
"author": "cristiano",
"caption": "Great game today! ⚽ #football #fitness",
"imageUrl": "https://...",
"mediaType": "image",
"timestamp": "2026-06-28T10:00:00Z",
"likes": 1200000,
"comments": 8500,
"hashtags": ["football", "fitness"],
"hashtagCategories": ["fitness"],
"captionSentiment": "positive",
"engagementRate": 141.18,
"emails": [],
"urls": [],
"confidenceScore": 0.85,
"sources": ["instagram", "hashtag_analyzer", "engagement_scorer"],
"providerHealth": {
"instagram_desktop": {"status": "ok", "latency_ms": 4500, "error": null},
"hashtag_analyzer": {"status": "ok", "latency_ms": 0, "error": null},
"engagement_scorer": {"status": "ok", "latency_ms": 0, "error": null}
},
"cacheStatus": "miss",
"mode": "enrich",
"extractedAt": "2026-06-28T23:55:00.000Z"
}

Pricing

PlanPrice per 1K postsSavings vs. top competitor
Leading competitors~$2.50/1Kβ€”
This actor (≀10K/mo)$1.77/1K29% cheaper
This actor (10K–100K/mo)$1.50/1K40% cheaper
This actor (100K+/mo)$1.25/1K50% cheaper

Optional event: media_url at $0.20/1K posts with extracted media URL.


Use Cases

  • Content trend research β€” track hashtag categories and engagement across creators
  • Competitor analysis β€” benchmark post engagement and caption sentiment
  • Influencer vetting β€” measure post-level engagement rates before partnerships
  • Hashtag strategy β€” find which categories drive engagement in your niche
  • Media monitoring β€” extract image/video URLs for asset libraries
  • MCP agent workflows β€” providerHealth lets agents route around failures
  • Market research β€” map content categories by creator audience
  • Image AI pipelines β€” optional Google Vision tagging for visual classification

Integration Examples

Python (Apify SDK)

from apify_client import ApifyClient
client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("ayeeyee/instagram-post-scraper-pro").call(run_input={
"mode": "enrich",
"queries": ["cristiano", "leomessi"],
"maxResults": 50,
})
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
print(f"@{item['username']}: {item['likes']} likes, {item['hashtagCategories']}, {item['captionSentiment']}")

cURL

curl -X POST "https://api.apify.com/v2/acts/ayeeyee~instagram-post-scraper-pro/runs?token=YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"mode":"enrich","queries":["cristiano"],"maxResults":25}'

MCP (Model Context Protocol)

$npx -y @apify/actors-mcp-server --tools actors,ayeeyee/instagram-post-scraper-pro

Agents can call call-actor and use providerHealth + hashtagCategories + engagementRate to filter and route.


FAQ

Q: Do I need Instagram login cookies? No. Public posts are scraped via DOM and meta tags β€” no login required. Private accounts return no posts.

Q: How does the desktop+embed fallback work? If Playwright fails (blocked, timeout, login wall), the actor falls back to fetching profile HTML for post URLs and thumbnails (degraded β€” no per-post captions). providerHealth shows which provider succeeded.

Q: How are hashtag categories assigned? Open-source keyword mapping to 7 categories: fitness, food, travel, fashion, beauty, business, tech. A post with #fitness and #gym gets categorized as fitness.

Q: What is engagementRate? Likes-to-comments ratio. Higher = more passive engagement (likes without discussion). Useful for content-type benchmarking.

Q: Can I call this from an LLM agent? Yes. MCP-ready with providerHealth{}, hashtagCategories, engagementRate, and confidenceScore.

Q: What is the cache TTL? 1 hour for fast_lookup. Enrichment results are not cached (always fresh).


Scrapes publicly available Instagram post data. Does not access private data, bypass authentication, or store credentials. Users are responsible for complying with GDPR/CCPA and Instagram's ToS.


AI-DLC / Data Lifecycle

  • Collection β€” Public data only; respects robots.txt and rate limits.
  • Processing β€” In-memory normalization; no PII logging.
  • Storage β€” Results in user's Apify dataset, not retained by actor.
  • Usage β€” Content analysis, trend research, legitimate marketing.
  • Disposal β€” No long-term caching (1h TTL for base results only).

Enhancement Roadmap (API / MCP Integrations)

  • Google Vision image tagging MCP β€” auto-label post images (optional, user key)
  • OpenAI caption summarization MCP β€” 1-line summaries for feed aggregation
  • Chartmetric sound trends MCP β€” for Reels/audio trend correlation
  • LangGraph workflow β€” IG posts β†’ hashtag clustering β†’ trend alerts
  • Vector store β€” semantic post deduplication across creators

Changelog

  • v3.0 β€” Multi-API orchestration edition: desktop+embed fallback, hashtag categorization, engagement scoring, MCP-ready providerHealth, optional Google Vision, expanded FAQ, integration examples, volume pricing.
  • v2.0 β€” Premium README, AI-DLC docs, confidence scoring, source attribution.
  • v1.0 β€” Initial release with Playwright scraping and hashtag extraction.