📸 Instagram Post Scraper Pro
Pricing
from $1.77 / 1,000 results
Instagram posts with captions, hashtags, engagement, media URLs, and MCP-ready metadata. Desktop+embed fallback chain. Hashtag categorization. 3 modes.
Developer
Virtual Footprint LLC
Maintained by Community
Instagram Post Scraper Pro
Instagram post intelligence: captions, hashtags, engagement, media URLs, hashtag categorization, and MCP-ready providerHealth metadata. Desktop+embed fallback chain. No login cookies required.
Why This Actor Is Better
Competitor comparison
| Feature | This Actor | Apify IG Post Scraper | Free scrapers | Social analytics SaaS |
|---|---|---|---|---|
| No login cookies required | β | β | β | n/a |
| Desktop+embed fallback | β | β | β | n/a |
| Hashtag categorization | β 7 categories | β | β | β paid |
| Engagement-rate-per-post | β | β | β | β paid |
| Caption sentiment | β | β | β | β paid |
| Optional Google Vision tags | β user key | β | β | β paid |
| Confidence score (0-1) | β | β | β | β |
| MCP-ready metadata | β providerHealth | β | β | β |
| Price / 1K posts | $1.77 | ~$2.50 | free (rate-limited) | ~$5-20 |
Key Features
- Multi-API fallback chain: Instagram desktop (Playwright) primary with automatic embed HTML fallback.
- Hashtag categorization: auto-maps hashtags to 7 categories (fitness, food, travel, fashion, beauty, business, tech).
- Engagement scoring: engagement-rate-per-post + caption sentiment on every result.
- Media detection: image/video/carousel classification + media URL extraction.
- Confidence scoring: 0.0-1.0 reliability score.
- Source attribution: know which providers contributed each field.
- Cache-first mode: fast_lookup hits KVS cache (1h TTL).
- MCP-ready: providerHealth{} on every result.
- Optional Google Vision: drop in GOOGLE_VISION_API_KEY for image labeling.
- Transparent PPE pricing: pay only for successful posts.
Architecture
    flowchart TD
        A[Input: usernames + mode] --> B{Cache hit?}
        B -- yes --> C[Return cached base posts]
        B -- no --> D[Primary: Instagram desktop Playwright]
        D -- fails --> E[Fallback: Instagram embed HTML httpx]
        D --> F[Normalize: caption/media/likes/comments]
        E --> F
        F --> G[Enrichment layer]
        G --> G1[Hashtag extraction + categorization]
        G --> G2[Engagement-rate scoring]
        G --> G3[Caption sentiment]
        G --> G4[Email/URL extraction]
        G --> G5[Optional: Google Vision image tags]
        G1 --> H[Confidence scoring + source attribution]
        G2 --> H
        G3 --> H
        G4 --> H
        G5 --> H
        H --> I[Progressive dataset push]
        I --> J[Webhook + MCP-ready metadata]
        C --> J
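The primary-then-fallback step of the flow above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the actor's actual code: fetch_desktop and fetch_embed are hypothetical stand-ins for the Playwright and embed-HTML providers, the "empty" and "error" status values are invented for the sketch, and the providerHealth shape only mirrors the example output in this README.

```python
import time
from typing import Callable

Provider = tuple[str, Callable[[str], list[dict]]]


def run_chain(username: str, providers: list[Provider]) -> dict:
    """Try each provider in order and stop at the first one that returns posts."""
    health: dict[str, dict] = {}
    posts: list[dict] = []
    for name, fetch in providers:
        started = time.monotonic()
        try:
            posts = fetch(username)
            status, error = ("ok" if posts else "empty"), None
        except Exception as exc:  # e.g. blocked, timeout, login wall
            posts, status, error = [], "error", str(exc)
        health[name] = {
            "status": status,
            "latency_ms": int((time.monotonic() - started) * 1000),
            "error": error,
        }
        if posts:
            break  # later providers are only needed when earlier ones fail
    return {"posts": posts, "providerHealth": health}


# Hypothetical providers used only to exercise the chain.
def fetch_desktop(username: str) -> list[dict]:
    raise TimeoutError("login wall")


def fetch_embed(username: str) -> list[dict]:
    return [{"username": username, "caption": None}]  # degraded: no caption


result = run_chain(
    "cristiano",
    [("instagram_desktop", fetch_desktop), ("instagram_embed", fetch_embed)],
)
print(result["providerHealth"]["instagram_desktop"]["status"])  # error
print(result["providerHealth"]["instagram_embed"]["status"])    # ok
```

Recording health for every attempted provider, including failed ones, is what lets a caller see which provider actually produced the data.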
Modes
| Mode | Description | Target latency | Use case |
|---|---|---|---|
| fast_lookup | Cache-first, base posts only | <800ms cached | Quick counts, dedup |
| enrich | Hashtags + engagement + sentiment + optional Vision | ~2-4s/post | Content analysis, trend research |
| batch | Queue-based, full enrichment, per-item isolation | varies | Large username lists (100+) |
Input
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| mode | string | β | enrich | One of fast_lookup, enrich, or batch |
| queries | array | β | ["cristiano"] | Usernames or profile URLs |
| maxResults | integer | β | 25 | Max posts per user (1-1000) |
| webhookUrl | string | β | β | Webhook for completion notification |
Example input
    {
      "mode": "enrich",
      "queries": ["cristiano", "leomessi"],
      "maxResults": 50,
      "webhookUrl": "https://your-app.com/webhook"
    }
Output
| Field | Type | Description |
|---|---|---|
| query | string | Input query |
| username | string | Profile handle |
| postUrl | string | Post URL |
| author | string | Author handle |
| caption | string | Caption text (truncated to 1000 chars) |
| imageUrl | string | Media thumbnail URL |
| mediaType | string | image or video |
| timestamp | string | Post timestamp |
| likes | integer | Like count |
| comments | integer | Comment count |
| hashtags | array | Hashtags in caption |
| hashtagCategories | array | Mapped categories (fitness/food/travel/fashion/beauty/business/tech) |
| captionSentiment | string | positive, negative, or neutral |
| engagementRate | number | Likes-to-comments ratio |
| imageLabels | array or null | Google Vision labels (if key provided) |
| emails | array | Emails in caption |
| urls | array | URLs in caption |
| confidenceScore | number | 0.0-1.0 reliability |
| sources | array | Provider attribution |
| providerHealth | object | Per-provider status/latency |
| cacheStatus | string | hit, miss, or degraded |
| mode | string | Execution mode |
| extractedAt | string | ISO timestamp |
Example output
    {
      "query": "cristiano",
      "username": "cristiano",
      "postUrl": "https://www.instagram.com/p/Cxxx/",
      "author": "cristiano",
      "caption": "Great game today! ⚽ #football #fitness",
      "imageUrl": "https://...",
      "mediaType": "image",
      "timestamp": "2026-06-28T10:00:00Z",
      "likes": 1200000,
      "comments": 8500,
      "hashtags": ["football", "fitness"],
      "hashtagCategories": ["fitness"],
      "captionSentiment": "positive",
      "engagementRate": 141.18,
      "emails": [],
      "urls": [],
      "confidenceScore": 0.85,
      "sources": ["instagram", "hashtag_analyzer", "engagement_scorer"],
      "providerHealth": {
        "instagram_desktop": {"status": "ok", "latency_ms": 4500, "error": null},
        "hashtag_analyzer": {"status": "ok", "latency_ms": 0, "error": null},
        "engagement_scorer": {"status": "ok", "latency_ms": 0, "error": null}
      },
      "cacheStatus": "miss",
      "mode": "enrich",
      "extractedAt": "2026-06-28T23:55:00.000Z"
    }
Pricing
| Plan | Price per 1K posts | Savings vs. top competitor |
|---|---|---|
| Leading competitors | ~$2.50/1K | n/a |
| This actor (≤10K/mo) | $1.77/1K | 29% cheaper |
| This actor (10K-100K/mo) | $1.50/1K | 40% cheaper |
| This actor (100K+/mo) | $1.25/1K | 50% cheaper |
Optional event: media_url at $0.20/1K posts with extracted media URL.
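The savings column can be reproduced from the per-1K prices. The short check below is only arithmetic on the figures in the table; it treats the approximate competitor price of $2.50 as exact, and the tier labels are plain strings chosen for the sketch.

```python
COMPETITOR_PER_1K = 2.50  # "~$2.50/1K" in the table, treated as exact here

# Price per 1,000 posts for each volume tier, copied from the table.
TIERS = {
    "<=10K/mo": 1.77,
    "10K-100K/mo": 1.50,
    "100K+/mo": 1.25,
}


def savings_pct(price: float, baseline: float = COMPETITOR_PER_1K) -> int:
    """Percentage saved relative to the baseline, rounded to a whole percent."""
    return round((1 - price / baseline) * 100)


for tier, price in TIERS.items():
    print(f"{tier}: {savings_pct(price)}% cheaper")
```

For the first tier this gives 1 - 1.77 / 2.50 = 0.292, which rounds to the 29% shown above.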
Use Cases
- Content trend research: track hashtag categories and engagement across creators
- Competitor analysis: benchmark post engagement and caption sentiment
- Influencer vetting: measure post-level engagement rates before partnerships
- Hashtag strategy: find which categories drive engagement in your niche
- Media monitoring: extract image/video URLs for asset libraries
- MCP agent workflows: providerHealth lets agents route around failures
- Market research: map content categories by creator audience
- Image AI pipelines: optional Google Vision tagging for visual classification
Integration Examples
Python (Apify API client)
    from apify_client import ApifyClient

    client = ApifyClient("YOUR_APIFY_TOKEN")

    # Start the actor and wait for the run to finish.
    run = client.actor("ayeeyee/instagram-post-scraper-pro").call(run_input={
        "mode": "enrich",
        "queries": ["cristiano", "leomessi"],
        "maxResults": 50,
    })

    # Read the results from the run's default dataset.
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(f"@{item['username']}: {item['likes']} likes, {item['hashtagCategories']}, {item['captionSentiment']}")
cURL
    curl -X POST "https://api.apify.com/v2/acts/ayeeyee~instagram-post-scraper-pro/runs?token=YOUR_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"mode":"enrich","queries":["cristiano"],"maxResults":25}'
MCP (Model Context Protocol)
    npx -y @apify/actors-mcp-server --tools actors,ayeeyee/instagram-post-scraper-pro
Agents can call call-actor and use providerHealth + hashtagCategories + engagementRate to filter and route.
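As a rough sketch of that filtering step, the Python below keeps only items whose providers all reported "ok" and whose confidence clears a threshold, then selects by category. The function names, the 0.7 threshold, and the strict "every provider ok" rule are assumptions made for illustration; only the field names come from the Output table.

```python
def is_usable(item: dict, min_confidence: float = 0.7) -> bool:
    """True when every provider reported "ok" and confidence is high enough."""
    health = item.get("providerHealth", {})
    all_ok = all(p.get("status") == "ok" for p in health.values())
    return all_ok and item.get("confidenceScore", 0.0) >= min_confidence


def pick(items: list[dict], category: str, min_confidence: float = 0.7) -> list[dict]:
    """Usable items tagged with the given hashtag category."""
    return [
        item
        for item in items
        if is_usable(item, min_confidence)
        and category in item.get("hashtagCategories", [])
    ]


# Two hand-written items shaped like the example output.
items = [
    {
        "postUrl": "https://www.instagram.com/p/Cxxx/",
        "hashtagCategories": ["fitness"],
        "confidenceScore": 0.85,
        "providerHealth": {"instagram_desktop": {"status": "ok"}},
    },
    {
        "postUrl": "https://www.instagram.com/p/Cyyy/",
        "hashtagCategories": ["fitness"],
        "confidenceScore": 0.4,
        "providerHealth": {"instagram_desktop": {"status": "error"}},
    },
]

for item in pick(items, "fitness"):
    print(item["postUrl"])
```

A looser policy, such as accepting a result when at least one provider succeeded, may suit workflows that tolerate degraded fallback data.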
FAQ
Q: Do I need Instagram login cookies?
No. Public posts are scraped via DOM and meta tags, so no login is required. Private accounts return no posts.
Q: How does the desktop+embed fallback work?
If Playwright fails (blocked, timeout, login wall), the actor falls back to fetching profile HTML for post URLs and thumbnails (degraded: no per-post captions). providerHealth shows which provider succeeded.
Q: How are hashtag categories assigned?
Open-source keyword mapping to 7 categories: fitness, food, travel, fashion, beauty, business, tech. A post with #fitness and #gym gets categorized as fitness.
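A keyword mapping of this kind can be sketched in a few lines of Python. The keyword sets below are hypothetical placeholders, not the actor's real lists; only the seven category names and the #fitness/#gym example come from this README.

```python
# Hypothetical keyword sets; the actor's actual lists are not shown here.
CATEGORY_KEYWORDS = {
    "fitness": {"fitness", "gym", "workout"},
    "food": {"food", "foodie", "recipe"},
    "travel": {"travel", "wanderlust", "vacation"},
    "fashion": {"fashion", "ootd", "style"},
    "beauty": {"beauty", "makeup", "skincare"},
    "business": {"business", "entrepreneur", "startup"},
    "tech": {"tech", "ai", "coding"},
}


def categorize(hashtags: list[str]) -> list[str]:
    """Return every category that shares at least one keyword with the hashtags."""
    tags = {tag.lower().lstrip("#") for tag in hashtags}
    return [name for name, keywords in CATEGORY_KEYWORDS.items() if tags & keywords]


print(categorize(["#fitness", "#gym"]))      # ['fitness']
print(categorize(["football", "fitness"]))   # ['fitness']
print(categorize(["#Travel", "#foodie"]))    # ['food', 'travel']
```

Hashtags that match no keyword set, such as football above, simply contribute no category.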
Q: What is engagementRate?
The likes-to-comments ratio (likes divided by comments). A higher value means more passive engagement (likes without discussion). Useful for content-type benchmarking.
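The example output in this README is consistent with dividing likes by comments and rounding to two decimals. The sketch below reproduces that figure; returning None when a post has no comments is an assumption, since the behaviour for that case is not documented here.

```python
from typing import Optional


def engagement_rate(likes: int, comments: int) -> Optional[float]:
    """Likes-to-comments ratio, rounded to two decimals."""
    if comments <= 0:
        return None  # assumption: the zero-comment case is not documented
    return round(likes / comments, 2)


# Figures from the example output: 1,200,000 likes and 8,500 comments.
print(engagement_rate(1_200_000, 8_500))  # 141.18
```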
Q: Can I call this from an LLM agent?
Yes. MCP-ready with providerHealth{}, hashtagCategories, engagementRate, and confidenceScore.
Q: What is the cache TTL?
1 hour for fast_lookup. Enrichment results are not cached (always fresh).
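To illustrate what a one-hour TTL means in practice, here is a minimal in-memory sketch. It is not the actor's implementation, which uses the Apify key-value store; the TTLCache class, the lookup helper, and the injectable clock are all hypothetical names introduced for the example.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self.clock(), value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl_seconds:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value


def lookup(cache: TTLCache, key: str, fetch: Callable[[str], Any]) -> tuple[Any, str]:
    """Return (value, cacheStatus), where cacheStatus is "hit" or "miss"."""
    cached = cache.get(key)
    if cached is not None:
        return cached, "hit"
    value = fetch(key)
    cache.set(key, value)
    return value, "miss"


cache = TTLCache(ttl_seconds=3600)
print(lookup(cache, "cristiano", lambda u: [f"post by {u}"])[1])  # miss
print(lookup(cache, "cristiano", lambda u: [f"post by {u}"])[1])  # hit
```

Passing the clock in as a parameter keeps expiry testable without waiting an hour.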
Legal & Compliance
Scrapes publicly available Instagram post data. Does not access private data, bypass authentication, or store credentials. Users are responsible for complying with GDPR/CCPA and Instagram's ToS.
AI-DLC / Data Lifecycle
- Collection: Public data only; respects robots.txt and rate limits.
- Processing: In-memory normalization; no PII logging.
- Storage: Results in user's Apify dataset, not retained by actor.
- Usage: Content analysis, trend research, legitimate marketing.
- Disposal: No long-term caching (1h TTL for base results only).
Enhancement Roadmap (API / MCP Integrations)
- Google Vision image tagging MCP: auto-label post images (optional, user key)
- OpenAI caption summarization MCP: 1-line summaries for feed aggregation
- Chartmetric sound trends MCP: for Reels/audio trend correlation
- LangGraph workflow: IG posts → hashtag clustering → trend alerts
- Vector store: semantic post deduplication across creators
Changelog
- v3.0: Multi-API orchestration edition with desktop+embed fallback, hashtag categorization, engagement scoring, MCP-ready providerHealth, optional Google Vision, expanded FAQ, integration examples, volume pricing.
- v2.0: Premium README, AI-DLC docs, confidence scoring, source attribution.
- v1.0: Initial release with Playwright scraping and hashtag extraction.
Links
- Apify Store: https://apify.com/ayeeyee/instagram-post-scraper-pro
- Actor ID: CgkI4B4lKRsabJPyW