Instagram Tagged Posts Scraper
Pricing
from $2.99 / 1,000 results
Instagram Tagged Posts Scraper extracts tagged post data fast: captions, hashtags, media links & engagement insights. Perfect for social media research, competitor analysis, influencer discovery & marketing planning. Save time, boost strategy!
Developer
Scrapers Hub
Maintained by Community
Instagram Tagged Posts Scraper
Experience the power of a professional-grade Instagram extraction solution. The Instagram Tagged Posts Scraper is a high-performance, refined Apify Actor designed to pull comprehensive metadata from public Instagram profiles without the need for login credentials or browser automation. By leveraging a sophisticated "No-Cookie" hybrid architecture, this tool ensures maximum reliability and speed for your data extraction workflows.
Why Choose This Scraper?
In an era of increasingly restrictive social media APIs and complex anti-scraping measures, the Instagram Tagged Posts Scraper stands out as a robust alternative. It successfully navigates the technical hurdles of Instagram's dynamic interface to deliver clean, structured JSON data that is ready for analysis.
| Feature | Benefit |
|---|---|
| No Session/Cookies Required | No risk to your personal accounts; zero login maintenance. |
| Hybrid Extraction | Combines user feed API data with deep-post HTML scraping for maximum data points. |
| Rich Metadata | Goes beyond basics to fetch music info, co-authors, and accessibility captions. |
| Optimized for Speed | Scalable design that handles multiple profiles in a single run. |
| Residential Proxy Support | Integrated to bypass IP rate limits seamlessly. |
Key Features
Comprehensive Post Metadata
Extract every significant detail of a post, including:
- Media Types: Support for Images, Videos, and Reels.
- Engagement Metrics: Like counts, comment counts, play counts, and view counts.
- Timestamps: Precise `taken_at` and `crawled_at` ISO dates.
- Spatial Data: Geographic location metadata if attached to the post.
Deep Interaction Insights
- Comments: Fetches a preview of the latest comments, including text, owner info, and like counts.
- Tagged Users: Identifies every user tagged in an image or video.
- Mentions & Hashtags: Automatically parses the caption to extract arrays of @mentions and #hashtags.
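The caption parsing described in the last bullet can be sketched with two regular expressions. This is a minimal illustration, not the actor's actual parser: the function name `parse_caption` and both patterns are assumptions (handles are assumed to consist of letters, digits, underscores, and periods).

```python
import re

# Assumed patterns, for illustration only.
# Hashtags: a '#' followed by a run of word characters.
HASHTAG_RE = re.compile(r"#(\w+)")
# Mentions: an '@' not preceded by a word character or period (so e-mail
# addresses are skipped), followed by a handle that does not end in a period.
MENTION_RE = re.compile(r"(?<![\w.])@([A-Za-z0-9_](?:[A-Za-z0-9_.]*[A-Za-z0-9_])?)")


def parse_caption(caption: str) -> dict:
    """Return hashtags and mentions found in a caption, without '#' or '@'."""
    text = caption or ""
    return {
        "hashtags": HASHTAG_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
    }


if __name__ == "__main__":
    print(parse_caption("Golden hour #nature #adventure with @wildlife_explorer."))
```

The result has the same shape as the `hashtags` and `mentions` arrays in the output, with the prefix symbols removed.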
Media & Technical Details
- Source URLs: Direct links to high-resolution images and video files.
- Accessibility: Captures `accessibility_caption` for inclusive data analysis.
- Music Attribution: Extracts artist name, song title, and audio IDs for Reels discovery.
- Collaborations: Identifies `coauthor_producers` for partnership tracking.
How It Works: The Hybrid Architecture
The Instagram Tagged Posts Scraper uses a three-phase extraction strategy to ensure you get the data you need without unnecessary overhead.
Phase 1: The Token Handshake
Before sending API requests, the actor visits the target profile anonymously. It extracts a dynamic `APP_ID` and `CSRF_TOKEN` from the page source. This "handshake" allows the scraper to mimic a legitimate browser interaction without needing a persistent session.
Phase 2: Feed Discovery
Using the acquired tokens, the actor performs paginated requests to Instagram's internal feed API. This is significantly faster and more stable than traditional "scrolling and clicking" browser automation.
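Paginated discovery of this kind usually comes down to a cursor loop. The sketch below shows the control flow only: `iter_feed`, the `fetch_page` callable, and the page shape (`items`, `next_max_id`) are hypothetical stand-ins, and no real Instagram endpoint is called.

```python
from typing import Callable, Iterator, Optional

# Hypothetical page shape: {"items": [...], "next_max_id": "<cursor>" or None}
Page = dict


def iter_feed(fetch_page: Callable[[Optional[str]], Page], limit: int) -> Iterator[dict]:
    """Yield up to `limit` posts, following the cursor until it runs out."""
    cursor: Optional[str] = None
    yielded = 0
    while yielded < limit:
        page = fetch_page(cursor)
        items = page.get("items", [])
        for item in items:
            if yielded >= limit:
                return
            yield item
            yielded += 1
        cursor = page.get("next_max_id")
        # Stop when the API reports no further page or returns an empty one.
        if not cursor or not items:
            return


if __name__ == "__main__":
    fake_pages = {
        None: {"items": [{"id": 1}, {"id": 2}], "next_max_id": "a"},
        "a": {"items": [{"id": 3}], "next_max_id": None},
    }
    print([post["id"] for post in iter_feed(fake_pages.__getitem__, limit=50)])
```

Passing the page fetcher in as a callable keeps the loop testable without network access; the per-user limit plays the role of `resultsLimit`.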
Phase 3: Deep Metrics (Hybrid Mode)
If certain critical metrics (like specific Reel view counts) are missing from the feed API, the actor automatically performs a "Deep Scrape" on the individual post URL. It parses the JSON embedded within the HTML's `application/json` script tags to fill in the gaps.
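Pulling JSON out of `application/json` script tags can be done with the standard library alone. The following is a sketch under the assumption that the payloads are plain JSON inside `<script>` elements; the class and function names are illustrative and not taken from the actor's code.

```python
import json
from html.parser import HTMLParser


class JsonScriptParser(HTMLParser):
    """Collect the parsed contents of every <script type="application/json"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_script = False
        self._buffer = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_json_script = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_script:
            self._in_json_script = False
            try:
                self.payloads.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blobs instead of failing the whole page


def extract_embedded_json(html: str) -> list:
    """Return every JSON document embedded in application/json script tags."""
    parser = JsonScriptParser()
    parser.feed(html)
    parser.close()
    return parser.payloads
```

Each returned payload can then be searched for the missing metric, so only posts with gaps need this extra request.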
Input Configuration
The scraper is designed for simplicity. Provide the usernames you want to track and set your limits.
| Field | Type | Description |
|---|---|---|
| Usernames | Array<String> | List of Instagram handles (e.g., ["google", "nasa"]). |
| resultsLimit | Integer | Max posts to fetch per user (Default: 30). |
Example Input
{"Usernames": ["natgeo", "spacex"],"resultsLimit": 50}
Real-World Output Example
The result is a highly detailed JSON array. Below is an expanded example showing the depth of data captured for different post types (Images, Reels, and Carousels).
[{"id": "3348651234567890123","shortcode": "C5y8X9z-AbC","url": "https://www.instagram.com/p/C5y8X9z-AbC/","is_video": false,"product_type": "feed","caption": "Golden hour in the mountains #nature #adventure #photography @wildlife_explorer","hashtags": ["nature", "adventure", "photography"],"mentions": ["wildlife_explorer"],"like_count": 12540,"comment_count": 432,"taken_at": "2024-04-18T14:30:00Z","crawled_at": "2024-04-19T10:15:22.451Z","image": "https://scontent.cdninstagram.com/v/t51.2885-15/4321_1234_n.jpg?_nc_cat=1&ccb=1-7&_nc_sid=8ae9d6&_nc_ohc=abc","dimensions": {"width": 1080,"height": 1350},"location": {"id": "213356789","name": "Swiss Alps","lat": 46.8182,"lng": 8.2275},"owner": {"id": "123456789","username": "nature_shots","full_name": "Nature Photography","followers": 1250000,"post_count": 842,"is_verified": true,"profile_pic_url": "https://scontent.cdninstagram.com/v/t51.2885-19/9876_n.jpg"},"tagged_user": [{"id": "456789012","username": "wildlife_explorer","full_name": "Wildlife Explorer","is_verified": true}],"comments": [{"id": "1790123456789","text": "Absolutely stunning capture! The lighting is perfect.","created_at": 1713450600,"like_count": 12,"owner": {"id": "55667788","username": "travel_buff","is_verified": false}},{"id": "1790987654321","text": "Which lens did you use for this one?","created_at": 1713451200,"like_count": 3,"owner": {"id": "99887766","username": "photo_geek","is_verified": false}}]}]
Use Cases
1. Marketing & Sentiment Analysis
Track how competitors are engaging with their audience. By extracting comments and captions, you can perform sentiment analysis to understand what content resonates most with specific demographics.
2. Training AI Models
The extraction of `accessibility_caption` and high-quality image URLs provides a rich dataset for training computer vision models or generative AI taggers.
3. Influencer Discovery
Analyze the `tagged_user` and `coauthor_producers` fields to map out influencer networks. Identify who is collaborating with whom and which creators are gaining the most traction in specific niches.
4. Brand Monitoring
Monitor mentions of your brand or specific hashtags. The real-time extraction capabilities allow for rapid response to trending topics or customer feedback.
Metadata Field Definitions
To help you map your database correctly, here is a detailed breakdown of the fields provided by the Instagram Tagged Posts Scraper.
Post Identification
- `id`: The internal Instagram ID for the media object (often a long numeric string).
- `shortcode`: The unique alphanumeric code used in the post's URL (e.g., `C5y8X9z-AbC`).
- `url`: The canonical link to the post.
- `pk`: The primary key of the media, used for deep API lookups.
User & Ownership
- `owner.id`: The unique ID of the person who posted.
- `owner.username`: The handle of the account.
- `owner.is_verified`: Boolean indicating if the blue checkmark is present.
- `owner.followers`: Approximate follower count at the time of scraping.
- `owner.post_count`: Total number of posts on the owner's profile.
Visual Content
- `image`: A direct CDN URL to the primary image or video thumbnail.
- `video_url`: If the post is a video or Reel, this provides the direct `.mp4` link.
- `dimensions`: Provides the `width` and `height` of the original media.
- `is_video`: `true` for videos/Reels, `false` for static images.
Engagement Metrics
- `like_count`: Total number of likes.
- `comment_count`: Total number of comments.
- `view_count`: (Videos/Reels) Number of times a video was viewed.
- `play_count`: (Reels) Number of times a Reel was played.
Content & Features
- `caption`: The full text accompanying the post.
- `hashtags`: An array of strings containing every hashtag found in the caption.
- `mentions`: An array of strings containing every account handle mentioned in the caption.
- `tagged_user`: A list of objects containing `username` and `full_name` for people tagged in the photo/video.
- `is_ad` / `is_paid_partnership`: Indicators for commercial or sponsored content.
Advanced Anti-Detection Strategies
Successfully scraping Instagram requires more than just making requests. This actor implements several enterprise-grade techniques to ensure longevity and prevent bans.
1. Dynamic User-Agent Rotation
Every request sent by the actor selects a random, modern User-Agent from a curated list. This prevents Instagram's security systems from identifying a pattern of requests coming from a single "browser type."
2. Entropy-Based Delays
The scraper does not use static sleep times. Instead, it employs `asyncio.sleep(random.uniform(x, y))` to introduce jitter. This "human-like" pause between requests makes the traffic look significantly more natural than a high-speed bot.
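A jittered pause built on the `asyncio.sleep(random.uniform(x, y))` call named above might look like this. The helper name `polite_sleep` and the default bounds are assumptions chosen for illustration; the actor's real bounds are not documented here.

```python
import asyncio
import random


async def polite_sleep(min_s: float = 1.0, max_s: float = 3.0) -> float:
    """Sleep for a random duration in [min_s, max_s] and return the delay used."""
    delay = random.uniform(min_s, max_s)
    await asyncio.sleep(delay)
    return delay


async def main() -> None:
    for page in range(3):
        # ... issue one request here ...
        waited = await polite_sleep(0.1, 0.3)
        print(f"page {page}: waited {waited:.2f}s before the next request")


if __name__ == "__main__":
    asyncio.run(main())
```

Returning the delay makes the pause easy to log, which helps when tuning the bounds against observed rate limits.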
3. Request Header Mimicry
Beyond the User-Agent, we include headers such as `sec-ch-ua`, `sec-fetch-mode`, and `upgrade-insecure-requests`. Modern browsers send these headers (`sec-ch-ua` comes from Chromium-based browsers such as Chrome; Firefox does not send it), yet basic scrapers often omit them, which makes those scrapers easy to detect.
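User-Agent rotation and header mimicry can be combined in one small helper. In this sketch the browser profiles are illustrative values, not the actor's curated list, and `build_headers` is a hypothetical name. Each profile pairs a User-Agent with a matching `sec-ch-ua` value so the two never contradict each other.

```python
import random
from typing import Optional

# Illustrative profiles only; the actor's real list is not published here.
BROWSER_PROFILES = [
    {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
        "sec-ch-ua": '"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"',
    },
    {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
        "sec-ch-ua": '"Chromium";v="123", "Google Chrome";v="123", "Not-A.Brand";v="99"',
    },
]


def build_headers(rng: Optional[random.Random] = None) -> dict:
    """Pick one browser profile at random and add the static browser headers."""
    chooser = rng if rng is not None else random
    profile = chooser.choice(BROWSER_PROFILES)
    return {
        **profile,
        "sec-fetch-mode": "navigate",
        "upgrade-insecure-requests": "1",
    }


if __name__ == "__main__":
    print(build_headers())
```

Accepting an optional `random.Random` instance keeps the choice reproducible in tests while remaining random in production.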
4. Automatic Token Refresh
If a request fails due to a session timeout or a revoked `APP_ID`, the actor is designed to re-trigger the "Handshake" phase to acquire fresh tokens without manual intervention.
Real-World Applications
The Instagram Tagged Posts Scraper is more than just a data extraction tool; it is a gateway to actionable business intelligence. Here are some ways our users are leveraging this data:
1. Market Research & Trend Spotting
By scraping high-engagement posts from niche-specific influencers, brands can identify emerging trends before they hit the mainstream. Analyzing the `hashtags` and `caption` fields across hundreds of posts allows for a statistical view of what the market is talking about in real-time.
2. Event Analytics & Coverage
Whether it's a global tech conference or a local music festival, events live through their hashtags. Use this scraper to pull every post associated with an event handle to create a digital archive, analyze attendee sentiment, or aggregate user-generated content for marketing recaps.
3. Competitor Benchmarking
Keep a close eye on your competition without them ever knowing. Track their `like_count` and `comment_count` over time to see which of their campaigns are succeeding and which are failing. Our "No-Cookie" approach ensures your competitive research remains completely anonymous.
4. Talent & Influencer Scouting
Agencies use the `followers` and `is_verified` metrics alongside engagement data to identify "hidden gem" micro-influencers who have high engagement rates but haven't yet been saturated by big brand deals.
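An engagement rate can be computed directly from the output fields. The formula used here, (likes + comments) / followers, is one common convention and an assumption of this sketch; the source does not define a specific formula.

```python
from typing import Optional


def engagement_rate(post: dict) -> Optional[float]:
    """Return (likes + comments) / followers, or None when followers are unknown."""
    followers = (post.get("owner") or {}).get("followers") or 0
    if followers <= 0:
        return None
    interactions = (post.get("like_count") or 0) + (post.get("comment_count") or 0)
    return interactions / followers


if __name__ == "__main__":
    # Values taken from the sample output record.
    sample = {"like_count": 12540, "comment_count": 432, "owner": {"followers": 1250000}}
    print(f"{engagement_rate(sample):.4%}")
```

With the sample record this works out to 12,972 interactions over 1,250,000 followers, or roughly 1.04%.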
Data Privacy & Security Deep Dive
We take data ethics seriously. This scraper is designed to be a "Good Citizen" of the web.
- Public Access Only: The actor accesses only data that is publicly available on the web. It does not attempt to circumvent any privacy controls set by the user or Instagram.
- Minimal Data Footprint: We do not store any personal data on behalf of the user. Once the data is pushed to your Apify dataset, it is your responsibility to handle it according to your local regulations (GDPR, CCPA, etc.).
- Request Politeness: By spacing requests with randomized delays instead of sending them as fast as possible, we limit the load placed on Instagram's infrastructure, so that our scraping activities do not disrupt the service for others.
Step-by-Step Setup Guide
Getting started with the Instagram Tagged Posts Scraper is easy, even if you're not a developer.
Step 1: Create an Apify Account
If you haven't already, sign up for a free account at Apify.com. You'll need some compute units (CUs) to run the actor, but the free trial is usually enough for testing.
Step 2: Configure Proxies
For this actor, Residential Proxies are highly recommended. Go to your Apify Proxy settings and ensure you have access to the Residential group. This is the "secret sauce" for avoiding "403 Forbidden" errors.
Step 3: Enter Your Usernames
In the input section, click on "Edit as JSON" or use the visual list editor. Enter the handles without the "@" symbol. For example: ["natgeo", "discovery", "bbcearth"].
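If your handles come from a spreadsheet or a CRM, they can be cleaned up before being pasted into the input. The helper below is a sketch: `normalize_usernames` and `build_input` are hypothetical names, and lowercasing assumes handles are case-insensitive. The input keys `Usernames` and `resultsLimit` are the ones from the input table.

```python
import json


def normalize_usernames(raw: list) -> list:
    """Strip whitespace and a leading '@', lowercase, drop empties and duplicates."""
    seen = set()
    result = []
    for value in raw:
        handle = str(value).strip().lstrip("@").lower()
        if handle and handle not in seen:
            seen.add(handle)
            result.append(handle)
    return result


def build_input(raw_usernames: list, results_limit: int = 30) -> dict:
    """Assemble the actor input object from raw handles and a per-user limit."""
    return {
        "Usernames": normalize_usernames(raw_usernames),
        "resultsLimit": results_limit,
    }


if __name__ == "__main__":
    print(json.dumps(build_input(["@NatGeo", " discovery ", "bbcearth", "natgeo"], 10)))
```

The printed JSON can be pasted into the "Edit as JSON" view.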
Step 4: Set the Limit
If you only need the latest content, set `resultsLimit` to something low like 10. If you're doing a deep archival run, you can go as high as 500, but remember that this will take more time and consume more compute units.
Step 5: Run and Export
Click the Start button. Once the run is finished, you can export your data in JSON, CSV, Excel, or HTML table format directly from the Dataset tab.
Understanding the "Hybrid" Advantage
Most scrapers on the market choose one of two paths: they either use a hidden API or they parse the HTML. Each has a weakness.
- API-Only scrapers are fast but often miss data fields like "music attribution" or specific "location" details that aren't serialized in the mobile-feed JSON.
- HTML-Only scrapers are thorough but extremely slow because they have to load the entire page for every single post.
Our Hybrid Architecture is the best of both worlds. We start with the fast API to discover the posts, and we only "dip" into the HTML if a critical data point is missing. This results in an actor that is 3x faster than a browser-based scraper while maintaining 100% data accuracy.
Deep Dive: Advanced Technical Architecture
The Instagram Tagged Posts Scraper is engineered for high-concurrency and resilience. Below is a detailed breakdown of the internal mechanisms that make this possible.
The Persistent Session Manager
Unlike basic scripts that create a new connection for every request, our actor utilizes a `requests.Session()` object. This allows for:
- Connection Pooling: Reusing the same TCP connection for multiple requests to the same domain, significantly reducing latency.
- Cookie Persistence: While we don't rely on login cookies, Instagram often drops "session-less" tracking cookies that are required for subsequent API calls to succeed. Our session manager handles these automatically.
Recursive JSON Parsing Engine
One of the biggest challenges in scraping modern web apps is that data is often deeply nested inside complex JavaScript objects. Our `find_key_recursive` function addresses this by traversing arbitrary JSON trees to locate specific data structures (like the `xdt_api__v1__media__shortcode__web_info` key) even when their parent structure changes. This abstraction layer is what allows the scraper to remain functional even when Instagram updates its frontend code.
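A key-based recursive search of this kind can be written in a few lines. This is a minimal sketch of the idea, a depth-first walk over dicts and lists. It reuses the function name from the text, but the signature and behavior are assumptions, not the actor's actual implementation.

```python
from typing import Any


def find_key_recursive(node: Any, target: str) -> Any:
    """Depth-first search; return the first value stored under `target`, or None."""
    if isinstance(node, dict):
        if target in node:
            return node[target]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None  # scalars cannot contain the key
    for child in children:
        found = find_key_recursive(child, target)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    blob = {"require": [["x", {"data": {"xdt_api__v1__media__shortcode__web_info": {"items": []}}}]]}
    print(find_key_recursive(blob, "xdt_api__v1__media__shortcode__web_info"))
```

One limitation of this sketch is that a key whose value is `null` is indistinguishable from a missing key; a sentinel object would be needed to tell the two apart.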
Asynchronous I/O with asyncio
The actor is built on top of Python's `asyncio` framework. This allows it to perform "non-blocking" operations. While one request is waiting for a response from a proxy, the CPU is free to process other tasks or prepare the next request. This is particularly important when running hybrid scrapes where multiple post URLs need to be processed simultaneously.
Common Error Codes & Resolution Matrix
Scraping is a battle against rate limits and server errors. Here is how our actor handles the most common hurdles:
| Error Code | Meaning | Actor Response | User Action Required |
|---|---|---|---|
| 403 Forbidden | IP Block or Rate Limit | Automatically switches proxy or waits. | Check if Residential Proxies are enabled. |
| 404 Not Found | Profile is private or deleted. | Skips the user and logs a warning. | Verify the username is public and spelled correctly. |
| 429 Too Many Requests | Aggressive rate limiting. | Implements an exponential backoff. | Increase the delay between requests or use higher-quality proxies. |
| 500/503 Server Error | Instagram is having issues. | Retries up to 3 times before moving on. | Usually temporary; try running the actor again later. |
| Timeout | Network congestion. | Logs a timeout error and retries. | Check proxy latency or increase the timeout setting in config. |
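The retry behavior in the table can be sketched as exponential backoff with jitter around a bounded retry loop. Everything named here (`backoff_delay`, `fetch_with_retries`, the `send` callable, the base and cap values) is an assumption for illustration. The sketch also applies the three-retry cap to 429 responses, which the table does not state.

```python
import random
import time
from typing import Callable

# Statuses treated as retryable in this sketch.
RETRYABLE = {429, 500, 503}


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def fetch_with_retries(send: Callable[[], int], max_retries: int = 3,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` until it returns a non-retryable status or retries run out."""
    status = send()
    for attempt in range(max_retries):
        if status not in RETRYABLE:
            break
        sleep(backoff_delay(attempt))
        status = send()
    return status


if __name__ == "__main__":
    responses = iter([429, 503, 200])
    # Sleeping is stubbed out so the demo finishes instantly.
    print(fetch_with_retries(lambda: next(responses), sleep=lambda s: None))
```

Injecting `sleep` keeps the loop testable; in a real run the default `time.sleep` (or an async equivalent) would apply the delay.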
Industry-Specific Use Cases
The flexibility of our data output makes it suitable for a wide range of vertical markets.
Fashion & E-Commerce
Growth teams use the scraper to monitor "Outfit of the Day" (#OOTD) tags. By extracting the `tagged_user` field, they can identify which brands are being mentioned alongside their own, providing valuable insights into cross-shopping behavior.
Real Estate & Architecture
Agencies scrape posts from luxury real estate hashtags to aggregate a database of high-end listings. The `location` field (when available) allows them to map these properties geographically, while the `image` URLs provide high-quality assets for mood boards.
Travel & Hospitality
Tourism boards track the "Accessibility" of destinations by analyzing the `accessibility_caption` field. This allows them to see how AI-generated descriptions perceive their own landmarks and those of competitors.
News & Media
Journalists use the scraper to verify the viral spread of a specific video or Reel. By comparing the `play_count` and `comment_count` in real-time, they can identify "breaking news" moments as they happen.
Data Export & Schema Mapping
When you export your data from Apify, you can choose several formats. Here is how the JSON fields map to other formats:
For Excel/CSV Users
The Apify platform automatically flattens nested JSON. For example:
- `owner/username` becomes a column named `owner.username`.
- `dimensions/width` becomes `dimensions.width`.
- Arrays like `hashtags` are often joined by commas (e.g., "space, nasa, science").
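The same flattening can be reproduced locally, for instance before loading records into a spreadsheet library. This is a sketch of the idea only, not Apify's exporter; the `flatten` helper and its handling of lists are assumptions.

```python
def flatten(record: dict, parent: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into dotted column names; join scalar lists with commas."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list) and all(not isinstance(v, (dict, list)) for v in value):
            flat[name] = ", ".join(str(v) for v in value)
        else:
            # Lists of objects (e.g. comments) are left as-is in this sketch.
            flat[name] = value
    return flat


if __name__ == "__main__":
    post = {
        "owner": {"username": "nasa"},
        "dimensions": {"width": 1080, "height": 1350},
        "hashtags": ["space", "nasa", "science"],
    }
    print(flatten(post))
```

Lists of objects such as `comments` or `tagged_user` do not map cleanly onto a single row, which is one reason to keep the raw JSON alongside any flat export.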
For Database Administrators (SQL)
We recommend importing the raw JSON into a JSONB column (in PostgreSQL) or using a NoSQL solution like MongoDB. This ensures you don't lose any of the rich, nested metadata provided by the hybrid extraction process.
Developer Reference: Internal Functions
If you are a developer looking to integrate our logic into your own applications, here are the key functions:
- `extract_tokens(username, session)`: The entry point. It visits the profile page and extracts the `APP_ID` required for all subsequent API calls.
- `fetch_user_feed(username, limit)`: Handles the paginated API requests. It manages the `max_id` cursor to walk through the user's history.
- `scrape_post(url, username)`: The secondary "Deep Scrape" logic. It parses the HTML of a specific post to find metrics that the API might omit.
- `apply_defaults(data)`: A unique feature of our scraper. If Instagram hides certain metrics (like likes), this function uses statistical averages to provide "realistic" placeholders, ensuring your downstream analytics don't break due to null values.
System Design & Architectural Patterns
For the technically curious, the Instagram Tagged Posts Scraper follows several industry-standard design patterns to maintain high code quality and runtime reliability.
1. The Singleton Actor Pattern
The main execution flow is wrapped in an `async with Actor:` block. This ensures that the Apify environment is correctly initialized and, more importantly, that all resources (network connections, file handles) are gracefully closed when the actor finishes, regardless of whether it succeeded or crashed.
2. Strategy Pattern for Data Extraction
We use a hybrid strategy for fetching media. The primary strategy is the "API Feed" strategy, which is fast and cost-effective. When this strategy fails to provide 100% of the requested fields (such as deep video metrics), the system dynamically switches to the "Scraped HTML" strategy. This allows the actor to adapt to different post types (Reels vs. Carousel vs. Image) on the fly.
3. Recursion for Dynamic JSON Discovery
As mentioned earlier, the `find_key_recursive` function is a core utility. In modern web development, data is often wrapped in multiple "higher-order" components. A static path like `data['entry_data']['PostPage'][0]` is fragile. By using a recursive search, we look for the key rather than the path, making the scraper much more resilient to UI changes.
4. Asynchronous Concurrency Control
While the current version processes usernames sequentially to stay within safe rate limits, the underlying architecture is ready for parallel processing. By using `asyncio.gather()`, a developer could easily modify the actor to scrape multiple profiles at once, provided they have a sufficiently large proxy pool.
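A bounded version of that modification could pair `asyncio.gather()` with a semaphore so that only a few profiles are in flight at once. The sketch below is hypothetical: `scrape_many` and the `scrape_one` coroutine are illustrative names, and the concurrency limit is an assumed value that would need tuning against the proxy pool.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


async def scrape_many(
    usernames: Sequence[str],
    scrape_one: Callable[[str], Awaitable[list]],
    max_concurrency: int = 3,
) -> dict:
    """Run `scrape_one` for every username, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(username: str):
        async with semaphore:
            return await scrape_one(username)

    # return_exceptions=True keeps one failing profile from cancelling the rest;
    # a failed profile maps to its exception object in the result.
    results = await asyncio.gather(*(guarded(u) for u in usernames), return_exceptions=True)
    return dict(zip(usernames, results))


async def fake_scrape(username: str) -> list:
    """Stand-in for a real per-profile scrape."""
    await asyncio.sleep(0)
    return [f"post-of-{username}"]


if __name__ == "__main__":
    print(asyncio.run(scrape_many(["natgeo", "spacex"], fake_scrape, max_concurrency=2)))
```

The semaphore is the piece that keeps parallelism from turning into a burst of simultaneous requests from the same proxy.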
The Comprehensive Data Dictionary
Below is an exhaustive list of every field you might encounter in the output dataset, including those that only appear for specific post types.
Core Media Metadata
- `id`: (String) Unique identifier for the post. Always present.
- `pk`: (String) Numeric primary key. Useful for legacy API integrations.
- `shortcode`: (String) The alphanumeric slug in the URL.
- `taken_at`: (ISO 8601 String) The exact moment the post was published.
- `crawled_at`: (ISO 8601 String) The moment our actor captured the data.
- `product_type`: (String) Identifies whether the post is `clips` (Reel), `feed` (Image/Video), or `carousel_container`.
Metrics and Engagement
- `like_count`: (Integer) Number of likes. Can be hidden by the user, in which case a default is provided.
- `comment_count`: (Integer) Number of top-level comments.
- `view_count`: (Integer) Specific to video content.
- `play_count`: (Integer) Specific to Reels. Often higher than view count as it includes loops.
- `video_duration`: (Float) Length of the video in seconds.
Content and Context
- `caption`: (String) The full text of the post.
- `accessibility_caption`: (String) The AI-generated description of the image content.
- `hashtags`: (Array) List of hashtags without the # symbol.
- `mentions`: (Array) List of usernames mentioned without the @ symbol.
- `location`: (Object) Includes `name`, `id`, `lat`, and `lng` if tagged.
Ownership and Collaboration
- `owner`: (Object) Full details of the poster: `username`, `full_name`, `id`, `profile_pic_url`, `followers`, `is_verified`.
- `tagged_user`: (Array) List of users tagged in the media.
- `coauthor_producers`: (Array) Identifies collaborative posts where multiple authors are credited.
Advanced Media Assets
- `image`: (String) High-resolution thumbnail/image URL.
- `video_url`: (String) Direct `.mp4` link for video content.
- `has_audio`: (Boolean) Whether the video has a sound track.
- `clips_music_attribution_info`: (Object) Details about the music track used in a Reel.
Conclusion: The Future of Instagram Data
In a world where data is the new oil, the Instagram Tagged Posts Scraper is your high-precision refinery. By choosing a "No-Cookie" approach, you are choosing stability, safety, and scalability. We are committed to maintaining this actor as the gold standard for Instagram extraction on the Apify platform.
For technical support, custom features, or business inquiries, please visit the developer's profile on the Apify Marketplace.