Instagram Keyword Search Scraper
Pricing
from $5.00 / 1,000 results
Extract posts from Instagram keyword search results. Scrape post URLs, captions, usernames, media URLs, hashtags, engagement metrics, and more. Supports multiple keywords with anti-detection features.
Developer
Crawler Bros
Extract posts from Instagram keyword search results with this powerful Apify actor. Search for any keywords and scrape post URLs, captions, usernames, media URLs, hashtags, engagement metrics, and more.
Features
- Keyword-Based Search: Search Instagram for any keywords or phrases
- Multiple Keywords: Process multiple keywords in a single run
- Comprehensive Data Extraction: Extract post IDs, URLs, captions, usernames, media URLs, hashtags, mentions, and media types
- Guaranteed Username Extraction: Automatic fallback to individual post pages ensures 100% username availability
- Anti-Detection: Built-in human behavior simulation to avoid rate limiting
- Infinite Scroll: Automatically scrolls to load more results
- Deduplication: Automatically removes duplicate posts
- Flexible Configuration: Customize delays, post limits, and behavior settings
- Cookie Authentication: Support for authenticated sessions to access search results (required)
- Session Management: Save and reuse cookies between runs
- Multiple Extraction Methods: Tries JSON/GraphQL extraction first, falls back to HTML parsing
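The actor's internal deduplication logic is not documented here. As a minimal sketch of the idea, assuming posts are identified by their `post_id`, the same approach can be used to merge datasets from several runs. The function name `dedupe_posts` is hypothetical.

```python
from typing import Iterable


def dedupe_posts(posts: Iterable[dict]) -> list[dict]:
    """Keep the first occurrence of each post, keyed on post_id."""
    seen: set[str] = set()
    unique: list[dict] = []
    for post in posts:
        post_id = post.get("post_id")
        if post_id is None or post_id in seen:
            continue  # skip duplicates and rows without an ID
        seen.add(post_id)
        unique.append(post)
    return unique
```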
Authentication (Required)
Instagram requires authentication to access keyword search results. You must provide cookies from an active Instagram session.
How to Extract Instagram Cookies
Method 1: Using Browser DevTools (Recommended)
- Open Instagram in your browser and log in
- Press F12 to open Developer Tools
- Go to the Application tab (Chrome) or Storage tab (Firefox)
- Click on Cookies → https://www.instagram.com
- Find and copy these important cookies:
  - sessionid (most important)
  - ds_user_id
  - csrftoken
- Format them as JSON:
[
  {
    "name": "sessionid",
    "value": "YOUR_SESSION_ID_VALUE",
    "domain": ".instagram.com",
    "path": "/",
    "secure": true,
    "httpOnly": true
  },
  {
    "name": "ds_user_id",
    "value": "YOUR_USER_ID",
    "domain": ".instagram.com",
    "path": "/",
    "secure": true
  },
  {
    "name": "csrftoken",
    "value": "YOUR_CSRF_TOKEN",
    "domain": ".instagram.com",
    "path": "/",
    "secure": true
  }
]
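If you copy the three values by hand, assembling the JSON in a few lines of Python avoids quoting mistakes. This is only a sketch: the helper name `build_cookies_json` is hypothetical, and the cookie attributes simply mirror the format shown above.

```python
import json


def build_cookies_json(session_id: str, user_id: str, csrf_token: str) -> str:
    """Return the three cookies as a JSON string for the `cookies` input."""

    def cookie(name: str, value: str, http_only: bool = False) -> dict:
        entry = {
            "name": name,
            "value": value,
            "domain": ".instagram.com",
            "path": "/",
            "secure": True,
        }
        if http_only:
            entry["httpOnly"] = True
        return entry

    cookies = [
        cookie("sessionid", session_id, http_only=True),
        cookie("ds_user_id", user_id),
        cookie("csrftoken", csrf_token),
    ]
    return json.dumps(cookies)
```

The returned string can be pasted into the `cookies` field as is.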
Method 2: Using EditThisCookie Extension (Easiest!)
- Install EditThisCookie for Chrome
- Log in to Instagram
- Click the EditThisCookie icon
- Click "Export" button (bottom right)
- Paste the entire JSON directly into the `cookies` input field
The scraper automatically converts browser cookie formats! No need to manually clean or reformat - just paste the raw export.
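How the actor performs this conversion is not documented. The sketch below shows one plausible approach, assuming the export uses fields typical of these extensions (`expirationDate`, `hostOnly`, `storeId`, and `sameSite` values such as `no_restriction`) and that the target is the cookie shape accepted by Playwright's `add_cookies`. The function name `normalize_cookie_export` is hypothetical.

```python
import json

# Assumed mapping from extension sameSite labels to Playwright's values.
_SAME_SITE = {"no_restriction": "None", "lax": "Lax", "strict": "Strict"}
# Fields passed through unchanged; extension-only fields are dropped.
_KEEP = ("name", "value", "domain", "path", "secure", "httpOnly")


def normalize_cookie_export(raw: str) -> list[dict]:
    """Convert a raw browser-extension export into Playwright-style cookies."""
    cookies = []
    for item in json.loads(raw):
        cookie = {key: item[key] for key in _KEEP if key in item}
        same_site = _SAME_SITE.get(str(item.get("sameSite", "")).lower())
        if same_site:
            cookie["sameSite"] = same_site
        if "expirationDate" in item:
            # Playwright expects `expires` as a Unix timestamp in seconds.
            cookie["expires"] = int(item["expirationDate"])
        cookies.append(cookie)
    return cookies
```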
Method 3: Using Cookie-Editor Extension
- Install Cookie-Editor for Chrome/Firefox
- Log in to Instagram
- Click the extension icon
- Click "Export" → "JSON"
- Copy and paste into the `cookies` field
Security Notes
- Never share your cookies - they provide full access to your Instagram account
- Use a dedicated Instagram account for scraping (not your personal account)
- Cookies expire after some time - you'll need to refresh them periodically
- Store cookies securely and don't commit them to version control
Input Configuration
The scraper accepts the following input parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `keywords` | Array | Yes | - | List of keywords or phrases to search for |
| `maxPosts` | Integer | No | 20 | Maximum number of posts to extract per keyword (1-500) |
| `minDelayBetweenRequests` | Integer | No | 2 | Minimum delay in seconds between actions (1-30) |
| `maxDelayBetweenRequests` | Integer | No | 5 | Maximum delay in seconds between actions (1-60) |
| `humanizeBehavior` | Boolean | No | true | Enable human-like behavior simulation |
| `cookies` | String | Highly Recommended | - | Instagram cookies in JSON format (required for search access) |
| `sessionName` | String | No | "default_session" | Session name for saving/loading cookies between runs |
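Checking an input against these ranges before starting a run can catch mistakes early. The following is a sketch only, not the actor's own validation: the function name `validate_input` is hypothetical, and rejecting a minimum delay larger than the maximum is an assumption, since the table does not say how that case is handled.

```python
def validate_input(run_input: dict) -> dict:
    """Apply the documented defaults and range checks to an actor input."""
    keywords = [
        kw.strip()
        for kw in run_input.get("keywords", [])
        if isinstance(kw, str) and kw.strip()
    ]
    if not keywords:
        raise ValueError("keywords must contain at least one non-empty string")

    def in_range(name: str, default: int, low: int, high: int) -> int:
        value = int(run_input.get(name, default))
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
        return value

    min_delay = in_range("minDelayBetweenRequests", 2, 1, 30)
    max_delay = in_range("maxDelayBetweenRequests", 5, 1, 60)
    if min_delay > max_delay:
        # Assumption: the README does not say how the actor treats this case.
        raise ValueError("minDelayBetweenRequests exceeds maxDelayBetweenRequests")

    return {
        "keywords": keywords,
        "maxPosts": in_range("maxPosts", 20, 1, 500),
        "minDelayBetweenRequests": min_delay,
        "maxDelayBetweenRequests": max_delay,
        "humanizeBehavior": bool(run_input.get("humanizeBehavior", True)),
        "cookies": run_input.get("cookies"),
        "sessionName": run_input.get("sessionName", "default_session"),
    }
```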
Example Input
{
  "keywords": [
    "living in dubai",
    "travel photography",
    "food recipes"
  ],
  "maxPosts": 50,
  "minDelayBetweenRequests": 2,
  "maxDelayBetweenRequests": 5,
  "humanizeBehavior": true,
  "cookies": "[{\"name\":\"sessionid\",\"value\":\"YOUR_SESSION_ID\",\"domain\":\".instagram.com\",\"path\":\"/\",\"secure\":true,\"httpOnly\":true}]",
  "sessionName": "my_instagram_session"
}
Output Format
The scraper outputs a dataset with one row per post. Each post contains:
{
  "post_id": "DBq4D_QIlEH",
  "post_url": "https://www.instagram.com/p/DBq4D_QIlEH/",
  "username": "travel_photographer",
  "user_url": "https://www.instagram.com/travel_photographer/",
  "caption": "Amazing sunset at the beach! #travel #photography @friend_username",
  "posted_date": null,
  "location": null,
  "media_type": "image",
  "media_count": 1,
  "thumbnail_url": "https://scontent.cdninstagram.com/v/t39.30808-6/...",
  "media_urls": ["https://scontent.cdninstagram.com/v/t39.30808-6/..."],
  "hashtags": ["travel", "photography"],
  "mentions": ["friend_username"],
  "likes_count": 0,
  "comments_count": 0,
  "views_count": 0,
  "is_ad": false,
  "is_carousel": false,
  "search_keyword": "travel",
  "scraped_at": "2025-11-21T12:28:25.052408",
  "source": "instagram_keyword_search"
}
Output Fields
| Field | Type | Availability | Description |
|---|---|---|---|
| `post_id` | String | ✅ Always | Instagram post shortcode/ID |
| `post_url` | String | ✅ Always | Full URL to the post |
| `username` | String | ✅ Always | Username of the post author (fetched with fallback) |
| `user_url` | String | ✅ Always | URL to the user's profile |
| `caption` | String | ✅ Usually | Post caption/text (when available) |
| `media_type` | String | ✅ Always | Type: "image", "video", or "carousel" |
| `media_count` | Integer | ✅ Always | Number of media items (1 for single posts) |
| `thumbnail_url` | String | ✅ Always | URL to post thumbnail image |
| `media_urls` | Array | ✅ Always | List of media URLs (contains at least thumbnail) |
| `hashtags` | Array | ✅ Always | List of hashtags used in caption (empty if none) |
| `mentions` | Array | ✅ Always | List of mentioned usernames (empty if none) |
| `is_carousel` | Boolean | ✅ Always | Whether the post contains multiple media items |
| `search_keyword` | String | ✅ Always | The keyword used to find this post |
| `scraped_at` | String | ✅ Always | ISO timestamp when data was scraped |
| `source` | String | ✅ Always | Data source identifier |
| `posted_date` | String | ⚠️ Limited* | ISO timestamp when post was created |
| `location` | String | ⚠️ Limited* | Location tag (if available) |
| `likes_count` | Integer | ⚠️ Limited* | Number of likes |
| `comments_count` | Integer | ⚠️ Limited* | Number of comments |
| `views_count` | Integer | ⚠️ Limited* | Number of views (videos only) |
| `is_ad` | Boolean | ⚠️ Limited* | Whether the post is an advertisement |
*Limited Availability: These fields are often not available in Instagram's keyword search results. Instagram intentionally restricts access to engagement metrics, post dates, and location data in search results to prevent scraping. These fields may return null or 0 values. To access this data reliably, you would need to:
- Use Instagram's official Graph API (requires business account and API approval)
- Navigate to individual post pages (slower and may trigger rate limits)
- Access Instagram while logged in and parse dynamically loaded data (unreliable)
Note: The scraper automatically attempts to fill missing usernames by visiting individual post pages as a fallback, ensuring usernames are available for all posts.
Use Cases
- Market Research: Analyze trending topics and popular content
- Competitor Analysis: Monitor competitor activity and engagement
- Content Discovery: Find inspiration for your own content
- Brand Monitoring: Track mentions and hashtag usage
- Influencer Research: Discover influencers in specific niches
- Trend Analysis: Identify emerging trends and popular topics
Anti-Detection Features
The scraper includes several anti-detection measures:
- Human Behavior Simulation: Random mouse movements and scrolling
- Random Delays: Configurable delays between actions
- Stealth Mode: Browser fingerprint masking
- User-Agent Rotation: Realistic browser identification
- Rate Limit Handling: Automatic detection and response to blocks
Rate Limiting
Instagram has rate limits to prevent scraping. To minimize the risk:
- Use reasonable delays (2-5 seconds recommended)
- Enable the `humanizeBehavior` option
- Don't request too many posts at once
- Spread your scraping over time
- Monitor for "Action Blocked" warnings
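A randomized pause between actions is the core of the delay settings. As a rough illustration, and not the actor's actual code, a delay can be drawn uniformly between the two bounds; `pick_delay` and `polite_pause` are hypothetical names, and the defaults mirror `minDelayBetweenRequests` and `maxDelayBetweenRequests`.

```python
import asyncio
import random


def pick_delay(min_delay: float = 2, max_delay: float = 5) -> float:
    """Draw a delay in seconds uniformly from [min_delay, max_delay]."""
    if min_delay > max_delay:
        raise ValueError("min_delay must not exceed max_delay")
    return random.uniform(min_delay, max_delay)


async def polite_pause(min_delay: float = 2, max_delay: float = 5) -> float:
    """Sleep for a random interval and return how long the pause lasted."""
    delay = pick_delay(min_delay, max_delay)
    await asyncio.sleep(delay)
    return delay
```

Inside an async scraping loop, `await polite_pause()` between page actions keeps request timing irregular.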
Technical Details
- Browser: Firefox with Playwright
- Language: Python 3.12
- Dependencies: Apify SDK, Playwright, BeautifulSoup4
- Architecture: Async/await pattern for efficient I/O
Local Development
Prerequisites
- Python 3.12+
- Apify CLI (optional)
Installation
# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install firefox
playwright install-deps firefox
Running Locally
# Run the scraper
python -m src
Input Configuration (Local)
Create storage/key_value_stores/default/INPUT.json:
{
  "keywords": ["test keyword"],
  "maxPosts": 10,
  "humanizeBehavior": true
}
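The input file can also be generated from a script, which is convenient when switching between keyword sets. This is a small sketch; the helper name `write_local_input` is hypothetical, and the path is the one given above, relative to the project root.

```python
import json
from pathlib import Path


def write_local_input(run_input: dict, root: str = ".") -> Path:
    """Write run_input to storage/key_value_stores/default/INPUT.json."""
    path = Path(root) / "storage" / "key_value_stores" / "default" / "INPUT.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(run_input, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    write_local_input(
        {"keywords": ["test keyword"], "maxPosts": 10, "humanizeBehavior": True}
    )
```

If the input includes cookies, keep the generated file out of version control.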
Limitations
Data Availability:
- Engagement metrics (likes, comments, views) are generally not available in keyword search results and may come back as null or 0
- Post dates and locations are typically not included in search result HTML
- These limitations are due to Instagram's intentional restrictions to prevent scraping
- Use Instagram's official Graph API for reliable access to engagement data
Other Limitations:
- Instagram's search results are limited by their algorithm
- Posts from private accounts are not accessible
- Rate limiting may occur with excessive requests
- Instagram may change their page structure, requiring updates
- Cookies expire periodically and need to be refreshed
What IS Available:
- ✅ Post IDs and URLs
- ✅ Usernames (with automatic fallback extraction)
- ✅ Captions, hashtags, and mentions
- ✅ Media URLs and thumbnails
- ✅ Media types (image, video, carousel)
Support
For issues, questions, or feature requests:
- Check the logs for error messages
- Verify your input configuration
- Ensure keywords are valid and not empty
- Try reducing `maxPosts` if encountering rate limits
Version History
1.0 (2025-11-21)
- Initial release
- Keyword-based search support
- Multiple extraction methods (JSON + HTML)
- Anti-detection features
- Comprehensive data extraction
License
This actor is provided as-is for educational and research purposes. Users are responsible for complying with Instagram's Terms of Service and robots.txt file.
Note: Web scraping may be subject to legal restrictions in your jurisdiction. Always ensure you have the right to scrape data and comply with the website's terms of service.