Instagram Keyword Search Scraper

Pricing: from $5.00 / 1,000 results

Extract posts from Instagram keyword search results. Scrape post URLs, captions, usernames, media URLs, hashtags, engagement metrics, and more. Supports multiple keywords with anti-detection features.

Rating: 5.0 (3)

Developer: Crawler Bros (Maintained by Community)

Actor stats: 0 bookmarked, 15 total users, 8 monthly active users, last modified 6 days ago

This Apify actor extracts posts from Instagram keyword search results. Search for any keywords and scrape post URLs, captions, usernames, media URLs, hashtags, engagement metrics (where Instagram exposes them), and more.

Features

  • Keyword-Based Search: Search Instagram for any keywords or phrases
  • Multiple Keywords: Process multiple keywords in a single run
  • Comprehensive Data Extraction: Extract post IDs, URLs, captions, usernames, media URLs, hashtags, mentions, and media types
  • Guaranteed Username Extraction: Automatic fallback to individual post pages ensures 100% username availability
  • Anti-Detection: Built-in human behavior simulation to avoid rate limiting
  • Infinite Scroll: Automatically scrolls to load more results
  • Deduplication: Automatically removes duplicate posts
  • Flexible Configuration: Customize delays, post limits, and behavior settings
  • Cookie Authentication: Support for authenticated sessions to access search results (required)
  • Session Management: Save and reuse cookies between runs
  • Multiple Extraction Methods: Tries JSON/GraphQL extraction first, falls back to HTML parsing
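
As an illustration of the deduplication feature, the sketch below removes repeated posts by keying on post_id. This is an assumption about how duplicates could be detected, not the actor's actual implementation; the function name deduplicate_posts is hypothetical.

```python
from typing import Iterable


def deduplicate_posts(posts: Iterable[dict]) -> list[dict]:
    """Keep the first occurrence of each post, keyed on its post_id.

    Hypothetical helper: the actor's real deduplication key is not documented.
    """
    seen: set[str] = set()
    unique: list[dict] = []
    for post in posts:
        post_id = post.get("post_id")
        if post_id is None or post_id in seen:
            continue  # skip records without an ID and repeated IDs
        seen.add(post_id)
        unique.append(post)
    return unique
```

The same helper can be applied to a downloaded dataset when several keywords return overlapping posts.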

Authentication (Required)

Instagram requires authentication to access keyword search results. You must provide cookies from an active Instagram session.

How to Extract Instagram Cookies

Method 1: Using Browser DevTools (Recommended)

  1. Open Instagram in your browser and log in
  2. Press F12 to open Developer Tools
  3. Go to the Application tab (Chrome) or Storage tab (Firefox)
  4. Click on Cookies → https://www.instagram.com
  5. Find and copy these important cookies:
    • sessionid (most important)
    • ds_user_id
    • csrftoken
  6. Format them as JSON:
[
  {
    "name": "sessionid",
    "value": "YOUR_SESSION_ID_VALUE",
    "domain": ".instagram.com",
    "path": "/",
    "secure": true,
    "httpOnly": true
  },
  {
    "name": "ds_user_id",
    "value": "YOUR_USER_ID",
    "domain": ".instagram.com",
    "path": "/",
    "secure": true
  },
  {
    "name": "csrftoken",
    "value": "YOUR_CSRF_TOKEN",
    "domain": ".instagram.com",
    "path": "/",
    "secure": true
  }
]
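
If you prefer not to hand-write the JSON, a small script can assemble it from the three values copied out of DevTools. This is a minimal sketch; build_cookie_json is a hypothetical helper name, and the field layout simply mirrors the example above.

```python
import json


def build_cookie_json(sessionid: str, ds_user_id: str, csrftoken: str) -> str:
    """Return the three session cookies as a JSON string (hypothetical helper)."""

    def cookie(name: str, value: str, http_only: bool = False) -> dict:
        entry = {
            "name": name,
            "value": value,
            "domain": ".instagram.com",
            "path": "/",
            "secure": True,
        }
        if http_only:
            entry["httpOnly"] = True  # only sessionid is marked httpOnly above
        return entry

    cookies = [
        cookie("sessionid", sessionid, http_only=True),
        cookie("ds_user_id", ds_user_id),
        cookie("csrftoken", csrftoken),
    ]
    return json.dumps(cookies, indent=2)


if __name__ == "__main__":
    print(build_cookie_json("YOUR_SESSION_ID_VALUE", "YOUR_USER_ID", "YOUR_CSRF_TOKEN"))
```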

Method 2: Using EditThisCookie Extension (Easiest!)

  1. Install EditThisCookie for Chrome
  2. Log in to Instagram
  3. Click the EditThisCookie icon
  4. Click "Export" button (bottom right)
  5. Paste the entire JSON directly into the cookies input field

The scraper automatically converts browser cookie formats! No need to manually clean or reformat - just paste the raw export.
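
The conversion itself is not documented here. As a rough sketch of what such a step might look like, the code below maps a browser-extension export to the cookie shape Playwright accepts. The export field names (expirationDate, sameSite values such as "no_restriction") are assumptions based on typical extension exports, and normalize_exported_cookies is a hypothetical name.

```python
import json

# Assumed mapping from extension-style sameSite values to Playwright's values.
_SAME_SITE = {"strict": "Strict", "lax": "Lax", "no_restriction": "None"}


def normalize_exported_cookies(raw_json: str) -> list[dict]:
    """Convert a raw extension export into Playwright-style cookie dicts."""
    normalized = []
    for item in json.loads(raw_json):
        cookie = {
            "name": item["name"],
            "value": item["value"],
            "domain": item.get("domain", ".instagram.com"),
            "path": item.get("path", "/"),
            "secure": bool(item.get("secure", False)),
            "httpOnly": bool(item.get("httpOnly", False)),
        }
        if "expirationDate" in item:
            # Playwright names the expiry field "expires" (Unix seconds).
            cookie["expires"] = float(item["expirationDate"])
        same_site = _SAME_SITE.get(str(item.get("sameSite", "")).lower())
        if same_site:
            cookie["sameSite"] = same_site
        normalized.append(cookie)  # extension-only keys (id, storeId, ...) are dropped
    return normalized
```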

Method 3: Using Cookie-Editor Extension

  1. Install Cookie-Editor for Chrome/Firefox
  2. Log in to Instagram
  3. Click the extension icon
  4. Click "Export" → "JSON"
  5. Copy and paste into the cookies field

Security Notes

  • Never share your cookies - they provide full access to your Instagram account
  • Use a dedicated Instagram account for scraping (not your personal account)
  • Cookies expire after some time - you'll need to refresh them periodically
  • Store cookies securely and don't commit them to version control
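
One way to keep cookies out of version control is to read them from an environment variable at run time. The sketch below assumes a variable named INSTAGRAM_COOKIES, which is a hypothetical name chosen for this example.

```python
import json
import os


def load_cookies_from_env(var_name: str = "INSTAGRAM_COOKIES") -> str:
    """Read the cookie JSON string from an environment variable.

    The variable name is an assumption for this example, not an actor setting.
    """
    raw = os.environ.get(var_name)
    if not raw:
        raise RuntimeError(f"Set {var_name} to the exported cookie JSON")
    json.loads(raw)  # fail early if the value is not valid JSON
    return raw
```

The returned string can be passed as the cookies input field.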

Input Configuration

The scraper accepts the following input parameters:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| keywords | Array | Yes | - | List of keywords or phrases to search for |
| maxPosts | Integer | No | 20 | Maximum number of posts to extract per keyword (1-500) |
| minDelayBetweenRequests | Integer | No | 2 | Minimum delay in seconds between actions (1-30) |
| maxDelayBetweenRequests | Integer | No | 5 | Maximum delay in seconds between actions (1-60) |
| humanizeBehavior | Boolean | No | true | Enable human-like behavior simulation |
| cookies | String | Highly Recommended | - | Instagram cookies in JSON format (required for search access) |
| sessionName | String | No | "default_session" | Session name for saving/loading cookies between runs |
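
To catch configuration mistakes before starting a run, the defaults and ranges from the table can be checked locally. This is a sketch only: validate_input is a hypothetical helper, and the rule that the minimum delay must not exceed the maximum is an assumption rather than documented behavior.

```python
DEFAULTS = {
    "maxPosts": 20,
    "minDelayBetweenRequests": 2,
    "maxDelayBetweenRequests": 5,
    "humanizeBehavior": True,
    "sessionName": "default_session",
}
RANGES = {
    "maxPosts": (1, 500),
    "minDelayBetweenRequests": (1, 30),
    "maxDelayBetweenRequests": (1, 60),
}


def validate_input(raw: dict) -> dict:
    """Apply the documented defaults and range checks to an input dict."""
    keywords = [k.strip() for k in raw.get("keywords", [])
                if isinstance(k, str) and k.strip()]
    if not keywords:
        raise ValueError("keywords must contain at least one non-empty string")
    config = {**DEFAULTS, **raw, "keywords": keywords}
    for field, (low, high) in RANGES.items():
        value = config[field]
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not low <= value <= high:
            raise ValueError(f"{field} must be an integer in {low}-{high}")
    # Assumption: a minimum larger than the maximum is treated as an error.
    if config["minDelayBetweenRequests"] > config["maxDelayBetweenRequests"]:
        raise ValueError("minDelayBetweenRequests exceeds maxDelayBetweenRequests")
    return config
```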

Example Input

{
  "keywords": [
    "living in dubai",
    "travel photography",
    "food recipes"
  ],
  "maxPosts": 50,
  "minDelayBetweenRequests": 2,
  "maxDelayBetweenRequests": 5,
  "humanizeBehavior": true,
  "cookies": "[{\"name\":\"sessionid\",\"value\":\"YOUR_SESSION_ID\",\"domain\":\".instagram.com\",\"path\":\"/\",\"secure\":true,\"httpOnly\":true}]",
  "sessionName": "my_instagram_session"
}
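
Note that cookies is a String field, so the cookie list has to be serialized into an escaped JSON string as in the example. The sketch below builds such an input in Python and, optionally, starts a run with the apify-client package. The helper names, the token, and the actor ID are placeholders; the actor's real ID is not stated in this document.

```python
import json


def build_run_input(keywords, cookies, max_posts=20,
                    session_name="default_session") -> dict:
    """Build an input dict; the cookie list is serialized to a JSON string."""
    return {
        "keywords": list(keywords),
        "maxPosts": max_posts,
        "cookies": json.dumps(cookies),  # String field, so dump the list
        "sessionName": session_name,
    }


def run_actor(token: str, actor_id: str, run_input: dict) -> list[dict]:
    """Start a run and return the dataset rows (requires `pip install apify-client`)."""
    from apify_client import ApifyClient  # imported lazily so the sketch loads without it

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    cookies = [{"name": "sessionid", "value": "YOUR_SESSION_ID",
                "domain": ".instagram.com", "path": "/",
                "secure": True, "httpOnly": True}]
    print(json.dumps(build_run_input(["living in dubai"], cookies, 50), indent=2))
    # rows = run_actor("YOUR_APIFY_TOKEN", "ACTOR_ID_PLACEHOLDER", run_input)
```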

Output Format

The scraper outputs a dataset with one row per post. Each post contains:

{
  "post_id": "DBq4D_QIlEH",
  "post_url": "https://www.instagram.com/p/DBq4D_QIlEH/",
  "username": "travel_photographer",
  "user_url": "https://www.instagram.com/travel_photographer/",
  "caption": "Amazing sunset at the beach! #travel #photography @friend_username",
  "posted_date": null,
  "location": null,
  "media_type": "image",
  "media_count": 1,
  "thumbnail_url": "https://scontent.cdninstagram.com/v/t39.30808-6/...",
  "media_urls": ["https://scontent.cdninstagram.com/v/t39.30808-6/..."],
  "hashtags": ["travel", "photography"],
  "mentions": ["friend_username"],
  "likes_count": 0,
  "comments_count": 0,
  "views_count": 0,
  "is_ad": false,
  "is_carousel": false,
  "search_keyword": "travel",
  "scraped_at": "2025-11-21T12:28:25.052408",
  "source": "instagram_keyword_search"
}

Output Fields

| Field | Type | Availability | Description |
| --- | --- | --- | --- |
| post_id | String | ✅ Always | Instagram post shortcode/ID |
| post_url | String | ✅ Always | Full URL to the post |
| username | String | ✅ Always | Username of the post author (fetched with fallback) |
| user_url | String | ✅ Always | URL to the user's profile |
| caption | String | ✅ Usually | Post caption/text (when available) |
| media_type | String | ✅ Always | Type: "image", "video", or "carousel" |
| media_count | Integer | ✅ Always | Number of media items (1 for single posts) |
| thumbnail_url | String | ✅ Always | URL to post thumbnail image |
| media_urls | Array | ✅ Always | List of media URLs (contains at least thumbnail) |
| hashtags | Array | ✅ Always | List of hashtags used in caption (empty if none) |
| mentions | Array | ✅ Always | List of mentioned usernames (empty if none) |
| is_carousel | Boolean | ✅ Always | Whether the post contains multiple media items |
| search_keyword | String | ✅ Always | The keyword used to find this post |
| scraped_at | String | ✅ Always | ISO timestamp when data was scraped |
| source | String | ✅ Always | Data source identifier |
| posted_date | String | ⚠️ Limited* | ISO timestamp when post was created |
| location | String | ⚠️ Limited* | Location tag (if available) |
| likes_count | Integer | ⚠️ Limited* | Number of likes |
| comments_count | Integer | ⚠️ Limited* | Number of comments |
| views_count | Integer | ⚠️ Limited* | Number of views (videos only) |
| is_ad | Boolean | ⚠️ Limited* | Whether the post is an advertisement |

*Limited Availability: These fields are often not available in Instagram's keyword search results. Instagram intentionally restricts access to engagement metrics, post dates, and location data in search results to prevent scraping, so these fields may be returned as null or 0. To obtain this data, you would need one of the following approaches, each with trade-offs:

  • Use Instagram's official Graph API (requires business account and API approval)
  • Navigate to individual post pages (slower and may trigger rate limits)
  • Access Instagram while logged in and parse dynamically loaded data (unreliable)
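
Because the limited fields may come back as null or 0, downstream analysis can easily mistake "unknown" for "zero". A minimal sketch of one way to handle this is to convert both to None before computing statistics. The helper name is hypothetical, and note the trade-off: a post that genuinely has 0 likes is also marked as unknown.

```python
LIMITED_FIELDS = ("posted_date", "location", "likes_count",
                  "comments_count", "views_count")


def mark_missing_metrics(post: dict) -> dict:
    """Return a copy of the post with null/0 limited fields set to None."""
    cleaned = dict(post)
    for field in LIMITED_FIELDS:
        if cleaned.get(field) in (None, 0):
            cleaned[field] = None  # treat 0 as "not provided" (assumption)
    return cleaned
```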

Note: The scraper automatically attempts to fill missing usernames by visiting individual post pages as a fallback, ensuring usernames are available for all posts.

Use Cases

  • Market Research: Analyze trending topics and popular content
  • Competitor Analysis: Monitor competitor activity and engagement
  • Content Discovery: Find inspiration for your own content
  • Brand Monitoring: Track mentions and hashtag usage
  • Influencer Research: Discover influencers in specific niches
  • Trend Analysis: Identify emerging trends and popular topics

Anti-Detection Features

The scraper includes several anti-detection measures:

  • Human Behavior Simulation: Random mouse movements and scrolling
  • Random Delays: Configurable delays between actions
  • Stealth Mode: Browser fingerprint masking
  • User-Agent Rotation: Realistic browser identification
  • Rate Limit Handling: Automatic detection and response to blocks

Rate Limiting

Instagram enforces rate limits to prevent scraping. To minimize the risk of being blocked:

  • Use reasonable delays (2-5 seconds recommended)
  • Enable the humanizeBehavior option
  • Keep maxPosts modest rather than requesting too many posts at once
  • Spread your scraping over time
  • Monitor for "Action Blocked" warnings
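
The delay settings can be pictured as a random pause drawn between the configured minimum and maximum. The sketch below illustrates that idea with asyncio; it is an assumption about the general technique, not the actor's code, and the function names are hypothetical.

```python
import asyncio
import random


def pick_delay(min_seconds: float = 2, max_seconds: float = 5) -> float:
    """Pick a random delay within the configured bounds."""
    if min_seconds > max_seconds:
        raise ValueError("min_seconds must not exceed max_seconds")
    return random.uniform(min_seconds, max_seconds)


async def polite_pause(min_seconds: float = 2, max_seconds: float = 5) -> float:
    """Sleep for a random interval and return how long the pause was."""
    delay = pick_delay(min_seconds, max_seconds)
    await asyncio.sleep(delay)
    return delay


if __name__ == "__main__":
    print(f"paused for {asyncio.run(polite_pause(0.1, 0.3)):.2f}s")
```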

Technical Details

  • Browser: Firefox with Playwright
  • Language: Python 3.12
  • Dependencies: Apify SDK, Playwright, BeautifulSoup4
  • Architecture: Async/await pattern for efficient I/O

Local Development

Prerequisites

  • Python 3.12+
  • Apify CLI (optional)

Installation

# Install dependencies
pip install -r requirements.txt
# Install Playwright browsers
playwright install firefox
playwright install-deps firefox

Running Locally

# Run the scraper
python -m src

Input Configuration (Local)

Create storage/key_value_stores/default/INPUT.json:

{
  "keywords": ["test keyword"],
  "maxPosts": 10,
  "humanizeBehavior": true
}
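
The input file can also be generated from a script, which avoids typos in the nested directory path. This is a small sketch using only the standard library; write_local_input is a hypothetical helper name.

```python
import json
from pathlib import Path


def write_local_input(run_input: dict, project_root: str = ".") -> Path:
    """Write run_input to storage/key_value_stores/default/INPUT.json."""
    path = (Path(project_root) / "storage" / "key_value_stores"
            / "default" / "INPUT.json")
    path.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    path.write_text(json.dumps(run_input, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    written = write_local_input(
        {"keywords": ["test keyword"], "maxPosts": 10, "humanizeBehavior": True}
    )
    print(f"Wrote {written}")
```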

Limitations

Data Availability:

  • Engagement metrics (likes, comments, views) are generally not available in keyword search results
  • Post dates and locations are typically not included in search result HTML
  • These limitations are due to Instagram's intentional restrictions to prevent scraping
  • Use Instagram's official Graph API for reliable access to engagement data

Other Limitations:

  • Instagram's search results are limited by their algorithm
  • Posts from private accounts are not accessible
  • Rate limiting may occur with excessive requests
  • Instagram may change their page structure, requiring updates
  • Cookies expire periodically and need to be refreshed

What IS Available:

  • ✅ Post IDs and URLs
  • ✅ Usernames (with automatic fallback extraction)
  • ✅ Captions, hashtags, and mentions
  • ✅ Media URLs and thumbnails
  • ✅ Media types (image, video, carousel)

Support

For issues, questions, or feature requests:

  1. Check the logs for error messages
  2. Verify your input configuration
  3. Ensure keywords are valid and not empty
  4. Try reducing maxPosts if encountering rate limits

Version History

1.0 (2025-11-21)

  • Initial release
  • Keyword-based search support
  • Multiple extraction methods (JSON + HTML)
  • Anti-detection features
  • Comprehensive data extraction

License

This actor is provided as-is for educational and research purposes. Users are responsible for complying with Instagram's Terms of Service and robots.txt file.


Note: Web scraping may be subject to legal restrictions in your jurisdiction. Always ensure you have the right to scrape data and comply with the website's terms of service.