Scrapeclaw - Instagram Scraper
Pricing
from $1.00 / actor start
📸 Instagram Profile Scraper
Part of ScrapeClaw — a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, and Facebook. Built with Python & Playwright. No API keys required.
What Is This?
A browser-based Instagram scraper that discovers and extracts structured data from public Instagram profiles — without any official API. It uses Playwright for full browser automation with built-in anti-detection, fingerprinting, and human behavior simulation to scrape at scale reliably.
Two-phase workflow:
- Discovery — Find Instagram profiles by location and category via Google Custom Search
- Scraping — Extract full profile data, stats, posts, and media using a real browser session
Features
| Feature | Description |
|---|---|
| 🔍 Discovery | Find profiles by city and category automatically |
| 🌐 Browser Simulation | Full Playwright browser — renders JavaScript, handles logins |
| 🛡️ Anti-Detection | Browser fingerprinting, stealth scripts, human behavior simulation |
| 📊 Rich Data | Profile info, follower counts, bios, posts, engagement stats |
| 🖼️ Media Download | Profile pics and content thumbnails saved locally |
| 💾 Flexible Export | JSON and CSV output formats |
| 🔄 Resume Support | Checkpoint-based resume for interrupted sessions |
| ⚡ Smart Filtering | Auto-skip private accounts, low-follower profiles, empty accounts |
| 🔁 Session Reuse | Saves login state to skip re-login on subsequent runs |
| 🌍 Residential Proxy | Built-in proxy manager supporting 4 major providers |
Installation
```bash
# Clone the repository
git clone https://github.com/Scrapeclaw/instagram-scraper.git
cd instagram-scraper

# Install Python dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium
```
Environment Setup
Create a .env file in the project root:
```ini
# Instagram credentials (required)
INSTAGRAM_USERNAME=your_username
INSTAGRAM_PASSWORD=your_password

# Google Custom Search API (optional, for discovery)
GOOGLE_API_KEY=your_google_api_key
GOOGLE_SEARCH_ENGINE_ID=your_search_engine_id

# Residential proxy (optional — see Proxy section below)
PROXY_ENABLED=false
PROXY_PROVIDER=brightdata
PROXY_USERNAME=your_proxy_user
PROXY_PASSWORD=your_proxy_pass
PROXY_COUNTRY=us
PROXY_STICKY=true
```
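If you'd rather not pull in a dependency like python-dotenv, loading these variables can be sketched in a few lines. This is a hypothetical helper for illustration, not part of the repo; the project may load its environment differently:

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Hypothetical helper; does not support quoting or variable expansion.
    """
    loaded = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blanks, comments, and malformed lines
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            loaded[key.strip()] = value.strip()
    os.environ.update(loaded)
    return loaded
```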
Usage
Discover Profiles
```bash
# Discover fashion influencers in Miami
python main.py discover --location "Miami" --category "fashion"

# Discover fitness influencers in New York
python main.py discover --location "New York" --category "fitness"

# Return JSON output (for agent integration)
python main.py discover --location "Miami" --category "fitness" --output json
```
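Under the hood, discovery means turning a location/category pair into Google Custom Search queries. A hypothetical sketch of how such queries might be built (the real discovery.py may construct them differently):

```python
def build_discovery_queries(location: str, category: str,
                            queries_per_location: int = 3) -> list[str]:
    """Sketch of discovery query construction for Google Custom Search.

    Hypothetical; shown only to illustrate the location + category pattern.
    """
    variants = [category, f"{category} influencer", f"{category} blogger"]
    return [
        f'site:instagram.com "{location}" "{variant}"'
        for variant in variants[:queries_per_location]
    ]
```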
Scrape
```bash
# Scrape a single profile by username
python main.py scrape --username influencer123

# Scrape from a discovery queue file
python main.py scrape data/queue/Miami_fashion_20260220.json

# Run headless
python main.py scrape --username influencer123 --headless
```
Manage & Export
```bash
# List available queue files
python main.py list

# Export all scraped data to JSON + CSV
python main.py export --format both
```
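CSV export has to flatten the nested profile JSON. A minimal sketch of that flattening (hypothetical helper; the bundled exporter may pick different columns):

```python
import csv

# Flat, scalar fields that map cleanly onto CSV columns
CSV_FIELDS = [
    "username", "full_name", "followers", "following",
    "posts_count", "influencer_tier", "category", "location",
]

def export_profiles_csv(profiles: list[dict], path: str) -> None:
    """Write flat profile fields to CSV, ignoring nested keys like post_engagement."""
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=CSV_FIELDS, extrasaction="ignore")
        writer.writeheader()
        for profile in profiles:
            writer.writerow(profile)
```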
Output Data
Each scraped profile is saved to data/output/{username}.json:
```json
{
  "username": "example_user",
  "full_name": "Example User",
  "bio": "Fashion blogger | NYC",
  "followers": 125000,
  "following": 1500,
  "posts_count": 450,
  "is_verified": false,
  "is_private": false,
  "influencer_tier": "macro",
  "category": "fashion",
  "location": "New York",
  "profile_pic_local": "thumbnails/example_user/profile_abc123.jpg",
  "content_thumbnails": [
    "thumbnails/example_user/content_1_def456.jpg",
    "thumbnails/example_user/content_2_ghi789.jpg"
  ],
  "post_engagement": [
    {"post_url": "https://instagram.com/p/ABC123/", "likes": 5420, "comments": 89}
  ],
  "scrape_timestamp": "2026-02-20T14:30:00"
}
```
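Because each profile is its own JSON file, downstream analysis is straightforward. A sketch of loading the output directory and computing per-profile engagement (hypothetical helpers, not part of the repo):

```python
import glob
import json

def load_profiles(output_dir: str = "data/output") -> list[dict]:
    """Read every scraped profile JSON from the output directory."""
    profiles = []
    for path in sorted(glob.glob(f"{output_dir}/*.json")):
        with open(path) as fh:
            profiles.append(json.load(fh))
    return profiles

def average_likes(profile: dict) -> float:
    """Mean likes across the sampled posts (0.0 if none were captured)."""
    posts = profile.get("post_engagement", [])
    if not posts:
        return 0.0
    return sum(p["likes"] for p in posts) / len(posts)
```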
Influencer Tiers
| Tier | Followers |
|---|---|
| nano | < 1,000 |
| micro | 1,000 – 10,000 |
| mid | 10,000 – 100,000 |
| macro | 100,000 – 1M |
| mega | > 1,000,000 |
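The tier boundaries above translate directly into a small classifier. This is a sketch consistent with the table, assuming each lower bound is inclusive; the scraper's internal logic may treat the exact boundaries differently:

```python
def influencer_tier(followers: int) -> str:
    """Map a follower count to a tier, per the table above."""
    if followers < 1_000:
        return "nano"
    if followers < 10_000:
        return "micro"
    if followers < 100_000:
        return "mid"
    if followers < 1_000_000:
        return "macro"
    return "mega"
```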
🌐 Residential Proxy (Recommended for Scale)
Running long scraping sessions without a residential proxy will get your IP blocked. The built-in proxy manager handles rotation, sticky sessions, and country targeting automatically.
Why Use a Residential Proxy?
- ✅ Avoid IP bans — residential IPs look like real users to Instagram
- ✅ Rotate IPs automatically on every request or session
- ✅ Sticky sessions — keep the same IP during a login session
- ✅ Geo-target by country for locale-accurate content
- ✅ 95%+ success rates vs ~30% with datacenter proxies
Recommended Providers
We have affiliate partnerships with the following providers. Using these links supports this project at no extra cost to you:
| Provider | Highlights | Sign Up |
|---|---|---|
| Bright Data | World's largest network, 72M+ IPs, enterprise-grade | 👉 Get Bright Data |
| IProyal | Pay-as-you-go, 195+ countries, no traffic expiry | 👉 Get IProyal |
| Storm Proxies | Fast & reliable, developer-friendly API, competitive pricing | 👉 Get Storm Proxies |
| NetNut | ISP-grade network, 52M+ IPs, direct connectivity | 👉 Get NetNut |
Enabling the Proxy
Option 1 — Environment variables (recommended):
```bash
export PROXY_ENABLED=true
export PROXY_PROVIDER=brightdata   # brightdata | iproyal | stormproxies | netnut | custom
export PROXY_USERNAME=your_proxy_user
export PROXY_PASSWORD=your_proxy_pass
export PROXY_COUNTRY=us            # optional
export PROXY_STICKY=true           # keeps same IP per session
```
Option 2 — config/scraper_config.json:
```json
{
  "proxy": {
    "enabled": true,
    "provider": "brightdata",
    "country": "us",
    "sticky": true,
    "sticky_ttl_minutes": 10
  }
}
```
Set credentials via env vars (PROXY_USERNAME, PROXY_PASSWORD) — never hardcode them in the config file.
Provider Host/Port Reference
| Provider | Host | Port |
|---|---|---|
| Bright Data | brd.superproxy.io | 22225 |
| IProyal | proxy.iproyal.com | 12321 |
| Storm Proxies | rotating.stormproxies.com | 9999 |
| NetNut | gw-resi.netnut.io | 5959 |
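These endpoints map naturally onto the `proxy` dict that Playwright's `launch()` accepts. A sketch of that mapping (the endpoint table is from above; the function name and credentials are illustrative placeholders):

```python
# Host/port per provider, from the reference table above
PROVIDER_ENDPOINTS = {
    "brightdata":   ("brd.superproxy.io", 22225),
    "iproyal":      ("proxy.iproyal.com", 12321),
    "stormproxies": ("rotating.stormproxies.com", 9999),
    "netnut":       ("gw-resi.netnut.io", 5959),
}

def playwright_proxy(provider: str, username: str, password: str) -> dict:
    """Build the dict accepted by Playwright's launch(proxy=...) option."""
    host, port = PROVIDER_ENDPOINTS[provider]
    return {
        "server": f"http://{host}:{port}",
        "username": username,
        "password": password,
    }
```

In practice the username and password would come from the `PROXY_USERNAME` / `PROXY_PASSWORD` environment variables rather than being passed as literals.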
Once configured, the scraper uses the proxy automatically — no extra flags needed. The log confirms it:
```
INFO - Proxy enabled: <ProxyManager provider=brightdata enabled host=brd.superproxy.io:22225>
INFO - Browser using proxy: brightdata → brd.superproxy.io:22225
```
Configuration Reference
Edit config/scraper_config.json to customise behaviour:
```json
{
  "proxy": {
    "enabled": false,
    "provider": "brightdata",
    "country": "",
    "sticky": true,
    "sticky_ttl_minutes": 10
  },
  "google_search": {
    "enabled": true,
    "api_key": "",
    "search_engine_id": "",
    "queries_per_location": 3
  },
  "scraper": {
    "headless": false,
    "min_followers": 1000,
    "download_thumbnails": true,
    "max_thumbnails": 6,
    "delay_between_profiles": [5, 10],
    "timeout": 60000
  }
}
```
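To keep credentials out of the config file, the file settings can be merged with environment-variable overrides at load time. A hypothetical loader illustrating that pattern (the project's actual loader may differ):

```python
import json
import os

def load_config(path: str = "config/scraper_config.json") -> dict:
    """Load the JSON config, letting environment variables override proxy settings.

    Hypothetical; shown to illustrate keeping secrets in the environment,
    never in the file.
    """
    with open(path) as fh:
        cfg = json.load(fh)
    proxy = cfg.setdefault("proxy", {})
    if "PROXY_ENABLED" in os.environ:
        proxy["enabled"] = os.environ["PROXY_ENABLED"].lower() == "true"
    if "PROXY_PROVIDER" in os.environ:
        proxy["provider"] = os.environ["PROXY_PROVIDER"]
    return cfg
```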
Project Structure
```
instagram-scraper/
├── main.py                 # CLI entry point
├── scraper.py              # Playwright browser scraper
├── discovery.py            # Google-based profile discovery
├── anti_detection.py       # Fingerprinting & stealth
├── proxy_manager.py        # Residential proxy integration
├── config/
│   └── scraper_config.json
├── data/
│   ├── output/             # Scraped JSON files
│   ├── queue/              # Discovery queue files
│   └── browser_fingerprints.json
└── thumbnails/             # Downloaded profile & content images
```
Part of ScrapeClaw
This scraper is one of several tools in the ScrapeClaw collection:
| Scraper | Description | Links |
|---|---|---|
| 📸 Instagram | Profiles, posts, media & follower counts | GitHub · ClawHub |
| 📘 Facebook | Pages, groups, posts & engagement data | GitHub · ClawHub |
| 🎥 YouTube | Channels, subscribers & video metadata | GitHub · ClawHub |
| 🐦 X / Twitter | Tweets, profiles & engagement metrics | GitHub · ClawHub |
All scrapers share the same anti-detection foundation, proxy support, and JSON/CSV export pipeline.
☕ Support This Project
If this tool saves you time or helps your workflow, consider buying me a coffee — it keeps the project maintained and new scrapers coming!
Disclaimer
This tool is intended for scraping publicly available data only. Always comply with Instagram's Terms of Service and your local data privacy regulations. The author is not responsible for any misuse.
Built by ScrapeClaw · View all scrapers


