# 🐦 Twitter/X Profile Scraper
Part of [ScrapeClaw](https://scrapeclaw.cc/) — a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, TikTok, and Facebook. Built with Python & Playwright. No API keys required.
## What Is This?
A browser-based Twitter/X scraper that discovers and extracts structured data from public Twitter profiles — without any official API or login. It uses Playwright for full browser automation with built-in anti-detection, fingerprinting, and human behavior simulation to scrape at scale reliably.
**Two-phase workflow:**

1. **Discovery** — Find Twitter/X profiles by location and category via Google Custom Search or DuckDuckGo
2. **Scraping** — Extract full profile data, tweets, engagement metrics, and media using a real browser session
## Features
| Feature | Description |
|---|---|
| 🔍 Discovery | Find profiles by city and category automatically |
| 🌐 Browser Simulation | Full Playwright browser — renders JavaScript, handles login walls |
| 🛡️ Anti-Detection | Browser fingerprinting, stealth scripts, human behavior simulation |
| 📊 Rich Data | Profile info, follower counts, tweets, engagement stats, media |
| 🖼️ Media Download | Profile pics and tweet media saved locally |
| 💾 Flexible Export | JSON and CSV output formats |
| 🔄 Resume Support | Checkpoint-based resume for interrupted sessions |
| ⚡ Smart Filtering | Auto-skip private accounts, suspended users, low-follower profiles |
| 🚫 No Login Required | Scrapes only publicly visible content |
| 🌍 Residential Proxy | Built-in proxy manager supporting 4 major providers |
## Installation

```bash
# Clone the repository
git clone https://github.com/Scrapeclaw/twitter-scraper.git
cd twitter-scraper

# Install Python dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium
```
## Environment Setup

Create a `.env` file in the project root:

```bash
# Google Custom Search API (optional, for discovery)
GOOGLE_API_KEY=your_google_api_key
GOOGLE_SEARCH_ENGINE_ID=your_search_engine_id

# Residential proxy (optional — see Proxy section below)
PROXY_ENABLED=false
PROXY_PROVIDER=brightdata
PROXY_USERNAME=your_proxy_user
PROXY_PASSWORD=your_proxy_pass
PROXY_COUNTRY=us
PROXY_STICKY=true
```
## Usage

### Discover Profiles

```bash
# Discover tech profiles in Miami
python main.py discover --location "Miami" --category "tech"

# Discover crypto profiles in New York
python main.py discover --location "New York" --category "crypto" --count 15

# Batch mode (multiple cities x categories)
python main.py discover --batch

# Return JSON output (for agent integration)
python main.py discover --location "Miami" --category "tech" --output json
```
### Scrape

```bash
# Scrape a single profile by username
python main.py scrape --username elonmusk

# Scrape multiple usernames
python main.py scrape --usernames nasa,spacex,openai --category tech

# Scrape from a discovery queue file
python main.py scrape data/queue/Miami_tech_20260220_120000.json

# Run headless
python main.py scrape --username elonmusk --headless
```
### Manage & Export

```bash
# List available queue files
python main.py list

# Export all scraped data to JSON + CSV
python main.py export --format both
```
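Because the scraper writes one JSON file per profile, the exported data is easy to post-process yourself. As a hedged sketch (the real `export` command's column set may differ), this reads every JSON file under an output directory and flattens a few core fields into a CSV:

```python
import csv
import json
from pathlib import Path

def export_profiles_csv(output_dir: str, csv_path: str) -> int:
    """Flatten per-profile JSON files into one CSV; returns the row count."""
    fields = ["username", "display_name", "followers", "influencer_tier"]
    rows = [json.loads(p.read_text(encoding="utf-8"))
            for p in sorted(Path(output_dir).glob("*.json"))]
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        # extrasaction="ignore" drops any JSON keys not in `fields`
        writer = csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```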
## Output Data

Each scraped profile is saved to `data/output/{username}.json`:

```json
{
  "username": "elonmusk",
  "display_name": "Elon Musk",
  "bio": "...",
  "followers": 180000000,
  "following": 800,
  "tweets_count": 45000,
  "is_verified": true,
  "profile_pic_url": "https://...",
  "profile_pic_local": "thumbnails/elonmusk/profile_abc123.jpg",
  "user_location": "Mars & Earth",
  "join_date": "June 2009",
  "website": "https://x.ai",
  "influencer_tier": "mega",
  "category": "tech",
  "scrape_location": "New York",
  "scraped_at": "2026-02-20T12:00:00",
  "recent_tweets": [
    {
      "id": "1234567890",
      "text": "Tweet content...",
      "timestamp": "2026-02-17T10:30:00.000Z",
      "likes": 50000,
      "retweets": 12000,
      "replies": 3000,
      "views": "5.2M",
      "media_urls": ["https://..."],
      "media_local": ["thumbnails/elonmusk/tweet_media_0_def456.jpg"],
      "is_retweet": false,
      "is_reply": false,
      "url": "https://x.com/elonmusk/status/1234567890"
    }
  ]
}
```
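Downstream analysis on these files is straightforward. A minimal sketch, assuming only the fields shown above, that computes the average likes across a profile's recent tweets:

```python
import json
from pathlib import Path

def average_likes(path: str) -> float:
    """Mean likes over `recent_tweets` in one output file (0.0 if none)."""
    profile = json.loads(Path(path).read_text(encoding="utf-8"))
    tweets = profile.get("recent_tweets", [])
    if not tweets:
        return 0.0
    return sum(t.get("likes", 0) for t in tweets) / len(tweets)
```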
## Influencer Tiers
| Tier | Followers |
|---|---|
| nano | < 1,000 |
| micro | 1,000 – 10,000 |
| mid | 10,000 – 100,000 |
| macro | 100,000 – 1M |
| mega | > 1,000,000 |
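The thresholds above can be expressed as a small helper — a sketch of the classification logic, not the scraper's actual implementation (which populates the `influencer_tier` field in the output):

```python
def influencer_tier(followers: int) -> str:
    """Map a follower count to a tier per the table's thresholds."""
    if followers < 1_000:
        return "nano"
    if followers < 10_000:
        return "micro"
    if followers < 100_000:
        return "mid"
    if followers < 1_000_000:
        return "macro"
    return "mega"
```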
## 🌐 Residential Proxy (Recommended for Scale)
Long scraping sessions without a residential proxy will quickly get your IP rate-limited or blocked. The built-in proxy manager handles rotation, sticky sessions, and country targeting automatically.
### Why Use a Residential Proxy?
- ✅ Avoid IP bans — residential IPs look like real users to Twitter/X
- ✅ Rotate IPs automatically on every request or session
- ✅ Sticky sessions — keep the same IP during a browsing session
- ✅ Geo-target by country for locale-accurate content
- ✅ 95%+ success rates vs ~30% with datacenter proxies
### Recommended Providers
We have affiliate partnerships with the following providers. Using these links supports this project at no extra cost to you:
| Provider | Highlights | Sign Up |
|---|---|---|
| Bright Data | World's largest network, 72M+ IPs, enterprise-grade | 👉 Get Bright Data |
| IProyal | Pay-as-you-go, 195+ countries, no traffic expiry | 👉 Get IProyal |
| Storm Proxies | Fast & reliable, developer-friendly API, competitive pricing | 👉 Get Storm Proxies |
| NetNut | ISP-grade network, 52M+ IPs, direct connectivity | 👉 Get NetNut |
These are affiliate links. We may earn a commission at no extra cost to you.
### Enabling the Proxy

**Option 1 — Environment variables (recommended):**

```bash
export PROXY_ENABLED=true
export PROXY_PROVIDER=brightdata   # brightdata | iproyal | stormproxies | netnut | custom
export PROXY_USERNAME=your_proxy_user
export PROXY_PASSWORD=your_proxy_pass
export PROXY_COUNTRY=us            # optional
export PROXY_STICKY=true           # keeps same IP per session
```
**Option 2 — `config/scraper_config.json`:**

```json
{
  "proxy": {
    "enabled": true,
    "provider": "brightdata",
    "country": "us",
    "sticky": true,
    "sticky_ttl_minutes": 10
  }
}
```
Set credentials via env vars (`PROXY_USERNAME`, `PROXY_PASSWORD`) — never hardcode them in the config file.
### Provider Host/Port Reference
| Provider | Host | Port |
|---|---|---|
| Bright Data | brd.superproxy.io | 22225 |
| IProyal | proxy.iproyal.com | 12321 |
| Storm Proxies | rotating.stormproxies.com | 9999 |
| NetNut | gw-resi.netnut.io | 5959 |
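To illustrate how these endpoints combine with your credentials, here is a hypothetical sketch of assembling a proxy URL from the table — the real `proxy_manager.py` may construct it differently (e.g. encoding country or sticky-session targeting into the username):

```python
# Endpoint table copied from the reference above
PROVIDER_ENDPOINTS = {
    "brightdata":   ("brd.superproxy.io", 22225),
    "iproyal":      ("proxy.iproyal.com", 12321),
    "stormproxies": ("rotating.stormproxies.com", 9999),
    "netnut":       ("gw-resi.netnut.io", 5959),
}

def proxy_url(provider: str, username: str, password: str) -> str:
    """Assemble an HTTP proxy URL for a provider in the table."""
    host, port = PROVIDER_ENDPOINTS[provider]
    return f"http://{username}:{password}@{host}:{port}"
```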
Once configured, the scraper uses the proxy automatically — no extra flags needed. The log confirms it:
```
INFO - Proxy enabled: <ProxyManager provider=brightdata enabled host=brd.superproxy.io:22225>
INFO - Browser using proxy: brightdata → brd.superproxy.io:22225
```
## Configuration Reference

Edit `config/scraper_config.json` to customise behaviour:

```json
{
  "proxy": {
    "enabled": false,
    "provider": "brightdata",
    "country": "",
    "sticky": true,
    "sticky_ttl_minutes": 10
  },
  "google_search": {
    "enabled": true,
    "api_key": "",
    "search_engine_id": "",
    "queries_per_location": 3
  },
  "scraper": {
    "headless": false,
    "min_followers": 500,
    "max_tweets": 20,
    "download_thumbnails": true,
    "max_thumbnails": 6,
    "delay_between_profiles": [4, 8],
    "timeout": 60000
  }
}
```
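The recommended split — settings in the file, credentials in env vars — can be sketched like this (a hypothetical helper; the scraper's own loader may differ):

```python
import json
import os

def load_config(path: str = "config/scraper_config.json") -> dict:
    """Load the JSON config, injecting proxy credentials from env vars."""
    with open(path, encoding="utf-8") as fh:
        cfg = json.load(fh)
    # Credentials never live in the file; pull them from the environment
    proxy = cfg.setdefault("proxy", {})
    proxy["username"] = os.environ.get("PROXY_USERNAME", "")
    proxy["password"] = os.environ.get("PROXY_PASSWORD", "")
    return cfg
```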
## Project Structure

```
twitter-scraper/
├── main.py               # CLI entry point
├── scraper.py            # Playwright browser scraper
├── discovery.py          # Google/DuckDuckGo profile discovery
├── anti_detection.py     # Fingerprinting & stealth
├── proxy_manager.py      # Residential proxy integration
├── config/
│   └── scraper_config.json
├── data/
│   ├── output/           # Scraped JSON files
│   ├── queue/            # Discovery queue files
│   └── browser_fingerprints.json
└── thumbnails/           # Downloaded profile & tweet media
```
## Part of ScrapeClaw
This scraper is one of several tools in the ScrapeClaw collection:
| Scraper | Description | Links |
|---|---|---|
| 🐦 X / Twitter | Tweets, profiles & engagement metrics | GitHub · ClawHub |
| 📸 Instagram | Profiles, posts, media & follower counts | GitHub · ClawHub |
| 🎥 YouTube | Channels, subscribers & video metadata | GitHub · ClawHub |
| 📘 Facebook | Pages, groups, posts & engagement data | GitHub · ClawHub |
All scrapers share the same anti-detection foundation, proxy support, and JSON/CSV export pipeline.
## ☕ Support This Project
If this tool saves you time or helps your workflow, consider buying me a coffee — it keeps the project maintained and new scrapers coming!
## Disclaimer
This tool is intended for scraping publicly available data only. No login is required or used. Always comply with Twitter/X's Terms of Service and your local data privacy regulations. The author is not responsible for any misuse.
Built by ScrapeClaw · View all scrapers
