Scrapeclaw - Twitter Scraper

Pricing: from $1.00 / actor start · Developer: Scrapeclaw (Maintained by Community) · Last modified: 14 days ago

🐦 Twitter/X Profile Scraper

Part of ScrapeClaw — a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, and Facebook. Built with Python & Playwright. No API keys required.



What Is This?

A browser-based Twitter/X scraper that discovers and extracts structured data from public Twitter profiles — without any official API or login. It uses Playwright for full browser automation with built-in anti-detection, fingerprinting, and human behavior simulation to scrape at scale reliably.

Two-phase workflow:

  1. Discovery — Find Twitter/X profiles by location and category via Google Custom Search or DuckDuckGo
  2. Scraping — Extract full profile data, tweets, engagement metrics, and media using a real browser session

Features

| Feature | Description |
| --- | --- |
| 🔍 Discovery | Find profiles by city and category automatically |
| 🌐 Browser Simulation | Full Playwright browser — renders JavaScript, handles login walls |
| 🛡️ Anti-Detection | Browser fingerprinting, stealth scripts, human behavior simulation |
| 📊 Rich Data | Profile info, follower counts, tweets, engagement stats, media |
| 🖼️ Media Download | Profile pics and tweet media saved locally |
| 💾 Flexible Export | JSON and CSV output formats |
| 🔄 Resume Support | Checkpoint-based resume for interrupted sessions |
| Smart Filtering | Auto-skip private accounts, suspended users, low-follower profiles |
| 🚫 No Login Required | Scrapes only publicly visible content |
| 🌍 Residential Proxy | Built-in proxy manager supporting 4 major providers |

Installation

# Clone the repository
git clone https://github.com/Scrapeclaw/twitter-scraper.git
cd twitter-scraper
# Install Python dependencies
pip install -r requirements.txt
# Install Playwright browsers
playwright install chromium

Environment Setup

Create a .env file in the project root:

# Google Custom Search API (optional, for discovery)
GOOGLE_API_KEY=your_google_api_key
GOOGLE_SEARCH_ENGINE_ID=your_search_engine_id
# Residential proxy (optional — see Proxy section below)
PROXY_ENABLED=false
PROXY_PROVIDER=brightdata
PROXY_USERNAME=your_proxy_user
PROXY_PASSWORD=your_proxy_pass
PROXY_COUNTRY=us
PROXY_STICKY=true
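The scraper presumably reads these variables at startup; a minimal sketch of how they might be parsed, assuming a simple `os.environ` lookup (the function name, defaults, and truthy-string handling here are illustrative, not the project's actual code):

```python
import os

def load_proxy_settings(env=None):
    """Parse the PROXY_* variables above into a plain dict.
    Booleans accept "true"/"1"/"yes" (case-insensitive)."""
    env = os.environ if env is None else env
    truthy = {"true", "1", "yes"}
    return {
        "enabled": env.get("PROXY_ENABLED", "false").lower() in truthy,
        "provider": env.get("PROXY_PROVIDER", "brightdata"),
        "username": env.get("PROXY_USERNAME", ""),
        "password": env.get("PROXY_PASSWORD", ""),
        "country": env.get("PROXY_COUNTRY", ""),
        "sticky": env.get("PROXY_STICKY", "false").lower() in truthy,
    }
```

Passing a dict instead of `os.environ` makes the parser easy to unit-test without touching the real environment.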

Usage

Discover Profiles

# Discover tech profiles in Miami
python main.py discover --location "Miami" --category "tech"
# Discover crypto profiles in New York
python main.py discover --location "New York" --category "crypto" --count 15
# Batch mode (multiple cities x categories)
python main.py discover --batch
# Return JSON output (for agent integration)
python main.py discover --location "Miami" --category "tech" --output json

Scrape

# Scrape a single profile by username
python main.py scrape --username elonmusk
# Scrape multiple usernames
python main.py scrape --usernames nasa,spacex,openai --category tech
# Scrape from a discovery queue file
python main.py scrape data/queue/Miami_tech_20260220_120000.json
# Run headless
python main.py scrape --username elonmusk --headless

Manage & Export

# List available queue files
python main.py list
# Export all scraped data to JSON + CSV
python main.py export --format both

Output Data

Each scraped profile is saved to data/output/{username}.json:

{
  "username": "elonmusk",
  "display_name": "Elon Musk",
  "bio": "...",
  "followers": 180000000,
  "following": 800,
  "tweets_count": 45000,
  "is_verified": true,
  "profile_pic_url": "https://...",
  "profile_pic_local": "thumbnails/elonmusk/profile_abc123.jpg",
  "user_location": "Mars & Earth",
  "join_date": "June 2009",
  "website": "https://x.ai",
  "influencer_tier": "mega",
  "category": "tech",
  "scrape_location": "New York",
  "scraped_at": "2026-02-20T12:00:00",
  "recent_tweets": [
    {
      "id": "1234567890",
      "text": "Tweet content...",
      "timestamp": "2026-02-17T10:30:00.000Z",
      "likes": 50000,
      "retweets": 12000,
      "replies": 3000,
      "views": "5.2M",
      "media_urls": ["https://..."],
      "media_local": ["thumbnails/elonmusk/tweet_media_0_def456.jpg"],
      "is_retweet": false,
      "is_reply": false,
      "url": "https://x.com/elonmusk/status/1234567890"
    }
  ]
}
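A profile dict in this shape is easy to post-process. For example, a mean engagement figure per recent tweet can be computed like this (the helper name is illustrative; it only relies on the `recent_tweets` fields shown above):

```python
def average_engagement(profile):
    """Mean likes + retweets + replies across recent_tweets
    (0.0 when the profile has no tweets captured)."""
    tweets = profile.get("recent_tweets", [])
    if not tweets:
        return 0.0
    total = sum(t.get("likes", 0) + t.get("retweets", 0) + t.get("replies", 0)
                for t in tweets)
    return total / len(tweets)
```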

Influencer Tiers

| Tier | Followers |
| --- | --- |
| nano | < 1,000 |
| micro | 1,000 – 10,000 |
| mid | 10,000 – 100,000 |
| macro | 100,000 – 1M |
| mega | > 1,000,000 |
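The tier assignment boils down to a threshold ladder. A sketch of the classification (the function name is illustrative, and since the table's ranges overlap at their edges, the exact boundary handling here is a guess):

```python
def influencer_tier(followers):
    """Classify a follower count into the tiers from the table above.
    Boundaries are treated as the start of the next tier, except 1M,
    which the table assigns to macro."""
    if followers < 1_000:
        return "nano"
    if followers < 10_000:
        return "micro"
    if followers < 100_000:
        return "mid"
    if followers <= 1_000_000:
        return "macro"
    return "mega"
```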

Running long scraping sessions without a residential proxy will get your IP blocked. The built-in proxy manager handles rotation, sticky sessions, and country targeting automatically.

Why Use a Residential Proxy?

  • ✅ Avoid IP bans — residential IPs look like real users to Twitter/X
  • ✅ Rotate IPs automatically on every request or session
  • ✅ Sticky sessions — keep the same IP during a browsing session
  • ✅ Geo-target by country for locale-accurate content
  • ✅ 95%+ success rates vs ~30% with datacenter proxies

We have affiliate partnerships with the following providers. Using these links supports this project at no extra cost to you:

| Provider | Highlights | Sign Up |
| --- | --- | --- |
| Bright Data | World's largest network, 72M+ IPs, enterprise-grade | 👉 Get Bright Data |
| IProyal | Pay-as-you-go, 195+ countries, no traffic expiry | 👉 Get IProyal |
| Storm Proxies | Fast & reliable, developer-friendly API, competitive pricing | 👉 Get Storm Proxies |
| NetNut | ISP-grade network, 52M+ IPs, direct connectivity | 👉 Get NetNut |

These are affiliate links. We may earn a commission at no extra cost to you.

Enabling the Proxy

Option 1 — Environment variables (recommended):

export PROXY_ENABLED=true
export PROXY_PROVIDER=brightdata # brightdata | iproyal | stormproxies | netnut | custom
export PROXY_USERNAME=your_proxy_user
export PROXY_PASSWORD=your_proxy_pass
export PROXY_COUNTRY=us # optional
export PROXY_STICKY=true # keeps same IP per session

Option 2 — config/scraper_config.json:

{
  "proxy": {
    "enabled": true,
    "provider": "brightdata",
    "country": "us",
    "sticky": true,
    "sticky_ttl_minutes": 10
  }
}

Set credentials via env vars (PROXY_USERNAME, PROXY_PASSWORD) — never hardcode them in the config file.

Provider Host/Port Reference

| Provider | Host | Port |
| --- | --- | --- |
| Bright Data | brd.superproxy.io | 22225 |
| IProyal | proxy.iproyal.com | 12321 |
| Storm Proxies | rotating.stormproxies.com | 9999 |
| NetNut | gw-resi.netnut.io | 5959 |
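Given these gateways, wiring a provider into Playwright is mechanical: `browser_type.launch()` accepts a `proxy` dict with `server`, `username`, and `password` keys. A sketch, assuming the gateway map mirrors the reference above (the helper itself is illustrative, not the project's `proxy_manager.py`):

```python
# Gateway endpoints from the reference table above.
PROVIDER_GATEWAYS = {
    "brightdata": ("brd.superproxy.io", 22225),
    "iproyal": ("proxy.iproyal.com", 12321),
    "stormproxies": ("rotating.stormproxies.com", 9999),
    "netnut": ("gw-resi.netnut.io", 5959),
}

def playwright_proxy(provider, username, password):
    """Build the dict shape Playwright expects for
    browser_type.launch(proxy=...)."""
    host, port = PROVIDER_GATEWAYS[provider]
    return {
        "server": f"http://{host}:{port}",
        "username": username,
        "password": password,
    }
```

Usage would look like `browser = p.chromium.launch(proxy=playwright_proxy("netnut", user, pw))` inside a Playwright sync context.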

Once configured, the scraper uses the proxy automatically — no extra flags needed. The log confirms it:

INFO - Proxy enabled: <ProxyManager provider=brightdata enabled host=brd.superproxy.io:22225>
INFO - Browser using proxy: brightdata → brd.superproxy.io:22225

Configuration Reference

Edit config/scraper_config.json to customise behaviour:

{
  "proxy": {
    "enabled": false,
    "provider": "brightdata",
    "country": "",
    "sticky": true,
    "sticky_ttl_minutes": 10
  },
  "google_search": {
    "enabled": true,
    "api_key": "",
    "search_engine_id": "",
    "queries_per_location": 3
  },
  "scraper": {
    "headless": false,
    "min_followers": 500,
    "max_tweets": 20,
    "download_thumbnails": true,
    "max_thumbnails": 6,
    "delay_between_profiles": [4, 8],
    "timeout": 60000
  }
}
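For instance, `delay_between_profiles` is a `[min, max]` range in seconds; drawing a randomized pause from it between profiles would look something like this (the function name is illustrative, not the project's code):

```python
import random

def profile_delay(config):
    """Pick a random pause, in seconds, from the configured
    [min, max] delay_between_profiles range."""
    lo, hi = config["scraper"]["delay_between_profiles"]
    return random.uniform(lo, hi)
```

Randomizing the gap between profiles, rather than sleeping a fixed interval, is a common way to make automated browsing look less machine-like.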

Project Structure

twitter-scraper/
├── main.py # CLI entry point
├── scraper.py # Playwright browser scraper
├── discovery.py # Google/DuckDuckGo profile discovery
├── anti_detection.py # Fingerprinting & stealth
├── proxy_manager.py # Residential proxy integration
├── config/
│ └── scraper_config.json
├── data/
│ ├── output/ # Scraped JSON files
│ ├── queue/ # Discovery queue files
│ └── browser_fingerprints.json
└── thumbnails/ # Downloaded profile & tweet media

Part of ScrapeClaw

This scraper is one of several tools in the ScrapeClaw collection:

| Scraper | Description | Links |
| --- | --- | --- |
| 🐦 X / Twitter | Tweets, profiles & engagement metrics | GitHub · ClawHub |
| 📸 Instagram | Profiles, posts, media & follower counts | GitHub · ClawHub |
| 🎥 YouTube | Channels, subscribers & video metadata | GitHub · ClawHub |
| 📘 Facebook | Pages, groups, posts & engagement data | GitHub · ClawHub |

All scrapers share the same anti-detection foundation, proxy support, and JSON/CSV export pipeline.


☕ Support This Project

If this tool saves you time or helps your workflow, consider buying me a coffee — it keeps the project maintained and new scrapers coming!

Buy Me a Coffee via PayPal

👉 paypal.me/arulmozhivelu


Disclaimer

This tool is intended for scraping publicly available data only. No login is required or used. Always comply with Twitter/X's Terms of Service and your local data privacy regulations. The author is not responsible for any misuse.


Built by ScrapeClaw · View all scrapers