X/Twitter Scraper — Tweets, Profiles & Engagement Data
Pricing
$15.00 / 1,000 results
Scrape Twitter/X data at scale. Extract tweets, profiles, hashtags, trends, and engagement metrics for social media analytics.
Developer
Luan M.
Maintained by Community
Twitter/X Data Scraper
A powerful, production-ready Twitter/X Data Scraper built on Apify and Crawlee. Extracts tweets, retweets, replies, likes, user profiles, hashtags, mentions, and media URLs from Twitter/X — all with minimal configuration.
Features
- Multi-mode Scraping — Search by keyword, scrape user timelines, or provide arbitrary X.com URLs
- Rich Tweet Data — Extracts tweet text, timestamp, engagement metrics (likes, retweets, replies, views), tweet IDs, and URLs
- User Profile Info — Bio, followers count, following count, location, website, join date, avatar, and banner image
- Media Extraction — URLs for images, videos, and GIFs embedded in tweets
- Hashtags & Mentions — Extracted automatically from each tweet
- Reply & Retweet Filters — Optionally include or exclude replies and retweets
- Proxy Support — Built-in Apify proxy integration with residential proxy group support
- Configurable — Max tweets, concurrency, retries, and more
- Headless Browser — Uses Playwright with Chromium for reliable JavaScript-rendered page extraction
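As an illustration of the Hashtags & Mentions feature, here is a minimal sketch of how entities could be pulled from raw tweet text. The helper name `extractEntities` and the regular expressions are assumptions made for this example, not the actor's actual code. The output shape follows the sample dataset item, where hashtags are stored without `#` and mentions keep their `@`.

```javascript
// Hypothetical helper (not taken from the actor's source).
// Returns hashtags without "#" and mentions with "@", de-duplicated.
function extractEntities(text) {
  // Unicode-aware match so non-Latin hashtags are captured too.
  const hashtags = [...text.matchAll(/#([\p{L}\p{N}_]+)/gu)].map((m) => m[1]);

  // Handles are assumed to be 1-15 word characters. The lookbehind skips
  // e-mail addresses such as "name@example.com".
  const mentions = [...text.matchAll(/(?<![\w@])@(\w{1,15})\b/g)].map(
    (m) => `@${m[1]}`,
  );

  return {
    hashtags: [...new Set(hashtags)],
    mentions: [...new Set(mentions)],
  };
}

const sample = 'The future of #AI is exciting! Thanks @openai #tech #AI';
console.log(extractEntities(sample));
```

The actor most likely reads these values from the rendered page. A text-based pass like this one is mainly useful as a fallback or for post-processing exported data.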
Input Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| `startUrls` | Array | `["https://x.com/elonmusk"]` | Starting URLs (profile pages, search results) |
| `searchQuery` | String | (none) | Search query (overrides `startUrls`). Supports operators like `from:username`, `has:hashtags`, `lang:en` |
| `username` | String | (none) | Scrape a specific user's timeline (without `@`). Overrides `startUrls` |
| `maxTweets` | Integer | `100` | Maximum tweets to scrape (`0` = unlimited) |
| `includeReplies` | Boolean | `false` | Include replies in user timeline scrape |
| `includeRetweets` | Boolean | `true` | Include retweets in user timeline scrape |
| `proxyConfiguration` | Object | Apify `RESIDENTIAL` | Proxy settings to avoid IP bans |
| `maxRequestRetries` | Integer | `3` | Retry limit for failed requests |
| `maxConcurrency` | Integer | `5` | Concurrent browser pages |
| `extractTweetIds` | Boolean | `true` | Include tweet IDs and URLs |
| `extractUserInfo` | Boolean | `true` | Include author profile data |
| `extractMedia` | Boolean | `true` | Extract image/video URLs |
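The override rules in the table (`searchQuery` and `username` both take priority over `startUrls`) can be sketched as follows. This is an illustration under stated assumptions: the helper name `resolveStartUrls`, the search URL shape, and the choice to let `searchQuery` win when both `searchQuery` and `username` are set are not confirmed by the actor's documentation.

```javascript
// Defaults copied from the input table above.
const DEFAULT_INPUT = {
  startUrls: ['https://x.com/elonmusk'],
  maxTweets: 100,
  includeReplies: false,
  includeRetweets: true,
  maxRequestRetries: 3,
  maxConcurrency: 5,
  extractTweetIds: true,
  extractUserInfo: true,
  extractMedia: true,
};

// Hypothetical helper: decide which URLs the crawl should start from.
function resolveStartUrls(rawInput = {}) {
  const input = { ...DEFAULT_INPUT, ...rawInput };

  if (input.searchQuery) {
    // Assumed URL shape for an X search results page.
    const q = encodeURIComponent(input.searchQuery);
    return [`https://x.com/search?q=${q}&src=typed_query`];
  }

  if (input.username) {
    // The table asks for the handle without "@"; strip it defensively.
    return [`https://x.com/${input.username.replace(/^@/, '')}`];
  }

  // Accept both plain strings and { url } objects.
  return input.startUrls.map((u) => (typeof u === 'string' ? u : u.url));
}

console.log(resolveStartUrls({ searchQuery: 'from:nasa lang:en' }));
console.log(resolveStartUrls({ username: '@nasa' }));
console.log(resolveStartUrls());
```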
Output Dataset
Each tweet is stored as a dataset item with the following structure:

    {
      "text": "The future of AI is exciting!",
      "timestamp": "2025-05-30T12:00:00.000Z",
      "tweetId": "1234567890123456789",
      "tweetUrl": "https://x.com/username/status/1234567890123456789",
      "replyCount": 42,
      "retweetCount": 128,
      "likeCount": 1024,
      "viewCount": 50000,
      "isReply": false,
      "isRetweet": false,
      "hashtags": ["AI", "tech"],
      "mentions": ["@openai"],
      "mediaUrls": ["https://pbs.twimg.com/media/..."],
      "user": {
        "username": "elonmusk",
        "displayName": "Elon Musk",
        "avatarUrl": "https://pbs.twimg.com/profile_images/...",
        "profileUrl": "https://x.com/elonmusk"
      },
      "profile": {
        "username": "elonmusk",
        "displayName": "Elon Musk",
        "bio": "Technology entrepreneur",
        "followersCount": 180000000,
        "followingCount": 1500,
        "location": "Austin, TX",
        "website": "https://example.com",
        "avatarUrl": "https://pbs.twimg.com/profile_images/...",
        "bannerUrl": "https://pbs.twimg.com/profile_banners/..."
      },
      "sourceUrl": "https://x.com/elonmusk",
      "scrapedAt": "2025-05-30T12:05:00.000Z"
    }

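Once items are exported, the fields above are enough for simple post-processing. The sketch below shows one possible way to filter out replies or retweets and to compute a rough engagement rate. The helper names and the definition of engagement rate (interactions divided by views) are assumptions for illustration, not something the actor provides.

```javascript
// Hypothetical post-processing helpers for exported dataset items.

// Mirrors the includeReplies / includeRetweets input flags on already-scraped data.
function filterTweets(items, { includeReplies = false, includeRetweets = true } = {}) {
  return items.filter(
    (t) => (includeReplies || !t.isReply) && (includeRetweets || !t.isRetweet),
  );
}

// Assumed definition: (likes + retweets + replies) / views.
// Returns null when the view count is missing or zero.
function engagementRate(item) {
  if (!item.viewCount) return null;
  const interactions =
    (item.likeCount ?? 0) + (item.retweetCount ?? 0) + (item.replyCount ?? 0);
  return interactions / item.viewCount;
}

// Values taken from the sample item above.
const item = {
  replyCount: 42,
  retweetCount: 128,
  likeCount: 1024,
  viewCount: 50000,
  isReply: false,
  isRetweet: false,
};

console.log(filterTweets([item]).length);
console.log(engagementRate(item));
```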
Quick Start

    # Install dependencies
    npm install

    # Run locally (requires Apify token)
    npx apify run -p

    # Or run directly
    node src/main.js

Deployment to Apify
- Push this repository to GitHub
- Go to Apify Console → Create Actor → Import from GitHub
- Set up environment variables in Apify Console as needed
- Build and run!
Environment Variables
| Variable | Description |
|---|---|
| `APIFY_TOKEN` | Your Apify API token (required for cloud proxy) |
| `APIFY_PROXY_PASSWORD` | Apify proxy password |
| `APIFY_LOCAL_STORAGE_DIR` | Local storage directory for development |
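To show how `APIFY_PROXY_PASSWORD` might be used outside the Apify platform, here is a minimal sketch that builds a proxy connection URL from the environment. The helper name `buildProxyUrl` is hypothetical, and the URL format (`groups-<GROUP>` as the username, `proxy.apify.com:8000` as the host) is an assumption that should be checked against the current Apify Proxy documentation. Inside an actor, the SDK's proxy configuration would normally handle this.

```javascript
// Hypothetical helper: build an Apify Proxy connection URL from env vars.
// The URL format is an assumption; verify it against the Apify Proxy docs.
function buildProxyUrl(env, group = 'RESIDENTIAL') {
  const password = env.APIFY_PROXY_PASSWORD;
  if (!password) {
    throw new Error('APIFY_PROXY_PASSWORD is not set');
  }
  // Encode the password in case it contains URL-reserved characters.
  return `http://groups-${group}:${encodeURIComponent(password)}@proxy.apify.com:8000`;
}

// Example with a placeholder password (never hard-code real credentials).
console.log(buildProxyUrl({ APIFY_PROXY_PASSWORD: 'example-password' }));
```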
Technical Details
- Runtime: Node.js 18+
- Browser Engine: Chromium via Playwright
- Crawler: Crawlee PlaywrightCrawler
- Data Storage: Apify dataset
- Proxy: Apify Proxy (RESIDENTIAL recommended)
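The way the input fields could map onto Crawlee's `PlaywrightCrawler` options is sketched below. The helper `buildCrawlerOptions` and the `tweetLimit` field are hypothetical. Only `maxConcurrency`, `maxRequestRetries`, and `launchContext.launchOptions` are Crawlee option names. How the actor really enforces `maxTweets` is not documented, so it is returned separately here.

```javascript
// Hypothetical mapping from actor input to crawler settings.
function buildCrawlerOptions(input = {}) {
  const { maxTweets = 100, maxRequestRetries = 3, maxConcurrency = 5 } = input;

  return {
    crawlerOptions: {
      maxConcurrency,
      maxRequestRetries,
      // Headless Chromium, as described under Features.
      launchContext: { launchOptions: { headless: true } },
    },
    // 0 means "unlimited" according to the input table.
    tweetLimit: maxTweets === 0 ? Infinity : maxTweets,
  };
}

const { crawlerOptions, tweetLimit } = buildCrawlerOptions({ maxTweets: 0 });
console.log(crawlerOptions, tweetLimit);

// With Crawlee installed, the options could then be passed along, e.g.:
//   const crawler = new PlaywrightCrawler({ ...crawlerOptions, requestHandler });
```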
Limitations & Best Practices
- Rate Limiting: Twitter/X aggressively rate-limits scraping. Use residential proxies and reasonable concurrency.
- Login Walls: Some pages may require authentication. For full access, consider adding cookie-based session management.
- DOM Changes: This scraper relies on Twitter's DOM structure (`data-testid` attributes). If Twitter updates their UI, selectors may need adjustment.
- Ethical Use: Respect Twitter's Terms of Service and robots.txt. Use responsibly and consider rate limiting.
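Two of the points above, rate limiting and cookie-based sessions, can be sketched with small helpers. Both function names are hypothetical and neither is part of the actor. The backoff uses plain exponential growth without jitter to keep the example deterministic. The cookie objects use the `name`, `value`, `domain`, `path` shape that Playwright's `addCookies` accepts, and the `.x.com` domain is an assumption.

```javascript
// Hypothetical helper: exponential backoff delay for retry attempt N (0-based).
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 60000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical helper: turn a "name=value; name2=value2" cookie string
// into cookie objects suitable for a Playwright browser context.
function parseCookieHeader(header, domain = '.x.com') {
  return header
    .split(';')
    .map((pair) => pair.trim())
    .filter((pair) => pair.includes('='))
    .map((pair) => {
      const idx = pair.indexOf('=');
      return {
        name: pair.slice(0, idx),
        value: pair.slice(idx + 1),
        domain,
        path: '/',
      };
    });
}

console.log([0, 1, 2, 3].map((n) => backoffDelayMs(n)));
// Placeholder cookie names and values, for illustration only.
console.log(parseCookieHeader('session_a=abc; session_b=def'));
```

In a Playwright-based crawler, the parsed cookies could be added to the browser context before navigation. Adding random jitter to the delay is a common refinement when many pages retry at once.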
License
Apache 2.0