X (Twitter) Bulk Scraper/Monitor/Alerts + Vision
Pricing
Pay per usage
Monitor X (formerly Twitter) for specific content: extract tweets and optionally trigger image-based alerts using cloud vision APIs. Suited to brand reputation management and to tracking tweets, hashtags, specific images, and user activity.
Developer
Advanced Automation
X (Twitter) Bulk Scrape/Monitor + Vision AI
Monitor X/Twitter accounts, extract tweets, filter by keywords or hashtags, and run AI vision analysis on images through six provider integrations (five hosted APIs plus a custom webhook).
🎯 Overview
This Apify Actor scrapes X (formerly Twitter) posts from multiple accounts, filters by keywords/hashtags, and optionally runs AI vision analysis on images to detect objects, brands, or custom content patterns. Perfect for social media monitoring, brand tracking, and competitive intelligence.
✨ Key Features
Core Scraping
- 🚀 Hyperdrive Mode: Lightning-fast RSS-based scraping with automatic fallback to web scraping
- 👥 Bulk Processing: Monitor up to 100 Twitter accounts simultaneously
- 🔍 Smart Filtering: Filter by keywords, hashtags, or require images
- 📊 Dual Datasets: Separate outputs for tweets and vision alerts
- 🔄 Automatic Retry: Robust error handling with multiple Nitter instance fallbacks
AI Vision Analysis (Optional)
Analyze tweet images using any of six provider options (five hosted vision APIs plus your own webhook):
- 🤖 Google Gemini 2.0 Flash - Multimodal model; images are sent base64-encoded
- 🎨 OpenAI GPT-4o Vision - Advanced image understanding and analysis
- 👁️ Google Cloud Vision - Label detection, OCR, safe search, object localization
- ☁️ Azure Computer Vision - Tags, objects, brands, faces, adult content detection
- 📸 AWS Rekognition - Label detection and content moderation
- 🔗 Custom Webhooks - Integrate your own vision API
Alert System
- 🔔 Webhook Notifications: Get instant alerts when vision pipelines trigger
- 🎯 Flexible Configuration: Per-pipeline or global webhook URLs
- 📈 Confidence Scoring: Filter alerts by AI confidence thresholds
- 🏷️ Label Matching: Trigger on specific detected objects or keywords
📥 Input Configuration
Basic Example
{"usernames": ["apify", "openai"],"maxItems": 100,"preferRss": true}
Complete Example with Vision Analysis
{
  "usernames": ["apify", "elonmusk", "openai"],
  "searchTerms": ["AI", "automation", "web scraping"],
  "hashtags": ["webscraping", "machinelearning"],
  "maxItems": 500,
  "preferRss": true,
  "requireImages": false,
  "rssTimeoutSecs": 10,
  "visionPipelines": [
    {
      "name": "Product Launch Detector",
      "provider": "gemini_vision",
      "enabled": true,
      "configJson": "{\"prompt\":\"Is this a product launch or announcement?\",\"triggerKeywords\":[\"launch\",\"new\",\"announcement\"],\"model\":\"gemini-2.0-flash-exp\"}",
      "alertWebhookUrl": "https://your-webhook.com/product-alerts"
    },
    {
      "name": "Brand Monitor",
      "provider": "openai_vision",
      "enabled": true,
      "configJson": "{\"prompt\":\"Identify brands and logos\",\"triggerKeywords\":[\"Tesla\",\"Apple\",\"Nike\"],\"model\":\"gpt-4o\"}",
      "alertWebhookUrl": ""
    },
    {
      "name": "Object Detector",
      "provider": "google_vision",
      "enabled": true,
      "configJson": "{\"threshold\":0.8,\"triggerLabels\":[\"car\",\"vehicle\"],\"maxLabels\":10}"
    }
  ]
}
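Because configJson is a JSON string nested inside the JSON input, hand-escaping the inner quotes is easy to get wrong. The following is a minimal sketch of building the input programmatically; `make_pipeline` is a hypothetical helper written for this example, not part of the Actor.

```python
import json


def make_pipeline(name, provider, config, alert_webhook_url=None, enabled=True):
    """Build one visionPipelines entry (hypothetical helper).

    The provider-specific config is serialized with json.dumps so the
    inner quotes are escaped correctly when the whole input is encoded.
    """
    pipeline = {
        "name": name,
        "provider": provider,
        "enabled": enabled,
        "configJson": json.dumps(config),
    }
    if alert_webhook_url:
        pipeline["alertWebhookUrl"] = alert_webhook_url
    return pipeline


run_input = {
    "usernames": ["apify", "openai"],
    "maxItems": 500,
    "preferRss": True,
    "visionPipelines": [
        make_pipeline(
            "Object Detector",
            "google_vision",
            {"threshold": 0.8, "triggerLabels": ["car", "vehicle"], "maxLabels": 10},
        ),
    ],
}

# Paste the printed JSON into the Actor input editor.
print(json.dumps(run_input, indent=2))
```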
Input Fields
| Field | Type | Required | Description |
|---|---|---|---|
| usernames | array | ✅ Yes | X/Twitter usernames to monitor (without @ symbol) |
| searchTerms | array | No | Filter tweets containing these keywords |
| hashtags | array | No | Filter tweets containing these hashtags |
| maxItems | integer | No | Maximum tweets to collect (default: 100) |
| preferRss | boolean | No | Use RSS scraping first (default: true) |
| requireImages | boolean | No | Only collect tweets with images (default: false) |
| rssTimeoutSecs | integer | No | RSS fetch timeout in seconds (default: 10) |
| visionPipelines | array | No | AI vision analysis configuration |
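To make the filter fields concrete, here is a rough sketch of how searchTerms, hashtags, and requireImages could be applied to a scraped tweet record. The matching rules are assumptions made for illustration (case-insensitive, and a tweet passes if it matches any keyword or any hashtag); the Actor's actual semantics may differ, and `matches_filters` is a hypothetical name.

```python
def matches_filters(tweet, search_terms=(), hashtags=(), require_images=False):
    """Return True if a tweet record passes the (assumed) filter rules."""
    # requireImages: drop tweets that carry no image URL.
    if require_images and not tweet.get("imageUrl"):
        return False
    # With no keyword or hashtag filters, everything else passes.
    if not search_terms and not hashtags:
        return True
    text = f"{tweet.get('title', '')} {tweet.get('description', '')}".lower()
    tags = {t.lstrip("#").lower() for t in tweet.get("tags", [])}
    term_hit = any(term.lower() in text for term in search_terms)
    tag_hit = any(h.lstrip("#").lower() in tags for h in hashtags)
    # Assumption: keywords and hashtags are OR-ed together.
    return term_hit or tag_hit
```

Running the same function over an exported dataset is a quick way to check whether a filter set is too restrictive before starting a new run.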
Vision Pipeline Configuration
Each pipeline in the visionPipelines array accepts the following fields:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | ✅ Yes | Descriptive name for the pipeline |
| provider | string | ✅ Yes | AI provider: gemini_vision, openai_vision, google_vision, azure_cv, aws_rekognition, custom_webhook |
| enabled | boolean | No | Enable/disable this pipeline (default: true) |
| configJson | string | No | Provider-specific configuration as JSON string |
| alertWebhookUrl | string | No | Webhook URL for alerts (overrides env var) |
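A pipeline entry can be checked locally before a run. The sketch below validates only what the table above states (required fields, the six provider ids, and that configJson parses); `validate_pipeline` is a hypothetical helper, and the Actor may apply additional checks of its own.

```python
import json

# Provider ids as listed in the table above.
KNOWN_PROVIDERS = {
    "gemini_vision", "openai_vision", "google_vision",
    "azure_cv", "aws_rekognition", "custom_webhook",
}


def validate_pipeline(pipeline):
    """Return a list of problems; an empty list means the entry looks fine."""
    problems = []
    for field in ("name", "provider"):
        if not pipeline.get(field):
            problems.append(f"missing required field: {field}")
    provider = pipeline.get("provider")
    if provider and provider not in KNOWN_PROVIDERS:
        problems.append(f"unknown provider: {provider}")
    raw = pipeline.get("configJson")
    if raw:
        try:
            # configJson must be a string that decodes to a JSON object.
            if not isinstance(json.loads(raw), dict):
                problems.append("configJson must decode to a JSON object")
        except (TypeError, ValueError) as exc:
            problems.append(f"configJson is not valid JSON: {exc}")
    return problems
```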
Provider-Specific Configuration
Gemini Vision
{"prompt": "Describe what you see in detail","triggerKeywords": ["product", "launch"],"model": "gemini-2.0-flash-exp"}
OpenAI Vision
{"prompt": "Identify brands and logos","triggerKeywords": ["Nike", "Apple"],"model": "gpt-4o","maxTokens": 500}
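For the two prompt-based providers (Gemini and OpenAI), triggerKeywords presumably get compared against the text the model returns. The source does not say how that comparison works, so the sketch below assumes a case-insensitive substring match; `keyword_hits` is a hypothetical name.

```python
def keyword_hits(analysis_text, trigger_keywords):
    """Return the trigger keywords found in the model's answer.

    Assumption: case-insensitive substring matching.
    """
    lowered = analysis_text.lower()
    return [kw for kw in trigger_keywords if kw.lower() in lowered]


analysis = "This image shows a new product launch announcement..."
print(keyword_hits(analysis, ["launch", "new", "recall"]))  # ['launch', 'new']
```

If matching really is substring-based, short keywords such as "new" would also fire on words like "news", so longer or more specific keywords are the safer choice.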
Google Cloud Vision
{"threshold": 0.8,"triggerLabels": ["car", "vehicle"],"maxLabels": 10}
Azure Computer Vision
{"minConfidence": 0.7,"targetTags": ["car", "person"],"blockAdult": false}
AWS Rekognition
{"minConfidence": 0.7,"targetLabels": ["Car", "Person"],"blockUnsafe": true}
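The three label-based providers use different field names for the same two ideas: a minimum score and a list of labels to watch for. The sketch below normalizes them and selects the labels that would trigger an alert. It assumes label scores are on a 0 to 1 scale (as in the alert records) and that label names match case-insensitively; `triggered_labels` is a hypothetical helper.

```python
# (score field, label-list field) for each label-based provider,
# taken from the configuration examples above.
FIELD_NAMES = {
    "google_vision": ("threshold", "triggerLabels"),
    "azure_cv": ("minConfidence", "targetTags"),
    "aws_rekognition": ("minConfidence", "targetLabels"),
}


def triggered_labels(provider, config, labels):
    """Return detected labels that meet the score cutoff and are on the watch list."""
    score_key, names_key = FIELD_NAMES[provider]
    min_score = config.get(score_key, 0.0)
    wanted = {name.lower() for name in config.get(names_key, [])}
    return [
        label for label in labels
        if label["score"] >= min_score and label["name"].lower() in wanted
    ]
```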
Custom Webhook
{"webhookUrl": "https://your-api.com/analyze","timeout": 20000,"headers": {"Authorization": "Bearer YOUR_TOKEN"}}
📤 Output
Main Dataset (Tweets)
Each scraped tweet contains:
{
  "title": "Check out our new Actor for web scraping!",
  "link": "https://x.com/apify/status/1234567890",
  "author": "apify",
  "published": "2026-01-15T10:30:00Z",
  "description": "Check out our new Actor...",
  "tags": ["#webscraping", "#automation"],
  "imageUrl": "https://pbs.twimg.com/media/abc123.jpg",
  "visionAlertsCount": 2,
  "scrapedUsername": "apify",
  "collectedAt": "2026-01-15T10:35:00Z",
  "sourceType": "rss",
  "instance": "nitter.net"
}
Alerts Dataset (Vision Triggers)
Each triggered alert contains:
{
  "pipelineName": "Product Launch Detector",
  "provider": "gemini_vision",
  "itemLink": "https://x.com/apify/status/1234567890",
  "imageUrl": "https://pbs.twimg.com/media/abc123.jpg",
  "labels": [
    {"name": "product", "score": 0.95},
    {"name": "announcement", "score": 0.88}
  ],
  "score": 0.95,
  "analysis": "This image shows a new product launch announcement...",
  "triggeredAt": "2026-01-15T10:35:00Z"
}
Output Views
The Actor provides multiple pre-configured output views:
- tweets - Full dataset JSON
- tweetsTable - Simplified table view
- tweetsCSV - CSV export
- tweetsWithImages - Only tweets that include an image
- visionAlerts - All vision alerts
- visionAlertsTable - Simplified alerts view
- visionAlertsCSV - Alerts CSV export
- highConfidenceAlerts - Only alerts with a confidence of 90% or higher
- runStats - Actor run statistics
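The views can also be approximated offline on exported alert records. This sketch reproduces the idea behind highConfidenceAlerts and visionAlertsCSV using the alert fields shown earlier; the 0.9 cutoff mirrors the "90%+" description, while the CSV column selection is an assumption for illustration.

```python
import csv
import io


def high_confidence(alerts, min_score=0.9):
    """Keep alerts whose top-level score is at least min_score."""
    return [a for a in alerts if a.get("score", 0.0) >= min_score]


def alerts_to_csv(alerts):
    """Flatten alert records into CSV text (column choice is illustrative)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["pipelineName", "provider", "itemLink", "score", "labels"])
    for alert in alerts:
        # Join label names into one cell so each alert stays on one row.
        names = ";".join(label["name"] for label in alert.get("labels", []))
        writer.writerow([
            alert["pipelineName"], alert["provider"],
            alert["itemLink"], alert["score"], names,
        ])
    return buf.getvalue()
```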
🔐 Environment Variables
Configure AI providers via environment variables in the Actor settings:
Required (only for the vision providers you enable)
| Variable | Description | Example |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key for GPT-4o Vision | sk-... |
| GEMINI_API_KEY | Google Gemini API key | AIza... |
| GOOGLE_APPLICATION_CREDENTIALS | Google Cloud credentials JSON | {"type":"service_account",...} |
| AZURE_CV_ENDPOINT | Azure Computer Vision endpoint | https://your-resource.cognitiveservices.azure.com/ |
| AZURE_CV_KEY | Azure Computer Vision API key | abc123... |
| AWS_ACCESS_KEY_ID | AWS access key for Rekognition | AKIA... |
| AWS_SECRET_ACCESS_KEY | AWS secret key | abc123... |
| AWS_REGION | AWS region (optional) | us-east-1 (default) |
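Missing credentials are a common reason for vision analysis failing silently, so it can help to check which variables each enabled pipeline needs. The provider-to-variable mapping below is inferred from the table descriptions and is an assumption, as is the `missing_env` helper itself.

```python
import os

# Assumed mapping from provider id to the variables it needs,
# inferred from the descriptions in the table above.
REQUIRED_ENV = {
    "openai_vision": ["OPENAI_API_KEY"],
    "gemini_vision": ["GEMINI_API_KEY"],
    "google_vision": ["GOOGLE_APPLICATION_CREDENTIALS"],
    "azure_cv": ["AZURE_CV_ENDPOINT", "AZURE_CV_KEY"],
    "aws_rekognition": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "custom_webhook": [],
}


def missing_env(pipelines, env=None):
    """Map pipeline name -> list of unset variables, for enabled pipelines only."""
    env = os.environ if env is None else env
    missing = {}
    for pipeline in pipelines:
        if not pipeline.get("enabled", True):
            continue  # disabled pipelines need no credentials
        needed = REQUIRED_ENV.get(pipeline["provider"], [])
        absent = [name for name in needed if not env.get(name)]
        if absent:
            missing[pipeline["name"]] = absent
    return missing
```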
Optional
| Variable | Description |
|---|---|
| ALERT_WEBHOOK_URL | Global webhook URL for all alerts |
| WEBHOOK_<PIPELINE_NAME> | Pipeline-specific webhook (e.g., WEBHOOK_PRODUCT_DETECTOR) |
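With three places to set a webhook URL, precedence matters. The source states only that a pipeline's alertWebhookUrl overrides the environment variables. The sketch below additionally assumes that the pipeline-specific variable wins over the global one, and that the variable name is the pipeline name upper-cased with non-alphanumeric runs replaced by underscores (consistent with the WEBHOOK_PRODUCT_DETECTOR example). Both helpers are hypothetical.

```python
import os
import re


def env_name_for(pipeline_name):
    """Derive the assumed WEBHOOK_<PIPELINE_NAME> variable name."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", pipeline_name).strip("_").upper()
    return f"WEBHOOK_{slug}"


def resolve_webhook(pipeline, env=None):
    """Pick the webhook URL for a pipeline, or None if nothing is configured.

    Assumed order: alertWebhookUrl, then WEBHOOK_<PIPELINE_NAME>,
    then ALERT_WEBHOOK_URL. Empty strings count as unset.
    """
    env = os.environ if env is None else env
    return (
        pipeline.get("alertWebhookUrl")
        or env.get(env_name_for(pipeline["name"]))
        or env.get("ALERT_WEBHOOK_URL")
        or None
    )
```

Under these assumptions, a pipeline with "alertWebhookUrl": "" falls through to the environment variables instead of disabling alerts.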
Setting Environment Variables
Via Apify Console:
1. Go to your Actor → Settings → Environment variables
2. Click "Add variable"
3. Enter name and value
4. Check "Secret" for sensitive data
Via .actor/actor.json:
{"environmentVariables": {"OPENAI_API_KEY": "@openai-key","GEMINI_API_KEY": "@gemini-key"}}
Note: Use @secret-name syntax to reference Apify secrets.
🎯 Use Cases
1. Brand Monitoring
Monitor brand mentions and visual content across competitor accounts:
- Track logo appearances in images
- Detect product placements
- Monitor sentiment around brand discussions
2. Product Launch Detection
Get instant alerts when competitors announce new products:
- Analyze images for product unveils
- Detect "new" or "launching" keywords
- Track announcement patterns
3. Content Moderation
Filter and flag inappropriate content:
- Adult content detection (Azure/AWS)
- Unsafe content filtering
- Brand safety monitoring
4. Competitor Analysis
Track competitor social media activity:
- Monitor posting frequency
- Analyze content themes
- Track image-based campaigns
5. Social Media Intelligence
Aggregate insights from multiple accounts:
- Trending topics detection
- Hashtag performance tracking
- Engagement pattern analysis
6. Market Research
Gather visual data for market analysis:
- Product feature comparisons
- Packaging design trends
- Campaign creative analysis
🚀 Quick Start
1. Basic Tweet Scraping (No Vision)
{"usernames": ["apify"],"maxItems": 50}
2. Keyword Filtering
{"usernames": ["techcrunch", "theverge"],"searchTerms": ["AI", "ChatGPT"],"maxItems": 100}
3. Image-Only Collection
{"usernames": ["nasa", "spacex"],"requireImages": true,"maxItems": 50}
4. With Gemini Vision
{"usernames": ["producthunt"],"requireImages": true,"visionPipelines": [{"name": "Product Detector","provider": "gemini_vision","enabled": true,"configJson": "{\"prompt\":\"Describe this product\",\"triggerKeywords\":[\"app\",\"software\"]}"}]}
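The same inputs can be submitted from code. This is a minimal sketch using the `apify-client` Python package (`pip install apify-client`); the Actor ID is not given in this document, so the caller must supply it, and reading the token from an `APIFY_TOKEN` environment variable is an assumption of this example.

```python
import os

# Input from Quick Start example 1.
QUICK_START_INPUT = {"usernames": ["apify"], "maxItems": 50}


def run_actor(actor_id, run_input, token=None):
    """Run the Actor, wait for it to finish, and return the tweet records."""
    # Imported lazily so this module loads even without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token or os.environ["APIFY_TOKEN"])
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    # The default dataset holds the tweets.
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())
```

Calling `run_actor("<username>/<actor-name>", QUICK_START_INPUT)` with the real Actor ID in place of the placeholder returns the scraped tweets as dictionaries. How the separate alerts dataset is named or exposed is not stated here, so this sketch does not try to read it.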
📊 Performance & Limits
- Speed: 50-100 tweets per minute (RSS mode)
- Concurrent Accounts: Up to 100 usernames
- Vision Processing: ~2-5 seconds per image per provider
- Memory: 512 MB recommended (1 GB for heavy vision usage)
- Timeout: 300 seconds default (adjust in Actor settings)
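These figures can be combined into a rough run-time estimate to decide whether the default timeout is enough. The sketch assumes scraping and vision calls run strictly one after another, which is a simplifying assumption; any parallelism in the Actor would shorten the real time.

```python
def estimate_seconds(tweets, images, pipelines,
                     tweets_per_min=(50, 100), secs_per_image=(2, 5)):
    """Return (best_case, worst_case) run time in seconds.

    Assumes sequential processing: scrape time plus one vision call
    per image per enabled pipeline.
    """
    slow_rate, fast_rate = tweets_per_min
    fast_img, slow_img = secs_per_image
    best = tweets / fast_rate * 60 + images * pipelines * fast_img
    worst = tweets / slow_rate * 60 + images * pipelines * slow_img
    return best, worst


best, worst = estimate_seconds(tweets=100, images=40, pipelines=2)
print(f"{best:.0f}s to {worst:.0f}s")  # 220s to 520s
```

In this example the worst case (520 s) exceeds the 300-second default, so a run with two pipelines over 40 images would be a candidate for a longer timeout.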
🔧 Troubleshooting
No Items Collected
Possible causes:
- Bot protection blocking Nitter instances
- Invalid usernames
- User accounts have no recent posts
- Filters are too restrictive
Solutions:
- Verify usernames are correct (without @ symbol)
- Retry at a different time of day
- Reduce filter restrictions
- Check Actor logs for specific errors
Vision Analysis Not Working
Possible causes:
- Missing API credentials in environment variables
- Invalid API keys
- API rate limits exceeded
- Image URLs inaccessible
Solutions:
- Verify all required environment variables are set
- Check API key validity in provider dashboard
- Review Actor logs for specific API errors
- Ensure images are publicly accessible
Webhook Alerts Not Received
Possible causes:
- Invalid webhook URL
- Webhook endpoint timeout
- Firewall blocking Apify IPs
Solutions:
- Test webhook URL with curl/Postman
- Increase webhook timeout in config
- Verify webhook endpoint accepts POST requests
- Check webhook logs for incoming requests
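To rule out the receiving side, a throwaway local endpoint that accepts POST requests and records the JSON body can be useful. This sketch uses only the Python standard library. It assumes alerts arrive as JSON in the request body, which the source does not confirm, and the endpoint would need to be reachable from the internet (for example through a tunnel) to receive real alerts.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # decoded JSON bodies of every accepted POST


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            received.append(json.loads(body))
            status = 200
        except ValueError:
            status = 400  # body was not valid JSON
        self.send_response(status)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def make_server(port=0):
    """Create a server bound to localhost; port 0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), AlertHandler)
```

Calling `make_server(8000).serve_forever()` starts the receiver on port 8000. Anything posted to it ends up in `received` for inspection.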
🏗️ Architecture
Data Flow
1. Input Validation - Verify usernames and configuration
2. Instance Discovery - Fetch working Nitter instances from status page
3. RSS Scraping - Try RSS feeds from multiple instances
4. Web Scraping Fallback - Parse HTML if RSS fails
5. Content Filtering - Apply keyword/hashtag filters
6. Vision Processing - Run enabled AI pipelines on images
7. Alert Triggering - Send webhooks for matched patterns
8. Data Storage - Save to Apify datasets
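The RSS-first strategy with web fallback can be sketched as two nested loops: try every instance over RSS, then every instance over HTML. This illustrates the pattern only and is not the Actor's actual code. `fetch_rss` and `fetch_html` are hypothetical callables supplied by the caller, and the "web" value for sourceType is an assumption (the sample output shows only "rss").

```python
def fetch_with_fallback(username, instances, fetch_rss, fetch_html):
    """Try RSS on each instance, then HTML on each instance.

    fetch_rss / fetch_html take (instance, username) and return a list
    of item dicts, or raise on failure.
    """
    errors = []
    any_ok = False
    for source_type, fetch in (("rss", fetch_rss), ("web", fetch_html)):
        for instance in instances:
            try:
                items = fetch(instance, username)
            except Exception as exc:  # one bad instance must not stop the run
                errors.append(f"{source_type}@{instance}: {exc}")
                continue
            any_ok = True
            if items:
                # Tag each item with where it came from, as in the output records.
                return [dict(item, sourceType=source_type, instance=instance)
                        for item in items]
    if any_ok:
        return []  # instances responded, but the account had nothing to return
    raise RuntimeError("all instances failed: " + "; ".join(errors))
```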
Technical Stack
- Runtime: Node.js 18 (Apify SDK 3.x)
- HTTP Client: Axios
- HTML Parsing: Cheerio
- RSS Parsing: rss-parser
- AI Providers: Native REST APIs
- Image Processing: Base64 encoding for Gemini/OpenAI
📝 Changelog
Version 1.0.0 (2026-02-01)
- ✨ Initial release
- 🚀 RSS-first scraping with web fallback
- 🤖 6 AI vision providers
- 🔔 Webhook alert system
- 📊 Dual dataset output
📄 License
Apache-2.0
🆘 Support & Resources
- 📚 Apify Documentation
- 💬 Apify Discord Community
- 🐛 Report Issues
- 💡 Feature Requests
- 📧 Contact Support
🙏 Credits
Built with ❤️ using:
- Apify Platform
- Nitter instances
- OpenAI GPT-4o Vision
- Google Gemini
- Google Cloud Vision
- Azure Computer Vision
- AWS Rekognition
Made by [dubz]