X (Twitter) Bulk Scraper/Monitor/Alerts + Vision

Pricing

Pay per usage

Monitor X (formerly Twitter) for specific content. Extract data, monitor, and optionally run image-based alerts using cloud vision APIs. Perfect for brand reputation management, tracking tweets, hashtags, specific images, and user activity.

Developer

Advanced Automation

Maintained by Community

X (Twitter) Bulk Scrape/Monitor + Vision AI

Monitor X/Twitter accounts, extract tweets, filter by keywords/hashtags, and run AI vision analysis on images using 6 different AI providers.

🎯 Overview

This Apify Actor scrapes X (formerly Twitter) posts from multiple accounts, filters by keywords/hashtags, and optionally runs AI vision analysis on images to detect objects, brands, or custom content patterns. Perfect for social media monitoring, brand tracking, and competitive intelligence.

✨ Key Features

Core Scraping

  • 🚀 Hyperdrive Mode: Lightning-fast RSS-based scraping with automatic fallback to web scraping
  • 👥 Bulk Processing: Monitor up to 100 Twitter accounts simultaneously
  • 🔍 Smart Filtering: Filter by keywords, hashtags, or require images
  • 📊 Dual Datasets: Separate outputs for tweets and vision alerts
  • 🔄 Automatic Retry: Robust error handling with multiple Nitter instance fallbacks

AI Vision Analysis (Optional)

Analyze tweet images using 6 industry-leading AI providers:

  • 🤖 Google Gemini 2.0 Flash - Multimodal image analysis; images are sent base64-encoded
  • 🎨 OpenAI GPT-4o Vision - Advanced image understanding and analysis
  • 👁️ Google Cloud Vision - Label detection, OCR, safe search, object localization
  • ☁️ Azure Computer Vision - Tags, objects, brands, faces, adult content detection
  • 📸 AWS Rekognition - Label detection and content moderation
  • 🔗 Custom Webhooks - Integrate your own vision API

Alert System

  • 🔔 Webhook Notifications: Get instant alerts when vision pipelines trigger
  • 🎯 Flexible Configuration: Per-pipeline or global webhook URLs
  • 📈 Confidence Scoring: Filter alerts by AI confidence thresholds
  • 🏷️ Label Matching: Trigger on specific detected objects or keywords

📥 Input Configuration

Basic Example

{
  "usernames": ["apify", "openai"],
  "maxItems": 100,
  "preferRss": true
}

Complete Example with Vision Analysis

{
  "usernames": ["apify", "elonmusk", "openai"],
  "searchTerms": ["AI", "automation", "web scraping"],
  "hashtags": ["webscraping", "machinelearning"],
  "maxItems": 500,
  "preferRss": true,
  "requireImages": false,
  "rssTimeoutSecs": 10,
  "visionPipelines": [
    {
      "name": "Product Launch Detector",
      "provider": "gemini_vision",
      "enabled": true,
      "configJson": "{\"prompt\":\"Is this a product launch or announcement?\",\"triggerKeywords\":[\"launch\",\"new\",\"announcement\"],\"model\":\"gemini-2.0-flash-exp\"}",
      "alertWebhookUrl": "https://your-webhook.com/product-alerts"
    },
    {
      "name": "Brand Monitor",
      "provider": "openai_vision",
      "enabled": true,
      "configJson": "{\"prompt\":\"Identify brands and logos\",\"triggerKeywords\":[\"Tesla\",\"Apple\",\"Nike\"],\"model\":\"gpt-4o\"}",
      "alertWebhookUrl": ""
    },
    {
      "name": "Object Detector",
      "provider": "google_vision",
      "enabled": true,
      "configJson": "{\"threshold\":0.8,\"triggerLabels\":[\"car\",\"vehicle\"],\"maxLabels\":10}"
    }
  ]
}

Input Fields

| Field | Type | Required | Description |
|---|---|---|---|
| usernames | array | ✅ Yes | X/Twitter usernames to monitor (without @ symbol) |
| searchTerms | array | No | Filter tweets containing these keywords |
| hashtags | array | No | Filter tweets containing these hashtags |
| maxItems | integer | No | Maximum tweets to collect (default: 100) |
| preferRss | boolean | No | Use RSS scraping first (default: true) |
| requireImages | boolean | No | Only collect tweets with images (default: false) |
| rssTimeoutSecs | integer | No | RSS fetch timeout in seconds (default: 10) |
| visionPipelines | array | No | AI vision analysis configuration |
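
As a rough illustration of how the defaults in the table above combine with user input, the sketch below fills in missing fields and strips a leading @ from usernames. The `normalizeInput` helper is hypothetical and is not taken from the Actor's source; it only mirrors the documented defaults.

```javascript
// Hypothetical helper: applies the documented defaults to a raw input object.
// This is an illustration, not the Actor's actual validation code.
function normalizeInput(input) {
  const usernames = (input.usernames ?? [])
    .map((name) => String(name).trim().replace(/^@/, ''))
    .filter((name) => name.length > 0);

  if (usernames.length === 0) {
    throw new Error('usernames is required and must contain at least one entry');
  }

  return {
    usernames,
    searchTerms: input.searchTerms ?? [],
    hashtags: input.hashtags ?? [],
    maxItems: input.maxItems ?? 100,
    preferRss: input.preferRss ?? true,
    requireImages: input.requireImages ?? false,
    rssTimeoutSecs: input.rssTimeoutSecs ?? 10,
    visionPipelines: input.visionPipelines ?? [],
  };
}

const normalizedInput = normalizeInput({ usernames: ['@apify', ' openai '] });
console.log(normalizedInput.usernames, normalizedInput.maxItems);
```

Running something like this locally before starting a run can catch an empty or malformed `usernames` list early.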

Vision Pipeline Configuration

Each pipeline in the visionPipelines array accepts these fields:

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | ✅ Yes | Descriptive name for the pipeline |
| provider | string | ✅ Yes | AI provider: gemini_vision, openai_vision, google_vision, azure_cv, aws_rekognition, custom_webhook |
| enabled | boolean | No | Enable/disable this pipeline (default: true) |
| configJson | string | No | Provider-specific configuration as JSON string |
| alertWebhookUrl | string | No | Webhook URL for alerts (overrides env var) |
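
Because configJson is a JSON string nested inside JSON, escaping it by hand is error-prone. One possible approach, sketched below, is to write the provider configuration as a plain object and serialize it with `JSON.stringify`. The `buildPipeline` helper is a hypothetical name used only for this illustration.

```javascript
// Hypothetical helper: builds one visionPipelines entry from a plain object,
// so the configJson string never has to be escaped by hand.
function buildPipeline(name, provider, config, alertWebhookUrl) {
  const pipeline = {
    name,
    provider,
    enabled: true,
    configJson: JSON.stringify(config),
  };
  // alertWebhookUrl is optional, so only set it when one was supplied.
  if (alertWebhookUrl) {
    pipeline.alertWebhookUrl = alertWebhookUrl;
  }
  return pipeline;
}

const objectDetector = buildPipeline('Object Detector', 'google_vision', {
  threshold: 0.8,
  triggerLabels: ['car', 'vehicle'],
  maxLabels: 10,
});

// Print a complete Actor input that can be pasted into the input editor.
console.log(JSON.stringify({ usernames: ['apify'], visionPipelines: [objectDetector] }, null, 2));
```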

Provider-Specific Configuration

Gemini Vision

{
  "prompt": "Describe what you see in detail",
  "triggerKeywords": ["product", "launch"],
  "model": "gemini-2.0-flash-exp"
}

OpenAI Vision

{
  "prompt": "Identify brands and logos",
  "triggerKeywords": ["Nike", "Apple"],
  "model": "gpt-4o",
  "maxTokens": 500
}

Google Cloud Vision

{
  "threshold": 0.8,
  "triggerLabels": ["car", "vehicle"],
  "maxLabels": 10
}

Azure Computer Vision

{
  "minConfidence": 0.7,
  "targetTags": ["car", "person"],
  "blockAdult": false
}

AWS Rekognition

{
  "minConfidence": 0.7,
  "targetLabels": ["Car", "Person"],
  "blockUnsafe": true
}

Custom Webhook

{
  "webhookUrl": "https://your-api.com/analyze",
  "timeout": 20000,
  "headers": {
    "Authorization": "Bearer YOUR_TOKEN"
  }
}
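
To make the threshold and label settings more concrete, here is a minimal sketch of how a confidence threshold and a trigger-label list could be applied to a provider's labels. It assumes labels have the { name, score } shape used in the alert output, that matching is case-insensitive, and that an empty label list matches everything. The Actor's real matching rules are not documented here and may differ; `findTriggeredLabels` is a hypothetical name.

```javascript
// Hypothetical sketch of threshold + label matching; the Actor's real logic may differ.
// labels: [{ name, score }], config: { threshold, triggerLabels }.
function findTriggeredLabels(labels, config) {
  const threshold = config.threshold ?? 0; // assumption: no threshold accepts every score
  const wanted = new Set((config.triggerLabels ?? []).map((label) => label.toLowerCase()));

  return labels.filter((label) => {
    const confident = label.score >= threshold;
    // Assumption: an empty triggerLabels list matches any label.
    const matches = wanted.size === 0 || wanted.has(label.name.toLowerCase());
    return confident && matches;
  });
}

const triggeredLabels = findTriggeredLabels(
  [
    { name: 'Car', score: 0.93 },
    { name: 'Vehicle', score: 0.72 },
    { name: 'Tree', score: 0.99 },
  ],
  { threshold: 0.8, triggerLabels: ['car', 'vehicle'] },
);

console.log(triggeredLabels); // only the "Car" label passes both checks
```

In this example "Vehicle" is dropped for falling below the threshold and "Tree" is dropped because it is not a trigger label.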

📤 Output

Main Dataset (Tweets)

Each scraped tweet contains:

{
  "title": "Check out our new Actor for web scraping!",
  "link": "https://x.com/apify/status/1234567890",
  "author": "apify",
  "published": "2026-01-15T10:30:00Z",
  "description": "Check out our new Actor...",
  "tags": ["#webscraping", "#automation"],
  "imageUrl": "https://pbs.twimg.com/media/abc123.jpg",
  "visionAlertsCount": 2,
  "scrapedUsername": "apify",
  "collectedAt": "2026-01-15T10:35:00Z",
  "sourceType": "rss",
  "instance": "nitter.net"
}

Alerts Dataset (Vision Triggers)

Each triggered alert contains:

{
  "pipelineName": "Product Launch Detector",
  "provider": "gemini_vision",
  "itemLink": "https://x.com/apify/status/1234567890",
  "imageUrl": "https://pbs.twimg.com/media/abc123.jpg",
  "labels": [
    {"name": "product", "score": 0.95},
    {"name": "announcement", "score": 0.88}
  ],
  "score": 0.95,
  "analysis": "This image shows a new product launch announcement...",
  "triggeredAt": "2026-01-15T10:35:00Z"
}
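
Since tweets and alerts land in separate datasets, downstream code usually needs to join them. The sketch below assumes an alert's itemLink equals the link of the tweet it belongs to, as in the two examples above. `attachAlerts` is a hypothetical post-processing helper, not part of the Actor.

```javascript
// Hypothetical post-processing helper, run on items exported from both datasets.
function attachAlerts(tweets, alerts) {
  const alertsByLink = new Map();
  for (const alert of alerts) {
    const list = alertsByLink.get(alert.itemLink) ?? [];
    list.push(alert);
    alertsByLink.set(alert.itemLink, list);
  }
  // Return new objects so the original tweet items stay untouched.
  return tweets.map((tweet) => ({ ...tweet, alerts: alertsByLink.get(tweet.link) ?? [] }));
}

const tweetItems = [
  { link: 'https://x.com/apify/status/1234567890', author: 'apify' },
  { link: 'https://x.com/apify/status/1234567891', author: 'apify' },
];
const alertItems = [
  {
    pipelineName: 'Product Launch Detector',
    itemLink: 'https://x.com/apify/status/1234567890',
    score: 0.95,
  },
];

const joinedItems = attachAlerts(tweetItems, alertItems);
console.log(joinedItems.map((tweet) => `${tweet.link}: ${tweet.alerts.length} alert(s)`));
```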

Output Views

The Actor provides multiple pre-configured output views:

  • tweets - Full dataset JSON
  • tweetsTable - Simplified table view
  • tweetsCSV - CSV export
  • tweetsWithImages - Only tweets that include an image
  • visionAlerts - All vision alerts
  • visionAlertsTable - Simplified alerts view
  • visionAlertsCSV - Alerts CSV export
  • highConfidenceAlerts - 90%+ confidence only
  • runStats - Actor run statistics

🔐 Environment Variables

Configure AI providers via environment variables in the Actor settings:

Required (only for the vision providers you use)

| Variable | Description | Example |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key for GPT-4o Vision | sk-... |
| GEMINI_API_KEY | Google Gemini API key | AIza... |
| GOOGLE_APPLICATION_CREDENTIALS | Google Cloud credentials JSON | {"type":"service_account",...} |
| AZURE_CV_ENDPOINT | Azure Computer Vision endpoint | https://your-resource.cognitiveservices.azure.com/ |
| AZURE_CV_KEY | Azure Computer Vision API key | abc123... |
| AWS_ACCESS_KEY_ID | AWS access key for Rekognition | AKIA... |
| AWS_SECRET_ACCESS_KEY | AWS secret key | abc123... |
| AWS_REGION | AWS region (optional) | us-east-1 (default) |

Optional

| Variable | Description |
|---|---|
| ALERT_WEBHOOK_URL | Global webhook URL for all alerts |
| WEBHOOK_<PIPELINE_NAME> | Pipeline-specific webhook (e.g., WEBHOOK_PRODUCT_DETECTOR) |
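
With three places to set a webhook URL, it helps to see one plausible precedence. The sketch below assumes this order: the pipeline's alertWebhookUrl, then WEBHOOK_<PIPELINE_NAME>, then ALERT_WEBHOOK_URL. Only the first rule (alertWebhookUrl overrides environment variables) is stated in this README. The ordering of the two variables, and the name normalization (upper-case, non-alphanumerics replaced by underscores), are assumptions inferred from the WEBHOOK_PRODUCT_DETECTOR example. Both function names are hypothetical.

```javascript
// Assumption: upper-case the pipeline name and replace runs of other characters with "_".
function pipelineEnvName(pipelineName) {
  const normalized = pipelineName
    .toUpperCase()
    .replace(/[^A-Z0-9]+/g, '_')
    .replace(/^_+|_+$/g, '');
  return `WEBHOOK_${normalized}`;
}

// Hypothetical resolution order; only "alertWebhookUrl overrides env var" is documented.
function resolveWebhookUrl(pipeline, env) {
  return (
    pipeline.alertWebhookUrl || // an empty string falls through to the env vars
    env[pipelineEnvName(pipeline.name)] ||
    env.ALERT_WEBHOOK_URL ||
    null
  );
}

const exampleEnv = {
  WEBHOOK_PRODUCT_DETECTOR: 'https://example.com/product',
  ALERT_WEBHOOK_URL: 'https://example.com/global',
};

console.log(resolveWebhookUrl({ name: 'Product Detector', alertWebhookUrl: '' }, exampleEnv));
console.log(resolveWebhookUrl({ name: 'Brand Monitor' }, exampleEnv));
```

Inside a running Actor, `process.env` would take the place of `exampleEnv`.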

Setting Environment Variables

Via Apify Console:

  1. Go to your Actor → Settings → Environment variables
  2. Click "Add variable"
  3. Enter name and value
  4. Check "Secret" for sensitive data

Via .actor/actor.json:

{
  "environmentVariables": {
    "OPENAI_API_KEY": "@openai-key",
    "GEMINI_API_KEY": "@gemini-key"
  }
}

Note: Use @secret-name syntax to reference Apify secrets.

🎯 Use Cases

1. Brand Monitoring

Monitor brand mentions and visual content across competitor accounts:

  • Track logo appearances in images
  • Detect product placements
  • Monitor sentiment around brand discussions

2. Product Launch Detection

Get instant alerts when competitors announce new products:

  • Analyze images for product unveils
  • Detect "new" or "launching" keywords
  • Track announcement patterns

3. Content Moderation

Filter and flag inappropriate content:

  • Adult content detection (Azure/AWS)
  • Unsafe content filtering
  • Brand safety monitoring

4. Competitor Analysis

Track competitor social media activity:

  • Monitor posting frequency
  • Analyze content themes
  • Track image-based campaigns

5. Social Media Intelligence

Aggregate insights from multiple accounts:

  • Trending topics detection
  • Hashtag performance tracking
  • Engagement pattern analysis

6. Market Research

Gather visual data for market analysis:

  • Product feature comparisons
  • Packaging design trends
  • Campaign creative analysis

🚀 Quick Start

1. Basic Tweet Scraping (No Vision)

{
  "usernames": ["apify"],
  "maxItems": 50
}

2. Keyword Filtering

{
  "usernames": ["techcrunch", "theverge"],
  "searchTerms": ["AI", "ChatGPT"],
  "maxItems": 100
}

3. Image-Only Collection

{
  "usernames": ["nasa", "spacex"],
  "requireImages": true,
  "maxItems": 50
}
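
The keyword, hashtag, and image filters used in examples 2 and 3 can be pictured as a single predicate over each tweet. The sketch below is an assumption about how such filtering might work, not the Actor's code. It assumes case-insensitive substring matching, that one matching entry per list is enough, and that every non-empty list must match. `matchesFilters` is a hypothetical name.

```javascript
// Hypothetical filter over items shaped like the main dataset (title, tags, imageUrl).
function matchesFilters(tweet, filters) {
  const text = `${tweet.title ?? ''} ${tweet.description ?? ''}`.toLowerCase();
  const tags = (tweet.tags ?? []).map((tag) => tag.replace(/^#/, '').toLowerCase());
  const terms = (filters.searchTerms ?? []).map((term) => term.toLowerCase());
  const hashtags = (filters.hashtags ?? []).map((tag) => tag.replace(/^#/, '').toLowerCase());

  if (filters.requireImages && !tweet.imageUrl) return false;
  // Assumption: any one matching term is enough (OR within a list).
  if (terms.length > 0 && !terms.some((term) => text.includes(term))) return false;
  if (hashtags.length > 0 && !hashtags.some((tag) => tags.includes(tag))) return false;
  return true;
}

const sampleTweet = {
  title: 'Check out our new Actor for web scraping!',
  tags: ['#webscraping', '#automation'],
  imageUrl: 'https://pbs.twimg.com/media/abc123.jpg',
};

console.log(matchesFilters(sampleTweet, { searchTerms: ['web scraping'], requireImages: true }));
console.log(matchesFilters(sampleTweet, { hashtags: ['machinelearning'] }));
```

Plain substring matching is simple but coarse: a short term such as "AI" would also match inside words like "email", so longer or more specific terms give cleaner results under this scheme.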

4. With Gemini Vision

{
  "usernames": ["producthunt"],
  "requireImages": true,
  "visionPipelines": [{
    "name": "Product Detector",
    "provider": "gemini_vision",
    "enabled": true,
    "configJson": "{\"prompt\":\"Describe this product\",\"triggerKeywords\":[\"app\",\"software\"]}"
  }]
}

📊 Performance & Limits

  • Speed: 50-100 tweets per minute (RSS mode)
  • Concurrent Accounts: Up to 100 usernames
  • Vision Processing: ~2-5 seconds per image per provider
  • Memory: 512 MB recommended (1 GB for heavy vision usage)
  • Timeout: 300 seconds default (adjust in Actor settings)
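
These figures can be turned into a rough run-time estimate when choosing a timeout. The sketch below assumes scraping and vision calls happen one after another with no parallelism, which may overstate the real duration. `estimateRunSeconds` is a hypothetical helper that only restates the numbers in the list above.

```javascript
// Back-of-the-envelope estimate from the published figures:
// 50-100 tweets per minute (RSS mode) and 2-5 seconds per image per provider.
// Assumption: scraping and vision calls run sequentially.
function estimateRunSeconds({ tweets, images = 0, pipelines = 0 }) {
  const scrape = { min: (tweets / 100) * 60, max: (tweets / 50) * 60 };
  const vision = { min: images * pipelines * 2, max: images * pipelines * 5 };
  return { min: scrape.min + vision.min, max: scrape.max + vision.max };
}

const runEstimate = estimateRunSeconds({ tweets: 100, images: 20, pipelines: 2 });
console.log(runEstimate); // { min: 140, max: 320 }
```

In this example the upper bound of 320 seconds is above the 300-second default timeout, so a run of that size may need a longer timeout in the Actor settings.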

🔧 Troubleshooting

No Items Collected

Possible causes:

  • Bot protection blocking Nitter instances
  • Invalid usernames
  • User accounts have no recent posts
  • Filters are too restrictive

Solutions:

  • Verify usernames are correct (without @ symbol)
  • Try again at a different time of day
  • Reduce filter restrictions
  • Check Actor logs for specific errors

Vision Analysis Not Working

Possible causes:

  • Missing API credentials in environment variables
  • Invalid API keys
  • API rate limits exceeded
  • Image URLs inaccessible

Solutions:

  • Verify all required environment variables are set
  • Check API key validity in provider dashboard
  • Review Actor logs for specific API errors
  • Ensure images are publicly accessible
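
A quick pre-flight check can show which credentials are missing for the pipelines you enabled. The sketch below pairs each provider value with the variables from the Environment Variables section. The pairing is inferred from the variable descriptions rather than stated explicitly, and `missingEnvVars` is a hypothetical helper.

```javascript
// Provider-to-variable mapping inferred from the Environment Variables section.
const REQUIRED_ENV = {
  openai_vision: ['OPENAI_API_KEY'],
  gemini_vision: ['GEMINI_API_KEY'],
  google_vision: ['GOOGLE_APPLICATION_CREDENTIALS'],
  azure_cv: ['AZURE_CV_ENDPOINT', 'AZURE_CV_KEY'],
  aws_rekognition: ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'],
  custom_webhook: [], // configured through configJson, not environment variables
};

// Hypothetical check: lists variables that enabled pipelines need but env lacks.
function missingEnvVars(pipelines, env) {
  const missing = new Set();
  for (const pipeline of pipelines) {
    if (pipeline.enabled === false) continue; // enabled defaults to true
    for (const name of REQUIRED_ENV[pipeline.provider] ?? []) {
      if (!env[name]) missing.add(name);
    }
  }
  return [...missing];
}

const missingVars = missingEnvVars(
  [
    { name: 'Product Detector', provider: 'gemini_vision' },
    { name: 'Tag Monitor', provider: 'azure_cv' },
    { name: 'Disabled', provider: 'openai_vision', enabled: false },
  ],
  { GEMINI_API_KEY: 'AIza-example' },
);

console.log(missingVars); // the two Azure variables are reported as missing
```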

Webhook Alerts Not Received

Possible causes:

  • Invalid webhook URL
  • Webhook endpoint timeout
  • Firewall blocking Apify IPs

Solutions:

  • Test webhook URL with curl/Postman
  • Increase webhook timeout in config
  • Verify webhook endpoint accepts POST requests
  • Check webhook logs for incoming requests

🏗️ Architecture

Data Flow

  1. Input Validation - Verify usernames and configuration
  2. Instance Discovery - Fetch working Nitter instances from status page
  3. RSS Scraping - Try RSS feeds from multiple instances
  4. Web Scraping Fallback - Parse HTML if RSS fails
  5. Content Filtering - Apply keyword/hashtag filters
  6. Vision Processing - Run enabled AI pipelines on images
  7. Alert Triggering - Send webhooks for matched patterns
  8. Data Storage - Save to Apify datasets
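
Steps 3 and 4 above amount to an ordered list of attempts: RSS on every known instance, then HTML on every instance. The sketch below only plans that order and performs no network requests. The URL shapes follow Nitter's public conventions (/<username>/rss for feeds and /<username> for profile pages). The function name, the 'web' type label, and the second hostname are assumptions for illustration, and the Actor's actual requests may differ.

```javascript
// Hypothetical planner for the RSS-first, web-fallback order described above.
function planAttempts(username, instances, preferRss = true) {
  const rss = instances.map((host) => ({
    type: 'rss',
    url: `https://${host}/${username}/rss`,
  }));
  const web = instances.map((host) => ({
    type: 'web',
    url: `https://${host}/${username}`,
  }));
  // Assumption: with preferRss disabled, only HTML scraping is attempted.
  return preferRss ? [...rss, ...web] : web;
}

const attempts = planAttempts('apify', ['nitter.net', 'nitter.example.org']);
for (const attempt of attempts) {
  console.log(`${attempt.type}: ${attempt.url}`);
}
```

A scraper would walk this list in order and stop at the first attempt that returns items, which is where the sourceType and instance fields in the output come from.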

Technical Stack

  • Runtime: Node.js 18 (Apify SDK 3.x)
  • HTTP Client: Axios
  • HTML Parsing: Cheerio
  • RSS Parsing: rss-parser
  • AI Providers: Native REST APIs
  • Image Processing: Base64 encoding for Gemini/OpenAI

📝 Changelog

Version 1.0.0 (2026-02-01)

  • ✨ Initial release
  • 🚀 RSS-first scraping with web fallback
  • 🤖 6 AI vision providers
  • 🔔 Webhook alert system
  • 📊 Dual dataset output

📄 License

Apache-2.0

🙏 Credits

Made by [dubz]