Twitter Hashtag Tweet Scraper

Pricing

$3.00 / 1,000 results

Scrapes tweets by hashtags with comprehensive metadata extraction and intelligent rate limit handling.

Rating: 0.0 (0)

Developer: Deepanshu Sharma (maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 3
  • Monthly active users: 1
  • Last modified: 3 days ago

Twitter Hashtag Scraper

An Apify actor that scrapes tweets by hashtag using authenticated Twitter sessions. It respects rate limits, deduplicates results, and extracts comprehensive tweet data.

🚀 Features

  • Multi-hashtag support: Scrape tweets from multiple hashtags in a single run
  • Rate limit handling: Automatically handles Twitter rate limits with smart waiting
  • Deduplication: Prevents duplicate tweets using unique ID tracking
  • Real-time data streaming: Pushes data to the Apify dataset in batches for immediate access
  • Comprehensive tweet data: Extracts detailed metadata including engagement metrics
  • Time-based filtering: Configurable tweet age limits
  • Authentication via cookies: Uses Twitter session cookies for reliable access

📋 Input Parameters

Required Parameters

| Parameter | Type | Description |
|---|---|---|
| hashtags | Array[String] | List of hashtags to search (without the # symbol) |
| cookies | Array[Object] | Twitter session cookies for authentication |

Optional Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| max_tweets | Integer | 5000 | Maximum total number of tweets to collect |
| max_age_hint_minutes | Integer | 1440 | Maximum tweet age in minutes (the default, 1440, is 24 hours) |

Input Example

{
  "hashtags": ["AI", "MachineLearning", "DataScience"],
  "max_tweets": 3000,
  "max_age_hint_minutes": 720,
  "cookies": [
    {
      "name": "auth_token",
      "value": "your_auth_token_here",
      "domain": ".x.com"
    },
    {
      "name": "ct0",
      "value": "your_ct0_token_here",
      "domain": ".x.com"
    }
  ]
}
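If you assemble the input programmatically, a small helper can catch malformed input before a run starts. The following is a minimal sketch, not part of the actor: `build_run_input` and `REQUIRED_COOKIES` are hypothetical names, and the required cookie set is assumed from the cookie instructions in this README.

```python
from typing import Any

# Assumption: auth_token and ct0 are the cookies this README lists as required.
REQUIRED_COOKIES = {"auth_token", "ct0"}


def build_run_input(
    hashtags: list[str],
    cookies: list[dict[str, str]],
    max_tweets: int = 5000,
    max_age_hint_minutes: int = 1440,
) -> dict[str, Any]:
    """Return an input dict shaped like the example above, or raise ValueError."""
    # Hashtags are passed without the leading '#', so strip it if present.
    cleaned = [tag.strip().lstrip("#") for tag in hashtags]
    cleaned = [tag for tag in cleaned if tag]
    if not cleaned:
        raise ValueError("at least one hashtag is required")

    present = {cookie.get("name") for cookie in cookies}
    missing = REQUIRED_COOKIES - present
    if missing:
        raise ValueError(f"missing cookies: {sorted(missing)}")

    if max_tweets <= 0 or max_age_hint_minutes <= 0:
        raise ValueError("max_tweets and max_age_hint_minutes must be positive")

    return {
        "hashtags": cleaned,
        "max_tweets": max_tweets,
        "max_age_hint_minutes": max_age_hint_minutes,
        "cookies": cookies,
    }
```

The returned dictionary can then be serialized to JSON and pasted into the actor's input form, or passed as the run input through whichever Apify client you use.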

๐Ÿช How to Get Twitter Cookies

  1. Log in to Twitter/X in your browser
  2. Open Developer Tools (F12)
  3. Go to the Application/Storage tab
  4. Find Cookies for x.com or twitter.com
  5. Copy the required cookies:
    • auth_token - Main authentication token
    • ct0 - CSRF token
    • twid - Twitter ID (optional but recommended)
  6. Add each cookie to the cookies array as an object in this format:

{
  "name": "cookie_name",
  "value": "cookie_value",
  "domain": ".x.com"
}
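If you copied the cookies as a single `name=value; name2=value2` string (for example from a request's Cookie header in the Network tab), the conversion to the object format above can be sketched as follows. `cookies_from_header` is a hypothetical helper, and the `.x.com` default domain is taken from the examples in this README.

```python
def cookies_from_header(header: str, domain: str = ".x.com") -> list[dict[str, str]]:
    """Turn 'name=value; name2=value2' into a list of cookie objects."""
    cookies = []
    for part in header.split(";"):
        # Split on the first '=' only, since cookie values may contain '='.
        name, sep, value = part.strip().partition("=")
        if not sep or not name:
            continue  # skip empty or malformed fragments
        cookies.append({"name": name, "value": value, "domain": domain})
    return cookies
```

For example, `cookies_from_header("auth_token=abc; ct0=def")` yields two objects ready to drop into the `cookies` input field.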

📊 Output Data Structure

Each tweet is returned as a record with the following structure:

{
  "id": "1234567890123456789",
  "text": "This is a sample tweet with #hashtag",
  "author": "John Doe",
  "username": "johndoe",
  "created_at": "2024-01-15T10:30:00Z",
  "retweet_count": 42,
  "like_count": 156,
  "reply_count": 23,
  "quote_count": 8,
  "url": "https://twitter.com/johndoe/status/1234567890123456789",
  "hashtags": ["hashtag", "example"],
  "mentions": ["mention1", "mention2"],
  "is_retweet": false,
  "language": "en",
  "user_followers": 1500,
  "user_verified": false,
  "search_hashtag": "AI",
  "scraped_at": "2024-01-15T11:00:00Z"
}

Output Fields Explanation

| Field | Description |
|---|---|
| id | Unique tweet identifier |
| text | Full tweet content |
| author | Display name of tweet author |
| username | Twitter handle (@username) |
| created_at | Tweet creation timestamp |
| retweet_count | Number of retweets |
| like_count | Number of likes/favorites |
| reply_count | Number of replies |
| quote_count | Number of quote tweets |
| url | Direct link to the tweet |
| hashtags | Array of hashtags found in tweet |
| mentions | Array of mentioned users |
| is_retweet | Boolean indicating if it's a retweet |
| language | Detected language code |
| user_followers | Author's follower count |
| user_verified | Author's verification status |
| search_hashtag | Which hashtag query found this tweet |
| scraped_at | When the tweet was scraped |
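To illustrate how these fields might be consumed downstream, here is a short sketch that sums the four engagement counters and computes how old a tweet was when it was scraped. The helper names are hypothetical, and the code assumes timestamps follow the ISO 8601 format with a trailing `Z` shown in the sample record.

```python
from datetime import datetime

ENGAGEMENT_FIELDS = ("retweet_count", "like_count", "reply_count", "quote_count")


def parse_timestamp(value: str) -> datetime:
    # Assumption: timestamps look like "2024-01-15T10:30:00Z", as in the sample.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(tweet: dict) -> dict:
    """Reduce one output record to an id, total engagement, and age at scrape time."""
    engagement = sum(tweet.get(field, 0) for field in ENGAGEMENT_FIELDS)
    age = parse_timestamp(tweet["scraped_at"]) - parse_timestamp(tweet["created_at"])
    return {
        "id": tweet["id"],
        "engagement": engagement,
        "age_minutes_at_scrape": age.total_seconds() / 60,
    }
```

Applied to the sample record above, this gives an engagement total of 229 and an age of 30 minutes.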

⚡ Performance & Limits

Rate Limiting

  • Automatic handling: Actor automatically waits when rate limits are hit
  • Smart delays: Random delays between requests to avoid detection
  • Batch processing: Processes tweets in batches for efficiency
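The waiting behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actor's code: the function names and the 1 to 3 second delay bounds are hypothetical, and the 540-second cap comes from the 9-minute maximum pause mentioned under Important Notes.

```python
import random


def polite_delay(min_s: float = 1.0, max_s: float = 3.0) -> float:
    """Pick a random pause between requests (the bounds are illustrative)."""
    return random.uniform(min_s, max_s)


def wait_for_rate_limit(reset_epoch: float, now: float, max_wait_s: float = 540.0) -> float:
    """Seconds to sleep until the rate limit resets, capped at max_wait_s."""
    return max(0.0, min(reset_epoch - now, max_wait_s))
```

A scraping loop would pass the result of either function to `time.sleep` before issuing the next request.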

Tweet Distribution

  • Even distribution: Tweets are distributed evenly across hashtags
  • Global limit: Total tweet count never exceeds max_tweets
  • Deduplication: Duplicate tweets across hashtags are filtered out
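One way to combine even distribution, a global cap, and cross-hashtag deduplication is sketched below. The actor's actual allocation strategy is not documented here, so treat `per_hashtag_quotas` and `collect` as hypothetical helpers that show the idea only.

```python
def per_hashtag_quotas(hashtags: list[str], max_tweets: int) -> dict[str, int]:
    """Split max_tweets as evenly as possible across the hashtags."""
    if not hashtags:
        raise ValueError("at least one hashtag is required")
    base, extra = divmod(max_tweets, len(hashtags))
    # The first `extra` hashtags get one more tweet so the quotas sum to max_tweets.
    return {tag: base + (1 if i < extra else 0) for i, tag in enumerate(hashtags)}


def collect(results_by_hashtag: dict[str, list[dict]], max_tweets: int) -> list[dict]:
    """Merge per-hashtag results, dropping duplicates and honouring the quotas."""
    quotas = per_hashtag_quotas(list(results_by_hashtag), max_tweets)
    seen_ids: set[str] = set()
    collected: list[dict] = []
    for tag, tweets in results_by_hashtag.items():
        taken = 0
        for tweet in tweets:
            if taken >= quotas[tag] or len(collected) >= max_tweets:
                break
            if tweet["id"] in seen_ids:
                continue  # already collected under another hashtag
            seen_ids.add(tweet["id"])
            collected.append({**tweet, "search_hashtag": tag})
            taken += 1
    return collected
```

With three hashtags and `max_tweets` set to 10, the quotas come out as 4, 3, and 3, so the total never exceeds the global limit.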

Data Streaming

  • Real-time updates: Data is pushed to the dataset every 50 tweets
  • Progress tracking: Detailed logging of collection progress
  • Error recovery: Continues scraping even if individual tweets fail
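Pushing in batches of 50 can be pictured as a small buffer that flushes whenever it fills up. The sketch below is an assumption about the general pattern, not the actor's implementation: `BatchBuffer` is a hypothetical class, and `push` stands in for whatever function writes items to the dataset.

```python
from typing import Callable


class BatchBuffer:
    """Collect items and hand them to `push` in fixed-size batches."""

    def __init__(self, push: Callable[[list[dict]], None], batch_size: int = 50) -> None:
        self._push = push
        self._batch_size = batch_size
        self._items: list[dict] = []

    def add(self, item: dict) -> None:
        self._items.append(item)
        if len(self._items) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        # Push whatever is buffered, then start a fresh list so the pushed
        # batch is not modified afterwards.
        if self._items:
            self._push(self._items)
            self._items = []
```

Calling `flush()` once more at the end of a run (or in an interruption handler) writes out any partial batch, which matches the idea of preserving data that has already been collected.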

🔧 Advanced Configuration

Search Query Optimization

The actor automatically builds optimized search queries:

  • Filters out retweets by default
  • Includes time constraints based on max_age_hint_minutes
  • Uses Twitter's "Latest" product for recent tweets
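A query with these properties might be built along the following lines. The exact query string the actor sends is not documented here, so this is only a sketch: `build_query` is a hypothetical function, and the use of the `-filter:retweets` and `since_time:` search operators is an assumption about how the retweet filter and time constraint could be expressed.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_query(hashtag: str, max_age_hint_minutes: int, now: Optional[datetime] = None) -> str:
    """Build a search string for one hashtag, excluding retweets and old tweets."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=max_age_hint_minutes)
    tag = hashtag.lstrip("#")
    # Assumption: retweets are excluded with -filter:retweets and the age limit
    # is expressed as since_time:<unix seconds>.
    return f"#{tag} -filter:retweets since_time:{int(cutoff.timestamp())}"
```

Selecting the "Latest" results rather than "Top" is typically a separate request parameter, not part of the query text, so it is left out of this sketch.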

Error Handling

  • Retry logic: Up to 5 retries per hashtag on failures
  • Graceful degradation: Continues with other hashtags if one fails
  • Data preservation: Saves collected data even if scraping is interrupted
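The retry and graceful-degradation behaviour can be sketched as a loop that gives each hashtag a first attempt plus up to five retries and then moves on. `scrape_all` and `scrape_one` are hypothetical names; a real implementation would also log the error and wait between attempts.

```python
from typing import Callable

MAX_RETRIES = 5  # from "Up to 5 retries per hashtag" above


def scrape_all(
    hashtags: list[str],
    scrape_one: Callable[[str], list[dict]],
    max_retries: int = MAX_RETRIES,
) -> tuple[list[dict], list[str]]:
    """Scrape each hashtag with retries; return (collected tweets, failed hashtags)."""
    collected: list[dict] = []
    failed: list[str] = []
    for tag in hashtags:
        for _attempt in range(1 + max_retries):  # first try plus retries
            try:
                collected.extend(scrape_one(tag))
                break
            except Exception:
                continue  # sketch only: retry immediately without logging
        else:
            # Every attempt failed: record it and carry on with the next hashtag.
            failed.append(tag)
    return collected, failed
```

Because results are accumulated per hashtag, tweets gathered before a failure are still returned alongside the list of hashtags that could not be scraped.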

🚨 Important Notes

Authentication

  • Required: Valid Twitter session cookies are mandatory
  • Session management: Uses cookies to maintain authenticated session
  • Security: Keep your cookies secure and don't share them

Rate Limits

  • Twitter limits: Respects Twitter's rate limiting policies
  • Wait times: May pause for up to 9 minutes when rate limited
  • Patience required: Large scraping jobs may take time

Content Policy

  • Public tweets only: Only scrapes publicly available tweets
  • No private data: Cannot access protected accounts
  • Respect ToS: Use responsibly and respect Twitter's Terms of Service

📈 Usage Examples

Small Scale Scraping

{
  "hashtags": ["startup"],
  "max_tweets": 100,
  "max_age_hint_minutes": 60,
  "cookies": [...]
}

Multi-hashtag Research

{
  "hashtags": ["climate", "sustainability", "renewableenergy"],
  "max_tweets": 5000,
  "max_age_hint_minutes": 2880,
  "cookies": [...]
}

Short Time Window (last 30 minutes)

{
  "hashtags": ["breaking", "news", "trending"],
  "max_tweets": 1000,
  "max_age_hint_minutes": 30,
  "cookies": [...]
}

๐Ÿ› ๏ธ Troubleshooting

Common Issues

  1. Authentication Failed
    • Verify cookies are valid and recent
    • Check cookie format and domain settings
    • Ensure you're logged into Twitter in the same browser
  2. No Tweets Found
    • Check hashtag spelling
    • Verify hashtags exist and have recent activity
    • Adjust max_age_hint_minutes to include older tweets
  3. Rate Limited
    • Wait for the automatic rate limit handling
    • Consider reducing max_tweets for faster completion
    • Use fewer hashtags to reduce API calls

Performance Tips

  • Start small: Test with low max_tweets first
  • Use specific hashtags: More specific hashtags often yield better results
  • Monitor progress: Check Apify logs for real-time progress updates
  • Be patient: Large scraping jobs require time due to rate limits

Support

For issues or questions:

  1. Check the Apify logs for detailed error messages
  2. Verify your input format matches the examples
  3. Ensure your cookies are valid and up-to-date
  4. Contact support through the Apify platform if issues persist

Note: This actor is designed for research and analysis purposes. Please ensure compliance with Twitter's Terms of Service and applicable data protection regulations.