Twitter Hashtag Scraper - X Posts by Hashtag

Pricing

Pay per usage

Scrape Twitter/X posts by hashtag using Nitter instances and web scraping. Extract tweet text, authors, handles, engagement metrics (likes, retweets, replies, quotes), images, videos, and posting dates. Includes hashtag analytics with top authors, engagement summaries, and posting frequency.

Rating: 0.0 (0)

Developer: Ricardo Akiyoshi (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 3 hours ago

Scrape Twitter/X posts by hashtag at scale using Nitter instances and web scraping. No Twitter API keys or authentication required. Extract full tweet metadata including text, authors, engagement metrics, images, and videos.

What does it do?

This actor searches Twitter/X for posts containing specific hashtags and extracts comprehensive data including:

  • Tweet content: Full text, author name, handle, posting date
  • Engagement metrics: Likes, retweets, replies, quotes
  • Media: Image URLs, video URLs
  • Tweet metadata: Tweet ID, direct URL, retweet/reply status
  • Hashtag analytics: Top authors, engagement distribution, posting frequency, co-occurring hashtags

How it works

The scraper uses a multi-layered strategy for maximum reliability:

  1. Nitter instances (primary) — Scrapes public Nitter mirrors that proxy Twitter data. Automatically rotates through 15+ instances and falls back to the next if one is down.
  2. CheerioCrawler (secondary) — Uses Crawlee's built-in retry and proxy management for more resilient scraping of Nitter pages.
  3. Google search (fallback) — When all Nitter instances are unavailable, searches Google for site:twitter.com #hashtag to find tweet URLs and basic metadata.

Tweets collected from all sources are deduplicated by tweet ID, with a text hash as a fallback when no ID can be extracted.
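The ordered fallback described above can be sketched as follows. This is a minimal illustration, not the actor's source: the `Source` callables and the function name `collect_with_fallback` are hypothetical, and the sketch assumes a layer counts as failed when it raises or returns nothing.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Tweet = Dict[str, object]
# Hypothetical source signature: takes a hashtag, returns a list of tweets.
Source = Callable[[str], List[Tweet]]


def collect_with_fallback(
    hashtag: str, sources: Iterable[Tuple[str, Source]]
) -> Tuple[Optional[str], List[Tweet]]:
    """Try each named source in order and return the first non-empty result."""
    for name, fetch in sources:
        try:
            tweets = fetch(hashtag)
        except Exception:
            # Assumption: any failure (timeout, HTTP error, parse error)
            # means this layer is unavailable, so move on to the next one.
            continue
        if tweets:
            return name, tweets
    # Every layer failed or returned nothing.
    return None, []
```

Passing the layers as an ordered list keeps the priority (Nitter, then CheerioCrawler, then Google) in one place, so reordering or adding a source does not touch the loop.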

Use Cases

Social Listening

Monitor what people are saying about your brand, product, or industry. Track hashtag conversations in near real time and spot emerging sentiment before it trends.

Campaign Tracking

Measure the reach and engagement of your marketing campaigns. Track branded hashtags to see how many people are using them, who the top contributors are, and what content performs best.

Trend Analysis

Discover trending topics and conversations around specific hashtags. Analyze posting frequency, engagement patterns, and co-occurring hashtags to understand how conversations evolve.

Influencer Discovery

Find the most active and influential voices in any hashtag conversation. The built-in analytics identify top authors by tweet count and total engagement, making it easy to spot potential collaborators.

Competitive Intelligence

Monitor competitor brand hashtags, product launches, and campaign performance. Compare engagement metrics across different hashtags to benchmark your performance.

Market Research

Understand public opinion on topics relevant to your business. Analyze the language, sentiment, and themes within hashtag conversations to inform product development and positioning.

Academic Research

Collect large-scale social media datasets for research. The structured JSON output integrates easily with data analysis tools and NLP pipelines.

Content Strategy

Analyze which types of content (images, videos, text-only) get the most engagement within your target hashtags. Use co-hashtag analysis to discover related conversations to join.

Input

| Field | Type | Default | Description |
|---|---|---|---|
| hashtags | array | required | List of hashtags to search (without # symbol) |
| maxTweets | integer | 500 | Maximum tweets to collect per hashtag (1-10,000) |
| language | string | - | Filter by language code (e.g., "en", "es", "ja") |
| includeReplies | boolean | false | Whether to include reply tweets |
| minLikes | integer | 0 | Minimum likes threshold for filtering |
| proxyConfiguration | object | - | Proxy settings for large-scale scrapes |

{
  "hashtags": ["AI", "machinelearning"],
  "maxTweets": 200
}
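
Before starting a run, it can help to check an input object against the field table above. The sketch below is only an illustration: `normalize_input` is a hypothetical helper, and stripping a leading `#` is an assumption based on the "without # symbol" note. The defaults and the 1-10,000 range come from the table.

```python
def normalize_input(raw: dict) -> dict:
    """Validate an input object and fill in the documented defaults."""
    hashtags = raw.get("hashtags")
    if not isinstance(hashtags, list) or not hashtags:
        raise ValueError("hashtags is required and must be a non-empty list")

    # Assumption: tolerate a leading '#' by stripping it.
    cleaned = [str(h).strip().lstrip("#") for h in hashtags]
    cleaned = [h for h in cleaned if h]
    if not cleaned:
        raise ValueError("hashtags must contain at least one non-empty value")

    max_tweets = int(raw.get("maxTweets", 500))
    if not 1 <= max_tweets <= 10_000:
        raise ValueError("maxTweets must be between 1 and 10,000")

    return {
        "hashtags": cleaned,
        "maxTweets": max_tweets,
        "language": raw.get("language"),  # None means no language filter
        "includeReplies": bool(raw.get("includeReplies", False)),
        "minLikes": int(raw.get("minLikes", 0)),
    }
```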

Example: High-engagement English tweets only

{
  "hashtags": ["startup", "SaaS", "indiehackers"],
  "maxTweets": 1000,
  "language": "en",
  "includeReplies": false,
  "minLikes": 10
}

Example: Campaign tracking with replies

{
  "hashtags": ["YourBrandName", "ProductLaunch2026"],
  "maxTweets": 5000,
  "includeReplies": true,
  "minLikes": 0
}
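
To make the effect of `includeReplies` and `minLikes` concrete, here is a minimal sketch of the filter they imply. The helper `keep_tweet` is hypothetical, and treating `minLikes` as an inclusive threshold is an assumption. The `likes` and `isReply` keys match the output record documented in the Output section.

```python
def keep_tweet(tweet: dict, include_replies: bool = False, min_likes: int = 0) -> bool:
    """Return True if the tweet passes the reply and likes filters."""
    # Drop replies unless the caller asked for them.
    if not include_replies and tweet.get("isReply", False):
        return False
    # Assumption: the threshold is inclusive (likes >= minLikes).
    return tweet.get("likes", 0) >= min_likes
```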

Output

Each tweet is saved as a structured JSON object:

{
  "tweetId": "1895234567890123456",
  "tweetText": "The future of AI is here. These new models are changing everything we know about automation. #AI #machinelearning #tech",
  "author": "Tech Insights",
  "handle": "@techinsights",
  "date": "2026-02-28T14:30:00.000Z",
  "likes": 1523,
  "retweets": 342,
  "replies": 87,
  "quotes": 45,
  "images": ["https://pbs.twimg.com/media/example.jpg"],
  "videos": [],
  "isRetweet": false,
  "isReply": false,
  "tweetUrl": "https://twitter.com/techinsights/status/1895234567890123456",
  "hashtag": "AI",
  "mentionedHashtags": ["ai", "machinelearning", "tech"],
  "scrapedAt": "2026-03-01T10:15:00.000Z"
}
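
In the sample record, `mentionedHashtags` holds the lowercased hashtags found in `tweetText`. One plausible way to derive such a list is sketched below. The regular expression and the helper name `extract_hashtags` are assumptions for illustration, not the actor's actual parsing code.

```python
import re

# Assumption: a hashtag is '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text: str) -> list:
    """Return lowercased hashtags in order of first appearance, without repeats."""
    seen = []
    for tag in HASHTAG_RE.findall(text):
        tag = tag.lower()
        if tag not in seen:
            seen.append(tag)
    return seen
```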

Analytics Output

The actor also generates hashtag analytics saved to the key-value store under the ANALYTICS key:

{
  "AI": {
    "hashtag": "AI",
    "totalTweets": 500,
    "uniqueAuthors": 312,
    "topAuthors": [
      {
        "author": "@techinsights",
        "tweetCount": 15,
        "totalLikes": 8934,
        "totalRetweets": 2156,
        "totalEngagement": 12450
      }
    ],
    "engagementSummary": {
      "likes": { "total": 125000, "average": 250, "median": 45, "max": 15000 },
      "retweets": { "total": 34000, "average": 68, "median": 12, "max": 5000 },
      "replies": { "total": 18000, "average": 36, "median": 5, "max": 2000 },
      "totalEngagement": 177000,
      "averageEngagementPerTweet": 354
    },
    "postingFrequency": {
      "totalDays": 7,
      "averagePerDay": 71,
      "peakDate": "2026-02-28",
      "peakCount": 120
    },
    "mediaStats": {
      "tweetsWithImages": 180,
      "tweetsWithVideos": 45,
      "originalTweets": 380,
      "retweets": 90,
      "replies": 30,
      "mediaPercentage": 45
    },
    "coHashtags": [
      { "hashtag": "machinelearning", "count": 120 },
      { "hashtag": "deeplearning", "count": 85 },
      { "hashtag": "tech", "count": 72 }
    ]
  }
}
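
The per-metric summaries and the `coHashtags` list above can be recomputed from the raw tweet records. The sketch below shows one way to do that. The helper names are hypothetical, and rounding the average to a whole number is an assumption based on the integer values in the sample.

```python
from collections import Counter
from statistics import median


def summarize_metric(values: list) -> dict:
    """Build a total/average/median/max summary for one engagement metric."""
    if not values:
        return {"total": 0, "average": 0, "median": 0, "max": 0}
    total = sum(values)
    return {
        "total": total,
        "average": round(total / len(values)),  # assumption: whole-number average
        "median": median(values),
        "max": max(values),
    }


def co_hashtags(tweets: list, searched: str, top_n: int = 3) -> list:
    """Count hashtags that appear alongside the searched hashtag."""
    counts = Counter()
    for tweet in tweets:
        # Count each hashtag at most once per tweet.
        for tag in set(tweet.get("mentionedHashtags", [])):
            if tag != searched.lower():
                counts[tag] += 1
    return [{"hashtag": tag, "count": n} for tag, n in counts.most_common(top_n)]
```

For example, `summarize_metric([t["likes"] for t in tweets])` would produce the `likes` entry of `engagementSummary` for a list of tweet records.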

Deduplication

Tweets are deduplicated across all sources using a two-layer approach:

  1. Tweet ID (primary): Each tweet has a unique numeric ID. If the same tweet appears from multiple Nitter instances, the copy with the highest engagement counts is kept.
  2. Text hash (fallback): For tweets without an extractable ID (e.g., from the Google fallback), a hash of the tweet text is used to prevent duplicates.
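
The two-layer approach above can be sketched as follows. This is an illustration under stated assumptions: the text is normalized by lowercasing and collapsing whitespace before hashing, SHA-256 is used as the hash, and "highest engagement" is taken to mean the largest sum of likes, retweets, replies, and quotes. None of these details are confirmed by the actor's documentation.

```python
import hashlib


def engagement(tweet: dict) -> int:
    """Sum the engagement counters, treating missing values as zero."""
    return sum(tweet.get(k) or 0 for k in ("likes", "retweets", "replies", "quotes"))


def dedup_key(tweet: dict) -> str:
    """Use the tweet ID when present, otherwise a hash of the normalized text."""
    tweet_id = tweet.get("tweetId")
    if tweet_id:
        return f"id:{tweet_id}"
    # Assumption: lowercase and collapse whitespace before hashing.
    normalized = " ".join(tweet.get("tweetText", "").lower().split())
    return "text:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(tweets: list) -> list:
    """Keep one tweet per key, preferring the copy with the most engagement."""
    best = {}
    for tweet in tweets:
        key = dedup_key(tweet)
        current = best.get(key)
        if current is None or engagement(tweet) > engagement(current):
            best[key] = tweet
    return list(best.values())
```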

Rate Limiting & Reliability

  • Automatic rotation through 15+ Nitter instances
  • Health checks on startup to identify working instances
  • 1.5-second delay between requests with random jitter
  • Exponential backoff on rate limits (429 responses)
  • Three-layer fallback strategy (direct fetch, CheerioCrawler, Google)
  • Request timeouts to prevent hanging on unresponsive instances
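
The delay and backoff behavior listed above might look like the sketch below. Only the 1.5-second base delay comes from the list. The jitter width, the doubling factor, and the 60-second cap are assumptions chosen for illustration, and both helper names are hypothetical.

```python
import random

BASE_DELAY = 1.5  # seconds between requests, as stated in the list above


def polite_delay(jitter: float = 0.5, rng=None) -> float:
    """Base delay plus random jitter (jitter width is an assumption)."""
    uniform = (rng or random).uniform
    return BASE_DELAY + uniform(0, jitter)


def backoff_delay(attempt: int, base: float = BASE_DELAY, cap: float = 60.0, rng=None) -> float:
    """Exponential backoff after a 429: base * 2^attempt, capped, plus jitter."""
    uniform = (rng or random).uniform
    return min(cap, base * 2 ** attempt) + uniform(0, base)
```

A caller would sleep for `polite_delay()` between normal requests and for `backoff_delay(attempt)` after each consecutive 429 response, resetting `attempt` to zero after a success.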

Pay Per Event Pricing

This actor uses Apify's Pay Per Event model:

| Event | Price |
|---|---|
| Tweet scraped | $0.003 |

Example costs:

  • 100 tweets = $0.30
  • 500 tweets = $1.50
  • 1,000 tweets = $3.00
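
The example costs follow directly from the per-tweet price. A small estimator, shown here as a sketch with the hypothetical helper `estimate_cost`, reproduces them. `Decimal` is used to avoid floating-point rounding in currency amounts.

```python
from decimal import Decimal

PRICE_PER_TWEET = Decimal("0.003")  # USD per "Tweet scraped" event


def estimate_cost(tweet_count: int) -> Decimal:
    """Estimate the cost in USD for a number of scraped tweets."""
    if tweet_count < 0:
        raise ValueError("tweet_count must not be negative")
    # Round to whole cents for display.
    return (PRICE_PER_TWEET * tweet_count).quantize(Decimal("0.01"))
```

Because `maxTweets` applies per hashtag, a rough upper bound for a run would be `estimate_cost(len(hashtags) * maxTweets)`, assuming every requested tweet is found and billed.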

Limitations

  • Nitter availability: Nitter instances go up and down frequently. The actor handles this with multi-instance rotation, but scraping volume depends on instance availability.
  • Historical data: Nitter search results are limited in how far back they go. For deep historical analysis, results may be incomplete.
  • Engagement accuracy: Engagement numbers from Nitter may lag behind real-time Twitter values.
  • Language filtering: Language filtering is best-effort and depends on how Nitter renders each tweet. Some tweets in other languages may slip through the filter.
  • Google fallback: When using Google as a fallback, engagement metrics (likes, retweets) are not available.
  • Rate limits: Very large scrapes (5,000+ tweets) may take longer due to rate limiting across instances.

Tips for Best Results

  1. Use proxies for scrapes over 500 tweets to avoid rate limiting
  2. Set minLikes to filter out low-quality or spam tweets
  3. Disable includeReplies for cleaner datasets focused on original content
  4. Search multiple related hashtags in one run for comprehensive topic coverage
  5. Check the analytics output in the key-value store for quick insights without processing raw data

Changelog

1.0.0 (2026-03-02)

  • Initial release
  • Multi-instance Nitter scraping with automatic rotation
  • Google search fallback
  • Tweet deduplication by ID and text hash
  • Hashtag analytics (top authors, engagement, frequency, co-hashtags)
  • Media extraction (images and videos)
  • Pay Per Event billing
  • CheerioCrawler integration with Crawlee