X Crawler
3 days trial then $3.00/month - No credit card required now
This project is a web scraper designed to extract user data and tweets from X (formerly known as Twitter) using Crawlee and Playwright.
X (Twitter) Crawler
Purpose
The main purpose of this scraper is to:
- Navigate to a specified X user profile
- Extract user information
- Collect the 100 most-liked tweets from the user's timeline
This tool can be useful for various applications, such as:
- Social media analysis
- User behavior research
- Content aggregation
- Sentiment analysis
Features
- Utilizes Playwright for browser automation
- Implements Crawlee for efficient web crawling
- Extracts user profile data
- Collects recent tweets from the user's timeline
- Handles X's dynamic content loading
Usage
- Set the target user's profile URL in the `startUrls` array in the input configuration.
- Adjust the `maxRequestsPerCrawl` value to limit the number of requests if needed.
- Run the scraper to collect data.
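The steps above can be sketched as a small helper that builds the input object. Only `startUrls` and `maxRequestsPerCrawl` are named in this README; the `{ url }` wrapper around each start URL, the `CrawlerInput` and `buildInput` names, and the default limit of 10 are assumptions made for illustration, so check the actor's input form for the actual shape.

```typescript
// Hypothetical input shape: only `startUrls` and `maxRequestsPerCrawl`
// come from the README; the `{ url }` wrapper is an assumption.
interface CrawlerInput {
  startUrls: { url: string }[];
  maxRequestsPerCrawl: number;
}

// Build an input object for a single profile, rejecting anything that
// does not look like a plain X/Twitter profile URL.
// The default limit of 10 is an arbitrary illustrative value.
function buildInput(profileUrl: string, maxRequestsPerCrawl = 10): CrawlerInput {
  const parsed = new URL(profileUrl); // throws on malformed URLs
  const host = parsed.hostname.replace(/^www\./, "");
  if (host !== "x.com" && host !== "twitter.com") {
    throw new Error(`Not an X profile URL: ${profileUrl}`);
  }
  // A profile path is a single segment such as "/elonmusk".
  const segments = parsed.pathname.split("/").filter(Boolean);
  if (segments.length !== 1) {
    throw new Error(`Expected a profile path like /username: ${profileUrl}`);
  }
  if (!Number.isInteger(maxRequestsPerCrawl) || maxRequestsPerCrawl < 1) {
    throw new Error("maxRequestsPerCrawl must be a positive integer");
  }
  return { startUrls: [{ url: parsed.href }], maxRequestsPerCrawl };
}

const input = buildInput("https://x.com/elonmusk", 20);
console.log(JSON.stringify(input, null, 2));
```

Validating the URL before the run starts is a design choice of this sketch: a mistyped profile link then fails immediately instead of consuming part of the request budget.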
Output
The scraper outputs two main types of data:
- User information
- Recent tweets from the user's timeline
Both are emitted as structured JSON objects, so they can be processed for further analysis or integrated into other systems.
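Since the output mixes two kinds of record, a consumer will usually want to separate them first. The sketch below assumes the records arrive as one flat array and that each carries the `__typename` field seen in the examples in this README. The `UserItem`, `TweetItem`, and `partition` names are hypothetical, and the user record may need unwrapping first if it is nested under a `user` key.

```typescript
// Minimal shapes based on fields shown in this README's examples;
// real records carry many more fields.
interface UserItem {
  __typename: "User";
  rest_id: string;
  legacy: { screen_name: string };
}
interface TweetItem {
  __typename: "Tweet";
  rest_id: string;
  legacy: { full_text: string };
}
type Item = UserItem | TweetItem;

// Split a mixed list into users and tweets using the `__typename` tag.
function partition(items: Item[]): { users: UserItem[]; tweets: TweetItem[] } {
  const users: UserItem[] = [];
  const tweets: TweetItem[] = [];
  for (const item of items) {
    if (item.__typename === "User") {
      users.push(item); // narrowed to UserItem by the tag check
    } else {
      tweets.push(item); // the only remaining variant is TweetItem
    }
  }
  return { users, tweets };
}

const items: Item[] = [
  { __typename: "User", rest_id: "44196397", legacy: { screen_name: "elonmusk" } },
  {
    __typename: "Tweet",
    rest_id: "1519480761749016577",
    legacy: { full_text: "Next I’m buying Coca-Cola to put the cocaine back in" },
  },
];
const { users, tweets } = partition(items);
console.log(`${users.length} user record(s), ${tweets.length} tweet record(s)`);
```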
Output Example
Here's an example of the structured output you can expect from this scraper:
User Object
"user": {
  "__typename": "User",
  "id": "VXNlcjo0NDE5NjM5Nw==",
  "rest_id": "44196397",
  "affiliates_highlighted_label": {
    "label": {
      "url": {
        "url": "https://twitter.com/X",
        "urlType": "DeepLink"
      },
      "badge": {
        "url": "https://pbs.twimg.com/profile_images/1683899100922511378/5lY42eHs_bigger.jpg"
      },
      "description": "X",
      "userLabelType": "BusinessLabel",
      "userLabelDisplayType": "Badge"
    }
  },
  "is_blue_verified": true,
  "profile_image_shape": "Circle",
  "legacy": {
    "created_at": "Tue Jun 02 20:12:29 +0000 2009",
    "default_profile": false,
    "default_profile_image": false,
    "description": "",
    "entities": {
      "description": {
        "urls": []
      }
    },
    "fast_followers_count": 0,
    "favourites_count": 60807,
    "followers_count": 189827332,
    "friends_count": 662,
    "has_custom_timelines": true,
    "is_translator": false,
    "listed_count": 152087,
    "location": "",
    "media_count": 2308,
    "name": "Elon Musk",
    "normal_followers_count": 189827332,
    "pinned_tweet_ids_str": [
      "1813310196506349995"
    ],
    "possibly_sensitive": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/44196397/1690621312",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/1780044485541699584/p78MCn3B_normal.jpg",
    "profile_interstitial_type": "",
    "screen_name": "elonmusk",
    "statuses_count": 47242,
    "translator_type": "none",
    "verified": false,
    "withheld_in_countries": []
  },
  "professional": {
    "rest_id": "1679729435447275522",
    "professional_type": "Creator",
    "category": []
  },
  "tipjar_settings": {
    "is_enabled": false,
    "bandcamp_handle": "",
    "bitcoin_handle": "",
    "cash_app_handle": "",
    "ethereum_handle": "",
    "gofundme_handle": "",
    "patreon_handle": "",
    "pay_pal_handle": "",
    "venmo_handle": ""
  },
  "legacy_extended_profile": {},
  "is_profile_translatable": false,
  "has_hidden_subscriptions_on_profile": false,
  "verification_info": {
    "is_identity_verified": false,
    "reason": {
      "description": {
        "text": "This account is verified because it's an affiliate of @X on X. Learn more",
        "entities": [
          {
            "from_index": 54,
            "to_index": 56,
            "ref": {
              "url": "https://twitter.com/X",
              "url_type": "ExternalUrl"
            }
          },
          {
            "from_index": 63,
            "to_index": 73,
            "ref": {
              "url": "https://help.twitter.com/en/rules-and-policies/profile-labels",
              "url_type": "ExternalUrl"
            }
          }
        ]
      },
      "verified_since_msec": "-156836000000000",
      "override_verified_year": -3000
    }
  },
  "highlights_info": {
    "can_highlight_tweets": true,
    "highlighted_tweets": "265"
  },
  "user_seed_tweet_count": 0,
  "business_account": {},
  "creator_subscriptions_count": 151
},
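As a rough illustration of working with a user record, the sketch below reduces it to a handful of commonly used fields and converts `created_at` to ISO 8601. The field names come from the example above; `parseCreatedAt`, `summarizeUser`, and the summary's property names are hypothetical, and the parser only handles the `+0000` offset that appears in the examples.

```typescript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Parse the timestamp format used in `created_at`,
// e.g. "Tue Jun 02 20:12:29 +0000 2009". Only "+0000" is supported.
function parseCreatedAt(value: string): Date {
  const m = /^[A-Z][a-z]{2} ([A-Z][a-z]{2}) (\d{2}) (\d{2}):(\d{2}):(\d{2}) \+0000 (\d{4})$/.exec(value);
  if (!m) throw new Error(`Unrecognized timestamp: ${value}`);
  const month = MONTHS.indexOf(m[1]);
  if (month < 0) throw new Error(`Unrecognized month: ${m[1]}`);
  return new Date(
    Date.UTC(Number(m[6]), month, Number(m[2]), Number(m[3]), Number(m[4]), Number(m[5])),
  );
}

// Subset of the user record used by this sketch.
interface User {
  rest_id: string;
  is_blue_verified: boolean;
  legacy: {
    created_at: string;
    name: string;
    screen_name: string;
    followers_count: number;
    friends_count: number; // number of accounts this user follows
    statuses_count: number;
  };
}

// Flatten a user record into a compact summary object.
function summarizeUser(user: User) {
  const { legacy } = user;
  return {
    id: user.rest_id,
    handle: `@${legacy.screen_name}`,
    name: legacy.name,
    followers: legacy.followers_count,
    following: legacy.friends_count,
    posts: legacy.statuses_count,
    blueVerified: user.is_blue_verified,
    joined: parseCreatedAt(legacy.created_at).toISOString(),
  };
}

const summary = summarizeUser({
  rest_id: "44196397",
  is_blue_verified: true,
  legacy: {
    created_at: "Tue Jun 02 20:12:29 +0000 2009",
    name: "Elon Musk",
    screen_name: "elonmusk",
    followers_count: 189827332,
    friends_count: 662,
    statuses_count: 47242,
  },
});
console.log(summary);
```

The timestamp is parsed explicitly rather than passed to `new Date(...)`, because this format is not ISO 8601 and JavaScript engines are not required to accept it.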
Tweet Object
{
  "__typename": "Tweet",
  "rest_id": "1519480761749016577",
  "unmention_data": {},
  "is_translatable": false,
  "views": {
    "state": "Enabled"
  },
  "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
  "legacy": {
    "bookmark_count": 21256,
    "bookmarked": false,
    "created_at": "Thu Apr 28 00:56:58 +0000 2022",
    "conversation_id_str": "1519480761749016577",
    "display_text_range": [
      0,
      52
    ],
    "entities": {
      "hashtags": [],
      "symbols": [],
      "timestamps": [],
      "urls": [],
      "user_mentions": []
    },
    "favorite_count": 4468299,
    "favorited": false,
    "full_text": "Next I’m buying Coca-Cola to put the cocaine back in",
    "is_quote_status": false,
    "lang": "en",
    "quote_count": 166677,
    "reply_count": 182762,
    "retweet_count": 625073,
    "retweeted": false,
    "user_id_str": "44196397",
    "id_str": "1519480761749016577"
  },
  "quick_promote_eligibility": {
    "eligibility": "IneligibleUserUnauthorized"
  }
},
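For downstream analysis, tweet records like the one above can be ranked by `legacy.favorite_count`. This is a minimal sketch of such post-processing, not a description of how the scraper itself selects tweets. `mostLiked` and `totalEngagement` are hypothetical helper names, and the two extra tweets in the sample are made up for illustration.

```typescript
// Subset of the tweet record used by this sketch.
interface Tweet {
  rest_id: string;
  legacy: {
    full_text: string;
    favorite_count: number;
    retweet_count: number;
    reply_count: number;
    quote_count: number;
  };
}

// Return up to `limit` tweets with the highest favorite_count.
// The input array is copied first so it is not reordered.
function mostLiked(tweets: Tweet[], limit = 100): Tweet[] {
  return [...tweets]
    .sort((a, b) => b.legacy.favorite_count - a.legacy.favorite_count)
    .slice(0, limit);
}

// Sum of the four public interaction counters on a tweet.
function totalEngagement(tweet: Tweet): number {
  const l = tweet.legacy;
  return l.favorite_count + l.retweet_count + l.reply_count + l.quote_count;
}

const sample: Tweet[] = [
  // Made-up record for illustration only.
  {
    rest_id: "hypothetical-1",
    legacy: { full_text: "made-up tweet A", favorite_count: 10, retweet_count: 1, reply_count: 0, quote_count: 0 },
  },
  // Values taken from the Tweet Object example.
  {
    rest_id: "1519480761749016577",
    legacy: {
      full_text: "Next I’m buying Coca-Cola to put the cocaine back in",
      favorite_count: 4468299,
      retweet_count: 625073,
      reply_count: 182762,
      quote_count: 166677,
    },
  },
  // Made-up record for illustration only.
  {
    rest_id: "hypothetical-2",
    legacy: { full_text: "made-up tweet B", favorite_count: 500, retweet_count: 20, reply_count: 5, quote_count: 2 },
  },
];

const top = mostLiked(sample, 2);
for (const t of top) {
  console.log(t.rest_id, t.legacy.favorite_count, totalEngagement(t));
}
```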
Note
Please ensure you comply with X's terms of service and respect rate limits when using this scraper.
Actor Metrics
7 monthly users
1 star
>99% runs succeeded
Created in Jul 2024
Modified 3 months ago