Twitter Explorer

7 days trial then $30.00/month - No credit card required now

jupri/twitter-scraper

All-in-one solution to scrape every aspect of Twitter.com

Welcome to Twitter X-Plorer

About Twitter.com

Twitter is an online social media and social networking service owned and operated by American company X Corp., the legal successor of Twitter, Inc. Twitter users outside the United States are legally served by the Ireland-based Twitter International Unlimited Company, which makes these users subject to Irish and European Union data protection laws[9][10]

About This Actor

All-in-one solution for Twitter data extraction:

  • ⭐Scrape User timeline, followers, following, media, likes, lists, topics, highlights, etc.
  • ⭐Scrape List timeline, members, followers
  • ⭐Scrape Specific Topic
  • ⭐Scrape Status & Threads
  • ⭐Scrape Media: Photo / Video / Gif, etc.
  • ⭐Advanced Search
  • ⭐Content Formatting: HTML or Plain TEXT
  • ⭐Low Memory Cost
  • ⭐A.S.A.P

Disclaimer:

  • This actor uses anonymous credentials that are bound to Twitter's rate limit. The more people use the actor, the faster the limit is exceeded.
  • You can provide your own token (auth_token), which is also bound to a rate limit but uses a different "bucket".
  • First, try without your own token. If there are no results, try again with your own token (auth_token). Refer to the README page to obtain the auth_token cookie value.
  • This Twitter scraper only collects data that’s publicly available, meaning data that’s accessible without logging in to Twitter and without accepting Twitter’s terms of use. Note that if you have accepted Twitter’s terms of use, your ability to scrape Twitter data may be limited. If that is the case, review the terms and make an informed decision yourself.
  • By providing your personal auth_token, you agree to obey the Twitter TOS, especially the rate-limit policy of 50 requests per 15 minutes enforced by @elonmusk.
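The "50 requests per 15 minutes" policy describes a sliding window. A minimal sketch of a client-side guard that enforces such a window is below. SlidingWindowLimiter is a hypothetical helper, not something the Actor exposes, and how many Twitter requests a single Actor run makes is not documented here, so treat it as an illustration of the policy rather than a drop-in throttle.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `window`-second span.

    Hypothetical helper: the defaults mirror the 50 requests / 15 minutes
    figure quoted in the disclaimer.
    """

    def __init__(
        self,
        max_calls: int = 50,
        window: float = 15 * 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self.calls: Deque[float] = deque()  # timestamps of accepted calls

    def _purge(self, now: float) -> None:
        # Forget calls that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else False."""
        now = self.clock()
        self._purge(now)
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next call would be accepted (0.0 if allowed now)."""
        now = self.clock()
        self._purge(now)
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window - (now - self.calls[0])
```

Typical use: `if not limiter.try_acquire(): time.sleep(limiter.wait_time())`. The injectable clock keeps the class testable without real waiting.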

INPUT

There are 2 primary parameters, and they cannot be used together. Choose one according to your needs.

Parameter   Type     Description
query       string   use a specific endpoint
filters     object   use Advanced Search

Other parameters:

Parameter   Type      Value        Description
limit       integer   numeric      maximum number of results
content     string    text, html   format of the result content
token       string    cookie       value of the auth_token cookie
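Because query and filters are mutually exclusive, it can help to assemble the input in code before starting a run. A minimal sketch follows; build_run_input is a hypothetical helper (not part of the Actor), and it assumes filter fields are passed as flat "filters.<name>" keys, as in the Lesson #2 example.

```python
from typing import Any, Dict, Optional


def build_run_input(
    query: Optional[str] = None,
    filters: Optional[Dict[str, Any]] = None,
    limit: Optional[int] = None,
    content: Optional[str] = None,
    token: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble an input object for jupri/twitter-scraper (hypothetical helper)."""
    # query and filters are the two primary parameters; use exactly one.
    if (query is None) == (filters is None):
        raise ValueError("provide exactly one of 'query' or 'filters'")

    run_input: Dict[str, Any] = {}
    if query is not None:
        run_input["query"] = query
    else:
        # Accept {"from": ...} or {"filters.from": ...}; emit flat dotted keys.
        for key, value in filters.items():
            name = key if key.startswith("filters.") else f"filters.{key}"
            run_input[name] = value

    if limit is not None:
        if limit <= 0:
            raise ValueError("limit must be a positive integer")
        run_input["limit"] = limit
    if content is not None:
        if content not in ("text", "html"):
            raise ValueError("content must be 'text' or 'html'")
        run_input["content"] = content
    if token is not None:
        # The auth_token cookie value, not an Apify token.
        run_input["token"] = token
    return run_input
```

For example, `build_run_input(query="@apify", limit=10, content="html")` produces the same object as the Lesson #4 input.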

TUTORIAL

There are 2 ways to use this Actor: with the query parameter or with the filters parameter.

Lesson #1 : Using query parameter

# KEYWORDS
{"query": "Web Scraping"}
# #HASHTAG
{"query": "#AI"}
# @USERNAME
{"query": "@apify"}
# @USERNAME/media
{"query": "@apify/media"}
# @USERNAME/lists
{"query": "@elonmusk/lists"}
# topic/TOPIC_ID
{"query": "topic/1280550787207147521"} # Arts & Culture
# list/LIST_ID
{"query": "list/1334803406523953152"} # Web scraping + automation

The Actor uses a different endpoint depending on the value specified. Possible values for the query parameter:

Format                        Example                      Description
KEYWORDS                      Web Scraping                 Same as filters.raw
#HASHTAG                      #AI                          Same as filters.hashtag
@USERNAME                     @elonmusk/media              user-specific section/timeline
@USERNAME/replies
@USERNAME/media
@USERNAME/likes
@USERNAME/affiliates
@USERNAME/followers
@USERNAME/verified_followers
@USERNAME/followers_you_know
@USERNAME/following
@USERNAME/lists
@USERNAME/topics
@USERNAME/highlights
@USERNAME/subscriptions
@USERNAME/articles
STATUS_ID                     1562015197543497728          status (numeric status ID)
STATUS_ID/posts
STATUS_ID/quotes
STATUS_ID/reposts
STATUS_ID/likes
topic/TOPIC_ID                topic/1280550787207147521    topic timeline (numeric topic ID)
list/LIST_ID                  list/1334803406523953152     list timeline (numeric list ID)
list/LIST_ID/members                                       list members
list/LIST_ID/followers                                     list followers
community/COMM_ID/top
community/COMM_ID/latest
community/COMM_ID/media
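To check a query value against the table before starting a run, a small classifier can mirror the listed formats. This is a sketch of the documented formats only; classify_query and the section sets are hypothetical names, and the Actor's own routing may accept more or fewer forms. Following the table, a bare number is treated as a status ID, and anything that matches no other format is treated as keywords.

```python
import re
from typing import Optional, Tuple

# Sections copied from the table above; the Actor's own routing may differ.
USER_SECTIONS = {
    "replies", "media", "likes", "affiliates", "followers",
    "verified_followers", "followers_you_know", "following", "lists",
    "topics", "highlights", "subscriptions", "articles",
}
STATUS_SECTIONS = {"posts", "quotes", "reposts", "likes"}
LIST_SECTIONS = {"members", "followers"}
COMMUNITY_SECTIONS = {"top", "latest", "media"}

Parsed = Tuple[str, str, Optional[str]]  # (kind, identifier, section)


def _check(kind: str, ident: str, section: Optional[str], allowed: set) -> Parsed:
    # Reject sections that the table does not list for this kind.
    if section is not None and section not in allowed:
        raise ValueError(f"unknown {kind} section: {section!r}")
    return (kind, ident, section)


def classify_query(query: str) -> Parsed:
    """Classify a query value according to the format table (hypothetical helper)."""
    q = query.strip()
    if m := re.fullmatch(r"@(\w+)(?:/(\w+))?", q):
        return _check("user", m[1], m[2], USER_SECTIONS)
    # Per the table, a bare number is a status ID, not a keyword.
    if m := re.fullmatch(r"(\d+)(?:/(\w+))?", q):
        return _check("status", m[1], m[2], STATUS_SECTIONS)
    if m := re.fullmatch(r"topic/(\d+)", q):
        return ("topic", m[1], None)
    if m := re.fullmatch(r"list/(\d+)(?:/(\w+))?", q):
        return _check("list", m[1], m[2], LIST_SECTIONS)
    if m := re.fullmatch(r"community/(\w+)/(\w+)", q):
        return _check("community", m[1], m[2], COMMUNITY_SECTIONS)
    if m := re.fullmatch(r"#(\w+)", q):
        return ("hashtag", m[1], None)
    return ("keywords", q, None)  # anything else is a free-text search
```

For instance, "@apify/media" classifies as a user timeline section, while "Web Scraping" falls through to a keyword search.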

Lesson #2 : Using filters parameter

{
	"filters.from"    : "@apify",                    # (@ symbol is optional)
	"filters.hashtag" : "#scraper",                  # (# symbol is optional)
	"filters.type"    : "videos",
	"filters.phrase"  : "Space The Final Frontier",
	"filters.replies" : 250,
	"filters.since"   : "2022-01-20"
}

The Actor uses the Advanced Search of the Twitter public API. Use this method for searching.

Parameter         Type     Example                    Description                Summary
filters.raw       string   Animals +cat -dog lang:en  Searching raw query
filters.type      string                              Post type                  One of: latest, top, photos, videos, people
filters.word      string   what’s happening           All of these words         both “what’s” and “happening”
filters.phrase    string   happy hour                 This exact phrase          the exact phrase “happy hour”
filters.any       string   cats dogs                  Any of these words         either “cats” or “dogs” (or both)
filters.exclude   string   cats dogs                  None of these words        does not contain “cats” and does not contain “dogs”
filters.hashtag   string   #ThrowbackThursday         These hashtags             the hashtag #ThrowbackThursday
filters.from      string   @Twitter                   From these accounts        sent from @Twitter
filters.to        string   @Twitter                   To these accounts          sent in reply to @Twitter
filters.mention   string   @SFBART @Caltrain          Mentioning these accounts  mentions @SFBART or mentions @Caltrain
filters.replies   integer  250                        Minimum replies
filters.faves     integer  200                        Minimum likes
filters.retweets  integer  100                        Minimum retweets
filters.since     date     2022-01-20                 Since date (YYYY-MM-DD)
filters.until     date     2022-02-28                 Until date (YYYY-MM-DD)
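To see how the filter fields relate to a raw search string (the filters.raw form), here is an illustrative translation into Twitter's standard search operators (from:, to:, min_replies:, min_faves:, min_retweets:, since:, until:). This is an assumption about the mapping, not the Actor's code: filters_to_query is a hypothetical helper, OR-ing multiple accounts or hashtags is assumed, and filters.type is skipped because it selects a result tab rather than an operator. The date check also rejects impossible dates before a run starts.

```python
from datetime import date
from typing import Any, Dict, List


def _names(value: str) -> List[str]:
    # "@SFBART @Caltrain" -> ["SFBART", "Caltrain"]; the @ prefix is optional.
    return [v.lstrip("@") for v in value.split()]


def _any_of(terms: List[str]) -> str:
    # Several alternatives are OR-ed inside parentheses; a single one stays bare.
    return terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")"


def filters_to_query(filters: Dict[str, Any]) -> str:
    """Translate filter fields into a search string (illustrative sketch)."""
    prefix = "filters."
    opts = {(k[len(prefix):] if k.startswith(prefix) else k): v for k, v in filters.items()}
    parts: List[str] = []
    if "raw" in opts:
        parts.append(opts["raw"])
    if "word" in opts:
        parts.append(opts["word"])  # all words: plain juxtaposition means AND
    if "phrase" in opts:
        parts.append('"' + opts["phrase"] + '"')
    if "any" in opts:
        parts.append(_any_of(opts["any"].split()))
    if "exclude" in opts:
        parts.extend("-" + w for w in opts["exclude"].split())
    if "hashtag" in opts:
        parts.append(_any_of(["#" + t.lstrip("#") for t in opts["hashtag"].split()]))
    if "from" in opts:
        parts.append(_any_of(["from:" + n for n in _names(opts["from"])]))
    if "to" in opts:
        parts.append(_any_of(["to:" + n for n in _names(opts["to"])]))
    if "mention" in opts:
        parts.append(_any_of(["@" + n for n in _names(opts["mention"])]))
    for key, op in (("replies", "min_replies"), ("faves", "min_faves"), ("retweets", "min_retweets")):
        if key in opts:
            parts.append(f"{op}:{int(opts[key])}")
    for key in ("since", "until"):
        if key in opts:
            # fromisoformat raises ValueError for impossible dates such as 2022-02-30.
            parts.append(f"{key}:{date.fromisoformat(opts[key]).isoformat()}")
    # "type" is intentionally ignored: it picks a result tab, not an operator.
    return " ".join(parts)
```

Applied to the Lesson #2 input (without filters.type), this yields `"Space The Final Frontier" #scraper from:apify min_replies:250 since:2022-01-20`.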

Lesson #3 : token (auth_token) cookie

Some functions need auth_token to work properly (they require signing in to Twitter.com). If you see log errors like the ones below, you probably need to supply the auth_token value in the token parameter.

❌ Authorization: Denied by access control: unspecified reason
❌ HTTP error 404: Not Found
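If you run the Actor from a script, you could scan its log for the two messages above before retrying with a token. A minimal sketch; needs_auth_token is a hypothetical helper, matching these exact substrings assumes the log wording stays the same, and a 404 can also mean the target simply does not exist, so treat a True result as a hint rather than proof.

```python
from typing import Iterable

# Substrings of the two log messages quoted above (assumed to stay stable).
AUTH_HINTS = (
    "Authorization: Denied by access control",
    "HTTP error 404: Not Found",
)


def needs_auth_token(log_lines: Iterable[str]) -> bool:
    """Return True if any log line contains one of the auth-related errors."""
    return any(hint in line for line in log_lines for hint in AUTH_HINTS)
```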

Important Notes:

  • This is NOT your Apify token; it is a value from your browser cookie named auth_token.
  • Use this only if necessary, as it carries the risk of your account getting blocked by @elonmusk.
  • Your cookies are SECRET. Please don't share them with anyone else.
  • The auth_token value stays valid until you log out of Twitter.com.

To get the auth_token cookie value:

  1. Log in to Twitter.com
  2. Open Chrome Developer Tools (Ctrl + Shift + I)
  3. Open the Application tab
  4. In the left panel, go to: Storage -> Cookies -> https://twitter.com
  5. Find the cookie named auth_token (a 40-character string value).
  6. Copy and paste it into the token parameter:
{
	"query": "@elonmusk/followers",
	"token": "YOUR_TWITTER_AUTH_TOKEN"
}
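Since the cookie is a secret, a script that passes it to the Actor should avoid printing it in full. A minimal sketch, assuming only what step 5 states (a 40-character value); check_auth_token and mask_token are hypothetical helpers, and the length check is a sanity check, not proof that a token is valid.

```python
import re


def check_auth_token(token: str) -> str:
    """Strip surrounding whitespace and check the 40-character length from step 5."""
    token = token.strip()
    if len(token) != 40 or re.search(r"\s", token):
        raise ValueError("auth_token should be a 40-character value without spaces")
    return token


def mask_token(token: str) -> str:
    """Hide all but the last 4 characters so logs never contain the full secret."""
    if len(token) <= 4:
        return "*" * len(token)
    return "*" * (len(token) - 4) + token[-4:]
```

A script might then build `{"query": "@elonmusk/followers", "token": check_auth_token(value)}` and log only `mask_token(value)`.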

Lesson #4: Content Formatting

If you want the content formatted as HTML, use the content parameter:

{ "query": "@apify", "limit": 10, "content": "html" }

Lesson #5: User Info (WORK IN PROGRESS)

Normally, every tweet includes the user information. This can fill the dataset with redundant user info, especially when many tweets come from the same user. To reduce the dataset size, you have several options using the userinfo parameter:

  1. basic: Instead of the full user info, include only basic user info: name, id, screen
  2. separate: Place user info into a separate list (top of dataset)
  3. collect: Collect user info into a single key-value record (top of dataset)
  4. disable: Disable user info completely.
  5. full: This is the default behavior.

Example :

# example
{
	"query": "@apify/replies",
	"limit": 100,
	"userinfo": "collect"
}

Output: userinfo = collect

[
	{
		"__type": "users",
		"content": {
			"10001": {"name": "ChatGPT", ... },
			"10002": {"name": "Bard", ... }
		}
	},
	...
	{ "__type": "tweet", "id": "123456", "user": "10001", "content": "What's happening ?", ...},
	{ "__type": "tweet", "id": "789101", "user": "10002", "content": "AI is happening.", ...},
	{ "__type": "tweet", "id": "112233", "user": "10001", "content": "Cool !", ...},
	...
]

Output: userinfo = separate

[
	{ "__type": "user", "id": "10001", "name": "ChatGPT", ... },
	{ "__type": "user", "id": "10002", "name": "Bard", ... },
	...
	{ "__type": "tweet", "id": "123456", "user": "10001", "content": "What's happening ?", ...},
	{ "__type": "tweet", "id": "789101", "user": "10002", "content": "AI is happening.", ...},
	{ "__type": "tweet", "id": "112233", "user": "10001", "content": "Cool !", ...},
	...
]
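Either compact layout can be expanded back into self-contained tweets after downloading the dataset. A minimal sketch, assuming each tweet's "user" field holds the user ID as in the sample outputs above (which are abbreviated with ...); attach_users is a hypothetical helper and relies only on the "__type", "id", "user", and "content" keys shown there.

```python
from typing import Any, Dict, List

Item = Dict[str, Any]


def attach_users(items: List[Item]) -> List[Item]:
    """Return tweets with "user" replaced by the full user record (hypothetical helper)."""
    users: Dict[str, Item] = {}
    for item in items:
        if item.get("__type") == "users":   # userinfo=collect: one id -> user map
            users.update(item["content"])
        elif item.get("__type") == "user":  # userinfo=separate: one row per user
            users[item["id"]] = item
    tweets: List[Item] = []
    for item in items:
        if item.get("__type") == "tweet":
            user_id = item.get("user")
            # Build a new dict so the downloaded items stay unchanged;
            # an id with no matching user record is kept as-is.
            tweets.append({**item, "user": users.get(user_id, user_id)})
    return tweets
```

The same function handles both layouts, so downstream code does not need to know which userinfo option was used.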

Next Lesson

... almost there ...

Did You Know?

Twitter internally detects faces in images.

...
"media": [
	{
		"id": "1695968940185534464",
		"image": "https://pbs.twimg.com/media/F4lKw_XawAAT-xn.jpg",
		"url": "https://t.co/Lk3VtAaKQ7",
		"features": {
			"orig": {
				"faces": [{"x": 315, "y": 130, "h": 232, "w": 232}]
			}
		},
		"__type": "photo",
		"key": "3_1695968940185534464"
	}
]
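The face rectangles can be turned into crop boxes. Their exact semantics are not documented here, so this sketch assumes x and y are the top-left corner and w and h the width and height, in pixels of the original ("orig") image; face_boxes is a hypothetical helper. The returned (left, top, right, bottom) order is the one Pillow's Image.crop expects.

```python
from typing import Any, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def face_boxes(media: List[Dict[str, Any]]) -> List[Tuple[str, Box]]:
    """List (image URL, face box) pairs from a tweet's "media" array."""
    result: List[Tuple[str, Box]] = []
    for item in media:
        # Missing "features", "orig", or "faces" keys are treated as "no faces".
        faces = item.get("features", {}).get("orig", {}).get("faces", [])
        for face in faces:
            x, y, w, h = face["x"], face["y"], face["w"], face["h"]
            result.append((item["image"], (x, y, x + w, y + h)))
    return result
```

For the sample above, this gives one box, (315, 130, 547, 362), for F4lKw_XawAAT-xn.jpg.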

To-Do List

  • Construct status URL: https://twitter.com/_/status/<status_id> or https://twitter.com/<screen-name>/status/<status-id>
  • Resize and format images URL: https://pbs.twimg.com/media/xxxxxxxxxx.jpg?format=[jpg|png|webp]&name=[orig|normal|large|medium|thumb]
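Both to-do items are plain string templates. A minimal sketch that follows the two patterns exactly as written above; status_url and image_url are hypothetical helpers, and whether every format and size name is served for every image is not verified here.

```python
from typing import Optional
from urllib.parse import urlencode, urlsplit, urlunsplit

FORMATS = {"jpg", "png", "webp"}
SIZES = {"orig", "normal", "large", "medium", "thumb"}


def status_url(status_id: str, screen_name: Optional[str] = None) -> str:
    """Use <screen-name> when known, otherwise the "_" form from the to-do item."""
    name = screen_name.lstrip("@") if screen_name else "_"
    return f"https://twitter.com/{name}/status/{status_id}"


def image_url(image: str, fmt: str = "jpg", size: str = "orig") -> str:
    """Append ?format=...&name=... to a pbs.twimg.com media URL."""
    if fmt not in FORMATS or size not in SIZES:
        raise ValueError(f"unsupported format/size: {fmt}/{size}")
    parts = urlsplit(image)
    # Any existing query string is replaced; the path is kept as-is.
    query = urlencode({"format": fmt, "name": size})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```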
Developer
Maintained by Community
Actor metrics
  • 45 monthly users
  • 92.6% runs succeeded
  • 30.6 days response time
  • Created in May 2023
  • Modified 9 days ago