Twitter Scraper

  • quacker/twitter-scraper
  • Modified
  • Users 9.5k
  • Runs 1.2M
  • Created by Quacker

Scrape tweets from any Twitter user profile. Top Twitter API alternative to scrape Twitter hashtags, threads, replies, followers, images, videos, statistics, and Twitter history. Download your data in any format, including JSON and Excel. Seamless integration with apps, reports, and databases.

What data can Twitter Scraper extract?

Twitter Scraper crawls specified Twitter profiles and URLs, and extracts:

  • 🔍 User information, such as name, Twitter handle (username), location, follower/following count, profile URL/image/banner, date of creation.
  • 🐦 List of tweets, retweets, and replies from profiles.
  • 📊 Statistics for each tweet: favorites, replies, and retweets.
  • 🔎 Search hashtags, get top, latest, people, picture, or video tweets.

Our Twitter Scraper enables you to extract large amounts of data from Twitter. It lets you do much more than the Twitter API, because it has no rate limits, and you don't even need a Twitter account, a registered app, or a Twitter API key.

You can crawl based on a list of Twitter handles or just by using a Twitter URL such as a search, trending topics, or hashtags.

Why use Twitter Scraper?

Scraping Twitter will give you access to the more than 500 million tweets posted every day. You can use that data in lots of different ways:

  • 🔍 Track discussions about your brand, products, country, or city.
  • 👥 Monitor your competitors and see how popular they really are, and how you can get a competitive edge.
  • 🔎 Keep an eye on new trends, attitudes, and fashions as they emerge.
  • 🧠 Use the data to train AI models or for academic research.
  • 😊 Track sentiment to make sure your investments are protected.
  • 🚫 Fight fake news by understanding the pattern of how misinformation spreads.
  • ✈️ Explore discussions about travel destinations, services, amenities, and take advantage of local knowledge.
  • 💰 Analyze consumer habits and develop new products or target underdeveloped niches.

How do I use Twitter Scraper?

If you need guidance on how to run the scraper, you can read our step-by-step tutorial or watch a short video tutorial ▷ on YouTube.


It is legal to scrape Twitter to extract publicly available information, but you should be aware that the extracted data might contain personal data. Personal data is protected by the GDPR in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult a lawyer. You can also read our blog post on the legality of web scraping.

Want more Twitter scraping options?

If you want to keep your scraping tasks as quick and easy as possible, try one of these targeted Twitter scrapers instead ⬇️

🔎 Twitter URL Scraper | 🔍 Easy Twitter Search Scraper

🖼️ Twitter Image Scraper | 📜 Twitter History Scraper

🔄 Twitter Latest Scraper | 🎥 Twitter Video Scraper

👤 Twitter Profile Scraper | 📊 Twitter Info Scraper

🥾 Twitter Explore Scraper | 📝 Twitter List Scraper

#️⃣ Twitter History Hashtag Scraper

Tips and tricks

Using the URL option

The default option is to scrape using search terms, but you can also scrape by Twitter handles or Twitter URLs. If you want to use the URL option, these are the supported Twitter URL types ⬇️

Using cookies to log in

This solution allows you to log in using the already initialized cookies of a logged-in user. If you use this option, the scraper will do as much as possible to prevent the account from being banned (slow down to just one page open at a time and introduce delays between actions).

It's highly recommended that you don't use your personal account (unless you really have to). You should instead create a new Twitter account to use with this solution. Using your personal account could result in the account being banned by Twitter.

To log in using cookies, you can use a Chrome browser extension such as EditThisCookie. Once you have installed it, open Twitter in your browser, log in with the account you want to use, and export cookies with the extension. This should give you an array of cookies that you can paste as a value for the loginCookies input field.
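The exported cookies should look roughly like the sketch below, following the EditThisCookie export format. The cookie names and values here are placeholders for illustration; paste whatever the extension actually exports, unmodified, into the loginCookies field.

```json
[
  {
    "name": "auth_token",
    "value": "<value exported by the extension>",
    "domain": ".twitter.com",
    "path": "/",
    "secure": true,
    "httpOnly": true
  }
]
```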

If you log out of the Twitter account connected to the cookies, it will invalidate them, and your solution will stop working.

Here's a short video tutorial ▷ on YouTube to help you figure it out.


Input parameters

Twitter Scraper has the following input options:

Apify - Twitter Scraper input

Twitter data output

You can download the resulting dataset in various formats such as JSON, HTML, CSV, or Excel. Each item in the dataset represents a single tweet in the following format:


  

```json
[{
    "user": {
        "protected": false,
        "created_at": "2009-06-02T20:12:29.000Z",
        "default_profile_image": false,
        "description": "",
        "fast_followers_count": 0,
        "favourites_count": 19158,
        "followers_count": 130769125,
        "friends_count": 183,
        "has_custom_timelines": true,
        "is_translator": false,
        "listed_count": 117751,
        "location": "",
        "media_count": 1435,
        "name": "Elon Musk",
        "normal_followers_count": 130769125,
        "possibly_sensitive": false,
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/44196397/1576183471",
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/1590968738358079488/IY9Gx6Ok_normal.jpg",
        "screen_name": "elonmusk",
        "statuses_count": 23422,
        "translator_type": "none",
        "verified": true,
        "withheld_in_countries": [],
        "id_str": "44196397"
    },
    "id": "1633026246937546752",
    "conversation_id": "1632363525405392896",
    "full_text": "@MarkChangizi Sweden’s steadfastness was incredible!",
    "reply_count": 243,
    "retweet_count": 170,
    "favorite_count": 1828,
    "hashtags": [],
    "symbols": [],
    "user_mentions": [
        {
            "id_str": "49445813",
            "name": "Mark Changizi",
            "screen_name": "MarkChangizi"
        }
    ],
    "urls": [],
    "media": [],
    "url": "https://twitter.com/elonmusk/status/1633026246937546752",
    "created_at": "2023-03-07T08:46:12.000Z",
    "is_quote_tweet": false,
    "replying_to_tweet": "https://twitter.com/MarkChangizi/status/1632363525405392896",
    "startUrl": "https://twitter.com/elonmusk/with_replies"
},
{
    "user": {
        "protected": false,
        "created_at": "2009-06-02T20:12:29.000Z",
        "default_profile_image": false,
        "description": "",
        "fast_followers_count": 0,
        "favourites_count": 19158,
        "followers_count": 130769125,
        "friends_count": 183,
        "has_custom_timelines": true,
        "is_translator": false,
        "listed_count": 117751,
        "location": "",
        "media_count": 1435,
        "name": "Elon Musk",
        "normal_followers_count": 130769125,
        "possibly_sensitive": false,
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/44196397/1576183471",
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/1590968738358079488/IY9Gx6Ok_normal.jpg",
        "screen_name": "elonmusk",
        "statuses_count": 23422,
        "translator_type": "none",
        "verified": true,
        "withheld_in_countries": [],
        "id_str": "44196397"
    },
    "id": "1633021151197954048",
    "conversation_id": "1632930485281120256",
    "full_text": "@greg_price11 @Liz_Cheney @AdamKinzinger @RepAdamSchiff Besides misleading the public, they withheld evidence for partisan political reasons that sent people to prison for far more serious crimes than they committed.\n\nThat is deeply wrong, legally and morally.",
    "reply_count": 727,
    "retweet_count": 2458,
    "favorite_count": 10780,
    "hashtags": [],
    "symbols": [],
    "user_mentions": [
        {
            "id_str": "896466491587080194",
            "name": "Greg Price",
            "screen_name": "greg_price11"
        },
        {
            "id_str": "98471035",
            "name": "Liz Cheney",
            "screen_name": "Liz_Cheney"
        },
        {
            "id_str": "18004222",
            "name": "Adam Kinzinger #fella",
            "screen_name": "AdamKinzinger"
        },
        {
            "id_str": "29501253",
            "name": "Adam Schiff",
            "screen_name": "RepAdamSchiff"
        }
    ],
    "urls": [],
    "media": [],
    "url": "https://twitter.com/elonmusk/status/1633021151197954048",
    "created_at": "2023-03-07T08:25:57.000Z",
    "is_quote_tweet": false,
    "replying_to_tweet": "https://twitter.com/greg_price11/status/1632930485281120256",
    "startUrl": "https://twitter.com/elonmusk/with_replies"
}]
```

...

You can use a predefined Advanced Search query as a startUrl, e.g. https://twitter.com/search?q=cool%20until%3A2020-01-01&src=typed_query

This returns only tweets containing "cool" posted before 2020-01-01.
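You can also build such a start URL programmatically rather than copying it from the Advanced Search page. A minimal sketch, letting `encodeURIComponent` handle the query escaping (the `buildSearchUrl` helper is just an illustration, not part of the scraper):

```javascript
// Build a Twitter search start URL from an Advanced Search query string.
function buildSearchUrl(query) {
  return `https://twitter.com/search?q=${encodeURIComponent(query)}&src=typed_query`;
}

console.log(buildSearchUrl('cool until:2020-01-01'));
// → https://twitter.com/search?q=cool%20until%3A2020-01-01&src=typed_query
```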

Workaround for max tweets limit

By default, Twitter returns at most 3,200 tweets per profile or search. If you need more than that, you can split your start URLs into time slices, like this:

  • https://twitter.com/search?q=(from%3Aelonmusk)%20since%3A2020-03-01%20until%3A2020-04-01&src=typed_query&f=live

  • https://twitter.com/search?q=(from%3Aelonmusk)%20since%3A2020-02-01%20until%3A2020-03-01&src=typed_query&f=live

  • https://twitter.com/search?q=(from%3Aelonmusk)%20since%3A2020-01-01%20until%3A2020-02-01&src=typed_query&f=live

All three URLs are from the same profile (elonmusk), but they are split by month (January, February, and March 2020). You can create these URLs using Twitter's "Advanced Search" at https://twitter.com/search.

You can use bigger intervals for profiles that don't post very often.
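Generating these time slices by hand gets tedious for longer ranges. A small sketch that produces month-by-month start URLs using the `from:`/`since:`/`until:` operators shown above (the `monthlySliceUrls` helper is illustrative, not part of the scraper):

```javascript
// Generate month-by-month Advanced Search start URLs for one profile.
function monthlySliceUrls(handle, startYear, startMonth, slices) {
  const fmt = (d) => d.toISOString().slice(0, 10); // YYYY-MM-DD
  const urls = [];
  for (let i = 0; i < slices; i++) {
    // Date.UTC handles month overflow, so slices can cross year boundaries
    const since = new Date(Date.UTC(startYear, startMonth - 1 + i, 1));
    const until = new Date(Date.UTC(startYear, startMonth + i, 1));
    const q = `(from:${handle}) since:${fmt(since)} until:${fmt(until)}`;
    urls.push(`https://twitter.com/search?q=${encodeURIComponent(q)}&src=typed_query&f=live`);
  }
  return urls;
}

// Produces the same three URLs as the list above, oldest month first
console.log(monthlySliceUrls('elonmusk', 2020, 1, 3));
```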

Other limitations include:

  • Live tweets are capped at 1 day in the past (use the search filters above to get around this)

  • Most search modes (Top, Videos, Pictures) are capped at around 150 tweets

Extend output function

This parameter allows you to change the shape of your dataset output, split arrays into separate dataset items, or filter the output:


  

```javascript
async ({ data, item, request }) => {
    item.user = undefined; // removes this field from the output
    delete item.user; // this works as well

    const raw = data.tweets[item['#sort_index']]; // allows you to access the raw data

    item.source = raw.source; // adds "Twitter for ..." to the output

    if (request.userData.search) {
        item.search = request.userData.search; // add the search term to the output
        item.searchUrl = request.loadedUrl; // add the raw search URL to the output
    }

    return item;
}
```

  

Filtering items:


  

```javascript
async ({ item }) => {
    if (!item.full_text.includes('lovely')) {
        return null; // omit the output if the tweet body doesn't contain the text
    }

    return item;
}
```

  

Splitting into multiple dataset items and changing the output completely:


  

```javascript
async ({ item }) => {
    // the dataset will be full of items like { hashtag: '#somehashtag' }
    // returning an array here splits it into multiple dataset items
    return item.hashtags.map((hashtag) => {
        return { hashtag: `#${hashtag}` };
    });
}
```

  

Extend scraper function

This parameter allows you to extend how the scraper works, making it easier to add to the default functionality without having to create your own custom version. For example, you can include a search of the trending topics on each page visit:


  

```javascript
async ({ page, request, addSearch, addProfile, addThread, customData }) => {
    await page.waitForSelector('[aria-label="Timeline: Trending now"] [data-testid="trend"]');

    const trending = await page.evaluate(() => {
        const trendingEls = $('[aria-label="Timeline: Trending now"] [data-testid="trend"]');

        return trendingEls.map((_, el) => {
            return {
                term: $(el).find('> div > div:nth-child(2)').text().trim(),
                profiles: $(el).find('> div > div:nth-child(3) [role="link"]').map((_, el) => $(el).text()).get()
            };
        }).get();
    });

    for (const { term, profiles } of trending) {
        await addSearch(term); // add a search using text

        for (const profile of profiles) {
            await addProfile(profile); // add a profile using its link
        }
    }

    // adds a thread and gets its replies; accepts an ID (e.g. from conversation_id) or a URL
    // you can call this multiple times, but each thread will be added only once
    await addThread("1351044768030142464");
}
```

  

Additional variables are available inside extendScraperFunction:


  

```javascript
async ({ label, response, url }) => {
    if (label === 'response' && response) {
        // inside the page.on('response') callback
        if (url.includes('live_pipeline')) {
            // deal with plain text content
            const blob = await (await response.blob()).text();
        }
    } else if (label === 'before') {
        // executes before page.on('response') is attached; can be used to intercept requests/responses
    } else if (label === 'after') {
        // executes after the scraping process has finished, even on crash
    }
}
```

  

Integrations and Twitter Scraper

Last but not least, Twitter Scraper can be connected with almost any cloud service or web app thanks to integrations on the Apify platform. You can integrate with Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and more. Or you can use webhooks to carry out an action whenever an event occurs, e.g. get a notification whenever Twitter Scraper successfully finishes a run.

Using Twitter Scraper with the Apify API

The Apify API gives you programmatic access to the Apify platform. The API is organized around RESTful HTTP endpoints that enable you to manage, schedule, and run Apify actors. The API also lets you access any datasets, monitor actor performance, fetch results, create and update versions, and more.

To access the API from Node.js, use the apify-client NPM package. To access it from Python, use the apify-client PyPI package.
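As a rough sketch, running the scraper and fetching its results with the apify-client NPM package could look like this. It requires a valid API token, and the input fields shown (searchTerms, tweetsDesired) are illustrative assumptions; check the actor's Input tab for the exact schema.

```javascript
const { ApifyClient } = require('apify-client');

async function main() {
    const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

    // Start the actor run and wait for it to finish
    const run = await client.actor('quacker/twitter-scraper').call({
        searchTerms: ['web scraping'], // hypothetical input field
        tweetsDesired: 100,            // hypothetical input field
    });

    // Fetch the scraped tweets from the run's default dataset
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(`Fetched ${items.length} tweets`);
}

main().catch(console.error);
```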

Check out the Apify API reference docs for full details or click on the API tab for code examples.

Industries

See how Twitter Scraper is used in industries around the world