Twitter Scraper

  • quacker/twitter-scraper
  • Users 9.5k
  • Runs 1.2M
  • Created by Quacker

Scrape tweets from any Twitter user profile. Top Twitter API alternative to scrape Twitter hashtags, threads, replies, followers, images, videos, statistics, and Twitter history. Download your data in any format, including JSON and Excel. Seamless integration with apps, reports, and databases.

What search terms do you want to scrape?

searchTerms

Optional

array

If you add search terms, the scraper will find and extract tweets that mention those terms. To search for a hashtag, put # at the beginning of the search term: search for #webscraping instead of webscraping. Alternatively, scroll further down to scrape by Twitter profile or URL.
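As a minimal sketch of the hashtag rule above, the snippet below builds a `searchTerms` array containing both a plain term and its hashtag form. The helper name `as_hashtag` and the variable `run_input` are hypothetical, not part of the scraper.

```python
def as_hashtag(term: str) -> str:
    """Return the term with exactly one leading '#', for hashtag search."""
    return "#" + term.strip().lstrip("#")


# A plain term matches tweets that mention the word;
# a '#'-prefixed term searches for the hashtag instead.
run_input = {
    "searchTerms": ["webscraping", as_hashtag("webscraping")],
}
```

The helper strips any existing # first, so a term that already carries one is not doubled.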

Do you want to filter tweets by content?

searchMode

Optional

string

This setting changes how the scraper sorts Twitter data before extracting it: by latest or top tweets, people, photos, or videos.

Options:

"live", "image", "video", "user"
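The sketch below shows one way to guard against a mistyped `searchMode` before starting a run. Only the four option strings come from this page; the helper `with_search_mode` is hypothetical, and leaving the field out when no mode is given is an assumption about how an optional field is best handled.

```python
from typing import Optional

# The four values listed under Options.
SEARCH_MODES = ("live", "image", "video", "user")


def with_search_mode(run_input: dict, search_mode: Optional[str] = None) -> dict:
    """Return a copy of run_input with a validated searchMode, if one is given."""
    result = dict(run_input)
    if search_mode is None:
        return result  # optional field: leave it out entirely
    if search_mode not in SEARCH_MODES:
        raise ValueError(
            f"searchMode must be one of {SEARCH_MODES}, got {search_mode!r}"
        )
    result["searchMode"] = search_mode
    return result


run_input = with_search_mode({"searchTerms": ["#webscraping"]}, "live")
```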

Limit profiles (Only for content type 'People')

profilesDesired

Optional

integer

Limit the number of profiles to scrape. This is useful if you want to scrape a lot of tweets, but only from a few profiles.

Set the maximum number of tweets (per search query)

tweetsDesired

Optional

integer

This value lets you set the maximum number of tweets to retrieve per search query.

Get tweet view count

addTweetViewCount

Optional

boolean

This allows you to get the number of times a tweet has been viewed.

Add user information

addUserInfo

Optional

boolean

Extends the tweets with user information. You can decrease the size of your dataset by turning this off.

Use Cheerio

useCheerio

Optional

boolean

Use Cheerio instead of Puppeteer to scrape the page. This has a very high chance of scraping all tweets and is faster for multiple scrapes, but slower for single scrapes. It is only enabled when the content type is not `People` and you are not logged in.
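The condition above can be checked on an input before a run, roughly as sketched below. Two assumptions are made here and are not confirmed by this page: that content type `People` corresponds to `searchMode` set to "user", and that "logged in" means `initialCookies` were supplied. The function name `cheerio_applies` is hypothetical.

```python
def cheerio_applies(run_input: dict) -> bool:
    """Guess whether useCheerio would take effect for this input."""
    wants_cheerio = bool(run_input.get("useCheerio"))
    # Assumption: the 'People' content type is searchMode "user".
    people_search = run_input.get("searchMode") == "user"
    # Assumption: providing login cookies counts as being logged in.
    logged_in = bool(run_input.get("initialCookies"))
    return wants_cheerio and not people_search and not logged_in
```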

Do you want to scrape by Twitter profiles?

handle

Optional

array

You can add the Twitter handles of specific profiles you want to scrape. This is a shortcut so that you don't have to add full profile URLs such as https://twitter.com/username.
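If your list of profiles is a mix of URLs and handles, a small helper can reduce everything to handles before filling in `handle`. This is only a sketch: the helper `to_handle` is hypothetical, and it assumes the field expects the bare name without a leading @.

```python
from urllib.parse import urlparse


def to_handle(value: str) -> str:
    """Reduce '@name', 'name', or a profile URL to a bare handle."""
    value = value.strip()
    if "://" in value:
        # The first path segment of a profile URL is the handle.
        path = urlparse(value).path
        value = path.strip("/").split("/")[0]
    return value.lstrip("@")


run_input = {
    "handle": [to_handle("https://twitter.com/username"), to_handle("@username")],
}
```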

Do you want to scrape replies in addition to tweets?

mode

Optional

string

You can either limit scraping to tweets or also include replies. Note that this only applies when scraping Twitter profiles.

Options:

"own", "replies"

Do you want to scrape by Twitter URL?

startUrls

Optional

array

This lets you tell the scraper where to start. You can enter Twitter URLs one by one. You can also link to or upload a text file with a list of URLs.

Tweets newer than

toDate

Optional

string

Scrape tweets newer than this date, format YYYY-MM-DD. You can use this in conjunction with 'Tweets older than' to create a limited time slice.

Tweets older than

fromDate

Optional

string

Scrape tweets from this date and before, format YYYY-MM-DD. You can use this in conjunction with 'Tweets newer than' to create a limited time slice.
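A limited time slice needs both date fields in YYYY-MM-DD format. The sketch below builds them from `datetime.date` values and rejects an empty slice. The helper `time_slice` and its parameter names are hypothetical; the field names are used exactly as listed on this page, where 'Tweets newer than' is `toDate` and 'Tweets older than' is `fromDate`.

```python
from datetime import date


def time_slice(newer_than: date, older_than: date) -> dict:
    """Build the two date fields for a limited time slice."""
    if newer_than > older_than:
        raise ValueError("'newer than' date must not be after 'older than' date")
    return {
        # Field names follow this page's listing:
        # 'Tweets newer than' is toDate, 'Tweets older than' is fromDate.
        "toDate": newer_than.isoformat(),    # isoformat() yields YYYY-MM-DD
        "fromDate": older_than.isoformat(),
    }


january = time_slice(date(2023, 1, 1), date(2023, 1, 31))
```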

Use advanced search (Works only with searchTerms, handle, and dates)

useAdvancedSearch

Optional

boolean

Use advanced search instead of the default search. This is useful if you want to scrape tweets using a search term, a user handle, or a date range. Note that retweets are not scraped when this option is enabled.

Proxy configuration

proxyConfig

Required

object

This is required if you want to use Apify Proxy.
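Since `proxyConfig` is the only field marked Required, a complete input can be quite small. The sketch below assembles one and prints it as JSON. The `useApifyProxy` key is the usual Apify proxy setting and is assumed here rather than taken from this page; the helper `missing_required` and the chosen values are illustrative only.

```python
import json

run_input = {
    "searchTerms": ["#webscraping"],
    "searchMode": "live",
    "tweetsDesired": 100,
    "addUserInfo": False,
    # Assumption: the usual Apify proxy object; this page does not show its shape.
    "proxyConfig": {"useApifyProxy": True},
}


def missing_required(candidate: dict) -> list:
    """List required fields absent from the input (only proxyConfig is required)."""
    return [name for name in ("proxyConfig",) if name not in candidate]


print(json.dumps(run_input, indent=2))
```

Every other field in the example is optional and can be dropped.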

Extend Output Function

extendOutputFunction

Optional

string

Add or remove properties on the output object, or omit the output by returning null.

Extend Scraper Function

extendScraperFunction

Optional

string

Advanced function that lets you extend the default scraper functionality by manually performing actions on the page.

Custom data

customData

Optional

object

Any data that you want to have available inside the Extend Output Function or Extend Scraper Function.

Max timeout seconds (For browser scraping only)

handlePageTimeoutSecs

Optional

integer

Maximum timeout for the handlePageFunction. Increase it for long-running processes.

Max request retries

maxRequestRetries

Optional

integer

Sets the maximum number of retries for a failed request.

Scrolling idle seconds

maxIdleTimeoutSecs

Optional

integer

Sets how many seconds without receiving new data must pass before scrolling is considered done.

Debug log

debugLog

Optional

boolean

Enables the debug log.

Login cookies

initialCookies

Optional

array

Your login cookies will be used to bypass the login wall. Check the README for detailed instructions.