
Twitter Scraper
- quacker/twitter-scraper
- Users: 12.7k
- Runs: 2.7M
- Created by Quacker
Scrape tweets from any Twitter user profile. A top alternative to the Twitter API for scraping Twitter hashtags, threads, replies, followers, images, videos, statistics, and Twitter history. Download your data in any format, including JSON and Excel, and integrate it seamlessly with apps, reports, and databases.
2023-05-16
Changes:
- Add an experimental option `useNewProfileScraper` that allows scraping multiple profiles at once, faster and more efficiently
- Add an option `skipPromotedTweets` that allows users to skip promoted tweets
Features:
- Add `is_thread`, `is_root_thread`, and `root_thread_url` to the output
- Add `includeThreadsOnly` option to include only threads in the output
- Add `username` and `user_id` fields to the output if `addUserInfo` is false
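As a rough illustration of how this release's options and output fields fit together, the TypeScript sketch below models them with hypothetical `ActorInput` and `TweetItem` types. Only the option and field names come from the changelog; the types, the placeholder handle, and the `keepThreadsOnly` and `rootTweets` helpers are assumptions, and the filter only approximates on the client side what `includeThreadsOnly` does inside the actor.

```typescript
// Hypothetical input shape: option names are from the changelog, types are assumed.
interface ActorInput {
  handles: string[];
  useNewProfileScraper?: boolean; // experimental
  skipPromotedTweets?: boolean;
  includeThreadsOnly?: boolean;
  addUserInfo?: boolean;
}

// Hypothetical subset of one output item; field names are from the changelog.
interface TweetItem {
  id: string;
  is_thread: boolean;
  is_root_thread: boolean;
  root_thread_url?: string;
  username?: string; // present even when addUserInfo is false
  user_id?: string;
}

const exampleInput: ActorInput = {
  handles: ["example_user"], // placeholder handle
  useNewProfileScraper: true,
  skipPromotedTweets: true,
  includeThreadsOnly: true,
  addUserInfo: false,
};

// Client-side approximation of includeThreadsOnly: keep tweets that belong to a thread.
function keepThreadsOnly(items: TweetItem[]): TweetItem[] {
  return items.filter((item) => item.is_thread);
}

// Keep only the first tweet of each thread.
function rootTweets(items: TweetItem[]): TweetItem[] {
  return items.filter((item) => item.is_thread && item.is_root_thread);
}
```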
2023-05-09
Features:
- Add `repliesDepth` option to scrape replies of replies based on the provided depth
Changes:
- Removed the `addTweetViewCount` option; the view count is now always scraped
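The changelog only says that `repliesDepth` scrapes "replies of replies" down to a given depth. One plausible reading, assumed here, is that depth 1 means direct replies only and each extra level goes one reply deeper. The sketch below shows that depth-limited breadth-first walk over a hypothetical in-memory `ReplyNode` tree; `collectReplies` is an illustrative helper, not the actor's implementation, which fetches replies over the network.

```typescript
// Hypothetical in-memory reply tree.
interface ReplyNode {
  id: string;
  replies: ReplyNode[];
}

// Collect reply IDs up to `repliesDepth` levels below the root tweet.
// Depth 1 = direct replies only; depth 2 also includes replies to those replies.
function collectReplies(root: ReplyNode, repliesDepth: number): string[] {
  const collected: string[] = [];
  // Breadth-first walk that tracks each node's depth relative to the root.
  const queue: Array<{ node: ReplyNode; depth: number }> = [{ node: root, depth: 0 }];
  while (queue.length > 0) {
    const { node, depth } = queue.shift()!;
    if (depth >= repliesDepth) continue; // do not expand nodes at the depth limit
    for (const reply of node.replies) {
      collected.push(reply.id);
      queue.push({ node: reply, depth: depth + 1 });
    }
  }
  return collected;
}
```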
2023-05-05
Features:
- For retweets, the `retweet` object now contains the original tweet, and the `tweet` object contains the retweet
- Add `skipRetweets` option to skip retweets
- Add `collectOriginalRetweetOnly` option to collect only the original retweet
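The difference between the two retweet options can be shown with a small client-side filter. This sketch assumes an item shape in which the top-level fields describe the retweet and the nested `retweet` object holds the original tweet, as described above; the `Item` and `RetweetOptions` types and `applyRetweetOptions` are hypothetical and only approximate what the actor does during scraping.

```typescript
// Hypothetical minimal tweet payload.
interface Tweet {
  id: string;
  text: string;
}

// Assumed item shape: top-level fields describe the retweet itself,
// and `retweet` holds the original tweet when is_retweet is true.
interface Item extends Tweet {
  is_retweet: boolean;
  retweet?: Tweet;
}

interface RetweetOptions {
  skipRetweets?: boolean;
  collectOriginalRetweetOnly?: boolean;
}

// Client-side approximation of the two options.
function applyRetweetOptions(items: Item[], opts: RetweetOptions): Tweet[] {
  const out: Tweet[] = [];
  for (const item of items) {
    if (!item.is_retweet) {
      out.push(item); // ordinary tweets are always kept
    } else if (opts.skipRetweets) {
      continue; // drop retweets entirely
    } else if (opts.collectOriginalRetweetOnly && item.retweet) {
      out.push(item.retweet); // keep only the original tweet
    } else {
      out.push(item); // keep the retweet with its nested original
    }
  }
  return out;
}
```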
2023-04-28
Features:
- The full text of truncated tweets is now scraped
- Add `is_truncated` to the output
- Add the option `tweetsLanguage` to search tweets by language
- Add the option `keywordsSearchType` to search for tweets that contain all the keywords, any of them, the exact phrase, or none of them
- Add `relativeToDate` and `relativeFromDate` to the input to search for tweets using relative dates instead of absolute dates
Fixes:
- Fixed an issue where quoted tweets were not always scraped
- Fixed an issue with `requestsFromUrl`
- Fixed an issue where `view_count` would be undefined for retweets
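The keyword modes map naturally onto Twitter's standard search operators (a space for AND, `OR`, quotes for an exact phrase, `-` to exclude, plus `lang:`, `since:`, and `until:`). The sketch below shows that mapping; it is not the actor's implementation. The mode names `all`, `any`, `exact`, and `none`, the `SearchOptions` shape, and the idea that relative dates are expressed as a number of days ago are all assumptions made for illustration.

```typescript
// Hypothetical names for the four modes described for keywordsSearchType.
type KeywordsSearchType = "all" | "any" | "exact" | "none";

interface SearchOptions {
  keywords: string[];
  mode: KeywordsSearchType;
  lang?: string;        // stands in for tweetsLanguage, e.g. "en"
  fromDaysAgo?: number; // stands in for relativeFromDate (assumed day count)
  toDaysAgo?: number;   // stands in for relativeToDate (assumed day count)
}

// Translate the keyword mode into search-operator syntax.
function keywordClause(keywords: string[], mode: KeywordsSearchType): string {
  switch (mode) {
    case "all":
      return keywords.join(" "); // implicit AND
    case "any":
      return `(${keywords.join(" OR ")})`; // parentheses keep OR from binding to later clauses
    case "exact":
      return `"${keywords.join(" ")}"`;
    case "none":
      return keywords.map((k) => `-${k}`).join(" ");
    default:
      throw new Error(`Unknown mode: ${mode}`);
  }
}

// Convert "N days ago" into the YYYY-MM-DD form used by since:/until: (UTC).
function daysAgo(days: number, now: Date = new Date()): string {
  const d = new Date(now.getTime() - days * 24 * 60 * 60 * 1000);
  return d.toISOString().slice(0, 10);
}

function buildSearchQuery(opts: SearchOptions, now: Date = new Date()): string {
  const parts = [keywordClause(opts.keywords, opts.mode)];
  if (opts.lang) parts.push(`lang:${opts.lang}`);
  if (opts.fromDaysAgo !== undefined) parts.push(`since:${daysAgo(opts.fromDaysAgo, now)}`);
  if (opts.toDaysAgo !== undefined) parts.push(`until:${daysAgo(opts.toDaysAgo, now)}`);
  return parts.join(" ");
}
```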
2023-04-25
Features:
- Add `quote_count` to the output
2023-04-09
Features:
- Add `video_url` or `gif_url` to the `media` object in the output
2023-04-07
Fixes:
- Fixed an issue where the browser would get stuck when there were only a couple of tweets
Changes:
- Removed the `browserFallback` option and updated the method of scraping headers; the new method is faster, requires fewer resources, and doesn't depend on another actor
2023-04-05
Fixes:
- Fixed issue with extracting info from quote tweets that are retweets
2023-04-04
Fixes:
- Some quoted tweets were not scraped
2023-04-03
Features:
- Add `is_retweet` to the output
Fixes:
- Fixed an issue where pinned tweets were not scraped
- Filter replies in advanced search
2023-03-29
Features:
- Add new option `browserFallback`, which enables a fallback to browser-based scraping if Cheerio requests fail (this uses another actor, so it consumes more resources). When enabled, the actor attempts to retrieve tweets with the browser and passes the results to Cheerio for parsing. This happens automatically for every new request, improving the actor's ability to scrape tweets.
- Tweets can now be scraped indefinitely: the actor tries to scrape all the tweets available for the given URL and stops when it reaches the `tweetsDesired` limit. This applies only if the new `browserFallback` option is enabled.
Changes:
- Remove the `tweetsDesired` limit; you can now scrape as many tweets as you want
- Add runId to the key-value store to prevent conflicts with other runs
Fixes:
- Fixed retries when scraping tweets that have been deleted
- Handle an issue where the Twitter website would get stuck
- Fixed an issue where `view_count` would be undefined
- Fixed an issue where the actor would get stuck on some tweets that are not available
2023-03-15
Features:
- Added `useAdvancedSearch` option to use the advanced search instead of the regular search for content type `Search`. It works with `fromDate`, `toDate`, `searchTerms`, and `handles` (usernames). It's disabled by default, and it doesn't scrape retweets.
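With `useAdvancedSearch`, the inputs listed above correspond to advanced-search operators such as `from:`, `since:`, and `until:`. The following is a minimal sketch under two assumptions: several handles or search terms are OR-ed together (the actor may instead run each as a separate search), and results come from a latest-tweets search URL with `f=live`. `AdvancedSearchInput`, `buildAdvancedQuery`, and `buildSearchUrl` are hypothetical names, not the actor's API.

```typescript
// Hypothetical input subset mirroring the option names above.
interface AdvancedSearchInput {
  searchTerms: string[];
  handles: string[]; // usernames without "@"
  fromDate?: string; // YYYY-MM-DD
  toDate?: string;   // YYYY-MM-DD
}

// Wrap alternatives in parentheses so OR does not leak into neighbouring clauses.
function anyOf(clauses: string[]): string {
  return clauses.length === 1 ? clauses[0] : `(${clauses.join(" OR ")})`;
}

function buildAdvancedQuery(input: AdvancedSearchInput): string {
  const parts: string[] = [];
  if (input.searchTerms.length > 0) parts.push(anyOf(input.searchTerms));
  if (input.handles.length > 0) parts.push(anyOf(input.handles.map((h) => `from:${h}`)));
  if (input.fromDate) parts.push(`since:${input.fromDate}`);
  if (input.toDate) parts.push(`until:${input.toDate}`);
  return parts.join(" ");
}

// Assumed search URL for the "Latest" tab; URLSearchParams handles the encoding.
function buildSearchUrl(query: string): string {
  const url = new URL("https://twitter.com/search");
  url.searchParams.set("q", query);
  url.searchParams.set("f", "live");
  return url.toString();
}
```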
2023-03-02
Features:
- Added `replying_to_tweet` to the output, which is the link to the tweet that the current tweet is replying to
- Added `is_quote_tweet` and `quoted_tweet` to the output; `quoted_tweet` is the tweet that the current tweet is quoting
Changes:
- Small code refactoring.
Fixes:
- Fixed an issue where Twitter sometimes returned no tweets for a page, which caused the request to finish without collecting all the tweets.
2023-02-02
Changes:
- The Actor now uses Cheerio Crawler for content type 'People'
- Improved `cheerioCrawler` stability.
Fixes:
- Fixed an issue where the actor got stuck on private profiles
- Fixed an issue where some requests were not handled properly or not added to the queue
2023-01-25
Changes:
- Increased default `maxRequestRetries` to 6
2023-01-23
Features:
- Add `profilesDesired` option to limit the number of scraped profiles
Changes:
- Improved cheerio scraper by allowing it to scrape tweets using any content type
- Improved logging for info and errors
- The actor now uses the `cheerioCrawler` by default, which is more reliable than the `puppeteerCrawler`
- Removed the `tweetsDesired` max limit, which was 3200 tweets
2023-01-21
Changes:
- Replaced the custom infiniteScroll function with the one provided by the SDK, which is faster and more reliable
- Disabled the page's cache, which increased the total number of scraped tweets by 20-40%
Feature:
- Add `useCheerio` option to scrape tweets using the Cheerio crawler instead of the Puppeteer crawler. Cheerio is more reliable, but it doesn't work when provided with login cookies
2023-01-19
Fix:
- Updated scrolling to be more efficient and fixed an issue where the scrolling would quit early
- Increased the default `maxIdleTimeoutSecs` to 60 to ensure that all tweets are scraped
Feature:
- Add `startUrl` to the output
2023-01-11
Fix:
- Made scrolling slower to reduce the overload on the CPU
- Add a check to stop scrolling if the desired number of tweets is reached
- Increased the default `maxRequestRetries` to 6
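The stop condition from these fixes can be sketched independently of the browser. The loop below is a synchronous, hypothetical stand-in for scrolling: `loadMore` represents one scroll step that returns the tweet IDs it revealed, and `maxIdleSteps` is an assumed analogue of giving up after a period with no new tweets (compare `maxIdleTimeoutSecs`). It is not the actor's code; a real implementation would drive an actual browser page asynchronously.

```typescript
// Hypothetical scroll step: returns the tweet IDs revealed by one scroll.
type LoadMore = () => string[];

// Scroll until `tweetsDesired` unique tweets are collected, or until
// `maxIdleSteps` consecutive steps produce no new tweets.
function scrollUntil(loadMore: LoadMore, tweetsDesired: number, maxIdleSteps = 3): string[] {
  const seen = new Set<string>();
  let idleSteps = 0;
  while (seen.size < tweetsDesired && idleSteps < maxIdleSteps) {
    const before = seen.size;
    for (const id of loadMore()) {
      if (seen.size >= tweetsDesired) break; // desired count reached: stop early
      seen.add(id);
    }
    // Reset the idle counter whenever this step added at least one new tweet.
    idleSteps = seen.size === before ? idleSteps + 1 : 0;
  }
  return Array.from(seen);
}
```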
2023-01-10
Feature:
- Add option `addTweetViewCount` to include the tweet view count (it's hidden and enabled by default)
- Add `view_count` to the output if the option is enabled
2022-08-20
Feature:
- Don't retry non-existing profiles
- More efficient scrolling
- Max concurrency increased to 3 by default
Fix:
- Login modal blocking the scrolling
- More resilient URL inputs and normalization
2022-08-10
Feature:
- Revamp to TypeScript and Crawlee
Fix:
- Hanging timers on CPU overload
2022-02-25
Fix:
- Timeline v2 object
2022-02-10
Feature:
- Added '#sort_index' to the output
- Updated README
2022-02-03
Fix:
- Thread replies
2021-11-03
Fix:
- Search results
2021-08-09
Features:
- Update to SDK 2
Bug fixes:
- User shape object for some profiles
2021-07-18
Features:
- Update to SDK 1.3.1
Changes:
- Change default timeout values
- Retiring of broken sessions
- Deals with pinned tweets
- Add debug log
Bug fixes:
- Fix thread extraction
2021-06-12
Features:
- Update to SDK 1.2.1
Fixes:
- New GraphQL format
2021-05-03
Features:
- Update to SDK 1.1.2
- Recursive "People" search
- Tweaks to wording in README and INPUT schema
Bug fixes:
- Filter cookies that lead to never loading page / 401 error
- Fetch data from GraphQL responses
2021-03-18
Features:
- Update to SDK 1.0.2
Fixes:
- Clicking on non-replies buttons
2021-02-26
Features:
- Scrape replies of replies
Fixes:
- Improve scraping stability
2021-02-04
Features:
- Add topics
- Add hashtags URLs
- Optimize end of listings
- Labels for outputScraperFunction for various scraper phases
Fixes:
- Deduplication of tweets
- Force retiring forever failing proxies
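Tweet deduplication is typically keyed on the tweet ID, since overlapping scroll positions or paginated responses can return the same tweet more than once. A minimal sketch, assuming each item carries a stable `id` field (the changelog does not say which key the actor uses); `ScrapedTweet` and `dedupeTweets` are hypothetical names:

```typescript
// Hypothetical minimal item shape.
interface ScrapedTweet {
  id: string;
  text: string;
}

// Keep the first occurrence of each tweet ID, preserving the original order.
function dedupeTweets(tweets: ScrapedTweet[]): ScrapedTweet[] {
  const seen = new Set<string>();
  return tweets.filter((t) => {
    if (seen.has(t.id)) return false;
    seen.add(t.id);
    return true;
  });
}
```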
2021-01-19
- Add mentions, symbols, URLs and hashtags to output
- Add threads/status links support
2021-01-12
- BREAKING CHANGE: Format of the dataset has changed
- Search multiple terms at once, search hashtags and terms
- Enriched user profile information (some information is only available when logged in)
- Added minimum and maximum tweet dates
- Updated SDK version
- Custom data
- Powerful extend output / scraper function
2020-11-25
- Remove the need to provide credentials
- Update SDK version
- Allow filtering profile tweets to the user's own tweets or including replies
- Scrape faster when there's no login information
- Accept Twitter URLs, handles, or `@usernames` for a better user experience
- Throw immediately if invalid handles are passed
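Accepting URLs, handles, and `@usernames` while failing fast on bad input amounts to normalizing everything to a bare handle and validating it. The sketch below is hypothetical (`normalizeHandle` is not the actor's function); it relies on Twitter's username rules of 1 to 15 letters, digits, or underscores and accepts only `twitter.com` hosts, which is an assumption about what the actor considers valid.

```typescript
// Twitter usernames: 1-15 characters, letters, digits, and underscores.
const HANDLE_RE = /^[A-Za-z0-9_]{1,15}$/;

// Normalize a profile URL, "@username", or bare handle to a bare handle.
// Throws immediately on anything that is not a valid handle.
function normalizeHandle(raw: string): string {
  let value = raw.trim();
  if (/^https?:\/\//i.test(value)) {
    const url = new URL(value);
    if (!/^(www\.|mobile\.)?twitter\.com$/i.test(url.hostname)) {
      throw new Error(`Not a Twitter URL: ${raw}`);
    }
    // The handle is the first path segment, e.g. /example_user/status/123.
    value = url.pathname.split("/").filter(Boolean)[0] ?? "";
  }
  value = value.replace(/^@/, ""); // strip a leading "@"
  if (!HANDLE_RE.test(value)) {
    throw new Error(`Invalid Twitter handle: ${raw}`);
  }
  return value;
}
```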