
Reddit Scraper
trudax/reddit-scraper · Created by Gustavo Rudiger · 688 users · 29.4k runs
Unlimited Reddit web scraper to crawl posts, comments, communities, and users without login. Limit web scraping by number of posts or items and extract all data in a dataset in multiple formats.
What does Reddit Scraper do?
Our unofficial Reddit API will get data from Reddit with no limitations or authentication. It enables you to extract posts and comments, along with some user info, without logging in. It is built on top of the Apify SDK, and you can run it both on the Apify platform and locally.
Reddit Scraper allows you to:
- scrape the most popular subreddits (leaderboard).
- scrape subreddits (communities) with top posts and community details such as moderator usernames, number of members, community URL, and category.
- scrape Reddit posts with title and text, username, number of comments, votes, and media elements.
- get Reddit comments, timestamps, points, usernames, and post and comment URLs.
- scrape user details and their most recent posts and comments.
- sort scraped data by Relevance, Hot, Top, or New.
- scrape data using a specific URL or by keyword.
Only need a few Reddit results?
Use our super fast dedicated Free Reddit Scraper if you want to scrape Reddit data on a smaller scale. Just enter one or more Reddit URLs or keywords and click to scrape. Note that Free Reddit Scraper can only get you up to 10 posts, 10 comments, 2 communities, and 2 leaderboard items.
How much will it cost to scrape Reddit?
Reddit Scraper on the Apify platform will give you 1,000 results for less than $4 in platform usage credits. That should be covered by the free $5 in monthly credits you get on every Apify Free plan.
But if you need to get more data from Reddit regularly, you should grab an Apify subscription. We recommend our $49/month Personal plan, which gets you well over 10,000 results every month.
How to scrape Reddit?
Reddit Scraper doesn't require any coding skills to start using it. If you're unsure where to start, just follow our step-by-step guide or see our short video tutorial. The tutorial steps can also be used for Free Reddit Scraper.
How to use scraped Reddit data
- Keep track of discussions about your brand or product across Reddit communities.
- Research the topics that interest you and get a wide range of opinions.
- Keep an eye on debates over high stakes subjects such as finance, politics, new technology, and news in general.
- Watch out for new trends, attitudes, and PR opportunities.
- Automatically track mentions of the business or topic that interests you.
- Scrape Reddit comments to kick off and support your sentiment analysis.
Input parameters
If this actor is run on the Apify platform, there are two ways you can scrape Reddit:
- by the Start URLs field - this will get you all details from any Reddit URL, whether it's a post, a user, or a community.
- or by the Search Term field - this will scrape all data from Reddit's Communities, Posts, and People sections for a specific keyword.
How to scrape Reddit by URLs
Almost any Reddit URL will return a dataset. If a URL is not supported, the scraper will display a message instead of scraping the page.
Input examples:
Here are some examples of URLs that can be scraped.
- scraping communities: https://www.reddit.com/r/worldnews/
- scraping channels within communities: https://www.reddit.com/r/worldnews/hot
- scraping popular communities: https://www.reddit.com/subreddits/leaderboard/crypto/
- scraping users: https://www.reddit.com/user/lukaskrivka
- scraping user comments: https://www.reddit.com/user/lukaskrivka/comments/
- scraping posts: https://www.reddit.com/r/learnprogramming/comments/lp1hi4/is_webscraping_a_good_skill_to_learn_as_a_beginner/
- scraping popular posts: https://www.reddit.com/r/popular/
- scraping search results:
  - for users/communities: https://www.reddit.com/search/?q=news&type=sr%2Cuser
  - for posts: https://www.reddit.com/search/?q=news
Note: if you use a search URL as a parameter for `startUrls`, it will only scrape posts. If you want to search for communities and users, use the search field or a specific URL instead.
How to scrape Reddit by search term
- Search Term or `searches` - the keywords you want to search via Reddit's search engine. You can keep one field or add as many as you want. Don't use this field if you're using the `startUrls` parameter.
- Search type or `type` - indicates which part of Reddit you're scraping: "Posts" or "Communities and users".
- Sort search or `sort` - sorts search results by Relevance, Hot, Top, New, or number of Comments.
- Filter by date or `time` - filters the search by the last hour, day, week, month, or year. Only available if you're scraping Posts.
To see the full list of parameters, their default values, and how to set your own, head over to the Input Schema tab.
Input example:
This is an example of what your input will look like if you decide to scrape all Reddit communities and users that contain the keyword parrots. Results will be sorted by newest first.
```json
{
  "maxItems": 10,
  "maxPostCount": 10,
  "maxComments": 10,
  "maxCommunitiesAndUsers": 10,
  "maxLeaderBoardItems": 10,
  "scrollTimeout": 40,
  "proxy": { "useApifyProxy": true },
  "debugMode": false,
  "searches": ["parrots"],
  "type": "communities_and_users",
  "sort": "new",
  "time": "all"
}
```
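The constraints above (don't combine `searches` with `startUrls`; `time` only applies to Posts) are easy to check locally before starting a run. This is a sketch: `validateSearchInput` is our own illustrative helper, and the enum value `"posts"` for `type` is an assumption — the example above only shows `"communities_and_users"`, so check the Input Schema tab for the exact values.

```javascript
// Sketch: a local sanity check for a search-term input.
// validateSearchInput is a hypothetical helper, not part of the actor.
function validateSearchInput(input) {
  const errors = [];
  if (input.startUrls && input.searches) {
    errors.push('Use either startUrls or searches, not both.');
  }
  // "posts" is an assumed enum value for the type field.
  if (input.time && input.time !== 'all' && input.type !== 'posts') {
    errors.push('The time filter is only available when scraping Posts.');
  }
  return errors;
}

// The search-term example above passes the check:
console.log(validateSearchInput({
  searches: ['parrots'],
  type: 'communities_and_users',
  sort: 'new',
  time: 'all',
})); // prints []
```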
Results
The output from scraping Reddit is stored in a dataset. Each post, comment, user, or community is stored as an item inside the dataset. After the run finishes, you can download the scraped data to your computer or export it to any web app in various data formats (JSON, CSV, XML, RSS, HTML Table). Here are a few examples of the outputs you can get for different types of input:
Example Reddit post
```json
{
  "id": "ss5c25",
  "title": "Weekly Questions Thread / Open Discussion",
  "description": "For any questions regarding dough, sauce, baking methods, tools, and more, comment below.You can also post any art, tattoos, comics, etc here. Keep it SFW, though.As always, our wiki has a few sauce recipes and recipes for dough.Feel free to check out threads from weeks ago.This post comes out every Monday and is sorted by 'new'.",
  "numberOfVotes": "4",
  "createdAt": "3 days ago",
  "scrapedAt": "2022-01-09T22:52:48.489Z",
  "username": "u/AutoModerator",
  "numberOfComments": "19",
  "mediaElements": [],
  "tag": "HELP",
  "dataType": "post"
}
```
Example Reddit comment
```json
{
  "url": "https://www.reddit.com/r/Pizza/comments/sud2hm/tomato_pie_from_sallys_apizza_stamford_ct/t1_hx9k9it",
  "username": "Acct-404",
  "createdAt": "9 h ago",
  "scrapedAt": "2022-03-09T12:52:48.547Z",
  "description": "Raises handUhhhh can I get some cheese on my pizza please?",
  "numberOfVotes": "3",
  "postUrl": "https://www.reddit.com/r/Pizza/comments/sud2hm/tomato_pie_from_sallys_apizza_stamford_ct/",
  "postId": "sud2hm",
  "dataType": "comment"
}
```
Example Reddit community
```json
{
  "title": "Pizza",
  "alternativeTitle": "r/Pizza",
  "createdAt": "Created Aug 26, 2008",
  "scrapedAt": "2022-03-09T12:54:42.721Z",
  "members": 366000,
  "moderators": ["6745408", "AutoModerator", "BotTerminator", "DuplicateDestroyer"],
  "url": "https://www.reddit.com/r/pizza/",
  "dataType": "community",
  "categories": ["hot", "new", "top", "rising"]
}
```
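Since posts, comments, users, and communities all land in the same dataset, a common first step after downloading a JSON export is to split the items by the `dataType` field each one carries. A minimal sketch (the sample items below are trimmed copies of the examples above; `groupByDataType` is our own helper, not part of the actor):

```javascript
// Group downloaded dataset items by their dataType field
// ("post", "comment", "community", ...).
function groupByDataType(items) {
  const groups = {};
  for (const item of items) {
    (groups[item.dataType] = groups[item.dataType] || []).push(item);
  }
  return groups;
}

// Trimmed copies of the example items above:
const items = [
  { id: 'ss5c25', dataType: 'post', numberOfComments: '19' },
  { username: 'Acct-404', dataType: 'comment', postId: 'sud2hm' },
  { title: 'Pizza', dataType: 'community', members: 366000 },
];

const grouped = groupByDataType(items);
console.log(Object.keys(grouped)); // prints [ 'post', 'comment', 'community' ]
```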
Notes for developers
Limiting results with maxItems
If you need to limit the scope of your search, you can do that by setting the max number of posts you want to scrape inside a community or user. You can also set a limit to the number of comments for each post. You can limit the number of communities and the number of leaderboards by using the following parameters:
```json
{
  "maxPostCount": 50,
  "maxComments": 10,
  "maxCommunitiesAndUsers": 5,
  "maxLeaderBoardItems": 5
}
```
You can also set `maxItems` to prevent a very long run of the actor. This parameter will stop your scraper when it reaches the number of results you've indicated, so you need to be careful not to trim your results.
See the Input Schema tab for the full list of the ways to restrict Reddit Scraper using these parameters: `maxItems`, `maxPostCount`, `maxComments`, `maxCommunitiesAndUsers`, `maxLeaderBoardItems`.
Extend output function
You can use this function to update the result output of this actor. You can choose what data from the page you want to scrape. The output from this function will get merged with the result output.
The return value of this function has to be an object!
You can return fields to achieve 3 different things:
- Add a new field - return an object with a field that is not in the result output
- Change a field - return an existing field with a new value
- Remove a field - return an existing field with the value `undefined`
```js
async () => {
  return {
    pageTitle: document.querySelector("title").innerText,
  };
};
```
This example will add the title of the page to the final object:
```json
{
  "title": "Pizza",
  "alternativeTitle": "r/Pizza",
  "createdAt": "Created Aug 26, 2008",
  "scrapedAt": "2022-03-08T21:57:25.832Z",
  "members": 366000,
  "moderators": ["6745408", "AutoModerator", "BotTerminator", "DuplicateDestroyer"],
  "url": "https://www.reddit.com/r/pizza/",
  "categories": ["hot", "new", "top", "rising"],
  "dataType": "community",
  "pageTitle": "homemade chicken cheese masala pasta"
}
```
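The merge itself happens inside the actor, but as a mental model it behaves roughly like the local stand-in below — `mergeOutput` is our own illustrative function (an assumption, not the actor's actual implementation), following the three rules above: new keys are added, existing keys are overwritten, and keys returned with the value `undefined` are removed.

```javascript
// Local stand-in for the actor's merge step (illustrative assumption):
// new keys are added, existing keys overwritten, undefined keys removed.
function mergeOutput(result, extension) {
  const merged = { ...result, ...extension };
  for (const key of Object.keys(merged)) {
    if (merged[key] === undefined) delete merged[key];
  }
  return merged;
}

const result = { title: 'Pizza', members: 366000, scrapedAt: '2022-03-08T21:57:25.832Z' };
const extension = {
  pageTitle: 'r/Pizza',  // add a new field
  members: 370000,       // change a field
  scrapedAt: undefined,  // remove a field
};
console.log(mergeOutput(result, extension));
// prints { title: 'Pizza', members: 370000, pageTitle: 'r/Pizza' }
```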