Reddit Scraper

trudax/reddit-scraper

Free Reddit web scraper to crawl posts, comments, communities, and users without login. Limit web scraping by number of posts or items and extract all data in a dataset in multiple formats.

Author: Gustavo Rudiger
  • Modified
  • Users: 319
  • Runs: 2,988

Start URLs

startUrls

Optional

array

If you already have the URL(s) of the page(s) you wish to scrape, set them here. If you want to use the search field below instead, remove all startUrls.

Search Term

searches

Optional

array

Here you can provide a search query that will be used to search Reddit's topics.
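The two entry points above are alternatives: the startUrls description asks that startUrls be emptied when the search field is used. A minimal Python sketch of building that part of the input follows. The helper name `build_entry_input` is hypothetical, and the `{"url": ...}` item shape for startUrls is an assumption that this page does not confirm.

```python
from typing import Any, Dict, List, Optional


def build_entry_input(start_urls: Optional[List[str]] = None,
                      searches: Optional[List[str]] = None) -> Dict[str, Any]:
    """Return an input fragment that uses startUrls or searches, never both."""
    start_urls = start_urls or []
    searches = searches or []
    if start_urls and searches:
        # The page says to remove all startUrls when searching.
        raise ValueError("Set either startUrls or searches, not both")
    if start_urls:
        # Assumed item shape: one {"url": ...} object per start URL.
        return {"startUrls": [{"url": url} for url in start_urls]}
    if searches:
        # Send an explicitly empty startUrls so the search is the only entry point.
        return {"searches": searches, "startUrls": []}
    # Both fields are optional, so an empty fragment is allowed.
    return {}
```

Calling `build_entry_input(searches=["python"])` yields a fragment with an empty startUrls list, which mirrors the instruction in the startUrls description.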

Search type

type

Optional

string

Select the type of content you want to scrape

Options:

"posts", "communities_and_users"

Sort search

sort

Optional

string

Sort search by Relevance, Hot, Top, New or Comments

Options:

"relevance", "hot", "top", "new", "comments"

Filter by date (Posts only)

time

Optional

string

Filter posts by time range: all time, or the last hour, day, week, month, or year

Options:

"all", "hour", "day", "week", "month", "year"

Maximum number of items to be saved

maxItems

Optional

integer

The maximum number of items that will be saved in the dataset. If you are scraping Communities & Users, remember that each category inside a community is saved as a separate item.

Limit of posts scraped inside a single page

maxPostCount

Optional

integer

The maximum number of posts that will be scraped for each Posts Page or Communities&Users URL

Limit of comments scraped inside a single page

maxComments

Optional

integer

The maximum number of comments that will be scraped for each Comments Page.

Limit of Communities & Users pages scraped

maxCommunitiesAndUsers

Optional

integer

The maximum number of Communities & Users pages that will be scraped if your search or start URL is of the Communities & Users type.

Limit leaderboard items

maxLeaderBoardItems

Optional

integer

The maximum number of communities that will be scraped from a leaderboard page.
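The five limit fields above are all optional integers, so a small helper can catch typos in field names and non-integer values before a run starts. This is only a sketch: the helper name `build_limits` is hypothetical, the numbers in the usage note are illustrative, and treating negative values as invalid is an assumption rather than documented behavior.

```python
from typing import Dict

# Field names as listed on this page.
LIMIT_FIELDS = (
    "maxItems",
    "maxPostCount",
    "maxComments",
    "maxCommunitiesAndUsers",
    "maxLeaderBoardItems",
)


def build_limits(**limits: int) -> Dict[str, int]:
    """Validate limit fields and return them as an input fragment."""
    fragment: Dict[str, int] = {}
    for name, value in limits.items():
        if name not in LIMIT_FIELDS:
            raise KeyError(f"Unknown limit field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")
        if value < 0:
            # Assumption: negative limits are never meaningful.
            raise ValueError(f"{name} must not be negative")
        fragment[name] = value
    return fragment
```

For example, `build_limits(maxItems=50, maxComments=10)` returns a dictionary that can be merged into the rest of the input.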

Extended Output Function

extendOutputFunction

Optional

string

Here you can write custom JavaScript code to extract custom data from the page.

Page scroll timeout (seconds)

scrollTimeout

Optional

integer

Set the time, in seconds, after which the page stops scrolling down to load new items.

Proxy configuration

proxy

Optional

object

Either use Apify proxy, or provide your own proxy servers.

Debug Mode

debugMode

Optional

boolean

Activate to see detailed logs
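Putting the fields together, a run could be sketched in Python as below. It assumes the third-party `apify-client` package and an Apify API token. The field values are illustrative, the `{"useApifyProxy": True}` proxy shape follows the usual Apify convention and is not confirmed by this page, and the helper names are hypothetical. The allowed values for type, sort, and time are copied from the option lists above.

```python
from typing import Any, Dict, List

ACTOR_ID = "trudax/reddit-scraper"

# Allowed values copied from the option lists on this page.
ALLOWED = {
    "type": {"posts", "communities_and_users"},
    "sort": {"relevance", "hot", "top", "new", "comments"},
    "time": {"all", "hour", "day", "week", "month", "year"},
}


def example_input() -> Dict[str, Any]:
    """Return an illustrative input; every value here is an example."""
    return {
        "startUrls": [],  # left empty because a search term is used
        "searches": ["web scraping"],
        "type": "posts",
        "sort": "new",
        "time": "week",
        "maxItems": 20,
        "maxPostCount": 20,
        "maxComments": 5,
        "scrollTimeout": 40,
        "proxy": {"useApifyProxy": True},  # assumed proxy object shape
        "debugMode": False,
    }


def check_search_options(run_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    for field, allowed in ALLOWED.items():
        if field in run_input and run_input[field] not in allowed:
            problems.append(f"{field}={run_input[field]!r} is not one of {sorted(allowed)}")
    return problems


def run_scraper(token: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Validate the input, start the actor, and return the dataset items."""
    problems = check_search_options(run_input)
    if problems:
        raise ValueError("; ".join(problems))
    # Imported here so the helpers above work without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call such as `run_scraper(os.environ["APIFY_TOKEN"], example_input())` would then return the scraped items as a list of dictionaries, provided the token is valid and the environment variable name matches your setup.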