Reddit Scraper

Pricing: Pay per usage

Scrape posts and comments from Reddit

Developer: Jan Bruinier

Maintained by Community

Actor stats: 0 bookmarked; 2 total users; 1 monthly active user; last modified 3 hours ago.

Scrape posts and comments from any public subreddit. The actor uses Reddit's public JSON endpoints, so no API key or login is required.

What it does

This actor pulls posts from any public subreddit along with their metadata: score, upvote ratio, author, timestamp, flair, comment count, and more. It can optionally fetch each post's comment tree, following reply chains down to a configurable depth. Posts can be sorted by hot, new, top, or rising, and you can search within a subreddit.

Use cases

  • Market research: Track what people are saying about your product or industry on Reddit.
  • Content research: Find trending topics and discussions in any niche.
  • Sentiment analysis: Collect posts and comments for NLP analysis.
  • Competitor monitoring: Set up scheduled runs to track mentions of competitor brands.
  • Academic research: Gather discussion data for social science or linguistics studies.
  • Lead generation: Find people asking for recommendations in your product category.

Input options

| Parameter | Description | Default |
|---|---|---|
| Subreddit | Subreddit name without the "r/" prefix | Required |
| Sort Order | hot, new, top, or rising | hot |
| Timeframe | Time filter for the "top" sort (hour/day/week/month/year/all) | week |
| Search Query | Search within the subreddit | Empty (no search) |
| Include Comments | Also scrape comments for each post | No |
| Max Comment Depth | How deep to go into reply chains (1-10) | 3 |
| Max Posts | Number of posts to scrape (up to 1000) | 25 |
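The sketch below shows one way these inputs could map onto Reddit's public JSON URLs. It is a minimal illustration, and several parts of it are assumptions. The helper name `build_listing_url` is hypothetical. The listing path `/r/<name>/<sort>.json` and the query parameters `t`, `q`, `restrict_sr`, and `limit` are Reddit's public listing and search parameters, not documented internals of this actor. Reddit's search endpoint has its own set of sort values, and how the actor combines Search Query with Sort Order is not documented, so the sketch simply forwards both.

```python
from urllib.parse import urlencode

# Values accepted by the "Sort Order" and "Timeframe" inputs in the table above.
SORTS = {"hot", "new", "top", "rising"}
TIMEFRAMES = {"hour", "day", "week", "month", "year", "all"}


def build_listing_url(subreddit: str, sort: str = "hot", timeframe: str = "week",
                      query: str = "", limit: int = 25) -> str:
    """Hypothetical helper: turn actor-style inputs into a public Reddit JSON URL."""
    if sort not in SORTS:
        raise ValueError(f"unsupported sort: {sort!r}")
    if timeframe not in TIMEFRAMES:
        raise ValueError(f"unsupported timeframe: {timeframe!r}")
    name = subreddit.removeprefix("r/")  # the input should omit "r/", but strip it defensively
    params = {"limit": min(limit, 100)}  # Reddit returns at most 100 items per listing request
    if sort == "top":
        params["t"] = timeframe  # the time filter only applies to the "top" sort
    if query:
        # Search restricted to this subreddit; the sort value is forwarded as-is (assumption).
        params.update({"q": query, "restrict_sr": "1", "sort": sort})
        path = f"/r/{name}/search.json"
    else:
        path = f"/r/{name}/{sort}.json"
    return f"https://www.reddit.com{path}?{urlencode(params)}"
```

For example, `build_listing_url("r/python", sort="top", timeframe="all", limit=500)` yields `https://www.reddit.com/r/python/top.json?limit=100&t=all`. Reaching a Max Posts value above 100 would then require pagination.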

Output format

Posts:

```json
{
  "_type": "post",
  "post_id": "abc123",
  "title": "What's the best Python web framework in 2024?",
  "author": "pythondev42",
  "subreddit": "python",
  "score": 342,
  "upvote_ratio": 0.95,
  "num_comments": 89,
  "created_utc": "2024-01-15T10:30:00+00:00",
  "selftext": "I've been using Flask for years but...",
  "url": "https://reddit.com/r/python/comments/abc123/...",
  "permalink": "https://www.reddit.com/r/python/comments/abc123/...",
  "flair": "Discussion",
  "is_self": true,
  "is_video": false,
  "domain": "self.python",
  "awards_count": 2
}
```
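Reddit's raw listing JSON wraps each post as `{"kind": "t3", "data": {...}}` and reports `created_utc` as Unix seconds. The sketch below shows one plausible way to map a raw post onto the record shape above. The raw field names (`link_flair_text`, `total_awards_received`, a site-relative `permalink`, and so on) come from Reddit's public JSON. The mapping itself and the helper names are assumptions, not the actor's actual code.

```python
from datetime import datetime, timezone


def iso_utc(epoch_seconds: float) -> str:
    """Convert Reddit's created_utc (Unix seconds) to the ISO 8601 string shown above."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


def to_post_record(child: dict) -> dict:
    """Hypothetical mapping from one raw listing item to the actor-style post record."""
    d = child["data"]  # each listing item looks like {"kind": "t3", "data": {...}}
    return {
        "_type": "post",
        "post_id": d["id"],
        "title": d["title"],
        "author": d.get("author"),
        "subreddit": d["subreddit"],
        "score": d.get("score", 0),
        "upvote_ratio": d.get("upvote_ratio"),
        "num_comments": d.get("num_comments", 0),
        "created_utc": iso_utc(d["created_utc"]),
        "selftext": d.get("selftext", ""),
        "url": d.get("url"),
        "permalink": "https://www.reddit.com" + d["permalink"],  # Reddit returns a site-relative path
        "flair": d.get("link_flair_text"),
        "is_self": d.get("is_self", False),
        "is_video": d.get("is_video", False),
        "domain": d.get("domain"),
        "awards_count": d.get("total_awards_received", 0),
    }
```

For instance, a raw `created_utc` of `1705314600.0` becomes `"2024-01-15T10:30:00+00:00"`, matching the sample record.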

Comments (when enabled):

```json
{
  "_type": "comment",
  "comment_id": "xyz789",
  "post_id": "abc123",
  "author": "django_fan",
  "body": "Django is still the best for larger projects...",
  "score": 156,
  "created_utc": "2024-01-15T11:45:00+00:00",
  "depth": 0,
  "parent_id": "t3_abc123",
  "is_submitter": false,
  "awards_count": 1
}
```
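In Reddit's public JSON, each comment is a `t1` item whose `replies` field holds either a nested listing or an empty string. Truncated branches appear as `more` stubs. The sketch below flattens such a tree depth-first while respecting a depth limit. It assumes that Max Comment Depth = 3 keeps comments at depths 0, 1, and 2, and that `more` stubs are skipped rather than expanded. Neither point is confirmed by the description above, and `flatten_comments` is a hypothetical name.

```python
from datetime import datetime, timezone


def flatten_comments(listing: dict, post_id: str, max_depth: int = 3) -> list:
    """Depth-first walk of a Reddit comment listing, keeping comments with depth < max_depth."""
    out = []
    # Reverse so that popping from the end of the stack preserves Reddit's display order.
    stack = list(reversed(listing["data"]["children"]))
    while stack:
        child = stack.pop()
        if child.get("kind") != "t1":
            continue  # "more" stubs stand for unloaded replies; expanding them needs extra requests
        d = child["data"]
        depth = d.get("depth", 0)
        if depth >= max_depth:
            continue  # too deep: skip this comment and therefore all of its replies
        out.append({
            "_type": "comment",
            "comment_id": d["id"],
            "post_id": post_id,
            "author": d.get("author"),
            "body": d.get("body", ""),
            "score": d.get("score", 0),
            "created_utc": datetime.fromtimestamp(d["created_utc"], tz=timezone.utc).isoformat(),
            "depth": depth,
            "parent_id": d.get("parent_id"),
            "is_submitter": d.get("is_submitter", False),
            "awards_count": d.get("total_awards_received", 0),
        })
        replies = d.get("replies")
        if isinstance(replies, dict):  # Reddit sends "" instead of a listing when there are no replies
            stack.extend(reversed(replies["data"]["children"]))
    return out
```

An explicit stack avoids Python's recursion limit on very deep threads, although the depth cap already keeps the walk shallow.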

How it works

The actor appends .json to Reddit URLs to get structured data without needing the official API. It handles pagination automatically and respects Reddit's rate limits with built-in delays and retry logic.
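Reddit listings return a page of `children` plus an `after` cursor that points at the next page. When no more pages exist, `after` is null. A minimal pagination loop under that model might look like the sketch below. `iter_listing` and the `fetch_page` callable are hypothetical. In a real run, `fetch_page` would request the listing URL with the `after` query parameter and sleep between calls. The example injects it so the paging logic can be shown without network access.

```python
from typing import Callable, Iterator, Optional


def iter_listing(fetch_page: Callable[[Optional[str]], dict], max_posts: int) -> Iterator[dict]:
    """Yield up to max_posts listing items, following Reddit's "after" cursor between pages.

    fetch_page(after) is a hypothetical callable that requests one page of the
    listing (passing `after` when it is not None) and returns the decoded JSON.
    """
    after: Optional[str] = None
    count = 0
    while count < max_posts:
        data = fetch_page(after)["data"]  # listings look like {"kind": "Listing", "data": {...}}
        children = data.get("children", [])
        for child in children:
            if count >= max_posts:
                return  # stop mid-page once enough posts have been yielded
            yield child
            count += 1
        after = data.get("after")
        if not children or not after:
            return  # an empty page or a null cursor means there are no more pages
```

Because the function is a generator, it fetches no page beyond what the caller consumes. A run with Max Posts = 25 therefore never requests a second page.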

Rate limits

Reddit rate-limits unauthenticated JSON requests. The actor handles this automatically:

  • 1 second delay between post page fetches
  • 1.5 second delay between comment fetches
  • Automatic retry with backoff on HTTP 429 (Too Many Requests) responses
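The description does not specify the backoff schedule, so the sketch below uses exponential backoff with illustrative values: a 2-second first wait that doubles on each retry, up to 5 retries. `get_with_backoff` and the `fetch` callable are hypothetical. The `sleep` function is injectable so the retry logic can be exercised without actually waiting.

```python
import time
from typing import Callable, Tuple


def get_with_backoff(fetch: Callable[[], Tuple[int, dict]], max_retries: int = 5,
                     base_delay: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() and retry on HTTP 429, doubling the wait after each rate-limited attempt.

    fetch is a hypothetical callable returning (status_code, decoded_json).
    base_delay and max_retries are illustrative values, not the actor's settings.
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status != 429:
            if status >= 400:
                raise RuntimeError(f"HTTP {status}")  # other errors are not retried in this sketch
            return body
        if attempt < max_retries:
            sleep(delay)
            delay *= 2  # exponential backoff: 2 s, 4 s, 8 s, ...
    raise RuntimeError(f"still rate-limited after {max_retries} retries")
```

With two 429 responses followed by a success, the function waits 2 s and then 4 s before returning the body.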

For large scrapes (100+ posts with comments), runs may take several minutes.
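The built-in delays alone put a floor on run time. The rough estimate below assumes 100 posts per listing page and one comment request per post. Both assumptions are illustrative and not documented actor behavior. The estimate ignores network latency and any 429 backoff, so real runs take longer.

```python
import math


def min_delay_seconds(max_posts: int, include_comments: bool, page_size: int = 100,
                      page_delay: float = 1.0, comment_delay: float = 1.5) -> float:
    """Lower bound on time spent in the built-in delays (network time and 429 backoff excluded).

    Assumes page_size posts per listing page and one comment request per post;
    both are illustrative assumptions, not documented actor behavior.
    """
    if max_posts <= 0:
        return 0.0
    pages = math.ceil(max_posts / page_size)
    total = (pages - 1) * page_delay  # one delay between each pair of consecutive listing pages
    if include_comments:
        total += (max_posts - 1) * comment_delay  # one delay between consecutive comment fetches
    return total
```

Under these assumptions, 100 posts with comments spend 148.5 seconds (about 2.5 minutes) in delays. The maximum of 1000 posts with comments spends 1507.5 seconds (about 25 minutes).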

Tips

  • Start without comments to quickly scan posts, then re-run with comments for the ones you care about.
  • Use the search feature to find niche discussions within large subreddits.
  • Sort by "top" with timeframe "all" to get the most popular posts of all time.
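  • Schedule hourly runs with "new" sort to catch posts as they are published. Set Max Posts above the subreddit's typical hourly volume; otherwise, posts that arrive between runs can fall outside the window and be missed.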
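When you schedule repeated "new" runs, consecutive results usually overlap. The sketch below keeps only posts whose `post_id` has not been seen before. It also flags a possible gap when none of a run's posts overlap with earlier runs, which suggests Max Posts was too small for the interval. `new_posts_only` is a hypothetical helper that works on the output records shown above. How you persist `seen_ids` between runs is up to you and is not covered here.

```python
from typing import Iterable, List, Set, Tuple


def new_posts_only(records: Iterable[dict], seen_ids: Set[str]) -> Tuple[List[dict], bool]:
    """Keep post records not seen in earlier runs; update seen_ids in place.

    Also returns a possible_gap flag: when earlier runs exist but none of this
    run's posts overlap with them, posts may have slipped past Max Posts.
    """
    posts = [r for r in records if r.get("_type") == "post"]  # comment records are ignored here
    fresh = [p for p in posts if p["post_id"] not in seen_ids]
    possible_gap = bool(seen_ids) and bool(posts) and len(fresh) == len(posts)
    seen_ids.update(p["post_id"] for p in fresh)
    return fresh, possible_gap
```

Because "new" returns posts newest first, an overlap with the previous run is a cheap signal that the window fully covered the interval.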