Reddit Post Harvester

Pricing

from $1.00 / 1,000 results

Scrape posts from any subreddit without authentication. Fetches titles, scores, authors, flairs, thumbnails and URLs via RSS + JSON API. Supports hot/new/top/rising sorting, time filters, and proxy rotation to bypass Reddit blocks.

Rating: 0.0 (0)

Developer: Saregaa (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 4
  • Monthly active users: 2
  • Last modified: 5 days ago

Reddit RSS Scraper | Extract Subreddit Posts, Scores & Metadata Without an API Key

Extract posts from any public subreddit without an API key, OAuth tokens, or a browser. Built for marketers, researchers, data engineers, and developers who need fresh Reddit post data on a schedule — without managing API credentials.

The actor pulls post titles, URLs, scores, authors, flairs, comment counts, and timestamps from any combination of subreddits using Reddit's public RSS and JSON feeds, with Chrome TLS fingerprint impersonation to avoid blocks.

✅ No Reddit API key or OAuth required

✅ Scrape multiple subreddits in a single run

✅ 4 sort modes: hot, new, top, rising

✅ Pagination beyond the 25-post RSS limit via JSON cursor

✅ Export results as JSON, CSV, or Excel

✅ Full API and scheduling support via Apify


What Data Can Be Extracted?

| Field | Type | Description |
| --- | --- | --- |
| id | string | Reddit post ID (e.g. 1dxyz42) |
| title | string | Full post title |
| permalink | string | Full Reddit URL to the post |
| url | string | External URL for link posts; Reddit URL for self posts |
| subreddit | string | Subreddit name |
| author | string | Author username |
| score | integer | Net upvotes at scrape time |
| upvote_ratio | float | Upvote ratio 0.0–1.0 (JSON pages only) |
| num_comments | integer | Total comment count (JSON pages only) |
| flair | string | Post flair label, or null |
| post_type | string | self (text post) or link |
| thumbnail | string | Thumbnail URL, or null |
| created_at | string | Post creation time (ISO 8601 UTC) |
| scraped_at | string | Scrape time (ISO 8601 UTC) |

Note: upvote_ratio and num_comments are only available from the JSON API (page 2+). The first 25 posts fetched via RSS will have null for these fields. Set maxPostsPerSubreddit > 25 to backfill them for all posts.
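
Because some records can carry null in the JSON-only fields, downstream code should not assume they are numeric. The following is a minimal sketch of one way to separate complete records from partial ones; the helper name split_by_completeness and the sample records are hypothetical and not part of the actor.

```python
from typing import Iterable

# Fields the note above describes as filled only from the JSON API.
JSON_ONLY_FIELDS = ("upvote_ratio", "num_comments")


def split_by_completeness(posts: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Separate posts with every JSON-only field set from posts missing any."""
    complete, partial = [], []
    for post in posts:
        if all(post.get(field) is not None for field in JSON_ONLY_FIELDS):
            complete.append(post)
        else:
            partial.append(post)
    return complete, partial


# Hypothetical sample records, trimmed to the relevant fields.
sample = [
    {"id": "a1", "score": 10, "upvote_ratio": None, "num_comments": None},
    {"id": "b2", "score": 42, "upvote_ratio": 0.96, "num_comments": 312},
]
complete, partial = split_by_completeness(sample)
print([p["id"] for p in complete], [p["id"] for p in partial])
```

Averages or ratios over upvote_ratio and num_comments can then be computed on the complete list only.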


Features

  • No credentials needed — accesses only public, anonymous Reddit feeds
  • Multi-subreddit — scrape dozens of subreddits in one run
  • 4 sort modes: hot, new, top, rising
  • Pagination — goes beyond the RSS 25-post limit via JSON cursor (up to ~1,000 posts per subreddit)
  • Time filter — restrict top posts to hour, day, week, month, year, or all
  • TLS fingerprint spoofing: curl_cffi impersonates Chrome 120, bypassing Reddit's fingerprint-based blocks
  • Residential proxy support — plug in Apify Proxy for high-volume runs
  • Export to JSON, CSV, or Excel — download directly from the Apify Output tab
  • Schedule runs — automate hourly, daily, or weekly collection
  • API access — integrate with Zapier, Make, n8n, or your own pipeline

How to Scrape Reddit Data — Step by Step

  1. Open the actor in Apify Console
  2. Enter one or more subreddit names (e.g. MachineLearning, LocalLLaMA). The r/ prefix is optional.
  3. Choose a sort order (hot, new, top, rising), set the max posts per subreddit, and optionally set a time filter for top
  4. Click Start
  5. Download results as JSON, CSV, or Excel from the Output tab, or access them via the Apify API
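
If subreddit names come from user input or a spreadsheet, it can help to normalize them before building the actor input. The sketch below is illustrative only: the actor already accepts names with or without the r/ prefix, and the helper names normalize_subreddit and unique_subreddits are hypothetical.

```python
def normalize_subreddit(name: str) -> str:
    """Strip whitespace, surrounding slashes, and an optional r/ prefix."""
    cleaned = name.strip().strip("/")
    if cleaned.lower().startswith("r/"):
        cleaned = cleaned[2:]
    return cleaned


def unique_subreddits(names: list[str]) -> list[str]:
    """Normalize names and drop case-insensitive duplicates, keeping order."""
    seen = set()
    result = []
    for name in names:
        cleaned = normalize_subreddit(name)
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result


subreddits = unique_subreddits(["r/MachineLearning", "/r/LocalLLaMA/", "machinelearning"])
print(subreddits)
```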

Input Example

{
  "subreddits": ["technology", "MachineLearning", "LocalLLaMA"],
  "sort": "top",
  "maxPostsPerSubreddit": 50,
  "timeFilter": "week",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Input Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| subreddits | string[] | ["technology"] | Subreddit names, with or without the r/ prefix |
| sort | string | hot | Sort order: hot, new, top, rising |
| maxPostsPerSubreddit | integer | 25 | Posts to collect per subreddit (1–100) |
| timeFilter | string | day | Time range for top sort: hour, day, week, month, year, all |
| proxyConfiguration | object | | Apify Proxy config. Residential recommended for high-volume runs |
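
When inputs are generated by a script, a quick client-side check can catch mistakes before a run is started. This is a minimal sketch that mirrors the defaults and allowed values in the table above; validate_input is a hypothetical helper, and the actor's own validation may behave differently.

```python
ALLOWED_SORTS = {"hot", "new", "top", "rising"}
ALLOWED_TIME_FILTERS = {"hour", "day", "week", "month", "year", "all"}


def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems found in an actor input (empty if none)."""
    errors = []

    subreddits = run_input.get("subreddits", ["technology"])
    if (
        not isinstance(subreddits, list)
        or not subreddits
        or not all(isinstance(s, str) and s.strip() for s in subreddits)
    ):
        errors.append("subreddits must be a non-empty list of names")

    if run_input.get("sort", "hot") not in ALLOWED_SORTS:
        errors.append("sort must be one of hot, new, top, rising")

    max_posts = run_input.get("maxPostsPerSubreddit", 25)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_posts, bool) or not isinstance(max_posts, int) or not 1 <= max_posts <= 100:
        errors.append("maxPostsPerSubreddit must be an integer from 1 to 100")

    if run_input.get("timeFilter", "day") not in ALLOWED_TIME_FILTERS:
        errors.append("timeFilter must be one of hour, day, week, month, year, all")

    return errors


print(validate_input({"subreddits": ["python"], "sort": "best", "maxPostsPerSubreddit": 500}))
```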

Output Example

Each record in the dataset represents one Reddit post:

{
  "id": "1dxyz42",
  "title": "New open-source model beats GPT-4 on coding benchmarks",
  "permalink": "https://www.reddit.com/r/MachineLearning/comments/1dxyz42/new_open_source_model/",
  "url": "https://arxiv.org/abs/2406.12345",
  "subreddit": "MachineLearning",
  "author": "ml_researcher",
  "score": 4821,
  "upvote_ratio": 0.96,
  "num_comments": 312,
  "flair": "Research",
  "post_type": "link",
  "thumbnail": "https://b.thumbs.redditmedia.com/abc123.jpg",
  "created_at": "2026-06-09T10:34:21+00:00",
  "scraped_at": "2026-06-09T11:00:03+00:00"
}
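
Since created_at and scraped_at are both ISO 8601 strings with a UTC offset, they can be parsed with the standard library to derive a post's age at scrape time. The sketch below uses the timestamps from the example record; the helper names are hypothetical, and score per hour is only one possible way to rank fast-rising posts.

```python
from datetime import datetime


def age_at_scrape_seconds(post: dict) -> float:
    """Seconds between post creation and the moment it was scraped."""
    created = datetime.fromisoformat(post["created_at"])
    scraped = datetime.fromisoformat(post["scraped_at"])
    return (scraped - created).total_seconds()


def score_per_hour(post: dict) -> float:
    """Score divided by age in hours; 0.0 when the age is not positive."""
    age = age_at_scrape_seconds(post)
    if age <= 0:
        return 0.0
    return post["score"] / (age / 3600)


post = {
    "score": 4821,
    "created_at": "2026-06-09T10:34:21+00:00",
    "scraped_at": "2026-06-09T11:00:03+00:00",
}
print(age_at_scrape_seconds(post))  # 1542.0 (25 minutes 42 seconds)
```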

Use Cases

Trend Monitoring

Track what's gaining traction in your niche in real time. Schedule the actor to run hourly on subreddits like entrepreneur, startups, or SaaS to power a trend dashboard or Slack alert.

{
  "subreddits": ["entrepreneur", "startups", "SaaS"],
  "sort": "hot",
  "maxPostsPerSubreddit": 25
}

Weekly Top Posts Digest

Pull the best content from multiple communities for a newsletter or internal report.

{
  "subreddits": ["MachineLearning", "LocalLLaMA", "datascience"],
  "sort": "top",
  "timeFilter": "week",
  "maxPostsPerSubreddit": 100
}

NLP Training Data Collection

Collect high-quality community text at scale. In post-processing, filter for score >= 500 to keep only community-validated content.

{
  "subreddits": ["AskReddit", "explainlikeimfive", "changemyview"],
  "sort": "top",
  "timeFilter": "year",
  "maxPostsPerSubreddit": 100
}
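
The score filter mentioned above runs on the dataset after the actor finishes, not inside the actor. A minimal sketch of that post-processing step follows; filter_by_score and the sample records are hypothetical.

```python
def filter_by_score(posts: list[dict], min_score: int = 500) -> list[dict]:
    """Keep posts whose score is at least min_score; a missing score counts as 0."""
    return [post for post in posts if (post.get("score") or 0) >= min_score]


# Hypothetical records, trimmed to the fields used here.
posts = [
    {"id": "a1", "title": "Low-signal post", "score": 12},
    {"id": "b2", "title": "Well-received post", "score": 4821},
    {"id": "c3", "title": "Borderline post", "score": 500},
]
kept = filter_by_score(posts)
print([post["id"] for post in kept])  # ['b2', 'c3']
```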

Competitor & Community Research

Monitor conversations in competitor product subreddits. Track sentiment, common complaints, and feature requests over time without manual browsing.

Content Ideation

Identify top-performing post titles and topics in your niche. Use the data to inform blog posts, video ideas, or social media content calendars.

Academic & Social Research

Gather timestamped post data for studying online community behavior, topic evolution, or information spread over time.


API Access & Automation

All results are accessible via the Apify API. Trigger runs, poll for results, and stream dataset items into your own pipeline.

curl -X POST \
  "https://api.apify.com/v2/acts/YOUR_USERNAME~reddit-rss-scraper/runs?token=<YOUR_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "subreddits": ["python"],
    "sort": "hot",
    "maxPostsPerSubreddit": 25
  }'

Or use the Python SDK:

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Start the actor and wait for the run to finish.
run = client.actor("YOUR_USERNAME/reddit-rss-scraper").call(run_input={
    "subreddits": ["MachineLearning", "LocalLLaMA"],
    "sort": "top",
    "maxPostsPerSubreddit": 50,
    "timeFilter": "week",
})

# Print the title and score of each scraped post.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], "|", item["score"])

Results integrate natively with Zapier, Make, and n8n for no-code automation.


Pricing

This actor runs on Apify's pay-per-use infrastructure. Costs depend on compute time and the number of requests made.

| Volume | Subreddits | Posts Each | Estimated Cost |
| --- | --- | --- | --- |
| Small run | 3 | 25 | < $0.05 |
| Medium run | 10 | 50 | ~$0.10–$0.20 |
| Large run | 20 | 100 | ~$0.30–$0.60 |
| With residential proxy | Any | Any | Add proxy usage costs |

For most runs under 200 posts across a few subreddits, no proxy is needed — the TLS fingerprint spoofing handles standard loads without additional cost.


Why Use This Instead of the Reddit Official API?

| Feature | Reddit RSS Scraper | Reddit Official API |
| --- | --- | --- |
| API key required | ❌ No | ✅ Yes (OAuth app registration) |
| Setup time | ~2 minutes | 15–30 minutes |
| Rate limits | Generous (curl_cffi) | 60 req/min (free tier) |
| Pagination | Up to ~1,000 posts | Up to ~1,000 posts |
| Export formats | JSON, CSV, Excel | JSON only |
| Scheduling | Built-in via Apify | Manual implementation |
| Proxy support | Apify Residential built-in | Not applicable |

FAQ

Is scraping Reddit legal?

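This actor accesses only publicly available Reddit content visible to any anonymous visitor. It does not bypass authentication or CAPTCHAs, and it does not access private data. In the United States, the Ninth Circuit's hiQ Labs v. LinkedIn decisions held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act; they did not settle terms-of-service or copyright questions, and rules differ by jurisdiction. This tool is not affiliated with, endorsed by, or sponsored by Reddit Inc.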

Does this actor require proxies?

For most runs (200 posts or fewer across a few subreddits), the actor works without any proxy thanks to curl_cffi Chrome TLS impersonation. For high-volume or scheduled runs, Apify Residential proxies are recommended to avoid 403 blocks.

Can I schedule runs?

Yes. Apify has built-in scheduling. You can set the actor to run hourly, daily, weekly, or on any cron schedule from the Apify Console.

Can I export results to CSV or Excel?

Yes. Once a run completes, download results as JSON, CSV, or Excel directly from the Output tab in Apify Console.

How many posts can I scrape per subreddit?

Up to approximately 1,000 posts. Reddit's unauthenticated JSON API stops paginating after around 1,000 items. The first 25 posts are fetched via RSS; subsequent pages use the JSON API with cursor pagination.
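
To illustrate what cursor pagination over Reddit's public JSON listings looks like, here is a minimal sketch that only builds page URLs and reads the cursor from a response. It is not the actor's source code: the helper names are hypothetical, and the sketch assumes the standard listing shape, in which data.after holds the cursor for the next page and limit is capped at 100 per request.

```python
from typing import Optional
from urllib.parse import urlencode


def listing_url(
    subreddit: str,
    sort: str = "hot",
    time_filter: Optional[str] = None,
    after: Optional[str] = None,
    limit: int = 100,
) -> str:
    """Build the URL of one page of a public subreddit listing."""
    params = {"limit": limit}
    if sort == "top" and time_filter:
        params["t"] = time_filter  # the time range only applies to top
    if after:
        params["after"] = after  # cursor taken from the previous page
    return f"https://www.reddit.com/r/{subreddit}/{sort}.json?{urlencode(params)}"


def next_cursor(listing: dict) -> Optional[str]:
    """Return the cursor for the next page, or None on the last page."""
    return listing.get("data", {}).get("after")


# Hypothetical response body, trimmed to the pagination field.
page = {"kind": "Listing", "data": {"after": "t3_1dxyz42", "children": []}}
print(listing_url("python", "top", "week", after=next_cursor(page)))
```

A fetch loop would request each URL in turn and stop when next_cursor returns None or the requested post count is reached.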

Does it work on all subreddits?

It works on any public subreddit. Private or restricted subreddits that require a logged-in account are not accessible.

What sort modes are supported?

hot, new, top, and rising. For top, you can also set a time filter: hour, day, week, month, year, or all.

Do upvote_ratio and num_comments get populated for all posts?

These fields are only available from the JSON API, not from RSS. The first 25 posts will have null for these fields unless maxPostsPerSubreddit is set above 25, which triggers JSON pagination and backfills them.

What happens if Reddit changes its feed format?

Open an issue on the Issues tab. The actor is maintained and will be updated to reflect structural changes.

Can I use this through the API without Apify Console?

Yes. The actor exposes a full REST API. You can trigger runs, poll status, and fetch dataset items programmatically using the Apify REST API or Python SDK.


How to Scrape Reddit Data Without an API Key

Reddit's official API requires OAuth registration, credential management, and enforces strict rate limits. This actor uses Reddit's public RSS and JSON feeds instead — available to any anonymous visitor. By combining curl_cffi Chrome TLS impersonation with cursor-based JSON pagination, it reliably collects up to 1,000 posts per subreddit without any API key setup.

Reddit API Alternative for Bulk Data Collection

If you need Reddit post data for research, monitoring, or data pipelines, the official Reddit API is often overkill. This actor provides a simpler alternative: paste in subreddit names, click Start, and get structured data ready to download or query via API.

How to Export Reddit Data to CSV

After a run completes in Apify, open the Output tab and click Download as CSV or Excel. No additional tooling required. For API-driven workflows, you can stream results as JSONL or paginate through the dataset endpoint.
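
For API-driven workflows that still need a CSV file, dataset items can be written out with Python's standard library. The following is a sketch under the assumption that items are dictionaries using the field names from the output table; posts_to_csv and the sample record are hypothetical.

```python
import csv
import io

# Column order follows the output field table.
FIELDS = [
    "id", "title", "permalink", "url", "subreddit", "author", "score",
    "upvote_ratio", "num_comments", "flair", "post_type", "thumbnail",
    "created_at", "scraped_at",
]


def posts_to_csv(posts: list[dict]) -> str:
    """Serialize posts to CSV text; missing or null fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for post in posts:
        # Restrict each row to the known columns so extra keys are ignored.
        writer.writerow({field: post.get(field) for field in FIELDS})
    return buffer.getvalue()


# Hypothetical record with only a few fields populated.
csv_text = posts_to_csv([{"id": "b2", "title": "Hello, world", "score": 42}])
print(csv_text)
```

To save the result, write csv_text to a file opened with newline="" so the line endings are not translated a second time.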

Automate Reddit Data Collection

Use Apify's built-in scheduler to run this actor on any cron schedule — hourly trend monitoring, daily digests, or weekly research pulls. Results can be forwarded automatically to Google Sheets, Slack, Airtable, or any webhook via Apify's integrations with Zapier, Make, and n8n.


Support

Found a bug or need a feature? Open an issue on the Issues tab in Apify Console. Feedback and pull requests are welcome.