Reddit Posts Scraper

Pricing: $19.99/month + usage
🧰 Reddit Posts Scraper extracts Reddit post data by subreddit, keyword, or URL—titles, authors, flairs, scores, upvotes, comments, timestamps, links & media. 📊 Export CSV/JSON. 🔎 Perfect for trend tracking, sentiment analysis, content research & social listening. 🚀

Developer: ScrapeMesh (Maintained by Community)
Reddit Posts Scraper

Reddit Posts Scraper is a fast, reliable Reddit scraper that lets you scrape Reddit posts and comments by subreddit, full thread URL, or keyword, returning clean, structured JSON for analysis. It solves the challenge of extracting Reddit data at scale by handling sort orders, time filters, blocks, and concurrency automatically. Built for marketers, developers, data analysts, and researchers, this Reddit thread and comments scraper uses proxy fallback and parallel comment fetching to help you track trends, run sentiment analysis, and power automation workflows. 🚀

What data / output can you get?

The actor saves results to a dataset where each item represents one Reddit post with structured fields. You can export Reddit posts to CSV or JSON.

| Data type | Description | Example value |
| --- | --- | --- |
| subreddit | Community name | "news" |
| title | Post title | "Breaking: Major update in AI policy announced" |
| author | Reddit username of the poster | "u/example_user" |
| score | Post score/upvotes | 156 |
| num_comments | Number of comments | 42 |
| created_utc | Post creation time as a Unix timestamp (UTC) | 1703123456 |
| permalink | Link to the Reddit thread | "https://www.reddit.com/r/news/comments/abc123/example_post/" |
| body | Selftext/body of the post (if any) | "Here’s what changed and why it matters…" |
| thumbnail_url | Thumbnail image URL | "https://preview.redd.it/..." |
| image_url | Main image/destination URL (if any) | "https://i.redd.it/..." |
| comments | Nested array of comments with replies | [{"author":"commenter1","body":"Great news!","score":23,"created_utc":1703123499,"replies":[]}] |
| post_id | Unique Reddit post ID | "abc123" |
| success | Whether the post was processed successfully | true |
| error_message | Error details if processing failed | null |

Notes:

  • Comments are returned as nested arrays with fields: author, body, score, created_utc, replies.
  • Exports available as JSON and CSV from the dataset.
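The nested comments format above is easy to flatten for spreadsheet-style analysis. Here is a minimal Python sketch (not part of the actor) that walks the replies arrays recursively and emits one flat row per comment, with a depth column marking how deeply a reply is nested:

```python
import json

def flatten_comments(comments, depth=0):
    """Recursively flatten nested comment arrays into flat rows.

    Each row keeps the documented fields (author, body, score,
    created_utc) plus a depth column indicating reply nesting.
    """
    rows = []
    for c in comments:
        rows.append({
            "author": c.get("author"),
            "body": c.get("body"),
            "score": c.get("score"),
            "created_utc": c.get("created_utc"),
            "depth": depth,
        })
        # Recurse into replies one level deeper
        rows.extend(flatten_comments(c.get("replies", []), depth + 1))
    return rows

# Illustrative dataset item shaped like the example values above
item = {
    "post_id": "abc123",
    "comments": [
        {"author": "commenter1", "body": "Great news!", "score": 23,
         "created_utc": 1703123499,
         "replies": [
             {"author": "commenter2", "body": "Agreed.", "score": 5,
              "created_utc": 1703123600, "replies": []},
         ]},
    ],
}

rows = flatten_comments(item["comments"])
print(json.dumps(rows, indent=2))
```

The flat rows can then be fed straight into a CSV writer or a DataFrame.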

Key features

  • ⚡ Parallel comment fetching
    Fetch comments for multiple posts concurrently with controlled concurrency for speed and stability.

  • 🛡️ Automatic proxy fallback
    Robust Reddit scraping tool with “no proxy → datacenter → residential” fallback to handle blocks (403/429) and keep runs stable.

  • 🧭 Sort & time filtering
    Choose sortOrder: hot, new, top, rising and optionally apply a timeFilter (hour, day, week, month, year, all) when using top or rising.

  • 📏 Configurable limits
    Set maxPosts per source (1–1000) and maxComments per post (0–1000). Use maxComments: 0 to skip comments for faster runs.

  • 📤 Structured JSON output
    Clean post-level records including subreddit, title, author, score, timestamps, links, media, and nested comments — ideal for analytics and dashboards.

  • 🔁 Resilient retries
    Automatic retries for rate limits, timeouts, upstream 5xx errors, and connection/SSL issues to reduce false failures.

  • 💾 Real-time dataset saving
    Results are pushed to the dataset as they’re scraped, so partial data is preserved even if a run stops.

  • 🐍 Python-based on Apify
    Built with Python and aiohttp for efficient, scalable extraction — a great fit for “Reddit scraper Python” workflows.
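The controlled-concurrency fetching described above follows a standard asyncio pattern. In this illustrative sketch (the `fetch_comments` coroutine is a stand-in for the actor's real aiohttp request, with `asyncio.sleep` simulating network latency), a semaphore caps how many fetches run at once:

```python
import asyncio

async def fetch_comments(post_id: str, sem: asyncio.Semaphore) -> dict:
    # Placeholder for a real aiohttp request; asyncio.sleep stands in
    # for network latency so this sketch runs without touching Reddit.
    async with sem:  # limits how many fetches are in flight at once
        await asyncio.sleep(0.01)
        return {"post_id": post_id, "comments": []}

async def fetch_all(post_ids, concurrency: int = 5):
    sem = asyncio.Semaphore(concurrency)
    tasks = [fetch_comments(pid, sem) for pid in post_ids]
    # gather preserves input order, so results line up with post_ids
    return await asyncio.gather(*tasks)

results = asyncio.run(fetch_all([f"post_{i}" for i in range(20)]))
print(len(results))  # → 20
```

Raising `concurrency` speeds up large runs but increases the chance of rate limiting, which is why a cap matters.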

How to use Reddit Posts Scraper - step by step

  1. Sign in to Apify and open the Reddit Posts Scraper.
  2. Add your sources in “Reddit URLs / Subreddits / Keywords” (one per line). You can mix:
    • Full Reddit URLs (e.g., https://www.reddit.com/r/news/)
    • Subreddit names (e.g., news or r/news)
    • Search keywords (e.g., artificial intelligence)
  3. Configure sorting & timing:
    • sortOrder: hot, new, top, rising
    • timeFilter: hour, day, week, month, year, all (applies only to top or rising)
  4. Set limits:
    • maxPosts per source (1–1000)
    • maxComments per post (0–1000; set 0 to skip comments)
  5. Configure proxyConfiguration if needed. The actor can fall back automatically to datacenter/residential proxies if blocked.
  6. Start the run and monitor logs for progress. The scraper collects post metadata first, then processes comments (if enabled) in parallel.
  7. Download results from the dataset. Export Reddit posts to CSV or JSON for use in your BI tools or automations.

Pro Tip: Combine multiple subreddits and keywords in one run to quickly compare topics, track trends, or build richer datasets to scrape Reddit posts at scale.
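If you prefer to start runs programmatically, the usual apify-client pattern looks like the sketch below. The `<actor-id>` placeholder is hypothetical (look up the real ID in the Apify Console), and the network call is guarded behind an APIFY_TOKEN environment variable so the snippet runs offline:

```python
import os

# Run input mirroring the steps above; values are illustrative.
run_input = {
    "startUrls": [
        "https://www.reddit.com/r/news/",
        "news",
        "artificial intelligence",
    ],
    "sortOrder": "top",
    "timeFilter": "week",
    "maxPosts": 50,
    "maxComments": 100,
    "proxyConfiguration": {"useApifyProxy": False},
}

# The actual call needs an Apify token and the actor's ID; "<actor-id>"
# below is a placeholder, NOT the real ID. Guarded so this runs offline.
if os.getenv("APIFY_TOKEN"):
    from apify_client import ApifyClient  # pip install apify-client
    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor("<actor-id>").call(run_input=run_input)
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item["title"])
```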

Use cases

| Use case | Description |
| --- | --- |
| Market & trend research | Aggregate top posts and discussions by subreddit or keyword to quantify topics over time. |
| Sentiment & NLP datasets | Build labeled text corpora from titles, bodies, and nested comments for modeling and analysis. |
| Content & editorial planning | Discover high-performing threads to guide content strategy and ideation. |
| Social listening | Monitor public conversations and extract signals from Reddit threads for brand/research insights. |
| Competitive & product analysis | Track discussions around features, pain points, and releases across relevant communities. |
| Academic & journalism research | Collect transparent, attributable public discourse for qualitative/quantitative studies. |
| Data pipelines & dashboards | Feed structured JSON into analytics stacks for automated reporting on Reddit search results and subreddits. |
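For the research and dashboard use cases, a few lines of Python are enough to aggregate dataset items, for example post counts and average score per subreddit (the items below are illustrative stand-ins for real scraped data):

```python
from collections import defaultdict

items = [  # illustrative dataset items
    {"subreddit": "news", "score": 156, "num_comments": 42},
    {"subreddit": "news", "score": 44, "num_comments": 10},
    {"subreddit": "technology", "score": 300, "num_comments": 95},
]

# Accumulate per-subreddit counts and score totals
stats = defaultdict(lambda: {"posts": 0, "score_total": 0})
for it in items:
    s = stats[it["subreddit"]]
    s["posts"] += 1
    s["score_total"] += it["score"]

for sub, s in stats.items():
    print(sub, s["posts"], s["score_total"] / s["posts"])
```

The same loop works over a JSON export or items streamed from the dataset API.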

Why choose Reddit Posts Scraper?

Reddit Posts Scraper is engineered for precision, scale, and reliability — a production-ready Reddit data extractor without browser overhead.

  • ✅ Accuracy with structured fields: Consistent post- and comment-level JSON, not brittle HTML.
  • ⚡ Scale-ready: Parallel comment processing with smart concurrency for large sources.
  • 🔐 Reliable under pressure: Automatic proxy fallback and robust retries for blocks, 5xx, timeouts, and connection issues.
  • 🧭 Flexible targeting: Scrape by subreddit, full Reddit URL, or keyword in a single list.
  • 📤 Analytics-friendly output: Export JSON/CSV and plug into downstream tools with minimal wrangling.
  • 🧩 Developer-friendly: Python-based architecture fits automation and integration workflows.
  • 🛡️ Safer than extensions: No unstable browser extensions — a server-side Reddit scraping tool built for repeatability.

In short, it’s a robust Reddit scraper that balances speed, stability, and clean output for serious use cases.

Is it legal to scrape Reddit data?

Yes, when used responsibly. This actor is designed to extract publicly available Reddit content. Do not scrape private communities without permission and avoid misuse of personal data. Always respect platform terms and applicable data protection laws (e.g., GDPR/CCPA). For edge cases, consult your legal team and ensure your usage complies with your policies.

Input parameters & output format

Example input (JSON):

{
  "startUrls": [
    "https://www.reddit.com/r/news/",
    "news",
    "artificial intelligence"
  ],
  "sortOrder": "top",
  "timeFilter": "week",
  "maxPosts": 50,
  "maxComments": 100,
  "proxyConfiguration": { "useApifyProxy": false }
}

Parameter reference:

  • startUrls (array of strings, required)
    Description: One item per line — mix full URLs (e.g., https://www.reddit.com/r/news/), subreddit names (e.g., news or r/news), or search keywords (e.g., artificial intelligence). Duplicate subreddits are merged.
    Default: none

  • maxPosts (integer)
    Description: Max number of posts to scrape per subreddit or keyword (1–1000).
    Default: 50

  • maxComments (integer)
    Description: Max comments to fetch for each post (0–1000). Set to 0 to skip comments and only get post metadata.
    Default: 100

  • sortOrder (string enum: hot, new, top, rising)
    Description: How Reddit should sort the posts.
    Default: top

  • timeFilter (string enum: hour, day, week, month, year, all)
    Description: Time range for results. Only applies when sortOrder is top or rising; ignored for hot and new.
    Default: week

  • proxyConfiguration (object)
    Description: Choose which proxies to use. If Reddit blocks a request, the actor can fall back automatically: no proxy → datacenter → residential.
    Default: { "useApifyProxy": false }
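The rules above (mixed source formats, merged duplicate subreddits, enum values, numeric ranges) can be checked client-side before starting a run. This sketch is illustrative; the actor's own normalization and merging logic may differ:

```python
import re

SORT_ORDERS = {"hot", "new", "top", "rising"}
TIME_FILTERS = {"hour", "day", "week", "month", "year", "all"}

def normalize_source(entry: str) -> str:
    """Collapse 'news', 'r/news', and a full subreddit URL to one key so
    duplicate subreddits can be merged; keywords pass through unchanged."""
    m = re.match(
        r"(?:https?://(?:www\.)?reddit\.com)?/?(?:r/)?([A-Za-z0-9_]+)/?$",
        entry.strip(),
    )
    return m.group(1).lower() if m else entry.strip()

def validate(run_input: dict) -> None:
    """Reject inputs outside the documented ranges/enums before a run."""
    assert run_input["sortOrder"] in SORT_ORDERS
    assert run_input.get("timeFilter", "week") in TIME_FILTERS
    assert 1 <= run_input["maxPosts"] <= 1000
    assert 0 <= run_input["maxComments"] <= 1000

sources = ["https://www.reddit.com/r/news/", "news", "r/news",
           "artificial intelligence"]
# dict.fromkeys deduplicates while preserving order
merged = list(dict.fromkeys(normalize_source(s) for s in sources))
print(merged)  # duplicates of r/news collapse to one entry

validate({"sortOrder": "top", "timeFilter": "week",
          "maxPosts": 50, "maxComments": 100})
```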

Example output item (JSON):

{
  "subreddit": "news",
  "title": "Example post title",
  "author": "username",
  "score": 156,
  "num_comments": 42,
  "created_utc": 1703123456,
  "permalink": "https://www.reddit.com/r/news/comments/abc123/example_post/",
  "body": "Post content...",
  "thumbnail_url": "https://...",
  "image_url": "https://...",
  "comments": [
    {
      "author": "commenter1",
      "body": "Comment text...",
      "score": 23,
      "created_utc": 1703123499,
      "replies": []
    }
  ],
  "post_id": "abc123",
  "success": true,
  "error_message": null
}

Notes:

  • Some fields can be empty or “Unknown” if Reddit does not provide the value (e.g., author on deleted posts, body for link posts).
  • When maxComments is 0, comments is an empty array and the actor skips the comment-fetching phase.

FAQ

Does it scrape comments as well as posts?

Yes. Set maxComments > 0 to fetch nested comments for each post. Set maxComments to 0 if you only need post metadata and want faster runs.

Can I mix subreddits, keywords, and full URLs in one run?

Yes. Add any combination of subreddit names (e.g., news or r/news), full Reddit URLs, and search keywords in startUrls — one per line.

What sort options and time ranges are supported?

sortOrder supports hot, new, top, and rising. timeFilter supports hour, day, week, month, year, and all, and it only applies when sortOrder is top or rising.

How many posts can I scrape per source?

You can scrape between 1 and 1000 posts per subreddit or keyword using maxPosts. Combine multiple sources in startUrls to scale further.

What output formats are supported?

You can export the dataset to JSON or CSV. Each item includes subreddit, title, author, score, timestamps, links, media URLs, and nested comments.
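If you need custom CSV handling beyond the built-in export, for example keeping the nested comments column as a JSON string, Python's csv module covers it. An illustrative sketch with a sample item:

```python
import csv
import io
import json

items = [{  # illustrative dataset item
    "subreddit": "news", "title": "Example post title", "author": "username",
    "score": 156, "num_comments": 42, "created_utc": 1703123456,
    "permalink": "https://www.reddit.com/r/news/comments/abc123/example_post/",
    "comments": [{"author": "commenter1", "body": "Great news!", "score": 23}],
}]

fields = ["subreddit", "title", "author", "score", "num_comments",
          "created_utc", "permalink", "comments"]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields)
writer.writeheader()
for it in items:
    row = {k: it.get(k) for k in fields}
    # Serialize the nested comment list as a JSON string so the cell
    # survives the flat CSV format and can be parsed back later.
    row["comments"] = json.dumps(row["comments"])
    writer.writerow(row)

print(buf.getvalue())
```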

How does the scraper handle blocks and rate limits?

It includes robust retry logic and automatic proxy fallback (no proxy → datacenter → residential) for 403/429 blocks, upstream 5xx, timeouts, and connection/SSL issues.
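The retry behavior described here follows the standard exponential-backoff pattern. A simplified illustration (the actor's actual policy, error classes, and delays may differ), exercised with a deliberately flaky function:

```python
import time

def fetch_with_retry(fetch, max_attempts=4, base_delay=0.01):
    """Retry a callable on transient errors with exponential backoff.
    Illustrative only; re-raises once attempts are exhausted."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.01, 0.02, 0.04, ...

calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated block")  # stand-in for 429/5xx
    return "ok"

result = fetch_with_retry(flaky)
print(result, calls["n"])  # → ok 3
```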

Is this built with Python? Can I integrate it into my data pipeline?

Yes. The actor is Python-based and produces structured JSON that fits well into analytics pipelines and automation workflows.

Closing CTA / Final thoughts

Reddit Posts Scraper is built to extract structured Reddit post and comment data reliably at scale. With flexible inputs (subreddits, URLs, keywords), configurable limits, sort/time filters, and automatic proxy fallback, it delivers clean JSON that’s ready for research, analytics, and automation. Ideal for marketers, developers, data analysts, and researchers, you can export Reddit posts to CSV/JSON and plug them straight into your dashboards or workflows. Start extracting smarter Reddit insights today with a production-ready Reddit scraper.