Reddit Posts Scraper

Pricing: $19.99/month + usage
🧰 Reddit Posts Scraper extracts Reddit post data by subreddit, keyword, or URL—titles, authors, flairs, scores, upvotes, comments, timestamps, links & media. 📊 Export CSV/JSON. 🔎 Perfect for trend tracking, sentiment analysis, content research & social listening. 🚀

Developer: ScrapeMesh (Maintained by Community)
Reddit Posts Scraper

Reddit Posts Scraper is a fast, reliable Reddit scraper that lets you scrape Reddit posts and comments by subreddit, full thread URL, or keyword, returning clean, structured JSON for analysis. It solves the challenge of extracting Reddit data at scale by handling sort orders, time filters, blocks, and concurrency automatically. Built for marketers, developers, data analysts, and researchers, this Reddit thread and comments scraper uses proxy fallback and parallel comment fetching to help you track trends, run sentiment analysis, and power automation workflows. 🚀

What data / output can you get?

The actor saves results to a dataset where each item represents one Reddit post with structured fields. You can export Reddit posts to CSV or JSON.

| Data type | Description | Example value |
| --- | --- | --- |
| subreddit | Community name | "news" |
| title | Post title | "Breaking: Major update in AI policy announced" |
| author | Reddit username of the poster | "u/example_user" |
| score | Post score/upvotes | 156 |
| num_comments | Number of comments | 42 |
| created_utc | Post creation time as a Unix timestamp (UTC) | 1703123456 |
| permalink | Link to the Reddit thread | "https://www.reddit.com/r/news/comments/abc123/example_post/" |
| body | Selftext/body of the post (if any) | "Here’s what changed and why it matters…" |
| thumbnail_url | Thumbnail image URL | "https://preview.redd.it/..." |
| image_url | Main image/destination URL (if any) | "https://i.redd.it/..." |
| comments | Nested array of comments with replies | [{"author":"commenter1","body":"Great news!","score":23,"created_utc":1703123499,"replies":[]}] |
| post_id | Unique Reddit post ID | "abc123" |
| success | Whether the post was processed successfully | true |
| error_message | Error details if processing failed | null |

Notes:

  • Comments are returned as nested arrays with fields: author, body, score, created_utc, replies.
  • Exports available as JSON and CSV from the dataset.
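The nested comments format above is easy to flatten for spreadsheet-style analysis. Here is a minimal Python sketch (not part of the actor) that walks the replies arrays recursively and emits one flat row per comment, with a depth column marking how deeply a reply is nested:

```python
import json

def flatten_comments(comments, depth=0):
    """Recursively flatten nested comment arrays into flat rows.

    Each row keeps the documented fields (author, body, score,
    created_utc) plus a depth column indicating reply nesting.
    """
    rows = []
    for c in comments:
        rows.append({
            "author": c.get("author"),
            "body": c.get("body"),
            "score": c.get("score"),
            "created_utc": c.get("created_utc"),
            "depth": depth,
        })
        # Recurse into replies one level deeper
        rows.extend(flatten_comments(c.get("replies", []), depth + 1))
    return rows

# Illustrative dataset item shaped like the example values above
item = {
    "post_id": "abc123",
    "comments": [
        {"author": "commenter1", "body": "Great news!", "score": 23,
         "created_utc": 1703123499,
         "replies": [
             {"author": "commenter2", "body": "Agreed.", "score": 5,
              "created_utc": 1703123600, "replies": []},
         ]},
    ],
}

rows = flatten_comments(item["comments"])
print(json.dumps(rows, indent=2))
```

The flat rows can then be fed straight into a CSV writer or a DataFrame.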

Key features

  • ⚡ Parallel comment fetching
    Fetch comments for multiple posts concurrently with controlled concurrency for speed and stability.

  • 🛡️ Automatic proxy fallback
    Robust Reddit scraping tool with “no proxy → datacenter → residential” fallback to handle blocks (403/429) and keep runs stable.

  • 🧭 Sort & time filtering
    Choose sortOrder: hot, new, top, rising and optionally apply a timeFilter (hour, day, week, month, year, all) when using top or rising.

  • 📏 Configurable limits
    Set maxPosts per source (1–1000) and maxComments per post (0–1000). Use maxComments: 0 to skip comments for faster runs.

  • 📤 Structured JSON output
    Clean post-level records including subreddit, title, author, score, timestamps, links, media, and nested comments — ideal for analytics and dashboards.

  • 🔁 Resilient retries
    Automatic retries for rate limits, timeouts, upstream 5xx errors, and connection/SSL issues to reduce false failures.

  • 💾 Real-time dataset saving
    Results are pushed to the dataset as they’re scraped, so partial data is preserved even if a run stops.

  • 🐍 Python-based on Apify
    Built with Python and aiohttp for efficient, scalable extraction — a great fit for “Reddit scraper Python” workflows.
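The controlled-concurrency fetching described above follows a standard asyncio pattern. In this illustrative sketch (the `fetch_comments` coroutine is a stand-in for the actor's real aiohttp request, with `asyncio.sleep` simulating network latency), a semaphore caps how many fetches run at once:

```python
import asyncio

async def fetch_comments(post_id: str, sem: asyncio.Semaphore) -> dict:
    # Placeholder for a real aiohttp request; asyncio.sleep stands in
    # for network latency so this sketch runs without touching Reddit.
    async with sem:  # limits how many fetches are in flight at once
        await asyncio.sleep(0.01)
        return {"post_id": post_id, "comments": []}

async def fetch_all(post_ids, concurrency: int = 5):
    sem = asyncio.Semaphore(concurrency)
    tasks = [fetch_comments(pid, sem) for pid in post_ids]
    # gather preserves input order, so results line up with post_ids
    return await asyncio.gather(*tasks)

results = asyncio.run(fetch_all([f"post_{i}" for i in range(20)]))
print(len(results))  # → 20
```

Raising `concurrency` speeds up large runs but increases the chance of rate limiting, which is why a cap matters.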

How to use Reddit Posts Scraper - step by step

  1. Sign in to Apify and open the Reddit Posts Scraper.
  2. Add your sources in “Reddit URLs / Subreddits / Keywords” (one per line). You can mix:
    • Full Reddit URLs (e.g., https://www.reddit.com/r/news/)
    • Subreddit names (e.g., news or r/news)
    • Search keywords (e.g., artificial intelligence)
  3. Configure sorting & timing:
    • sortOrder: hot, new, top, rising
    • timeFilter: hour, day, week, month, year, all (applies only to top or rising)
  4. Set limits:
    • maxPosts per source (1–1000)
    • maxComments per post (0–1000; set 0 to skip comments)
  5. Configure proxyConfiguration if needed. The actor can fall back automatically to datacenter/residential proxies if blocked.
  6. Start the run and monitor logs for progress. The scraper collects post metadata first, then processes comments (if enabled) in parallel.
  7. Download results from the dataset. Export Reddit posts to CSV or JSON for use in your BI tools or automations.

Pro Tip: Combine multiple subreddits and keywords in one run to quickly compare topics, track trends, or build richer datasets to scrape Reddit posts at scale.
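If you prefer to start runs programmatically, the usual apify-client pattern looks like the sketch below. The `<actor-id>` placeholder is hypothetical (look up the real ID in the Apify Console), and the network call is guarded behind an APIFY_TOKEN environment variable so the snippet runs offline:

```python
import os

# Run input mirroring the steps above; values are illustrative.
run_input = {
    "startUrls": [
        "https://www.reddit.com/r/news/",
        "news",
        "artificial intelligence",
    ],
    "sortOrder": "top",
    "timeFilter": "week",
    "maxPosts": 50,
    "maxComments": 100,
    "proxyConfiguration": {"useApifyProxy": False},
}

# The actual call needs an Apify token and the actor's ID; "<actor-id>"
# below is a placeholder, NOT the real ID. Guarded so this runs offline.
if os.getenv("APIFY_TOKEN"):
    from apify_client import ApifyClient  # pip install apify-client
    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor("<actor-id>").call(run_input=run_input)
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item["title"])
```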

Use cases

| Use case | Description |
| --- | --- |
| Market & trend research | Aggregate top posts and discussions by subreddit or keyword to quantify topics over time. |
| Sentiment & NLP datasets | Build labeled text corpora from titles, bodies, and nested comments for modeling and analysis. |
| Content & editorial planning | Discover high-performing threads to guide content strategy and ideation. |
| Social listening | Monitor public conversations and extract signals from Reddit threads for brand/research insights. |
| Competitive & product analysis | Track discussions around features, pain points, and releases across relevant communities. |
| Academic & journalism research | Collect transparent, attributable public discourse for qualitative/quantitative studies. |
| Data pipelines & dashboards | Feed structured JSON into analytics stacks for automated reporting on Reddit search results and subreddits. |
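For the research and dashboard use cases, a few lines of Python are enough to aggregate dataset items, for example post counts and average score per subreddit (the items below are illustrative stand-ins for real scraped data):

```python
from collections import defaultdict

items = [  # illustrative dataset items
    {"subreddit": "news", "score": 156, "num_comments": 42},
    {"subreddit": "news", "score": 44, "num_comments": 10},
    {"subreddit": "technology", "score": 300, "num_comments": 95},
]

# Accumulate per-subreddit counts and score totals
stats = defaultdict(lambda: {"posts": 0, "score_total": 0})
for it in items:
    s = stats[it["subreddit"]]
    s["posts"] += 1
    s["score_total"] += it["score"]

for sub, s in stats.items():
    print(sub, s["posts"], s["score_total"] / s["posts"])
```

The same loop works over a JSON export or items streamed from the dataset API.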

Why choose Reddit Posts Scraper?

Reddit Posts Scraper is engineered for precision, scale, and reliability — a production-ready Reddit data extractor without browser overhead.

  • ✅ Accuracy with structured fields: Consistent post- and comment-level JSON, not brittle HTML.
  • ⚡ Scale-ready: Parallel comment processing with smart concurrency for large sources.
  • 🔐 Reliable under pressure: Automatic proxy fallback and robust retries for blocks, 5xx, timeouts, and connection issues.
  • 🧭 Flexible targeting: Scrape by subreddit, full Reddit URL, or keyword in a single list.
  • 📤 Analytics-friendly output: Export JSON/CSV and plug into downstream tools with minimal wrangling.
  • 🧩 Developer-friendly: Python-based architecture fits automation and integration workflows.
  • 🛡️ Safer than extensions: No unstable browser extensions — a server-side Reddit scraping tool built for repeatability.

In short, it’s a robust Reddit scraper that balances speed, stability, and clean output for serious use cases.

Is it legal to scrape Reddit data?

Yes, when used responsibly. This actor is designed to extract publicly available Reddit content. Do not scrape private communities without permission and avoid misuse of personal data. Always respect platform terms and applicable data protection laws (e.g., GDPR/CCPA). For edge cases, consult your legal team and ensure your usage complies with your policies.

Input parameters & output format

Example input (JSON):

{
  "startUrls": [
    "https://www.reddit.com/r/news/",
    "news",
    "artificial intelligence"
  ],
  "sortOrder": "top",
  "timeFilter": "week",
  "maxPosts": 50,
  "maxComments": 100,
  "proxyConfiguration": { "useApifyProxy": false }
}

Parameter reference:

  • startUrls (array of strings, required)
    Description: One item per line — mix full URLs (e.g., https://www.reddit.com/r/news/), subreddit names (e.g., news or r/news), or search keywords (e.g., artificial intelligence). Duplicate subreddits are merged.
    Default: none

  • maxPosts (integer)
    Description: Max number of posts to scrape per subreddit or keyword (1–1000).
    Default: 50

  • maxComments (integer)
    Description: Max comments to fetch for each post (0–1000). Set to 0 to skip comments and only get post metadata.
    Default: 100

  • sortOrder (string enum: hot, new, top, rising)
    Description: How Reddit should sort the posts.
    Default: top

  • timeFilter (string enum: hour, day, week, month, year, all)
    Description: Time range for results. Only applies when sortOrder is top or rising; ignored for hot and new.
    Default: week

  • proxyConfiguration (object)
    Description: Choose which proxies to use. If Reddit blocks a request, the actor can fall back automatically: no proxy → datacenter → residential.
    Default: { "useApifyProxy": false }
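The rules above (mixed source formats, merged duplicate subreddits, enum values, numeric ranges) can be checked client-side before starting a run. This sketch is illustrative; the actor's own normalization and merging logic may differ:

```python
import re

SORT_ORDERS = {"hot", "new", "top", "rising"}
TIME_FILTERS = {"hour", "day", "week", "month", "year", "all"}

def normalize_source(entry: str) -> str:
    """Collapse 'news', 'r/news', and a full subreddit URL to one key so
    duplicate subreddits can be merged; keywords pass through unchanged."""
    m = re.match(
        r"(?:https?://(?:www\.)?reddit\.com)?/?(?:r/)?([A-Za-z0-9_]+)/?$",
        entry.strip(),
    )
    return m.group(1).lower() if m else entry.strip()

def validate(run_input: dict) -> None:
    """Reject inputs outside the documented ranges/enums before a run."""
    assert run_input["sortOrder"] in SORT_ORDERS
    assert run_input.get("timeFilter", "week") in TIME_FILTERS
    assert 1 <= run_input["maxPosts"] <= 1000
    assert 0 <= run_input["maxComments"] <= 1000

sources = ["https://www.reddit.com/r/news/", "news", "r/news",
           "artificial intelligence"]
# dict.fromkeys deduplicates while preserving order
merged = list(dict.fromkeys(normalize_source(s) for s in sources))
print(merged)  # duplicates of r/news collapse to one entry

validate({"sortOrder": "top", "timeFilter": "week",
          "maxPosts": 50, "maxComments": 100})
```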

Example output item (JSON):

{
  "subreddit": "news",
  "title": "Example post title",
  "author": "username",
  "score": 156,
  "num_comments": 42,
  "created_utc": 1703123456,
  "permalink": "https://www.reddit.com/r/news/comments/abc123/example_post/",
  "body": "Post content...",
  "thumbnail_url": "https://...",
  "image_url": "https://...",
  "comments": [
    {
      "author": "commenter1",
      "body": "Comment text...",
      "score": 23,
      "created_utc": 1703123499,
      "replies": []
    }
  ],
  "post_id": "abc123",
  "success": true,
  "error_message": null
}

Notes:

  • Some fields can be empty or “Unknown” if Reddit does not provide the value (e.g., author on deleted posts, body for link posts).
  • When maxComments is 0, comments is an empty array and the actor skips the comment-fetching phase.

FAQ

Does it scrape comments as well as posts?

Yes. Set maxComments > 0 to fetch nested comments for each post. Set maxComments to 0 if you only need post metadata and want faster runs.

Can I mix subreddits, keywords, and full URLs in one run?

Yes. Add any combination of subreddit names (e.g., news or r/news), full Reddit URLs, and search keywords in startUrls — one per line.

What sort options and time ranges are supported?

sortOrder supports hot, new, top, and rising. timeFilter supports hour, day, week, month, year, and all, and it only applies when sortOrder is top or rising.

How many posts can I scrape per source?

You can scrape between 1 and 1000 posts per subreddit or keyword using maxPosts. Combine multiple sources in startUrls to scale further.

What output formats are supported?

You can export the dataset to JSON or CSV. Each item includes subreddit, title, author, score, timestamps, links, media URLs, and nested comments.
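If you need custom CSV handling beyond the built-in export, for example keeping the nested comments column as a JSON string, Python's csv module covers it. An illustrative sketch with a sample item:

```python
import csv
import io
import json

items = [{  # illustrative dataset item
    "subreddit": "news", "title": "Example post title", "author": "username",
    "score": 156, "num_comments": 42, "created_utc": 1703123456,
    "permalink": "https://www.reddit.com/r/news/comments/abc123/example_post/",
    "comments": [{"author": "commenter1", "body": "Great news!", "score": 23}],
}]

fields = ["subreddit", "title", "author", "score", "num_comments",
          "created_utc", "permalink", "comments"]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields)
writer.writeheader()
for it in items:
    row = {k: it.get(k) for k in fields}
    # Serialize the nested comment list as a JSON string so the cell
    # survives the flat CSV format and can be parsed back later.
    row["comments"] = json.dumps(row["comments"])
    writer.writerow(row)

print(buf.getvalue())
```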

How does the scraper handle blocks and rate limits?

It includes robust retry logic and automatic proxy fallback (no proxy → datacenter → residential) for 403/429 blocks, upstream 5xx, timeouts, and connection/SSL issues.
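The retry behavior described here follows the standard exponential-backoff pattern. A simplified illustration (the actor's actual policy, error classes, and delays may differ), exercised with a deliberately flaky function:

```python
import time

def fetch_with_retry(fetch, max_attempts=4, base_delay=0.01):
    """Retry a callable on transient errors with exponential backoff.
    Illustrative only; re-raises once attempts are exhausted."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.01, 0.02, 0.04, ...

calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated block")  # stand-in for 429/5xx
    return "ok"

result = fetch_with_retry(flaky)
print(result, calls["n"])  # → ok 3
```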

Is this built with Python? Can I integrate it into my data pipeline?

Yes. The actor is Python-based and produces structured JSON that fits well into analytics pipelines and automation workflows.

Closing CTA / Final thoughts

Reddit Posts Scraper is built to extract structured Reddit post and comment data reliably at scale. With flexible inputs (subreddits, URLs, keywords), configurable limits, sort/time filters, and automatic proxy fallback, it delivers clean JSON that’s ready for research, analytics, and automation. Ideal for marketers, developers, data analysts, and researchers, you can export Reddit posts to CSV/JSON and plug them straight into your dashboards or workflows. Start extracting smarter Reddit insights today with a production-ready Reddit scraper.