Reddit Posts Scraper

Pricing: $19.99/month + usage

Scrape Reddit posts with ease 🧵👽 Extract titles, post text, subreddits, usernames, upvotes, comments, timestamps, and links from Reddit threads. Perfect for trend tracking, sentiment analysis, audience research, and content discovery. Turn Reddit data into actionable insights fast 🚀

Developer: Scrapium (Maintained by Community)

Reddit Posts Scraper

Reddit Posts Scraper is a production-ready Apify actor that lets you scrape Reddit posts and comment threads by subreddit, full URL, or search keyword — fast. It solves the hassle of manual copy-paste and unreliable tools by returning clean, structured JSON for analysis. Marketers, developers, data analysts, and researchers use this Reddit scraper tool to scrape Reddit posts at scale for trend tracking, sentiment analysis, and content discovery. With proxy fallback, batching, and structured exports, it enables large-scale Reddit data pipelines and automation.

What data / output can you get?

Below are the exact fields the actor saves to the dataset (one row per post). You can export results to JSON or CSV, or fetch them via the Apify API.

| Data field | Description | Example value |
| --- | --- | --- |
| subreddit | Community name the post belongs to | "news" |
| title | Post title | "Example post title" |
| author | Reddit username of the poster | "username" |
| score | Upvotes/score of the post | 156 |
| num_comments | Total number of comments | 42 |
| created_utc | Unix timestamp (UTC) | 1703123456.789 |
| permalink | Direct link to the Reddit thread | "https://www.reddit.com/r/news/comments/abc123/..." |
| body | Selftext/body of the post | "Post content..." |
| thumbnail_url | Thumbnail image URL | "https://..." |
| image_url | Main image/media URL (if any) | "https://..." |
| comments | Nested array of comments with replies | [{"author":"commenter1","body":"Comment text...","score":23,"created_utc":1703123456.789,"replies":[]}] |
| post_id | Reddit post ID | "abc123" |
| success | Whether the post was processed successfully | true |
| error_message | Error detail if processing failed | null |

Note: The actor returns structured data for both posts and comments (including nested replies). Fields like author or title may occasionally be "Unknown" or "No Title" if Reddit does not provide them for the post.
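As a quick, dependency-free illustration, the sketch below converts a dataset item's created_utc field into a readable UTC timestamp and derives a simple engagement metric. The item and helper names are illustrative (they reuse the example values from the table above), not part of the actor itself.

```python
from datetime import datetime, timezone

# A sample dataset item shaped like the fields above (illustrative values).
item = {
    "title": "Example post title",
    "score": 156,
    "num_comments": 42,
    "created_utc": 1703123456.789,
}

def post_age_utc(item: dict) -> datetime:
    """Convert the Unix created_utc field to an aware UTC datetime."""
    return datetime.fromtimestamp(item["created_utc"], tz=timezone.utc)

def engagement_ratio(item: dict) -> float:
    """Comments per upvote: a rough proxy for discussion intensity."""
    return item["num_comments"] / max(item["score"], 1)

print(post_age_utc(item).isoformat())  # ISO 8601 timestamp in UTC
print(round(engagement_ratio(item), 3))
```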

Key features

  • ⚡ Parallel comment processing
    Fetches comments in parallel for high-throughput scraping — ideal for a Reddit thread scraper capturing full discussions.

  • 🧩 Flexible targeting
    Input can be subreddits (e.g., news or r/technology), full Reddit URLs, or search keywords — perfect to scrape subreddit posts or Reddit search results.

  • 🔀 Sort and time filter
    Supports sortOrder (hot, new, top, rising) and timeFilter (hour, day, week, month, year, all) for precise Reddit post extractor workflows.

  • 📏 Scalable limits
    Control maxPosts per source and maxComments per post to tune depth — great for a Reddit bulk post downloader strategy.

  • 🛡️ Proxy fallback and retries
    Automatic fallback from no proxy → datacenter → residential with robust retries for blocks (403/429), 5xx, timeouts, and connection/SSL issues — reliable Reddit crawler behavior.

  • 💾 Live dataset saving
    Pushes each item as it's processed to avoid data loss — supports incremental pipelines and monitoring.

  • 🔌 Developer-friendly outputs
    Structured JSON ready for analytics, dashboards, and integrations (use the Apify API from your Python Reddit scraper or apps like Make, n8n, Zapier).

How to use Reddit Posts Scraper - step by step

  1. Sign in to your Apify account at console.apify.com.
  2. Open the actor — search for "Reddit Posts Scraper" in the Store.
  3. Enter your sources in startUrls:
    • Mix subreddit names (e.g., news or r/technology), full Reddit URLs (e.g., https://www.reddit.com/r/news/), and search keywords (e.g., artificial intelligence), one item per line.
  4. Configure sorting and time range:
    • sortOrder: hot, new, top, rising
    • timeFilter: hour, day, week, month, year, all (applies to top/rising)
  5. Set limits and comments depth:
    • maxPosts: number of posts per source (1–1000)
    • maxComments: number of comments per post (0–1000; 0 skips comments)
  6. Set proxyConfiguration as needed (e.g., enable Apify Proxy). The actor automatically falls back to residential if blocked.
  7. Click Start to run. Watch logs for progress — the actor crawls sources, then processes comments in parallel.
  8. Open the Output tab to view the "Reddit Posts Data" dataset. Export to JSON or CSV, or connect via the Apify API.

Pro Tip: Trigger runs programmatically with the Apify API and pipe results into your analytics stack or automation workflows — a robust alternative to a Reddit posts scraping script or PRAW scrape posts setup.
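A minimal sketch of that workflow using the official apify-client Python package (install with pip install apify-client). The actor ID shown is a placeholder assumption; copy the real one from the actor's page in the Apify Console.

```python
# Run input mirroring the documented parameters.
run_input = {
    "startUrls": ["r/news", "artificial intelligence"],
    "sortOrder": "top",
    "timeFilter": "week",
    "maxPosts": 50,
    "maxComments": 100,
    "proxyConfiguration": {"useApifyProxy": True},
}

def run_reddit_scraper(api_token: str, actor_id: str = "scrapium/reddit-posts-scraper"):
    """Trigger a run via the Apify API and yield dataset items.

    NOTE: actor_id above is a placeholder, not the confirmed ID.
    """
    # Imported lazily so the sketch loads without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(api_token)
    run = client.actor(actor_id).call(run_input=run_input)  # blocks until finished
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()
```

Each yielded item has the structure shown in the data-fields table, so it can be streamed straight into a DataFrame, database, or downstream automation.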

Use cases

| Use case | Description |
| --- | --- |
| Market & trend research | Aggregate top posts by keyword or subreddit to quantify discussion volume and surface emerging topics. |
| NLP / ML datasets | Collect titles, bodies, and nested comment threads to build labeled corpora for sentiment analysis and topic modeling. |
| Content & SEO | Identify what resonates in your niche, extract quotes, and plan content around high-engagement threads. |
| Brand monitoring | Track mentions across communities, measure sentiment shifts, and flag high-velocity threads in real time. |
| Journalism & research | Compile public Reddit discussions and quotes with timestamps and permalinks for verifiable sourcing. |
| Automation & pipelines | Schedule runs via the Apify API, export JSON/CSV, and sync to BI tools or data lakes as a Reddit API scraper alternative. |

Why choose Reddit Posts Scraper?

Built for reliability and scale, this Reddit data scraper balances speed with resilience — without requiring a browser.

  • ✅ Structured and consistent outputs that are analytics-ready (JSON/CSV).
  • ⚙️ High-throughput comment fetching with parallel processing for Reddit thread scraper use cases.
  • 🔁 Automatic proxy fallback and robust retries for blocks, 5xx, and timeouts — production-ready reliability.
  • 🔌 Developer access via the Apify API for integration with Python, ETL tools, and workflow automation.
  • 💡 No browser overhead; efficient HTTP-based collection of public endpoints.
  • 💰 Cost-effective and scalable — suitable for small experiments and larger pipelines alike.
  • 🔄 Better than flaky extensions or manual scripts: stable infrastructure, monitoring, and dataset storage.

In short, it's a dependable Reddit scraper tool for teams that need consistent, structured Reddit post extraction at scale.

Is it legal to scrape Reddit?

Yes — when done responsibly. This actor is designed for public Reddit content only and does not access private subreddits or authenticated data.

Guidelines for compliant use:

  • Scrape only publicly available Reddit content and respect community norms.
  • Review Reddit's terms and apply reasonable rate limits using proxyConfiguration as needed.
  • Avoid misuse of personal data in line with applicable regulations (e.g., GDPR, CCPA).
  • Do not attempt to bypass authentication to access private resources.
  • Consult your legal team for edge cases or jurisdiction-specific requirements.

Input parameters & output format

Example JSON input

{
  "startUrls": [
    "https://www.reddit.com/r/news/",
    "news",
    "artificial intelligence"
  ],
  "sortOrder": "top",
  "timeFilter": "week",
  "maxPosts": 50,
  "maxComments": 100,
  "proxyConfiguration": { "useApifyProxy": false }
}

Parameter details:

  • startUrls (array, required)
    Description: Enter one item per line. Mix full URLs (e.g., https://www.reddit.com/r/news/), subreddit names (e.g., news or r/news), or search keywords (e.g., artificial intelligence). Duplicate subreddits are merged.
    Default: none (required)

  • maxPosts (integer)
    Description: Max number of posts to scrape per subreddit or keyword (1–1000).
    Default: 50

  • maxComments (integer)
    Description: Max comments to fetch for each post (0–1000). Set to 0 to skip comments and only get post metadata.
    Default: 100

  • sortOrder (string)
    Description: How Reddit should sort posts — hot (trending), new (latest), top (most upvoted), rising (gaining traction).
    Allowed values: "hot", "new", "top", "rising"
    Default: "top"

  • timeFilter (string)
    Description: Time range for results. Only applies when sortOrder is top or rising; ignored for hot and new.
    Allowed values: "hour", "day", "week", "month", "year", "all"
    Default: "week"

  • proxyConfiguration (object)
    Description: Choose which proxies to use. If Reddit blocks a request, the actor automatically falls back: no proxy → datacenter → residential. Recommended for large runs or when you hit blocks.
    Default: { "useApifyProxy": false }
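For readers assembling inputs programmatically, here is an illustrative pre-flight check that mirrors the documented constraints (the function name and error strings are my own; the actor performs its own validation server-side):

```python
def validate_input(run_input: dict) -> list[str]:
    """Check a run input against the documented parameter constraints."""
    errors = []
    if not run_input.get("startUrls"):
        errors.append("startUrls is required and must be non-empty")
    if not 1 <= run_input.get("maxPosts", 50) <= 1000:
        errors.append("maxPosts must be between 1 and 1000")
    if not 0 <= run_input.get("maxComments", 100) <= 1000:
        errors.append("maxComments must be between 0 and 1000")
    if run_input.get("sortOrder", "top") not in {"hot", "new", "top", "rising"}:
        errors.append("sortOrder must be one of hot, new, top, rising")
    if run_input.get("timeFilter", "week") not in {"hour", "day", "week", "month", "year", "all"}:
        errors.append("timeFilter must be one of hour, day, week, month, year, all")
    return errors

print(validate_input({"startUrls": ["r/news"], "maxPosts": 5000}))
# → ['maxPosts must be between 1 and 1000']
```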

Example JSON output item

{
  "subreddit": "news",
  "title": "Example post title",
  "author": "username",
  "score": 156,
  "num_comments": 42,
  "created_utc": 1703123456.789,
  "permalink": "https://www.reddit.com/r/news/comments/abc123/...",
  "body": "Post content...",
  "thumbnail_url": "https://...",
  "image_url": "https://...",
  "comments": [
    {
      "author": "commenter1",
      "body": "Comment text...",
      "score": 23,
      "created_utc": 1703123456.789,
      "replies": []
    }
  ],
  "post_id": "abc123",
  "success": true,
  "error_message": null
}

Notes:

  • The comments field contains nested replies (recursive structure).
  • Some fields may be "Unknown" or null if not provided by Reddit for a given post.
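Because the comments field is recursive, tabular tools (CSV, DataFrames) often need it flattened first. A minimal sketch, assuming the structure shown above (the depth field is added by the helper, not by the actor):

```python
def flatten_comments(comments: list, depth: int = 0) -> list:
    """Walk the nested replies structure and return one flat row per comment."""
    rows = []
    for c in comments:
        rows.append({
            "author": c.get("author"),
            "body": c.get("body"),
            "score": c.get("score"),
            "depth": depth,  # 0 = top-level comment, 1 = reply, etc.
        })
        rows.extend(flatten_comments(c.get("replies", []), depth + 1))
    return rows

sample = [{
    "author": "commenter1", "body": "Comment text...", "score": 23,
    "replies": [{"author": "commenter2", "body": "A reply", "score": 5, "replies": []}],
}]

print(flatten_comments(sample))  # two rows: depth 0 and depth 1
```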

FAQ

Is there a free tier to try it?

Yes. You can run small jobs on Apify's free plan to evaluate the actor before scaling up. Larger workloads may require enabling proxies for reliability.

Does it include comments and replies?

Yes. Set maxComments > 0 to fetch nested comment threads. If you set maxComments to 0, the actor returns only post metadata without comments.

Can I target multiple subreddits or keywords in one run?

Yes. Add as many as you need to startUrls — you can mix subreddit names, full Reddit URLs, and search keywords in the same list.

How does it handle blocks or rate limits?

The actor automatically falls back from no proxy → datacenter → residential and retries on common errors (403/429, 5xx, timeouts, connection/SSL issues).

Which formats can I export to?

You can export the dataset to JSON or CSV from Apify, or access results programmatically via the Apify API.

Can developers integrate this with Python or other workflows?

Yes. Fetch the dataset via the Apify API and plug it into your Python Reddit scraper, data pipelines, or automation tools like Make, n8n, and Zapier.

What types of sources can I input?

You can input subreddits (e.g., news or r/technology), full Reddit URLs, or search keywords to run a Reddit search results scraper workflow.

Does it require a browser or login?

No. This actor collects public Reddit data efficiently without a browser or login, focusing on structured, reliable output.

Final thoughts

Reddit Posts Scraper is built for structured extraction of public Reddit posts and comment threads at scale. With flexible inputs, sorting and time filters, scalable limits, proxy fallback, and robust retries, it delivers reliable datasets for analysis and automation. It's ideal for marketers, developers, data analysts, and researchers who need a dependable Reddit post extractor with JSON/CSV exports and API access. Use the Apify API to wire it into your pipelines or trigger runs from your Python workflows — start extracting smarter Reddit insights today.