Reddit Data Scraper – Scrape Posts, Comments, Upvotes & More

Extract Reddit posts, upvote scores, comment counts, and subreddit data with this Reddit scraper. Ideal for data analysis, lead generation, trend research, and AI datasets. Scrape Reddit data at scale without an official Reddit API key and export results in JSON, CSV, or Excel format.

Pricing: $5.00/month + usage
Developer: Sovanza
Reddit Post Scraper – Extract Posts, Comments, Upvotes & Data
Extract Reddit posts, scores (upvote / net score), comment counts, and subreddit metadata with this scraper—ideal for trend analysis, content research, and NLP-ready datasets. Export in JSON, CSV, or Excel from your Apify dataset.
Scope: This actor is post-level only. It does not download comment text, nested replies, or awards. You get num_comments and rich post fields (title, selftext, link, score, etc.). For full comment trees you would need a separate flow or actor extension.
What is Reddit Post Scraper?
Reddit Post Scraper is a Reddit data extraction tool built on Apify that reads Reddit’s public JSON (append `.json` to listing or post URLs); it needs no browser automation and no official Reddit API key. It is designed for:
- Marketers
- Researchers
- Data analysts
- Content creators
- AI developers
It turns subreddit listings and post URLs into a structured dataset for trends, engagement signals, and text analysis.
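To illustrate the mechanism, here is a minimal Python sketch of reading a subreddit listing via the public `.json` endpoint. This is not the actor's code; the helper names and the User-Agent string are assumptions.

```python
import json
import urllib.request

def listing_json_url(subreddit: str, sort: str = "hot", limit: int = 25) -> str:
    """Build the public JSON endpoint for a subreddit listing (limit maxes out at 100)."""
    return f"https://www.reddit.com/r/{subreddit}/{sort}.json?limit={min(limit, 100)}"

def fetch_listing(url: str) -> dict:
    """Fetch a listing; Reddit expects a descriptive User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": "example-reddit-scraper/0.1"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

# Usage (network call, so shown as a comment):
# posts = fetch_listing(listing_json_url("python", "top", 5))["data"]["children"]
```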
Why This Reddit Scraper is Powerful
- Fast and lightweight: HTTP + JSON only (no Playwright).
- Flexible input: subreddit listing URLs, direct post URLs, `productUrls` (string list, same pattern as the Amazon/eBay actors), `startUrls`, legacy `url`, or `subreddit` + `sort`.
- Engagement fields: `score`, `upvote_ratio`, `num_comments`, plus title and body text for NLP.
- Multi-URL runs: scrape several listings or posts in one run.
- Proxy-aware: optional Apify residential proxy or custom `proxyUrl`; Reddit often blocks datacenter IPs.
- Automation: Apify API, schedules, webhooks; export CSV/Excel or integrate with Sheets and BI tools.
➡️ Built for insight and datasets at the post level, not full comment-thread dumps.
What Data Does Reddit Post Scraper Extract?
Each dataset item is one post (see `.actor/dataset_schema.json`).

Post data
- Title (`title`)
- Body (`selftext`, optional `selftext_html`)
- Post URL (`url`) and `permalink`
- Subreddit (`subreddit`, `subreddit_name_prefixed`)
- Post `id`
- `source_url` (the listing or post URL used for the request)

Engagement metrics
- `score` — net score (upvotes minus downvotes, as returned by Reddit)
- `upvote_ratio` — upvote ratio when present
- `num_comments` — comment count (not the comments themselves)

Author
- `author` — post author username when present

Media and links
- `link_url` — outbound link for link posts
- `thumbnail` — thumbnail URL when available
- `domain` — link domain

Other metadata
- `is_self` — text post vs link post
- `over_18` — NSFW flag
- `link_flair_text` — flair when present
- `created_utc` — post creation time (ISO, UTC)
- `scraped_at` — scrape timestamp
Not included: comment bodies, nested replies, awards, or “search query” scraping across all of Reddit (use specific subreddit/listing URLs or post URLs).
➡️ All fields are structured and exportable as JSON, CSV, or Excel.
Advanced Features
- Subreddit listings: use URLs like `https://www.reddit.com/r/python/hot` or `.../top`, or pass `subreddit` + `sort` (`hot`, `new`, `top`, `rising`, `controversial`).
- Direct posts: open a single post URL; the actor normalizes it to the `.json` endpoint.
- Volume control: `maxPostsPerUrl` — up to 100 posts per listing request (Reddit API limit per call).
- Locales: `language` sets `Accept-Language` on requests.
- Bulk runs: multiple URLs in `productUrls` or `startUrls` in one actor run.
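The URL-normalization step described above can be sketched in Python as follows; the function name is an assumption, not the actor's actual code.

```python
from urllib.parse import urlencode

def to_json_url(url: str, max_posts: int = 25) -> str:
    """Normalize a Reddit listing or post URL to its .json endpoint,
    capping the per-call limit at Reddit's maximum of 100."""
    base = url.split("?")[0].rstrip("/")
    if not base.endswith(".json"):
        base += ".json"
    return base + "?" + urlencode({"limit": min(max_posts, 100)})
```

For example, `to_json_url("https://www.reddit.com/r/python/hot", 250)` yields `https://www.reddit.com/r/python/hot.json?limit=100`.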
How to Use Reddit Post Scraper on Apify
- Open Reddit Post Scraper on Apify.
- Under Run settings, enable Proxy where possible (this actor uses residential proxy when Apify Proxy is available).
- Configure Input (at least one URL source or `subreddit`).
- Click Start.
- Open Dataset → export JSON, CSV, or Excel, or use the API.
Local run
```bash
cd reddit-post-scraper
pip install -r requirements.txt
# Optional: INPUT.json
python main.py
```
Input configuration
Recommended on Apify: enable Apify Proxy (residential). Optionally set proxyUrl to your own HTTP proxy.
Example — multiple subreddit listings (use paths that include sort when you want something other than default hot, e.g. top):
```json
{
  "productUrls": [
    "https://www.reddit.com/r/marketing/hot",
    "https://www.reddit.com/r/startups/top"
  ],
  "maxPostsPerUrl": 100,
  "language": "en",
  "proxyCountry": "US",
  "proxyUrl": ""
}
```
Alternate — startUrls (strings or { "url" } objects):
```json
{
  "startUrls": [
    { "url": "https://www.reddit.com/r/marketing/hot" },
    "https://www.reddit.com/r/startups/new"
  ],
  "maxPostsPerUrl": 50,
  "language": "en",
  "proxyCountry": "AUTO_SELECT_PROXY_COUNTRY"
}
```
Using subreddit name + sort (no full URL required):
```json
{
  "subreddit": "python",
  "sort": "top",
  "maxPostsPerUrl": 25,
  "language": "en",
  "proxyCountry": "AUTO_SELECT_PROXY_COUNTRY"
}
```
| Field | Type | Description |
|---|---|---|
| `productUrls` | array | Reddit URLs (string list). Subreddit listing or post URLs (same naming as the Amazon/eBay scrapers in this repo). |
| `startUrls` | array | Alternate: strings or `{ "url" }` request-list style. |
| `url` | string | Legacy single URL. |
| `subreddit` | string | Subreddit name without `r/` (used if no URLs given). |
| `sort` | string | `hot`, `new`, `top`, `rising`, `controversial` — used with `subreddit` only. |
| `maxPostsPerUrl` | integer | Max posts per URL (1–100, default 25). |
| `language` | string | Request locale hint (`en`, `de`, …). |
| `proxyCountry` | string | Apify proxy country when using the platform proxy (`US`, `GB`, … or `AUTO_SELECT_PROXY_COUNTRY`). |
| `proxyUrl` | string | Optional custom proxy URL; overrides Apify proxy when set. |
Note: There is no `scrapeComments` or `maxPosts` field in this actor — use `maxPostsPerUrl`. This input does not support arbitrary “search all Reddit” queries; point it at specific subreddit feeds or post URLs.
Output
| Field | Description |
|---|---|
| `url` | Full post URL |
| `permalink` | Permalink path / URL |
| `source_url` | Input listing or post URL used |
| `id` | Reddit post id |
| `title` | Post title |
| `author` | Author username |
| `subreddit` | Subreddit name |
| `subreddit_name_prefixed` | e.g. `r/python` |
| `score` | Net score |
| `upvote_ratio` | Upvote ratio (0–1) |
| `num_comments` | Number of comments |
| `selftext` | Text body (truncated at 50,000 chars in code) |
| `selftext_html` | HTML body when present |
| `link_url` | Linked URL for link posts |
| `is_self` | Text post flag |
| `over_18` | NSFW flag |
| `link_flair_text` | Flair |
| `thumbnail` | Thumbnail URL |
| `domain` | Link domain |
| `created_utc` | Created time (ISO) |
| `scraped_at` | Scrape time (ISO) |
Example item (illustrative):
```json
{
  "url": "https://www.reddit.com/r/marketing/comments/abc123/example/",
  "permalink": "https://www.reddit.com/r/marketing/comments/abc123/example/",
  "source_url": "https://www.reddit.com/r/marketing/hot",
  "id": "abc123",
  "title": "Example Reddit post",
  "author": "example_user",
  "subreddit": "marketing",
  "subreddit_name_prefixed": "r/marketing",
  "score": 512,
  "upvote_ratio": 0.96,
  "num_comments": 74,
  "selftext": "Post body…",
  "link_url": null,
  "is_self": true,
  "over_18": false,
  "created_utc": "2026-03-15T12:00:00+00:00",
  "scraped_at": "2026-03-30T12:05:00+00:00"
}
```
How the scraper works
- Collects URLs from `productUrls`, `startUrls`, `url`, or `subreddit` + `sort`.
- Builds Reddit `.json` URLs with a `limit` capped by `maxPostsPerUrl` (max 100).
- Requests JSON with a descriptive User-Agent (`CloudBots Reddit Post Scraper/1.0 (Apify; +https://apify.com)`), an optional proxy, and a ~2 s delay between URLs.
- Parses listing responses into post objects, or a single post from a post/comments JSON payload.
- Pushes one dataset item per post.
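The parsing step can be sketched as follows. This is a simplified illustration, not the actor's implementation; the field mapping is a subset of the dataset schema documented above.

```python
from datetime import datetime, timezone

def parse_listing(payload: dict, source_url: str) -> list[dict]:
    """Turn a Reddit listing payload into post-level items (subset of the schema above)."""
    items = []
    for child in payload.get("data", {}).get("children", []):
        d = child.get("data", {})
        items.append({
            "id": d.get("id"),
            "title": d.get("title"),
            "subreddit": d.get("subreddit"),
            "score": d.get("score"),
            "upvote_ratio": d.get("upvote_ratio"),
            "num_comments": d.get("num_comments"),
            "is_self": d.get("is_self"),
            # Body text capped at 50,000 characters, as noted in Limitations.
            "selftext": (d.get("selftext") or "")[:50_000],
            "created_utc": datetime.fromtimestamp(
                d.get("created_utc", 0), tz=timezone.utc
            ).isoformat(),
            "source_url": source_url,
        })
    return items
```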
Anti-blocking and reliability
- Use residential proxy on Apify (the actor attempts Apify Proxy with the `RESIDENTIAL` group when credentials exist).
- Set `proxyUrl` for your own provider if needed.
- Reddit may return 403/429 without a proxy and results may be empty; check logs and proxy settings.
Integrations and API
- Apify API — runs and datasets.
- Python / Node.js — Apify clients.
- Zapier / Make / Google Sheets — via Apify connectors or HTTP.
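With the official `apify-client` Python package, a run can be triggered and its dataset read programmatically. The actor ID and token below are placeholders, and `build_run_input` is a helper invented for this sketch.

```python
def build_run_input(subreddit: str, sort: str = "hot", max_posts: int = 25) -> dict:
    """Assemble run input matching the input schema documented above."""
    return {
        "subreddit": subreddit,
        "sort": sort,
        "maxPostsPerUrl": min(max_posts, 100),
        "language": "en",
        "proxyCountry": "AUTO_SELECT_PROXY_COUNTRY",
    }

def run_actor(token: str, actor_id: str, run_input: dict):
    """Trigger a run and yield dataset items via the official Apify Python client."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()

# Usage ("<user>/reddit-post-scraper" is a placeholder actor ID):
# for item in run_actor("<YOUR_APIFY_TOKEN>", "<user>/reddit-post-scraper",
#                       build_run_input("python", "top", 50)):
#     print(item["title"], item["score"])
```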
FAQ
How can this scraper help find trending topics?
Use top, hot, or rising URLs or sort values on niche subreddits, then rank by score and num_comments in your own analysis.
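That ranking step over exported items might look like this in Python (a sketch; the function name and the score-then-comments ordering are choices, not part of the actor):

```python
def rank_posts(items: list[dict], top_n: int = 10) -> list[dict]:
    """Rank exported posts by score, with comment count as the tiebreaker."""
    return sorted(
        items,
        key=lambda p: (p.get("score", 0), p.get("num_comments", 0)),
        reverse=True,
    )[:top_n]
```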
Can I use it for content ideation?
Yes—high score / num_comments posts plus titles and selftext show what resonates in a community.
Does it extract full comment discussions?
No. Only num_comments at the post level. Comment bodies and nested replies are not included. Extend the actor or use another tool if you need threads.
Sentiment and AI datasets?
title and selftext are suitable for NLP; add comment scraping separately if you need conversational sentiment.
Monitor a subreddit over time?
Schedule runs with the same input and compare exports.
Data freshness?
Data reflects Reddit’s JSON at request time; content changes—refresh on a schedule for monitoring.
Integrations?
Export or API into Sheets, warehouses, or automation tools.
How much data?
Up to 100 posts per URL per request, times the number of URLs in the run. Plan limits apply on Apify.
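That upper bound is simple arithmetic; the helper name below is illustrative:

```python
def max_dataset_items(num_urls: int, max_posts_per_url: int) -> int:
    """Upper bound on items per run: the per-URL cap (at most 100) times the URL count."""
    return min(max_posts_per_url, 100) * num_urls
```

For example, 3 listing URLs with `maxPostsPerUrl` set to 100 yield at most 300 items.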
Is scraping Reddit legal?
Use only in line with Reddit’s terms and applicable law; respect privacy and rate limits.
SEO keywords (high-intent)
reddit scraper, reddit post scraper, reddit comment scraper, scrape reddit data, subreddit scraper, reddit scraping api, reddit trend analysis, reddit sentiment analysis, reddit data extraction, reddit automation tool
Actor permissions
This actor reads input and writes to its dataset only. In Apify Console you can use limited permissions for Store trust (Settings → Permissions).
Limitations
- No comment bodies or nested replies in the current implementation.
- No global Reddit “search query” input—use specific subreddit/listing or post URLs.
- Some subs or posts may be private, removed, or geo-blocked; JSON may be empty.
- `selftext` / `selftext_html` truncated at 50,000 characters in code.
- Reddit JSON shape can change; the actor may need updates.
License
MIT.
Get started
Configure URLs or subreddit + sort, enable proxy on Apify, run the actor, and export your post dataset.