Reddit Data Scraper – Scrape Posts, Comments, Upvotes & More
Pricing
$5.00/month + usage
Extract Reddit posts, comments, upvotes, and subreddit data with this powerful Reddit scraper. Ideal for data analysis, lead generation, trend research, and AI datasets. Scrape Reddit data at scale without API limits and export results in JSON, CSV, or Excel format.
Developer
Sovanza
Reddit Post Scraper
What is Reddit Post Scraper?
Reddit Post Scraper is a powerful Reddit data extraction tool built on Apify that allows you to scrape posts and subreddit listings at scale using Reddit’s public JSON endpoints (no browser required). It is designed for marketers, researchers, developers, and businesses who want to automate trend analysis, content research, lead generation, and AI dataset creation — without relying on Reddit’s official API.
Why Use This Reddit Scraper?
Use this scraper to:
- Extract trending posts from any subreddit
- Analyze discussions, engagement, and content performance
- Track upvotes, comments, and popularity over time
- Build datasets for AI, sentiment analysis, and research
- Automate Reddit data collection workflows
Features
- Scrape subreddit listings (`hot`, `new`, `top`, `rising`, etc.) from subreddit URLs or names.
- Scrape single posts directly by URL.
- Extract rich post-level data:
- Post title and body (self text + HTML)
- Author and subreddit information
- Score, upvote ratio, number of comments
- Permalink, link URL, flair, thumbnail, domain
- Creation time and scrape time
- Uses Reddit’s JSON API (`.json` endpoints), so no headless browser is needed.
- Structured output for analytics and automation.
How to Use Reddit Post Scraper on Apify
Using the Actor
To use this actor on Apify, follow these steps:
1. Go to the Reddit Post Scraper on the Apify platform.
2. Configure the input:
   - Provide one or more Reddit URLs (subreddit listings or individual posts), or a subreddit name plus sort mode.
   - Configure how many posts to fetch per URL and your proxy settings for reliability.
Input Configuration
The actor supports multiple input styles. A typical configuration looks like:
```json
{
  "startUrls": [
    { "url": "https://www.reddit.com/r/marketing/top" },
    { "url": "https://www.reddit.com/r/startups/new" }
  ],
  "subreddit": "python",
  "sort": "hot",
  "maxPostsPerUrl": 25,
  "language": "en",
  "proxyCountry": "AUTO_SELECT_PROXY_COUNTRY",
  "proxyUrl": null
}
```
Common fields:
- `productUrls` (optional): One or more Reddit URLs (subreddit or post), one per line (array of strings).
- `startUrls` (optional): Array of `{ "url": "..." }` objects used as starting points (subreddits or posts).
- `url` (optional): Single Reddit URL (legacy single-URL input).
- `subreddit` (optional): Subreddit name only, e.g. `"python"`, `"AskReddit"`.
- `sort` (optional): Sort order for subreddit listings (e.g. `hot`, `new`, `top`, `rising`, `controversial`).
- `maxPostsPerUrl` (optional): Maximum number of posts to fetch per listing URL (typically 1–100, default ~25).
- `language` (optional): Language/locale hint for requests (default: `en`).
- `proxyCountry` (optional): Apify proxy country, e.g. `AUTO_SELECT_PROXY_COUNTRY`, `US`, `GB`, `DE`, `FR`, `JP`, `CA`, `IT`.
- `proxyUrl` (optional): Custom proxy URL (e.g. Webshare). When set, overrides Apify Proxy.
3. Run the Actor:
   - Click Start to begin scraping.
   - The actor fetches `.json` from each Reddit URL and normalizes post data.
4. Access Your Results:
   - View results in the Dataset tab.
   - Export data in JSON, CSV, or Excel.
   - Access via the Apify API for programmatic workflows.
5. Schedule Regular Runs (Optional):
   - Schedule the actor to run periodically to track trends and subreddit activity over time.
Output
Each Reddit post becomes one item in the dataset. According to the dataset schema, each item typically includes:
- `url`: Full URL to the Reddit post.
- `permalink`: Reddit permalink path.
- `source_url`: The URL that was scraped (listing or post URL).
- `id`: Reddit post ID.
- `title`: Post title.
- `author`: Post author username.
- `subreddit`: Subreddit name.
- `subreddit_name_prefixed`: Subreddit with prefix, e.g. `r/python`.
- `score`: Net score (upvotes minus downvotes).
- `upvote_ratio`: Ratio of upvotes (0–1).
- `num_comments`: Number of comments.
- `selftext`: Post body (self text).
- `selftext_html`: Post body in HTML (if available).
- `link_url`: URL of linked content (for link posts).
- `is_self`: `true` if text (self) post.
- `over_18`: `true` if marked NSFW.
- `link_flair_text`: Post flair text (if present).
- `thumbnail`: Thumbnail image URL (if available).
- `domain`: Domain of the linked content (for link posts).
- `created_utc`: ISO timestamp of when the post was created (UTC).
- `scraped_at`: Timestamp of when the post was scraped.
Example item (simplified):
```json
{
  "url": "https://www.reddit.com/r/marketing/comments/xxxxxx/example_post/",
  "permalink": "/r/marketing/comments/xxxxxx/example_post/",
  "source_url": "https://www.reddit.com/r/marketing/top",
  "id": "xxxxxx",
  "title": "Example Reddit post title",
  "author": "example_user",
  "subreddit": "marketing",
  "subreddit_name_prefixed": "r/marketing",
  "score": 512,
  "upvote_ratio": 0.96,
  "num_comments": 74,
  "selftext": "Post body text...",
  "selftext_html": "<p>Post body text...</p>",
  "link_url": null,
  "is_self": true,
  "over_18": false,
  "link_flair_text": "Discussion",
  "thumbnail": "https://b.thumbs.redditmedia.com/...",
  "domain": "self.marketing",
  "created_utc": "2025-01-01T12:00:00Z",
  "scraped_at": "2025-01-01T12:05:00Z"
}
```
➡️ Output is clean, structured, and ready for analysis, trend tracking, or automation.
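Once exported, the dataset can be analyzed with a few lines of standard-library Python. This is an illustrative sketch, not part of the actor; the file name in the comment is an example, and the field names follow the output schema above.

```python
# Minimal analysis sketch over exported dataset items (a list of dicts
# shaped like the example item above).
import json


def top_posts(items: list[dict], n: int = 5) -> list[dict]:
    """Return the n highest-scoring posts."""
    return sorted(items, key=lambda p: p.get("score", 0), reverse=True)[:n]


def average_upvote_ratio(items: list[dict]) -> float:
    """Mean upvote_ratio across items that have one."""
    ratios = [p["upvote_ratio"] for p in items if p.get("upvote_ratio") is not None]
    return sum(ratios) / len(ratios) if ratios else 0.0


# Typical usage with an Apify JSON export (file name is an example):
# with open("dataset_reddit-post-scraper.json") as f:
#     items = json.load(f)
# print(top_posts(items, 3), average_upvote_ratio(items))
```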
How the Scraper Works
The Reddit Post Scraper uses Reddit’s public JSON endpoints (no browser or official API) to fetch and normalize post data:
- URL normalization: For each input URL or subreddit, the actor builds the corresponding `.json` endpoint.
- HTTP requests: It sends HTTP requests with a descriptive User-Agent and optional proxy configuration.
- Data extraction: It parses the JSON response, extracts relevant post fields, and converts timestamps to ISO strings.
- Dataset writing: Each post is saved as a structured item in the default Apify dataset.
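The extraction step can be sketched roughly as follows. This is a simplified illustration, not the actor's actual source: the `normalize_post` helper and its field selection are assumptions based on the output schema above, while the `{"data": {"children": [...]}}` shape is how Reddit's public listing JSON is structured.

```python
# Illustrative sketch of normalizing one Reddit JSON "child" object
# into a flat dataset item. Not the actor's real code.
from datetime import datetime, timezone


def normalize_post(child: dict, source_url: str) -> dict:
    """Map one entry from data.children to a flat dataset item."""
    p = child["data"]
    return {
        "id": p.get("id"),
        "title": p.get("title"),
        "author": p.get("author"),
        "subreddit": p.get("subreddit"),
        "score": p.get("score"),
        "upvote_ratio": p.get("upvote_ratio"),
        "num_comments": p.get("num_comments"),
        "permalink": p.get("permalink"),
        "url": "https://www.reddit.com" + p.get("permalink", ""),
        "source_url": source_url,
        "is_self": p.get("is_self"),
        # Reddit reports created_utc as a Unix epoch; convert to ISO 8601.
        "created_utc": datetime.fromtimestamp(
            p["created_utc"], tz=timezone.utc
        ).isoformat() if "created_utc" in p else None,
    }
```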
Anti-blocking & Reliability
To keep scraping stable on Reddit:
- Uses a descriptive User-Agent (`Sovanza Reddit Post Scraper/1.0`).
- Adds a small delay between requests to reduce rate limiting.
- Supports Apify Proxy and a custom `proxyUrl` so you can use residential or region-specific IPs.
- Retries failed requests where appropriate to handle transient issues.
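A retry loop of this kind typically looks like the sketch below. This is a hedged illustration assuming a simple exponential backoff; the actor's actual retry policy is not published, and the `fetch_with_retry` helper is hypothetical.

```python
# Hypothetical sketch of retrying transient failures with exponential
# backoff. The real actor's policy may differ.
import time
from typing import Callable


def fetch_with_retry(fetch: Callable[[], dict], attempts: int = 3,
                     base_delay: float = 1.0) -> dict:
    """Call `fetch`, retrying failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the error
            # Sleep 1s, 2s, 4s, ... between attempts to ease rate limiting.
            time.sleep(base_delay * (2 ** attempt))
```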
Performance Optimization
- Processes multiple listing URLs in a single run.
- Uses lightweight HTTP+JSON (no headless browser), which is faster and cheaper.
- Lets you control `maxPostsPerUrl` to tune between speed and depth.
Why Choose This Actor?
- Scalable Reddit data extraction from subreddits and posts.
- Extracts rich post data and engagement metrics.
- No official Reddit API required.
- Automation-ready via Apify API, scheduling, and webhooks.
- Produces clean, structured datasets suitable for analytics and AI.
FAQ
How does Reddit Post Scraper work?
It appends .json to Reddit URLs (subreddits or posts), fetches the public Reddit JSON API, normalizes post data, and saves it to an Apify dataset.
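That URL transformation can be illustrated with a small helper. This is a hypothetical sketch; the actor's exact normalization rules are not published.

```python
def to_json_endpoint(url: str) -> str:
    """Turn a Reddit page URL into its public JSON endpoint."""
    # Drop any query string and trailing slash, then append .json.
    base = url.split("?", 1)[0].rstrip("/")
    return base + ".json"
```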
Can I scrape multiple subreddits at once?
Yes. You can provide multiple subreddit listing URLs or use multiple startUrls and/or productUrls in a single run.
Does it require Reddit API credentials?
No. It works with publicly available Reddit JSON endpoints and does not use Reddit’s official OAuth API.
Can I extract comments and replies?
This actor focuses on post-level data (title, body, score, metadata). If you need full comment trees, pair it with a dedicated comments scraper or extend this one.
Is the data accurate?
Yes. Data is fetched in real time from Reddit’s public JSON responses.
Can I automate scraping?
Yes. You can use Apify scheduling, webhooks, and the API to run it regularly and integrate it into pipelines.
What formats are supported?
JSON, CSV, Excel via Apify dataset export, plus API output for programmatic access.
Is it suitable for AI and sentiment analysis?
Yes. The structured text fields (title, selftext, etc.) are ideal for NLP, topic modeling, and sentiment analysis workflows.
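As a toy illustration of working with those text fields, here is a naive keyword score. This is not a real sentiment model, and the word lists are arbitrary examples; actual workflows would use a proper NLP library or model.

```python
# Toy sentiment sketch over the `title` and `selftext` fields.
# Purely illustrative; real pipelines should use an NLP model.
POSITIVE = {"great", "love", "awesome", "helpful"}
NEGATIVE = {"bad", "hate", "broken", "useless"}


def toy_sentiment(item: dict) -> int:
    """Positive minus negative keyword hits; > 0 leans positive."""
    text = f"{item.get('title', '')} {item.get('selftext', '')}".lower()
    words = text.split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
```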
Is scraping Reddit legal?
Scraping publicly available data is generally allowed, but you should comply with Reddit’s terms of service and all applicable laws.
Actor permissions
This Actor is designed to work with limited permissions. It only reads input and writes to its default dataset; it does not access other user data or require full account access.
To set limited permissions in Apify Console:
- Open your Actor on the Apify platform.
- Go to the Source tab (or Settings).
- Click Review permissions (or open Settings → Permissions).
- Select Limited permissions and save.
Using limited permissions builds user trust and can raise your Actor's quality score in the Store.
Anti-blocking Notes
- Reddit requires a descriptive User-Agent; the actor sends `Sovanza Reddit Post Scraper/1.0`.
- A short delay between requests helps reduce the risk of rate limiting.
- When running on Apify, enable a suitable proxy configuration (often residential) to reduce 429/403 errors, as Reddit may block datacenter IPs.
Limitations
- Some subreddits or posts may be restricted, removed, or rate-limited.
- Reddit’s JSON structure can change, which may require actor updates.
- Large-scale scraping may require appropriate Apify plan limits and careful proxy usage.
License
This project is licensed under the MIT License; see the LICENSE file for details.
Get Started
Start extracting Reddit posts and build powerful datasets for research, marketing, and automation today. 🚀