Reddit Scraper — Keywords, Subreddits & Comments
Pricing
from $8.00 / 1,000 scraped posts
Scrape Reddit posts by keywords, subreddits or direct URLs. Extracts posts, comments, upvote ratios, media URLs and analytics. Pure HTTP — no Playwright, runs on 512 MB, faster and cheaper than browser-based scrapers.
Developer
Yuliia Kulakova
Maintained by Community
Extract Reddit posts at scale using keyword search, subreddit feeds, or direct URLs. Get full post metadata, optional comments, upvote ratios, media URLs, and an AI-ready analytics report — all without a Reddit API key.
What This Actor Does
This Actor scrapes publicly available Reddit data using Reddit's JSON API endpoints — no browser automation, no Reddit API key required. It's fast, lightweight (512 MB memory), and significantly cheaper per result than browser-based alternatives.
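To make "Reddit's JSON API endpoints" concrete, here is a minimal sketch of how search and subreddit-feed URLs can be built against Reddit's public `.json` listings. The function names are hypothetical, and this README does not document how the Actor builds its requests internally, so treat this as an illustration of the general technique rather than the Actor's code.

```python
from typing import Optional
from urllib.parse import urlencode

BASE = "https://www.reddit.com"


def clean_subreddit(name: str) -> str:
    """Accept both 'python' and 'r/python' (assumed normalization)."""
    return name.strip("/").removeprefix("r/")


def search_url(keyword: str, subreddit: Optional[str] = None,
               sort: str = "new", time: str = "week", limit: int = 100) -> str:
    """Keyword search, optionally restricted to one subreddit."""
    params = {"q": keyword, "sort": sort, "t": time, "limit": limit}
    if subreddit:
        # restrict_sr=1 keeps results inside the given subreddit.
        params["restrict_sr"] = 1
        return f"{BASE}/r/{clean_subreddit(subreddit)}/search.json?{urlencode(params)}"
    return f"{BASE}/search.json?{urlencode(params)}"


def feed_url(subreddit: str, sort: str = "new",
             time: str = "week", limit: int = 100) -> str:
    """Subreddit feed listing (hot/new/top), no keyword involved."""
    query = urlencode({"t": time, "limit": limit})
    return f"{BASE}/r/{clean_subreddit(subreddit)}/{sort}.json?{query}"


print(search_url("AI tools"))
print(search_url("ChatGPT", subreddit="r/python", sort="top", time="month", limit=50))
print(feed_url("python", sort="hot"))
```

Because these are plain HTTP GET requests that return JSON, no browser is needed to fetch them, which is what keeps the memory footprint small.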
Use cases:
- Brand monitoring — track mentions of your product, company, or competitors across Reddit
- Market research — discover what your target audience is talking about
- Lead generation — find potential customers discussing problems your product solves
- Content strategy — identify top-performing posts in your niche to guide content creation
- Academic research — collect Reddit data for NLP, sentiment analysis, or social studies
- Competitor intelligence — monitor competitor mentions and community sentiment
Key Features
- ✅ Multiple input modes: keyword search, subreddit feeds, direct post URLs
- ✅ Multi-keyword in one run: search 10 keywords at once, results combined
- ✅ Subreddit restriction: search within specific communities (e.g. only r/Python)
- ✅ Optional comments: fetch top-level comments for each post
- ✅ Time filters: past hour/day/week/month/year/all time
- ✅ Quality filters: minimum score (upvotes) and minimum comment count
- ✅ Full text: complete selftext body (not truncated)
- ✅ Media URLs: image, video, and gallery URLs extracted
- ✅ Analytics report: automated insights on engagement, subreddits, authors, patterns
- ✅ Deduplication: cross-session duplicate removal built-in
- ✅ Pure HTTP: no Playwright/Puppeteer, runs on 512 MB RAM
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| keywords | array | (none) | Search keywords (e.g. ["ChatGPT", "AI tools"]) |
| subreddits | array | (none) | Restrict to subreddits (e.g. ["python", "r/MachineLearning"]) |
| startUrls | array | (none) | Direct Reddit URLs (subreddit pages, posts, search URLs) |
| sort | select | new | Sort: relevance, new, hot, top, comments |
| time | select | week | Time filter: hour, day, week, month, year, all |
| maxPostsPerSource | integer | 100 | Max posts per keyword/subreddit |
| maxCommentsPerPost | integer | 0 | Top comments to fetch per post (0 = skip) |
| minScore | integer | 0 | Filter: minimum upvote score |
| minComments | integer | 0 | Filter: minimum comment count |
| includeNSFW | boolean | false | Include NSFW posts |
| includeAnalytics | boolean | true | Generate analytics report in Key-Value store |
| proxyConfiguration | object | Residential | Proxy settings (Apify Residential recommended) |
| requestDelayMs | integer | 500 | Delay between requests (ms) |
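The defaults in the table can be mirrored client-side before starting a run. The sketch below is a hypothetical helper, not part of the Actor: `build_run_input` merges overrides into the documented defaults and rejects unsupported `sort` and `time` values. The check that at least one of `keywords`, `subreddits`, or `startUrls` is present is an assumption; the README does not say how the Actor treats an empty input. `proxyConfiguration` is left out because its object shape is not documented here.

```python
from typing import Any

# Defaults copied from the parameter table above.
DEFAULTS: dict[str, Any] = {
    "sort": "new",
    "time": "week",
    "maxPostsPerSource": 100,
    "maxCommentsPerPost": 0,
    "minScore": 0,
    "minComments": 0,
    "includeNSFW": False,
    "includeAnalytics": True,
    "requestDelayMs": 500,
}
ALLOWED_SORT = {"relevance", "new", "hot", "top", "comments"}
ALLOWED_TIME = {"hour", "day", "week", "month", "year", "all"}
SOURCE_KEYS = ("keywords", "subreddits", "startUrls")


def build_run_input(**overrides: Any) -> dict[str, Any]:
    """Merge overrides into the documented defaults and sanity-check them."""
    run_input = {**DEFAULTS, **overrides}
    if run_input["sort"] not in ALLOWED_SORT:
        raise ValueError(f"unsupported sort: {run_input['sort']!r}")
    if run_input["time"] not in ALLOWED_TIME:
        raise ValueError(f"unsupported time: {run_input['time']!r}")
    # Assumption: a run needs at least one source to scrape.
    if not any(run_input.get(key) for key in SOURCE_KEYS):
        raise ValueError("provide at least one of keywords, subreddits, startUrls")
    return run_input


run_input = build_run_input(keywords=["ChatGPT"], sort="top", time="month")
print(run_input)
```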
Example Input
    {
      "keywords": ["ChatGPT", "AI writing tools"],
      "subreddits": ["artificial", "ChatGPT", "MachineLearning"],
      "sort": "top",
      "time": "month",
      "maxPostsPerSource": 50,
      "maxCommentsPerPost": 10,
      "minScore": 5,
      "includeAnalytics": true
    }
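One way to run the Actor with this input from code is the official `apify-client` Python package. The sketch below assumes that package is installed (`pip install apify-client`) and that an API token is available; `ACTOR_ID` is a placeholder because this page does not state the Actor's ID, so copy the real one from the Actor's store page.

```python
from typing import Any, Iterator

# Placeholder: replace with the real "<username>/<actor-name>" from the store page.
ACTOR_ID = "<username>/<actor-name>"

# Same values as the example input above.
RUN_INPUT: dict[str, Any] = {
    "keywords": ["ChatGPT", "AI writing tools"],
    "subreddits": ["artificial", "ChatGPT", "MachineLearning"],
    "sort": "top",
    "time": "month",
    "maxPostsPerSource": 50,
    "maxCommentsPerPost": 10,
    "minScore": 5,
    "includeAnalytics": True,
}


def run_and_iter_posts(actor_id: str, run_input: dict[str, Any],
                       token: str) -> Iterator[dict[str, Any]]:
    """Start a run, wait for it to finish, then stream dataset items."""
    # Imported lazily so this module loads even without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()
```

A typical call would be `for post in run_and_iter_posts(ACTOR_ID, RUN_INPUT, os.environ["APIFY_TOKEN"]): print(post["score"], post["title"])`, where the environment variable name is only a suggestion.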
Output — Dataset (one item per post)
    {
      "id": "1abc123",
      "name": "t3_1abc123",
      "type": "self",
      "title": "ChatGPT just saved me 4 hours of work — here's how",
      "url": "https://www.reddit.com/r/ChatGPT/comments/1abc123/chatgpt_just_saved_me/",
      "externalUrl": null,
      "author": "example_user",
      "subreddit": "ChatGPT",
      "subredditPrefixed": "r/ChatGPT",
      "subredditSubscribers": 6800000,
      "selftext": "I was stuck writing a technical specification for three hours...",
      "score": 2847,
      "upvoteRatio": 0.97,
      "upvotePct": 97,
      "numComments": 143,
      "numCrossposts": 2,
      "totalAwards": 5,
      "thumbnail": null,
      "mediaUrl": null,
      "domain": "self.ChatGPT",
      "flair": "Prompt Engineering",
      "isNSFW": false,
      "isSelf": true,
      "isVideo": false,
      "isOriginalContent": false,
      "isPinned": false,
      "isLocked": false,
      "isSpoiler": false,
      "distinguished": null,
      "crosspostParentId": null,
      "matchedKeyword": "ChatGPT",
      "postedAt": "2025-01-15T14:32:00.000Z",
      "editedAt": null,
      "scrapedAt": "2025-01-20T09:00:00.000Z",
      "comments": [
        {
          "id": "jxyz789",
          "author": "another_user",
          "body": "Which model are you using? GPT-4 or the free version?",
          "score": 312,
          "depth": 0,
          "isStickied": false,
          "distinguished": null,
          "postedAt": "2025-01-15T14:45:00.000Z",
          "editedAt": null
        }
      ]
    }
Output Fields Reference
| Field | Type | Description |
|---|---|---|
| id | string | Reddit post ID |
| type | string | self, link, image, video, gallery |
| title | string | Post title |
| url | string | Canonical Reddit permalink |
| externalUrl | string or null | External URL for link posts |
| author | string | Reddit username |
| subreddit | string | Subreddit name (without r/) |
| subredditSubscribers | number | Subreddit subscriber count |
| selftext | string or null | Full post body text (self posts only) |
| score | number | Net upvotes |
| upvoteRatio | number | Upvote ratio (0–1) |
| upvotePct | number | Upvote percentage (0–100) |
| numComments | number | Total comment count |
| totalAwards | number | Number of awards received |
| thumbnail | string or null | Thumbnail image URL |
| mediaUrl | string or null | Full-size image or video URL |
| flair | string or null | Post flair label |
| isNSFW | boolean | Whether post is marked NSFW |
| isPinned | boolean | Whether post is pinned/stickied |
| matchedKeyword | string or null | The keyword that found this post |
| postedAt | string | ISO 8601 timestamp |
| comments | array | Top comments (if maxCommentsPerPost > 0) |
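Because comments arrive nested inside each post item, a common post-processing step is to flatten them into one row per comment, for example before exporting to CSV. The sketch below uses only field names from the sample output above; `comment_rows` and the row keys are hypothetical, and the sample post is trimmed to the fields the helper reads.

```python
from typing import Any, Iterator


def comment_rows(post: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one flat row per comment, carrying the parent post's context."""
    # "comments" may be missing or empty when maxCommentsPerPost is 0.
    for comment in post.get("comments") or []:
        yield {
            "postId": post["id"],
            "subreddit": post["subreddit"],
            "postTitle": post["title"],
            "commentId": comment["id"],
            "commentAuthor": comment["author"],
            "commentScore": comment["score"],
            "commentBody": comment["body"],
        }


# Trimmed copy of the sample dataset item shown earlier.
sample_post = {
    "id": "1abc123",
    "subreddit": "ChatGPT",
    "title": "ChatGPT just saved me 4 hours of work",
    "comments": [
        {
            "id": "jxyz789",
            "author": "another_user",
            "body": "Which model are you using? GPT-4 or the free version?",
            "score": 312,
        }
    ],
}

rows = list(comment_rows(sample_post))
print(rows)
```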
Output — Analytics Report (Key-Value store: ANALYTICS)
When includeAnalytics: true, an automated insights report is saved to the Key-Value store under the ANALYTICS key.
    {
      "type": "ANALYTICS",
      "summary": {
        "totalPostsAnalyzed": 300,
        "uniqueSubreddits": 12,
        "uniqueAuthors": 287,
        "averageScore": 234,
        "medianScore": 45,
        "averageUpvoteRatio": 91.2
      },
      "topSubreddits": [
        { "subreddit": "ChatGPT", "postCount": 87, "avgScore": 512, "avgComments": 34 }
      ],
      "engagementAnalysis": {
        "scoreDistribution": { ... },
        "upvoteRatioSentiment": {
          "positive": { "label": "≥90% upvoted", "count": 210, "percentage": 70 }
        },
        "viralPosts": 8
      },
      "topAuthors": [
        { "author": "poweruser42", "postCount": 5, "totalScore": 12400, "avgScore": 2480 }
      ],
      "postingPatterns": {
        "peakHourUTC": 14,
        "peakDayOfWeek": "Tuesday"
      },
      "topPosts": [ ... ],
      "generatedAt": "2025-01-20T09:00:05.000Z"
    }
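If you need similar figures for a custom subset of posts, the summary block can be approximated from dataset items with the standard library. This is a rough sketch: `summarize` is a hypothetical helper, and the Actor's exact definitions and rounding rules are not documented here, so the numbers may not match the ANALYTICS record exactly.

```python
from collections import Counter
from statistics import mean, median
from typing import Any


def summarize(posts: list[dict[str, Any]]) -> dict[str, Any]:
    """Recompute a few summary fields from dataset items."""
    scores = [post["score"] for post in posts]
    per_subreddit = Counter(post["subreddit"] for post in posts)
    return {
        "totalPostsAnalyzed": len(posts),
        "uniqueSubreddits": len(per_subreddit),
        "uniqueAuthors": len({post["author"] for post in posts}),
        # Rounding to an integer is an assumption based on the sample report.
        "averageScore": round(mean(scores)) if scores else 0,
        "medianScore": median(scores) if scores else 0,
        "topSubreddits": [
            {"subreddit": name, "postCount": count}
            for name, count in per_subreddit.most_common(3)
        ],
    }


demo_posts = [
    {"score": 10, "subreddit": "ChatGPT", "author": "alice"},
    {"score": 20, "subreddit": "ChatGPT", "author": "bob"},
    {"score": 60, "subreddit": "artificial", "author": "alice"},
]
summary = summarize(demo_posts)
print(summary)
```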
How It Compares to Alternatives
| Feature | This Actor | crawlerbros/reddit-keywords | fatihtahta/reddit-scraper-search-fast |
|---|---|---|---|
| Keyword search | ✅ | ✅ | ✅ |
| Subreddit feed | ✅ | ❌ | ✅ |
| Subreddit restriction | ✅ | ❌ | ✅ |
| Comments included | ✅ | ❌ | Optional |
| Time filter | ✅ | ❌ | ✅ |
| Min score filter | ✅ | ❌ | ❌ |
| Min comments filter | ✅ | ❌ | ❌ |
| Analytics report | ✅ | ❌ | ❌ |
| Full post text | ✅ | ✅ | ✅ |
| Media URLs | ✅ | Partial | ✅ |
| Browser required | ❌ (Pure HTTP) | ✅ (4 GB RAM!) | ❌ |
| Memory required | 512 MB | 4096 MB | 512 MB |
Tips & Best Practices
For brand monitoring: Use specific keywords ("your_brand_name") with sort: "new" and time: "day" to catch fresh mentions. Schedule the actor to run daily.
For market research: Combine relevant keywords (["pain point", "frustrated with", "looking for alternative"]) with sort: "top" and time: "month" to find the most resonant discussions.
For large runs: Set requestDelayMs: 1000 and use Apify Residential proxies to avoid rate limiting when scraping thousands of posts.
For comments analysis: Set maxCommentsPerPost: 20 to get the top 20 comments per post. Note: fetching comments adds one extra request per post, so a run with comments makes roughly one request per post on top of the listing requests. Factor this into run time and cost.
Subreddit feed vs search: Leave keywords empty and only fill subreddits to scrape a subreddit's full feed (hot/new/top posts). This is ideal for monitoring specific communities.
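The tips on large runs and comments can be turned into a back-of-envelope estimate. The sketch below assumes listings are fetched in pages of up to 100 posts (the maximum Reddit's listing endpoints return per request) and one extra request per post when comments are enabled. `estimate_run` is a hypothetical helper; it counts only the configured delay, not network time, retries, or posts dropped by filters, so read the result as an upper bound on requests and a lower bound on duration.

```python
import math


def estimate_run(sources: int, max_posts_per_source: int, fetch_comments: bool,
                 request_delay_ms: int, page_size: int = 100) -> dict[str, float]:
    """Rough request count and minimum delay time for one run."""
    # One listing request per page of results, per keyword or subreddit.
    listing_requests = sources * math.ceil(max_posts_per_source / page_size)
    max_posts = sources * max_posts_per_source
    # One extra request per post when maxCommentsPerPost > 0.
    comment_requests = max_posts if fetch_comments else 0
    total_requests = listing_requests + comment_requests
    return {
        "maxPosts": max_posts,
        "requests": total_requests,
        "minDelaySeconds": total_requests * request_delay_ms / 1000,
    }


# Five sources, 50 posts each, comments enabled, default 500 ms delay.
estimate = estimate_run(sources=5, max_posts_per_source=50,
                        fetch_comments=True, request_delay_ms=500)
print(estimate)
```

With comments enabled, per-post requests dominate the total, which is why turning comments on has a much larger effect on run time than raising requestDelayMs alone.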
Legal and Ethical Use
This Actor accesses only publicly available Reddit data — the same data visible to anyone visiting Reddit without an account. No authentication, login, or private data is accessed.
Use this tool in compliance with Reddit's Terms of Service and applicable data privacy laws. Do not use scraped data to identify or target individual users, send unsolicited communications, or violate Reddit's content policies.
