Reddit Scraper — Keywords, Subreddits & Comments

Pricing

from $8.00 / 1,000 posts scraped

Scrape Reddit posts by keywords, subreddits or direct URLs. Extracts posts, comments, upvote ratios, media URLs and analytics. Pure HTTP — no Playwright, runs on 512 MB, faster and cheaper than browser-based scrapers.

Developer: Yuliia Kulakova
Maintained by Community

Extract Reddit posts at scale using keyword search, subreddit feeds, or direct URLs. Get full post metadata, optional comments, upvote ratios, media URLs, and an AI-ready analytics report — all without a Reddit API key.

What This Actor Does

This Actor scrapes publicly available Reddit data using Reddit's JSON API endpoints — no browser automation, no Reddit API key required. It's fast, lightweight (512 MB memory), and significantly cheaper per result than browser-based alternatives.
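
As a rough illustration of what "Reddit's JSON API endpoints" means, the sketch below builds the public `.json` URLs for a keyword search and for a subreddit feed. It is not the Actor's code: the helper names `search_url` and `feed_url` are hypothetical, and which query parameters the Actor really sends is an assumption based on Reddit's public listing endpoints.

```python
from urllib.parse import urlencode

BASE = "https://www.reddit.com"


def search_url(keyword, subreddit=None, sort="new", time="week", limit=100):
    """Build a public Reddit search URL that returns JSON (hypothetical helper)."""
    params = {"q": keyword, "sort": sort, "t": time, "limit": limit}
    if subreddit:
        # Accept both "python" and "r/python".
        name = subreddit.removeprefix("r/")
        # restrict_sr=1 keeps the search inside the given subreddit.
        params["restrict_sr"] = 1
        return f"{BASE}/r/{name}/search.json?{urlencode(params)}"
    return f"{BASE}/search.json?{urlencode(params)}"


def feed_url(subreddit, sort="hot", time="week", limit=100):
    """Build the JSON URL of a subreddit feed such as hot, new or top."""
    name = subreddit.removeprefix("r/")
    query = urlencode({"t": time, "limit": limit})
    return f"{BASE}/r/{name}/{sort}.json?{query}"
```

For example, `search_url("asyncio", subreddit="r/python")` points at the search listing of r/python only, while `feed_url("python", sort="top")` points at that community's top posts.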

Use cases:

  • Brand monitoring — track mentions of your product, company, or competitors across Reddit
  • Market research — discover what your target audience is talking about
  • Lead generation — find potential customers discussing problems your product solves
  • Content strategy — identify top-performing posts in your niche to guide content creation
  • Academic research — collect Reddit data for NLP, sentiment analysis, or social studies
  • Competitor intelligence — monitor competitor mentions and community sentiment

Key Features

✅ Multiple input modes: keyword search, subreddit feeds, direct post URLs
✅ Multi-keyword in one run: search 10 keywords at once, results combined
✅ Subreddit restriction: search within specific communities (e.g. only r/Python)
✅ Optional comments: fetch top-level comments for each post
✅ Time filters: past hour/day/week/month/year/all time
✅ Quality filters: minimum score (upvotes) and minimum comment count
✅ Full text: complete selftext body (not truncated)
✅ Media URLs: image, video, and gallery URLs extracted
✅ Analytics report: automated insights on engagement, subreddits, authors, patterns
✅ Deduplication: cross-session duplicate removal built-in
✅ Pure HTTP: no Playwright/Puppeteer, runs on 512 MB RAM

Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| keywords | array | | Search keywords (e.g. ["ChatGPT", "AI tools"]) |
| subreddits | array | | Restrict to subreddits (e.g. ["python", "r/MachineLearning"]) |
| startUrls | array | | Direct Reddit URLs (subreddit pages, posts, search URLs) |
| sort | select | new | Sort: relevance, new, hot, top, comments |
| time | select | week | Time filter: hour, day, week, month, year, all |
| maxPostsPerSource | integer | 100 | Max posts per keyword/subreddit |
| maxCommentsPerPost | integer | 0 | Top comments to fetch per post (0 = skip) |
| minScore | integer | 0 | Filter: minimum upvote score |
| minComments | integer | 0 | Filter: minimum comment count |
| includeNSFW | boolean | false | Include NSFW posts |
| includeAnalytics | boolean | true | Generate analytics report in Key-Value store |
| proxyConfiguration | object | Residential | Proxy settings (Apify Residential recommended) |
| requestDelayMs | integer | 500 | Delay between requests (ms) |

Example Input

{
  "keywords": ["ChatGPT", "AI writing tools"],
  "subreddits": ["artificial", "ChatGPT", "MachineLearning"],
  "sort": "top",
  "time": "month",
  "maxPostsPerSource": 50,
  "maxCommentsPerPost": 10,
  "minScore": 5,
  "includeAnalytics": true
}
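
An input like the one above can also be assembled and submitted from Python. The following is a minimal sketch, assuming the official `apify-client` package is installed. `build_input` and `run_actor` are hypothetical helper names, the check that at least one keyword or subreddit is present is an assumption about what the Actor expects, and the Actor ID is not stated on this page, so it has to be copied from the Apify Store.

```python
ALLOWED_SORT = {"relevance", "new", "hot", "top", "comments"}
ALLOWED_TIME = {"hour", "day", "week", "month", "year", "all"}


def build_input(keywords=(), subreddits=(), **options):
    """Assemble a run input from the documented parameters (hypothetical helper)."""
    if not (keywords or subreddits):
        # Assumption: a run needs at least one source to scrape.
        raise ValueError("provide at least one keyword or subreddit")
    # Defaults mirror the Input Parameters table; options override them.
    run_input = {"sort": "new", "time": "week", **options}
    if run_input["sort"] not in ALLOWED_SORT:
        raise ValueError(f"unsupported sort: {run_input['sort']}")
    if run_input["time"] not in ALLOWED_TIME:
        raise ValueError(f"unsupported time filter: {run_input['time']}")
    if keywords:
        run_input["keywords"] = list(keywords)
    if subreddits:
        run_input["subreddits"] = list(subreddits)
    return run_input


def run_actor(token, actor_id, run_input):
    """Start the Actor, wait for it to finish, and return its dataset items."""
    # Imported here so build_input works without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call such as `run_actor(token, actor_id, build_input(["ChatGPT"], sort="top", time="month"))` would then return one dictionary per scraped post, with `token` being your Apify API token and `actor_id` the ID shown on the Actor's store page.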

Output — Dataset (one item per post)

{
  "id": "1abc123",
  "name": "t3_1abc123",
  "type": "self",
  "title": "ChatGPT just saved me 4 hours of work — here's how",
  "url": "https://www.reddit.com/r/ChatGPT/comments/1abc123/chatgpt_just_saved_me/",
  "externalUrl": null,
  "author": "example_user",
  "subreddit": "ChatGPT",
  "subredditPrefixed": "r/ChatGPT",
  "subredditSubscribers": 6800000,
  "selftext": "I was stuck writing a technical specification for three hours...",
  "score": 2847,
  "upvoteRatio": 0.97,
  "upvotePct": 97,
  "numComments": 143,
  "numCrossposts": 2,
  "totalAwards": 5,
  "thumbnail": null,
  "mediaUrl": null,
  "domain": "self.ChatGPT",
  "flair": "Prompt Engineering",
  "isNSFW": false,
  "isSelf": true,
  "isVideo": false,
  "isOriginalContent": false,
  "isPinned": false,
  "isLocked": false,
  "isSpoiler": false,
  "distinguished": null,
  "crosspostParentId": null,
  "matchedKeyword": "ChatGPT",
  "postedAt": "2025-01-15T14:32:00.000Z",
  "editedAt": null,
  "scrapedAt": "2025-01-20T09:00:00.000Z",
  "comments": [
    {
      "id": "jxyz789",
      "author": "another_user",
      "body": "Which model are you using? GPT-4 or the free version?",
      "score": 312,
      "depth": 0,
      "isStickied": false,
      "distinguished": null,
      "postedAt": "2025-01-15T14:45:00.000Z",
      "editedAt": null
    }
  ]
}

Output Fields Reference

| Field | Type | Description |
|---|---|---|
| id | string | Reddit post ID |
| type | string | self, link, image, video, gallery |
| title | string | Post title |
| url | string | Canonical Reddit permalink |
| externalUrl | string or null | External URL for link posts |
| author | string | Reddit username |
| subreddit | string | Subreddit name (without r/) |
| subredditSubscribers | number | Subreddit subscriber count |
| selftext | string or null | Full post body text (self posts only) |
| score | number | Net upvotes |
| upvoteRatio | number | Upvote ratio (0–1) |
| upvotePct | number | Upvote percentage (0–100) |
| numComments | number | Total comment count |
| totalAwards | number | Number of awards received |
| thumbnail | string or null | Thumbnail image URL |
| mediaUrl | string or null | Full-size image or video URL |
| flair | string or null | Post flair label |
| isNSFW | boolean | Whether post is marked NSFW |
| isPinned | boolean | Whether post is pinned/stickied |
| matchedKeyword | string or null | The keyword that found this post |
| postedAt | string | ISO 8601 timestamp |
| comments | array | Top comments (if maxCommentsPerPost > 0) |
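
Once the dataset is downloaded as a list of dictionaries, the fields above can be processed with plain Python. The sketch below assumes items shaped like the example output. The helper names `top_posts` and `flatten_comments`, and the column names chosen for the flattened rows, are illustrative and not part of the Actor.

```python
def top_posts(items, min_score=0, limit=5):
    """Return the highest-scoring posts at or above min_score."""
    kept = [post for post in items if post.get("score", 0) >= min_score]
    return sorted(kept, key=lambda post: post["score"], reverse=True)[:limit]


def flatten_comments(items):
    """Turn nested comments into one flat row per comment."""
    rows = []
    for post in items:
        # "comments" may be missing or empty when maxCommentsPerPost is 0.
        for comment in post.get("comments") or []:
            rows.append({
                "postId": post["id"],
                "subreddit": post["subreddit"],
                "commentId": comment["id"],
                "author": comment["author"],
                "score": comment["score"],
                "body": comment["body"],
            })
    return rows
```

The flat rows are convenient for export to CSV or for feeding comment bodies into a sentiment model, since each row carries the ID of the post it belongs to.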

Output — Analytics Report (Key-Value store: ANALYTICS)

When includeAnalytics: true, an automated insights report is saved to the Key-Value store under the ANALYTICS key.

{
  "type": "ANALYTICS",
  "summary": {
    "totalPostsAnalyzed": 300,
    "uniqueSubreddits": 12,
    "uniqueAuthors": 287,
    "averageScore": 234,
    "medianScore": 45,
    "averageUpvoteRatio": 91.2
  },
  "topSubreddits": [
    { "subreddit": "ChatGPT", "postCount": 87, "avgScore": 512, "avgComments": 34 }
  ],
  "engagementAnalysis": {
    "scoreDistribution": { ... },
    "upvoteRatioSentiment": {
      "positive": { "label": "≥90% upvoted", "count": 210, "percentage": 70 }
    },
    "viralPosts": 8
  },
  "topAuthors": [
    { "author": "poweruser42", "postCount": 5, "totalScore": 12400, "avgScore": 2480 }
  ],
  "postingPatterns": {
    "peakHourUTC": 14,
    "peakDayOfWeek": "Tuesday"
  },
  "topPosts": [ ... ],
  "generatedAt": "2025-01-20T09:00:05.000Z"
}
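
To make the `summary` block easier to interpret, here is a sketch of how comparable figures could be derived from dataset items. It is an approximation under stated assumptions, not the Actor's implementation: the rounding rules and the choice to express `averageUpvoteRatio` as a percentage are inferred from the sample report, and `summarize` is a hypothetical name.

```python
from statistics import mean, median


def summarize(items):
    """Compute summary-style statistics from a list of post dictionaries."""
    if not items:
        return {"totalPostsAnalyzed": 0}
    scores = [post["score"] for post in items]
    ratios = [post["upvoteRatio"] for post in items]
    return {
        "totalPostsAnalyzed": len(items),
        "uniqueSubreddits": len({post["subreddit"] for post in items}),
        "uniqueAuthors": len({post["author"] for post in items}),
        # Assumption: scores are rounded to whole numbers in the report.
        "averageScore": round(mean(scores)),
        "medianScore": median(scores),
        # Assumption: the ratio (0-1) is reported as a percentage, one decimal.
        "averageUpvoteRatio": round(mean(ratios) * 100, 1),
    }
```

A large gap between `averageScore` and `medianScore`, as in the sample report (234 versus 45), usually means a few very popular posts are pulling the average up.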

How It Compares to Alternatives

Feature | This Actor | crawlerbros/reddit-keywords | fatihtahta/reddit-scraper-search-fast
Keyword search
Subreddit feed
Subreddit restriction
Comments includedOptional
Time filter
Min score filter
Min comments filter
Analytics report
Full post text
Media URLsPartial
Browser required | ❌ (Pure HTTP) | ✅ (4 GB RAM!) |
Memory required | 512 MB | 4096 MB | 512 MB

Tips & Best Practices

For brand monitoring: Use specific keywords ("your_brand_name") with sort: "new" and time: "day" to catch fresh mentions. Schedule the Actor to run daily.

For market research: Combine relevant keywords (["pain point", "frustrated with", "looking for alternative"]) with sort: "top" and time: "month" to find the most resonant discussions.

For large runs: Set requestDelayMs: 1000 and use Apify Residential proxies to avoid rate limiting when scraping thousands of posts.

For comments analysis: Set maxCommentsPerPost: 20 to get the top 20 comments per post. Note: this multiplies API calls (1 extra call per post), so factor this into run time and cost.
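
The effect of comments on run time can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope model, not a measurement. It takes the "1 extra call per post" figure from the tip above, assumes listings are fetched in pages of 100 posts, counts only the configured delay between requests, and uses the listed starting price of $8.00 per 1,000 posts as a lower bound. `estimate_run` is a hypothetical helper.

```python
import math


def estimate_run(posts, with_comments, delay_ms=500,
                 page_size=100, price_per_1000=8.00):
    """Estimate request count, minimum duration and minimum cost of a run."""
    # Assumption: each listing request returns up to page_size posts.
    listing_calls = math.ceil(posts / page_size)
    # One extra request per post when comments are fetched.
    comment_calls = posts if with_comments else 0
    calls = listing_calls + comment_calls
    return {
        "requests": calls,
        # Only the configured delay is counted, not network latency.
        "minSeconds": calls * delay_ms / 1000,
        "minCostUsd": posts / 1000 * price_per_1000,
    }
```

Under these assumptions, 1,000 posts without comments need about 10 requests, while the same run with comments and requestDelayMs: 1000 needs about 1,010 requests and at least 1,010 seconds of delay alone.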

Subreddit feed vs search: Leave keywords empty and only fill subreddits to scrape a subreddit's full feed (hot/new/top posts). This is ideal for monitoring specific communities.
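
The tips above translate into a handful of reusable input presets. The following sketch collects them in one place. The preset names and the `preset_input` helper are illustrative, while the parameter names and values come from the tips and the Input Parameters table.

```python
PRESETS = {
    # Fresh mentions of a brand, intended for a daily schedule.
    "brand_monitoring": {"sort": "new", "time": "day"},
    # Most resonant discussions of the past month.
    "market_research": {
        "keywords": ["pain point", "frustrated with", "looking for alternative"],
        "sort": "top",
        "time": "month",
    },
    # Slower pacing for runs that collect thousands of posts.
    "large_run": {"requestDelayMs": 1000},
    # Top 20 comments per post.
    "comment_analysis": {"maxCommentsPerPost": 20},
}


def preset_input(name, **overrides):
    """Return a copy of a preset with caller-supplied values merged on top."""
    return {**PRESETS[name], **overrides}
```

For example, `preset_input("brand_monitoring", keywords=["your_brand_name"])` yields the brand-monitoring input, and passing only `subreddits` with no `keywords` produces a subreddit feed run.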

This Actor accesses only publicly available Reddit data — the same data visible to anyone visiting Reddit without an account. No authentication, login, or private data is accessed.

Use this tool in compliance with Reddit's Terms of Service and applicable data privacy laws. Do not use scraped data to identify or target individual users, send unsolicited communications, or violate Reddit's content policies.