Reddit Scraper 💰 $1.25/1K — Posts & Full Comment Threads
Pricing
from $1.25 / 1,000 results
Scrape Reddit posts with their full nested comment threads, user profiles, and community pages. Search any subreddit or keyword and capture scores, awards and timestamps. Bodies come as AI-ready text, HTML and Markdown for LLMs. No login or developer token is needed.
Rating
0.0
(0)
Developer
Black Falcon Data
Maintained by Community
Actor stats
Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 2 days ago
What does Reddit Scraper do?
Reddit Scraper extracts Reddit discussions — posts together with their nested comment threads — plus user profiles and community pages, in clean, AI-ready formats. Point it at any subreddit, post, profile, or search term and get structured records with text, HTML, and Markdown bodies, scores, vote ratios, awards, authors, timestamps, and extracted links. No Reddit account or login required.
New to Apify? Sign up free and use the included $5 monthly platform credit to test this actor.
Key features
- 🧵 Deep comment threads — capture discussions beyond top-level replies: nested comments with parent/child IDs, depth, score, and awards, up to the comment and depth limits you set. Optionally expand collapsed and low-score branches for deeper coverage.
- 🤖 AI-ready output — every post, comment, and profile body is emitted as clean text, HTML, and Markdown, so you can pipe threads straight into LLMs, RAG datasets, and MCP tools without extra cleanup.
- 🗂️ Four content types, one actor — scrape posts, comment threads, user profiles, and community pages in a single run; mix and match by dropping in any Reddit URL.
- 🔎 Search and subreddit feeds — pull a subreddit's feed, or run keyword searches with hot / top / new / relevance sorting that return lightweight discovery results (plus their comments) you can enrich by scraping the post URLs. Subreddit feeds also support time windows from the past hour through all-time.
- 🔗 Link & social extraction — outbound links and social handles mentioned in post and comment bodies are pulled into structured extractedUrls and socialProfiles fields automatically.
- 🌳 Depth & volume control — cap comments per post, limit reply nesting depth, or skip comments entirely to scrape posts only — tune coverage against cost.
- 🔞 NSFW & date filtering — include or exclude 18+ content with one toggle, and skip posts older than a date you set (subreddit feeds and post URLs) so scheduled runs stay focused on fresh discussions.
- 🧹 Lean, flexible output — choose a single description format (text, HTML, or Markdown) and strip empty fields to keep datasets small for downstream pipelines.
- 🔑 No login or API key required — point the actor at any public Reddit URL or search term and run; no Reddit account or app registration needed.
What data can you extract from reddit.com?
Every record carries a stable itemType (post, comment, user, or community), so you can tell the four content types apart inside a single dataset.
- Posts — title, body as text / HTML / Markdown, score, upvoteRatio, numComments, awardCount, author, community, postType, language, createdAt, and the canonical post url.
- Comments — threaded, with postId, parentId, and depth so you can rebuild the tree, plus score, awardCount, author, and createdAt.
- Users — username, postKarma, commentKarma, account createdAt, profile description, and avatar icon.
- Communities — name, display title, members, createdAt, and description.
The full post fields above come from subreddit-feed and single-post scraping. Posts surfaced by keyword search are lighter discovery records — id, url, title, subreddit, and nsfw — and their comment threads are still fetched (unless you skip comments); to get a search hit's full post metadata, scrape its URL directly. For the same reason, the postDateLimit filter applies to subreddit feeds and post URLs, not to search results.
Post, comment, and profile text is also mined for outbound links and social handles, exposed as structured extractedUrls and socialProfiles fields. Fields stay consistent across runs — unavailable values are returned as null rather than dropped, unless you enable excludeEmptyFields to slim the payload.
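As a rough illustration of rebuilding a thread from the flat dataset, the sketch below nests comment records under their parents. It assumes that a reply's parentId holds the id of its parent comment and that top-level comments have a parentId that does not match any comment in the dataset (null in the example record further down). The function name and the "replies" key are hypothetical, not part of the actor's output.

```python
from collections import defaultdict


def build_comment_tree(records, post_id):
    """Nest the comments of one post under their parents via parentId."""
    comments = [
        r for r in records
        if r.get("itemType") == "comment" and r.get("postId") == post_id
    ]

    # Index comments by the id of their parent.
    children = defaultdict(list)
    for comment in comments:
        children[comment.get("parentId")].append(comment)

    def attach(comment):
        # Copy the record and add a hypothetical "replies" list.
        replies = [attach(child) for child in children.get(comment["id"], [])]
        return {**comment, "replies": replies}

    # Roots: parent is null or was not collected (e.g. cut off by limits).
    known_ids = {c["id"] for c in comments}
    roots = [c for c in comments if c.get("parentId") not in known_ids]
    return [attach(root) for root in roots]


sample = [
    {"itemType": "post", "id": "t3_p1"},
    {"itemType": "comment", "id": "t1_a", "postId": "t3_p1", "parentId": None, "depth": 0},
    {"itemType": "comment", "id": "t1_b", "postId": "t3_p1", "parentId": "t1_a", "depth": 1},
    {"itemType": "comment", "id": "t1_c", "postId": "t3_p2", "parentId": None, "depth": 0},
]
tree = build_comment_tree(sample, "t3_p1")
```

Treating any comment with an unknown parent as a root keeps partial threads usable when maxComments or commentDepth trims part of the tree.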
Input
Configure the actor through the input schema in Apify Console.
Key parameters:
- startUrls — Reddit URLs to scrape — subreddits, post pages, user profiles, community pages, or search result pages. Each URL determines what type of content is fetched.
- searchTerms — Search Reddit for these terms. Each entry becomes an independent search. Search posts are lightweight discovery records (plus their comments) — see Search Type.
- searchType — Type of results to return when using Search Terms. Post results are lightweight discovery records — id, url, title, subreddit and NSFW flag — plus their comment threads; scrape a result's URL directly for its full post fields (author, body, score, timestamp). (default: "posts")
- sort — Sort order for posts and search results. (default: "hot")
- time — Restrict subreddit-feed results to a time window (applies to Top sort on feeds; search is not time-windowed). (default: "all")
- includeNSFW — Include posts and communities marked as NSFW (18+). (default: false)
- postDateLimit — Skip posts older than this ISO-8601 date (e.g. "2024-01-01"). Applies to subreddit feeds and post URLs; search results carry no date and are not filtered. Leave blank for no date limit.
- maxItems — Maximum total records to save across all sources (posts, comments, users, communities). (default: 100)
- maxComments — Maximum number of comments to collect from each post page. (default: 200)
- includeCollapsed — Expand and include comments that are initially collapsed (controversial or low-score). Enables deeper thread coverage, up to the comment and depth limits you set. (default: true)
- commentDepth — Maximum reply nesting depth to collect (1 = top-level only). (default: 10)
- skipComments — Do not collect comments from post pages — output posts only. (default: false)
- ...and 3 more parameters
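When inputs are assembled in code rather than in Apify Console, it can help to start from the documented defaults and check a few values before launching a run. The sketch below is a hypothetical client-side helper, not part of the actor. The requirement that either startUrls or searchTerms be present, and the restriction of postDateLimit to a plain YYYY-MM-DD date, are assumptions made for this example.

```python
from datetime import date

# Defaults copied from the parameter descriptions above.
DOCUMENTED_DEFAULTS = {
    "searchType": "posts",
    "sort": "hot",
    "time": "all",
    "includeNSFW": False,
    "maxItems": 100,
    "maxComments": 200,
    "includeCollapsed": True,
    "commentDepth": 10,
    "skipComments": False,
}


def build_input(**overrides):
    """Merge overrides onto the documented defaults and sanity-check them."""
    run_input = {**DOCUMENTED_DEFAULTS, **overrides}

    # Assumption: a run needs at least one source to scrape.
    if not run_input.get("startUrls") and not run_input.get("searchTerms"):
        raise ValueError("provide startUrls or searchTerms")

    # commentDepth counts from 1 (top-level comments only).
    if run_input["commentDepth"] < 1:
        raise ValueError("commentDepth must be at least 1")

    # Assumption: only the YYYY-MM-DD form is checked here.
    limit = run_input.get("postDateLimit")
    if limit:
        date.fromisoformat(limit)  # raises ValueError if malformed

    return run_input


feed_input = build_input(
    startUrls=[{"url": "https://www.reddit.com/r/programming/"}],
    maxItems=50,
)
```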
Input examples
Scrape a subreddit feed — Pull recent posts and their comment threads from any subreddit.
→ Posts from r/programming, each followed by its nested comments.
{"startUrls": [{"url": "https://www.reddit.com/r/programming/"}],"sort": "hot","maxItems": 100,"maxComments": 200}
Search Reddit by keyword — Run one or more keyword searches with sorting. Results are lightweight discovery records (plus their comments); scrape a result's URL for its full post fields.
→ Matching posts sorted by top score, each followed by its comments.
{"searchTerms": ["mechanical keyboards","ergonomic mice"],"searchType": "posts","sort": "top","maxItems": 200}
Get one post with its full thread — Point at a single post URL to capture the post and its comment tree (up to the limits you set), including collapsed replies.
→ One post record plus its comments, up to the configured limits, with parent/child IDs and depth.
{"startUrls": [{"url": "https://www.reddit.com/r/programming/comments/1abc234/what_makes_a_codebase_pleasant_to_work_in/"}],"includeCollapsed": true,"commentDepth": 10,"maxComments": 500}
Posts only — skip comments — Collect just the posts from a subreddit without fetching comment threads.
→ Post records only — faster and cheaper when you don't need discussions.
{"startUrls": [{"url": "https://www.reddit.com/r/technology/"}],"skipComments": true,"maxItems": 250}
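Because keyword-search posts are lightweight discovery records, a common pattern is a second run that scrapes their URLs for the full post fields. The sketch below builds such a follow-up input from the records of a search run. The function name is hypothetical, and it assumes search hits carry itemType "post" and a url field as described earlier; comments are skipped on the second run on the assumption that the search run already collected them.

```python
def enrichment_input(search_records):
    """Build a startUrls input that re-scrapes search hits for full post fields."""
    seen = set()
    start_urls = []
    for record in search_records:
        url = record.get("url")
        # Keep each post URL once; ignore comments and records without a URL.
        if record.get("itemType") != "post" or not url or url in seen:
            continue
        seen.add(url)
        start_urls.append({"url": url})

    if not start_urls:
        raise ValueError("no post URLs found in the search results")

    return {
        "startUrls": start_urls,
        "skipComments": True,          # comments came with the search run
        "maxItems": len(start_urls),   # one post record per URL
    }


hits = [
    {"itemType": "post", "id": "t3_a", "url": "https://www.reddit.com/r/x/comments/a/"},
    {"itemType": "comment", "id": "t1_z", "url": "https://www.reddit.com/r/x/comments/a/comment/z/"},
    {"itemType": "post", "id": "t3_a", "url": "https://www.reddit.com/r/x/comments/a/"},
    {"itemType": "post", "id": "t3_b", "url": "https://www.reddit.com/r/x/comments/b/"},
]
second_run = enrichment_input(hits)
```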
Output
Each run produces a dataset of structured Reddit records. Results can be downloaded as JSON, CSV, or Excel from the Dataset tab in Apify Console.
Example Reddit record
{"itemType": "post","id": "t3_1ttjtwv","url": "https://www.reddit.com/r/programming/comments/1ttjtwv/your_process_memory_is_a_file_the/","title": "Your process' memory is a file: The underappreciated gem that is /proc/<pid>/mem","body": null,"bodyHtml": null,"contentHref": "https://lcamtuf.substack.com/p/weekend-trivia-your-process-memory","postType": "link","language": "en","score": 129,"upvoteRatio": 0.9708029197080292,"numComments": 1,"awardCount": 0,"author": "mttd","authorId": "t2_6gkbb","community": "r/programming","communityId": "t5_2fwo","createdAt": "2026-06-01T08:32:12.581+02:00","icon": "https://www.redditstatic.com/avatars/defaults/v2/avatar_default_7.png","nsfw": false}
Example comment record
{"itemType": "comment","id": "t1_ookwxid","url": "https://www.reddit.com/r/programming/comments/1tqwksq/comment/ookwxid/","postId": "t3_1tqwksq","parentId": null,"depth": 0,"body": "When is everyone going to agree that the javascript ecosystem is complete garbage?","bodyHtml": "\n <div id=\"t1_ookwxid-post-rtjson-content\" class=\"py-0 xs:mx-xs mx-2xs max-w-full scalable-text [--emote-size:20px]\" dir=\"auto\">\n <p dir=\"auto\">\n When is everyone going to agree that...","score": 27,"author": "wildjokers","awardCount": 0,"createdAt": "2026-05-29T14:48:19.159000+0000","description": "When is everyone going to agree that the javascript ecosystem is complete garbage?","descriptionText": "When is everyone going to agree that the javascript ecosystem is complete garbage?","descriptionHtml": "\n <div id=\"t1_ookwxid-post-rtjson-content\" class=\"py-0 xs:mx-xs mx-2xs max-w-full scalable-text [--emote-size:20px]\" dir=\"auto\">\n <p dir=\"auto\">\n When is everyone going to agree that...","descriptionMarkdown": "When is everyone going to agree that the javascript ecosystem is complete garbage?"}
Example user record
{"itemType": "user","id": "t2_1w72","url": "https://www.reddit.com/user/spez/","username": "spez","postKarma": 182953,"commentKarma": 755345,"createdAt": "2005-06-06T00:00:00.000Z","description": "Reddit CEO","icon": "https://styles.redditmedia.com/t5_3k30p/styles/profileIcon_uj015iwx9s7g1.png?frame=1&auto=webp&crop=256%3A256%2Csmart&s=54cdc94b6359f38240017e6737d3c56933e0206b"}
Example community record
{"itemType": "community","id": "t5_2fwo","name": "programming","title": "programming","members": null,"description": "Computer Programming","createdAt": null,"url": "https://www.reddit.com/r/programming/"}
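The example records above show createdAt in three slightly different shapes (an offset with a colon, an offset without one, and a trailing Z), plus null for the community. If you need to sort or filter by time, one option is to normalize them to UTC on the client side. This is a minimal sketch covering only those observed shapes; the function name is hypothetical.

```python
import re
from datetime import datetime, timezone


def parse_created_at(value):
    """Parse a createdAt string into an aware UTC datetime; pass None through."""
    if value is None:
        return None
    text = value.strip()
    # "Z" suffix -> explicit UTC offset.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    # "+0000" -> "+00:00" so older datetime.fromisoformat versions accept it.
    text = re.sub(r"([+-]\d{2})(\d{2})$", r"\1:\2", text)
    return datetime.fromisoformat(text).astimezone(timezone.utc)


post_time = parse_created_at("2026-06-01T08:32:12.581+02:00")
comment_time = parse_created_at("2026-05-29T14:48:19.159000+0000")
user_time = parse_created_at("2005-06-06T00:00:00.000Z")
```

A value without any offset would be interpreted as local time by astimezone, so inputs outside the three shapes shown here need their own handling.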
How to scrape reddit.com
- Go to Reddit Scraper in Apify Console.
- Configure the input.
- Set maxItems to control how many results you need.
- Click Start and wait for the run to finish.
- Export the dataset as JSON, CSV, or Excel.
Use cases
- Build training and RAG datasets from real Reddit discussions, with full comment context in Markdown.
- Monitor brand, product, or competitor mentions across subreddits and surface the threads driving them.
- Track sentiment and emerging topics in niche communities over time with scheduled runs.
- Power market and audience research with authentic user opinions, questions, and pain points.
- Feed structured Reddit threads into AI agents, MCP tools, and automation pipelines.
- Generate leads by extracting outbound links and social handles shared in relevant threads.
- Archive a subreddit, profile, or discussion thread for research or record-keeping.
- Export clean post and comment data to dashboards, spreadsheets, or data warehouses.
How much does it cost to scrape reddit.com?
Reddit Scraper uses pay-per-event pricing. You pay a small fee when the run starts and then for each result that is actually produced.
- Run start: $0.005 per run
- Per result: $0.00125 per Reddit record
Example costs:
- 10 results: $0.02
- 25 results: $0.04
- 100 results: $0.13
- 200 results: $0.26
- 500 results: $0.63
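The example costs follow from the two fees listed above: one run-start fee plus the per-result fee for each record, rounded to the nearest cent. A small sketch of that arithmetic is shown below; the function name is hypothetical, and it covers only the two event fees stated in this section.

```python
from decimal import Decimal, ROUND_HALF_UP

RUN_START_FEE = Decimal("0.005")     # per run
PER_RESULT_FEE = Decimal("0.00125")  # per Reddit record


def estimate_cost(results, runs=1):
    """Estimate the pay-per-event cost in USD, rounded to whole cents."""
    total = RUN_START_FEE * runs + PER_RESULT_FEE * results
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for n in (10, 25, 100, 200, 500):
    print(f"{n} results: ${estimate_cost(n)}")
```

Decimal is used instead of float so that half-cent totals such as 200 results ($0.255) round up consistently.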
FAQ
How many results can I get from reddit.com?
The number of results depends on your query and on how many matching posts, comments, profiles, and communities are available on reddit.com. Use the maxItems parameter to cap the total number of records saved per run.
Can I integrate Reddit Scraper with other apps?
Yes. Reddit Scraper works with Apify's integrations to connect with tools like Zapier, Make, Google Sheets, Slack, and more. You can also use webhooks to trigger actions when a run completes.
Can I use Reddit Scraper with the Apify API?
Yes. You can start runs, manage inputs, and retrieve results programmatically through the Apify API. Client libraries are available for JavaScript, Python, and other languages.
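As one possible way to call the actor programmatically, the sketch below uses only the Python standard library against the Apify API's synchronous "run and get dataset items" endpoint. The actor ID shown is a placeholder, since this page does not state the real one, and APIFY_TOKEN is an assumed environment variable name. The synchronous endpoint waits for the run to finish and is subject to a time limit, so larger runs are better started asynchronously or through an official client library.

```python
import json
import os
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_request(actor_id, token, run_input):
    """Prepare a POST that runs the actor and returns its dataset items."""
    # actor_id uses the "username~actor-name" form (placeholder below).
    path = urllib.parse.quote(actor_id, safe="")
    return urllib.request.Request(
        f"{API_BASE}/acts/{path}/run-sync-get-dataset-items",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(actor_id, token, run_input, timeout=330):
    """Send the request and decode the JSON array of records."""
    request = build_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__" and os.environ.get("APIFY_TOKEN"):
    items = run_actor(
        "someuser~reddit-scraper",  # hypothetical actor ID
        os.environ["APIFY_TOKEN"],
        {"startUrls": [{"url": "https://www.reddit.com/r/programming/"}],
         "skipComments": True, "maxItems": 10},
    )
    print(len(items), "records")
```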
Can I use Reddit Scraper through an MCP Server?
Yes. Apify provides an MCP Server that lets AI assistants and agents call this actor directly. Use a single descriptionFormat and excludeEmptyFields to keep payloads manageable for LLM context windows.
Is it legal to scrape reddit.com?
This actor extracts publicly available data from reddit.com. Web scraping of public information is generally considered legal, but you should always review the target site's terms of service and ensure your use case complies with applicable laws and regulations, including GDPR where relevant.
Your feedback
If you have questions, need a feature, or found a bug, please open an issue on the actor's page in Apify Console. Your feedback helps us improve.
Getting started with Apify
New to Apify? Create a free account with $5 credit — no credit card required.
- Sign up — $5 platform credit included
- Open this actor and configure your input
- Click Start — export results as JSON, CSV, or Excel
Need more later? See Apify pricing.