Reddit Keyword Search Scraper
Under maintenance
Pricing: Pay per usage
Reddit Keyword Search Scraper searches Reddit and extracts posts matching your keywords using public web pages. It returns post titles, content, authors, engagement metrics, media URLs, and matched keywords across all of Reddit or selected subreddits.
Developer: HappiTap (maintained by Community)
Reddit Keyword Search Scraper is a powerful web scraping tool that searches Reddit and extracts matching posts based on your keywords. This Actor scrapes Reddit's HTML pages to find posts containing your search terms across all of Reddit or within specific subreddits. Simply provide a search query, and the Actor will return detailed post information including titles, content, author details, engagement metrics, media URLs, and matched keywords.
What does Reddit Keyword Search Scraper do?
Reddit Keyword Search Scraper searches Reddit's public content and extracts posts that match your search query. Unlike Reddit's official API, this Actor uses web scraping to access Reddit's search functionality, allowing you to find posts by keywords across the entire platform or within specific subreddits. The Actor extracts comprehensive post data including titles, text content, author information, upvote scores, comment counts, timestamps, permalinks, post flairs, media URLs, and identifies which keywords from your query were found in each post.
The input is simple: provide a search query (required), optionally specify subreddits to search within, choose a sort order (relevance, new, top, or comments), set a time period for sorting, and define a limit for the number of results. The Actor handles all the complexity of web scraping, pagination, and data extraction automatically.
Why use Reddit Keyword Search Scraper?
Business Use Cases
- Market Research: Track discussions about your products, competitors, or industry trends
- Sentiment Analysis: Monitor public opinion and sentiment around specific topics
- Content Discovery: Find popular posts and trending discussions for content marketing
- Community Monitoring: Keep track of discussions in relevant subreddits
- Data Collection: Gather Reddit data for research, analysis, or machine learning projects
Platform Advantages
When you use Reddit Keyword Search Scraper on the Apify platform, you get:
- Scalability: Run multiple searches simultaneously with automatic resource management
- Monitoring: Track your scraping runs with detailed logs and metrics
- API Access: Integrate Reddit data extraction into your workflows via REST API
- Scheduling: Automate regular Reddit searches with built-in scheduling
- Data Storage: Results are automatically stored in Apify datasets with easy export options
- Integrations: Connect to Zapier, Make, and other automation platforms
- Reliability: Built-in retry logic and error handling for robust scraping
What data can Reddit Keyword Search Scraper extract?
The Actor extracts comprehensive data from each matching Reddit post:
| Field | Description |
|---|---|
| title | The title of the Reddit post |
| text | The selftext/body content of the post |
| author | The username of the post author |
| score | The upvote score (karma) of the post |
| commentCount | Number of comments on the post |
| createdAt | ISO 8601 timestamp when the post was created |
| permalink | Full URL to the Reddit post |
| flair | Link flair text (if any) |
| mediaUrls | Array of media URLs (images, videos, galleries) found in the post |
| matchedKeywords | Keywords from your search query that were found in the post |
| searchRank | The rank/position of this result in the search results |
How to scrape Reddit with Reddit Keyword Search Scraper
Step-by-Step Tutorial
1. Set up your search query
   - Enter your search keywords in the "Search Query" field
   - Example: "javascript tutorial" or "machine learning"
2. Optionally specify subreddits
   - Leave empty to search all of Reddit
   - Or add specific subreddits like "learnprogramming" or "node"
3. Choose sorting options
   - Sort: Select relevance, new, top, or comments
   - Time Period: Choose day, week, month, year, or all time (for top/comments sort)
4. Set result limit
   - Specify how many posts you want (1-1000)
   - Default is 50 posts
5. Run the Actor
   - Click "Start" to begin scraping
   - Monitor progress in real time
   - Results are automatically saved to the dataset
6. Download your data
   - Access results from the Dataset tab
   - Export as JSON, CSV, Excel, or HTML
   - Use the API to integrate with your applications
How much does it cost to scrape Reddit?
Reddit Keyword Search Scraper uses Apify's consumption-based pricing model, meaning you only pay for the Compute Units (CUs) consumed during the scraping run. The cost depends on the complexity of your search and the number of results requested.
- Free Plan: Get started with free CUs to test the Actor
- Small Searches (1-50 results): Typically consume 0.1-0.5 CUs
- Medium Searches (51-200 results): Typically consume 0.5-2 CUs
- Large Searches (201-1000 results): Typically consume 2-10 CUs
Actual CU consumption varies with the number of subreddits searched and the result limit. The Actor only fetches as many pages as it needs to reach your limit, and you can monitor CU usage in real time during each run.
Tip: Start with a small limit (10-20 results) to test your search query and estimate costs before running larger extractions.
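The tip above boils down to simple arithmetic: divide the CUs a small test run used by its result count, then multiply by the limit you plan to use. This is a rough sketch that assumes cost grows roughly linearly with the number of results; the function name is hypothetical. Because each run likely has some fixed startup overhead, the estimate tends to err on the high side for larger limits.

```python
def extrapolate_cu(test_limit: int, test_cu: float, target_limit: int) -> float:
    """Linearly scale the CUs used by a small test run to a larger result limit.

    Assumption: cost is roughly proportional to the number of results. If each
    run also has a fixed startup cost, this over-estimates for larger targets.
    """
    if test_limit <= 0 or target_limit <= 0:
        raise ValueError("limits must be positive")
    if test_cu < 0:
        raise ValueError("test_cu must be non-negative")
    # CUs per result from the test run, scaled to the target limit.
    return test_cu / test_limit * target_limit
```

For example, if a 20-result test run used 0.2 CUs, `extrapolate_cu(20, 0.2, 500)` suggests budgeting roughly 5 CUs for 500 results.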
Input
Reddit Keyword Search Scraper has the following input options. Click on the input tab for more information.
Input Parameters
- query (required): The search query to find matching Reddit posts
- subreddits (optional): Array of subreddit names to scope the search. If not provided, searches all of Reddit
- sort (optional): Sort order: relevance, new, top, or comments (default: relevance)
- time (optional): Time period for sorting: day, week, month, year, or all (default: day)
- limit (optional): Maximum number of posts to return (1-1000, default: 50)
Example Input
{
  "query": "javascript tutorial",
  "subreddits": ["learnprogramming", "webdev"],
  "sort": "relevance",
  "time": "week",
  "limit": 50
}
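Because an invalid input wastes a run, it can help to check the input locally before starting the Actor. The sketch below is a hypothetical client-side helper (SearchInput is not part of the Actor) that mirrors the documented fields, allowed values, and defaults. The Actor performs its own validation, which may differ in details such as how it treats subreddit names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Allowed values and defaults as documented in the Input Parameters list.
VALID_SORTS = {"relevance", "new", "top", "comments"}
VALID_TIMES = {"day", "week", "month", "year", "all"}


@dataclass
class SearchInput:
    """Hypothetical local mirror of the Actor's input, validated on creation."""
    query: str
    subreddits: List[str] = field(default_factory=list)
    sort: str = "relevance"
    time: str = "day"
    limit: int = 50

    def __post_init__(self) -> None:
        if not self.query.strip():
            raise ValueError("query is required")
        if self.sort not in VALID_SORTS:
            raise ValueError(f"sort must be one of {sorted(VALID_SORTS)}")
        if self.time not in VALID_TIMES:
            raise ValueError(f"time must be one of {sorted(VALID_TIMES)}")
        if not 1 <= self.limit <= 1000:
            raise ValueError("limit must be between 1 and 1000")

    def to_dict(self) -> Dict[str, Any]:
        """Build the JSON input object; an empty subreddit list is omitted (search all of Reddit)."""
        data: Dict[str, Any] = {"query": self.query}
        if self.subreddits:
            data["subreddits"] = list(self.subreddits)
        data.update(sort=self.sort, time=self.time, limit=self.limit)
        return data
```

`SearchInput("javascript tutorial", ["learnprogramming", "webdev"], time="week").to_dict()` reproduces the example input above, while a typo such as `sort="hot"` fails immediately instead of after a paid run.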
Output
You can download the dataset extracted by Reddit Keyword Search Scraper in various formats such as JSON, HTML, CSV, or Excel.
Output Example
Each item in the dataset contains the following structure:
{
  "title": "Best JavaScript Tutorial for Beginners in 2024",
  "text": "I've been learning JavaScript and found this amazing tutorial...",
  "author": "jslearner123",
  "score": 245,
  "commentCount": 32,
  "createdAt": "2024-01-15T10:30:00.000Z",
  "permalink": "https://www.reddit.com/r/learnprogramming/comments/abc123/best_javascript_tutorial/",
  "flair": "Tutorial",
  "mediaUrls": ["https://i.redd.it/example.jpg"],
  "matchedKeywords": ["javascript", "tutorial"],
  "searchRank": 1
}
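If you download the dataset as JSON, a small typed wrapper makes the fields easier to work with. This is a minimal sketch: RedditPost and parse_items are hypothetical names, the attribute names stay in camelCase so they map one-to-one to the JSON keys, and every field is treated as optional because some posts come back with null values. Unknown keys are ignored.

```python
import json
from dataclasses import dataclass, field, fields
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class RedditPost:
    """One dataset item, using the field names from the output table."""
    title: Optional[str] = None
    text: Optional[str] = None
    author: Optional[str] = None
    score: Optional[int] = None
    commentCount: Optional[int] = None
    createdAt: Optional[str] = None
    permalink: Optional[str] = None
    flair: Optional[str] = None
    mediaUrls: List[str] = field(default_factory=list)
    matchedKeywords: List[str] = field(default_factory=list)
    searchRank: Optional[int] = None

    @classmethod
    def from_item(cls, item: Dict[str, Any]) -> "RedditPost":
        names = {f.name for f in fields(cls)}
        known = {k: v for k, v in item.items() if k in names}
        # Array fields may be missing or null; normalize them to empty lists.
        known["mediaUrls"] = known.get("mediaUrls") or []
        known["matchedKeywords"] = known.get("matchedKeywords") or []
        return cls(**known)

    def created_at(self) -> Optional[datetime]:
        """Parse the ISO 8601 timestamp; 'Z' is rewritten for Python < 3.11."""
        if not self.createdAt:
            return None
        return datetime.fromisoformat(self.createdAt.replace("Z", "+00:00"))


def parse_items(raw_json: str) -> List[RedditPost]:
    """Parse a JSON array export of the dataset into RedditPost objects."""
    return [RedditPost.from_item(item) for item in json.loads(raw_json)]
```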
Advanced Options and Tips
Optimizing Your Searches
- Use specific keywords: More specific queries yield better results
- Limit subreddits: Searching specific subreddits is faster and more targeted than searching all of Reddit
- Adjust sort order: Use "relevance" for best matches, "new" for recent posts, "top" for popular content
- Set reasonable limits: Start with smaller limits (10-50) to test, then scale up
Rate Limiting
The Actor includes built-in rate limiting and retry logic to reduce the chance of being blocked by Reddit. It automatically:
- Adds delays between requests
- Retries failed requests with exponential backoff
- Uses realistic browser headers to avoid detection
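Retrying with exponential backoff means waiting longer after each failure (for example 1 s, 2 s, 4 s) so that a rate-limited server has time to recover. The sketch below shows the general technique with a cap and "full jitter" (a random wait up to the current ceiling). It is not the Actor's implementation: HttpError, the retried status codes, and the delay values are all assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed set of status codes worth retrying (blocked, rate-limited, server errors).
RETRYABLE_STATUS = {403, 429, 500, 502, 503, 504}


class HttpError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[float], float] = lambda d: random.uniform(0, d),
) -> T:
    """Call fetch(), retrying retryable HTTP errors with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except HttpError as err:
            # Give up on non-retryable errors or after the last attempt.
            if err.status not in RETRYABLE_STATUS or attempt == max_attempts - 1:
                raise
            # Ceiling doubles each attempt (1, 2, 4, ...) but never exceeds max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(jitter(delay))
    raise AssertionError("unreachable")
```

Passing `sleep` and `jitter` as parameters keeps the retry policy testable without real waiting.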
Keyword Matching
The Actor automatically extracts keywords from your search query and identifies which keywords were found in each post. This helps you understand why certain posts matched your search.
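The keyword matching described above (and the case-insensitive matching mentioned in the FAQ) can be approximated as follows. This is a sketch under stated assumptions: keywords are the query's lowercase word tokens with duplicates removed, and a keyword counts as matched only as a whole word in the title or body. The Actor's own rules (for example substring matching or stop-word handling) may differ, and punctuation such as the "++" in "c++" is dropped by this tokenizer.

```python
import re
from typing import List, Optional


def extract_keywords(query: str) -> List[str]:
    """Split a query into unique lowercase word tokens, keeping their original order."""
    keywords: List[str] = []
    for word in re.findall(r"\w+", query.lower()):
        if word not in keywords:
            keywords.append(word)
    return keywords


def matched_keywords(query: str, title: str, text: Optional[str] = None) -> List[str]:
    """Return the query keywords found as whole words in the title or body (case-insensitive)."""
    haystack = f"{title}\n{text or ''}".lower()
    return [
        kw
        for kw in extract_keywords(query)
        if re.search(rf"\b{re.escape(kw)}\b", haystack)
    ]
```

With whole-word matching, the query "java" does not match a post titled "JavaScript tips", which keeps matchedKeywords from over-reporting.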
Is it legal to scrape Reddit?
Our scrapers are ethical and do not extract any private user data, such as email addresses, phone numbers, or private messages. They only extract what users have chosen to share publicly on Reddit. We therefore believe that our scrapers, when used for ethical purposes by Apify users, are safe.
However, you should be aware that your results could contain personal data. Personal data is protected by the GDPR in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers.
You can also read our blog post on the legality of web scraping.
Important: Always respect Reddit's Terms of Service and robots.txt. This Actor is designed for research and data collection purposes. Do not use scraped data for spam, harassment, or any illegal activities.
Reddit API vs Web Scraping
Reddit Keyword Search Scraper uses web scraping instead of Reddit's official API. Here's why:
| Web Scraping (This Actor) | Reddit Official API |
|---|---|
| ✅ No API key required | ❌ Requires OAuth authentication |
| ✅ Access to all public content | ⚠️ Limited by API rate limits |
| ✅ Flexible search across all subreddits | ⚠️ More complex API endpoints |
| ✅ No API quotas (Reddit may still block or rate-limit requests; see Troubleshooting) | ⚠️ Subject to API rate limits |
| ✅ Works immediately | ⚠️ Requires app registration |
This Actor provides a simpler alternative to Reddit's API for users who need quick access to Reddit search functionality without the complexity of API authentication.
Troubleshooting
Common Issues
403 or 429 errors (Rate Limited or Blocked)
Reddit has aggressive anti-bot protection that may block automated access. If you encounter 403 or 429 errors:
- Use Apify Proxies: Configure Apify proxies in your Actor settings to rotate IP addresses
- Reduce request frequency:
  - Lower the limit parameter (try 10-20 results first)
  - Add longer delays between multiple runs
  - Avoid running too many searches in quick succession
- Try different times: Run during off-peak hours when Reddit's servers are less busy
- Use old.reddit.com: The Actor automatically tries old.reddit.com first (less protected)
- Check your IP: Your IP may be temporarily blocked - wait a few hours before retrying
The Actor includes automatic retry logic with exponential backoff and will try both old.reddit.com and www.reddit.com.
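Trying old.reddit.com first and falling back to www.reddit.com is a simple ordered fallback. The sketch below illustrates the idea only; `fetch` is a hypothetical callable that returns the page body, or None if the request was blocked or failed after its own retries, and the Actor's real logic is not documented here.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Order matters: the less protected host is tried first.
DEFAULT_HOSTS = ("https://old.reddit.com", "https://www.reddit.com")


def fetch_with_host_fallback(
    path: str,
    fetch: Callable[[str], Optional[str]],
    hosts: Iterable[str] = DEFAULT_HOSTS,
) -> Tuple[str, str]:
    """Try each host in order and return (url, body) from the first that succeeds."""
    tried: List[str] = []
    for host in hosts:
        url = host + path
        body = fetch(url)
        if body is not None:
            return url, body
        tried.append(url)
    raise RuntimeError(f"all hosts failed: {tried}")
```

Because the two sites have different HTML structures, a caller would also need to remember which host answered so it can pick the matching parser.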
No results returned
- Try a broader search query
- Check if the subreddit names are spelled correctly
- Verify that posts matching your query exist on Reddit
- Reddit may have blocked the request - see 403 error solutions above
Missing data fields
- Some posts may not have all fields (e.g., text posts vs link posts)
- The Actor handles missing data gracefully, using null for unavailable fields
- Old Reddit and new Reddit have different HTML structures, so some fields may be unavailable
Rate limiting
- The Actor includes automatic retry logic with delays
- Consider reducing the number of results or searching fewer subreddits
- Add delays between multiple runs (5-10 seconds minimum)
- Use Apify proxies for better success rates
Getting Help
If you encounter issues or have questions:
- Check the Issues tab for known problems and solutions
- Review the Actor logs for detailed error messages
- Contact support through the Apify platform
- Report bugs or request features in the Issues tab
We're always open to feedback and improvements!
Use Cases
Market Research
Track discussions about your products, competitors, or industry trends across Reddit communities.
Content Marketing
Discover popular posts and trending discussions to inform your content strategy.
Sentiment Analysis
Monitor public opinion and sentiment around specific topics, brands, or products.
Data Collection
Gather Reddit data for research, academic studies, or machine learning projects.
Community Monitoring
Keep track of discussions in relevant subreddits for community management.
API Integration
Reddit Keyword Search Scraper can be integrated into your applications using Apify's REST API. This allows you to:
- Trigger searches programmatically
- Retrieve results via API calls
- Integrate with Zapier, Make, and other automation platforms
- Schedule automated searches
- Build custom workflows
See the API tab for detailed API documentation and code examples in Python, JavaScript, and other languages.
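For readers who prefer plain HTTP, here is a minimal sketch using only Python's standard library and Apify's run-sync-get-dataset-items endpoint, which starts a run, waits for it to finish, and returns the dataset items as JSON. The Actor ID "HappiTap~reddit-keyword-search-scraper" is a hypothetical placeholder; copy the real ID from the API tab. Synchronous endpoints only wait a limited time, so very large runs are better started asynchronously and polled.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: Dict[str, Any]) -> urllib.request.Request:
    """Prepare a POST to the run-sync-get-dataset-items endpoint (no network call)."""
    # Actor IDs in API paths use "username~actor-name" instead of "username/actor-name".
    path_id = urllib.parse.quote(actor_id.replace("/", "~"), safe="~")
    url = f"{API_BASE}/acts/{path_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        # Sending the token in a header keeps it out of URLs and logs.
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_actor(actor_id: str, token: str, run_input: Dict[str, Any], timeout: float = 300) -> List[Dict[str, Any]]:
    """Run the Actor, wait for it, and return its dataset items (performs a network call)."""
    req = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `items = run_actor("HappiTap~reddit-keyword-search-scraper", "<YOUR_APIFY_TOKEN>", {"query": "javascript tutorial", "limit": 20})`.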
Related Actors
Looking for more Reddit scraping capabilities? Check out these related Actors:
- Reddit Scraper - Scrape posts, comments, and user profiles from Reddit
- Reddit Comments Scraper - Extract comments from Reddit posts
- Reddit User Scraper - Scrape user profiles and post history
Important: Reddit Anti-Bot Protection
Reddit has aggressive anti-bot protection that may block automated scraping attempts. If you encounter 403 errors:
Recommended Solutions
1. Use Apify Proxies (Highly Recommended)
   - Configure Apify proxies in your Actor settings
   - Proxies rotate IP addresses to avoid detection
   - Significantly improves success rates
   - Available in Apify Console under Actor settings
2. Optimize Your Scraping
   - Start with small limits (10-20 results) to test
   - Add delays between multiple runs (5-10 seconds minimum)
   - Avoid running too many searches in quick succession
   - Run during off-peak hours when Reddit's servers are less busy
3. Reduce Request Volume
   - Lower the limit parameter
   - Search fewer subreddits per run
   - Spread searches over time instead of running all at once
The Actor automatically:
- Uses old.reddit.com by default (less protected)
- Includes retry logic with exponential backoff
- Mimics real browser behavior with Playwright
- Tries both old.reddit.com and www.reddit.com
However, using Apify proxies is the most effective solution for avoiding Reddit's blocking.
FAQ
How accurate is the keyword matching?
The Actor extracts keywords from your search query and matches them against post titles and text content. It uses case-insensitive matching to find all occurrences.
Can I search multiple subreddits at once?
Yes! Provide an array of subreddit names in the subreddits field. The Actor will search each subreddit sequentially.
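Searching several subreddits one after another amounts to building one search URL per subreddit. The sketch below uses Reddit's public search URL parameters (q, sort, t, and restrict_sr=on to stay inside a subreddit); the exact URLs the Actor requests are not documented, so treat build_search_urls as a hypothetical illustration. An optional "r/" prefix on subreddit names is stripped.

```python
from typing import List, Optional
from urllib.parse import urlencode


def build_search_urls(
    query: str,
    subreddits: Optional[List[str]] = None,
    sort: str = "relevance",
    time: str = "day",
    host: str = "https://old.reddit.com",
) -> List[str]:
    """Return one search URL per subreddit, or a single site-wide URL if none are given."""
    params = {"q": query, "sort": sort, "t": time}
    if not subreddits:
        return [f"{host}/search?{urlencode(params)}"]
    urls = []
    for name in subreddits:  # searched sequentially, in the order given
        name = name.strip()
        if name.lower().startswith("r/"):
            name = name[2:]
        # restrict_sr=on limits results to the subreddit in the path.
        scoped = dict(params, restrict_sr="on")
        urls.append(f"{host}/r/{name}/search?{urlencode(scoped)}")
    return urls
```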
What's the difference between "sort" options?
- relevance: Reddit's algorithm determines the most relevant posts
- new: Most recently posted content
- top: Highest upvoted posts
- comments: Posts with the most comments
How does pagination work?
The Actor automatically handles pagination to collect the requested number of results. It will fetch multiple pages if needed to reach your limit.
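Reddit listings are paginated with an "after" cursor that points at the last item of the previous page. A minimal sketch of the collection loop, assuming a hypothetical `fetch_page(after)` that returns one page of posts plus the next cursor (or None on the last page); `max_pages` is an assumed safety stop so a misbehaving cursor cannot loop forever.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (posts on this page, cursor for the next page or None)
Page = Tuple[List[Dict], Optional[str]]


def collect_results(
    fetch_page: Callable[[Optional[str]], Page],
    limit: int,
    max_pages: int = 50,
) -> List[Dict]:
    """Follow 'after' cursors until `limit` posts are collected or pages run out."""
    results: List[Dict] = []
    after: Optional[str] = None
    for _ in range(max_pages):
        posts, after = fetch_page(after)
        results.extend(posts)
        # Stop when the limit is reached, there is no next page, or a page is empty.
        if len(results) >= limit or not after or not posts:
            break
    return results[:limit]
```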
Can I get more than 1000 results?
The current limit is 1000 results per run. For larger datasets, you can run multiple searches with different queries or time periods.
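When you split a large collection across several runs (for example one per time period or query variant), the same post can appear in more than one run. A minimal sketch of merging the results, assuming permalink uniquely identifies a post; trailing slashes are ignored when comparing, and items without a permalink are kept as-is. Note that searchRank is per run, so ranks from different runs are not comparable after merging.

```python
from typing import Dict, Iterable, List, Set


def merge_runs(*runs: Iterable[Dict]) -> List[Dict]:
    """Combine items from several runs, keeping the first occurrence of each permalink."""
    seen: Set[str] = set()
    merged: List[Dict] = []
    for run in runs:
        for item in run:
            permalink = item.get("permalink")
            if permalink is not None:
                key = permalink.rstrip("/")
                if key in seen:
                    continue  # already collected by an earlier run
                seen.add(key)
            merged.append(item)
    return merged
```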
Is my data stored securely?
Yes, all data is stored securely in Apify's cloud storage. You can download, export, or delete your datasets at any time.
Can I customize the Actor?
The Actor is open-source and can be customized for your specific needs. You can also contact us for custom development based on this Actor.
Why am I getting 403 errors?
Reddit has sophisticated anti-bot protection that detects automated access. This is normal and expected. To improve success rates:
- Use Apify proxies (most effective)
- Reduce the number of results per run
- Add longer delays between runs
- Try running during off-peak hours
The Actor includes automatic retry logic, but persistent 403 errors indicate Reddit is blocking your IP. Using proxies is the recommended solution.
Support
We're committed to providing the best scraping experience. If you need help:
- 📧 Contact us through the Apify platform
- 🐛 Report issues in the Issues tab
- 💡 Suggest features or improvements
- 📚 Check our documentation
- 🎓 Visit the Apify Academy
Ready to start scraping Reddit? Run the Actor now and get your first results in minutes!
