Reddit Scraper
1 day trial then $45.00/month - No credit card required now

trudax/reddit-scraper
Unlimited Reddit web scraper to crawl posts, comments, communities, and users without login. Limit web scraping by number of posts or items and extract all data in a dataset in multiple formats.

Irrelevant data despite using keyword search

Closed

highbrow_desert opened this issue
2 months ago

Run Id: "g0U8gX5voivV1y39v"

I wanted to scrape Reddit data from the subreddit "r/birthcontrol" using 58 keywords. However, when I went through the output, I observed the following:

  1. The subreddit has more than 250K posts and comments from the past 2.5 years, but I am getting only 5.8K.
  2. Even among those 5.8K posts and comments, many did not contain any of the keywords I had provided.

Could you suggest a solution?

I also had a few additional questions:

  1. Is it possible to scrape data for a particular time period, e.g. the last 3 years?
  2. What do I have to do if I want to scrape all the data available for a particular subreddit?
  3. Is there a data limit per keyword?
  4. Is there a data limit per subreddit?
trudax

When you use a startUrl, the search parameters are ignored. I like what you are trying to do, so I will update the actor to allow searching with a startUrl as the base. Reddit's navigation does not let you filter by date, so the only option is to scrape everything you can and filter it afterwards. Reddit also limits the number of posts you can retrieve, so it is difficult to scrape all the data for a subreddit: even if you go page by page, Reddit will not show you all of it.
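The "scrape everything, then filter" approach described above could look roughly like the following. This is only a minimal sketch, not part of the actor: it assumes the dataset has been exported as a list of dictionaries, and the field names `title`, `body`, and `createdAt` (an ISO 8601 timestamp) are assumptions, so adjust them to whatever your export actually contains. The helper names `parse_timestamp` and `filter_items` are hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 string into an aware UTC datetime.

    Returns None when the value is missing or cannot be parsed.
    """
    if not isinstance(value, str) or not value:
        return None
    try:
        # Older Python versions do not accept a trailing "Z", so normalise it.
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def filter_items(items, keywords, since,
                 text_fields=("title", "body"), date_field="createdAt"):
    """Keep items created on or after `since` that mention any keyword.

    The field names are assumptions about the exported dataset.
    Matching is a case-insensitive substring test.
    """
    lowered = [keyword.lower() for keyword in keywords]
    kept = []
    for item in items:
        created = parse_timestamp(item.get(date_field))
        if created is None or created < since:
            continue  # drop items with no usable date or outside the window
        text = " ".join(str(item.get(field) or "") for field in text_fields).lower()
        if any(keyword in text for keyword in lowered):
            kept.append(item)
    return kept


# Made-up sample records, only to illustrate the call.
sample_items = [
    {"title": "Switching pill brands", "body": "Any side effects?",
     "createdAt": "2023-05-01T12:00:00.000Z"},
    {"title": "Old post about the pill", "body": "",
     "createdAt": "2019-01-01T00:00:00Z"},
    {"title": "Unrelated", "body": "weather",
     "createdAt": "2023-06-01T00:00:00Z"},
    {"title": "IUD", "body": None, "createdAt": None},
]
cutoff = datetime(2021, 1, 1, tzinfo=timezone.utc)
recent_matches = filter_items(sample_items, ["Pill", "iud"], cutoff)
```

Substring matching is deliberately crude: a short keyword also matches inside longer words, so a word-boundary regular expression may be a better fit for some keyword lists. Filtering this way only narrows what was scraped; it cannot recover posts that Reddit never returned.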

Developer
Maintained by Community
Actor metrics
  • 366 monthly users
  • 37 stars
  • 99.9% runs succeeded
  • 1.2 days response time
  • Created in Feb 2022
  • Modified 9 days ago
Categories