Reddit Scraper
1 day trial then $45.00/month - No credit card required now

trudax/reddit-scraper

Unlimited Reddit web scraper to crawl posts, comments, communities, and users without login. Limit web scraping by number of posts or items and extract all data in a dataset in multiple formats.

Full communities aren't being scraped

Closed

researcher_1999 opened this issue a year ago

Even when I set all limits to 5,000,000, the Actor doesn't scrape all the posts. It only goes back about a year and two to three months, even in communities with posts going back 5 to 7 years. I tried switching the URL to the "top of all time" listing, which returns far more results, but it still doesn't cover the full community.

The same problem occurs when scraping a user's comments or posts.

The community URL that won't go past about a year: https://www.reddit.com/r/subreddit/

Top of all time format: https://www.reddit.com/r/subreddit/top/?t=all
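Because the "top of all time" listing surfaced posts that the default listing did not, one workaround worth trying is to scrape several listing variants of the same community and merge the results, dropping duplicates. This is a minimal sketch under assumptions: `listing_urls` and `merge_unique` are hypothetical helpers (not part of the Actor), each scraped item is assumed to be a dict with an `id` field, and the `t` values are Reddit's standard time filters. It cannot recover posts that no listing exposes.

```python
from typing import Iterable

# Reddit's standard time filters for the "top" and "controversial" listings.
TIME_FILTERS = ("hour", "day", "week", "month", "year", "all")


def listing_urls(subreddit: str) -> list[str]:
    """Build several listing URLs for one community (hypothetical helper)."""
    base = f"https://www.reddit.com/r/{subreddit}"
    urls = [f"{base}/", f"{base}/new/"]
    # Each sort order and time window may surface a different subset of posts.
    for sort in ("top", "controversial"):
        for t in TIME_FILTERS:
            urls.append(f"{base}/{sort}/?t={t}")
    return urls


def merge_unique(batches: Iterable[Iterable[dict]]) -> list[dict]:
    """Merge results from several runs, keeping the first copy of each item id.

    Assumes every item carries an "id" key; adjust to the real dataset field.
    """
    seen: set[str] = set()
    merged: list[dict] = []
    for batch in batches:
        for item in batch:
            if item["id"] not in seen:
                seen.add(item["id"])
                merged.append(item)
    return merged
```

Each URL from `listing_urls("subreddit")` would be fed to a separate run (or as multiple start URLs), and the exported datasets combined with `merge_unique`.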

trudax

Can you share a run ID for this?

researcher_1999

a year ago

Here are two runs I tested: DQWz17oMJy3HD0MUn and dEr1rbSY1Vq8SvBkX

I get the same results no matter what the limits are set to, and these runs only returned so many results because I used the "top of all time" URL. When scraping a community's normal URL, it stops at around 3,000 results and goes back maybe a year. If you can fix this to scrape a full community, I bet I can send a lot of people your way. Thousands of us have been struggling since Reddit revoked Pushshift's API access, and there is no other scraper with a GUI :)

trudax

My scraper only returns what the Reddit webpage provides. It returned every result up to the last page, so Reddit itself may be limiting the data available on its site. I will try to find an alternative, but this could take some time.
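The reason higher limits don't help can be illustrated with a generic cursor-paginated scraper: it stops as soon as the site stops handing out a next-page cursor, so the item limit only matters when it is smaller than what the site exposes. This is a minimal sketch, not the Actor's actual code; `fetch_page` is a hypothetical callable returning a page of items plus the next cursor (or `None` on the last page), and `fake_listing` is a hypothetical stand-in for a site that caps how far back it will page.

```python
from typing import Callable, Optional

# A page is (items, next_cursor); next_cursor is None on the last page.
Page = tuple[list[dict], Optional[str]]


def scrape_listing(fetch_page: Callable[[Optional[str]], Page], max_items: int) -> list[dict]:
    """Follow next-page cursors until the site runs out or max_items is reached."""
    results: list[dict] = []
    cursor: Optional[str] = None
    while len(results) < max_items:
        items, cursor = fetch_page(cursor)
        results.extend(items[: max_items - len(results)])
        # No further page (or an empty page): the site has nothing more to give.
        if cursor is None or not items:
            break
    return results


def fake_listing(total: int, page_size: int) -> Callable[[Optional[str]], Page]:
    """Hypothetical site that only ever exposes `total` items, page by page."""
    def fetch(cursor: Optional[str]) -> Page:
        start = int(cursor or 0)
        items = [{"id": str(i)} for i in range(start, min(start + page_size, total))]
        nxt = start + page_size
        return items, (str(nxt) if nxt < total else None)
    return fetch
```

With `fake_listing(30, 10)`, a limit of 5,000,000 still yields only 30 items, which mirrors what the thread describes: the run ends when the listing ends, not when the limit is reached.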

researcher_1999

a year ago

Oh I see! That makes sense, and is helpful to know. Thank you!

Developer
Maintained by Community
Actor metrics
  • 274 monthly users
  • 23 stars
  • 100.0% runs succeeded
  • 23 hours response time
  • Created in Feb 2022
  • Modified 16 days ago