Reddit Scraper
1 day trial then $45.00/month - No credit card required now
Unlimited Reddit web scraper to crawl posts, comments, communities, and users without login. Limit web scraping by number of posts or items and extract all data in a dataset in multiple formats.
I've now tried two runs to collect a significant portion of a subreddit's activity (2+ years), but the scraper stops after about 50,000 lines of data.
runs: https://li2vbaesoevb.runs.apify.net and https://cargcpdn5cbn.runs.apify.net
One further detail or question: it'd be great to pick up where a previous run stopped. Is there some way to use the previous pagination marker to resume collecting where it left off?
Can you share the run ID?
I think they're both in the original post, aren't they? Those two links?
I don't think so; there is no useful information when I click on those links. If you go to the run, you should see a share button in the upper right corner that will give you the correct link.
Sorry about that. Here you go:
https://console.apify.com/view/runs/dP9XSbh7cxCArL5c8 (first run)
https://console.apify.com/view/runs/GkjTkfGQ9cNnN9JpF (second run)
what do you think?
I will try to replicate the run here, but at first glance I don't see anything wrong besides some blocked requests, which is expected.
Okay, but what about starting a new run where the old one left off? Currently there isn't an option to do that.
what do you think?
I got the same results; it seems to be returning everything it can.
Right, but my question is why we can't start scraping somewhere other than the beginning of the subreddit.
You can, you just need to know the last URL used for paginating.
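Reddit listing pages paginate with a cursor: a listing URL can carry an `after` query parameter holding the fullname of the last item on the previous page (for example `t3_abc123`), so starting from such a URL continues the listing from that point instead of from the top. Whether this actor's logged pagination URLs use that parameter is an assumption, and the subreddit URL and cursor value below are made up for illustration. A minimal sketch of reading and setting the cursor with the standard library:

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def get_cursor(url: str) -> Optional[str]:
    """Return the value of the `after` query parameter, or None on a first page."""
    query = parse_qs(urlparse(url).query)
    values = query.get("after")
    return values[0] if values else None


def with_cursor(url: str, after: str) -> str:
    """Return `url` with its `after` parameter set to `after` (other params kept)."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query["after"] = [after]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


if __name__ == "__main__":
    # Hypothetical subreddit and cursor, for illustration only.
    start = "https://www.reddit.com/r/example/new/"
    resume = with_cursor(start, "t3_abc123")
    print(resume)
    print(get_cursor(start), get_cursor(resume))
```

A URL without a cursor starts at the newest posts; one with a cursor picks up after the named item, which is why knowing the last pagination URL is enough to resume.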
Okay. How do I implement that within this actor then?
I will add a change to log the last page so you can copy from the logs and use it.
Okay, where do I paste the last URL used?
hello?
The last used URL is in the logs.
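Once the last pagination URL is in the run's log, resuming amounts to copying it into the next run's start URLs. The sketch below makes several assumptions: the exact log line format is unknown, so it simply takes the last reddit.com URL that appears anywhere in the log text; the input field is assumed to be `startUrls` with `{"url": ...}` entries, which should be checked against the actor's input schema; and the sample log lines are hypothetical. Fetching the log itself (from the Console or the API) is left out.

```python
import json
import re
from typing import Optional

# Assumption: any reddit.com URL in the log may be a pagination URL; the
# actual log wording is unknown, so no particular message text is matched.
URL_PATTERN = re.compile(r"https://(?:www\.|old\.)?reddit\.com/\S+")


def last_logged_url(log_text: str) -> Optional[str]:
    """Return the last Reddit URL found in the log text, or None if there is none."""
    matches = URL_PATTERN.findall(log_text)
    if not matches:
        return None
    # Trim punctuation that may trail a URL inside a log sentence.
    return matches[-1].rstrip(".,;'\"")


def build_resume_input(log_text: str) -> dict:
    """Build a new-run input that starts from the last logged URL.

    "startUrls" is an assumed field name, not confirmed by the thread.
    """
    url = last_logged_url(log_text)
    if url is None:
        raise ValueError("no Reddit URL found in the log")
    return {"startUrls": [{"url": url}]}


if __name__ == "__main__":
    # Hypothetical log excerpt, for illustration only.
    sample = (
        "INFO Paginating: https://www.reddit.com/r/example/new/?after=t3_aaa\n"
        "INFO Paginating: https://www.reddit.com/r/example/new/?after=t3_bbb\n"
    )
    print(json.dumps(build_resume_input(sample), indent=2))
```

The printed JSON would then be pasted into the new run's input; if the actor names its start-URL field differently, only the key in `build_resume_input` needs to change.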
Actor Metrics
324 monthly users
58 stars
>99% runs succeeded
1.3 days response time
Created in Feb 2022
Modified 11 days ago