Reddit Scraper

1 day trial then $45.00/month - No credit card required now

trudax/reddit-scraper

Unlimited Reddit web scraper to crawl posts, comments, communities, and users without login. Limit web scraping by number of posts or items and extract all data in a dataset in multiple formats.

won't collect beyond a certain limit

Closed

lemon_normalcy opened this issue
2 months ago

I've now tried two separate runs to collect a significant portion of a subreddit's activity (2+ years), but each time the scraper stops after about 50,000 lines of data.

runs: https://li2vbaesoevb.runs.apify.net and https://cargcpdn5cbn.runs.apify.net

one further detail or question - it'd be great to pick up where a previous run stopped. is there some way to use the previous pagination marker to start collecting where it left off?

trudax

can you share the run ID?

lemon_normalcy

2 months ago

I think they're both in the original post aren't they? Those 2 links?

trudax

I don't think so; there is no useful information when I click on those links. If you go to the run, you should see a share button in the upper right that will give you the correct link.

lemon_normalcy

2 months ago

what do you think?

trudax

I will try to replicate the run here, but at first glance I don't see anything wrong besides some blocked requests, which is expected.

lemon_normalcy

2 months ago

okay - what about picking up a new run where the old one left off though? currently there isn't an option to do that.

lemon_normalcy

2 months ago

what do you think?

trudax

I got the same results; it seems to be returning everything that it can.

lemon_normalcy

2 months ago

right - but my question is why we can't start scraping somewhere other than the beginning of the subreddit?

trudax

You can, you just need to know the last URL used for paginating.
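
For illustration only: Reddit's listing endpoints paginate with an `after` query parameter, so a pagination URL would normally carry the marker needed to resume. The Python sketch below pulls that marker out of a URL. The example URL, subreddit name, and marker value are hypothetical, and this thread does not confirm the URL format the actor actually uses.

```python
from urllib.parse import urlparse, parse_qs


def pagination_marker(url):
    """Return the value of the `after` query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("after")
    return values[0] if values else None


# Hypothetical pagination URL; the subreddit name and marker are made up.
last_url = "https://www.reddit.com/r/example/new.json?limit=100&after=t3_abc123"
print(pagination_marker(last_url))
```

A URL without an `after` parameter returns `None`, which would correspond to starting from the first page.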

lemon_normalcy

2 months ago

Okay. How do I implement that within this actor then?

trudax

I will add a change to log the last page so you can copy from the logs and use it.

lemon_normalcy

2 months ago

okay - where do I paste in the last URL used?

lemon_normalcy

2 months ago

hello?

trudax

The last used URL is in the logs.
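
As a rough sketch of copying that URL out of a downloaded run log, the Python snippet below returns the last URL found in a block of log text. The log lines are a hypothetical sample, not the actor's real output format, and the thread does not say which input field the URL should be pasted into for a new run.

```python
import re

# Matches http(s) URLs up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def last_url_in_log(log_text):
    """Return the last URL appearing in the log text, or None if there is none."""
    matches = URL_PATTERN.findall(log_text)
    # Strip trailing punctuation that may follow a URL in a log sentence.
    return matches[-1].rstrip(".,)") if matches else None


# Hypothetical log excerpt; real log lines from the actor may look different.
sample_log = """\
INFO  Scraping https://www.reddit.com/r/example/new.json?limit=100
INFO  Last page: https://www.reddit.com/r/example/new.json?limit=100&after=t3_abc123
INFO  Run finished
"""
print(last_url_in_log(sample_log))
```

Taking the last match assumes the final URL logged is the pagination URL; if the actor logs other URLs afterwards, the pattern would need to be narrowed to the relevant log line.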

Developer
Maintained by Community
Actor metrics
  • 365 monthly users
  • 37 stars
  • 99.9% runs succeeded
  • 1.2 days response time
  • Created in Feb 2022
  • Modified 9 days ago