Reddit Scraper Lite

Pricing

Pay per event

Developed by

Gustavo Rudiger

Maintained by Community

Pay Per Result, unlimited Reddit web scraper to crawl posts, comments, communities, and users without login. Limit web scraping by number of posts or items and extract all data in a dataset in multiple formats.

3.9 (3)

182

Total users: 7.1K
Monthly users: 890
Runs succeeded: 90%
Issues response: 9.8 hours
Last modified: 8 days ago

Reddit scraper isn't working

Closed

researcher_1999 opened this issue
2 years ago

The paid version of the Reddit scraper isn't working. The free version works, though.

trudax

I don't see any issues with the paid version. Can you give me more details about the error you are getting?

researcher_1999

2 years ago

Sure! Here are the errors; I can copy the whole log if you need it. (This happens with every sub and every URL, including comments and users, so I'm not sure why.)

2023-05-25T02:48:05.119Z WARN PuppeteerCrawler: Reclaiming failed request back to the list or queue. SyntaxError: Unexpected token o in JSON at position 1
2023-05-25T02:48:05.121Z at JSON.parse ()
2023-05-25T02:48:05.123Z at communityJSONParser (file:///home/myuser/src/parsers/communityJSONParser.js:9:25)
2023-05-25T02:48:05.125Z at processTicksAndRejections (node:internal/process/task_queues:96:5)
2023-05-25T02:48:05.127Z at async handleCommunityInfo (file:///home/myuser/src/routes.js:22:21)
2023-05-25T02:48:05.129Z at async wrap (/home/myuser/node_modules/@apify/timeout/index.js:52:21) {"id":"g96PFO4z1RQhP4p","url":"https://www.reddit.com/r/columbinekillers.json","retryCount":1}
2023-05-25T02:48:16.364Z ERROR PuppeteerCrawler: Request failed and reached maximum retries. SyntaxError: Unexpected token o in JSON at position 1
2023-05-25T02:48:16.367Z at JSON.parse ()
2023-05-25T02:48:16.369Z at communityJSONParser (file:///home/myuser/src/parsers/communityJSONParser.js:9:25)
2023-05-25T02:48:16.371Z at processTicksAndRejections (node:internal/process/task_queues:96:5)
2023-05-25T02:48:16.373Z at async handleCommunityInfo (file:///home/myuser/src/routes.js:22:21)
2023-05-25T02:48:16.375Z at async wrap (/home/myuser/node_modules/@apify/timeout/ind... [trimmed]
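
For context, "Unexpected token o in JSON at position 1" is the message older Node.js versions typically print when `JSON.parse` receives something that is already an object: the object is coerced to the string `[object Object]`, and parsing fails at the `o` after the opening bracket. Whether that is what happened inside `communityJSONParser` cannot be told from this log. The sketch below only reproduces the pattern, using a hypothetical `parseMaybeJson` helper that is not taken from the actor's source; the exact error wording may differ between Node.js versions.

```javascript
// Hypothetical helper, NOT taken from the actor's source code.
// Accepts either a raw JSON string or a value that was already parsed.
function parseMaybeJson(value) {
  if (typeof value === 'string') {
    return JSON.parse(value); // still throws on genuinely malformed JSON
  }
  return value; // already parsed, so return it unchanged
}

// JSON.parse coerces a non-string argument to a string first.
// For a plain object that string is "[object Object]", which is not valid JSON.
function failsOnParsedObject() {
  try {
    JSON.parse({ kind: 'Listing' });
    return false;
  } catch (err) {
    return err instanceof SyntaxError;
  }
}

console.log(failsOnParsedObject());
console.log(parseMaybeJson('{"kind":"Listing"}').kind);
console.log(parseMaybeJson({ kind: 'Listing' }).kind);
```

Guarding on `typeof value === 'string'` keeps real syntax errors visible while tolerating a response that some earlier step has already decoded.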

researcher_1999

2 years ago

Now the free version isn't working, either, and when I just checked it says it's under maintenance. Reddit did recently revoke API access from pushshift, so I don't know if you are using their API and they may have revoked access.

researcher_1999

2 years ago

This is the error that comes up now on the paid version: 2023-05-25T05:50:51.626Z ERROR This route is under maintenance, please use the previous actor version for now

trudax

I have fixed the issue. You should be able to scrape communities again now.

researcher_1999

2 years ago

Thank you, you rock! I will test it out shortly :)

researcher_1999

2 years ago

I'm still getting this error on the paid Reddit Scraper: 2023-05-25T20:29:21.366Z ERROR This route is under maintenance, please use the previous actor version for now

I got the above error when scraping a user account for comments and this error just now for a community: 2023-05-25T20:32:47.425Z WARN PuppeteerCrawler: Reclaiming failed request back to the list or queue. Navigation timeout of 30000 ms exceeded

trudax

Can you share the run ID with me so I can take a closer look? Just open the run and copy the ID from it like in the image:

researcher_1999

2 years ago

Sure! Here is one run (the sub) HAXeBbcUPmKUn3SA0

And here is the second one (the comments) 9XlBVZi6b45JMpUe0

trudax

I had a small bug in there. I tested both runs with the new fix, and it is now working with no problems. Give it a try and let me know.

researcher_1999

2 years ago

Right on, it works to scrape communities! Now the only issue is that it doesn't work for users, for these URL structures:
https://www.reddit.com/user/username/comments/
https://www.reddit.com/user/username/submitted/

I'm not sure if I'm just not setting limits correctly, but for subs, I can't seem to get results past 1 year. I set all the limits to 5000000 in case that was the issue and still, it cuts off at 1 year for subs that have posts dating back 4-7 years. But it is working to scrape content from the last year at least.

trudax

I will probably need to add those url structures to the scraper. Let me check that.
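
As a rough illustration of what recognizing those URL structures could involve, here is a minimal sketch. The `classifyRedditUrl` function and its route labels are hypothetical and invented for this example; nothing here reflects how the scraper actually routes requests.

```javascript
// Hypothetical classifier for the URL shapes mentioned in this thread.
// The returned labels are invented for illustration only.
function classifyRedditUrl(rawUrl) {
  const { hostname, pathname } = new URL(rawUrl);
  if (!/(^|\.)reddit\.com$/.test(hostname)) return 'unsupported';

  // Drop empty segments so a trailing slash does not matter.
  const parts = pathname.split('/').filter(Boolean);

  if (parts[0] === 'r' && parts.length === 2) return 'community';
  if (parts[0] === 'user' && parts.length === 2) return 'user';
  if (parts[0] === 'user' && parts.length === 3) {
    if (parts[2] === 'comments') return 'user-comments';
    if (parts[2] === 'submitted') return 'user-submitted';
  }
  return 'unsupported';
}

console.log(classifyRedditUrl('https://www.reddit.com/user/username/comments/'));
console.log(classifyRedditUrl('https://www.reddit.com/user/username/submitted/'));
```

Matching on path segments rather than on the full string means the same check works with or without the trailing slash.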

researcher_1999

2 years ago

Sweet! This tool is really amazing, and I have a feeling you'll be getting a lot more people using it soon. :)

researcher_1999

2 years ago

I don't know if you thought about adding this as a feature, but it would be really useful if we could search for keywords in specific communities. Currently, the search can only be applied to Reddit as a whole. Just a thought!

trudax

You can go to the Reddit community page, search for what you want, and then copy the URL and use it as a starting point in the Reddit scraper.
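
A small sketch of building such a community-scoped search URL by hand, instead of copying it from the browser. The `buildCommunitySearchUrl` helper is hypothetical, and the `restrict_sr=1` parameter is an assumption based on how Reddit's own in-community search URLs look at the time of writing; check it against a URL copied from the site before relying on it.

```javascript
// Hypothetical helper: builds a Reddit search URL scoped to one community.
// Assumption: restrict_sr=1 limits results to the given community,
// as seen in URLs produced by Reddit's own search page.
function buildCommunitySearchUrl(community, keywords) {
  const url = new URL(
    `https://www.reddit.com/r/${encodeURIComponent(community)}/search/`
  );
  url.searchParams.set('q', keywords);
  url.searchParams.set('restrict_sr', '1');
  return url.toString();
}

// The community name and keywords here are placeholders.
console.log(buildCommunitySearchUrl('learnpython', 'web scraping'));
```

The resulting string can then be pasted in as a start URL, the same way a URL copied from the browser would be.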

researcher_1999

2 years ago

Unfortunately, that isn't the same. Even with search parameters, Reddit doesn't return proper results. I was using Camas, but they revoked the pushshift API, so a lot of us are trying to scrape full subs and search for specific posts. When you search on Reddit, you don't get all the results that you would get with a tool. :(

trudax

Just an update: I am testing the user URLs, so they should be added to the scraper soon.