Reddit Scraper

1 day trial then $45.00/month - No credit card required now

trudax/reddit-scraper
Unlimited Reddit web scraper to crawl posts, comments, communities, and users without login. Limit web scraping by number of posts or items and extract all data in a dataset in multiple formats.

Crawling stops after 5 months

Open
julianwlsn opened this issue
7 days ago

Hi there,

Just tried this and it works great. I am trying to scrape this entire subreddit, but when I ran it, it collected posts going back 5 months and then just finished, even though there are clearly more posts. I want to scrape the entire thing. Any idea what happened?

Thanks

julianwlsn

7 days ago

Actually tried this again and it stopped at the exact same number of requests (1044). Is there something wrong with my configuration or maybe it's a proxy thing?

Finished! Total 1044 requests: 1044 succeeded, 0 failed.

aol48u3sx6

6 days ago

Hello! I am experiencing a similar issue. I couldn't find any reference to this problem on the scraper's information page. Run ID: f5Fw1I2NvLUmYuw5P. The input I used:

{
  "debugMode": false,
  "includeNSFW": true,
  "maxComments": 10000000,
  "maxCommunitiesCount": 10000000,
  "maxItems": 100000,
  "maxPostCount": 10000000,
  "maxUserCount": 10000000,
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  },
  "scrollTimeout": 10000000,
  "searchComments": false,
  "searchCommunities": false,
  "searchPosts": true,
  "searchUsers": false,
  "skipComments": false,
  "skipCommunity": false,
  "skipUserPosts": false,
  "sort": "new",
  "startUrls": [
    {
      "url": "https://www.reddit.com/r/Sauna/",
      "method": "GET"
    }
  ]
}

aol48u3sx6

5 days ago

It seems to stop at 1 January 2025, at least in my case. Explicitly changing the date offset does nothing.

trudax

Reddit doesn't keep the whole subreddit public. Usually, it is limited to around 1000 posts. The actor can scrape all posts that are on the website; if you see any posts on the site that the actor is not able to get, please share them with me.
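
To illustrate why a run would end at the same point every time, here is a minimal simulation. It assumes the listing exposes only the newest ~1000 posts (the figure quoted above). `LISTING_CAP`, `PAGE_SIZE`, `fetch_page`, and `crawl_listing` are hypothetical names invented for this sketch; it does not call Reddit or the actor, and the real paging mechanism may differ.

```python
from typing import List, Optional, Tuple

LISTING_CAP = 1000  # assumption: approximate cap quoted in the reply above
PAGE_SIZE = 100     # assumption: illustrative page size


def fetch_page(all_posts: List[int], after: Optional[int]) -> Tuple[List[int], Optional[int]]:
    """Simulate one page of a capped, newest-first listing.

    `all_posts` stands in for every post ever made (newest first); only the
    first LISTING_CAP of them are reachable through the listing.
    """
    visible = all_posts[:LISTING_CAP]
    start = 0 if after is None else after
    page = visible[start:start + PAGE_SIZE]
    next_after = start + len(page)
    # No cursor is returned once the visible window is exhausted.
    return page, (next_after if next_after < len(visible) else None)


def crawl_listing(all_posts: List[int]) -> List[int]:
    """Follow cursors until the listing stops returning one."""
    collected: List[int] = []
    cursor: Optional[int] = None
    while True:
        page, cursor = fetch_page(all_posts, cursor)
        collected.extend(page)
        if cursor is None:
            return collected


if __name__ == "__main__":
    subreddit = list(range(5000))  # a simulated subreddit with 5000 posts
    print(len(crawl_listing(subreddit)))
```

In this simulation the crawl ends cleanly, with no failed requests, as soon as the visible window runs out, regardless of how many older posts exist behind it.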

aol48u3sx6

5 days ago

It seems this would then require a more sophisticated discovery pipeline: identifying users and following their past posts to discover links to older posts. Just wondering, are there options to expand the actor's functionality to do something like that automatically?
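
A rough sketch of what such a discovery step might look like, purely as an illustration of the idea and not something the actor is known to support. The post shape (`id`, `author`, `subreddit`), the `discover_posts` function, and the `fetch_user_posts` callback are all hypothetical; the callback stands in for whatever would actually retrieve a user's post history.

```python
from typing import Callable, Dict, Iterable, List

Post = Dict[str, str]  # assumed shape: {"id": ..., "author": ..., "subreddit": ...}


def discover_posts(
    seed_posts: Iterable[Post],
    fetch_user_posts: Callable[[str], Iterable[Post]],
    subreddit: str,
    max_users: int = 1000,
) -> List[Post]:
    """Extend a set of scraped posts with older posts by the same authors.

    `fetch_user_posts` is a hypothetical callback that returns the post
    history of one user; only posts in `subreddit` are kept.
    """
    found: Dict[str, Post] = {}
    for post in seed_posts:
        if post["subreddit"] == subreddit:
            found.setdefault(post["id"], post)

    # Unique authors of the seed posts, in first-seen order.
    authors = list(dict.fromkeys(p["author"] for p in found.values()))

    for user in authors[:max_users]:
        for post in fetch_user_posts(user):
            if post["subreddit"] == subreddit:
                found.setdefault(post["id"], post)

    return list(found.values())


if __name__ == "__main__":
    # In-memory stand-in for user histories (made-up data).
    histories = {
        "alice": [
            {"id": "p1", "author": "alice", "subreddit": "Sauna"},
            {"id": "p0", "author": "alice", "subreddit": "Sauna"},
            {"id": "x1", "author": "alice", "subreddit": "Other"},
        ],
    }
    seeds = [{"id": "p1", "author": "alice", "subreddit": "Sauna"}]
    result = discover_posts(seeds, lambda u: histories.get(u, []), "Sauna")
    print([p["id"] for p in result])
```

Note that this only reaches posts by authors who already appear in the scraped set; posts by users with no recent activity in the subreddit would still be missed.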
