Reddit Scraper
1 day trial then $45.00/month - No credit card required now
Unlimited Reddit web scraper to crawl posts, comments, communities, and users without login. Limit web scraping by number of posts or items and extract all data in a dataset in multiple formats.
I'm using
https://www.reddit.com/r/comfyui/search/?q=upscale&restrict_sr=1&sort=new&type=link
as a starting URL, but it does not work as intended. It tries to fetch community links even though I disabled them.
Also, in the input examples in the documentation, the entry for scraping search results is blank.
I just want to search for posts inside a subreddit with a specific keyword like "upscale".
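For reference, a minimal sketch of how a subreddit-restricted search URL like the one above can be assembled. The helper name `subreddit_search_url` is hypothetical; the query parameters are simply the ones visible in the URL in this thread, and nothing here is taken from the actor's own code.

```python
from urllib.parse import urlencode


def subreddit_search_url(subreddit, keyword, sort="new"):
    """Build a search URL limited to posts inside one subreddit."""
    query = urlencode({
        "q": keyword,       # search keyword
        "restrict_sr": 1,   # stay inside the given subreddit
        "sort": sort,       # e.g. "new" or "relevance"
        "type": "link",     # posts only
    })
    return f"https://www.reddit.com/r/{subreddit}/search/?{query}"


url = subreddit_search_url("comfyui", "upscale")
print(url)
```

Changing `sort` to "relevance" gives the second URL used later in this thread.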
Can you share the runID?
I made lots of runs. Hmm, try this one: IEBcDY0fVBFxiFJsy
You can see "INFO 223 communities urls added to queue" and "DEBUG Type: search_communities".
There were 226 requests and 224 results, which were posts in this case.
Also, when I compare this actor to a simple scrape of the Reddit search web UI (with the Instant Data Scraper extension), using the same "new" sorting, I get 250 records, while this actor gave me 224. Any idea why there is a discrepancy?
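One way to narrow down a gap like 250 versus 224 is to diff the two exports by post URL. Below is a rough sketch, assuming both tools can export a list of post URLs; the function names and the sample URLs (including the post IDs) are made up for illustration and do not come from the run above.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a post URL to host + path so small formatting differences are ignored."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def missing_from(reference, candidate):
    """Return normalised URLs that appear in `reference` but not in `candidate`."""
    have = {normalise(u) for u in candidate}
    return sorted({normalise(u) for u in reference} - have)


# Hypothetical sample data standing in for the two exports.
scraper_urls = ["https://www.reddit.com/r/comfyui/comments/abc123/example_post/"]
extension_urls = [
    "https://reddit.com/r/comfyui/comments/abc123/example_post",
    "https://www.reddit.com/r/comfyui/comments/def456/another_post/",
]

missing = missing_from(extension_urls, scraper_urls)
print(missing)
```

Looking at which posts are missing (for example, whether they share a date range or a post type) may hint at where the two result sets diverge.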
Thank you for your support.
I am looking into it but have not found any major issues so far. I will try some things and let you know.
Hmm, maybe I’m not using it right. Here are my general observations.
I have a 2-stage objective. The first stage is to scrape just POSTS with a specific keyword from a specific subreddit (r/comfyui). I'm giving it this starting URL:
https://www.reddit.com/r/comfyui/search/?q=upscale&restrict_sr=1&sort=relevance&type=link
- I’ve noticed that it can scrape a maximum of ~250 records (posts) per URL/keyword. This is the same as using the Reddit search and scrolling to the bottom.
- Now for what I don’t understand. If I set the “limit of communities pages scraped” to 0, it won’t find any posts. From my understanding, a community is a subreddit. I'm not sure why “DEBUG Type: search_communities” is triggered, since I don’t want to search for other communities; I'm only targeting a single subreddit. It then logs “INFO 25 communities urls added to queue”.
- I don't understand the other LIMITS settings either. What does “limit of posts scraped inside a single page” do?
- So the only way I got it to output posts was to set all limits to 9999 and set the comments scraped and users pages scraped to 0.
The second stage is to scrape comments from the posts I got earlier. The starting URLs come from a text file with all the post links. This works pretty well.
I'm confused by the LIMITS section, and I'm not sure whether a search query as the start URL works as intended. Can you please clarify how the LIMITS section works? Or what would be the best settings for my post-scraping objective?
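For the second stage, a small sketch of how the text file of post links could be cleaned up before being used as starting URLs. It assumes one URL per line and relies on Reddit post URLs containing `/comments/` in their path; the function name `post_links` and the sample links are hypothetical.

```python
from urllib.parse import urlsplit


def post_links(lines):
    """Keep unique post URLs (paths containing /comments/), preserving order."""
    seen = set()
    links = []
    for line in lines:
        url = line.strip()
        if not url or "/comments/" not in urlsplit(url).path:
            continue  # skip blank lines and non-post URLs such as search pages
        if url in seen:
            continue  # skip duplicates from overlapping first-stage runs
        seen.add(url)
        links.append(url)
    return links


# Hypothetical file contents; in practice this would be open("posts.txt").
sample = [
    "https://www.reddit.com/r/comfyui/comments/abc123/example_post/\n",
    "\n",
    "https://www.reddit.com/r/comfyui/search/?q=upscale&restrict_sr=1\n",
    "https://www.reddit.com/r/comfyui/comments/abc123/example_post/\n",
]

links = post_links(sample)
print(links)
```

Deduplicating first avoids spending requests on the same post twice when several keyword searches return overlapping results.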
I have finally fixed the issue. You should be able to set the limit of communities to 0 now.
Actor Metrics
337 monthly users
63 stars
>99% runs succeeded
1.8 days response time
Created in Feb 2022
Modified 2 days ago