Reddit Scraper

Pricing

$45.00/month + usage

trudax/reddit-scraper

Developed by Gustavo Rudiger

Maintained by Community

Unlimited Reddit web scraper to crawl posts, comments, communities, and users without login. Limit web scraping by number of posts or items and extract all data in a dataset in multiple formats.

3.9 (2)

90

Monthly users

257

Runs succeeded: >99%

Response time: 2.1 days

Last modified: 19 days ago

How to scrape search results inside a subreddit?

Closed
dkampien opened this issue 4 months ago

I'm using

https://www.reddit.com/r/comfyui/search/?q=upscale&restrict_sr=1&sort=new&type=link

as a starting URL, but it does not work as intended. It tries to fetch community links even though I disabled them.

Also, in the input examples in the documentation, the example for scraping search results is blank.

I just want to search for posts inside a subreddit with a specific keyword, like "upscale".
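If you build start URLs like this for several keywords, a small helper keeps the query parameters consistent. This is a minimal Python sketch using only the standard library; it simply reproduces the parameters from the URL above (`q`, `restrict_sr=1`, `sort`, `type=link`). The name `subreddit_search_url` is hypothetical and is not part of the actor, and it says nothing about how the actor itself handles these URLs.

```python
from urllib.parse import urlencode


def subreddit_search_url(subreddit: str, query: str, sort: str = "new") -> str:
    """Build a Reddit search URL restricted to a single subreddit (hypothetical helper).

    The parameters mirror the start URLs quoted in this thread:
      q           - the keyword to search for
      restrict_sr - "1" keeps the search inside the subreddit
      sort        - e.g. "new" or "relevance"
      type        - "link" limits results to posts
    """
    params = {"q": query, "restrict_sr": "1", "sort": sort, "type": "link"}
    # urlencode keeps dict insertion order and escapes spaces and special characters.
    return f"https://www.reddit.com/r/{subreddit}/search/?{urlencode(params)}"


if __name__ == "__main__":
    print(subreddit_search_url("comfyui", "upscale"))
    print(subreddit_search_url("comfyui", "upscale", sort="relevance"))
```

The first call returns exactly the URL quoted above. Multi-word keywords are encoded with `+` for spaces.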

trudax

Can you share the runID?

dkampien

4 months ago

I made lots of runs. Hmm, try this one: IEBcDY0fVBFxiFJsy

In the log you can see "INFO 223 communities urls added to queue" and "DEBUG Type: search_communities".

There were 226 requests and 224 results, which in this case were posts.

Also, when I compare this actor with a simple scrape of the Reddit search web UI (using the Instant Data Scraper extension) with the same "new" sort, I get 250 records, while this actor gave me 224. Any idea why there is a discrepancy?

Thank you for your support.
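One way to narrow down a count mismatch like 224 vs 250 is to compare the two exports by post ID instead of by row count, so you can see exactly which posts each side has. This is a minimal Python sketch, assuming both exports can be reduced to lists of post URLs in Reddit's usual `/comments/<id>/` permalink form; the function names and the example URLs are hypothetical placeholders.

```python
import re

# Reddit post permalinks contain "/comments/<post_id>/"; capture that ID.
POST_ID_RE = re.compile(r"/comments/([A-Za-z0-9]+)")


def post_ids(urls):
    """Return the set of unique post IDs found in an iterable of post URLs."""
    ids = set()
    for url in urls:
        match = POST_ID_RE.search(url)
        if match:
            ids.add(match.group(1).lower())
    return ids


def compare_exports(actor_urls, browser_urls):
    """Return (only_in_actor, only_in_browser, in_both) as sets of post IDs."""
    actor, browser = post_ids(actor_urls), post_ids(browser_urls)
    return actor - browser, browser - actor, actor & browser


if __name__ == "__main__":
    # Placeholder URLs for illustration only; load your two real exports instead.
    actor_export = ["https://www.reddit.com/r/comfyui/comments/abc123/first_post/"]
    browser_export = [
        "https://www.reddit.com/r/comfyui/comments/abc123/first_post/",
        "https://www.reddit.com/r/comfyui/comments/def456/second_post/",
    ]
    only_actor, only_browser, both = compare_exports(actor_export, browser_export)
    print(len(only_actor), len(only_browser), len(both))  # prints: 0 1 1
```

Because the IDs are collected into sets, duplicate rows collapse. Comparing `len(post_ids(urls))` with each export's raw row count therefore shows whether duplicates account for any part of the gap.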

trudax

I am looking at it but have not found any major issues so far. I will try some things and let you know.

dkampien

4 months ago

Hmm, maybe I’m not using it right. Here are my general insights.

I have a two-stage objective. The first stage is to scrape just POSTS with a specific keyword from a specific subreddit (r/comfyui). I’m giving this starting URL:

https://www.reddit.com/r/comfyui/search/?q=upscale&restrict_sr=1&sort=relevance&type=link

  • I’ve noticed that it can scrape a maximum of ~250 records (posts) per URL/keyword. This is the same as using the Reddit search and scrolling to the bottom.
  • Now, regarding what I don’t understand: if I set the “limit of communities pages scraped” to 0, it won’t find any posts. From my understanding, a community is a subreddit. I’m not sure why “DEBUG Type: search_communities” is triggered, since I don’t want to search for other communities; I’m only targeting a single subreddit. Then it logs “INFO 25 communities urls added to queue”.
  • I don’t correctly understand the other LIMITS settings either. What does “limit of posts scraped inside a single page” do?
  • So anyway, the only way I got it to output posts was to set all limits to 9999 and set the comments scraped and users pages scraped to 0.

The second objective is to scrape comments from the posts I got earlier. The start URLs come from a text file with all the post links. This works pretty well.

I’m a bit confused by the LIMITS section, and about whether using a search query as the start URL works as intended. Can you please clarify how the LIMITS section works, or what the best settings would be for my post-scraping objective?
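For the second stage described above (start URLs taken from a text file of post links), the file can be turned into actor input with a few lines of code. This is a minimal Python sketch, assuming the file holds one URL per line. `load_start_urls`, `build_actor_input`, and the `posts.txt` file name are hypothetical, and the `startUrls` field holding `{"url": ...}` objects is an assumption based on the common Apify input convention, not something confirmed in this thread; check the actor's input schema before relying on it.

```python
import json
from pathlib import Path


def load_start_urls(path):
    """Read one URL per line from a text file, skipping blanks and duplicates (order kept)."""
    seen = set()
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def build_actor_input(urls):
    """Wrap URLs in a start-URL list.

    ASSUMPTION: the actor reads start URLs from a "startUrls" field holding
    {"url": ...} objects, the usual Apify convention. Verify the field name
    against the actor's input schema.
    """
    return {"startUrls": [{"url": url} for url in urls]}


if __name__ == "__main__":
    source = Path("posts.txt")  # hypothetical file: one post URL per line
    if source.exists():
        print(json.dumps(build_actor_input(load_start_urls(source)), indent=2))
```

De-duplicating while keeping the original order means each post is requested once, and the output order still matches the file.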

trudax

I have finally fixed the issue. You should be able to set the limit of communities to 0 now.

dkampien

4 months ago

Got it. Thank you for the good work.

Pricing

Pricing model: Rental

To use this Actor, you have to pay a monthly rental fee to the developer. The rent is subtracted from your prepaid usage every month after the free trial period. You also pay for the Apify platform usage.

Free trial: 1 day

Price: $45.00