
Reddit Scraper


1 day trial then $45.00/month - No credit card required now


trudax/reddit-scraper

Unlimited Reddit web scraper that crawls posts, comments, communities, and users without logging in. Limit scraping by the number of posts or items, and export all the data as a dataset in multiple formats.


How to scrape search results inside a subreddit?

Open

dkampien opened this issue 11 days ago

I'm using

https://www.reddit.com/r/comfyui/search/?q=upscale&restrict_sr=1&sort=new&type=link

as the start URL, but it does not work as intended: it tries to fetch community links even though I disabled them.

Also, in the documentation's input examples, the entry for scraping search results is blank.

I just want to search for posts inside a subreddit with a specific keyword, like "upscale".
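The start URL above is a standard Reddit search URL: `q` is the keyword, `restrict_sr=1` limits results to the subreddit in the path, `sort` sets the order (`new`, `relevance`, ...), and `type=link` asks for posts rather than communities or users. A minimal sketch for generating such URLs for other keywords or sort orders might look like this; the function name `build_subreddit_search_url` is hypothetical and not part of the actor.

```python
from urllib.parse import quote, urlencode


def build_subreddit_search_url(subreddit: str, keyword: str, sort: str = "new") -> str:
    """Build a Reddit search URL restricted to one subreddit and to posts only."""
    params = {
        "q": keyword,        # search keyword (spaces are encoded as "+")
        "restrict_sr": "1",  # only search inside the subreddit in the path
        "sort": sort,        # e.g. "new" or "relevance"
        "type": "link",      # posts, not communities or users
    }
    return f"https://www.reddit.com/r/{quote(subreddit)}/search/?{urlencode(params)}"


print(build_subreddit_search_url("comfyui", "upscale"))
# https://www.reddit.com/r/comfyui/search/?q=upscale&restrict_sr=1&sort=new&type=link
```

The printed URL is identical to the one used as the start URL in this issue, so the same helper can produce the `sort=relevance` variant used later in the thread.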

trudax

Can you share the runID?


dkampien

10 days ago

I've made lots of runs. Try this one: IEBcDY0fVBFxiFJsy

In the log you can see "INFO 223 communities urls added to queue" and "DEBUG Type: search_communities".

There were 226 requests and 224 results, all of which were posts in this case.

Also, when I scrape the Reddit search web UI directly (with the Instant Data Scraper extension) using the same "new" sort order, I get 250 records, while this actor gave me 224. Any idea what causes the discrepancy?

Thank you for your support.
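One way to narrow down a 250 vs 224 gap like this is to compare the two result sets by post ID and see exactly which posts are missing. The sketch below assumes both exports contain full post URLs of the form `https://www.reddit.com/r/<subreddit>/comments/<id>/<slug>/`; the helper names `post_ids` and `missing_posts` are hypothetical and the URLs in the example are made up.

```python
import re

# Captures the post ID in URLs like https://www.reddit.com/r/comfyui/comments/<id>/<slug>/
POST_ID = re.compile(r"/comments/([a-z0-9]+)", re.IGNORECASE)


def post_ids(urls):
    """Return the set of (lowercased) post IDs found in an iterable of post URLs."""
    ids = set()
    for url in urls:
        match = POST_ID.search(url)
        if match:  # silently skip URLs that are not post links
            ids.add(match.group(1).lower())
    return ids


def missing_posts(reference_urls, scraped_urls):
    """Sorted post IDs that appear in the reference export but not in the scraped one."""
    return sorted(post_ids(reference_urls) - post_ids(scraped_urls))


reference = [
    "https://www.reddit.com/r/comfyui/comments/abc123/foo/",
    "https://www.reddit.com/r/comfyui/comments/def456/bar/",
]
scraped = ["https://www.reddit.com/r/comfyui/comments/abc123/foo"]
print(missing_posts(reference, scraped))  # ['def456']
```

Matching on the ID rather than the raw URL avoids false mismatches from trailing slashes or differing slugs.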

trudax

I am looking into it but have not found any major issues so far. I will try a few things and let you know.


dkampien

7 days ago

Maybe I’m not using it right. Here are my general observations.

I have a two-stage objective. The first stage is to scrape only POSTS containing a specific keyword from a specific subreddit (r/comfyui). I’m using this start URL:

https://www.reddit.com/r/comfyui/search/?q=upscale&restrict_sr=1&sort=relevance&type=link

  • I’ve noticed that it scrapes at most ~250 posts per URL (that is, per keyword). This matches what I get by using the Reddit search and scrolling to the bottom.
  • Here is what I don’t understand. If I set the “limit of communities pages scraped” to 0, it doesn’t find any posts. As I understand it, a community is a subreddit. I’m not sure why “DEBUG Type: search_communities” is triggered, since I’m only targeting a single subreddit and don’t want to search for other communities. It then logs “INFO 25 communities urls added to queue”.
  • I don’t fully understand the other LIMITS settings either. What does “limit of posts scraped inside a single page” do?
  • In the end, the only way I could get posts in the output was to set all limits to 9999 and set the limits for comments scraped and user pages scraped to 0.

The second stage is to scrape comments from the posts collected in the first stage. The start URLs come from a text file containing all the post links. This works pretty well.

I’m somewhat confused by the LIMITS section, and I’m not sure whether a search URL as the start URL works as intended. Can you please clarify how the LIMITS section works, or suggest the best settings for my post-scraping objective?
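Preparing that text file from the first stage's output can be sketched as below. This assumes the posts dataset was exported as a JSON array and that each item stores its link in a field called `url`; both the field name and the helpers `unique_post_urls` and `write_start_urls` are assumptions for illustration, not the actor's documented output schema.

```python
import json
from pathlib import Path


def unique_post_urls(items, url_field="url"):
    """Post URLs from dataset items, de-duplicated, in first-seen order."""
    urls = (item.get(url_field) for item in items)
    return list(dict.fromkeys(url for url in urls if url))  # skips items without a URL


def write_start_urls(dataset_path, output_path, url_field="url"):
    """Read a JSON-array dataset export and write one post URL per line."""
    items = json.loads(Path(dataset_path).read_text(encoding="utf-8"))
    urls = unique_post_urls(items, url_field)
    Path(output_path).write_text("".join(f"{url}\n" for url in urls), encoding="utf-8")
    return len(urls)  # number of start URLs written
```

For example, `write_start_urls("posts.json", "post_urls.txt")` (hypothetical file names) would produce a file that can then be supplied as the start URLs for the comment-scraping run. De-duplicating first avoids scraping the same post's comments twice when separate search runs overlap.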

trudax

I have finally fixed the issue. You should be able to set the limit of communities to 0 now.

Developer
Maintained by Community

Actor Metrics

  • 337 monthly users

  • 63 stars

  • >99% runs succeeded

  • 1.8 days response time

  • Created in Feb 2022

  • Modified 2 days ago
