Redbubble Image Scraper From Keywords
This Actor is unavailable because the developer has decided to deprecate it.
Scrapes exact Redbubble image URLs from keywords and outputs them by page (based on popularity).
Redbubble Scraper
This project is an Actor for the Apify platform that crawls and extracts data from Redbubble based on specified search terms. It uses Puppeteer for web scraping and can store the extracted data in the Apify Dataset.
What do I need to use this?
- An Apify account
- A list of search keywords
Example Input
Search Terms
- A list of terms to search for on Redbubble
- For example:
["art", "poster", "t-shirt"]
Maximum Pages
- Maximum number of pages to scrape for each search term
- Default: 100
Maximum Retries
- Maximum number of retries for each request
- Default: 50
Proxy Configuration
- Select proxies to be used by your actor
- Default: Apify Proxy
Input Schema
The input schema for this Actor is defined in the .actor/input_schema.json file. Here's a brief overview:
{
  "title": "Redbubble Scraper Input",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "searchList": {
      "title": "Search Terms",
      "type": "array",
      "description": "List of terms to search for on Redbubble",
      "editor": "stringList",
      "prefill": ["art"],
      "items": {
        "type": "string"
      }
    },
    "maxPage": {
      "title": "Maximum Pages",
      "type": "integer",
      "description": "Maximum number of items to scrape",
      "minimum": 1,
      "default": 100
    },
    "maxRequestRetries": {
      "title": "Maximum Retries",
      "type": "integer",
      "description": "Maximum number of retries for each request",
      "minimum": 1,
      "default": 50
    },
    "proxyConfiguration": {
      "title": "Proxy Configuration",
      "type": "object",
      "description": "Select proxies to be used by your actor",
      "editor": "proxy",
      "default": { "useApifyProxy": true },
      "sectionCaption": "Proxy",
      "sectionDescription": "The actor will use Apify Proxy by default. You can customize the proxy settings or disable proxy usage altogether."
    }
  },
  "required": ["searchList"]
}
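To make the schema's constraints concrete, here is a minimal sketch of how raw input could be checked and completed with the documented defaults. The helper name normalizeInput and the DEFAULTS constant are hypothetical and are not taken from the Actor's source; only the property names, minimums, and default values come from the schema.

```javascript
// Default values as listed in the input schema.
const DEFAULTS = {
  maxPage: 100,
  maxRequestRetries: 50,
  proxyConfiguration: { useApifyProxy: true },
};

// Hypothetical helper (an illustration, not the Actor's own code): validates
// raw input against the schema's constraints and fills in optional fields.
function normalizeInput(rawInput) {
  const input = rawInput ?? {};

  // "searchList" is the only required property: an array of strings.
  const { searchList } = input;
  if (!Array.isArray(searchList) || !searchList.every((t) => typeof t === 'string')) {
    throw new Error('searchList is required and must be an array of strings');
  }

  const maxPage = input.maxPage ?? DEFAULTS.maxPage;
  const maxRequestRetries = input.maxRequestRetries ?? DEFAULTS.maxRequestRetries;

  // Both numeric fields are integers with "minimum": 1 in the schema.
  for (const [name, value] of Object.entries({ maxPage, maxRequestRetries })) {
    if (!Number.isInteger(value) || value < 1) {
      throw new Error(`${name} must be an integer >= 1`);
    }
  }

  return {
    searchList,
    maxPage,
    maxRequestRetries,
    proxyConfiguration: input.proxyConfiguration ?? DEFAULTS.proxyConfiguration,
  };
}

// Usage: only the required field and one override are supplied.
console.log(normalizeInput({ searchList: ['art', 'poster'], maxPage: 3 }));
```

Keeping validation in one function means the rest of the code can rely on every field being present and well-typed.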
How it works
- The Actor starts by getting the input parameters.
- It creates a proxy configuration to work around IP blocking.
- A PuppeteerCrawler instance is created with the specified options.
- For each search term, the crawler:
  - Opens the Redbubble search page
  - Extracts child URLs from the search results
  - Pushes the extracted data to the Apify Dataset
  - Enqueues the next page if the maximum page limit hasn't been reached
- The process continues until all search terms and pages have been processed.
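The per-term loop described above can be sketched without a browser. This is an illustration under stated assumptions, not the Actor's actual code: the search URL format (query and page parameters) is a guess, and buildSearchUrl, handleSearchPage, crawl, extractUrls, and pushData are hypothetical names. In the real Actor, extraction would run in Puppeteer and the write would go to the Apify Dataset.

```javascript
// Assumption: search pages are addressed by "query" and "page" URL parameters.
// The URL format the Actor really uses is not shown in this README.
function buildSearchUrl(term, page) {
  const url = new URL('https://www.redbubble.com/shop/');
  url.searchParams.set('query', term);
  url.searchParams.set('page', String(page));
  return url.toString();
}

// Handles one search page and returns the next request to enqueue, or null
// once the page limit is reached. `extractUrls` stands in for the Puppeteer
// extraction step and `pushData` for the Dataset write.
async function handleSearchPage(request, { maxPage, extractUrls, pushData }) {
  const { term, page } = request;
  const imageUrls = await extractUrls(buildSearchUrl(term, page));
  await pushData({ term, page, imageUrls });
  if (page >= maxPage) return null;
  return { term, page: page + 1 };
}

// Drives a simple in-memory queue until every term and page has been handled.
async function crawl(searchList, options) {
  const queue = searchList.map((term) => ({ term, page: 1 }));
  while (queue.length > 0) {
    const next = await handleSearchPage(queue.shift(), options);
    if (next) queue.push(next);
  }
}

// Usage with stubs in place of a real browser and Dataset.
const rows = [];
crawl(['art', 'poster'], {
  maxPage: 2,
  extractUrls: async (url) => [`${url}#stub-image`],
  pushData: async (row) => { rows.push(row); },
}).then(() => console.log(`${rows.length} pages processed`));
```

Passing the extraction and storage steps in as functions keeps the pagination logic testable on its own; the crawler's request queue would replace the in-memory array in practice.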
Customization
- Concurrency: Adjust maxConcurrency in the PuppeteerCrawler options to control how many pages are processed in parallel.
- Proxy: Modify the proxyConfiguration to use your own proxies or Apify's proxy service.
- Error Handling: The Actor includes custom error handling for proxy-related issues and general request failures.
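As a rough sketch of where these settings live, the helper below builds a plain options object using PuppeteerCrawler option names from Crawlee (maxConcurrency, maxRequestRetries, proxyConfiguration). The helper names, the default concurrency of 5, and the error markers used to detect proxy failures are all assumptions for illustration; the README does not show the Actor's real values or its error-handling logic.

```javascript
// Hypothetical helper: maps Actor input to PuppeteerCrawler options.
// `maxConcurrency` is not part of the input schema, so it is passed separately;
// the default of 5 is an arbitrary illustrative value.
function buildCrawlerOptions(input, { maxConcurrency = 5, proxyConfiguration } = {}) {
  return {
    maxConcurrency,
    maxRequestRetries: input.maxRequestRetries,
    proxyConfiguration,
    // requestHandler and failedRequestHandler would be added here before the
    // object is passed to `new PuppeteerCrawler(options)`.
  };
}

// Assumption: proxy failures surface as Chromium network errors with these
// codes in the message. The Actor's real detection logic may differ.
const PROXY_ERROR_MARKERS = ['ERR_PROXY_CONNECTION_FAILED', 'ERR_TUNNEL_CONNECTION_FAILED'];

// Separates proxy-related failures from general request failures.
function classifyError(error) {
  const message = String(error?.message ?? error);
  return PROXY_ERROR_MARKERS.some((marker) => message.includes(marker)) ? 'proxy' : 'request';
}

// Usage: lower the concurrency for a cautious first run.
const options = buildCrawlerOptions({ maxRequestRetries: 50 }, { maxConcurrency: 2 });
console.log(options.maxConcurrency, options.maxRequestRetries);
console.log(classifyError(new Error('net::ERR_TUNNEL_CONNECTION_FAILED')));
```

In an Actor, the proxyConfiguration argument would typically come from the Apify SDK's Actor.createProxyConfiguration call using the proxy settings from the input.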
Tips for Effective Use
- Start with a small number of search terms to test the Actor's performance.
- Use specific search terms to get more targeted results.
- Regularly check Redbubble's robots.txt and terms of service to ensure compliance.
- Monitor your Apify storage usage to ensure you have enough capacity for the extracted data.
Deployment
You can deploy this Actor to the Apify platform using the following steps:
- Log in to your Apify account:
  apify login
- Deploy your Actor:
  apify push
This will deploy and build the Actor on the Apify Platform. You can find your newly created Actor under Actors -> My Actors.
By following these steps and customizing the input as needed, you can easily use this Actor to extract data from Redbubble based on your specific search terms.