Sephora Reviews Spider

Under maintenance

Developed by GetDataForMe
Maintained by Community

Pricing: $10.00 / 1,000 results

The Sephora Reviews Spider is an Apify Actor that scrapes detailed product reviews from Sephora. Input URLs to extract ratings, review text, product names, and user details like skin tone. Ideal for sentiment analysis, market research, and consumer insights with structured JSON output.
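
As a rough illustration, a single result might resemble the record below. All field names and values here are assumptions for illustration only; consult the Actor's actual dataset output for the real schema.

```json
{
  "productName": "Hydrating Face Serum",
  "rating": 5,
  "reviewText": "Absorbs quickly and keeps my skin hydrated all day.",
  "reviewer": {
    "skinTone": "light"
  }
}
```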

Apify Template for Scrapy Spiders

This repository serves as a template for deploying Scrapy spiders to Apify. It is automatically updated by a GitHub Actions workflow in the central repository (getdataforme/central_repo) when changes are pushed to spider files in src/spiders/ or src/custom/. Below is an overview of the automated tasks performed to keep this repository in sync.

Automated Tasks

The following tasks are executed by the GitHub Actions workflow when a spider file (e.g., src/spiders/example/example_parser_spider.py) is modified in the central repository:

  1. Repository Creation:

    • Creates a new Apify repository (e.g., example_apify) from this template (apify_template) using the GitHub API, if it doesn't already exist.
    • Grants push permissions to the scraping team in the getdataforme organization.
  2. Spider File Sync:

    • Copies the modified spider file (e.g., example_parser_spider.py) from the central repository to src/spiders/ in this repository.
    • Copies the associated requirements.txt (if present) from the spider's directory (e.g., src/spiders/example/) to the root of this repository.
  3. Input Schema Generation:

    • Runs generate_input_schema.py to create .actor/input_schema.json (see the example schema after this list).
    • Parses the spider's __init__ method (e.g., def __init__(self, location:str, item_limit:int=100, county:str="Japan", *args, **kwargs)) to generate a JSON schema.
    • Supports types: string, integer, boolean, number (for Python str, int, bool, float).
    • Uses prefill for strings and default for non-strings, with appropriate editor values (textfield, number, checkbox).
    • Marks parameters without defaults (e.g., location) as required.
  4. Main Script Update:

    • Runs update_main.py to update src/main.py (see the main.py sketch after this list).
    • Updates the actor_input section to fetch input values matching the spider's __init__ parameters (e.g., location, item_limit, county).
    • Updates the process.crawl call to pass these parameters to the spider (e.g., process.crawl(Spider, location=location, item_limit=item_limit, county=county)).
    • Preserves existing settings, comments, and proxy configurations.
  5. Actor Configuration Update:

    • Updates .actor/actor.json to set the name field based on the repository name, removing the _apify suffix (e.g., example_apify becomes example); see the actor.json example after this list.
    • Uses jq to modify the JSON file while preserving other fields (e.g., title, description, input).
  6. Commit and Push:

    • Commits changes to src/spiders/$spider_file, requirements.txt, .actor/input_schema.json, src/main.py, and .actor/actor.json.
    • Pushes the changes to the main branch of this repository.
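
To make step 3 concrete, the example __init__ signature above (location: str, item_limit: int = 100, county: str = "Japan") would yield an input schema along the following lines. The top-level title, property titles, and descriptions are assumptions; the types, prefill/default placement, editor values, and required list follow the rules described above.

```json
{
  "title": "Example Parser Spider Input",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "location": {
      "title": "location",
      "type": "string",
      "description": "Required string parameter (no default in __init__).",
      "editor": "textfield"
    },
    "item_limit": {
      "title": "item_limit",
      "type": "integer",
      "description": "Integer parameter; the __init__ default becomes the schema default.",
      "editor": "number",
      "default": 100
    },
    "county": {
      "title": "county",
      "type": "string",
      "description": "String parameter; the __init__ default becomes the prefill value.",
      "editor": "textfield",
      "prefill": "Japan"
    }
  },
  "required": ["location"]
}
```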
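
For step 4, the sketch below shows the general shape of a src/main.py that maps Actor input onto the spider's __init__ parameters and passes them to process.crawl. It is a minimal illustration assuming the Apify Python SDK and Scrapy's CrawlerProcess; the spider import path is hypothetical, and the settings, comments, and proxy configuration that update_main.py preserves in the real file are omitted.

```python
from apify import Actor
from scrapy.crawler import CrawlerProcess

# Hypothetical import path; the actual module and class name come from the synced spider file.
from spiders.example_parser_spider import ExampleParserSpider


async def main() -> None:
    async with Actor:
        # Fetch the Actor input and map it onto the spider's __init__ parameters.
        actor_input = await Actor.get_input() or {}
        location = actor_input.get("location")
        item_limit = actor_input.get("item_limit", 100)
        county = actor_input.get("county", "Japan")

        # The template's existing settings and proxy configuration are preserved
        # by update_main.py; only a bare CrawlerProcess is shown here.
        process = CrawlerProcess()
        process.crawl(
            ExampleParserSpider,
            location=location,
            item_limit=item_limit,
            county=county,
        )
        process.start()
```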
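
For step 5, a repository named example_apify would end up with name set to example in .actor/actor.json, with the remaining fields left as they were. Every field other than name below is a placeholder shown only to illustrate the shape of the file.

```json
{
  "actorSpecification": 1,
  "name": "example",
  "title": "Example Spider",
  "description": "Placeholder description preserved from the template.",
  "version": "0.0",
  "input": "./input_schema.json",
  "dockerfile": "../Dockerfile"
}
```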

Repository Structure

  • src/spiders/: Contains the Scrapy spider file (e.g., example_parser_spider.py).
  • src/main.py: Main script to run the spider with Apify Actor integration.
  • .actor/input_schema.json: JSON schema defining the spider's input parameters.
  • .actor/actor.json: Actor configuration with the repository name and metadata.
  • requirements.txt: Python dependencies for the spider.
  • Dockerfile: Docker configuration for running the Apify Actor.

Prerequisites

  • The central repository (getdataforme/central_repo) must contain:
    • generate_input_schema.py and update_main.py in the root.
    • Spider files in src/spiders/ or src/custom/ with a valid __init__ method (a minimal example follows this list).
  • The GitHub Actions workflow requires a GITHUB_TOKEN with repository creation and write permissions.
  • jq and python3 are installed in the workflow environment.
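
A minimal spider that satisfies these prerequisites could look like the sketch below. The class and parameter names mirror the example used earlier in this README; only the annotated __init__ signature matters to the schema generator, and the request/parse logic is a placeholder.

```python
import scrapy


class ExampleParserSpider(scrapy.Spider):
    name = "example_parser"

    # Only str, int, bool, and float annotations are supported by the schema
    # generator; parameters without defaults become required Actor inputs.
    def __init__(self, location: str, item_limit: int = 100,
                 county: str = "Japan", *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.location = location
        self.item_limit = item_limit
        self.county = county

    def start_requests(self):
        # Placeholder request; a real spider builds its URLs from the inputs above.
        yield scrapy.Request("https://example.com", callback=self.parse)

    def parse(self, response):
        yield {"url": response.url}
```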

Testing

To verify the automation:

  1. Push a change to a spider file in src/spiders/ or src/custom/ in the central repository.
  2. Check the generated Apify repository (e.g., getdataforme/example_apify) for:
    • Updated src/spiders/$spider_file.
    • Correct input_schema.json with parameters matching the spider's __init__.
    • Updated src/main.py with correct actor_input and process.crawl lines.
    • Updated .actor/actor.json with the correct name field.

Notes

Warning: This Apify actor repository is automatically generated and updated by the GitHub Actions workflow in getdataforme/central_repo. Do not edit this repository directly. To modify the spider, update the corresponding file in src/spiders/ or src/custom/ in the central repository, and the workflow will sync changes to this repository, including:

  • Copying the spider file to src/spiders/.
  • Generating .actor/input_schema.json based on the spider’s __init__ parameters.
  • Updating src/main.py with correct input handling and spider execution.
  • Setting the name field in .actor/actor.json (e.g., example for example_apify).

Verification: After the workflow completes, verify the actor by checking:

  • src/spiders/$spider_file matches the central repository.
  • .actor/input_schema.json includes all __init__ parameters with correct types and defaults.
  • src/main.py has updated actor_input and process.crawl lines.
  • .actor/actor.json has the correct name.
  • Optionally, deploy the actor to Apify and test with sample inputs to ensure functionality.

Additional notes:

  • The workflow supports multiple spider types (scrapy, hrequest, playwright) based on the file path (src/spiders/, src/custom/*/hrequest/, src/custom/*/playwright/).
  • Commits with [apify] in the message update only the Apify repositories, commits with [internal] update only the internal repositories, and all other commits update both.
  • Ensure the spider's __init__ uses supported types (str, int, bool, float) to avoid schema generation errors.

For issues, check the GitHub Actions logs in the central repository or contact the scraping team.