
Sephora Reviews Spider
Under maintenance
Pricing
$10.00 / 1,000 results

The Sephora Reviews Spider is an Apify Actor that scrapes detailed product reviews from Sephora. Input URLs to extract ratings, review text, product names, and user details like skin tone. Ideal for sentiment analysis, market research, and consumer insights with structured JSON output.
# Apify Template for Scrapy Spiders

This repository serves as a template for deploying Scrapy spiders to Apify. It is automatically updated by a GitHub Actions workflow in the central repository (`getdataforme/central_repo`) when changes are pushed to spider files in `src/spiders/` or `src/custom/`. Below is an overview of the automated tasks that keep this repository in sync.
## Automated Tasks

The following tasks are executed by the GitHub Actions workflow when a spider file (e.g., `src/spiders/example/example_parser_spider.py`) is modified in the central repository:
1. **Repository Creation**
   - Creates a new Apify repository (e.g., `example_apify`) from this template (`apify_template`) using the GitHub API, if it doesn't already exist.
   - Grants push permissions to the `scraping` team in the `getdataforme` organization.
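A hedged sketch of what this step might look like with the GitHub CLI (the actual workflow may call the REST API directly with `curl`; the repository and team names below mirror the examples in this README):

```shell
# Create example_apify from the apify_template repository, then grant the
# scraping team push access. Assumes `gh` is installed and GITHUB_TOKEN is set.
gh api repos/getdataforme/apify_template/generate \
  -f owner=getdataforme -f name=example_apify

gh api -X PUT \
  orgs/getdataforme/teams/scraping/repos/getdataforme/example_apify \
  -f permission=push
```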
2. **Spider File Sync**
   - Copies the modified spider file (e.g., `example_parser_spider.py`) from the central repository to `src/spiders/` in this repository.
   - Copies the associated `requirements.txt` (if present) from the spider's directory (e.g., `src/spiders/example/`) to the root of this repository.
3. **Input Schema Generation**
   - Runs `generate_input_schema.py` to create `.actor/input_schema.json`.
   - Parses the spider's `__init__` method (e.g., `def __init__(self, location: str, item_limit: int = 100, county: str = "Japan", *args, **kwargs)`) to generate a JSON schema.
   - Supports the types `string`, `integer`, `boolean`, and `number` (for Python `str`, `int`, `bool`, and `float`).
   - Uses `prefill` for strings and `default` for non-strings, with appropriate `editor` values (`textfield`, `number`, `checkbox`).
   - Marks parameters without defaults (e.g., `location`) as `required`.
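The mapping rules above can be illustrated with a minimal sketch using Python's `inspect` module. This is not the actual `generate_input_schema.py` (which lives in the central repository and parses the spider file itself); the function and the example signature below are illustrative only:

```python
import inspect

# Mapping rules described above: Python annotation -> schema type -> editor.
PY_TO_SCHEMA = {str: "string", int: "integer", bool: "boolean", float: "number"}
EDITORS = {"string": "textfield", "integer": "number",
           "boolean": "checkbox", "number": "number"}

def schema_from_init(init):
    """Build an Apify-style input schema from an __init__ signature."""
    properties, required = {}, []
    for name, param in inspect.signature(init).parameters.items():
        if name == "self" or param.kind in (param.VAR_POSITIONAL,
                                            param.VAR_KEYWORD):
            continue  # skip self, *args, **kwargs
        schema_type = PY_TO_SCHEMA[param.annotation]
        field = {"type": schema_type, "editor": EDITORS[schema_type]}
        if param.default is inspect.Parameter.empty:
            required.append(name)             # no default -> required
        elif schema_type == "string":
            field["prefill"] = param.default  # strings use prefill
        else:
            field["default"] = param.default  # non-strings use default
        properties[name] = field
    return {"title": "Spider input", "type": "object", "schemaVersion": 1,
            "properties": properties, "required": required}

# Example signature mirroring the one in the text:
def example_init(self, location: str, item_limit: int = 100,
                 county: str = "Japan", *args, **kwargs):
    pass

schema = schema_from_init(example_init)
```

With this signature, `location` ends up in `required`, `item_limit` gets `"default": 100` with the `number` editor, and `county` gets `"prefill": "Japan"` with the `textfield` editor.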
4. **Main Script Update**
   - Runs `update_main.py` to update `src/main.py`.
   - Updates the `actor_input` section to fetch input values matching the spider's `__init__` parameters (e.g., `location`, `item_limit`, `county`).
   - Updates the `process.crawl` call to pass these parameters to the spider (e.g., `process.crawl(Spider, location=location, item_limit=item_limit, county=county)`).
   - Preserves existing settings, comments, and proxy configurations.
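The `process.crawl` rewrite can be sketched as a simple regex substitution. This is an assumption about how `update_main.py` might work, not its actual source; the parameter names are the examples from the text:

```python
import re

def update_crawl_call(source: str, params) -> str:
    """Rewrite an existing process.crawl(Spider, ...) call so it forwards
    the given __init__ parameters, leaving the rest of the script alone."""
    kwargs = ", ".join(f"{p}={p}" for p in params)
    return re.sub(r"process\.crawl\(Spider[^)]*\)",
                  f"process.crawl(Spider, {kwargs})",
                  source)

original = "process.crawl(Spider)\n"
updated = update_crawl_call(original, ["location", "item_limit", "county"])
print(updated)
# process.crawl(Spider, location=location, item_limit=item_limit, county=county)
```

Because only the `process.crawl(...)` call is matched, surrounding settings, comments, and proxy configuration survive untouched, as the step above requires.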
5. **Actor Configuration Update**
   - Updates `.actor/actor.json` to set the `name` field based on the repository name, removing the `_apify` suffix (e.g., `example_apify` → `example`).
   - Uses `jq` to modify the JSON file while preserving other fields (e.g., `title`, `description`, `input`).
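A minimal sketch of this step, assuming `jq` is installed. A throwaway `.actor/actor.json` is created in a temporary directory for illustration; the workflow edits the real file at the repository root:

```shell
workdir="$(mktemp -d)" && cd "$workdir"
mkdir -p .actor
printf '{"name": "placeholder", "title": "Example Actor"}\n' > .actor/actor.json

REPO_NAME="example_apify"
ACTOR_NAME="${REPO_NAME%_apify}"   # strip the _apify suffix -> "example"

# Rewrite only the "name" field; every other field passes through unchanged.
jq --arg name "$ACTOR_NAME" '.name = $name' .actor/actor.json > tmp.json &&
  mv tmp.json .actor/actor.json

cat .actor/actor.json
```

Writing to a temporary file and moving it over the original avoids truncating `.actor/actor.json` before `jq` has finished reading it.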
6. **Commit and Push**
   - Commits changes to `src/spiders/$spider_file`, `requirements.txt`, `.actor/input_schema.json`, `src/main.py`, and `.actor/actor.json`.
   - Pushes the changes to the `main` branch of this repository.
## Repository Structure

- `src/spiders/`: Contains the Scrapy spider file (e.g., `example_parser_spider.py`).
- `src/main.py`: Main script to run the spider with Apify Actor integration.
- `.actor/input_schema.json`: JSON schema defining the spider's input parameters.
- `.actor/actor.json`: Actor configuration with the repository name and metadata.
- `requirements.txt`: Python dependencies for the spider.
- `Dockerfile`: Docker configuration for running the Apify Actor.
## Prerequisites

- The central repository (`getdataforme/central_repo`) must contain:
  - `generate_input_schema.py` and `update_main.py` in the root.
  - Spider files in `src/spiders/` or `src/custom/` with a valid `__init__` method.
- The GitHub Actions workflow requires a `GITHUB_TOKEN` with repository-creation and write permissions.
- `jq` and `python3` must be installed in the workflow environment.
## Testing

To verify the automation:

1. Push a change to a spider file in `src/spiders/` or `src/custom/` in the central repository.
2. Check the generated Apify repository (e.g., `getdataforme/example_apify`) for:
   - An updated `src/spiders/$spider_file`.
   - A correct `input_schema.json` with parameters matching the spider's `__init__`.
   - An updated `src/main.py` with correct `actor_input` and `process.crawl` lines.
   - An updated `.actor/actor.json` with the correct `name` field.
## Notes

- **Warning:** This Apify actor repository is automatically generated and updated by the GitHub Actions workflow in `getdataforme/central_repo`. Do not edit this repository directly. To modify the spider, update the corresponding file in `src/spiders/` or `src/custom/` in the central repository; the workflow will sync the changes to this repository, including:
  - Copying the spider file to `src/spiders/`.
  - Generating `.actor/input_schema.json` based on the spider's `__init__` parameters.
  - Updating `src/main.py` with correct input handling and spider execution.
  - Setting the `name` field in `.actor/actor.json` (e.g., `example` for `example_apify`).
- **Verification:** After the workflow completes, verify the actor by checking that:
  - `src/spiders/$spider_file` matches the central repository.
  - `.actor/input_schema.json` includes all `__init__` parameters with correct types and defaults.
  - `src/main.py` has updated `actor_input` and `process.crawl` lines.
  - `.actor/actor.json` has the correct `name`.
  - Optionally, deploy the actor to Apify and test it with sample inputs to ensure functionality.
- The workflow supports multiple spider types (`scrapy`, `hrequest`, `playwright`) based on the file path (`src/spiders/`, `src/custom/*/hrequest/`, `src/custom/*/playwright/`).
- Commits with `[apify]` in the message update only Apify repositories; commits with `[internal]` update only internal repositories; otherwise, both are updated.
- Ensure the spider's `__init__` uses only the supported types (`str`, `int`, `bool`, `float`) to avoid schema-generation errors.

For issues, check the GitHub Actions logs in the central repository or contact the `scraping` team.