# Ebay Italian Seller Info Scraper

**Pricing:** $40.00/month + usage
**Developer:** GetDataForMe
# Apify Template for Scrapy Spiders

This repository serves as a template for deploying Scrapy spiders to Apify. It is automatically updated by a GitHub Actions workflow in the central repository (`getdataforme/central_repo`) when changes are pushed to spider files in `src/spiders/` or `src/custom/`. Below is an overview of the automated tasks performed to keep this repository in sync.
## Automated Tasks

The following tasks are executed by the GitHub Actions workflow when a spider file (e.g., `src/spiders/example/example_parser_spider.py`) is modified in the central repository:
1. **Repository Creation**
   - Creates a new Apify repository (e.g., `example_apify`) from this template (`apify_template`) using the GitHub API, if it doesn't already exist.
   - Grants push permissions to the `scrapingteam` team in the `getdataforme` organization.
2. **Spider File Sync**
   - Copies the modified spider file (e.g., `example_parser_spider.py`) from the central repository to `src/spiders/` in this repository.
   - Copies the associated `requirements.txt` (if present) from the spider's directory (e.g., `src/spiders/example/`) to the root of this repository.
3. **Input Schema Generation**
   - Runs `generate_input_schema.py` to create `.actor/input_schema.json`.
   - Parses the spider's `__init__` method (e.g., `def __init__(self, location: str, item_limit: int = 100, county: str = "Japan", *args, **kwargs)`) to generate a JSON schema.
   - Supports types: `string`, `integer`, `boolean`, `number` (for Python `str`, `int`, `bool`, `float`).
   - Uses `prefill` for strings and `default` for non-strings, with appropriate `editor` values (`textfield`, `number`, `checkbox`).
   - Marks parameters without defaults (e.g., `location`) as `required`.
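The mapping above can be sketched with Python's `inspect` module. This is an illustrative reimplementation, not the actual `generate_input_schema.py`; the exact field layout of the generated schema is assumed:

```python
import inspect

# Type mapping described above: Python annotation -> (JSON schema type, editor).
TYPE_MAP = {
    str: ("string", "textfield"),
    int: ("integer", "number"),
    bool: ("boolean", "checkbox"),
    float: ("number", "number"),
}

def build_input_schema(spider_cls):
    """Sketch: inspect the spider's __init__ and emit an Apify-style schema."""
    sig = inspect.signature(spider_cls.__init__)
    properties, required = {}, []
    for name, param in sig.parameters.items():
        # Skip self, *args, and **kwargs.
        if name == "self" or param.kind in (
                param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue
        json_type, editor = TYPE_MAP[param.annotation]
        field = {"title": name, "type": json_type, "editor": editor}
        if param.default is param.empty:
            required.append(name)          # no default -> required
        elif json_type == "string":
            field["prefill"] = param.default   # strings use prefill
        else:
            field["default"] = param.default   # non-strings use default
        properties[name] = field
    return {"title": spider_cls.__name__, "type": "object",
            "schemaVersion": 1, "properties": properties, "required": required}

# The example __init__ from the README.
class ExampleSpider:
    def __init__(self, location: str, item_limit: int = 100,
                 county: str = "Japan", *args, **kwargs):
        pass

schema = build_input_schema(ExampleSpider)
```

Here `location` has no default, so it lands in `required`, while `item_limit` gets `"default": 100` and `county` gets `"prefill": "Japan"`.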
4. **Main Script Update**
   - Runs `update_main.py` to update `src/main.py`.
   - Updates the `actor_input` section to fetch input values matching the spider's `__init__` parameters (e.g., `location`, `item_limit`, `county`).
   - Updates the `process.crawl` call to pass these parameters to the spider (e.g., `process.crawl(Spider, location=location, item_limit=item_limit, county=county)`).
   - Preserves existing settings, comments, and proxy configurations.
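As an illustration of the rewriting `update_main.py` performs, the two generated sections can be sketched as string templates; the exact shape of the real script's output is an assumption:

```python
def render_main_sections(params):
    """Sketch: given the spider's __init__ parameter names, produce the two
    blocks written into src/main.py (assumed shapes, not the real script)."""
    # actor_input section: one lookup per __init__ parameter.
    input_lines = "\n".join(
        f'{name} = actor_input.get("{name}")' for name in params)
    # process.crawl call: forward every parameter as a keyword argument.
    kwargs = ", ".join(f"{name}={name}" for name in params)
    crawl_line = f"process.crawl(Spider, {kwargs})"
    return input_lines, crawl_line

inputs, crawl = render_main_sections(["location", "item_limit", "county"])
```

For the example spider this yields the `process.crawl(Spider, location=location, item_limit=item_limit, county=county)` call shown above.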
5. **Actor Configuration Update**
   - Updates `.actor/actor.json` to set the `name` field based on the repository name, removing the `_apify` suffix (e.g., `example_apify` → `example`).
   - Uses `jq` to modify the JSON file while preserving other fields (e.g., `title`, `description`, `input`).
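The rename logic is straightforward to reproduce. Here is a Python equivalent of the `jq` step (the sample `actor.json` fields are hypothetical):

```python
import json

def set_actor_name(actor_json: str, repo_name: str) -> str:
    """Sketch: derive the actor name from the repository name by stripping
    the _apify suffix, leaving all other fields untouched (equivalent to
    jq '.name = $name')."""
    config = json.loads(actor_json)
    suffix = "_apify"
    name = repo_name[:-len(suffix)] if repo_name.endswith(suffix) else repo_name
    config["name"] = name
    return json.dumps(config, indent=2)

# Hypothetical actor.json content for illustration.
updated = set_actor_name(
    '{"name": "old", "title": "Example", "version": "0.1"}', "example_apify")
```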
6. **Commit and Push**
   - Commits changes to `src/spiders/$spider_file`, `requirements.txt`, `.actor/input_schema.json`, `src/main.py`, and `.actor/actor.json`.
   - Pushes the changes to the `main` branch of this repository.
## Repository Structure

- `src/spiders/`: Contains the Scrapy spider file (e.g., `example_parser_spider.py`).
- `src/main.py`: Main script to run the spider with Apify Actor integration.
- `.actor/input_schema.json`: JSON schema defining the spider's input parameters.
- `.actor/actor.json`: Actor configuration with the repository name and metadata.
- `requirements.txt`: Python dependencies for the spider.
- `Dockerfile`: Docker configuration for running the Apify Actor.
Prerequisites
- The central repository (
getdataforme/central_repo) must contain:generate_input_schema.pyandupdate_main.pyin the root.- Spider files in
src/spiders/orsrc/custom/with a valid__init__method.
- The GitHub Actions workflow requires a
GITHUB_TOKENwith repository creation and write permissions. jqandpython3are installed in the workflow environment.
## Testing

To verify the automation:

1. Push a change to a spider file in `src/spiders/` or `src/custom/` in the central repository.
2. Check the generated Apify repository (e.g., `getdataforme/example_apify`) for:
   - Updated `src/spiders/$spider_file`.
   - Correct `input_schema.json` with parameters matching the spider's `__init__`.
   - Updated `src/main.py` with correct `actor_input` and `process.crawl` lines.
   - Updated `.actor/actor.json` with the correct `name` field.
## Notes

**Warning:** This Apify actor repository is automatically generated and updated by the GitHub Actions workflow in `getdataforme/central_repo`. Do not edit this repository directly. To modify the spider, update the corresponding file in `src/spiders/` or `src/custom/` in the central repository, and the workflow will sync changes to this repository, including:

- Copying the spider file to `src/spiders/`.
- Generating `.actor/input_schema.json` based on the spider's `__init__` parameters.
- Updating `src/main.py` with correct input handling and spider execution.
- Setting the `name` field in `.actor/actor.json` (e.g., `example` for `example_apify`).

**Verification:** After the workflow completes, verify the actor by checking that:

- `src/spiders/$spider_file` matches the central repository.
- `.actor/input_schema.json` includes all `__init__` parameters with correct types and defaults.
- `src/main.py` has updated `actor_input` and `process.crawl` lines.
- `.actor/actor.json` has the correct `name`.
- Optionally, deploy the actor to Apify and test with sample inputs to ensure functionality.
Additional notes:

- The workflow supports multiple spider types (`scrapy`, `hrequest`, `playwright`) based on the file path (`src/spiders/`, `src/custom/*/hrequest/`, `src/custom/*/playwright/`).
- Commits with `[apify]` in the message update only Apify repositories; `[internal]` updates only internal repositories; otherwise, both are updated.
- Ensure the spider's `__init__` uses supported types (`str`, `int`, `bool`, `float`) to avoid schema generation errors.
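The path-based spider-type detection and the commit-message routing described above can be sketched as follows; the pattern list and function names are illustrative, not taken from the actual workflow scripts:

```python
import fnmatch

# Path patterns described in the notes; order matters (most specific first).
SPIDER_TYPE_PATTERNS = [
    ("src/custom/*/hrequest/*", "hrequest"),
    ("src/custom/*/playwright/*", "playwright"),
    ("src/spiders/*", "scrapy"),
]

def detect_spider_type(path: str) -> str:
    """Map a changed file path to its spider type."""
    for pattern, spider_type in SPIDER_TYPE_PATTERNS:
        if fnmatch.fnmatch(path, pattern):
            return spider_type
    raise ValueError(f"unrecognized spider path: {path}")

def repos_to_update(commit_message: str) -> set:
    """Decide which repositories a commit updates, based on its tag."""
    if "[apify]" in commit_message:
        return {"apify"}
    if "[internal]" in commit_message:
        return {"internal"}
    return {"apify", "internal"}  # no tag: update both
```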
For issues, check the GitHub Actions logs in the central repository or contact the scraping team.