Ebay Italian Seller Info Scraper

Scrapes professional email addresses of flower seed sellers on eBay Italy.

Pricing: $40.00/month + usage

Rating: 0.0 (0)

Developer: GetDataForMe (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 3
  • Monthly active users: 1
  • Last modified: 6 days ago

Apify Template for Scrapy Spiders

This repository serves as a template for deploying Scrapy spiders to Apify. It is automatically updated by a GitHub Actions workflow in the central repository (getdataforme/central_repo) when changes are pushed to spider files in src/spiders/ or src/custom/. Below is an overview of the automated tasks performed to keep this repository in sync.

Automated Tasks

The following tasks are executed by the GitHub Actions workflow when a spider file (e.g., src/spiders/example/example_parser_spider.py) is modified in the central repository:

  1. Repository Creation:

    • Creates a new Apify repository (e.g., example_apify) from this template (apify_template) using the GitHub API, if it doesn't already exist (see the sketch after this list).
    • Grants push permissions to the scraping team in the getdataforme organization.
  2. Spider File Sync:

    • Copies the modified spider file (e.g., example_parser_spider.py) from the central repository to src/spiders/ in this repository.
    • Copies the associated requirements.txt (if present) from the spider's directory (e.g., src/spiders/example/) to the root of this repository.
  3. Input Schema Generation:

    • Runs generate_input_schema.py to create .actor/input_schema.json (a sketch of this mapping appears after this list).
    • Parses the spider's __init__ method (e.g., def __init__(self, location:str, item_limit:int=100, country:str="Japan", *args, **kwargs)) to generate a JSON schema.
    • Supports the types string, integer, boolean, and number (for Python str, int, bool, float).
    • Uses prefill for strings and default for non-strings, with appropriate editor values (textfield, number, checkbox).
    • Marks parameters without defaults (e.g., location) as required.
  4. Main Script Update:

    • Runs update_main.py to update src/main.py (see the src/main.py sketch under Repository Structure below).
    • Updates the actor_input section to fetch input values matching the spider's __init__ parameters (e.g., location, item_limit, country).
    • Updates the process.crawl call to pass these parameters to the spider (e.g., process.crawl(Spider, location=location, item_limit=item_limit, country=country)).
    • Preserves existing settings, comments, and proxy configurations.
  5. Actor Configuration Update:

    • Updates .actor/actor.json to set the name field based on the repository name, removing the _apify suffix (e.g., example_apify → example).
    • Uses jq to modify the JSON file while preserving other fields (e.g., title, description, input).
  6. Commit and Push:

    • Commits changes to src/spiders/$spider_file, requirements.txt, .actor/input_schema.json, src/main.py, and .actor/actor.json.
    • Pushes the changes to the main branch of this repository.
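
To make step 1 concrete, here is a hedged sketch of the kind of call involved. The real workflow runs inside GitHub Actions and its exact tooling is not shown in this repository; GitHub's "create repository from template" endpoint is real, but the payload, flags, and error handling below are assumptions, and create_actor_repo is an illustrative name.

    import os

    import requests

    def create_actor_repo(spider_name: str) -> None:
        """Create <spider_name>_apify from the apify_template template repo.

        Hypothetical sketch; the actual workflow may use different tooling.
        """
        resp = requests.post(
            "https://api.github.com/repos/getdataforme/apify_template/generate",
            headers={
                "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
                "Accept": "application/vnd.github+json",
            },
            json={"owner": "getdataforme", "name": f"{spider_name}_apify"},
            timeout=30,
        )
        if resp.status_code == 422:
            return  # 422 usually means the repository already exists
        resp.raise_for_status()

Granting the scraping team push access would be a separate call to the teams API (PUT /orgs/{org}/teams/{team_slug}/repos/{owner}/{repo}), sketched under the same assumptions.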
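
Step 3's type mapping can also be made concrete with a minimal sketch. The actual generate_input_schema.py lives in the central repository and is not reproduced here; this illustration only assumes the rules stated above (str/int/bool/float mapping, prefill vs. default, required detection), and all names are illustrative.

    import inspect

    TYPE_MAP = {str: "string", int: "integer", bool: "boolean", float: "number"}
    EDITOR_MAP = {"string": "textfield", "integer": "number",
                  "boolean": "checkbox", "number": "number"}

    def build_schema(init):
        """Map an __init__ signature to an Apify input schema (illustrative)."""
        properties, required = {}, []
        for name, param in inspect.signature(init).parameters.items():
            if name == "self" or param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                continue  # skip self, *args, **kwargs
            json_type = TYPE_MAP.get(param.annotation, "string")
            field = {"title": name, "type": json_type, "editor": EDITOR_MAP[json_type]}
            if param.default is inspect.Parameter.empty:
                required.append(name)             # no default -> required
            elif json_type == "string":
                field["prefill"] = param.default  # strings use prefill
            else:
                field["default"] = param.default  # non-strings use default
            properties[name] = field
        return {"title": "Input schema", "type": "object", "schemaVersion": 1,
                "properties": properties, "required": required}

For the example __init__ above, serializing the result of build_schema would yield an .actor/input_schema.json roughly like this (illustrative, not the script's literal output):

    {
        "title": "Input schema",
        "type": "object",
        "schemaVersion": 1,
        "properties": {
            "location": {"title": "location", "type": "string", "editor": "textfield"},
            "item_limit": {"title": "item_limit", "type": "integer", "editor": "number", "default": 100},
            "country": {"title": "country", "type": "string", "editor": "textfield", "prefill": "Japan"}
        },
        "required": ["location"]
    }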

Repository Structure

  • src/spiders/: Contains the Scrapy spider file (e.g., example_parser_spider.py).
  • src/main.py: Main script to run the spider with Apify Actor integration.
  • .actor/input_schema.json: JSON schema defining the spider's input parameters.
  • .actor/actor.json: Actor configuration with the repository name and metadata.
  • requirements.txt: Python dependencies for the spider.
  • Dockerfile: Docker configuration for running the Apify Actor.
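
For orientation, here is a minimal sketch of the shape src/main.py takes after update_main.py runs for the example spider. It is a simplified illustration, not the template's literal contents: the real file also carries the settings, comments, and proxy configuration the workflow preserves, and handles the Twisted/asyncio event-loop integration needed to run Scrapy inside an Actor. The import path and spider class name are hypothetical.

    from apify import Actor
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    # Hypothetical module/class; corresponds to the spider synced into src/spiders/.
    from .spiders.example_parser_spider import ExampleParserSpider

    async def main() -> None:
        async with Actor:
            # actor_input section: fetch values matching the spider's __init__.
            actor_input = await Actor.get_input() or {}
            location = actor_input.get("location")
            item_limit = actor_input.get("item_limit", 100)
            country = actor_input.get("country", "Japan")

            # process.crawl call: pass the inputs through to the spider's __init__.
            process = CrawlerProcess(get_project_settings(), install_root_handler=False)
            process.crawl(ExampleParserSpider, location=location,
                          item_limit=item_limit, country=country)
            process.start()  # simplified; the real template manages the reactor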

Prerequisites

  • The central repository (getdataforme/central_repo) must contain:
    • generate_input_schema.py and update_main.py in the root.
    • Spider files in src/spiders/ or src/custom/ with a valid __init__ method.
  • The GitHub Actions workflow requires a GITHUB_TOKEN with repository creation and write permissions.
  • jq and python3 must be installed in the workflow environment.

Testing

To verify the automation:

  1. Push a change to a spider file in src/spiders/ or src/custom/ in the central repository.
  2. Check the generated Apify repository (e.g., getdataforme/example_apify) for:
    • Updated src/spiders/$spider_file.
    • Correct input_schema.json with parameters matching the spider's __init__.
    • Updated src/main.py with correct actor_input and process.crawl lines.
    • Updated .actor/actor.json with the correct name field.

Notes

Warning: This Apify actor repository is automatically generated and updated by the GitHub Actions workflow in getdataforme/central_repo. Do not edit this repository directly. To modify the spider, update the corresponding file in src/spiders/ or src/custom/ in the central repository, and the workflow will sync changes to this repository, including:

  • Copying the spider file to src/spiders/.
  • Generating .actor/input_schema.json based on the spider’s __init__ parameters.
  • Updating src/main.py with correct input handling and spider execution.
  • Setting the name field in .actor/actor.json (e.g., example for example_apify).

Verification: After the workflow completes, verify the actor by checking:

  • src/spiders/$spider_file matches the central repository.
  • .actor/input_schema.json includes all __init__ parameters with correct types and defaults.
  • src/main.py has updated actor_input and process.crawl lines.
  • .actor/actor.json has the correct name.
  • Optionally, deploy the actor to Apify and test with sample inputs to ensure functionality.

Additional notes:

  • The workflow supports multiple spider types (scrapy, hrequest, playwright) based on the file path (src/spiders/, src/custom/*/hrequest/, src/custom/*/playwright/).
  • Commits whose message contains [apify] update only the Apify repositories; [internal] updates only the internal repositories; commits with neither tag update both.
  • The spider's __init__ must use only supported parameter types (str, int, bool, float) to avoid schema generation errors.

For issues, check the GitHub Actions logs in the central repository or contact the scraping team.