# Trademe Properties Parser Spider

Extract detailed rental property listings from TradeMe.co.nz effortlessly. This Apify Actor scrapes key data like prices, locations, amenities, and agent info, delivering structured JSON for analysis, monitoring, and integration.

Pricing: from $9.00 / 1,000 results · Developer: GetDataForMe
## Apify Template for Scrapy Spiders
This repository serves as a template for deploying Scrapy spiders to Apify. It is automatically updated by a GitHub Actions workflow in the central repository (`getdataforme/central_repo`) when changes are pushed to spider files in `src/spiders/` or `src/custom/`. Below is an overview of the automated tasks performed to keep this repository in sync.
### Automated Tasks
The following tasks are executed by the GitHub Actions workflow when a spider file (e.g., `src/spiders/example/example_parser_spider.py`) is modified in the central repository:
- **Repository Creation**:
  - Creates a new Apify repository (e.g., `example_apify`) from this template (`apify_template`) using the GitHub API, if it doesn't already exist.
  - Grants push permissions to the `scrapingteam` team in the `getdataforme` organization. Both API calls are sketched below.
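
  A hypothetical sketch of the two GitHub REST API calls (the workflow's actual script may differ; the repository and team names are taken from the examples above):

  ```bash
  # Create example_apify from the apify_template template repository.
  curl -s -X POST \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    -H "Accept: application/vnd.github+json" \
    https://api.github.com/repos/getdataforme/apify_template/generate \
    -d '{"owner": "getdataforme", "name": "example_apify"}'

  # Grant the scrapingteam team push access to the new repository.
  curl -s -X PUT \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    -H "Accept: application/vnd.github+json" \
    https://api.github.com/orgs/getdataforme/teams/scrapingteam/repos/getdataforme/example_apify \
    -d '{"permission": "push"}'
  ```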
- **Spider File Sync**:
  - Copies the modified spider file (e.g., `example_parser_spider.py`) from the central repository to `src/spiders/` in this repository.
  - Copies the associated `requirements.txt` (if present) from the spider's directory (e.g., `src/spiders/example/`) to the root of this repository.
- **Input Schema Generation**:
  - Runs `generate_input_schema.py` to create `.actor/input_schema.json`.
  - Parses the spider's `__init__` method (e.g., `def __init__(self, location: str, item_limit: int = 100, county: str = "Japan", *args, **kwargs)`) to generate a JSON schema; a sample of the output is shown below.
  - Supports the types `string`, `integer`, `boolean`, and `number` (for Python `str`, `int`, `bool`, `float`).
  - Uses `prefill` for strings and `default` for non-strings, with appropriate `editor` values (`textfield`, `number`, `checkbox`).
  - Marks parameters without defaults (e.g., `location`) as `required`.
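
  For the example `__init__` above, the generated `.actor/input_schema.json` might look roughly like this (a sketch; exact titles and prefill values depend on the generator):

  ```json
  {
    "title": "Example Parser Spider input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
      "location": {
        "title": "location",
        "type": "string",
        "editor": "textfield"
      },
      "item_limit": {
        "title": "item_limit",
        "type": "integer",
        "editor": "number",
        "default": 100
      },
      "county": {
        "title": "county",
        "type": "string",
        "editor": "textfield",
        "prefill": "Japan"
      }
    },
    "required": ["location"]
  }
  ```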
- **Main Script Update**:
  - Runs `update_main.py` to update `src/main.py`.
  - Updates the `actor_input` section to fetch input values matching the spider's `__init__` parameters (e.g., `location`, `item_limit`, `county`).
  - Updates the `process.crawl` call to pass these parameters to the spider (e.g., `process.crawl(Spider, location=location, item_limit=item_limit, county=county)`).
  - Preserves existing settings, comments, and proxy configurations. A sketch of the resulting script is shown below.
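
  A minimal sketch of the relevant parts of the updated `src/main.py` (module and class names are hypothetical; the real generated script keeps its existing settings and proxy configuration, and must also bridge Scrapy's Twisted reactor with the Actor's asyncio event loop):

  ```python
  import asyncio

  from apify import Actor
  from scrapy.crawler import CrawlerProcess

  from spiders.example_parser_spider import ExampleParserSpider  # hypothetical import path


  async def main() -> None:
      async with Actor:
          # actor_input section: fetch values matching the spider's __init__ parameters.
          actor_input = await Actor.get_input() or {}
          location = actor_input.get("location")
          item_limit = actor_input.get("item_limit", 100)
          county = actor_input.get("county", "Japan")

          # process.crawl call: pass the parameters straight through to the spider.
          process = CrawlerProcess(install_root_handler=False)
          process.crawl(ExampleParserSpider, location=location,
                        item_limit=item_limit, county=county)
          process.start()  # blocks until the crawl finishes


  if __name__ == "__main__":
      asyncio.run(main())
  ```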
- **Actor Configuration Update**:
  - Updates `.actor/actor.json` to set the `name` field based on the repository name, removing the `_apify` suffix (e.g., `example_apify` → `example`).
  - Uses `jq` to modify the JSON file while preserving other fields (e.g., `title`, `description`, `input`), as sketched below.
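
  One plausible `jq` invocation (a sketch; the workflow's actual command may differ):

  ```bash
  repo_name="example_apify"
  actor_name="${repo_name%_apify}"   # strip the _apify suffix -> "example"

  # Rewrite only the name field; all other fields pass through unchanged.
  jq --arg name "$actor_name" '.name = $name' .actor/actor.json > actor.tmp \
    && mv actor.tmp .actor/actor.json
  ```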
- **Commit and Push**:
  - Commits changes to `src/spiders/$spider_file`, `requirements.txt`, `.actor/input_schema.json`, `src/main.py`, and `.actor/actor.json`.
  - Pushes the changes to the `main` branch of this repository.
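
  In git terms, this step amounts to roughly the following (a sketch; the actual commit message is chosen by the workflow):

  ```bash
  git add "src/spiders/$spider_file" requirements.txt \
    .actor/input_schema.json src/main.py .actor/actor.json
  git commit -m "Sync spider from central repository"  # hypothetical message
  git push origin main
  ```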
### Repository Structure
- `src/spiders/`: Contains the Scrapy spider file (e.g., `example_parser_spider.py`).
- `src/main.py`: Main script to run the spider with Apify Actor integration.
- `.actor/input_schema.json`: JSON schema defining the spider's input parameters.
- `.actor/actor.json`: Actor configuration with the repository name and metadata.
- `requirements.txt`: Python dependencies for the spider.
- `Dockerfile`: Docker configuration for running the Apify Actor.
### Prerequisites
- The central repository (`getdataforme/central_repo`) must contain:
  - `generate_input_schema.py` and `update_main.py` in the root.
  - Spider files in `src/spiders/` or `src/custom/` with a valid `__init__` method.
- The GitHub Actions workflow requires a `GITHUB_TOKEN` with repository creation and write permissions.
- `jq` and `python3` are installed in the workflow environment.
### Testing
To verify the automation:
- Push a change to a spider file in `src/spiders/` or `src/custom/` in the central repository.
- Check the generated Apify repository (e.g., `getdataforme/example_apify`) for:
  - Updated `src/spiders/$spider_file`.
  - Correct `input_schema.json` with parameters matching the spider's `__init__`.
  - Updated `src/main.py` with correct `actor_input` and `process.crawl` lines.
  - Updated `.actor/actor.json` with the correct `name` field.
### Notes
**Warning**: This Apify Actor repository is automatically generated and updated by the GitHub Actions workflow in `getdataforme/central_repo`. Do not edit this repository directly. To modify the spider, update the corresponding file in `src/spiders/` or `src/custom/` in the central repository, and the workflow will sync the changes to this repository, including:

- Copying the spider file to `src/spiders/`.
- Generating `.actor/input_schema.json` based on the spider's `__init__` parameters.
- Updating `src/main.py` with correct input handling and spider execution.
- Setting the `name` field in `.actor/actor.json` (e.g., `example` for `example_apify`).

**Verification**: After the workflow completes, verify the actor by checking that:

- `src/spiders/$spider_file` matches the central repository.
- `.actor/input_schema.json` includes all `__init__` parameters with correct types and defaults.
- `src/main.py` has updated `actor_input` and `process.crawl` lines.
- `.actor/actor.json` has the correct `name`.
- Optionally, deploy the actor to Apify and test with sample inputs to ensure functionality.
- The workflow supports multiple spider types (`scrapy`, `hrequest`, `playwright`) based on the file path (`src/spiders/`, `src/custom/*/hrequest/`, `src/custom/*/playwright/`).
- Commits with `[apify]` in the message update only Apify repositories; `[internal]` updates only internal repositories; otherwise, both are updated.
- Ensure the spider's `__init__` uses supported types (`str`, `int`, `bool`, `float`) to avoid schema generation errors; a compliant signature is sketched below.
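
A compliant spider signature might look like this (a sketch; `include_sold` and `min_price` are hypothetical parameters added only to show the `bool` and `float` mappings):

```python
import scrapy


class ExampleParserSpider(scrapy.Spider):
    name = "example_parser_spider"

    def __init__(self, location: str, item_limit: int = 100,
                 county: str = "Japan", include_sold: bool = False,
                 min_price: float = 0.0, *args, **kwargs):
        # location has no default, so the generator marks it as required;
        # the remaining parameters map to integer, string, boolean, and number.
        super().__init__(*args, **kwargs)
        self.location = location
        self.item_limit = item_limit
        self.county = county
        self.include_sold = include_sold
        self.min_price = min_price
```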
For issues, check the GitHub Actions logs in the central repository or contact the scraping team.