Extract comprehensive data from InvestorLift marketplace properties, including property details, pricing, location data, and account information. The actor first fetches the complete list of available properties, then processes them in parallel batches for speed.
Property Detail API: When enriching, /api/properties/{id} is tried first for structured JSON (account_id, account.title, condition, description, main_image), with HTML parsing as a fallback if the API returns 403/404
Seller rating: wholesaler_rating and wholesaler_review_count from /api/account-stats/{accountId}/rating. Fetched when account_id is available (from API or page). Cached per account to avoid duplicate requests
Wholesaler from API: apiRowToItem now uses API columns (account_title, wholesaler_company, wholesaler_name, account_id) when the properties list API returns them
account_id: Output field for seller account ID. Extracted from Nuxt deal data, profile-photos URL, or API
enrichWithDetails in input schema: Checkbox visible in Apify Console. Required for wholesaler columns to be populated
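The enrichment flow described above can be sketched as follows. This is an illustrative stand-in, not the actor's real internals: `fetchJson`, `parseHtmlPage`, and the cache wiring are assumed interfaces; only the endpoint paths come from the changelog.

```javascript
// Sketch: try the Property Detail API first, fall back to HTML parsing on
// 403/404, and cache seller ratings per account to avoid duplicate requests.
// `fetchJson` and `parseHtmlPage` are injected stand-ins (assumptions).
async function enrichDeal(id, { fetchJson, parseHtmlPage }, ratingCache = new Map()) {
  let detail;
  const res = await fetchJson(`/api/properties/${id}`);
  if (res.status === 200) {
    detail = res.body; // structured JSON: account_id, condition, description, ...
  } else if (res.status === 403 || res.status === 404) {
    detail = await parseHtmlPage(id); // Cheerio-based fallback
  } else {
    throw new Error(`Unexpected status ${res.status} for property ${id}`);
  }

  // Seller rating: fetched once per account_id, then served from the cache.
  if (detail.account_id != null) {
    if (!ratingCache.has(detail.account_id)) {
      const rating = await fetchJson(`/api/account-stats/${detail.account_id}/rating`);
      ratingCache.set(detail.account_id, rating.status === 200 ? rating.body : null);
    }
    const cached = ratingCache.get(detail.account_id);
    if (cached) {
      detail.wholesaler_rating = cached.rating;
      detail.wholesaler_review_count = cached.review_count;
    }
  }
  return detail;
}
```

Injecting the fetchers keeps the fallback and caching logic testable without a network.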
Changed
Full details scope: Now enriches Historical and All modes (Phase 2), not just Active. Wholesaler, description, lot_size, and condition are populated for historical deals when enrichment is enabled
fetchDealDetail: Tries the Property Detail API first, falling back to an HTML page fetch + Cheerio parsing
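The widened enrichment scope amounts to a small mode check. A minimal sketch, where the mode names are assumptions inferred from the changelog wording:

```javascript
// Sketch: earlier versions only enriched Active mode; Historical and All now
// enrich in Phase 2 as well, gated on the enrichWithDetails input.
// Mode identifiers ('active', 'historical', 'all') are assumed names.
function shouldEnrich(mode, enrichWithDetails) {
  if (!enrichWithDetails) return false;
  return ['active', 'historical', 'all'].includes(mode);
}
```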
Fixed
Wholesaler columns empty: Root causes were (1) API columns ignored by the parser and (2) Phase 2 never enriching. Both fixed
[1.3.0] - 2026-03-01
Added
Minimalist input UI: Step-based layout (What do you want? → Data quality → Proxy). Airbnb Pro Host style
Auto-detect range: Historical/All modes — start and end IDs derived from API. No manual Start/Stop IDs
Historical concurrency: 256 workers by default, up to 384
Proxy sticky per worker: One proxy per worker for connection reuse
API timeout + retry: 30s timeout, 2 retries on API fetch
429 handling: Backoff on rate limit (5s, 10s) before retry
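The timeout, retry, and 429-backoff entries above combine into one retry policy. A sketch with injected `doFetch` and `sleep` so the policy can be exercised without a network (those injection points are assumptions; the 30 s timeout, 2 retries, and 5 s / 10 s backoff come from the changelog):

```javascript
// Sketch: 30s timeout per attempt, up to 2 retries, and a 5s/10s backoff
// when the API answers 429. `doFetch` and `sleep` are injected stand-ins.
async function fetchWithRetry(url, { doFetch, sleep, timeoutMs = 30_000, retries = 2 }) {
  const backoffs = [5_000, 10_000]; // applied on 429 before retrying
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const res = await doFetch(url, { timeoutMs });
      if (res.status === 429 && attempt < retries) {
        await sleep(backoffs[Math.min(attempt, backoffs.length - 1)]);
        continue;
      }
      return res;
    } catch (err) {
      if (attempt === retries) throw err; // timeout/network error, retries exhausted
    }
  }
}
```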
Changed
Puppeteer removed: Switched to HTTP (got-scraping) + Cheerio. Faster and cheaper
Checkpoint: Serialized writes (1 writer) to avoid Apify KVS 429. Every 2000 IDs
Full details: Clarified — only affects Active mode. Historical/Specific already fetch full pages
README: Aligned with minimalist UI, auto-detect range, resume steps
Input schema: Removed from UI — useMaxIdFromApi, historicalIdStart, historicalIdEnd, detailConcurrency, historicalConcurrency, excludeIds (still supported via API)
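The serialized checkpoint mentioned above (single writer, every 2000 IDs) can be sketched as a promise chain: all saves funnel through one pending write, so at most one key-value-store request is in flight at a time. `setValue` is an injected stand-in for the Apify KVS write, not the actor's real code:

```javascript
// Sketch: serialize checkpoint writes through a single promise chain so that
// parallel workers never issue concurrent setValue calls (the source of the
// KVS 429s), and only persist every `interval` IDs.
function createCheckpointWriter(setValue, interval = 2000) {
  let chain = Promise.resolve();
  let lastSavedId = 0;
  return function maybeCheckpoint(lastProcessedId) {
    if (lastProcessedId - lastSavedId < interval) return chain; // skip until 2000 IDs passed
    lastSavedId = lastProcessedId;
    chain = chain.then(() => setValue('CHECKPOINT', { lastProcessedId }));
    return chain;
  };
}
```

Chaining onto the previous promise, rather than firing writes independently, is what guarantees the "1 writer" property.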
Fixed
Too many parallel set requests: Checkpoint writes serialized; single writer
Proxy pool init: Parallel newUrl() calls for faster startup
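The proxy-pool fix above is a straightforward switch from sequential awaits to Promise.all. A minimal sketch, where `proxyConfiguration` stands in for Apify's proxy configuration object and the session naming is an assumption:

```javascript
// Sketch: request one sticky proxy URL per worker concurrently instead of
// awaiting each newUrl() call in sequence.
async function initProxyPool(proxyConfiguration, workers) {
  const sessions = Array.from({ length: workers }, (_, i) => `worker_${i}`);
  return Promise.all(sessions.map((session) => proxyConfiguration.newUrl(session)));
}
```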
[1.2.0] - 2026-03-01
Added
CHANGELOG: Version history for the Store page
Modular architecture: Code split into api.js, parser.js, scraper.js, and utils.js for maintainability and testability
Changed
README: Rewritten in English with orias-scraper-style structure — input/output examples, use cases, local development, troubleshooting
Code organization: Main orchestration in main.js; API fetch logic in api.js; page parsing in parser.js; HTTP/browser scraping in scraper.js; date/checkpoint utilities in utils.js
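The module split above can be sketched as dependency-injected orchestration: main.js only wires the pieces together. The interfaces below (`fetchList`, `rowToItem`, `fetchDetail`, `checkpoint`) are illustrative stand-ins, not the actor's real exports:

```javascript
// Sketch: main.js orchestrates; fetching (api.js), parsing (parser.js),
// detail scraping (scraper.js), and checkpointing (utils.js) stay separate,
// which also makes each piece testable with stubs.
async function run({ api, parser, scraper, utils }, input) {
  const rows = await api.fetchList(input);                  // api.js
  const items = rows.map((row) => parser.rowToItem(row));   // parser.js
  if (input.enrichWithDetails) {
    for (const item of items) {
      Object.assign(item, await scraper.fetchDetail(item.id)); // scraper.js
    }
  }
  await utils.checkpoint(items.length);                     // utils.js
  return items;
}
```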
Removed
Dead code: Unreachable HTML pagination branch (scrape mode is always active_only, historical, or date_range — all use API or historical loop)
Unused constants: MS_PER_DEAL, batchSize
Unused functions: fetchPage, extractDealLinks (legacy marketplace HTML pagination)
Unused input parsing: maxPages, startPage, currentPage for non-existent mode
[1.1.0] - 2026-02
Added
Enrichment mode: enrichWithDetails — visits each deal page with Puppeteer for description, condition, lot_size, wholesaler, img_url
API-first: Active and date_range modes use marketplace API for fast bulk fetch
Proxy-chain: Anonymized proxy for Puppeteer when using Apify residential proxy (fixes ERR_INVALID_AUTH)
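The proxy-chain entry exists because Chromium's --proxy-server flag cannot carry username:password credentials, so authenticated residential proxy URLs must be re-exposed on a local credential-free endpoint first. A sketch of that decision, where `anonymize` stands in for proxy-chain's anonymizeProxy():

```javascript
// Sketch: credentials embedded in the proxy URL trigger ERR_INVALID_AUTH in
// Chromium, so anonymize only when the URL actually carries auth.
// `anonymize` is an injected stand-in for proxy-chain's anonymizeProxy().
async function puppeteerProxyArg(proxyUrl, anonymize) {
  const { username, password } = new URL(proxyUrl);
  const needsAnonymizing = Boolean(username || password);
  const usable = needsAnonymizing ? await anonymize(proxyUrl) : proxyUrl;
  return `--proxy-server=${usable}`;
}
```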