Extract data from InvestorLift marketplace properties, including property details, pricing, location data, and account information. The actor first fetches the complete list of available properties, then processes them in parallel batches.
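The "fetch the full list first, then process in parallel batches" approach can be sketched in a few lines. This is a minimal illustration, not the actor's code. `chunk` and `processInBatches` are hypothetical helper names, and the worker function is a stand-in for the real per-property fetch. Items inside a batch run concurrently, and the batches themselves run one after another.

```javascript
// Hypothetical sketch: split the full property list into fixed-size batches
// and process each batch in parallel before starting the next one.

/** Split `items` into consecutive arrays of at most `size` elements. */
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

/**
 * Run `worker` on every item, `size` items at a time.
 * Results keep the order of the input list.
 */
async function processInBatches(items, size, worker) {
  const results = [];
  for (const batch of chunk(items, size)) {
    // All items in one batch run concurrently; batches run sequentially.
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}

// Usage with a stand-in worker (hypothetical; the real one would fetch a property).
const propertyIds = [101, 102, 103, 104, 105];
processInBatches(propertyIds, 2, async (id) => ({ id, fetched: true }))
  .then((rows) => console.log(rows.length)); // prints 5
```

Fixed-size batches are the simplest form of bounded parallelism. A worker pool, where each worker pulls the next ID as soon as it finishes, keeps all slots busy when item durations vary. The worker counts listed in the changelog suggest the actor uses that second approach, although the entries do not say so explicitly.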
Minimalist input UI: Step-based layout (What do you want? → Data quality → Proxy) in an Airbnb Pro Host style
Auto-detect range: In Historical and All modes, the start and end IDs are derived from the API, so manual Start/Stop IDs are no longer needed
Historical concurrency: 256 workers by default in Historical mode, up to a maximum of 384
Sticky proxy per worker: Each worker is assigned a single proxy, enabling connection reuse
API timeout and retry: API fetches time out after 30s and are retried up to 2 times
429 handling: On a rate-limit response, the actor backs off (5s, then 10s) before retrying
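The timeout, retry, and 429 backoff entries combine into one loop. The sketch below assumes a fetch-compatible function that returns an object with a numeric `status`. By default this is the global `fetch` in Node 18+. `fetchWithRetry` and its option names are hypothetical. The sketch retries only 429 responses, network errors, and timeouts, and returns other HTTP statuses to the caller. That split is an assumption, because the entries do not say which failures count as retryable.

```javascript
// Minimal sketch, assuming a fetch-like function that accepts { signal }
// and resolves to an object with a numeric `status`.
// fetchWithRetry and its options are hypothetical, not the actor's config.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(url, {
  fetchImpl = fetch,          // injectable so it can be stubbed in tests
  timeoutMs = 30_000,         // per-attempt timeout (30s)
  retries = 2,                // retries after the first attempt
  backoffMs = [5_000, 10_000] // wait before retry 1 and retry 2 after a 429
} = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    let res;
    try {
      res = await fetchImpl(url, { signal: controller.signal });
    } catch (err) {
      // Network error or timeout (AbortError): try again without backoff.
      lastError = err;
      continue;
    } finally {
      clearTimeout(timer);
    }
    if (res.status !== 429) return res; // success or non-retryable status
    lastError = new Error(`HTTP 429 on attempt ${attempt + 1}`);
    if (attempt < retries) {
      // Rate limited: back off before the next attempt.
      await sleep(backoffMs[Math.min(attempt, backoffMs.length - 1)]);
    }
  }
  throw lastError;
}
```

With the defaults, a request that keeps getting 429 is attempted 3 times in total, with 5s and then 10s of waiting in between, before the last error is thrown.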
Changed
Puppeteer removed: Switched to plain HTTP requests (got-scraping) with Cheerio parsing, which is faster and cheaper to run
Checkpoint: Written every 2000 IDs through a single serialized writer to avoid Apify key-value store (KVS) 429 errors
Full details: Clarified that this option only affects Active mode; Historical and Specific modes already fetch full pages
README: Updated to match the minimalist UI, the auto-detected range, and the resume steps
Input schema: useMaxIdFromApi, historicalIdStart, historicalIdEnd, detailConcurrency, historicalConcurrency, and excludeIds removed from the UI (still supported via API)
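A serialized checkpoint writer can be built by chaining every write onto one promise. Then at most one key-value store request is in flight, however many workers report progress. This is a hypothetical sketch. `CheckpointWriter` is an illustrative name, and `save` stands in for a KVS write such as `Actor.setValue` in the Apify SDK. The every-2000-IDs interval comes from the entry above.

```javascript
// Hypothetical sketch of a single serialized checkpoint writer. `save` stands
// in for a key-value store write; names are illustrative, not the actor's code.

class CheckpointWriter {
  constructor(save, everyN = 2000) {
    this.save = save;
    this.everyN = everyN;
    this.processed = 0;
    this.queue = Promise.resolve(); // tail of the write chain
  }

  // Called once per processed ID; schedules a write every `everyN` IDs.
  markProcessed(lastId) {
    this.processed += 1;
    if (this.processed % this.everyN === 0) {
      this.enqueue({ lastId, processed: this.processed });
    }
  }

  // Chain each write after the previous one so only one runs at a time.
  // A failed write is logged and does not block later writes.
  enqueue(state) {
    this.queue = this.queue
      .then(() => this.save(state))
      .catch((err) => console.error("checkpoint write failed:", err));
    return this.queue;
  }

  // Resolves once all pending writes have finished (e.g. before exit).
  flush() {
    return this.queue;
  }
}
```

The single chain trades write latency for a bounded request rate, which is the point when the store answers bursts of parallel `set` calls with 429.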
Fixed
Too many parallel set requests: Checkpoint writes are now serialized through a single writer
Proxy pool init: newUrl() calls now run in parallel for faster startup
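Parallel pool initialization and sticky per-worker proxies fit together as follows. The sketch assumes a proxy configuration object with an async `newUrl(sessionId)` method, as the ProxyConfiguration class in Apify/Crawlee provides. `buildProxyPool`, `proxyForWorker`, and the `worker_N` session IDs are hypothetical. With one proxy per worker, the pool size equals the worker count. The modulo only matters if the pool is smaller.

```javascript
// Minimal sketch, assuming `proxyConfiguration.newUrl(sessionId)` returns a
// Promise of a proxy URL. Helper names and session IDs are hypothetical.

async function buildProxyPool(proxyConfiguration, size) {
  const sessionIds = Array.from({ length: size }, (_, i) => `worker_${i}`);
  // Start every newUrl() request at once instead of awaiting them one by one.
  return Promise.all(sessionIds.map((id) => proxyConfiguration.newUrl(id)));
}

// Sticky assignment: worker i always gets the same proxy URL,
// so its HTTP connections can be reused across requests.
function proxyForWorker(pool, workerIndex) {
  return pool[workerIndex % pool.length];
}
```

Because `Promise.all` issues all requests concurrently, startup time is roughly that of the slowest single `newUrl()` call instead of the sum of all of them.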
[1.2.0] - 2026-03-01
Added
CHANGELOG: Version history for the Store page
Modular architecture: Code split into api.js, parser.js, scraper.js, and utils.js for maintainability and testability
Changed
README: Rewritten in English with an orias-scraper-style structure (input/output examples, use cases, local development, troubleshooting)
Code organization: Main orchestration in main.js; API fetch logic in api.js; page parsing in parser.js; HTTP/browser scraping in scraper.js; date/checkpoint utilities in utils.js
Removed
Dead code: Unreachable HTML pagination branch (scrape mode is always active_only, historical, or date_range, all of which use the API or the historical loop)
Unused constants: MS_PER_DEAL, batchSize
Unused functions: fetchPage, extractDealLinks (legacy marketplace HTML pagination)
Unused input parsing: maxPages, startPage, and currentPage for a non-existent mode
[1.1.0] - 2026-02
Added
Enrichment mode: enrichWithDetails visits each deal page with Puppeteer to collect description, condition, lot_size, wholesaler, and img_url
API-first: Active and date_range modes use the marketplace API for fast bulk fetching
Proxy-chain: Anonymized proxy for Puppeteer when using the Apify residential proxy (fixes ERR_INVALID_AUTH)