All notable changes to this Actor are documented here.
- 403 retry regression: client.py RETRY_STATUSES was missing 403, causing Republic and StartEngine to fail immediately on any 403 response with a RuntimeError("non-retryable") instead of retrying. This was the root cause of the ~36% run-failure rate observed in the 30-day stats.
- Proxy architecture: restored per-attempt AsyncSession creation with IP rotation via the ProxyConfig callback pattern. The previous architecture created a single shared AsyncSession at startup (fixed proxy URL), so every retry attempt within _get_with_retry used the same exit IP, defeating the block-rotation mechanism entirely.
- Proxy group fallback: main.py now tries RESIDENTIAL first, then falls back to BUYPROXIES94952 (5 provisioned USA IPs). On the FREE plan, RESIDENTIAL has availableCount=0; without this fallback the Actor ran with direct datacenter routing, which all three targets block.
- Initial release: three equity-crowdfunding sources (Wefunder, Republic, StartEngine) unified into one schema.
- Wefunder primary path via the /-/companies/explore JSON API: full founder + tagline + raise-progress + valuation payload per row.
- Republic secondary path via SSR shell anchor extraction + JSON-LD breadcrumb name resolution.
- StartEngine secondary path via sitemap-private-offerings.xml slug enumeration (detail-page render is JS-gated; v2 Camoufox upgrade documented).
- Pay-Per-Event pricing: $0.05 actor-start + $0.005/result-row; ~$5.05 per 1k rows (6x cheaper than Crunchbase scrapers).
- Per-source failure is non-fatal: one blocked source does not abort the run; the Actor exits non-zero only when all sources are blocked.
- Exponential backoff on 429 / 503 (honours Retry-After); max 5 attempts.
- curl-cffi chrome131 browser TLS impersonation per ADR-0002; Apify Proxy BUYPROXIES94952 enabled by default.
- Status-filter knob maps to Wefunder tagFilters (active→status_funding, funded→status_funded, all→no filter).
- Industry-filter knob applies a case-insensitive substring over tagline OR industry.
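
The retry behaviour described in the entries above (retryable 403 / 429 / 503, exponential backoff that honours Retry-After, at most 5 attempts, a fresh session per attempt) could look roughly like the sketch below. It is not the code in client.py: `backoff_delay`, `get_with_retry`, the `new_session` factory, and the base/cap values are hypothetical names and numbers chosen for illustration, and only the delta-seconds form of Retry-After is handled.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Assumption: the retryable set is the union of the statuses named in the changelog.
RETRY_STATUSES = frozenset({403, 429, 503})
MAX_ATTEMPTS = 5

# Hypothetical shape of one request: url -> (status, headers, body).
Fetch = Callable[[str], Tuple[int, Dict[str, str], str]]


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    if retry_after is not None:
        try:
            # Prefer the server's hint when it is a plain number of seconds.
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # e.g. HTTP-date form: fall back to exponential backoff
    return min(cap, base * 2 ** (attempt - 1))


def get_with_retry(url: str, new_session: Callable[[], Fetch],
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Fetch `url`, building a fresh session for every attempt."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        fetch = new_session()  # new session per attempt, so the exit IP can rotate
        status, headers, body = fetch(url)
        if status < 400:
            return body
        if status not in RETRY_STATUSES:
            raise RuntimeError(f"non-retryable status {status}")
        if attempt < MAX_ATTEMPTS:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    raise RuntimeError(f"gave up after {MAX_ATTEMPTS} attempts")
```

Passing `sleep` and `new_session` as parameters keeps the sketch testable without network access; in a real client the factory would be whatever callback builds a proxied session.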