All notable changes to the ORIAS Intermediary Scraper are documented in this file.
The format is based on Keep a Changelog.
The Apify Store version follows the actor.json version (two-part MAJOR.MINOR).
Console input — step-based form (goal → input → depth → filters → run size), shorter mode/export enum labels, visible geo filters, New user? path (Catalog · COA · 25 profiles)
Try preset — .actor/INPUT.json aligned with Console prefill (catalog + COA + contacts + cap 25)
README — Store-grade structure: Who is this for, fill rates, tiered pricing table, limitations, how it works, interlinks
Console input — field order and benefit-first copy aligned with workflow (mode → depth → list → cap)
Output schema — Contacts dataset view link in Output tab
Monitor baseline persisted in actor-scoped named KV store orias-monitor (cross-run delta)
Monitor baseline key format changed to ORIAS_BASELINE-COA (slashes broke the Apify KV API)
Validate mode HTTP retries on proxy/network errors (up to 3 attempts)
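The baseline key fix above can be illustrated with a minimal sketch. The helper name `baselineKey`, the sorting step, and the multi-category join are assumptions made for illustration; the changelog only confirms the single-category form `ORIAS_BASELINE-COA` and that slashes were the problem.

```javascript
// Hypothetical helper (not the actor's actual code): build a key-value store
// key for a monitored category set without using "/" as a separator.
function baselineKey(categories) {
  const suffix = [...categories]
    .map((c) => String(c).trim().toUpperCase())
    .filter(Boolean)
    .sort() // assumption: same category set in any order maps to the same key
    .join('-')
    .replace(/[^A-Za-z0-9_.-]/g, '-'); // replace slashes and other unsafe characters
  return `ORIAS_BASELINE-${suffix}`;
}

console.log(baselineKey(['COA']));        // ORIAS_BASELINE-COA
console.log(baselineKey(['cif', 'COA'])); // ORIAS_BASELINE-CIF-COA
```

Sorting before joining is one way to make the baseline stable across runs that list the same categories in a different order.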
Modes: enrich, catalog, validate, monitor (legacy urls / categories still supported)
Validate tier-1 JSON lookup + optional full profile
Monitor delta vs KV baseline per category set
Geo post-filters: filterDepartment, filterZipPrefix, filterNameContains
Dataset views: overview, contacts, outreachCrm, compliance
PPE billing via Actor.pushData + Actor.charge(actor-start)
Published tasks kit (published-tasks/, 4 tasks bootstrapped)
Store polish: benefit-first input schema, SEO metadata, RUN_LOG output link
Tests: csvParser, throttle, input, billing, delta, filters, scraper (57 total)
Apify actor version 1.0
Default cloud proxy: datacenter via { useApifyProxy: true }
src/lib/validate.js, delta.js, billing.js, catalog.js, profileCrawler.js
.actor/key_value_store_schema.json
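As a rough illustration of how the geo post-filters listed above (filterDepartment, filterZipPrefix, filterNameContains) might combine, here is a minimal sketch. The record fields `name` and `zip`, the function name `matchesGeoFilters`, and the rule that a department is matched against the leading digits of the postal code are all assumptions; the actor's real matching logic may differ.

```javascript
// Hypothetical post-filter: returns true when a record passes every filter that is set.
// Assumed record shape: { name, zip }.
function matchesGeoFilters(record, filters = {}) {
  const { filterDepartment, filterZipPrefix, filterNameContains } = filters;
  const zip = String(record.zip ?? '');
  const name = String(record.name ?? '').toLowerCase();

  // Assumption: department = leading digits of the postal code ("75" for 75008).
  // This simplification ignores special cases such as Corsica.
  if (filterDepartment && !zip.startsWith(String(filterDepartment))) return false;
  if (filterZipPrefix && !zip.startsWith(String(filterZipPrefix))) return false;
  // Case-insensitive substring match on the name.
  if (filterNameContains && !name.includes(String(filterNameContains).toLowerCase())) return false;
  return true;
}

const rows = [
  { name: 'Cabinet Martin', zip: '75008' },
  { name: 'Assurances Durand', zip: '69003' },
];
const kept = rows.filter((r) => matchesGeoFilters(r, { filterDepartment: '75' }));
console.log(kept.map((r) => r.name)); // [ 'Cabinet Martin' ]
```

Filters left unset are skipped, so an empty filter object keeps every record.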
exportLevel input: register, contacts, full
src/lib/ modules: documentIds, csvParser, discoverer, scraper, throttle, urls, runLog
Stable CSV document IDs for COA/CIF/COBSP
Register export: progressive pushData for ~27k COA rows without Cheerio crawl
RUN_LOG KV flush during long runs
Tests: csvParser.test.js, throttle.test.js
Docs: docs/csv-schema.md, docs/endpoint-audit.json
Throttling: maxRequestsPerMinute on CheerioCrawler
Category discovery merges CSV register fields with profile scrape when contacts/full
Apify actor version 0.4
Resurrection support via RequestQueue for large crawls
README timeout and resurrection guidance
Default timeout raised to 8h (28800s) in actor.json
Category mode with CSV discovery (COA, CIF, COBSP) and paginated search (AGA, MIA, …)
discoverer.js module
Refactored main.js for discovery → scrape pipeline
Input schema: mode, categories
Website from link and email-domain fallback
Invalid URL error for websites without protocol
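The two website entries above (link with email-domain fallback, and the fix for links without a protocol) could look roughly like the sketch below. The function name `resolveWebsite`, the choice of `https://` as the default protocol, and the free-mail exclusion list are assumptions for illustration, not the actor's confirmed behavior.

```javascript
// Assumption: free-mail domains should not be reported as a company website.
const FREE_MAIL = new Set(['gmail.com', 'orange.fr', 'wanadoo.fr', 'hotmail.com', 'yahoo.fr']);

// Hypothetical resolver: prefer the profile link, fall back to the email domain.
function resolveWebsite(link, email) {
  const trimmed = typeof link === 'string' ? link.trim() : '';
  if (trimmed) {
    // Add a protocol when it is missing so the URL constructor does not throw.
    const candidate = /^https?:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`;
    try {
      return new URL(candidate).href;
    } catch {
      // Unparseable link: fall through to the email-domain fallback.
    }
  }
  const domain = typeof email === 'string' ? email.split('@')[1]?.trim().toLowerCase() : undefined;
  if (domain && !FREE_MAIL.has(domain)) return `https://${domain}/`;
  return null;
}

console.log(resolveWebsite('www.example.fr', null));         // https://www.example.fr/
console.log(resolveWebsite('', 'contact@cabinet-martin.fr')); // https://cabinet-martin.fr/
console.log(resolveWebsite(null, 'someone@gmail.com'));       // null
```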
Output schema with dataset views
Address parsing, phone E.164, association URLs, progress logging
RequestList for faster startup; memory default 256 MB
Nullable association fields in dataset schema
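For the phone E.164 entry above, a minimal sketch of the normalization, assuming French metropolitan numbers only. The function name `toE164Fr` and the accepted input shapes are assumptions; the actor may rely on a dedicated phone library instead.

```javascript
// Hypothetical normalizer: converts common French phone notations to E.164 (+33...).
// Returns null when the input does not look like a 10-digit French number.
function toE164Fr(raw) {
  if (typeof raw !== 'string') return null;
  // Drop spaces, dots, dashes and parentheses used as visual separators.
  const cleaned = raw.replace(/[\s.\-()]/g, '');
  let national;
  if (/^\+33\d{9}$/.test(cleaned)) national = cleaned.slice(3);        // +33123456789
  else if (/^0033\d{9}$/.test(cleaned)) national = cleaned.slice(4);   // 0033123456789
  else if (/^0\d{9}$/.test(cleaned)) national = cleaned.slice(1);      // 0123456789
  else return null;
  return `+33${national}`;
}

console.log(toE164Fr('01 23 45 67 89'));    // +33123456789
console.log(toE164Fr('+33 1 23 45 67 89')); // +33123456789
console.log(toE164Fr('12345'));             // null
```

Returning null for unrecognized input pairs naturally with nullable dataset fields, since a malformed phone then surfaces as a missing value rather than a wrong one.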
Initial release — SIREN/URL input, profile extraction, JSON/CSV export