Changelog
All notable changes to Hybrid Vision Spider will be documented in this file.
The format is based on Keep a Changelog ,
and this project adheres to Semantic Versioning .
[1.0.0] - 2025-01-03
Added
Initial release of Hybrid Vision Spider
Hybrid scraping mode (HTML + Vision + Fallback)
OpenAI GPT-4o-mini Vision API integration
Schema-driven extraction with JSON Schema validation
Token budget and vision page limits
Rate limiting with semaphore (2 concurrent Vision calls)
KVS persistence for screenshots and HTML
STATS.json with detailed metrics
PPE billing events
Anti-bot protection with fingerprinting
Intelligent proxy policy (datacenter/residential)
Error histogram and performance tracking
Webhook support for progress notifications
Comprehensive security features (log sanitization, HMAC signatures)
Security
API key scrubbing in all logs
HMAC SHA-256 webhook signature verification
Secret sanitization in error messages
URL query parameter masking
Multi-stage Docker build with minification
Separate concurrency limits (HTML: 16, Browser: 6, Vision: 2)
HTML truncation at 200KB
Exponential backoff retry logic
Heuristic pre-filtering for common fields
[Unreleased]
Added
Simple URL list input (urlList) merged with advanced request sources for easier onboarding.
Branded logging banner with compliance notice and structured sections for each processed URL.
Rich dataset schema definition so the Apify Output tab displays descriptions and confidence metadata.
Removed
Camoufox optional dependency (Playwright Chromium/Firefox support remains) to eliminate deprecated transitive packages.
Planned
Multi-LLM support (Claude, Gemini)
RAG integration with embeddings
Autonomous agent mode
Video frame parsing