Runs OCR locally inside the Actor (tesseract.js) on the phone and email images, so that the 'phone' and 'email' text fields contain the actual number/address in addition to the image URL. Requires scrapePhone and/or scrapeEmail to be enabled.
💰 COST IMPACT (per listing): OCR is CPU-bound and adds roughly 200–600 ms per image. With both scrapePhone and scrapeEmail enabled, a run can take ~1 s extra per listing, plus a one-time ~10 MB language-data download and ~1–2 s of worker warm-up at startup. You will likely need more Actor memory (≥1024 MB recommended), and the slower run will consume more compute units. Enable this only if you actually need the plain-text values; otherwise keep only the image URL.
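As a rough illustration of the figures above, the sketch below estimates the extra wall-clock time OCR adds to a run. The helper name `estimateOcrOverheadSeconds` is hypothetical and not part of the Actor. It assumes images are processed one after another on a single worker, and it ignores the language-data download, so treat the result as a ballpark only.

```typescript
// Rough bounds quoted in the description above, not measurements.
const PER_IMAGE_MS = { min: 200, max: 600 };
const WARM_UP_MS = { min: 1000, max: 2000 };

// Hypothetical helper: estimated extra run time in seconds.
// Assumes sequential OCR on one worker; excludes the language-data download.
function estimateOcrOverheadSeconds(
  listings: number,
  imagesPerListing: 1 | 2, // 1 = phone or email only, 2 = both enabled
): { min: number; max: number } {
  const images = listings * imagesPerListing;
  return {
    min: (WARM_UP_MS.min + images * PER_IMAGE_MS.min) / 1000,
    max: (WARM_UP_MS.max + images * PER_IMAGE_MS.max) / 1000,
  };
}

const estimate = estimateOcrOverheadSeconds(1000, 2);
console.log(`Extra time: ${estimate.min}s to ${estimate.max}s`);
```

For 1,000 listings with both images enabled, this works out to roughly 7 to 20 minutes of added run time under those assumptions.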
⚠️ PRIVACY / GDPR: Turning this on means you are no longer just storing an image: you are extracting and processing personal data (phone numbers, email addresses) in machine-readable form. The same GDPR responsibilities noted on 'scrapePhone' and 'scrapeEmail' apply and intensify here: lawful basis, purpose limitation, retention, and data-subject rights are entirely on you as the run operator. Do not enable this unless you have determined that your use case is lawful.
Accuracy is high on clean images but not guaranteed; always validate the extracted values downstream.
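One way to do that downstream validation is sketched below. The function names `checkPhone` and `checkEmail` are hypothetical, and the rules (7 to 15 digits for a phone number, a simple `local@domain.tld` shape for an email) are assumptions you should adapt to your own data. The checks reject implausible values rather than trying to repair them.

```typescript
type OcrCheck = { value: string | null; reason?: string };

// Hypothetical check: strips common separators, then requires 7-15 digits
// with an optional leading '+'. 15 is the E.164 maximum; 7 is an assumed minimum.
function checkPhone(raw: string): OcrCheck {
  const compact = raw.trim().replace(/[\s().\-\/]/g, "");
  if (!/^\+?\d{7,15}$/.test(compact)) {
    return { value: null, reason: "not a plausible phone number" };
  }
  return { value: compact };
}

// Hypothetical check: removes whitespace that OCR may insert, then requires
// a simple local@domain.tld shape. This is a plausibility test, not full RFC validation.
function checkEmail(raw: string): OcrCheck {
  const compact = raw.replace(/\s+/g, "");
  if (!/^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$/.test(compact)) {
    return { value: null, reason: "not a plausible email address" };
  }
  return { value: compact };
}

console.log(checkPhone("+49 (0) 151 234-5678"));
console.log(checkPhone("0151 23O 5678")); // letter 'O' misread for zero is rejected
console.log(checkEmail(" Jane.Doe@Example .com "));
```

Values that fail a check can be kept as `null` alongside the image URL, so a misread is never stored as if it were a real number or address.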