Bug 1 — proxy exhaustion no longer causes FAILED runs.
_resolve_proxy_or_die now retries proxy resolution PROXY_RESOLVE_ATTEMPTS (3)
times with PROXY_RESOLVE_BACKOFF_S (2 s) backoff instead of immediately calling
raise SystemExit(1). When all retries are exhausted the function returns None;
main() emits a clear set_status_message and exits cleanly via return, so the
run status is SUCCEEDED (0 rows) rather than FAILED. This addresses the 2 historical
FAILED cloud runs caused by transient FREE-tier IP-pool exhaustion.
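
The retry behaviour described for Bug 1 can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: resolve_proxy_with_retry, the resolve callable, and the injectable sleep parameter are hypothetical names, and the sketch assumes a fixed (non-exponential) delay between attempts and that a failed resolution either raises or returns None.

```python
import time
from typing import Callable, Optional

PROXY_RESOLVE_ATTEMPTS = 3      # value taken from the changelog entry
PROXY_RESOLVE_BACKOFF_S = 2.0   # value taken from the changelog entry


def resolve_proxy_with_retry(
    resolve: Callable[[], Optional[str]],   # hypothetical resolver callable
    attempts: int = PROXY_RESOLVE_ATTEMPTS,
    backoff_s: float = PROXY_RESOLVE_BACKOFF_S,
    sleep: Callable[[float], None] = time.sleep,  # injectable for testing
) -> Optional[str]:
    """Return a proxy URL, or None once every attempt has failed."""
    for attempt in range(1, attempts + 1):
        try:
            proxy = resolve()
        except Exception:
            # Assumption: any resolver error is treated as transient.
            proxy = None
        if proxy:
            return proxy
        if attempt < attempts:
            # Wait before the next attempt; no wait after the last one.
            sleep(backoff_s)
    # The caller decides how to end the run; nothing is raised here.
    return None
```

A caller in the spirit of main() would check for None, emit its status message, and return normally instead of raising SystemExit(1).
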
Bug 2 — Google "not available" error page no longer produces false citations.
_try_text_fallback in src/parser.py now requires the heading text to exactly
equal TEXT_FALLBACK_NEEDLE ("ai overview") rather than merely start with it.
Added _heading_signals_error guard that inspects the enclosing container text for
TEXT_FALLBACK_ERROR_PHRASES ("not available", "can't generate",
"cannot generate", "try again later", "not available for this search"); any
match suppresses the fallback hit. This prevents rows with ai_overview_appeared=True
that carry scraped SERP links as fake citations when Google returns the error block.
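
The two checks described for Bug 2 might look something like the sketch below. The constants mirror the values listed above; the function names, the normalization step (lowercasing, whitespace collapsing, curly-apostrophe folding), and the plain-string inputs are assumptions made for illustration, since the real parser presumably works on parsed HTML nodes.

```python
TEXT_FALLBACK_NEEDLE = "ai overview"
TEXT_FALLBACK_ERROR_PHRASES = (
    "not available",
    "can't generate",
    "cannot generate",
    "try again later",
    "not available for this search",
)


def normalize(text: str) -> str:
    # Assumption: comparison is case-insensitive and whitespace-insensitive,
    # and typographic apostrophes are folded to ASCII.
    return " ".join(text.split()).lower().replace("\u2019", "'")


def heading_matches(heading_text: str) -> bool:
    # Exact equality, not startswith: "AI Overview is not available" fails.
    return normalize(heading_text) == TEXT_FALLBACK_NEEDLE


def heading_signals_error(container_text: str) -> bool:
    # True when the enclosing container contains any known error phrase.
    normalized = normalize(container_text)
    return any(phrase in normalized for phrase in TEXT_FALLBACK_ERROR_PHRASES)


def should_accept_fallback(heading_text: str, container_text: str) -> bool:
    # A fallback hit needs an exact heading and a container free of errors.
    return heading_matches(heading_text) and not heading_signals_error(container_text)
```

With this shape, a heading that merely starts with the needle is rejected by the first check, and a correct heading sitting inside an error block is rejected by the second.
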
Bug 3 — CAPTCHA marker is unambiguous (observability).
Added blocked_by_captcha: bool = False field to ResultRow. _marker_row sets
it True; _no_overview_row leaves it False. _is_captcha_marker now checks
rows[0].blocked_by_captcha directly instead of relying on duck-typing,
eliminating false "CAPTCHA-blocked" status messages for queries that simply
did not trigger an AI Overview.
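
The explicit flag described for Bug 3 can be illustrated with the following sketch. It assumes ResultRow is a plain dataclass and gives it a hypothetical query field; the real model may use a different base class and carries more fields. The helper names drop the leading underscore to make clear they are stand-ins, not the project's functions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResultRow:
    query: str                          # hypothetical field for illustration
    ai_overview_appeared: bool = False
    blocked_by_captcha: bool = False    # new explicit marker field


def marker_row(query: str) -> ResultRow:
    # Emitted when the request was blocked by a CAPTCHA.
    return ResultRow(query=query, blocked_by_captcha=True)


def no_overview_row(query: str) -> ResultRow:
    # Emitted when the page loaded but showed no AI Overview.
    return ResultRow(query=query)


def is_captcha_marker(rows: List[ResultRow]) -> bool:
    # Reads the explicit flag rather than inferring from other field values.
    return bool(rows) and rows[0].blocked_by_captcha
```

Because both row types otherwise look alike, the dedicated boolean is what lets the status message distinguish "blocked" from "no AI Overview".
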