Cloud-data fix for the v0.1 QA-concerns (empty player metadata + empty
transcripts on cloud).
- Proxy default flipped to direct (useApifyProxy: false). YouTube
degrades both its innertube player endpoint and its transcript endpoint
for requests from the shared BUYPROXIES94952 datacenter proxy pool. The v0.1 default
(useApifyProxy: true) routed every fetch into that blocked pool, so
cloud runs emitted near-empty rows. Running direct from Apify's native
egress restores player metadata; transcripts are best-effort. Residential
proxy remains an opt-in for high-volume runs.
- Added a WARNING log when the player endpoint returns HTTP 200 with no
videoDetails, so future IP-degradation surfaces in run logs.
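A minimal sketch of the two changes above, not the actor's actual code. The `proxyConfiguration` input key, the logger name, and both function names are assumptions made for illustration; only `useApifyProxy`, the HTTP 200 condition, and the missing `videoDetails` check come from these notes.

```python
import logging

# Hypothetical logger name; the actor's real logger is not shown in these notes.
logger = logging.getLogger("shorts_scraper")


def resolve_proxy_input(actor_input: dict) -> dict:
    """Return proxy settings, defaulting to direct egress (useApifyProxy: false)."""
    # "proxyConfiguration" is an assumed input key, not confirmed by the changelog.
    proxy = dict(actor_input.get("proxyConfiguration") or {})
    proxy.setdefault("useApifyProxy", False)
    return proxy


def has_video_details(status_code: int, payload: dict, video_id: str) -> bool:
    """Report whether a player response is usable, warning on the degraded case."""
    if status_code == 200 and not payload.get("videoDetails"):
        # HTTP 200 with an empty body is the signature of a degraded egress IP.
        logger.warning(
            "player endpoint returned HTTP 200 without videoDetails for %s; "
            "egress IP may be degraded",
            video_id,
        )
        return False
    return status_code == 200
```

An explicit opt-in such as `{"proxyConfiguration": {"useApifyProxy": True}}` passes through unchanged; only a missing setting falls back to direct.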
Initial release. Build 0.1.1. Cloud QA runId=ITQnRX3MBvmVYLPtY (SUCCEEDED, 3 rows, 2 qa-concerns — see notes.md).
- Scrapes recent Shorts from one or more YouTube channel handles.
- Fetches spoken transcripts via youtube-transcript-api through Apify Proxy.
- Heuristic 0.0–1.0 sponsorship score from spoken strong/weak signals,
disclosure hashtags, and @mention/domain patterns.
- PPE (pay-per-event): actor-start ($0.005) + result ($0.004 per Short).
- Fields: channel, video_id, url, title, published_text,
view_count, length_seconds, description, tags, has_transcript,
transcript_chars, hashtags, mentions, detected_brands,
sponsorship_signals, sponsorship_score, is_likely_sponsored,
scraped_at.
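As a rough illustration of the PPE pricing listed above, the sketch below estimates the charge for a single run. It assumes actor-start is billed once per run and result once per Short returned; the function and constant names are hypothetical, and actual billing is determined by the platform.

```python
from decimal import Decimal

# Prices taken from the PPE line in these notes; Decimal avoids float rounding.
ACTOR_START_USD = Decimal("0.005")
RESULT_USD = Decimal("0.004")


def run_cost_usd(shorts_returned: int) -> Decimal:
    """Estimate one run's charge: one actor-start event plus one result event per Short."""
    if shorts_returned < 0:
        raise ValueError("shorts_returned must be non-negative")
    return ACTOR_START_USD + RESULT_USD * shorts_returned


print(run_cost_usd(3))  # prints 0.017
```

Under these assumptions, a 3-row run like the cloud QA run noted above would cost $0.017.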