# YouTube Thumbnail Downloader: Input/Output Schema

This document describes the design of the actor's input and output. It is a
design document, not the actual `.actor/input_schema.json` / output schema;
those will be generated from this spec once the actor is implemented.

---

## Input

The actor's primary input is `startUrls`, a list of YouTube video URLs. For
each URL, the actor fetches the requested thumbnail resolution(s) and saves
the image(s) to the default Key-Value Store. One dataset item is produced per
(video × resolution) pair.

### Input fields

| Field            | Type       | Required | Default | Description |
|------------------|------------|----------|---------|-------------|
| `startUrls` | `string[]` | yes | none | List of YouTube watch / share / shorts URLs. Each URL is parsed to a video ID via regex. Invalid entries are skipped and reported in the run log. |
| `resolutions` | `string[]` | no | `["maxresdefault"]` | Which thumbnail qualities to download. Allowed values: `maxresdefault` (1280×720), `sddefault` (640×480), `hqdefault` (480×360), `mqdefault` (320×180), `default` (120×90). Unknown values are ignored. |
| `filenamePrefix` | `string` | no | `"youtube-thumbnail-"` | Prefix for filenames saved in the KV-store. Final filename: `<prefix><videoId>_<resolution>.<ext>`, where `<ext>` is `jpg` or `webp` depending on the served content. |
| `proxy` | `object` | no | none | Optional Apify proxy config. Useful if YouTube throttles the actor IP during large runs. |

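The defaults and filtering rules in the table can be sketched as a small normalization step. This is a minimal sketch, not the actor's implementation: `normalize_input` and the constant names are hypothetical, and only the field names, defaults, and allowed values come from the table.

```python
from typing import Any

# Allowed resolution keys and defaults, taken from the input table.
ALLOWED_RESOLUTIONS = ("maxresdefault", "sddefault", "hqdefault", "mqdefault", "default")
DEFAULT_RESOLUTIONS = ["maxresdefault"]
DEFAULT_PREFIX = "youtube-thumbnail-"


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Apply defaults, ignore unknown resolutions, and de-duplicate in order."""
    requested = raw.get("resolutions") or DEFAULT_RESOLUTIONS
    resolutions: list[str] = []
    for key in requested:
        # Unknown values are ignored; repeated values are kept only once.
        if key in ALLOWED_RESOLUTIONS and key not in resolutions:
            resolutions.append(key)
    return {
        "startUrls": list(raw.get("startUrls") or []),
        "resolutions": resolutions,
        "filenamePrefix": raw.get("filenamePrefix", DEFAULT_PREFIX),
        "proxy": raw.get("proxy"),  # None when no proxy config was given
    }
```

What should happen when every requested resolution is unknown is not specified here; this sketch returns an empty `resolutions` list in that case rather than falling back to the default.
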
### Example input

```json
{
  "startUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/9bZkp7q19f0",
    "https://www.youtube.com/shorts/jNQXAC9IVRw"
  ],
  "resolutions": [
    "maxresdefault",
    "sddefault"
  ],
  "filenamePrefix": "yt-thumb-"
}
```

### URL parsing rules

Accept any URL that contains an 11-character YouTube video ID. Patterns that
must be handled:

- `https://www.youtube.com/watch?v=<ID>`
- `https://m.youtube.com/watch?v=<ID>`
- `https://youtu.be/<ID>`
- `https://www.youtube.com/shorts/<ID>`
- `https://www.youtube.com/embed/<ID>`

Anything that doesn't match is logged at `warning` level and skipped; it does
not fail the run.

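One way to cover the patterns listed above is to split the URL with `urllib.parse` and then validate the candidate ID with a regex. This is a sketch under assumptions: `extract_video_id` is a hypothetical helper, the host check is stricter than "any URL that contains an ID", and the ID alphabet (`A-Z`, `a-z`, `0-9`, `_`, `-`) is assumed rather than stated in this spec.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed alphabet for the 11-character video ID.
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
# Path-based patterns from the list above: /shorts/<ID> and /embed/<ID>.
PATH_PREFIXES = ("/shorts/", "/embed/")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for a supported URL, or None if it does not match."""
    try:
        parsed = urlparse(url.strip())
    except ValueError:
        return None  # malformed URL, e.g. an unterminated IPv6 host
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        # Share links: the ID is the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Watch links (www. and m.): the ID is the `v` query parameter.
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            for prefix in PATH_PREFIXES:
                if parsed.path.startswith(prefix):
                    candidate = parsed.path[len(prefix):].split("/")[0]
                    break
    if candidate and VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    return None
```

A caller would log a warning and skip the URL whenever this returns `None`.
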
### Resolution notes

- `maxresdefault` may be missing for older videos. The actor should fall back
  to `hqdefault` in that case and record the actually-served resolution in
  the output.
- All thumbnail URLs are public on `img.youtube.com`, so no scraping of the
  watch page is required: we can construct the CDN URL directly from the
  video ID.

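The two notes above can be sketched together: build the CDN URL from the video ID, and try `hqdefault` when `maxresdefault` is unavailable. This is only an illustration. `thumbnail_url` and `fetch_with_fallback` are hypothetical names, and `fetch` is an injected callable assumed to return the image bytes or `None` on a 404, so the sketch stays independent of any HTTP client.

```python
from typing import Callable, Optional, Tuple

CDN_HOST = "img.youtube.com"


def thumbnail_url(video_id: str, resolution: str) -> str:
    """Build the public thumbnail URL directly from the video ID."""
    return f"https://{CDN_HOST}/vi/{video_id}/{resolution}.jpg"


def fetch_with_fallback(
    video_id: str,
    resolution: str,
    fetch: Callable[[str], Optional[bytes]],
) -> Optional[Tuple[str, str, bytes]]:
    """Return (resolution_served, url, data), or None if nothing was fetched."""
    candidates = [resolution]
    if resolution == "maxresdefault":
        # Only maxresdefault has a documented fallback in this spec.
        candidates.append("hqdefault")
    for key in candidates:
        url = thumbnail_url(video_id, key)
        data = fetch(url)
        if data is not None:
            return key, url, data
    return None
```
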
---

## Output

The actor writes to two places:

1. **Default Key-Value Store**: the actual image files, one per
   (video × resolution) pair. Filename:
   `<filenamePrefix><videoId>_<resolution>.<ext>`, where the extension matches
   the served content (`.jpg` for JPEG, `.webp` for WebP).
2. **Default Dataset**: one JSON item per saved image, described below.

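The filename rule can be sketched as a small helper. `build_store_key` and `EXTENSIONS` are hypothetical names; the sketch also assumes the caller decides whether to pass the requested or the served resolution, which this spec does not settle for the fallback case.

```python
# Content types named in this spec, mapped to their file extensions.
EXTENSIONS = {"image/jpeg": "jpg", "image/webp": "webp"}


def build_store_key(prefix: str, video_id: str, resolution: str, content_type: str) -> str:
    """Build `<filenamePrefix><videoId>_<resolution>.<ext>` for the KV-store."""
    # Drop any parameters (e.g. "; charset=...") before the lookup.
    mime = content_type.split(";")[0].strip().lower()
    if mime not in EXTENSIONS:
        raise ValueError(f"Unsupported content type: {content_type!r}")
    return f"{prefix}{video_id}_{resolution}.{EXTENSIONS[mime]}"
```

For example, `build_store_key("yt-thumb-", "dQw4w9WgXcQ", "maxresdefault", "image/jpeg")` yields `yt-thumb-dQw4w9WgXcQ_maxresdefault.jpg`.
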
### Dataset item shape

| Field              | Type     | Description |
|--------------------|----------|-------------|
| `videoId` | `string` | The 11-character YouTube video ID extracted from the input URL. |
| `sourceUrl` | `string` | The original input URL the user provided (verbatim, for traceability). |
| `title` | `string` | Best-effort video title. Empty string if it could not be looked up. (Optional enrichment; see below.) |
| `channelTitle` | `string` | Best-effort channel name. Empty string if unavailable. |
| `resolution` | `string` | The requested resolution key (`maxresdefault`, `sddefault`, …). |
| `resolutionServed` | `string` | The resolution actually fetched. May differ from `resolution` if we fell back from `maxresdefault` to `hqdefault`. |
| `thumbnailUrl` | `string` | The `img.youtube.com` URL the image was downloaded from. |
| `width` | `number` | Pixel width of the saved image. |
| `height` | `number` | Pixel height of the saved image. |
| `bytes` | `number` | File size in bytes. |
| `contentType` | `string` | MIME type of the saved image: `image/jpeg` (default) or `image/webp` when YouTube serves WebP only for that resolution. |
| `keyValueStoreKey` | `string` | The filename in the default KV-store. Extension matches `contentType`: `.jpg` for JPEG, `.webp` for WebP (e.g. `yt-thumb-dQw4w9WgXcQ_maxresdefault.jpg`). |
| `publicDownloadUrl` | `string` | A URL the user can click to download/view the image. |
| `fetchedAt` | `string` | ISO-8601 timestamp of when the image was downloaded. |

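As a rough illustration of the item shape, the fields can be written as a `TypedDict` with two small helpers. `ThumbnailItem`, `public_download_url`, and `iso_timestamp` are hypothetical names; the download URL pattern is copied from this document's example output, and `<storeId>` is left as a placeholder.

```python
from datetime import datetime, timezone
from typing import TypedDict


class ThumbnailItem(TypedDict):
    """One dataset row; field names mirror the table above."""
    videoId: str
    sourceUrl: str
    title: str
    channelTitle: str
    resolution: str
    resolutionServed: str
    thumbnailUrl: str
    width: int
    height: int
    bytes: int
    contentType: str
    keyValueStoreKey: str
    publicDownloadUrl: str
    fetchedAt: str


def public_download_url(store_id: str, key: str) -> str:
    # URL shape taken from the example output in this document.
    return f"https://api.apify.com/v2/key-value-stores/{store_id}/records/{key}"


def iso_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as UTC ISO-8601 with milliseconds."""
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


key = "yt-thumb-dQw4w9WgXcQ_maxresdefault.jpg"
item: ThumbnailItem = {
    "videoId": "dQw4w9WgXcQ",
    "sourceUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "title": "",          # enrichment off: empty string
    "channelTitle": "",   # enrichment off: empty string
    "resolution": "maxresdefault",
    "resolutionServed": "maxresdefault",
    "thumbnailUrl": "https://img.youtube.com/vi/dQw4w9WgXcQ/maxresdefault.jpg",
    "width": 1280,
    "height": 720,
    "bytes": 112834,
    "contentType": "image/jpeg",
    "keyValueStoreKey": key,
    "publicDownloadUrl": public_download_url("<storeId>", key),
    "fetchedAt": iso_timestamp(datetime(2026, 6, 19, 20, 35, 12, tzinfo=timezone.utc)),
}
```
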
### Example output (one row per saved image)

```json
[
  {
    "videoId": "dQw4w9WgXcQ",
    "sourceUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "title": "Rick Astley - Never Gonna Give You Up",
    "channelTitle": "Rick Astley",
    "resolution": "maxresdefault",
    "resolutionServed": "maxresdefault",
    "thumbnailUrl": "https://img.youtube.com/vi/dQw4w9WgXcQ/maxresdefault.jpg",
    "width": 1280,
    "height": 720,
    "bytes": 112834,
    "contentType": "image/jpeg",
    "keyValueStoreKey": "yt-thumb-dQw4w9WgXcQ_maxresdefault.jpg",
    "publicDownloadUrl": "https://api.apify.com/v2/key-value-stores/<storeId>/records/yt-thumb-dQw4w9WgXcQ_maxresdefault.jpg",
    "fetchedAt": "2026-06-19T20:35:12.000Z"
  },
  {
    "videoId": "dQw4w9WgXcQ",
    "sourceUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "title": "Rick Astley - Never Gonna Give You Up",
    "channelTitle": "Rick Astley",
    "resolution": "sddefault",
    "resolutionServed": "sddefault",
    "thumbnailUrl": "https://img.youtube.com/vi/dQw4w9WgXcQ/sddefault.jpg",
    "width": 640,
    "height": 480,
    "bytes": 54210,
    "contentType": "image/jpeg",
    "keyValueStoreKey": "yt-thumb-dQw4w9WgXcQ_sddefault.jpg",
    "publicDownloadUrl": "https://api.apify.com/v2/key-value-stores/<storeId>/records/yt-thumb-dQw4w9WgXcQ_sddefault.jpg",
    "fetchedAt": "2026-06-19T20:35:13.000Z"
  }
]
```

### Title / channel enrichment (optional)

Pulling the video title and channel name requires either scraping the watch
page or using the YouTube Data API. To keep the actor dependency-free and
free of API-key requirements, **enrichment is off by default**. If we add
it later, the input will gain an optional `enrich: boolean` flag (default
`false`) and an optional `youtubeApiKey` field. The output schema already
accommodates this: `title` and `channelTitle` default to `""` when
enrichment is off.

---

## Edge cases / behavior

- **Duplicate resolutions in input**: de-duplicated before fetching.
- **Invalid URL**: logged, skipped, run continues.
- **`maxresdefault` returns 404**: fall back to `hqdefault`; record the
  actual `resolutionServed` in the output.
- **Empty `startUrls`**: actor exits cleanly with a warning, no items
  pushed to the dataset, no files written.
- **Network error mid-download**: logged, that (video × resolution) pair is
  skipped, run continues.
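
The skip-and-continue behavior listed above can be sketched as a control loop. This is a hedged outline rather than the actor's real entry point: `run`, `parse_id`, and `download` are hypothetical names, the parser and downloader are injected so the loop can be exercised without network access, and network failures are assumed to surface as `OSError` subclasses (as they do with the standard library's `urllib` and sockets).

```python
import logging
from typing import Callable, List, Optional, Sequence, Tuple

log = logging.getLogger("thumbnail-actor")


def run(
    start_urls: Sequence[str],
    resolutions: Sequence[str],
    parse_id: Callable[[str], Optional[str]],
    download: Callable[[str, str], None],
) -> List[Tuple[str, str]]:
    """Return the (video_id, resolution) pairs that were saved."""
    if not start_urls:
        # Empty input: exit cleanly with a warning, nothing written.
        log.warning("startUrls is empty; nothing to do")
        return []
    # De-duplicate resolutions while keeping the requested order.
    unique_resolutions = list(dict.fromkeys(resolutions))
    saved: List[Tuple[str, str]] = []
    for url in start_urls:
        video_id = parse_id(url)
        if video_id is None:
            log.warning("Skipping invalid URL: %s", url)
            continue
        for resolution in unique_resolutions:
            try:
                download(video_id, resolution)
            except OSError as exc:
                # Network error: skip only this pair, keep the run going.
                log.warning("Download failed for %s/%s: %s", video_id, resolution, exc)
                continue
            saved.append((video_id, resolution))
    return saved
```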