Instagram Transcript Scraper
Pricing
from $5.00 / 1,000 results
Extract transcripts from Instagram videos and reels using auto-generated captions or AI-powered speech-to-text. Returns clean, timestamped transcript segments with full video metadata.
Developer
Crawler Bros
Maintained by Community
Turn Instagram videos and reels into clean, structured transcripts — automatically.
Extract spoken content from public Instagram videos with full timestamped segments, video metadata, and support for 30+ languages. Built for developers, data teams, content researchers, and AI pipelines that need text from video at scale.
What This Actor Does
The Instagram Transcript Scraper opens each Instagram video URL and produces a transcript using one of two methods:
- Native Captions — Reads Instagram's auto-generated captions directly from the API. Instant and zero extra cost when available.
- Whisper AI — Uses OpenAI's Whisper speech-to-text model locally on the container. Works on any video with speech, regardless of whether Instagram generated captions.
The auto mode (default) tries native captions first and only falls back to Whisper when needed — giving you the best coverage at the lowest processing time.
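The fallback order of the three modes can be sketched as follows. This is a minimal illustration under stated assumptions, not the actor's actual code: `fetch_native_captions` and `run_whisper` are hypothetical callables standing in for the caption lookup and the local Whisper run, and signalling "no native captions" with `None` or an empty list is an assumption made for this sketch.

```python
from typing import Callable, Optional

Segment = dict  # e.g. {"start": 0.0, "end": 4.36, "text": "..."}


def transcribe(
    url: str,
    method: str,
    fetch_native_captions: Callable[[str], Optional[list[Segment]]],
    run_whisper: Callable[[str], list[Segment]],
) -> tuple[str, list[Segment]]:
    """Return (method_used, segments) for the auto, native, or whisper mode.

    Both callables are hypothetical placeholders, not part of the actor's API.
    """
    if method not in {"auto", "native", "whisper"}:
        raise ValueError(f"unknown transcriptionMethod: {method!r}")

    if method in {"auto", "native"}:
        segments = fetch_native_captions(url)
        if segments:
            return "native", segments
        if method == "native":
            # Native-only mode never falls back to Whisper.
            return "native", []

    # Reached for "whisper", or for "auto" when no native captions exist.
    return "whisper", run_whisper(url)
```

Passing the two backends in as arguments keeps the decision logic separate from any network or model code, so the ordering can be checked with simple stubs.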
Input Parameters
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `videoUrls` | array | Yes | — | One or more Instagram video/reel URLs to transcribe |
| `transcriptionMethod` | string | No | `auto` | `auto`, `native`, or `whisper` (see Transcription Methods) |
| `whisperModel` | string | No | `base` | Whisper model size: `tiny`, `base`, or `small` |
| `language` | string | No | auto-detect | Language code (e.g. `en`, `es`, `fr`). Leave empty for automatic detection |
Supported URL Formats
- https://www.instagram.com/reel/ABC123xyz/
- https://www.instagram.com/p/ABC123xyz/
- https://www.instagram.com/tv/ABC123xyz/
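If you want to screen URLs before starting a run, a pre-flight check along these lines may help. It is only a sketch: the function name `parse_instagram_url` is made up for illustration, and the shortcode character set and the tolerance for a missing `www.` or a trailing query string are assumptions, since the actor's own validation rules are not documented here.

```python
import re
from typing import Optional

# Matches the three documented path prefixes followed by a shortcode.
# The shortcode alphabet (letters, digits, "-", "_") is an assumption.
_URL_RE = re.compile(
    r"^https?://(?:www\.)?instagram\.com/(reel|p|tv)/([A-Za-z0-9_-]+)/?(?:\?.*)?$"
)


def parse_instagram_url(url: str) -> Optional[tuple[str, str]]:
    """Return (kind, shortcode) for a supported URL, or None otherwise."""
    match = _URL_RE.match(url.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)
```

For example, `parse_instagram_url("https://www.instagram.com/reel/ABC123xyz/")` yields `("reel", "ABC123xyz")`, while a profile or story URL yields `None`.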
Input Example
{
  "videoUrls": [
    "https://www.instagram.com/reel/DV29mBcMQwp/",
    "https://www.instagram.com/p/DULBkEngpxg/"
  ],
  "transcriptionMethod": "auto",
  "whisperModel": "base",
  "language": ""
}
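One way to assemble this input and start a run from Python is sketched below. The helper names `build_run_input` and `run_actor` are invented for illustration, and the actor ID must be supplied by you because it is not stated in this document. The second helper assumes the `apify-client` package is installed and uses its `ApifyClient`, `actor(...).call(run_input=...)`, and `dataset(...).iterate_items()` calls.

```python
from typing import Any

ALLOWED_METHODS = {"auto", "native", "whisper"}
ALLOWED_MODELS = {"tiny", "base", "small"}


def build_run_input(
    video_urls: list[str],
    transcription_method: str = "auto",
    whisper_model: str = "base",
    language: str = "",
) -> dict[str, Any]:
    """Assemble the actor input, rejecting values the parameter table does not list."""
    if not video_urls:
        raise ValueError("videoUrls is required and must not be empty")
    if transcription_method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported transcriptionMethod: {transcription_method!r}")
    if whisper_model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported whisperModel: {whisper_model!r}")
    return {
        "videoUrls": list(video_urls),
        "transcriptionMethod": transcription_method,
        "whisperModel": whisper_model,
        "language": language,  # empty string means automatic detection
    }


def run_actor(token: str, actor_id: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start a run and return all dataset rows (requires `pip install apify-client`)."""
    # Imported lazily so build_run_input works without the dependency.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Validating locally before calling the API avoids paying for a run that would fail on a mistyped option.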
Output Format
Each transcript segment is returned as a separate dataset row. All rows for the same video share the same metadata, so you can filter by video, join on `code`, or stream segments in order by `segmentIndex`.
On error (private post, deleted video, unsupported media type), the actor emits a single row containing only `url`, `errMsg`, and `timestamp`; no other fields are included.
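Because segments arrive as flat rows, a consumer usually regroups them per video. The sketch below shows one way to do that with the documented `code`, `segmentIndex`, `segmentText`, and `errMsg` fields; the function names are illustrative only.

```python
from collections import defaultdict
from typing import Any


def group_segments(rows: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group successful rows by shortcode, each group sorted by segmentIndex."""
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        if "errMsg" in row:
            continue  # error rows carry only url, errMsg, and timestamp
        grouped[row["code"]].append(row)
    for segments in grouped.values():
        segments.sort(key=lambda r: r["segmentIndex"])
    return dict(grouped)


def rebuild_text(segments: list[dict[str, Any]]) -> str:
    """Join the segment texts of one video in playback order."""
    return " ".join(s["segmentText"].strip() for s in segments)
```

Sorting inside each group means the result does not depend on the order in which rows were read from the dataset.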
Output Fields
| Field | Type | Description |
|---|---|---|
| `url` | string | Canonical Instagram URL of the video |
| `code` | string | Instagram shortcode (unique identifier) |
| `pk` | string | Instagram internal numeric media ID |
| `id` | string | Combined media ID (`pk_userId` format) |
| `title` | string | Video caption text |
| `img` | string | Thumbnail image URL |
| `videoUrl` | string | Direct URL to the highest-quality MP4 |
| `audioUrl` | string | Direct URL to the audio-only track (omitted when unavailable) |
| `createTime` | string | Post creation time in UTC ISO 8601 format (e.g. `2025-05-10T14:32:03Z`) |
| `likeCount` | integer | Number of likes |
| `commentCount` | integer | Number of comments |
| `userPk` | string | Creator's numeric Instagram user ID |
| `userName` | string | Creator's Instagram handle |
| `userFullName` | string | Creator's display name |
| `avatarUri` | string | Creator's profile picture URL |
| `fullText` | string | Complete transcript of the entire video |
| `transcriptionMethod` | string | `native` or `whisper` |
| `segmentIndex` | integer | Zero-based index of this segment |
| `segmentStart` | number | Segment start time in seconds |
| `segmentEnd` | number | Segment end time in seconds |
| `segmentText` | string | Transcript text for this segment |
| `totalSegments` | integer | Total segments in this video's transcript |
| `timestamp` | string | UTC ISO timestamp of when this record was scraped |
| `errMsg` | string | Error description (only present on failed records) |
Output Example — Success
{
  "url": "https://www.instagram.com/p/DV29mBcMQwp/",
  "code": "DV29mBcMQwp",
  "pk": "3852537424986049577",
  "id": "3852537424986049577_16278726",
  "title": "On Friday, US President Donald Trump claimed Iran's air force is \"no longer\"...",
  "img": "https://scontent-iad6-1.cdninstagram.com/...",
  "videoUrl": "https://scontent-iad6-1.cdninstagram.com/...mp4",
  "createTime": "2025-05-11T05:12:03Z",
  "likeCount": 16799,
  "commentCount": 1159,
  "userPk": "16278726",
  "userName": "bbcnews",
  "userFullName": "BBC News",
  "avatarUri": "https://scontent-iad3-1.cdninstagram.com/...",
  "fullText": "On Friday, U.S. President Donald Trump claimed Iran's Air Force is no longer...",
  "transcriptionMethod": "whisper",
  "timestamp": "2026-06-08T06:09:05.841Z",
  "segmentIndex": 0,
  "segmentStart": 0,
  "segmentEnd": 4.36,
  "segmentText": "On Friday, U.S. President Donald Trump claimed Iran's Air Force is no longer, as a result",
  "totalSegments": 22
}
Output Example — Error
{
  "url": "https://www.instagram.com/reel/DELETED123/",
  "errMsg": "Could not extract video data from page. The post may not exist or may not be a video.",
  "timestamp": "2026-06-08T06:09:15.168Z"
}
Transcription Methods
Auto (Recommended)
Tries Instagram's native captions first. Falls back to Whisper AI automatically when native captions are unavailable. Best balance of speed and coverage for most workloads.
Native
Only reads Instagram's built-in auto-generated captions. Fastest option — no audio download needed. May not be available on older posts, non-Reel content, or videos without speech.
Whisper AI
Always downloads the video and runs local AI speech-to-text. Consistent coverage for any video with speech, independent of Instagram's captioning availability.
Model comparison:
| Model | Parameters | Speed | Accuracy | Best For |
|---|---|---|---|---|
| `tiny` | 39 M | Fastest | Basic | Quick previews, speed-critical pipelines |
| `base` | 74 M | Fast | Good | Most use cases |
| `small` | 244 M | Moderate | Very good | Accented speech, technical or specialized content |
Use Cases
- AI agents & LLM pipelines — Feed Instagram video speech into RAG systems, summarizers, or classifiers
- Content research — Extract and analyze what creators are saying across a topic or niche
- Social media monitoring — Capture spoken claims in video content for brand or news tracking
- Subtitle generation — Generate timestamped captions for repurposed video content
- Competitive intelligence — Batch-transcribe competitor or industry video content
- Accessibility — Build searchable archives of spoken video content
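For the subtitle use case, the timestamped rows map fairly directly onto the SubRip (SRT) format. The following is a minimal sketch that assumes all rows passed in belong to a single video and carry the documented `segmentIndex`, `segmentStart`, `segmentEnd`, and `segmentText` fields; the function names are illustrative.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def rows_to_srt(rows: list[dict]) -> str:
    """Render the segment rows of one video as SRT cue blocks."""
    ordered = sorted(rows, key=lambda r: r["segmentIndex"])
    blocks = []
    for row in ordered:
        blocks.append(
            f"{row['segmentIndex'] + 1}\n"  # SRT cue numbers start at 1
            f"{srt_timestamp(row['segmentStart'])} --> {srt_timestamp(row['segmentEnd'])}\n"
            f"{row['segmentText'].strip()}\n"
        )
    return "\n".join(blocks)
```

Writing the returned string to a `.srt` file gives a subtitle track that most video players and editors can load.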
Supported Languages
The Whisper model supports 99+ languages with automatic detection. For best accuracy on non-English content, set the language field explicitly. Supported dropdown options include:
English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, Korean, Arabic, Hindi, Dutch, Polish, Turkish, Swedish, Danish, Finnish, Norwegian, Czech, Romanian, Hungarian, Indonesian, Vietnamese, Thai, Ukrainian, Hebrew, Persian, Malay, and more.
Limitations
- Public videos only — Private accounts, restricted posts, and login-gated content cannot be scraped.
- Audio quality matters — Whisper accuracy degrades on videos with heavy background music, multiple overlapping speakers, or very low recording quality.
- Native captions not always available — Instagram doesn't generate captions for all videos. Short clips, older posts, or posts with music-only audio may have no native captions; the actor falls back to Whisper automatically in `auto` mode.
- Instagram CDN URLs expire — The `videoUrl`, `audioUrl`, `img`, and `avatarUri` URLs in the output are time-limited CDN links. Download media promptly; do not rely on these URLs as permanent storage.
- Rate limiting — Processing many videos in rapid succession may trigger temporary rate limits. The actor includes automatic delays between requests.
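Since the CDN links are time-limited, you may want to copy the media to your own storage right after reading the dataset. The sketch below uses only the Python standard library; the helper names, the local file naming scheme, and the 30-second timeout are assumptions for illustration, and file extensions are deliberately left off.

```python
import shutil
import urllib.request
from pathlib import Path

MEDIA_FIELDS = ("videoUrl", "audioUrl", "img", "avatarUri")


def media_targets(rows: list[dict]) -> dict[str, str]:
    """Map a local file stem such as "DV29mBcMQwp_videoUrl" to its CDN URL, once per video."""
    targets: dict[str, str] = {}
    for row in rows:
        if "errMsg" in row:
            continue  # error rows have no media fields
        for field in MEDIA_FIELDS:
            url = row.get(field)  # audioUrl is omitted when unavailable
            if url:
                targets.setdefault(f"{row['code']}_{field}", url)
    return targets


def download_all(targets: dict[str, str], out_dir: Path, timeout: float = 30.0) -> list[Path]:
    """Save each URL under out_dir and return the written paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for stem, url in targets.items():
        path = out_dir / stem
        with urllib.request.urlopen(url, timeout=timeout) as resp, open(path, "wb") as fh:
            shutil.copyfileobj(resp, fh)
        saved.append(path)
    return saved
```

Deduplicating by shortcode first matters because every segment row of a video repeats the same media URLs.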
FAQ
Q: Does this work with Instagram Reels, regular video posts, and IGTV?
A: Yes. All three URL formats (/reel/, /p/, /tv/) are supported.
Q: What if the video has no speech (music only or silent)?
A: Whisper will return an empty or near-empty transcript. Native captions won't exist. The actor returns a single row with empty fullText and segmentText.
Q: Can I process multiple videos in one run?
A: Yes. Provide as many URLs as you need in the videoUrls array. The actor processes them sequentially with automatic pauses between requests.
Q: Why does my run return "errMsg": "Could not extract video data..."?
A: The post is likely private, deleted, or restricted for your region. Verify the post is publicly accessible in a browser without being logged in.
Q: How accurate is Whisper transcription?
A: The base model is accurate for clear, single-speaker English audio. For accented speech, fast speech, or specialized vocabulary, use small. For multilingual content, setting the language field explicitly improves results.
Q: Are the video/audio URLs in the output permanent?
A: No. Instagram CDN URLs are signed and expire within hours or days. Download the media as soon as the run finishes if you need to store it.
Q: What languages does auto-detection support?
A: Whisper's automatic language detection covers 99+ languages. Detection is most reliable for videos with at least 30 seconds of speech.
Q: Does this require Instagram login or cookies?
A: The actor uses a shared cookie pool for Instagram session access. No credentials are required from the user.