Instagram Transcript Scraper
Pricing
from $5.00 / 1,000 results
Extract transcripts from Instagram videos and reels using auto-generated captions or AI-powered speech-to-text. Returns clean, timestamped transcript segments with full video metadata.
Rating: 5.0 (13)
Developer: Crawler Bros
Actor stats: 15 bookmarked, 18 total users, 11 monthly active users
Last modified: 8 days ago
Turn Instagram videos and reels into clean, structured transcripts with ease.
Extract spoken content from public Instagram videos — designed for developers, data teams, and AI agents that need text, not just media.
Overview
The Instagram Transcript Scraper extracts transcripts from Instagram video posts and reels using a dual-strategy approach:
- Native Captions — Extracts Instagram's auto-generated captions when available (fastest, zero extra cost)
- Whisper AI — Falls back to OpenAI's Whisper speech-to-text model for accurate transcription of any video with speech
What You Get
For each Instagram video URL, the scraper returns:
- Full transcript text (complete speech-to-text)
- Timestamped segments for precise alignment
- Video title & thumbnail
- Video and audio playback URLs
- Engagement metrics (likes, comments)
- Creator profile information (username, display name, avatar)
- Transcription method used (native or whisper)
- Error status per request (safe for automation)
All data is returned in structured JSON, ready to feed into databases, AI pipelines, or downstream services.
Input Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| `videoUrls` | array | Yes | List of Instagram video/reel URLs to transcribe |
| `transcriptionMethod` | string | No | `auto` (default), `native`, or `whisper` |
| `whisperModel` | string | No | `tiny`, `base` (default), or `small` |
| `language` | string | No | Language code (e.g. `en`, `es`). Empty = auto-detect |
Input Example
    {
      "videoUrls": [
        "https://www.instagram.com/reel/DGtml1wNRoM/",
        "https://www.instagram.com/p/DULBkEngpxg/"
      ],
      "transcriptionMethod": "auto",
      "whisperModel": "base"
    }
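The input above can also be assembled in code. The following is a minimal sketch, assuming the allowed values are exactly those listed in the Input Parameters table; the helper name `build_input` is hypothetical and not part of the scraper. It only checks the payload shape before a run is started.

```python
import json

# Allowed values, taken from the Input Parameters table.
TRANSCRIPTION_METHODS = {"auto", "native", "whisper"}
WHISPER_MODELS = {"tiny", "base", "small"}


def build_input(video_urls, transcription_method="auto", whisper_model="base", language=""):
    """Return an input dict for the scraper (hypothetical helper)."""
    if not video_urls:
        # Assumption: an empty list is treated as invalid because videoUrls is required.
        raise ValueError("videoUrls is required and must not be empty")
    if transcription_method not in TRANSCRIPTION_METHODS:
        raise ValueError(f"unknown transcriptionMethod: {transcription_method!r}")
    if whisper_model not in WHISPER_MODELS:
        raise ValueError(f"unknown whisperModel: {whisper_model!r}")
    payload = {
        "videoUrls": list(video_urls),
        "transcriptionMethod": transcription_method,
        "whisperModel": whisper_model,
    }
    if language:
        # Leaving language out keeps the documented auto-detect behavior.
        payload["language"] = language
    return payload


if __name__ == "__main__":
    urls = ["https://www.instagram.com/reel/DGtml1wNRoM/"]
    print(json.dumps(build_input(urls), indent=2))
```

Validating locally means a typo such as `"whisperModel": "large"` is caught before any run is started.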
Output Format
Each transcript segment is returned as a separate dataset row, with full video metadata included on every row for easy filtering and export.
| Field | Type | Description |
|---|---|---|
| `url` | string | Normalized Instagram video URL |
| `code` | string | Instagram shortcode |
| `pk` | string | Instagram internal post ID |
| `id` | string | Combined media identifier |
| `title` | string | Video caption text |
| `img` | string | Video thumbnail URL |
| `videoUrl` | string | Direct video playback URL |
| `audioUrl` | string | Direct audio playback URL |
| `createTime` | number | Video creation timestamp (Unix seconds) |
| `likeCount` | number | Number of likes |
| `commentCount` | number | Number of comments |
| `userPk` | string | Creator user ID |
| `userName` | string | Creator username |
| `userFullName` | string | Creator display name |
| `avatarUri` | string | Creator avatar image URL |
| `fullText` | string | Complete transcript text |
| `segmentIndex` | number | Segment index (0-based) |
| `segmentStart` | number | Segment start time in seconds |
| `segmentEnd` | number | Segment end time in seconds |
| `segmentText` | string | Text for this specific segment |
| `totalSegments` | number | Total number of segments for this video |
| `transcriptionMethod` | string | Method used: `native` or `whisper` |
| `errMsg` | string | Error message (empty if success) |
| `timestamp` | string | Response timestamp (ISO 8601) |
Output Example
    {
      "url": "https://www.instagram.com/p/DTxiM0Ijqvz/",
      "code": "DTxiM0Ijqvz",
      "pk": "3814980773552761843",
      "id": "3814980773552761843_48143082417",
      "title": "#explore",
      "img": "https://scontent-...",
      "videoUrl": "https://scontent-...",
      "audioUrl": "https://scontent-...",
      "createTime": 1769001195,
      "likeCount": 171650,
      "commentCount": 781,
      "userPk": "48143082417",
      "userName": "theavamariee",
      "userFullName": "Ava Marie",
      "avatarUri": "https://scontent-...",
      "fullText": "Excuse me, miss. What? Your shirt's backwards.",
      "transcriptionMethod": "whisper",
      "timestamp": "2026-03-14T12:00:00.000Z",
      "segmentIndex": 0,
      "segmentStart": 0.3,
      "segmentEnd": 1.16,
      "segmentText": "Excuse me, miss.",
      "totalSegments": 3,
      "errMsg": ""
    }
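Because every row carries one segment plus the full video metadata, downstream code usually regroups rows per video. The sketch below shows one way to do that, assuming rows are plain dicts with the fields from the Output Format table; `group_by_video` is a hypothetical helper, not something the scraper provides.

```python
from collections import defaultdict


def group_by_video(rows):
    """Collect segment rows into one record per video, keyed by shortcode."""
    segments = defaultdict(list)
    for row in rows:
        if row.get("errMsg"):
            continue  # skip rows that report an error
        segments[row["code"]].append(row)

    videos = {}
    for code, segs in segments.items():
        # Rows may arrive in any order, so restore the segment order.
        segs.sort(key=lambda r: r["segmentIndex"])
        first = segs[0]
        videos[code] = {
            "url": first["url"],
            "userName": first["userName"],
            "fullText": first["fullText"],
            # True only when every segment of the video is present.
            "complete": len(segs) == first["totalSegments"],
            "segments": [
                (s["segmentStart"], s["segmentEnd"], s["segmentText"]) for s in segs
            ],
        }
    return videos
```

The `complete` flag compares the number of collected rows with `totalSegments`, which helps detect a partially exported dataset.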
Typical Use Cases
- AI agents & LLM pipelines — Speech-to-text ingestion for content understanding
- Content summarization — Get the spoken word from any Instagram video
- Social media analysis — Analyze what creators are saying at scale
- Subtitle generation — Create captions for repurposing content
- Trend & language research — Study speech patterns across video content
- Batch transcription — Process hundreds of videos in one run
Transcription Methods
Auto (Recommended)
Tries native Instagram captions first. If none are available, it falls back to Whisper AI automatically. This gives the best balance of speed and coverage.
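The fallback order can be pictured with a short sketch. This is not the scraper's actual implementation, which the source does not show; `fetch_native` and `run_whisper` are hypothetical callables supplied by the caller, used only to illustrate the native-first, Whisper-second order.

```python
def transcribe_auto(url, fetch_native, run_whisper):
    """Try native captions first, then fall back to Whisper.

    fetch_native and run_whisper are hypothetical callables; each takes a URL
    and returns a list of segments (empty or None when nothing is available).
    """
    segments = fetch_native(url)
    if segments:
        return {"transcriptionMethod": "native", "segments": segments}
    # Whisper runs only when native captions are missing.
    return {"transcriptionMethod": "whisper", "segments": run_whisper(url) or []}
```

The returned `transcriptionMethod` value mirrors the output field of the same name, so a caller can tell which path produced the transcript.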
Native
Uses only Instagram's auto-generated captions. This is the fastest and cheapest option, but captions may not be available for all videos (especially older posts or non-Reels content).
Whisper
Always uses the Whisper AI speech-to-text model. Works for any video with speech. Choose model size based on your accuracy/speed needs:
| Model | Parameters | Speed | Accuracy | Best For |
|---|---|---|---|---|
| `tiny` | 39 M | Fastest | Basic | Quick previews |
| `base` | 74 M | Fast | Good | Most use cases |
| `small` | 244 M | Moderate | Very good | High-accuracy needs |
FAQ
Q: Does this work with Instagram Reels?
A: Yes! Reels, regular video posts, and IGTV are all supported.
Q: What if the video has no speech?
A: The scraper will return an empty transcript with an appropriate message. Music-only or silent videos produce no transcript.
Q: What languages are supported?
A: Whisper supports 99+ languages with automatic detection. You can also specify a language code to improve accuracy.
Q: How accurate is the transcription?
A: Native captions use Instagram's own speech recognition. The Whisper `base` model provides good accuracy for most content. Use `small` for better results with accented or complex speech.
Q: Can I process videos in bulk?
A: Yes! Provide multiple URLs in the videoUrls array. The scraper processes them sequentially with automatic delays between requests.
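After a bulk run, the per-row `errMsg` field makes it possible to collect the URLs that failed and submit them again. The sketch below assumes rows are dicts with `url` and `errMsg`; `urls_to_retry` is a hypothetical helper. Whether videos without speech also set `errMsg` is not stated here, so inspect the messages before retrying blindly.

```python
def urls_to_retry(rows):
    """Return unique URLs whose rows carry a non-empty errMsg, in first-seen order."""
    seen = set()
    retry = []
    for row in rows:
        url = row.get("url")
        if row.get("errMsg") and url and url not in seen:
            seen.add(url)
            retry.append(url)
    return retry
```

The resulting list can be passed as `videoUrls` in a follow-up run.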
Notes
- Transcript accuracy depends on audio clarity and language
- Each transcript segment is a separate dataset row for easy export and filtering
- The scraper uses browser automation with anti-detection measures for reliable access
- Please ensure compliance with Instagram's terms and local regulations