Instagram Transcript Scraper

Pricing

from $5.00 / 1,000 results

Extract transcripts from Instagram videos and reels using auto-generated captions or AI-powered speech-to-text. Returns clean, timestamped transcript segments with full video metadata.

Rating: 4.2 (4)

Developer: Crawler Bros

Maintained by Community

Actor stats

Bookmarked: 9
Total users: 559
Monthly active users: 154
Issues response: 0.35 hours
Last modified: 4 days ago

Turn Instagram videos and reels into clean, structured transcripts — automatically.

Extract spoken content from public Instagram videos with full timestamped segments, video metadata, and support for 30+ languages. Built for developers, data teams, content researchers, and AI pipelines that need text from video at scale.


What This Actor Does

The Instagram Transcript Scraper opens an Instagram video URL and produces a transcript using one of two methods:

  1. Native Captions: Reads Instagram's auto-generated captions directly from the API. Instant, with no audio download and zero extra cost when captions are available.
  2. Whisper AI: Downloads the media and runs OpenAI's Whisper speech-to-text model locally in the actor's container. Works on any video with speech, regardless of whether Instagram generated captions.

The auto mode (default) tries native captions first and falls back to Whisper only when they are unavailable, which keeps coverage high while minimizing processing time.
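
The mode semantics described above can be pictured with the following minimal sketch. The two callables, `fetch_native_captions` and `run_whisper`, are hypothetical stand-ins; the actor's real internals are not documented here, and the behavior of native mode when no captions exist (an empty result) is an assumption.

```python
from typing import Callable, Optional

# Each segment dict is assumed to look like {"start": float, "end": float, "text": str}.
Segments = list[dict]


def transcribe(
    url: str,
    method: str,
    fetch_native_captions: Callable[[str], Optional[Segments]],
    run_whisper: Callable[[str], Segments],
) -> tuple[str, Segments]:
    """Return (method_used, segments) for one video.

    Both callables are hypothetical placeholders used only for illustration.
    """
    if method not in {"auto", "native", "whisper"}:
        raise ValueError(f"unknown transcriptionMethod: {method!r}")

    if method in {"auto", "native"}:
        segments = fetch_native_captions(url)
        if segments:
            return "native", segments
        if method == "native":
            # Assumption: native-only mode has no fallback and yields nothing.
            return "native", []

    # Reached for "whisper", or for "auto" when no native captions were found.
    return "whisper", run_whisper(url)
```

The returned method name plays the role of the `transcriptionMethod` output field, which reports which path actually produced the transcript.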


Input Parameters

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| videoUrls | array | Yes | | One or more Instagram video/reel URLs to transcribe |
| transcriptionMethod | string | No | auto | auto, native, or whisper (see Transcription Methods) |
| whisperModel | string | No | base | Whisper model size: tiny, base, or small |
| language | string | No | auto-detect | Language code (e.g. en, es, fr). Leave empty for automatic detection |

Supported URL Formats

https://www.instagram.com/reel/ABC123xyz/
https://www.instagram.com/p/ABC123xyz/
https://www.instagram.com/tv/ABC123xyz/
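
If you want to pre-filter URLs before a run, a check along these lines may help. This is a sketch rather than the actor's own validation: the regular expression covers only the three path prefixes listed above, and the shortcode character set (letters, digits, `_`, `-`) is an assumption.

```python
import re
from typing import Optional

# Matches /reel/, /p/ and /tv/ URLs; the allowed shortcode characters are assumed.
_POST_URL = re.compile(
    r"^https?://(?:www\.)?instagram\.com/(reel|p|tv)/([A-Za-z0-9_-]+)/?(?:\?.*)?$"
)


def extract_shortcode(url: str) -> Optional[str]:
    """Return the shortcode if the URL matches a supported format, else None."""
    match = _POST_URL.match(url.strip())
    return match.group(2) if match else None
```

Profile URLs and other paths return `None`, so they can be dropped before they reach `videoUrls`.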

Input Example

{
  "videoUrls": [
    "https://www.instagram.com/reel/DV29mBcMQwp/",
    "https://www.instagram.com/p/DULBkEngpxg/"
  ],
  "transcriptionMethod": "auto",
  "whisperModel": "base",
  "language": ""
}
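
When the input is generated by a script, a small builder can catch invalid values before the run starts. The sketch below uses only the field names and allowed values documented in the table above; the helper name `build_input` is hypothetical.

```python
import json

ALLOWED_METHODS = {"auto", "native", "whisper"}
ALLOWED_MODELS = {"tiny", "base", "small"}


def build_input(video_urls, method="auto", model="base", language=""):
    """Assemble an input object with the documented field names and defaults."""
    if not video_urls:
        raise ValueError("videoUrls is required and must not be empty")
    if method not in ALLOWED_METHODS:
        raise ValueError(f"transcriptionMethod must be one of {sorted(ALLOWED_METHODS)}")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"whisperModel must be one of {sorted(ALLOWED_MODELS)}")
    return {
        "videoUrls": list(video_urls),
        "transcriptionMethod": method,
        "whisperModel": model,
        "language": language,  # an empty string leaves language detection automatic
    }


if __name__ == "__main__":
    example = build_input(["https://www.instagram.com/reel/DV29mBcMQwp/"])
    print(json.dumps(example, indent=2))
```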

Output Format

Each transcript segment is returned as a separate dataset row. All rows for the same video share the same metadata — this lets you filter by video, join on code, or stream segments in order by segmentIndex.

On error (private post, deleted video, unsupported media type), the actor emits a single row with url, errMsg, and timestamp only — no empty fields.

Output Fields

| Field | Type | Description |
|---|---|---|
| url | string | Canonical Instagram URL of the video |
| code | string | Instagram shortcode (unique identifier) |
| pk | string | Instagram internal numeric media ID |
| id | string | Combined media ID (pk_userId format) |
| title | string | Video caption text |
| img | string | Thumbnail image URL |
| videoUrl | string | Direct URL to the highest-quality MP4 |
| audioUrl | string | Direct URL to the audio-only track (omitted when unavailable) |
| createTime | string | Post creation time in UTC ISO 8601 format (e.g. 2025-05-10T14:32:03Z) |
| likeCount | integer | Number of likes |
| commentCount | integer | Number of comments |
| userPk | string | Creator's numeric Instagram user ID |
| userName | string | Creator's Instagram handle |
| userFullName | string | Creator's display name |
| avatarUri | string | Creator's profile picture URL |
| fullText | string | Complete transcript of the entire video |
| transcriptionMethod | string | native or whisper |
| segmentIndex | integer | Zero-based index of this segment |
| segmentStart | number | Segment start time in seconds |
| segmentEnd | number | Segment end time in seconds |
| segmentText | string | Transcript text for this segment |
| totalSegments | integer | Total segments in this video's transcript |
| timestamp | string | UTC ISO timestamp of when this record was scraped |
| errMsg | string | Error description (only present on failed records) |

Output Example — Success

{
  "url": "https://www.instagram.com/p/DV29mBcMQwp/",
  "code": "DV29mBcMQwp",
  "pk": "3852537424986049577",
  "id": "3852537424986049577_16278726",
  "title": "On Friday, US President Donald Trump claimed Iran's air force is \"no longer\"...",
  "img": "https://scontent-iad6-1.cdninstagram.com/...",
  "videoUrl": "https://scontent-iad6-1.cdninstagram.com/...mp4",
  "createTime": "2025-05-11T05:12:03Z",
  "likeCount": 16799,
  "commentCount": 1159,
  "userPk": "16278726",
  "userName": "bbcnews",
  "userFullName": "BBC News",
  "avatarUri": "https://scontent-iad3-1.cdninstagram.com/...",
  "fullText": "On Friday, U.S. President Donald Trump claimed Iran's Air Force is no longer...",
  "transcriptionMethod": "whisper",
  "timestamp": "2026-06-08T06:09:05.841Z",
  "segmentIndex": 0,
  "segmentStart": 0,
  "segmentEnd": 4.36,
  "segmentText": "On Friday, U.S. President Donald Trump claimed Iran's Air Force is no longer, as a result",
  "totalSegments": 22
}

Output Example — Error

{
  "url": "https://www.instagram.com/reel/DELETED123/",
  "errMsg": "Could not extract video data from page. The post may not exist or may not be a video.",
  "timestamp": "2026-06-08T06:09:15.168Z"
}
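
Because every segment arrives as its own row, downstream code usually needs to regroup rows by video and set error rows aside. The sketch below assumes the rows have already been loaded as a list of dictionaries with the field names documented above; the helper name `group_rows` is hypothetical.

```python
from collections import defaultdict


def group_rows(rows):
    """Split dataset rows into per-video transcripts and a list of error rows."""
    errors = [row for row in rows if "errMsg" in row]

    by_code = defaultdict(list)
    for row in rows:
        if "errMsg" not in row:
            by_code[row["code"]].append(row)

    transcripts = {}
    for code, segs in by_code.items():
        # Rows may arrive in any order, so restore segment order explicitly.
        segs.sort(key=lambda r: r["segmentIndex"])
        transcripts[code] = {
            "url": segs[0]["url"],
            "fullText": segs[0]["fullText"],
            "segments": [
                (r["segmentStart"], r["segmentEnd"], r["segmentText"]) for r in segs
            ],
        }
    return transcripts, errors
```

Keying on `code` works because all rows for one video share the same metadata, while error rows carry only `url`, `errMsg`, and `timestamp`.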

Transcription Methods

Auto (default)

Tries Instagram's native captions first. Falls back to Whisper AI automatically when native captions are unavailable. Best balance of speed and coverage for most workloads.

Native

Only reads Instagram's built-in auto-generated captions. Fastest option — no audio download needed. May not be available on older posts, non-Reel content, or videos without speech.

Whisper AI

Always downloads the video and runs local AI speech-to-text. Consistent coverage for any video with speech, independent of Instagram's captioning availability.

Model comparison:

| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| tiny | 39 MB | Fastest | Basic | Quick previews, speed-critical pipelines |
| base | 74 MB | Fast | Good | Most use cases |
| small | 244 MB | Moderate | Very good | Accented speech, technical or specialized content |

Use Cases

  • AI agents & LLM pipelines — Feed Instagram video speech into RAG systems, summarizers, or classifiers
  • Content research — Extract and analyze what creators are saying across a topic or niche
  • Social media monitoring — Capture spoken claims in video content for brand or news tracking
  • Subtitle generation — Generate timestamped captions for repurposed video content
  • Competitive intelligence — Batch-transcribe competitor or industry video content
  • Accessibility — Build searchable archives of spoken video content

Supported Languages

The Whisper model supports 99+ languages with automatic detection. For best accuracy on non-English content, set the language field explicitly. The language dropdown includes:

English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, Korean, Arabic, Hindi, Dutch, Polish, Turkish, Swedish, Danish, Finnish, Norwegian, Czech, Romanian, Hungarian, Indonesian, Vietnamese, Thai, Ukrainian, Hebrew, Persian, Malay, and more.


Limitations

  • Public videos only — Private accounts, restricted posts, and login-gated content cannot be scraped.
  • Audio quality matters — Whisper accuracy degrades on videos with heavy background music, multiple overlapping speakers, or very low recording quality.
  • Native captions not always available — Instagram doesn't generate captions for all videos. Short clips, older posts, or posts with music-only audio may have no native captions; the actor falls back to Whisper automatically in auto mode.
  • Instagram CDN URLs expire — The videoUrl, audioUrl, img, and avatarUri URLs in the output are time-limited CDN links. Download media promptly; do not rely on these URLs as permanent storage.
  • Rate limiting — Processing many videos in rapid succession may trigger temporary rate limits. The actor includes automatic delays between requests.
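
Since the CDN links expire, one way to keep the media is to download each video shortly after the run. The sketch below uses only the Python standard library and the documented `code`, `videoUrl`, and `errMsg` fields; the helper names and the `media` output directory are assumptions, and a download may still fail if a link has already expired.

```python
import urllib.request
from pathlib import Path


def unique_media(rows):
    """Map each video's shortcode to its videoUrl, skipping error rows."""
    media = {}
    for row in rows:
        if "errMsg" in row or not row.get("videoUrl"):
            continue
        # Every segment row repeats the same URL, so keep the first one seen.
        media.setdefault(row["code"], row["videoUrl"])
    return media


def download_all(rows, out_dir="media"):
    """Save one MP4 per video into out_dir, named by shortcode."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for code, url in unique_media(rows).items():
        path = target / f"{code}.mp4"
        urllib.request.urlretrieve(url, path)  # raises on HTTP or network errors
        saved.append(path)
    return saved
```

Deduplicating by `code` first avoids fetching the same file once per segment row.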

FAQ

Q: Does this work with Instagram Reels, regular video posts, and IGTV?
A: Yes. All three URL formats (/reel/, /p/, /tv/) are supported.

Q: What if the video has no speech (music only or silent)?
A: Whisper will return an empty or near-empty transcript. Native captions won't exist. The actor returns a single row with empty fullText and segmentText.

Q: Can I process multiple videos in one run?
A: Yes. Provide as many URLs as you need in the videoUrls array. The actor processes them sequentially with automatic pauses between requests.

Q: Why does my run return "errMsg": "Could not extract video data..."?
A: The post is likely private, deleted, or restricted for your region. Verify the post is publicly accessible in a browser without being logged in.

Q: How accurate is Whisper transcription?
A: The base model is accurate for clear, single-speaker English audio. For accented speech, fast speech, or specialized vocabulary, use small. For multilingual content, setting the language field explicitly improves results.

Q: Are the video/audio URLs in the output permanent?
A: No. Instagram CDN URLs are signed and expire within hours or days. Download the media during the run if you need to store it.

Q: What languages does auto-detection support?
A: Whisper's automatic language detection covers 99+ languages. Detection is most reliable for videos with at least 30 seconds of speech.

Q: Does this require Instagram login or cookies?
A: The actor uses a shared cookie pool for Instagram session access. No credentials are required from the user.