AI Video Data Extractor: Youtube, Instagram, TikTok, FB, X, etc

Turn any video into structured JSON with AI. Define your custom schema (strings, numbers, arrays, objects, enums, booleans), provide video URLs, and get perfectly formatted data back. Works on YouTube, TikTok, X, Instagram, Facebook. 99+ languages. Smart retry on rate limits. No parsing code needed.

Pricing: from $20.00 / 1,000 results

Rating: 5.0 (1)

Developer: InVideoIQ (maintained by community)

Actor stats: 2 bookmarked, 6 total users, 4 monthly active users, last modified 3 days ago

🚀 AI Video Data Extractor: Turn Any Video Into Structured JSON

Define a JSON schema. Paste video URLs from YouTube, TikTok, X, Instagram, Facebook, Vimeo, Loom, and more.

Get back clean, structured JSON that exactly matches your schema.

No transcript parsing. No brittle prompts. No inconsistent outputs.

Just structured data ready for APIs, databases, CRMs, spreadsheets, or AI pipelines.

💰 $0.02 per video • 🌍 99+ languages • ⚡ Fast extraction


🎯 What Can You Extract From Videos?

Most video AI tools give you summaries or transcripts.

Those are useful for reading, but hard to automate.

This actor goes one step further:

➡️ Extract structured data directly from videos

Examples of what you can extract:

  • companies mentioned
  • products and brands
  • people speaking
  • pricing or offers
  • claims or positioning
  • topics and key insights
  • quotes or hooks
  • FAQs and objections
  • CTAs or promotions

Instead of reading the video yourself, you get a structured dataset.


💡 Example: How To Extract Structured Data From a Video

Input

Video URL:

https://youtube.com/watch?v=veRbckoCwkc

Schema:

{
  "main_topic": {
    "type": "String",
    "description": "Main theme of the video"
  },
  "people_mentioned": {
    "type": "Array",
    "description": "People mentioned in the video",
    "items": {
      "type": "String",
      "description": "Name of a person mentioned"
    }
  },
  "tools_mentioned": {
    "type": "Array",
    "description": "Tools or products mentioned",
    "items": {
      "type": "String",
      "description": "Name of the tool"
    }
  }
}

Output

{
  "video_url": "...",
  "main_topic": "Optimizing productivity using AI tools",
  "people_mentioned": ["Sam Altman"],
  "tools_mentioned": ["ChatGPT", "Notion AI"]
}

Now your pipeline can store or process the data automatically.
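As one illustration of "store or process", the output item above could be written to a database. The sketch below uses Python's built-in sqlite3 module; the table name `video_insights` and the choice to store array fields as JSON text are assumptions made for this example, not something the actor prescribes.

```python
import json
import sqlite3

# Example item copied from the output shown above.
item = {
    "video_url": "...",
    "main_topic": "Optimizing productivity using AI tools",
    "people_mentioned": ["Sam Altman"],
    "tools_mentioned": ["ChatGPT", "Notion AI"],
}


def store_item(conn, item):
    """Insert one extracted item; array fields are stored as JSON text."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS video_insights ("
        "video_url TEXT PRIMARY KEY, main_topic TEXT, "
        "people_mentioned TEXT, tools_mentioned TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO video_insights VALUES (?, ?, ?, ?)",
        (
            item["video_url"],
            item["main_topic"],
            json.dumps(item["people_mentioned"]),
            json.dumps(item["tools_mentioned"]),
        ),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
store_item(conn, item)
row = conn.execute(
    "SELECT main_topic, tools_mentioned FROM video_insights"
).fetchone()
print(row[0], json.loads(row[1]))
```

Using the video URL as the primary key means re-running the same video overwrites the earlier row instead of duplicating it.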


✨ Why Use AI Video Data Extractor?

Most video tools produce:

  • transcripts
  • summaries
  • chat interfaces

Those are great for humans but difficult for automation.

This actor is designed for data pipelines and automation workflows.

Custom schema extraction

You define the exact JSON structure you want returned.

Multi-platform support

Works with video URLs from:

  • YouTube
  • TikTok
  • Instagram
  • Facebook
  • X (Twitter)
  • Vimeo
  • Loom
  • Dailymotion
  • Rumble

Built for automation

Outputs clean JSON ready for:

  • APIs
  • databases
  • spreadsheets
  • CRMs
  • LLM pipelines
  • data analysis

Fast extraction

If a transcript already exists, the actor goes directly to structured extraction.

Cost-efficient

Most standard runs cost:

$0.02 per result

Even when analyzing thousands of videos.

Works even when transcripts don't exist

For videos without subtitles, the actor can automatically generate a transcript first using speech-to-text models, then run the extraction.


🔄 Extract Data From Videos Without Subtitles

Some videos (especially Instagram) do not provide subtitles.

When transcribe_if_transcript_missing is enabled, the actor automatically:

  1. Attempts extraction using available transcripts
  2. If none exist, generates a transcript via speech-to-text models
  3. Caches the generated transcript on the backend and runs the extraction again using it

You don't need to handle transcription yourself.

This significantly improves success rate for social video sources.

Important notes:

  • Instagram videos go directly to transcription first
  • Transcription fallback only applies to non-YouTube platforms
  • Transcription adds an additional cost, but only the first time. Once the transcript is cached, you can run as many schemas on it as you like without additional transcription costs
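The steps and notes above can be read as a small decision procedure. The sketch below is only an interpretation of the documented behavior, not the actor's actual code; the function name, the `platform` and `has_transcript` parameters, and the returned labels are all hypothetical.

```python
def plan_transcript_path(platform, has_transcript, transcribe_if_transcript_missing):
    """Return which path a video would take, per the notes above.

    Illustrative only: the labels returned here are made up for this sketch.
    """
    platform = platform.lower()

    if platform == "instagram":
        # Note 1: Instagram videos go directly to transcription first.
        if transcribe_if_transcript_missing:
            return "transcribe_then_extract"
        return "no_transcript_available"

    if has_transcript:
        # Step 1: an available transcript is used as-is.
        return "extract"

    if platform == "youtube":
        # Note 2: the transcription fallback does not apply to YouTube.
        return "no_transcript_available"

    if transcribe_if_transcript_missing:
        # Steps 2-3: generate a transcript, cache it, then extract.
        return "transcribe_then_extract"

    return "no_transcript_available"


for args in [
    ("youtube", True, False),
    ("tiktok", False, True),
    ("youtube", False, True),
    ("instagram", False, True),
]:
    print(args, "->", plan_transcript_path(*args))
```

The sketch ignores caching: per the last note, a video whose transcript was already generated on an earlier run would not be transcribed (or billed for transcription) again.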

👥 Who Uses AI Video Data Extractor?

This actor is designed for teams that process video content at scale.

Lead generation teams

Extract:

  • company names
  • founders
  • pricing offers
  • CTAs
  • emails or phone numbers
  • pain points

Market research teams

Turn competitor videos into structured datasets:

  • topics
  • positioning
  • pricing
  • features
  • objections

Content teams

Extract:

  • hooks
  • key quotes
  • product mentions
  • content angles
  • topics

AI builders

Feed structured outputs into:

  • AI agents
  • RAG pipelines
  • enrichment workflows
  • scoring systems

Brand and e-commerce teams

Analyze social videos for:

  • product mentions
  • promotions
  • sentiment
  • creator messaging

📈 Use Cases for Video Data Extraction

  • Extract products, brands, prices, discounts, and claims from TikTok or Instagram videos
  • Convert YouTube interviews into speaker insights, key takeaways, objections, and company mentions
  • Turn webinars into FAQs, action items, roadmap items, and feature requests
  • Monitor creators for brand mentions or sponsorships to support trend spotting, competitor monitoring, and content research
  • Extract structured fields for CRM enrichment, sales intelligence, or knowledge base ingestion
  • Build datasets from video content for classification, benchmarking, or LLM evaluation

🛠️ How To Extract Data From Videos Using This Actor

Step 1: Open the actor and enter your video URLs

Paste one or more public video URLs into the video_urls field. You can mix platforms freely in a single run.

Step 2: Define your JSON schema

In the schema field, describe exactly which fields you want extracted. Each field needs a type and a description.

Step 3: (Optional) Add extraction instructions

Use what_to_extract to guide the AI with natural language, for example: "Focus on the products discussed and the speaker's opinion on each."

Step 4: Run and get structured JSON

The actor retrieves or generates the transcript, runs AI extraction, and returns clean JSON matching your schema: one dataset item per video. You can download the extracted dataset in JSON, CSV, Excel, or HTML format directly from the Apify dashboard.
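Steps 1 to 3 amount to assembling a single input object. The sketch below builds one from the field names mentioned in this document (`video_urls`, `schema`, `what_to_extract`, `transcribe_if_transcript_missing`). Whether the actor expects `schema` as a nested object or as a JSON string is an assumption here; check the actor's Input tab for the exact shape.

```python
import json


def build_run_input(video_urls, schema, what_to_extract=None,
                    transcribe_if_transcript_missing=False):
    """Assemble the actor input from the fields described in Steps 1-3."""
    if not video_urls:
        raise ValueError("at least one video URL is required")
    run_input = {
        "video_urls": list(video_urls),
        "schema": schema,  # assumed to be passed as a nested object
        "transcribe_if_transcript_missing": transcribe_if_transcript_missing,
    }
    if what_to_extract:
        run_input["what_to_extract"] = what_to_extract
    return run_input


schema = {
    "main_topic": {"type": "String", "description": "Main theme of the video"},
    "tools_mentioned": {
        "type": "Array",
        "description": "Tools or products mentioned",
        "items": {"type": "String", "description": "Name of the tool"},
    },
}

run_input = build_run_input(
    ["https://youtube.com/watch?v=veRbckoCwkc"],
    schema,
    what_to_extract="Focus on the products discussed and the speaker's opinion on each.",
)
print(json.dumps(run_input, indent=2))
```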

Run it your way

Because this is an Apify Actor, you also get:

  • API access: Call it programmatically from any language or platform. Check the API tab for ready-made code examples
  • Scheduling: Set up recurring runs to monitor video content automatically
  • Integrations: Connect to Zapier, Make, Google Sheets, webhooks, and more
  • Monitoring: Track run history, costs, and results from the Apify dashboard
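For API access from Python, a run could look roughly like the sketch below, which uses the `apify-client` package. The actor ID is a placeholder (copy the real one from the API tab), `APIFY_TOKEN` is an assumed environment variable name, and `run_extractor` and `main` are helper names invented for this example.

```python
def run_extractor(client, actor_id, run_input):
    """Start a run, wait for it to finish, and return the dataset items.

    `client` is expected to be an apify_client.ApifyClient instance.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the actor run returned no run details")
    # Each dataset item corresponds to one video.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main():
    # Imported here so the helper above can be reused without the package.
    import os
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run_input = {
        "video_urls": ["https://youtube.com/watch?v=veRbckoCwkc"],
        "schema": {
            "main_topic": {
                "type": "String",
                "description": "Main theme of the video",
            }
        },
    }
    # Placeholder ID: replace with the ID shown on the actor's API tab.
    for item in run_extractor(client, "<username>/<actor-name>", run_input):
        print(item)
```

Call `main()` to run it. Passing the client in as a parameter keeps `run_extractor` easy to test with a stub.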

🔗 Supported Video Platforms

| Platform | Notes |
| --- | --- |
| YouTube | Full support, uses available subtitles |
| TikTok | Full support |
| Instagram | Requires transcribe_if_transcript_missing enabled (most Instagram videos lack subtitle tracks) |
| Facebook | Public videos |
| X (Twitter) | Paste the tweet URL containing the video |
| Loom | Full support |
| Dailymotion | Full support |
| Vimeo | Full support |
| Rumble | Full support |

📋 Input and Output Example

{
  "main_topic": {
    "type": "String",
    "description": "The overarching theme of the discussion"
  },
  "summary": {
    "type": "String",
    "description": "A 3-4 sentence summary covering the key points, recommendations, and takeaways from the video"
  },
  "foundational_habits": {
    "type": "Array",
    "description": "Basic habits required before adding supplements such as sleep or nutrition",
    "items": {
      "type": "String",
      "description": "Name of a foundational habit"
    }
  },
  "supplements_mentioned": {
    "type": "Array",
    "description": "List of all supplements discussed",
    "items": {
      "type": "Object",
      "description": "Information about a specific supplement",
      "properties": {
        "name": {
          "type": "String",
          "description": "Name of the supplement"
        },
        "category": {
          "type": "Enum",
          "values": ["Fatty Acid", "Amino Acid/Protein", "Adaptogen", "Vitamin/Mineral", "Other"],
          "description": "Categorization of the supplement"
        },
        "recommended_dosage_mg": {
          "type": "Number",
          "description": "Recommended daily dosage in milligrams if mentioned. Use 0 if not mentioned."
        },
        "is_weight_dependent": {
          "type": "Boolean",
          "description": "Whether the dosage needs to be adjusted based on body weight"
        }
      }
    }
  }
}

Example Dataset Output

{
  "video_url": "https://www.youtube.com/watch?v=veRbckoCwkc",
  "main_topic": "Optimal Supplementation for Health and Performance",
  "summary": "The video explores how to optimize health and performance through targeted supplementation. It emphasizes that foundational habits like sleep, nutrition, and exercise must be in place before adding supplements. Key supplements discussed include Omega-3 fatty acids for general health and Creatine for performance, with specific dosage guidance provided.",
  "foundational_habits": [
    "Getting adequate sleep",
    "Proper nutrition and hydration",
    "Regular exercise routine"
  ],
  "supplements_mentioned": [
    {
      "name": "Omega-3 Fatty Acids",
      "category": "Fatty Acid",
      "recommended_dosage_mg": 1000,
      "is_weight_dependent": false
    },
    {
      "name": "Creatine Monohydrate",
      "category": "Amino Acid/Protein",
      "recommended_dosage_mg": 5000,
      "is_weight_dependent": true
    }
  ]
}

💳 How Much Does AI Video Data Extraction Cost?

Pricing is designed to stay affordable at scale. On the Apify free plan, you get $5 of platform usage credits per month, enough for up to about 250 standard extractions at $0.02 each, so you can test the actor before committing to a paid plan.

Standard extraction

$0.02 per result

In the normal case:

1 video = 1 result

Transcription fallback

If transcript generation is required:

+$0.035 per transcription

Long transcript scaling

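Each block of up to 15,000 tokens counts as 1 billed result unit, and a partial block is billed as a full unit (for example, 32,000 tokens is billed as 3 units).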

Approximate reference:

15,000 tokens ≈ 1 hour 15 minutes of speech

Examples

  • Normal extraction with total_tokens = 8,000: $0.02
  • Normal extraction with total_tokens = 12,000: $0.02
  • Successful extraction with total_tokens = 32,000: $0.06
  • Video "X" needs transcription fallback and extraction succeeds with total_tokens = 12,000: $0.055 the first time
  • A secondary extraction on the same video "X" succeeds with total_tokens = 12,000: $0.02
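The examples above can be reproduced with a small calculator. This is a sketch, assuming that partial 15,000-token blocks round up (inferred from the 32,000-token example) and that the transcription surcharge is a flat fee charged only when a transcript is actually generated; the function and constant names are illustrative.

```python
import math

BASE_PRICE_PER_UNIT = 0.02   # USD per billed result unit
TRANSCRIPTION_PRICE = 0.035  # USD, charged when a transcript is generated
TOKENS_PER_UNIT = 15_000     # tokens covered by one billed result unit


def estimate_cost(total_tokens, needs_transcription=False):
    """Estimate the USD cost of one extraction under the rules above."""
    # Assumption: every video is billed for at least one unit,
    # and partial blocks of tokens round up to a full unit.
    units = max(1, math.ceil(total_tokens / TOKENS_PER_UNIT))
    cost = units * BASE_PRICE_PER_UNIT
    if needs_transcription:
        cost += TRANSCRIPTION_PRICE
    return round(cost, 3)


for tokens, transcribe in [
    (8_000, False),
    (12_000, False),
    (32_000, False),
    (12_000, True),   # first run on a video that needs the fallback
    (12_000, False),  # later run on the same video (transcript cached)
]:
    print(tokens, transcribe, estimate_cost(tokens, transcribe))
```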

⚠️ Schema Rules

To keep extraction reliable:

  • Every field must include a description
  • Max 10 root fields
  • Max 3 nesting levels
  • Level 3 must contain only primitive values
  • Max 10 subfields per object

Supported types:

String
Number
Boolean
Integer
Array
Object
Enum
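A schema can be checked against these rules locally before starting a run. The validator below is a sketch, not the actor's own validation. It assumes root fields are level 1, that an Array's `items` and an Object's `properties` sit one level below their parent, and that Enum counts as a primitive. Under that reading the supplements schema shown earlier passes.

```python
PRIMITIVE_TYPES = {"String", "Number", "Boolean", "Integer", "Enum"}
CONTAINER_TYPES = {"Array", "Object"}
MAX_ROOT_FIELDS = 10
MAX_SUBFIELDS = 10
MAX_LEVELS = 3


def check_field(path, field, level, errors):
    """Check one field definition and recurse into items/properties."""
    if not field.get("description"):
        errors.append(f"{path}: missing description")
    field_type = field.get("type")
    if field_type not in PRIMITIVE_TYPES | CONTAINER_TYPES:
        errors.append(f"{path}: unsupported type {field_type!r}")
        return
    if field_type in PRIMITIVE_TYPES:
        return
    if level >= MAX_LEVELS:
        errors.append(f"{path}: level {MAX_LEVELS} must contain only primitive values")
        return
    if field_type == "Array":
        items = field.get("items")
        if isinstance(items, dict):
            check_field(f"{path}[]", items, level + 1, errors)
        else:
            errors.append(f"{path}: Array needs an 'items' definition")
    else:  # Object
        properties = field.get("properties", {})
        if len(properties) > MAX_SUBFIELDS:
            errors.append(f"{path}: more than {MAX_SUBFIELDS} subfields")
        for name, sub_field in properties.items():
            check_field(f"{path}.{name}", sub_field, level + 1, errors)


def validate_schema(schema):
    """Return a list of rule violations; an empty list means the schema passes."""
    errors = []
    if len(schema) > MAX_ROOT_FIELDS:
        errors.append(f"more than {MAX_ROOT_FIELDS} root fields")
    for name, field in schema.items():
        check_field(name, field, 1, errors)
    return errors


example = {
    "products": {
        "type": "Array",
        "description": "Products mentioned in the video",
        "items": {
            "type": "Object",
            "description": "One product",
            "properties": {
                "name": {"type": "String", "description": "Product name"},
                "price": {"type": "Number"},  # description left out on purpose
            },
        },
    }
}
print(validate_schema(example))
```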

Tip: the best schemas are specific. Instead of asking for a vague "summary", define the business fields you actually want, such as products, pain_points, pricing, claims, cta, audience, or sentiment.


⚠️ Limitations

  • Transcript length limit: Very long videos (over 3 hours) may fail if the transcript exceeds the processing token limit.
  • Transcript availability: If a video has no available transcript, enable transcribe_if_transcript_missing to automatically generate one via speech-to-text. Currently, the transcription fallback does not support YouTube videos.
  • Fallback adds cost: Enabling transcript generation improves coverage but incurs an additional speech-to-text charge (only on the first run; transcripts are cached for subsequent extractions).

❓ FAQ

This actor processes publicly available video content and does not extract private user data such as email addresses, gender, or location β€” only information that users have chosen to share publicly. However, results may contain personal data. Personal data is protected by the GDPR in the European Union and by other regulations around the world. You should not extract personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers.

What happens if a video has no transcript?

If transcribe_if_transcript_missing is enabled, the actor automatically generates a transcript using speech-to-text and then runs extraction. This works for most social video platforms. YouTube videos currently use available subtitles only.

Can I run different schemas on the same video?

Yes. Run the same video through multiple schemas (one for lead gen, another for content research, another for brand sentiment) and merge the results in your pipeline. Transcription is cached after the first run, so subsequent extractions only incur the base extraction cost.

What languages are supported?

The actor supports 99+ languages. Results are returned in the same language used in your schema descriptions and field definitions.

How do I integrate the results into my workflow?

You can download results as JSON, CSV, Excel, or HTML from the Apify dashboard. You can also access results programmatically via the API tab, connect to Zapier, Make, Google Sheets, or use webhooks for real-time data transfer.


Use this actor when you need structured JSON answers from video content.

Use one of the related actors below when your need is different or combine them for more powerful workflows:

  • Video Transcript Extractor: pay per result, $10 / 1000 results. Best for transcript retrieval plus rich metadata.
  • Video Transcript Scraper: rental model, $20 / month + usage. Best if you prefer the rental pricing model for transcript and metadata retrieval.
  • Video Transcriber: best when you need speech-to-text for videos that do not already have transcripts or subtitles.

Workflow ideas

  • Transcript + Extraction: Use Video Transcript Extractor to get raw transcripts for archiving, then run this actor on the same URLs with a custom schema for structured insights. This gives you two complementary outputs from a single video library.
  • Social monitoring pipeline: Schedule this actor to run daily on new creator or competitor video URLs. Feed the structured JSON into Google Sheets, a database, or a webhook for automated alerting.
  • Multi-schema analysis: Run the same video through multiple schemas (one for lead gen fields, another for content research, another for brand sentiment) and merge the results in your pipeline.
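The merge step in the multi-schema workflow can be as simple as grouping items by `video_url`. The sketch below nests each schema's fields under a label so that identically named fields (such as `summary`) do not overwrite each other; the function name and the labels `lead_gen` and `sentiment` are hypothetical.

```python
def merge_by_video(results_by_schema, key="video_url"):
    """Combine dataset items from several runs into one record per video.

    `results_by_schema` maps a label for each schema to that run's items.
    """
    merged = {}
    for schema_name, items in results_by_schema.items():
        for item in items:
            url = item[key]
            record = merged.setdefault(url, {key: url})
            # Keep each schema's fields under its own label.
            record[schema_name] = {k: v for k, v in item.items() if k != key}
    return list(merged.values())


lead_gen_items = [{"video_url": "https://example.com/v1", "company": "Acme"}]
sentiment_items = [{"video_url": "https://example.com/v1", "sentiment": "positive"}]

combined = merge_by_video({"lead_gen": lead_gen_items, "sentiment": sentiment_items})
print(combined)
```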

💬 Support

Feature requests and improvements are welcome.

Open an issue in the Issues tab if you need:

  • new schema capabilities
  • platform improvements
  • bug fixes
  • performance enhancements

Need a custom workflow or integration? Reach out through the Issues tab. We're happy to help tailor the actor to your use case.