Sora Scraper

Pricing

Pay per event

Developed by

Lexis Solutions

Maintained by Community

Extract posts, media, user profiles, comments, and engagement metrics from OpenAI’s Sora 2 community. Suited to trend analysis, content curation, influencer tracking, and research, with configurable limits for comments, media downloads, and result counts.

5.0 (4)

Last modified

2 days ago

Sora Scraper

The Sora Scraper is an Apify actor for OpenAI's Sora 2, the latest generation of OpenAI's AI video creation platform. It extracts posts, videos, engagement metrics, user profiles, and comments from Sora's community.


✨ Key Features

  • 🚀 First-to-Market: The only scraper built specifically for Sora 2.
  • 🎬 Media Downloads: Download videos, thumbnails, and GIFs directly from posts.
  • 💬 Comment Extraction: Capture detailed comment threads with engagement data.
  • 👤 Rich Profile Data: Extract complete user profiles including followers, verified status, and more.
  • 📊 Engagement Metrics: Views, unique views, likes, remixes, replies, and recursive replies.
  • 🔍 Flexible Search: Query any topic and discover Sora's creative community.
  • ⚙️ Granular Control: Configure comment limits, media downloads, and result counts.
  • 📦 Structured Output: Normalized JSON data ready for analysis and integration.

💡 Why It's Important

Sora represents OpenAI's breakthrough in AI-generated video content, and Sora 2 is the latest evolution. With this scraper, you can:

  • Monitor trending content in the AI video generation space.
  • Analyze user engagement patterns and viral content characteristics.
  • Archive creative works for research, inspiration, or competitive analysis.
  • Track community growth and user interactions in real time.
  • Build datasets for AI content analysis, sentiment studies, and trend forecasting.

👤 Who Is It For?

  • AI Researchers studying generative video trends and user behavior.
  • Content Creators seeking inspiration and understanding what resonates.
  • Marketing Agencies monitoring brand mentions and creative trends.
  • Data Scientists building datasets for machine learning and analytics.
  • Media Companies tracking viral content and emerging creators.
  • Developers building applications around AI-generated content.

🚀 Business Use Cases

  • Trend Analysis: Identify viral prompts, themes, and creative patterns.
  • Content Curation: Aggregate and showcase top Sora creations.
  • Competitive Intelligence: Track how competitors use AI video generation.
  • Influencer Discovery: Find trending creators and their engagement rates.
  • Brand Monitoring: Track mentions and sentiment around your brand.
  • Research & Development: Build datasets for AI content analysis.
  • Market Research: Understand user preferences in AI-generated content.

🛠 Input Schema

The actor accepts the following input:

{
  "query": "SpongeBob",
  "numOfComments": 10,
  "downloadVideo": false,
  "downloadThumbnail": false,
  "downloadGIF": false,
  "maxItems": 10,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

Input Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | Search query to find Sora posts (e.g., "SpongeBob", "sunset timelapse") |
| numOfComments | integer | No | Maximum number of comments to extract per post (default: 10) |
| downloadVideo | boolean | No | Download video files to key-value store. Key stored in videoStoreKey (default: false) |
| downloadThumbnail | boolean | No | Download thumbnail images to key-value store. Key stored in thumbnailStoreKey (default: false) |
| downloadGIF | boolean | No | Download GIF previews to key-value store. Key stored in gifStoreKey (default: false) |
| maxItems | integer | No | Maximum number of posts to scrape (default: 10) |
| proxyConfiguration | object | No | Apify proxy configuration for requests |

Notes:

  • Required field: query is mandatory.
  • Media Downloads: Enabling video, thumbnail, or GIF downloads will increase run time but provide direct access to media files in the key-value store.
  • Comments: Set numOfComments to control how many comments are extracted per post. Comments include full profile data and engagement metrics.
  • Performance: Higher maxItems and media downloads will consume more resources and time.
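
The parameters and defaults listed above can be assembled in code before starting a run. The following is a minimal Python sketch and not part of the actor itself: `build_sora_input` is a hypothetical helper name, and the validation rules (non-empty query, non-negative comment limit, at least one item) are assumptions inferred from the parameter descriptions, not documented actor behavior.

```python
from typing import Any, Dict, Optional


def build_sora_input(
    query: str,
    num_of_comments: int = 10,
    download_video: bool = False,
    download_thumbnail: bool = False,
    download_gif: bool = False,
    max_items: int = 10,
    proxy_configuration: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build an input object using the field names and defaults listed above."""
    # "query" is the only required field.
    if not query or not query.strip():
        raise ValueError("query is required and must not be empty")
    # Assumed sanity checks; the actor may enforce different limits.
    if num_of_comments < 0:
        raise ValueError("num_of_comments must not be negative")
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    return {
        "query": query.strip(),
        "numOfComments": num_of_comments,
        "downloadVideo": download_video,
        "downloadThumbnail": download_thumbnail,
        "downloadGIF": download_gif,
        "maxItems": max_items,
        # Mirrors the sample input when no proxy settings are given.
        "proxyConfiguration": proxy_configuration or {"useApifyProxy": False},
    }


run_input = build_sora_input("SpongeBob", max_items=10)
```

The resulting dictionary has the same shape as the sample input and could be supplied as the run input through the Apify console or an API client.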

📦 Output Schema

Each dataset item contains comprehensive post data:

{
  "id": "s_68dca5d7d4ac8191987e9c6393d498d4",
  "text": "spongebob as a ww2 leader speaking about the scourge of fish ruining bikini bottom wearing axis power uniform",
  "caption": null,
  "link": "https://sora.chatgpt.com/p/s_68dca5d7d4ac8191987e9c6393d498d4",
  "coverUrl": "https://videos.openai.com/vg-assets/...",
  "gifUrl": "https://videos.openai.com/vg-assets/...",
  "postedAt": 1759290839.830908,
  "updatedAt": 1759936985.530838,
  "likes": 1289,
  "replies": 43,
  "views": 39477,
  "uniqueViews": 24713,
  "remixes": 76,
  "recursiveReplies": 70,
  "dislikeCount": 0,
  "workspaceId": null,
  "postedToPublic": true,
  "emoji": "🧽",
  "attachments": [
    {
      "id": "s_68dca5d7d4ac8191987e9c6393d498d4-attachment-0",
      "title": "New Video",
      "url": "https://sdmntprsouthcentralus.oaiusercontent.com/files/...",
      "downloadableUrl": "https://sdmntprsouthcentralus.oaiusercontent.com/files/...",
      "thumbnail": "https://videos.openai.com/vg-assets/...",
      "gif": "https://videos.openai.com/vg-assets/...",
      "width": 352,
      "height": 640,
      "generationId": "gen_01k6eyadhqezmskzd31pp2n2xm",
      "generationType": "video_gen"
    }
  ],
  "profile": {
    "id": "user-vmw00GfT7mSYdcIST7bLbwCF",
    "username": "jakeleventhal",
    "displayName": "Jake Leventhal",
    "profilePictureUrl": "https://sdmntprnorthcentralus.oaiusercontent.com/files/...",
    "coverPhotoUrl": null,
    "link": "https://sora.chatgpt.com/profile/jakeleventhal",
    "verified": false,
    "followerCount": 2664,
    "followingCount": 7,
    "postCount": 61,
    "replyCount": 0,
    "likesReceivedCount": 22854,
    "remixCount": 1606,
    "cameoCount": 33,
    "isBlocked": false,
    "followedBy": [],
    "planType": null,
    "createdAt": 1753852741.285583,
    "updatedAt": 1759951105.520806,
    "bannedAt": null,
    "calpicoIsEnabled": true,
    "soraWhoCanMessageMe": "followees_only",
    "isPublicFigure": false,
    "location": null,
    "description": null,
    "birthday": null,
    "website": null
  },
  "videoStoreKey": "s_68dca5d7d4ac8191987e9c6393d498d4_video_0.mp4",
  "thumbnailStoreKey": "s_68dca5d7d4ac8191987e9c6393d498d4_thumbnail_0.webp",
  "gifStoreKey": "s_68dca5d7d4ac8191987e9c6393d498d4_gif_0.gif",
  "comments": [
    {
      "id": "68dcb87374948191bc6c9f88b5ea723e",
      "text": "Ts gonna be the reason Viacom gonna shut this down😭😭",
      "caption": null,
      "postedAt": 1759295603.455459,
      "updatedAt": 1759530609.903473,
      "likes": 16,
      "parentPostId": "s_68dca5d7d4ac8191987e9c6393d498d4",
      "rootPostId": "s_68dca5d7d4ac8191987e9c6393d498d4",
      "postUrl": "https://sora.chatgpt.com/p/s_68dca5d7d4ac8191987e9c6393d498d4",
      "profile": {
        "id": "user-PDq6JrFlZ0qjFVKrdeAmiTnh",
        "username": "skipppz",
        "displayName": "C",
        "profilePictureUrl": "https://cdn.openai.com/sora/images/profile_placeholder_v4.png",
        "verified": false,
        "followerCount": 1,
        "followingCount": 2,
        "postCount": 9,
        "replyCount": 9,
        "likesReceivedCount": 98,
        "remixCount": 2,
        "cameoCount": 0
      }
    }
  ]
}
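
Because each dataset item is nested (post, profile, attachments, comments), tabular analysis usually needs a flattening step. The sketch below shows one way to do that in Python. The helper name `flatten_post` and the derived columns `attachmentCount` and `commentCount` are illustrative assumptions, while all keys it reads come from the sample item above.

```python
from typing import Any, Dict


def flatten_post(item: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce one nested dataset item to a flat row for CSV or spreadsheet use."""
    # Nested objects may be missing or null, so fall back to empty containers.
    profile = item.get("profile") or {}
    attachments = item.get("attachments") or []
    comments = item.get("comments") or []
    return {
        "id": item.get("id"),
        "link": item.get("link"),
        "text": item.get("text"),
        "postedAt": item.get("postedAt"),
        "likes": item.get("likes"),
        "views": item.get("views"),
        "uniqueViews": item.get("uniqueViews"),
        "remixes": item.get("remixes"),
        "username": profile.get("username"),
        "followerCount": profile.get("followerCount"),
        # Derived columns (hypothetical names, not produced by the actor).
        "attachmentCount": len(attachments),
        "commentCount": len(comments),
    }
```

Rows produced this way can be written with `csv.DictWriter` or loaded into a data frame, at the cost of dropping the per-comment and per-attachment detail.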

Output Fields Explained

Post Data

  • id: Unique post identifier
  • text: The prompt/description used to generate the video
  • caption: Optional caption text
  • link: Direct link to the post on Sora
  • coverUrl: URL to the cover image
  • gifUrl: URL to the animated GIF preview
  • emoji: Associated emoji for the post

Engagement Metrics

  • likes: Number of likes
  • replies: Direct reply count
  • views: Total view count
  • uniqueViews: Unique viewer count
  • remixes: Number of times the video was remixed
  • recursiveReplies: Total replies including nested threads
  • dislikeCount: Number of dislikes

Timestamps

  • postedAt: Unix timestamp (seconds, with a fractional part) when the post was created
  • updatedAt: Unix timestamp (seconds, with a fractional part) of the last update
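
The timestamp values in the sample output are floating-point numbers. A short Python sketch for converting them to timezone-aware datetimes follows. It assumes the values are seconds since the Unix epoch, which matches the magnitude of the sample values, and `to_utc_datetime` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_utc_datetime(timestamp: float) -> datetime:
    """Convert a Unix timestamp in seconds to an aware datetime in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


# postedAt value taken from the sample output item.
posted = to_utc_datetime(1759290839.830908)
print(posted.isoformat())
```

Under that assumption, the sample `postedAt` value falls on 1 October 2025 (UTC).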

Attachments

  • id: Attachment identifier
  • title: Attachment title
  • url: Direct video URL
  • downloadableUrl: URL for downloading
  • thumbnail: Thumbnail image URL
  • gif: GIF preview URL
  • width / height: Video dimensions
  • generationId: Sora generation ID
  • generationType: Type of generation (e.g., "video_gen")

Profile Data

Complete user profile including:

  • Username, display name, profile picture
  • Verification status
  • Follower/following counts
  • Post and reply counts
  • Likes received, remix count, cameo count
  • Account creation and update timestamps
  • Privacy settings and location

Downloaded Media Keys

  • videoStoreKey: Key-value store key for downloaded video (provided only when downloadVideo is enabled)
  • thumbnailStoreKey: Key-value store key for downloaded thumbnail (provided only when downloadThumbnail is enabled)
  • gifStoreKey: Key-value store key for downloaded GIF (provided only when downloadGIF is enabled)

Comments

Array of comment objects with:

  • Comment text and timestamps
  • Like counts
  • Parent and root post IDs
  • Full profile data for commenter
  • Post URL for context

🎯 Advanced Features

Media Download System

When you enable media downloads (downloadVideo, downloadThumbnail, or downloadGIF), files are automatically saved to Apify's key-value store with predictable keys:

  • Videos: {postId}_video_{index}.mp4
  • Thumbnails: {postId}_thumbnail_{index}.webp
  • GIFs: {postId}_gif_{index}.gif

Access downloaded files programmatically or through the Apify console's key-value store tab.
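
Since the key patterns are predictable, a key can be derived from a post ID without reading the dataset item first. The following Python sketch encodes the three patterns listed above. The function name `media_store_key` and the `kind` labels are illustrative assumptions, and the sketch only builds the key string; it does not fetch anything from the key-value store.

```python
# Key templates copied from the patterns listed above.
MEDIA_KEY_PATTERNS = {
    "video": "{post_id}_video_{index}.mp4",
    "thumbnail": "{post_id}_thumbnail_{index}.webp",
    "gif": "{post_id}_gif_{index}.gif",
}


def media_store_key(post_id: str, kind: str, index: int = 0) -> str:
    """Return the expected key-value store key for one downloaded media file."""
    if kind not in MEDIA_KEY_PATTERNS:
        raise ValueError(f"unknown media kind: {kind!r}")
    if index < 0:
        raise ValueError("index must not be negative")
    return MEDIA_KEY_PATTERNS[kind].format(post_id=post_id, index=index)


key = media_store_key("s_68dca5d7d4ac8191987e9c6393d498d4", "video")
```

For the sample post, this reproduces the `videoStoreKey` value shown in the output schema.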

Comment Threading

Comments maintain parent-child relationships through parentPostId and rootPostId fields, allowing you to reconstruct conversation threads. Each comment includes:

  • Full commenter profile
  • Engagement metrics (likes)
  • Timestamps for tracking conversation flow
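
One way to reconstruct threads from these fields is to group comments by `parentPostId` and walk the groups from the post ID downward. The Python sketch below does this under two assumptions that the sample output does not confirm: that a nested reply's `parentPostId` equals the `id` of the comment it answers, and that the data contains no cycles. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Comment = Dict[str, Any]


def group_by_parent(comments: List[Comment]) -> Dict[str, List[Comment]]:
    """Index comments by parentPostId, oldest first within each group."""
    children: Dict[str, List[Comment]] = defaultdict(list)
    for comment in comments:
        children[comment["parentPostId"]].append(comment)
    for siblings in children.values():
        siblings.sort(key=lambda c: c["postedAt"])
    return dict(children)


def walk_thread(
    parent_id: str, children: Dict[str, List[Comment]], depth: int = 0
) -> List[Tuple[int, str]]:
    """Return (depth, comment id) pairs in reading order, depth-first."""
    rows: List[Tuple[int, str]] = []
    for comment in children.get(parent_id, []):
        rows.append((depth, comment["id"]))
        # Assumed: replies to this comment carry its id as parentPostId.
        rows.extend(walk_thread(comment["id"], children, depth + 1))
    return rows
```

Calling `walk_thread(post["id"], group_by_parent(post["comments"]))` yields top-level comments at depth 0 and any nested replies at increasing depths.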

Engagement Analytics

Track multiple engagement dimensions:

  • Virality: views and uniqueViews show reach
  • Interaction: likes, replies, and recursiveReplies measure engagement depth
  • Creativity: remixes show how content inspires others
  • Trend tracking: Compare metrics across posts to identify patterns
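
Raw counts are hard to compare across posts with very different reach, so ratios are often more useful. The Python sketch below derives a few from the engagement fields. The metric names and definitions (`likeRate`, `remixRate`, `viewsPerUniqueViewer`, `nestedReplies`) are illustrative assumptions, not values produced by the actor.

```python
from typing import Any, Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def engagement_summary(post: Dict[str, Any]) -> Dict[str, float]:
    """Derive comparable engagement ratios from one dataset item."""
    views = post.get("views") or 0
    unique_views = post.get("uniqueViews") or 0
    likes = post.get("likes") or 0
    remixes = post.get("remixes") or 0
    replies = post.get("replies") or 0
    recursive_replies = post.get("recursiveReplies") or 0
    return {
        "likeRate": _ratio(likes, views),
        "remixRate": _ratio(remixes, views),
        "viewsPerUniqueViewer": _ratio(views, unique_views),
        # Replies below the first level: total thread size minus direct replies.
        "nestedReplies": max(recursive_replies - replies, 0),
    }
```

Sorting a batch of posts by one of these ratios gives a rough ranking that is less dominated by sheer view count.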

🔧 Best Practices

  1. Start Small: Test with maxItems: 10 to understand output structure before scaling.
  2. Media Downloads: Only enable media downloads when necessary, because they significantly increase run time.
  3. Comment Limits: Adjust numOfComments based on your needs. High-engagement posts can have hundreds of comments.
  4. Proxy Configuration: Use Apify proxies for reliable access and to respect rate limits.

🌟 Why Choose Our Sora Scraper?

  • First to Market: The only Sora 2 scraper available
  • Comprehensive Data: Posts, profiles, comments, engagement metrics
  • Media Support: Download videos, thumbnails, and GIFs
  • Production Ready: Structured output, error handling, proxy support
  • Well Maintained: Regular updates as Sora evolves
  • Expert Support: Backed by certified Apify Partners


👀 p.s.

Got feedback or need an extension?

Lexis Solutions is a certified Apify Partner. We can help you with custom solutions or data extraction projects.

Contact us over Email or LinkedIn

Support Our Work 💝

If you're happy with our work and scrapers, you're welcome to leave us a company review here and leave a review for the scrapers you're subscribed to. It will take you less than a minute but it will mean a lot to us!

Image Credit: https://sora.chatgpt.com/