ChatGPT Conversation Extractor
Pricing
from $50.00 / 1,000 results
This scraper extracts the conversation history from publicly shared ChatGPT conversations.
Developer: KLINZINGER
An Apify Actor that extracts conversation data from publicly shared ChatGPT conversations. The Actor navigates to shared conversation URLs and extracts the full conversation history including all messages, timestamps, and metadata.
Overview
This Actor extracts conversation data from ChatGPT's publicly shared conversations by accessing the data embedded in the page through React Router's data loader. The data is not fetched via a separate API endpoint but is embedded server-side and accessible through the browser's JavaScript environment.
How It Works
- The Actor navigates to the provided ChatGPT share URLs using Puppeteer
- Waits for the page to fully load and React Router to initialize
- Extracts conversation data from window.__reactRouterDataRouter.state.loaderData
- Parses the conversation tree structure into a linear array of messages
- Outputs structured data including:
  - Conversation metadata (title, timestamps, share ID)
  - Parsed messages in chronological order
  - Optionally, the complete raw conversation data
Input
The Actor accepts the following input parameters:
- startUrls (required): Array of ChatGPT share URLs to extract, each given as an object with a url field
  - Example: https://chatgpt.com/share/693011c8-0a3c-8006-b6cf-77d844d1bb51
- includeRawData (optional, default: true): Whether to include the complete raw conversation data in the output
Example Input
{
  "startUrls": [
    { "url": "https://chatgpt.com/share/693011c8-0a3c-8006-b6cf-77d844d1bb51" }
  ],
  "includeRawData": true
}
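The input rules above can be sketched as a small normalization step that applies the documented default for includeRawData and pulls the share ID out of each URL. This is a minimal sketch, not the Actor's own code: the helper names extractShareId and normalizeInput are hypothetical, and it assumes share URLs always have the form https://chatgpt.com/share/<shareId>.

```javascript
// Hypothetical helpers; assumes share URLs look like
// https://chatgpt.com/share/<shareId> (an assumption, not confirmed by the Actor).
const SHARE_PATH = /^\/share\/([A-Za-z0-9-]+)\/?$/;

// Returns the share ID from a ChatGPT share URL, or null if it does not match.
function extractShareId(rawUrl) {
  let parsed;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return null; // not a valid absolute URL
  }
  if (parsed.hostname !== "chatgpt.com") return null;
  const match = SHARE_PATH.exec(parsed.pathname);
  return match ? match[1] : null;
}

// Validates startUrls ({ url } objects, as in the example input), separates
// usable entries from rejected ones, and defaults includeRawData to true.
function normalizeInput(input) {
  if (!input || !Array.isArray(input.startUrls) || input.startUrls.length === 0) {
    throw new Error("startUrls is required and must be a non-empty array");
  }
  const urls = [];
  const rejected = [];
  for (const entry of input.startUrls) {
    const url = entry && entry.url;
    const shareId = url ? extractShareId(url) : null;
    if (shareId) urls.push({ url, shareId });
    else rejected.push(entry);
  }
  return { urls, rejected, includeRawData: input.includeRawData ?? true };
}
```

Rejecting non-share URLs up front, instead of letting the browser fail on them, keeps obviously wrong input out of the Puppeteer queue. Whether the real Actor does this is not stated here.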
Output
The Actor outputs structured data to the dataset with the following fields:
- url: The ChatGPT share URL
- shareId: Extracted share ID from the URL
- title: Conversation title
- createTime: Unix timestamp (in seconds) when the conversation was created
- updateTime: Unix timestamp (in seconds) when the conversation was last updated
- messageCount: Number of messages in the conversation
- messages: Array of parsed messages, each containing:
  - role: Message role ("user" or "assistant")
  - content: Message content text
  - timestamp: Unix timestamp (in seconds) when the message was created
  - messageId: Unique message identifier
  - status: Message status
- rawData (if includeRawData is true): Complete raw conversation data with the full tree structure
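One way to assemble a dataset item with these fields is sketched below. It is a minimal sketch under stated assumptions: buildRecord is a hypothetical helper, the raw keys title, create_time and update_time are illustrative guesses at the raw payload, and messageCount is assumed to equal the number of parsed messages.

```javascript
// Hypothetical helper. Assumption: the raw conversation object exposes
// title, create_time and update_time (Unix seconds); these key names are
// illustrative and not confirmed by this README.
function buildRecord(url, shareId, rawData, messages, includeRawData = true) {
  const record = {
    url,
    shareId,
    title: rawData.title ?? null,
    createTime: rawData.create_time ?? null,
    updateTime: rawData.update_time ?? null,
    messageCount: messages.length, // assumed: count of parsed messages
    messages,
  };
  // rawData is attached only when includeRawData is true (the documented default).
  if (includeRawData) record.rawData = rawData;
  return record;
}
```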
Example Output
{
  "url": "https://chatgpt.com/share/693011c8-0a3c-8006-b6cf-77d844d1bb51",
  "shareId": "693011c8-0a3c-8006-b6cf-77d844d1bb51",
  "title": "Example Conversation",
  "createTime": 1764757960.044993,
  "updateTime": 1764757965.106983,
  "messageCount": 54,
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?",
      "timestamp": 1764256500.3946629,
      "messageId": "message_id_1",
      "status": "finished_successfully"
    },
    {
      "role": "assistant",
      "content": "I'm doing well, thank you!",
      "timestamp": 1764256501.1234567,
      "messageId": "message_id_2",
      "status": "finished_successfully"
    }
  ],
  "rawData": { /* complete raw conversation data */ }
}
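The timestamps in this output are Unix times in seconds with a fractional part, while JavaScript's Date works in milliseconds. A small conversion sketch (the function name is hypothetical) for consumers of the dataset:

```javascript
// Converts a Unix timestamp in seconds (possibly fractional) to an ISO 8601 string.
// Date expects milliseconds, so the value is multiplied by 1000 first.
function unixSecondsToIso(seconds) {
  return new Date(seconds * 1000).toISOString();
}

console.log(unixSecondsToIso(1764757960.044993)); // createTime from the example output
```

Sub-millisecond digits are dropped, because Date only keeps whole milliseconds.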
Data Structure
ChatGPT conversations are stored in a tree structure where:
- Each message has a parent reference to its parent message
- Each message has a children array with child message IDs
- Messages are organized in threads/branches
- The Actor traverses this tree to extract messages in chronological order
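The traversal described above (find the node without a parent, then walk its children depth-first) can be sketched as follows. This is a hedged illustration, not the Actor's code: it assumes nodes live in a mapping object keyed by ID, shaped like { id, parent, children, message }, where message (possibly null) has author.role, content.parts, create_time and status. These field names are assumptions and may not match ChatGPT's current payload. It uses an explicit stack instead of recursion, which visits nodes in the same depth-first order.

```javascript
// Hypothetical helper. Assumed node shape:
// { id, parent, children: [ids], message: { id, author: { role }, content: { parts }, create_time, status } | null }
function linearizeMessages(mapping) {
  const nodes = Object.values(mapping);
  const root = nodes.find((node) => !node.parent); // node without a parent
  if (!root) return [];

  const messages = [];
  const stack = [root.id]; // explicit stack: same pre-order as recursion
  while (stack.length > 0) {
    const node = mapping[stack.pop()];
    if (!node) continue; // dangling child reference
    const msg = node.message;
    const role = msg && msg.author && msg.author.role;
    // Keep user/assistant turns only; empty root or system nodes are skipped.
    if (role === "user" || role === "assistant") {
      const parts = (msg.content && msg.content.parts) || [];
      messages.push({
        role,
        content: parts.filter((p) => typeof p === "string").join("\n"),
        timestamp: msg.create_time ?? null,
        messageId: msg.id ?? node.id,
        status: msg.status ?? null,
      });
    }
    // Push children in reverse so the first child is visited first.
    const children = node.children || [];
    for (let i = children.length - 1; i >= 0; i--) stack.push(children[i]);
  }
  return messages;
}
```

When a conversation has several branches (for example regenerated answers), this walk emits each branch in turn; following only the active branch would need extra information that this README does not describe.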
Limitations
- Only works for publicly shared conversations
- Requires JavaScript execution (uses Puppeteer browser automation)
- Cannot access private conversations without authentication
- Data structure may change as ChatGPT updates their platform
- Rate limiting may apply if extracting many conversations
Use Cases
- Archiving publicly shared conversations
- Analyzing conversation patterns and structures
- Converting conversations to other formats (Markdown, CSV, etc.)
- Building conversation datasets for training or analysis
- Creating backups of shared conversations
- Research and analysis of AI conversation patterns
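As an example of the format-conversion use case, a dataset item can be rendered as Markdown using only the documented output fields (title, messages[].role, messages[].content). The function name and heading layout are arbitrary choices for this sketch.

```javascript
// Hypothetical converter from one dataset item to a Markdown transcript.
function toMarkdown(item) {
  const lines = [`# ${item.title || "Untitled conversation"}`, ""];
  for (const message of item.messages) {
    const speaker = message.role === "user" ? "User" : "Assistant";
    lines.push(`**${speaker}:**`, "", message.content, "");
  }
  return lines.join("\n");
}
```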
Getting Started
Local Development
- Install dependencies:
$ npm install
- Run the Actor locally:
$ apify run
The Actor will read input from storage/key_value_stores/default/INPUT.json. Create this file with your ChatGPT share URLs:
{"startUrls": [{"url": "https://chatgpt.com/share/YOUR_SHARE_ID"}]}
Deploy to Apify
- Log in to Apify:
$ apify login
- Deploy your Actor:
$ apify push
Technical Details
Extraction Method
The Actor uses the following approach to extract conversation data:
- Page Navigation: Uses Puppeteer to navigate to the ChatGPT share URL
- Wait for React Router: Waits for window.__reactRouterDataRouter to be available
- Data Extraction: Accesses the conversation data from:
  window.__reactRouterDataRouter.state.loaderData['routes/share.$shareId.($action)'].serverResponse.data
- Tree Traversal: Parses the conversation tree structure by:
  - Finding the root message (the message without a parent)
  - Traversing the tree recursively through children
  - Extracting messages in chronological order
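The data lookup above can be written as a pure function, so it can be tested outside a browser against a fake window object. This is a sketch: the route key and property path are the ones quoted in this README (internal to ChatGPT and liable to change), and readShareData is a hypothetical name. The route key is defined inside the function and the window defaults to globalThis, so the function references no outer variables.

```javascript
// Hypothetical helper reading the embedded conversation data from the React Router state.
function readShareData(win = globalThis) {
  const routeKey = "routes/share.$shareId.($action)"; // key quoted in this README
  const router = win.__reactRouterDataRouter;
  if (!router || !router.state || !router.state.loaderData) {
    throw new Error("React Router data router not found on window");
  }
  const routeData = router.state.loaderData[routeKey];
  const data = routeData && routeData.serverResponse && routeData.serverResponse.data;
  if (!data) {
    throw new Error("No conversation data under loaderData[" + routeKey + "]");
  }
  return data;
}
```

In a Puppeteer script, the call sequence would plausibly be page.goto(url, { waitUntil: "networkidle2" }), then page.waitForFunction(() => window.__reactRouterDataRouter !== undefined), then page.evaluate(readShareData). Because the function is self-contained, Puppeteer should be able to serialize it into the page, where globalThis is the page's window.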
Error Handling
If extraction fails, the Actor will:
- Log detailed error information
- Push error data to the dataset for debugging
- Continue processing other URLs if multiple are provided
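The error-handling behaviour above (log, record the failure, continue with the next URL) can be sketched as a per-URL loop. Here extract is a hypothetical async function (url) => record, and the error record shape { url, error } is an assumption, since this README does not say which fields are pushed.

```javascript
// Processes URLs sequentially; a failure on one URL does not stop the others.
async function processUrls(urls, extract, log = console.error) {
  const results = [];
  for (const url of urls) {
    try {
      results.push(await extract(url));
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      log(`Extraction failed for ${url}: ${message}`);
      results.push({ url, error: message }); // assumed error record shape
    }
  }
  return results;
}
```

In an Apify Actor, each record would typically go to the default dataset (for example via Actor.pushData from the apify package) instead of being collected in an array.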
License
ISC


