Chatgpt Conversation Extractor

Pricing

from $50.00 / 1,000 results

This scraper extracts the conversation history from publicly shared ChatGPT conversations.

Developer

KLINZINGER

Maintained by Community

An Apify Actor that extracts conversation data from publicly shared ChatGPT conversations. The Actor navigates to shared conversation URLs and extracts the full conversation history including all messages, timestamps, and metadata.

Overview

This Actor reads the conversation data that ChatGPT embeds server-side in each publicly shared conversation page as React Router loader data. No separate API endpoint is called; the data is read from the browser's JavaScript environment once the page has loaded.

How It Works

  1. The Actor navigates to the provided ChatGPT share URLs using Puppeteer
  2. Waits for the page to fully load and React Router to initialize
  3. Extracts conversation data from window.__reactRouterDataRouter.state.loaderData
  4. Parses the conversation tree structure into a linear array of messages
  5. Outputs structured data including:
    • Conversation metadata (title, timestamps, share ID)
    • Parsed messages in chronological order
    • Optionally, the complete raw conversation data

Input

The Actor accepts the following input parameters:

  • startUrls (required): Array of ChatGPT share URLs to extract
    • Example: https://chatgpt.com/share/693011c8-0a3c-8006-b6cf-77d844d1bb51
  • includeRawData (optional, default: true): Whether to include the complete raw conversation data in the output

Example Input

{
  "startUrls": [
    {
      "url": "https://chatgpt.com/share/693011c8-0a3c-8006-b6cf-77d844d1bb51"
    }
  ],
  "includeRawData": true
}

Output

The Actor outputs structured data to the dataset with the following fields:

  • url: The ChatGPT share URL
  • shareId: Extracted share ID from the URL
  • title: Conversation title
  • createTime: Unix timestamp (in seconds, with a fractional part) when the conversation was created
  • updateTime: Unix timestamp (in seconds, with a fractional part) when the conversation was last updated
  • messageCount: Number of messages in the conversation
  • messages: Array of parsed messages, each containing:
    • role: Message role ("user" or "assistant")
    • content: Message content text
    • timestamp: Unix timestamp when message was created
    • messageId: Unique message identifier
    • status: Message status
  • rawData (if includeRawData is true): Complete raw conversation data with full tree structure
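
As an illustration of the shareId field, the sketch below derives the ID from a share URL. The helper name extractShareId is hypothetical, and the code assumes URLs of the form https://chatgpt.com/share/<id>, as in the example input; the Actor's own parsing may differ.

```javascript
// Hypothetical helper: pull the share ID out of a ChatGPT share URL.
// Assumes the path looks like /share/<id>, as in the example input.
function extractShareId(shareUrl) {
  const { pathname } = new URL(shareUrl);
  const match = pathname.match(/^\/share\/([^/]+)/);
  // Return null rather than throwing when the URL is not a share link.
  return match ? match[1] : null;
}

console.log(extractShareId('https://chatgpt.com/share/693011c8-0a3c-8006-b6cf-77d844d1bb51'));
```

Returning null for non-matching URLs lets a caller skip or report bad input instead of failing the whole run.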

Example Output

{
  "url": "https://chatgpt.com/share/693011c8-0a3c-8006-b6cf-77d844d1bb51",
  "shareId": "693011c8-0a3c-8006-b6cf-77d844d1bb51",
  "title": "Example Conversation",
  "createTime": 1764757960.044993,
  "updateTime": 1764757965.106983,
  "messageCount": 54,
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?",
      "timestamp": 1764256500.3946629,
      "messageId": "message_id_1",
      "status": "finished_successfully"
    },
    {
      "role": "assistant",
      "content": "I'm doing well, thank you!",
      "timestamp": 1764256501.1234567,
      "messageId": "message_id_2",
      "status": "finished_successfully"
    }
  ],
  "rawData": { /* complete raw conversation data */ }
}

Data Structure

ChatGPT conversations are stored in a tree structure where:

  • Each message holds a reference to its parent message
  • Each message has a children array with child message IDs
  • Messages are organized in threads/branches
  • The Actor traverses this tree to extract messages in chronological order
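
The traversal described above can be sketched as follows. This is a minimal illustration, not the Actor's actual code: the field names (a mapping object keyed by node ID, parent, children, message.author.role, message.content.parts, create_time) are assumptions about the raw data and are not confirmed by this README.

```javascript
// Sketch only: every field name on the raw nodes is an assumption.
function linearizeConversation(mapping) {
  const nodes = Object.values(mapping);
  // The root is the node that has no parent reference.
  const root = nodes.find((node) => !node.parent);
  if (!root) return [];

  const messages = [];
  const visit = (node) => {
    if (node.message) {
      const { id, author, content, create_time: createTime, status } = node.message;
      messages.push({
        role: author?.role ?? null,
        // Keep only text parts and join them into one string.
        content: (content?.parts ?? []).filter((part) => typeof part === 'string').join('\n'),
        timestamp: createTime ?? null,
        messageId: id,
        status: status ?? null,
      });
    }
    // Depth-first walk: a parent is always emitted before its children.
    for (const childId of node.children ?? []) {
      const child = mapping[childId];
      if (child) visit(child);
    }
  };
  visit(root);
  return messages;
}

const exampleMapping = {
  root: { id: 'root', parent: null, children: ['a'], message: null },
  a: {
    id: 'a',
    parent: 'root',
    children: ['b'],
    message: {
      id: 'a',
      author: { role: 'user' },
      content: { parts: ['Hello, how are you?'] },
      create_time: 1764256500.39,
      status: 'finished_successfully',
    },
  },
  b: {
    id: 'b',
    parent: 'a',
    children: [],
    message: {
      id: 'b',
      author: { role: 'assistant' },
      content: { parts: ["I'm doing well, thank you!"] },
      create_time: 1764256501.12,
      status: 'finished_successfully',
    },
  },
};

console.log(linearizeConversation(exampleMapping).map((m) => m.role));
```

For a conversation without branches, a parent-before-child walk matches chronological order. Where a conversation branches (for example after a regenerated reply), this sketch emits every branch in child order; a real implementation might instead follow only the branch that was shared.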

Limitations

  • Only works for publicly shared conversations
  • Requires JavaScript execution (uses Puppeteer browser automation)
  • Cannot access private conversations without authentication
  • The embedded data structure may change whenever ChatGPT updates its platform
  • Rate limiting may apply if extracting many conversations

Use Cases

  • Archiving publicly shared conversations
  • Analyzing conversation patterns and structures
  • Converting conversations to other formats (Markdown, CSV, etc.)
  • Building conversation datasets for training or analysis
  • Creating backups of shared conversations
  • Research and analysis of AI conversation patterns
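
As one example of format conversion, the sketch below turns a single dataset item into a Markdown transcript. The function name conversationToMarkdown is hypothetical, and the item shape is taken from the Example Output section.

```javascript
// Hypothetical converter: one dataset item in, one Markdown string out.
function conversationToMarkdown(item) {
  const lines = [`# ${item.title}`, ''];
  for (const message of item.messages) {
    // Timestamps are Unix seconds, so multiply by 1000 for a JavaScript Date.
    const when = message.timestamp != null
      ? new Date(message.timestamp * 1000).toISOString()
      : 'unknown time';
    lines.push(`**${message.role}** (${when})`, '', message.content, '');
  }
  return lines.join('\n');
}

const markdown = conversationToMarkdown({
  title: 'Example Conversation',
  messages: [
    { role: 'user', content: 'Hello, how are you?', timestamp: 1764256500.3946629 },
    { role: 'assistant', content: "I'm doing well, thank you!", timestamp: 1764256501.1234567 },
  ],
});
console.log(markdown);
```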

Getting Started

Local Development

  1. Install dependencies:
     $ npm install
  2. Run the Actor locally:
     $ apify run

The Actor will read input from storage/key_value_stores/default/INPUT.json. Create this file with your ChatGPT share URLs:

{
  "startUrls": [
    {
      "url": "https://chatgpt.com/share/YOUR_SHARE_ID"
    }
  ]
}

Deploy to Apify

  1. Log in to Apify:
     $ apify login
  2. Deploy your Actor:
     $ apify push

Technical Details

Extraction Method

The Actor uses the following approach to extract conversation data:

  1. Page Navigation: Uses Puppeteer to navigate to the ChatGPT share URL
  2. Wait for React Router: Waits for window.__reactRouterDataRouter to be available
  3. Data Extraction: Accesses the conversation data from:
    window.__reactRouterDataRouter.state.loaderData['routes/share.$shareId.($action)'].serverResponse.data
  4. Tree Traversal: Parses the conversation tree structure by:
    • Finding the root message (message without a parent)
    • Traversing the tree recursively through children
    • Extracting messages in chronological order
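
Steps 1 to 3 above might look roughly like the following. This is a sketch under stated assumptions rather than the Actor's source: the function name extractSharedConversation, the waitUntil setting, and the timeout values are all illustrative, and page is expected to be a Puppeteer Page object created elsewhere.

```javascript
// Route key quoted from the Data Extraction step above.
const SHARE_ROUTE_KEY = 'routes/share.$shareId.($action)';

// `page` is expected to be a Puppeteer Page; timeouts are assumed values.
async function extractSharedConversation(page, shareUrl) {
  await page.goto(shareUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });

  // Wait until React Router has exposed its data router on window.
  await page.waitForFunction(
    () => Boolean(window.__reactRouterDataRouter?.state?.loaderData),
    { timeout: 30000 },
  );

  // Runs in the browser context; the route key is passed in as an argument.
  const data = await page.evaluate((routeKey) => {
    const loaderData = window.__reactRouterDataRouter.state.loaderData;
    return loaderData?.[routeKey]?.serverResponse?.data ?? null;
  }, SHARE_ROUTE_KEY);

  if (!data) {
    throw new Error(`No conversation data found at ${shareUrl}`);
  }
  return data;
}
```

Optional chaining keeps the browser-side lookup from throwing if the route key or response shape changes; the function then fails with a single descriptive error instead.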

Error Handling

If extraction fails, the Actor will:

  • Log detailed error information
  • Push error data to the dataset for debugging
  • Continue processing other URLs if multiple are provided
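
The behaviour above can be sketched as a loop that isolates each URL in its own try/catch. The names processUrls, extractOne, and pushData are hypothetical stand-ins for the Actor's extraction routine and dataset writer, and the shape of the error record is an assumption.

```javascript
// Sketch: `extractOne` and `pushData` are hypothetical callbacks.
async function processUrls(urls, extractOne, pushData) {
  for (const url of urls) {
    try {
      await pushData(await extractOne(url));
    } catch (error) {
      // Log the failure, record it in the dataset, then move on.
      console.error(`Extraction failed for ${url}:`, error);
      await pushData({ url, error: error.message });
    }
  }
}
```

Because the catch block sits inside the loop, a failure on one URL does not prevent the remaining URLs from being processed.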

License

ISC