Dataset2GPT
2 hours trial then $7.95/month - No credit card required now
Dataset2GPT reads any dataset on Apify and uses GPT-based AI to perform a variety of text-processing tasks—such as summarization, analysis, classification, or transformation. Whether you want a comprehensive 2,000-word synopsis, key insights, sentiment analysis, or custom NLP transformations, Dataset2GPT can handle it.
For a complete walkthrough of integrating Dataset2GPT to analyse Reddit comments in bulk, see our tutorial on the GPT-powered summary actor.
Features
- GPT-Powered Processing using GPT-4o-mini for best price/performance ratio
- Flexible Output Length: Control how long or detailed the result should be (100–2,000 words or more)
- Task Focus: Provide a specific topic or angle for the AI to focus on (e.g., “marketing analysis,” “customer feedback,” “sentiment extraction”)
- Token Usage & Cost Tracking: Monitor how many tokens the AI uses and the associated cost
- PDF Export with Markdown formatting for easy sharing
- Optional Email Delivery of processed results
How It Works
- Input: Your Apify actor collects or generates data in a dataset.
- Processing: Dataset2GPT reads the dataset content as plain text, then uses GPT to perform your desired task.
- AI Output: The AI generates the requested output (e.g., summary, analysis, transformation).
- Delivery: Retrieve the output in multiple formats (dataset record, PDF, email).
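The steps above can be pictured with a small sketch. It is not the actor's implementation: `items_to_text` and `run_task` are hypothetical names, the one-JSON-object-per-line flattening and the prompt wording are assumptions, and the GPT call is replaced by a caller-supplied function so the example runs offline.

```python
import json
from typing import Callable, Iterable


def items_to_text(items: Iterable[dict]) -> str:
    """Flatten dataset items into plain text (assumed format: one JSON object per line)."""
    return "\n".join(
        json.dumps(item, ensure_ascii=False, sort_keys=True) for item in items
    )


def run_task(items: Iterable[dict], model: Callable[[str], str],
             focus_topic: str = "", summary_length: int = 1000) -> dict:
    """Build a prompt from the dataset text and hand it to `model` (a stand-in for GPT)."""
    instruction = f"Write about {summary_length} words"
    if focus_topic:
        instruction += f" focusing on {focus_topic}"
    prompt = f"{instruction}.\n\n{items_to_text(items)}"
    return {"output": model(prompt)}


if __name__ == "__main__":
    # Stub model: reports the prompt size instead of calling an API.
    def fake_model(prompt: str) -> str:
        return f"(stub) received {len(prompt)} characters"

    print(run_task([{"comment": "Great product"}], fake_model,
                   focus_topic="customer feedback"))
```

Passing the model in as a function keeps the data-flattening step testable on its own, separate from any API call.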
Benefits
- Rapid Insights: Quickly extract meaning or structure from large datasets.
- Plain Text Processing: No complex parsing required—just feed text.
- Seamless Integration: Works with any Apify actor that outputs to a dataset.
- Automation: Set up once and let Dataset2GPT handle the heavy lifting.
Perfect For
- Market Researchers
- Social Media Managers
- Data Scientists
- Business Intelligence Teams
- Content Analysts
- Anyone working with scraped or collected data
Input Parameters
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
datasetId | string | No | - | Dataset ID for testing. When used as an integration, dataset ID is handled automatically |
openaiApiKey | string | Yes | - | Your OpenAI API key for GPT-based processing |
summaryLength | integer | No | 1000 | Desired length of output in words (100–2000). You can also use it for controlling output detail |
focusTopic | string | No | - | Specific topic or angle to guide the AI’s processing (e.g., “sentiment analysis” or “key points”) |
targetEmail | string | No | - | Email address to send the results to (requires EMAIL_SUPPORT=true) |
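As an illustration of the table above, the following sketch assembles an input object in Python. The helper name `build_input` is hypothetical, and rejecting values outside 100 to 2000 is an assumption made here; the actor itself may handle out-of-range values differently.

```python
from typing import Optional


def build_input(openai_api_key: str,
                summary_length: int = 1000,
                focus_topic: Optional[str] = None,
                target_email: Optional[str] = None,
                dataset_id: Optional[str] = None) -> dict:
    """Return an input dict using the parameter names from the table."""
    if not openai_api_key:
        raise ValueError("openaiApiKey is required")
    # Assumption: enforce the documented 100-2000 range on the client side.
    if not 100 <= summary_length <= 2000:
        raise ValueError("summaryLength should be between 100 and 2000")
    run_input = {"openaiApiKey": openai_api_key, "summaryLength": summary_length}
    # Optional fields are left out entirely when they are not set.
    if focus_topic:
        run_input["focusTopic"] = focus_topic
    if target_email:
        run_input["targetEmail"] = target_email
    if dataset_id:
        run_input["datasetId"] = dataset_id
    return run_input


if __name__ == "__main__":
    print(build_input("your-openai-api-key", 1500,
                      focus_topic="product feedback analysis"))
```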
Output Formats
- Dataset Record

      {
        "output": "The AI-generated text (e.g., summary, analysis, transformations)...",
        "tokenUsage": {
          "promptTokens": 1234,
          "completionTokens": 567,
          "totalTokens": 1801
        },
        "costs": {
          "promptCost": 0.001234,
          "completionCost": 0.000567,
          "totalCost": 0.001801
        }
      }

  - The output field contains the GPT-processed text.
  - Token usage and cost breakdown are provided for transparency.
- PDF Output
  - Stored in a key-value store under the OUTPUT key.
  - Includes Markdown formatting for better readability.
  - Downloadable via the Apify console or API.
- Email Delivery (Optional)
  - Sends the GPT output in HTML and plain text formats.
  - Requires EMAIL_SUPPORT=true and a valid targetEmail.
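A downloaded dataset record can be sanity-checked with a few lines of Python. This is a minimal sketch: `check_record` is a hypothetical helper, and it assumes, as the sample record suggests, that the total fields are the sums of their prompt and completion parts.

```python
import json


def check_record(raw: str) -> dict:
    """Parse a dataset record and confirm its token and cost totals add up."""
    record = json.loads(raw)
    usage, costs = record["tokenUsage"], record["costs"]
    tokens_ok = (
        usage["promptTokens"] + usage["completionTokens"] == usage["totalTokens"]
    )
    # Costs are floats, so compare with a small tolerance instead of ==.
    costs_ok = abs(
        costs["promptCost"] + costs["completionCost"] - costs["totalCost"]
    ) < 1e-9
    return {"output": record["output"], "tokens_ok": tokens_ok, "costs_ok": costs_ok}


if __name__ == "__main__":
    sample = json.dumps({
        "output": "The AI-generated text...",
        "tokenUsage": {"promptTokens": 1234, "completionTokens": 567, "totalTokens": 1801},
        "costs": {"promptCost": 0.001234, "completionCost": 0.000567, "totalCost": 0.001801},
    })
    print(check_record(sample))
```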
Getting Started
- Add Dataset2GPT to Your Apify Actor: For example, if you have an actor scraping Reddit or Twitter, attach Dataset2GPT as an integration step.
- Use the Default Dataset: In your input configuration, set "datasetId": "{{resource.defaultDatasetId}}" to automatically consume the scraped data.
- Run Your Apify Actor: Once it finishes, Dataset2GPT will read the dataset and produce the requested AI output.
- Retrieve Your Results:
  - Download from the dataset.
  - Grab the PDF from the key-value store.
  - Or check your inbox if you enabled email delivery.
Usage Examples
As an Integration (Recommended)
    {
      "openaiApiKey": "your-openai-api-key",
      "summaryLength": 1500,
      "focusTopic": "product feedback analysis"
    }
Here, Dataset2GPT will generate a ~1500-word analysis focusing on product feedback.
For Testing
    {
      "datasetId": "your-dataset-id",
      "openaiApiKey": "your-openai-api-key",
      "summaryLength": 1000
    }
Useful for local testing. Specify a known datasetId.
With Email Delivery
    {
      "openaiApiKey": "your-openai-api-key",
      "targetEmail": "user@example.com",
      "focusTopic": "sentiment extraction"
    }
Generates text focusing on sentiments and sends it to user@example.com.
Environment Variables
- EMAIL_SUPPORT: Enable/disable email functionality (default: true)
- MAX_PARALLEL_REQUESTS: Controls parallel API requests (default: 10)
- PROMPT_TOKEN_COST: Cost per 1M prompt tokens (default: 0.150)
- COMPLETION_TOKEN_COST: Cost per 1M completion tokens (default: 0.075)
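The two price variables can be read as follows. This is a rough sketch that assumes the simple formula tokens / 1,000,000 × price. The function name `estimate_cost` and its `env` parameter are hypothetical, and how the actor itself rounds or reports costs is not documented here.

```python
import os
from typing import Mapping, Optional


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  env: Optional[Mapping[str, str]] = None) -> dict:
    """Estimate cost from token counts and per-1M-token prices."""
    env = os.environ if env is None else env
    # Fall back to the documented defaults when the variables are unset.
    prompt_price = float(env.get("PROMPT_TOKEN_COST", "0.150"))
    completion_price = float(env.get("COMPLETION_TOKEN_COST", "0.075"))
    prompt_cost = prompt_tokens / 1_000_000 * prompt_price
    completion_cost = completion_tokens / 1_000_000 * completion_price
    return {
        "promptCost": prompt_cost,
        "completionCost": completion_cost,
        "totalCost": prompt_cost + completion_cost,
    }


if __name__ == "__main__":
    # Pass an empty mapping to use the documented default prices.
    print(estimate_cost(1234, 567, env={}))
```

Overriding PROMPT_TOKEN_COST or COMPLETION_TOKEN_COST changes only the reported cost figures in this sketch; token counts are unaffected.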
With Dataset2GPT, turn any dataset into meaningful insights, summaries, transformations, or advanced NLP tasks—powered by GPT.
Actor Metrics
Created in Jan 2025