Polars AI Data Transformer
Pricing
Pay per event
Transform datasets using natural language. Upload CSV/Excel/JSON, describe your transformation in plain English, get results + reusable Python code. Powered by AI.
Developer
Salesmart Srl
AI Data Transformer
Transform any dataset using plain English. No coding required.
Describe what you want in natural language, get transformed data + reusable Python code.
Table of Contents
- Quick Start
- Getting Your Apify API Token
- Pricing
- Choosing the Right Mode
- 4 Operating Modes
- Complete Input Options
- How It Works
- Writing Effective Prompts
- API Examples
- Output Format
Quick Start
Option 1: With file URL
curl -X POST "https://api.apify.com/v2/acts/salesmart-srl~polars-ai-data-transformer/runs?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "datasetUrls": ["https://example.com/data.csv"],
    "prompt": "Group by country and sum sales"
  }'
Option 2: Direct JSON data (no file needed!)
curl -X POST "https://api.apify.com/v2/acts/salesmart-srl~polars-ai-data-transformer/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "inputData": [
      {"product": "iPhone", "price": 999, "qty": 10},
      {"product": "iPad", "price": 799, "qty": 5}
    ],
    "prompt": "Calculate total value (price * qty) for each product"
  }'
The response to this synchronous call includes output_data, which holds all transformed rows as JSON.
That's it. No LLM API key needed for Basic mode.
Getting Your Apify API Token
To use this Actor via API, you need an Apify API token.
Step 1: Create Apify Account
Go to apify.com and sign up (free).
Step 2: Get Your API Token
- Log in to Apify Console
- Click your profile icon (top right)
- Go to Settings → Integrations
- Copy your Personal API Token
Your token looks like: apify_api_xxxxxxxxxxxxxxxxxxxxx
Step 3: Use the Token
Option A: Query parameter
https://api.apify.com/v2/acts/salesmart-srl~polars-ai-data-transformer/runs?token=YOUR_TOKEN
Option B: Authorization header
Authorization: Bearer YOUR_TOKEN
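As a minimal sketch of Option B, the request below is built with Python's standard library and carries the token in the Authorization header instead of the URL. The helper name build_run_request is hypothetical; the endpoint and input fields are the ones shown in Quick Start. The request is only constructed here, not sent.

```python
import json
import urllib.request

ACTOR_RUNS_URL = (
    "https://api.apify.com/v2/acts/"
    "salesmart-srl~polars-ai-data-transformer/runs"
)


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that authenticates via header.

    build_run_request is an illustrative helper, not part of any Apify SDK.
    """
    return urllib.request.Request(
        ACTOR_RUNS_URL,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    "YOUR_TOKEN",
    {
        "datasetUrls": ["https://example.com/data.csv"],
        "prompt": "Group by country and sum sales",
    },
)
# To actually start a run, pass the request to urllib.request.urlopen(request).
```

Keeping the token out of the query string means it is less likely to end up in server logs or shell history.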
Free Tier
Apify offers $5 in free credits each month. Basic transformations cost ~$0.0015 each, so the free credits cover roughly 3,300 Basic transformations per month ($5 / $0.0015 ≈ 3,333).
Pricing
Pay-per-event pricing. You only pay when a transformation runs successfully.
| Mode | Apify Fee | LLM Cost | Total Cost |
|---|---|---|---|
| Basic | $0.0015 | Included | $0.0015 |
| Premium | $0.20 | Included | $0.20 |
| BYOK | $0.001 | Your API | $0.001 + API |
| BYOK Premium | $0.001 | Your API | $0.001 + API |
Volume Discounts
| Tier | BYOK | Basic | Premium |
|---|---|---|---|
| No Discount | $0.001 | $0.0015 | $0.20 |
| Bronze | $0.0007 | $0.00117 | $0.167 |
| Silver | $0.0004 | $0.00083 | $0.133 |
| Gold | $0.0001 | $0.0005 | $0.10 |
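To budget a batch of runs, the fee tables can be turned into a small lookup. This is a sketch only: the function names are hypothetical, the figures are copied from the tables above, and it assumes BYOK Premium is billed at the BYOK rate in every tier (the discount table does not list it separately). LLM provider costs for the BYOK modes are not included.

```python
import math

# Per-transformation Apify fees in USD, copied from the pricing tables.
FEES = {
    "No Discount": {"BYOK": 0.001, "Basic": 0.0015, "Premium": 0.20},
    "Bronze": {"BYOK": 0.0007, "Basic": 0.00117, "Premium": 0.167},
    "Silver": {"BYOK": 0.0004, "Basic": 0.00083, "Premium": 0.133},
    "Gold": {"BYOK": 0.0001, "Basic": 0.0005, "Premium": 0.10},
}


def fee_per_run(mode: str, tier: str = "No Discount") -> float:
    """Look up the Apify fee for one successful transformation."""
    # Assumption: BYOK Premium uses the BYOK column in every tier.
    column = "BYOK" if mode in ("BYOK", "BYOK Premium") else mode
    return FEES[tier][column]


def estimate_apify_cost(mode: str, runs: int, tier: str = "No Discount") -> float:
    """Estimated Apify fee in USD for `runs` successful transformations."""
    return round(fee_per_run(mode, tier) * runs, 6)


def runs_within_budget(mode: str, budget_usd: float, tier: str = "No Discount") -> int:
    """How many successful transformations a budget covers (Apify fee only)."""
    # The small epsilon guards against float error on exact multiples.
    return math.floor(budget_usd / fee_per_run(mode, tier) + 1e-9)
```

For example, runs_within_budget("Basic", 5) gives the number of Basic runs covered by the $5 monthly free credits.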
Choosing the Right Mode
Decision Tree
Do you need Google Search grounding or complex reasoning?
│
├─ NO → Do you have your own LLM API key?
│   │
│   ├─ NO → Use BASIC ($0.0015)
│   │        Simple, fast, no setup
│   │
│   └─ YES → Use BYOK ($0.001 + your API)
│            Lowest cost if you have free API credits
│
└─ YES → Do you have a Google API key?
    │
    ├─ NO → Use PREMIUM ($0.20)
    │        All features included, no setup
    │
    └─ YES → Use BYOK PREMIUM ($0.001 + your API)
             Same features as Premium, use your credits
Mode Comparison
| Feature | Basic | Premium | BYOK | BYOK Premium |
|---|---|---|---|---|
| Simple aggregations | Yes | Yes | Yes | Yes |
| Filtering & sorting | Yes | Yes | Yes | Yes |
| Data cleaning | Yes | Yes | Yes | Yes |
| E-commerce migrations | Limited | Best | Limited | Best |
| Google Search grounding | No | Yes | No | Yes |
| Extended thinking/reasoning | No | Yes | No | Yes |
| RAG memory (learns over time) | Yes | Yes | Yes | Yes |
| Requires LLM API key | No | No | Yes | Yes |
| Cost | $0.0015 | $0.20 | $0.001+API | $0.001+API |
Premium vs BYOK Premium: What's the Difference?
Nothing, except billing.
Both modes use:
- Gemini 2.5 Pro with extended thinking
- Google Search grounding
- RAG memory system
The only difference:
- Premium: We pay Google, you pay us $0.20
- BYOK Premium: You pay Google directly, you pay us $0.001
When to use BYOK Premium:
- You have Google Cloud credits
- You have an enterprise Google agreement
- You want to track API usage in your own Google console
4 Operating Modes
Mode 1: Basic (Hosted)
Use when: Simple transformations, high volume, budget-conscious
{
  "datasetUrls": ["https://example.com/data.csv"],
  "prompt": "Group by country and sum sales, show top 10"
}
What you get:
- Gemini 2.5 Flash-Lite (fast, efficient)
- No API key required
- $0.0015 per transformation
Good for:
- Aggregations (sum, count, average)
- Filtering and sorting
- Basic calculations
- Data reformatting
Not ideal for:
- "Convert to Shopify format" (doesn't know Shopify schema)
- Complex multi-step reasoning
Mode 2: Premium (Hosted)
Use when: Complex transformations, e-commerce migrations, need accuracy
{
  "datasetUrls": ["https://example.com/magento-products.csv"],
  "prompt": "Transform to Shopify product import format",
  "useAdvancedFeatures": true
}
What you get:
- Gemini 2.5 Pro (most capable model)
- Extended thinking (reasons through complex problems)
- Google Search grounding (knows external formats)
- RAG memory (improves over time)
- No API key required
- $0.20 per transformation
Good for:
- E-commerce platform migrations (Magento→Shopify, etc.)
- Format conversions (to Stripe, Mailchimp, etc.)
- Complex multi-step transformations
- Tasks requiring external knowledge
Why it costs more:
- Uses Gemini 2.5 Pro (~$1.25/1M input tokens, ~$5/1M output tokens)
- Google Search queries (~$35/1K queries)
- We bundle these costs into a flat $0.20 fee
Mode 3: BYOK (Bring Your Own Key)
Use when: You have LLM API credits, want lowest cost
{
  "datasetUrls": ["https://example.com/data.csv"],
  "prompt": "Filter active users and calculate totals",
  "llmProvider": "groq",
  "groqApiKey": "gsk_..."
}
What you get:
- Your choice of LLM provider
- RAG memory (improves over time)
- $0.001 Apify fee + your API costs
Supported providers:
| Provider | Model | API Cost | Get Key |
|---|---|---|---|
| Groq | Llama 3.3 70B | FREE tier | console.groq.com |
| Google | Gemini 2.0 Flash | ~$0.10/1M tokens | aistudio.google.com |
| OpenAI | GPT-4o | ~$5/1M tokens | platform.openai.com |
| Anthropic | Claude Sonnet 4 | ~$3/1M tokens | console.anthropic.com |
Recommended: Groq (FREE)
Groq offers a generous free tier. Combined with our $0.001 fee, you can run thousands of transformations for almost nothing.
Mode 4: BYOK Premium
Use when: You have Google API credits AND need Premium features
{
  "datasetUrls": ["https://example.com/products.csv"],
  "prompt": "Convert to Shopify product CSV format",
  "llmProvider": "google",
  "googleApiKey": "AIza...",
  "useAdvancedFeatures": true
}
What you get:
- Same as Premium: Gemini Pro + Google Search + RAG
- Uses YOUR Google API key
- $0.001 Apify fee + your Google API costs
Your Google API costs:
- Gemini 2.5 Pro: ~$1.25/1M input, ~$5/1M output tokens
- Google Search grounding: ~$35 per 1,000 queries
Why use this instead of Premium?
- You have Google Cloud credits to use up
- Your company has a Google enterprise agreement
- You want API usage in your own Google console
- You're doing very high volume and want direct billing
Complete Input Options
Required
| Field | Type | Description |
|---|---|---|
| prompt | string | Natural language description of the transformation |
Data Sources (at least one required)
| Field | Type | Description |
|---|---|---|
| inputData | array | Direct JSON data, no file hosting needed |
| datasetUrls | string[] | URLs to data files (CSV, Excel, JSON, Parquet) |
| uploadedFiles | file[] | Direct file uploads via Apify Console |
| apifyDatasetId | string | ID of an existing Apify dataset |
Recommended: inputData for API integrations (single call with data in, results out).
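A client-side pre-check can catch a missing prompt or data source before a run is started. The sketch below only encodes the two rules stated above (prompt is required, at least one data source is required); the function name is hypothetical, and the Actor's own validation may be stricter.

```python
# Field names are taken from the tables above.
DATA_SOURCE_FIELDS = ("inputData", "datasetUrls", "uploadedFiles", "apifyDatasetId")


def validate_actor_input(actor_input: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    prompt = actor_input.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt must be a non-empty string")

    # Empty lists and empty strings count as "not provided".
    if not any(actor_input.get(field) for field in DATA_SOURCE_FIELDS):
        problems.append(
            "provide at least one of: " + ", ".join(DATA_SOURCE_FIELDS)
        )

    return problems
```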
Mode Selection
| Field | Type | Default | Description |
|---|---|---|---|
| useAdvancedFeatures | boolean | false | Enable Premium features (reasoning + grounding) |
| llmProvider | string | - | BYOK provider: groq, google, openai, anthropic |
| groqApiKey | string | - | Your Groq API key |
| googleApiKey | string | - | Your Google API key |
| openaiApiKey | string | - | Your OpenAI API key |
| anthropicApiKey | string | - | Your Anthropic API key |
Output Options
| Field | Type | Default | Description |
|---|---|---|---|
| outputFormat | string | csv | Output format: csv, json, parquet, xlsx |
| includeGeneratedCode | boolean | true | Include Python code in output |
| maxRetries | number | 3 | Max code generation retry attempts |
Mode Selection Logic
IF llmProvider is set AND corresponding API key is provided:
    IF useAdvancedFeatures is true AND llmProvider is "google":
        → BYOK PREMIUM (your Google key + Premium features)
    ELSE:
        → BYOK (your key, basic features)
ELSE:
    IF useAdvancedFeatures is true:
        → PREMIUM (hosted, $0.20)
    ELSE:
        → BASIC (hosted, $0.0015)
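The same selection logic can be written as a small Python function, which is handy for checking which mode (and fee) an input will trigger before sending it. This is a sketch that mirrors the pseudocode above, not the Actor's actual source; detect_mode and the returned labels are hypothetical names.

```python
# Maps each llmProvider value to its matching key field (from Mode Selection).
API_KEY_FIELDS = {
    "groq": "groqApiKey",
    "google": "googleApiKey",
    "openai": "openaiApiKey",
    "anthropic": "anthropicApiKey",
}


def detect_mode(actor_input: dict) -> str:
    """Return BASIC, PREMIUM, BYOK or BYOK_PREMIUM for a given Actor input."""
    provider = actor_input.get("llmProvider")
    advanced = bool(actor_input.get("useAdvancedFeatures", False))

    # A provider only counts if its corresponding API key is also present.
    key_field = API_KEY_FIELDS.get(provider)
    has_key = bool(key_field and actor_input.get(key_field))

    if provider and has_key:
        if advanced and provider == "google":
            return "BYOK_PREMIUM"
        return "BYOK"
    return "PREMIUM" if advanced else "BASIC"
```

Note that, per the pseudocode, setting useAdvancedFeatures with a non-Google BYOK provider still resolves to plain BYOK, and a provider without its key falls back to a hosted mode.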
How It Works
Processing Pipeline
1. INPUT VALIDATION
   ├─ Parse prompt and options
   ├─ Detect mode (Basic/Premium/BYOK)
   └─ Validate data URLs

2. DATA LOADING
   ├─ Load from inputData (direct JSON - zero I/O!)
   ├─ Or fetch from URLs (CSV, Excel, JSON, Parquet)
   ├─ Auto-detect format and encoding
   ├─ Extract schema (column names, types, sample values)
   └─ Handle multiple sources (auto-merge)

3. RAG SEARCH
   ├─ Search Pinecone for similar past transformations
   ├─ If found (>85% similarity), include as context
   └─ Helps LLM generate better code

4. CODE GENERATION
   ├─ Send prompt + schema + RAG context to LLM
   ├─ LLM generates Polars transformation code
   └─ Validate code structure

5. EXECUTION
   ├─ Execute code in sandboxed environment
   ├─ Validate output (no empty results, correct types)
   └─ Retry if errors (up to maxRetries)

6. OUTPUT
   ├─ Export transformed data (CSV/JSON/Parquet/Excel)
   ├─ Save generated code
   └─ Return metadata (rows, timing, etc.)

7. LEARNING
   ├─ Save successful transformation to Pinecone
   └─ Future similar requests benefit from this
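Steps 4 and 5 amount to a generate, execute, retry loop. The sketch below shows one way such a loop could look; it is not the Actor's implementation. The callables generate_code and execute_code are placeholders for the LLM call and the sandbox, max_retries is treated as the total number of attempts, and feeding the last error back into generation is an assumption.

```python
from typing import Callable, Optional


def generate_with_retries(
    generate_code: Callable[[str, Optional[str]], str],
    execute_code: Callable[[str], list],
    prompt: str,
    max_retries: int = 3,
) -> dict:
    """Generate code, run it, and try again with the last error as feedback."""
    last_error: Optional[str] = None

    for attempt in range(1, max_retries + 1):
        # Placeholder for the LLM call; receives the previous error, if any.
        code = generate_code(prompt, last_error)
        try:
            rows = execute_code(code)  # placeholder for sandboxed execution
            if not rows:
                # Mirrors the "no empty results" output check.
                raise ValueError("transformation produced no rows")
        except Exception as exc:  # any failure triggers another attempt
            last_error = str(exc)
            continue
        return {
            "status": "success",
            "attempts": attempt,
            "generated_code": code,
            "output_data": rows,
        }

    return {"status": "error", "attempts": max_retries, "errors": [last_error]}
```

Passing the two steps in as callables keeps the loop testable with stubs, without a real LLM or sandbox.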
RAG Memory System
The system learns from every successful transformation:
- Before generation: Searches for similar prompts in Pinecone
- If found: Includes similar code as context for better results
- After success: Saves the new transformation
- Over time: Accuracy improves as memory grows
Current memory: 22+ successful transformations and growing.
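The lookup step can be illustrated with a toy in-memory version. This is a sketch under several assumptions: the Actor uses Pinecone, which is replaced here by a plain list; cosine similarity is assumed as the metric; and the vectors are made-up stand-ins for prompt embeddings. Only the 85% threshold comes from the pipeline description.

```python
import math

SIMILARITY_THRESHOLD = 0.85  # from the pipeline: "If found (>85% similarity)"


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(query_vector, memory):
    """Return the code of the closest stored transformation, or None.

    `memory` is a list of {"vector": [...], "code": "..."} entries, standing
    in for a vector database.
    """
    best_score, best_code = 0.0, None
    for entry in memory:
        score = cosine_similarity(query_vector, entry["vector"])
        if score > best_score:
            best_score, best_code = score, entry["code"]
    return best_code if best_score > SIMILARITY_THRESHOLD else None
```

When nothing clears the threshold the function returns None, which corresponds to generating code without RAG context.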
Writing Effective Prompts
Structure
[ACTION] + [COLUMNS] + [CONDITIONS] + [OUTPUT]
Examples by Complexity
Simple (use Basic):
Group by 'region' column, sum 'revenue', sort descending, top 10
Medium (use Basic or Premium):
Filter rows where status is 'active' and created_at > 2024-01-01,
calculate total and average order_value per customer
Complex (use Premium):
Convert Magento 2 product export to Shopify CSV format:
- sku -> Handle (lowercase, replace spaces with dashes)
- name -> Title
- description -> Body (HTML)
- price -> Variant Price
- qty -> Variant Inventory Qty
- product_online -> Published (1=true, 0=false)
Only include simple products (exclude configurable/bundle)
Add Vendor column with value "Imported from Magento"
Tips
| Do | Don't |
|---|---|
| Name specific columns | Say "transform the data" |
| Specify output format | Assume system knows your schema |
| Use Premium for migrations | Use Basic for Shopify/Stripe formats |
| Break complex tasks into steps | Write 500-word prompts |
API Examples
Direct JSON Input (Recommended for API)
Single call with data in, results out. No file hosting needed.
curl -X POST "https://api.apify.com/v2/acts/salesmart-srl~polars-ai-data-transformer/run-sync-get-dataset-items?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "inputData": [
      {"product": "iPhone", "price": 999, "quantity": 10},
      {"product": "iPad", "price": 799, "quantity": 5},
      {"product": "MacBook", "price": 1999, "quantity": 3}
    ],
    "prompt": "Calculate total_value = price * quantity, sort by total_value descending",
    "outputFormat": "json"
  }'
The response includes an output_data array with all transformed rows.
Basic Mode (with URL)
curl -X POST "https://api.apify.com/v2/acts/salesmart-srl~polars-ai-data-transformer/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "datasetUrls": ["https://example.com/sales.csv"],
    "prompt": "Group by region, sum revenue, sort descending"
  }'
Premium Mode
curl -X POST "https://api.apify.com/v2/acts/salesmart-srl~polars-ai-data-transformer/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "datasetUrls": ["https://example.com/magento-products.csv"],
    "prompt": "Transform to Shopify product import CSV format",
    "useAdvancedFeatures": true
  }'
BYOK Mode (Groq - FREE)
curl -X POST "https://api.apify.com/v2/acts/salesmart-srl~polars-ai-data-transformer/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "datasetUrls": ["https://example.com/data.csv"],
    "prompt": "Calculate monthly trends",
    "llmProvider": "groq",
    "groqApiKey": "gsk_xxxxx"
  }'
BYOK Premium Mode
curl -X POST "https://api.apify.com/v2/acts/salesmart-srl~polars-ai-data-transformer/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "datasetUrls": ["https://example.com/products.csv"],
    "prompt": "Convert to Shopify format with all required columns",
    "llmProvider": "google",
    "googleApiKey": "AIza_xxxxx",
    "useAdvancedFeatures": true
  }'
Python SDK
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Direct JSON input (recommended)
run = client.actor("salesmart-srl/polars-ai-data-transformer").call(
    run_input={
        "inputData": [
            {"product": "iPhone", "price": 999, "qty": 10},
            {"product": "iPad", "price": 799, "qty": 5},
        ],
        "prompt": "Calculate total = price * qty, sort descending",
    }
)

# Get results from dataset (includes output_data)
dataset = client.dataset(run["defaultDatasetId"])
items = list(dataset.iterate_items())
result = items[0]

print(f"Status: {result['status']}")
print(f"Output rows: {result['output_rows']}")
print(f"Transformed data: {result['output_data']}")  # Full data!
print(f"Generated code: {result['generated_code']}")

# With file URL
run = client.actor("salesmart-srl/polars-ai-data-transformer").call(
    run_input={
        "datasetUrls": ["https://example.com/data.csv"],
        "prompt": "Group by category and sum sales",
    }
)

# Premium transformation
run = client.actor("salesmart-srl/polars-ai-data-transformer").call(
    run_input={
        "datasetUrls": ["https://example.com/products.csv"],
        "prompt": "Convert to Shopify product CSV",
        "useAdvancedFeatures": True,
    }
)
Output Format
Response Structure
{
  "status": "success",
  "input_sources_count": 1,
  "input_rows_total": 1000,
  "input_columns": ["sku", "name", "price", "qty"],
  "output_rows": 50,
  "output_columns": ["Handle", "Title", "Variant Price"],
  "output_file": "transformed_data.csv",
  "execution_time_ms": 1234,
  "generation_info": {
    "provider": "google_pro",
    "tokens_used": 4500,
    "generation_time_ms": 890,
    "attempts": 1
  },
  "generated_code": "import polars as pl\n\nresult = ...",
  "output_preview": [
    {"Handle": "product-1", "Title": "Product One", "Variant Price": 29.99}
  ],
  "output_data": [
    {"Handle": "product-1", "Title": "Product One", "Variant Price": 29.99},
    {"Handle": "product-2", "Title": "Product Two", "Variant Price": 49.99}
  ],
  "warnings": [],
  "errors": []
}
Output Fields
| Field | Description |
|---|---|
| output_preview | First 10 rows (always present) |
| output_data | Full transformed data (if < 10MB); use this for API integrations |
| output_file | Filename in Key-Value Store (for large files) |
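When consuming a result programmatically, it helps to handle the case where output_data is missing because the output was too large. The helper below is a sketch with a hypothetical name; it assumes output_data is absent or null in that case and that a successful run reports "status": "success", as in the sample response.

```python
def extract_rows(result: dict):
    """Return (rows, is_complete) from one result item.

    is_complete is False when only the preview is available inline and the
    full data has to be fetched from the file named in result["output_file"].
    """
    if result.get("status") != "success":
        raise RuntimeError(f"transformation failed: {result.get('errors')}")

    full = result.get("output_data")
    if full is not None:
        return full, True

    # Assumption: large outputs omit output_data and keep only the preview.
    return result.get("output_preview", []), False
```

A caller can then branch on is_complete and download output_file from the Key-Value Store only when needed.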
Generated Code
Every transformation returns reusable Python code:
import polars as pl

# Load your data
df = pl.read_csv("your_data.csv")

# Generated transformation (copy this!)
result = (
    df.lazy()
    .filter(pl.col("status") == "active")
    .group_by("region")
    .agg(
        pl.col("revenue").sum().alias("total_revenue"),
        pl.col("orders").count().alias("order_count"),
    )
    .sort("total_revenue", descending=True)
    .head(10)
    .collect()
)

# Save
result.write_csv("output.csv")
Performance
- Handles millions of rows efficiently
- Typical transformation: 1-3 seconds
- Uses Polars (Rust-based; can be 10-100x faster than pandas, depending on the workload)
- Lazy evaluation for memory efficiency
- Parallel processing for multi-file inputs
Privacy and Security
- Encrypted: API keys encrypted with AES-256
- Isolated: Data processed in isolated containers
- No retention: Data deleted after run completion
- No training: Your data is never used to train models
- BYOK: Full control over your LLM API keys
Support
- Issues: GitHub Issues
- Actor page: Apify Store
Changelog
v0.4 (December 2024)
- NEW: inputData. Pass data directly as JSON, no file hosting needed
- NEW: output_data. Full transformed data in response (if < 10MB)
- Single API call: data in, results out
- Perfect for API integrations and automation
v0.3 (December 2024)
- Migrated to google-genai SDK
- ThinkingConfig for extended reasoning
- Improved Google Search grounding
- Code cleanup and optimization
v0.2 (December 2024)
- 4-tier pricing: Basic, Premium, BYOK, BYOK Premium
- Premium: Gemini Pro + Google Search + RAG
- RAG system with Pinecone
- Multi-file support
v0.1 (December 2024)
- Initial release
- Multi-provider LLM support
- CSV, Excel, JSON, Parquet I/O