SQL on Files
Pricing
$3.00/month + usage
Go to Apify Store
SQL on Files
Run SQL queries on CSV, JSON, and Parquet files using DuckDB. No database setup required. Upload files, provide URLs, or query Apify Datasets directly. Full SQL support: JOINs, aggregations, window functions. Export as JSON, CSV, or Parquet. Lightning-fast analytical queries.
Pricing
$3.00/month + usage
Rating
5.0
(1)
Developer

Web Harvester
Maintained by Community
Actor stats
1
Bookmarked
2
Total users
1
Monthly active users
4 days ago
Last modified
Categories
Share
🦆 Run SQL queries on CSV, JSON, and Parquet files using DuckDB. No database setup required!
🎯 What This Actor Does
Query any data file with SQL - no database needed:
- DuckDB Powered - Lightning-fast analytical queries
- Multi-Format - CSV, JSON, Parquet support
- Flexible Input - Upload files, URLs, or Apify Datasets
- Full SQL - JOINs, aggregations, window functions
- Export Options - JSON, CSV, or Parquet output
🚀 Use Cases
| Use Case | Description |
|---|---|
| Data Analysis | Query scraped data with SQL |
| Transformations | Clean and reshape data |
| Aggregations | Group, count, sum, average |
| Filtering | Extract specific records |
| Joins | Combine multiple datasets |
| Export | Convert between formats |
📥 Input Examples
Simple Query
{"fileUrl": "https://example.com/data.csv","query": "SELECT * FROM data WHERE price > 100 ORDER BY price DESC LIMIT 10"}
Aggregation
{"fileUrl": "https://example.com/sales.csv","query": "SELECT category, COUNT(*) as count, AVG(price) as avg_price FROM data GROUP BY category"}
From Apify Dataset
{"datasetId": "abc123xyz","query": "SELECT url, title, price FROM data WHERE price IS NOT NULL"}
⚙️ Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
query | string | - | Required. SQL query to execute |
file | string | - | Upload a CSV/JSON/Parquet file |
fileUrl | string | - | URL to download file from |
datasetId | string | - | Load from Apify Dataset |
outputFormat | string | json | Output: json, csv, parquet |
limit | integer | 10000 | Max rows to return |
📤 Output
JSON Output (Default)
Results pushed to Dataset:
[{ "category": "Electronics", "count": 1520, "avg_price": 299.99 },{ "category": "Books", "count": 892, "avg_price": 24.50 }]
CSV/Parquet Output
{"format": "csv","rows": 1520,"columns": 3,"downloadUrl": "https://api.apify.com/v2/..."}
🦆 SQL Tips
-- Basic filteringSELECT * FROM data WHERE column LIKE '%keyword%'-- AggregationsSELECT category, COUNT(*), SUM(price) FROM data GROUP BY category-- Window functionsSELECT *, ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) as rank FROM data-- Date handlingSELECT *, DATE_TRUNC('month', date_column) as month FROM data-- JSON extractionSELECT json_column->>'$.nested.field' as value FROM data-- Pattern matchingSELECT * FROM data WHERE name ~ '^[A-Z].*'
💰 Cost Estimation
| Data Size | Approx. Time | Compute Units |
|---|---|---|
| 1 MB | ~5 seconds | ~0.005 |
| 10 MB | ~15 seconds | ~0.02 |
| 100 MB | ~1 minute | ~0.1 |
🔧 Technical Details
- Language: Python 3.12
- Engine: DuckDB 0.10+
- Memory: 256MB-1GB (scales with data)
- Speed: 1M+ rows/second for analytics
📄 License
MIT License - see LICENSE for details.