SQL on Files avatar
SQL on Files

Pricing

$3.00/month + usage

Go to Apify Store
SQL on Files

SQL on Files

Run SQL queries on CSV, JSON, and Parquet files using DuckDB. No database setup required. Upload files, provide URLs, or query Apify Datasets directly. Full SQL support: JOINs, aggregations, window functions. Export as JSON, CSV, or Parquet. Lightning-fast analytical queries.

Pricing

$3.00/month + usage

Rating

5.0

(1)

Developer

Web Harvester

Web Harvester

Maintained by Community

Actor stats

1

Bookmarked

2

Total users

1

Monthly active users

4 days ago

Last modified

Share

🦆 Run SQL queries on CSV, JSON, and Parquet files using DuckDB. No database setup required!

Apify Actor License: MIT

🎯 What This Actor Does

Query any data file with SQL - no database needed:

  • DuckDB Powered - Lightning-fast analytical queries
  • Multi-Format - CSV, JSON, Parquet support
  • Flexible Input - Upload files, URLs, or Apify Datasets
  • Full SQL - JOINs, aggregations, window functions
  • Export Options - JSON, CSV, or Parquet output

🚀 Use Cases

Use CaseDescription
Data AnalysisQuery scraped data with SQL
TransformationsClean and reshape data
AggregationsGroup, count, sum, average
FilteringExtract specific records
JoinsCombine multiple datasets
ExportConvert between formats

📥 Input Examples

Simple Query

{
"fileUrl": "https://example.com/data.csv",
"query": "SELECT * FROM data WHERE price > 100 ORDER BY price DESC LIMIT 10"
}

Aggregation

{
"fileUrl": "https://example.com/sales.csv",
"query": "SELECT category, COUNT(*) as count, AVG(price) as avg_price FROM data GROUP BY category"
}

From Apify Dataset

{
"datasetId": "abc123xyz",
"query": "SELECT url, title, price FROM data WHERE price IS NOT NULL"
}

⚙️ Configuration

ParameterTypeDefaultDescription
querystring-Required. SQL query to execute
filestring-Upload a CSV/JSON/Parquet file
fileUrlstring-URL to download file from
datasetIdstring-Load from Apify Dataset
outputFormatstringjsonOutput: json, csv, parquet
limitinteger10000Max rows to return

📤 Output

JSON Output (Default)

Results pushed to Dataset:

[
{ "category": "Electronics", "count": 1520, "avg_price": 299.99 },
{ "category": "Books", "count": 892, "avg_price": 24.50 }
]

CSV/Parquet Output

{
"format": "csv",
"rows": 1520,
"columns": 3,
"downloadUrl": "https://api.apify.com/v2/..."
}

🦆 SQL Tips

-- Basic filtering
SELECT * FROM data WHERE column LIKE '%keyword%'
-- Aggregations
SELECT category, COUNT(*), SUM(price) FROM data GROUP BY category
-- Window functions
SELECT *, ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) as rank FROM data
-- Date handling
SELECT *, DATE_TRUNC('month', date_column) as month FROM data
-- JSON extraction
SELECT json_column->>'$.nested.field' as value FROM data
-- Pattern matching
SELECT * FROM data WHERE name ~ '^[A-Z].*'

💰 Cost Estimation

Data SizeApprox. TimeCompute Units
1 MB~5 seconds~0.005
10 MB~15 seconds~0.02
100 MB~1 minute~0.1

🔧 Technical Details

  • Language: Python 3.12
  • Engine: DuckDB 0.10+
  • Memory: 256MB-1GB (scales with data)
  • Speed: 1M+ rows/second for analytics

📄 License

MIT License - see LICENSE for details.