Pricing

$3.00/month + usage

SQL on Files

Run SQL queries on CSV, JSON, and Parquet files using DuckDB. No database setup required. Upload files, provide URLs, or query Apify Datasets directly. Full SQL support: JOINs, aggregations, window functions. Export as JSON, CSV, or Parquet. Lightning-fast analytical queries.

Rating: 5.0 (1)

Developer: Web Harvester

Last modified: 24 days ago

🎯 What This Actor Does

Query CSV, JSON, and Parquet files with SQL, no database needed:

DuckDB Powered - Lightning-fast analytical queries
Multi-Format - CSV, JSON, Parquet support
Flexible Input - Upload files, URLs, or Apify Datasets
Full SQL - JOINs, aggregations, window functions
Export Options - JSON, CSV, or Parquet output

🚀 Use Cases

Use Case	Description
Data Analysis	Query scraped data with SQL
Transformations	Clean and reshape data
Aggregations	Group, count, sum, average
Filtering	Extract specific records
Joins	Combine multiple datasets
Export	Convert between formats

📥 Input Examples

Simple Query

{
    "fileUrl": "https://example.com/data.csv",
    "query": "SELECT * FROM data WHERE price > 100 ORDER BY price DESC LIMIT 10"
}

Aggregation

{
    "fileUrl": "https://example.com/sales.csv",
    "query": "SELECT category, COUNT(*) as count, AVG(price) as avg_price FROM data GROUP BY category"
}

From Apify Dataset

{
    "datasetId": "abc123xyz",
    "query": "SELECT url, title, price FROM data WHERE price IS NOT NULL"
}

⚙️ Configuration

Parameter	Type	Default	Description
`query`	string	-	Required. SQL query to execute
`file`	string	-	Upload a CSV/JSON/Parquet file
`fileUrl`	string	-	URL to download file from
`datasetId`	string	-	Load from Apify Dataset
`outputFormat`	string	`json`	Output: json, csv, parquet
`limit`	integer	`10000`	Max rows to return
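
The parameters above can be assembled and checked in a few lines of Python before a run is started. This is only a sketch: `build_input` is a hypothetical helper, not part of the Actor, and the rule that exactly one of `file`, `fileUrl`, or `datasetId` must be set is an assumption drawn from the input examples rather than documented behavior.

```python
import json

ALLOWED_FORMATS = {"json", "csv", "parquet"}


def build_input(query, *, file=None, file_url=None, dataset_id=None,
                output_format="json", limit=10000):
    """Build an Actor input dict using the parameter names from the table."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")

    sources = {"file": file, "fileUrl": file_url, "datasetId": dataset_id}
    given = {key: value for key, value in sources.items() if value}
    # Assumption: exactly one data source per run (not confirmed by the docs).
    if len(given) != 1:
        raise ValueError("set exactly one of file, fileUrl, datasetId")

    return {
        "query": query.strip(),
        **given,
        "outputFormat": output_format,
        "limit": limit,
    }


if __name__ == "__main__":
    payload = build_input(
        "SELECT category, COUNT(*) AS count FROM data GROUP BY category",
        file_url="https://example.com/sales.csv",
        output_format="csv",
    )
    print(json.dumps(payload, indent=4))
```

The resulting JSON can be pasted into the Actor's input editor or sent through whichever client you already use to start runs.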

📤 Output

JSON Output (Default)

Results are pushed to the Dataset:

[
    { "category": "Electronics", "count": 1520, "avg_price": 299.99 },
    { "category": "Books", "count": 892, "avg_price": 24.50 }
]

CSV/Parquet Output

{
    "format": "csv",
    "rows": 1520,
    "columns": 3,
    "downloadUrl": "https://api.apify.com/v2/..."
}
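
Because the two output shapes differ, downstream code may need to tell them apart. The sketch below assumes that a CSV/Parquet run stores a single summary record like the one above, and that it can be recognized by its `format` and `downloadUrl` keys. `split_output` is a hypothetical helper, and the sample items are copied from the examples in this section.

```python
def split_output(items):
    """Separate query result rows from export summary records.

    Assumption: an export summary always has both "format" and "downloadUrl".
    A query that itself selects columns with those names would be misclassified.
    """
    rows, exports = [], []
    for item in items:
        if "downloadUrl" in item and "format" in item:
            exports.append(item)
        else:
            rows.append(item)
    return rows, exports


if __name__ == "__main__":
    items = [
        {"category": "Electronics", "count": 1520, "avg_price": 299.99},
        {"category": "Books", "count": 892, "avg_price": 24.50},
        {
            "format": "csv",
            "rows": 1520,
            "columns": 3,
            "downloadUrl": "https://api.apify.com/v2/...",
        },
    ]
    rows, exports = split_output(items)
    for row in rows:
        print(row["category"], row["count"])
    for export in exports:
        print(export["format"], export["rows"], export["downloadUrl"])
```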

🦆 SQL Tips

-- Basic filtering (replace column_name with a real column; COLUMN itself is a reserved word)
SELECT * FROM data WHERE column_name LIKE '%keyword%'

-- Aggregations
SELECT category, COUNT(*), SUM(price) FROM data GROUP BY category

-- Window functions
SELECT *, ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) as row_num FROM data

-- Date handling
SELECT *, DATE_TRUNC('month', date_column) as month FROM data

-- JSON extraction
SELECT json_column->>'$.nested.field' as value FROM data

-- Pattern matching
SELECT * FROM data WHERE name ~ '^[A-Z].*'
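
Since the Actor takes the whole query as a single string, any value spliced into it from Python has to be quoted by hand. A minimal sketch using standard SQL quoting rules follows (single quotes doubled inside string literals, double quotes doubled inside identifiers). The helper names are hypothetical, and `%` or `_` inside the keyword are not escaped, so they still act as `LIKE` wildcards.

```python
def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal."""
    return "'" + value.replace("'", "''") + "'"


def quote_ident(name: str) -> str:
    """Quote a column name as a SQL identifier."""
    return '"' + name.replace('"', '""') + '"'


def keyword_filter(column: str, keyword: str) -> str:
    """Build the basic filtering query for an arbitrary column and keyword."""
    pattern = f"%{keyword}%"
    return (
        f"SELECT * FROM data "
        f"WHERE {quote_ident(column)} LIKE {sql_literal(pattern)}"
    )


if __name__ == "__main__":
    print(keyword_filter("title", "O'Reilly"))
```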

💰 Cost Estimation

Data Size	Approx. Time	Compute Units
1 MB	~5 seconds	~0.005
10 MB	~15 seconds	~0.02
100 MB	~1 minute	~0.1
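
For sizes between the rows of this table, a rough figure can be obtained by interpolating linearly between neighboring rows. This is an illustrative estimate only: the three data points are the approximate values listed above, the interpolation method is an assumption, and real usage depends on the query and the memory allotted to the run.

```python
# (size in MB, approximate compute units), copied from the table above.
COST_TABLE = [(1.0, 0.005), (10.0, 0.02), (100.0, 0.1)]


def estimate_compute_units(size_mb: float) -> float:
    """Linearly interpolate compute units between the table's data points."""
    smallest, largest = COST_TABLE[0][0], COST_TABLE[-1][0]
    if not smallest <= size_mb <= largest:
        # The table gives no data outside this range, so refuse to guess.
        raise ValueError(f"size must be between {smallest} and {largest} MB")
    for (x0, y0), (x1, y1) in zip(COST_TABLE, COST_TABLE[1:]):
        if x0 <= size_mb <= x1:
            return y0 + (size_mb - x0) / (x1 - x0) * (y1 - y0)
    raise AssertionError("unreachable: size_mb was range-checked above")


if __name__ == "__main__":
    for size in (1, 5, 10, 55, 100):
        print(f"{size} MB -> ~{estimate_compute_units(size):.3f} CU")
```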

🔧 Technical Details

Language: Python 3.12
Engine: DuckDB 0.10+
Memory: 256 MB to 1 GB (scales with data size)
Speed: 1M+ rows/second for analytics

📄 License

MIT License - see LICENSE for details.
