Fuzzy Search Dataset Actor
Pricing
from $0.001 / actor start
Developer
Tin
Search any Apify dataset using typo-tolerant fuzzy matching. Point this Actor at an existing dataset, provide a query, and get back ranked results — even when spellings are imperfect. Try it directly in Apify Console.
What does Fuzzy Search Dataset do?
This Actor loads records from any Apify dataset and runs a fuzzy full-text search across one or more fields. It handles typos, partial matches, and word-order variations automatically. For example, searching "iphon pro mx" can still return results for "iPhone 15 Pro Max".
It's ideal for post-processing scraped data — after collecting a large dataset with another Actor, use this one to instantly build a search layer on top of it without any external infrastructure.
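To make the idea of typo-tolerant ranking concrete, here is a minimal Python sketch built on the standard library's `difflib`. It is not the Actor's actual algorithm (this page does not say which matching library it uses). The helpers `get_field`, `fuzzy_score`, and `search` are hypothetical names chosen for illustration. The sketch borrows the Actor's score convention: 0.0 is a perfect match and higher values are worse.

```python
from difflib import SequenceMatcher


def get_field(record, path):
    """Resolve a dot-notation path such as "product.name" (None if missing)."""
    value = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def fuzzy_score(query, text):
    """Return a distance in [0, 1]: 0.0 = identical, 1.0 = nothing in common."""
    ratio = SequenceMatcher(None, query.lower(), text.lower()).ratio()
    return 1.0 - ratio


def search(records, query, fields, threshold=0.35, limit=20):
    """Rank records by their best-scoring field; drop those above threshold."""
    hits = []
    for record in records:
        values = (get_field(record, field) for field in fields)
        scores = [fuzzy_score(query, str(v)) for v in values if v is not None]
        if scores and min(scores) <= threshold:
            hits.append({"score": round(min(scores), 3), "item": record})
    # Lower score means a closer match, so sort ascending.
    hits.sort(key=lambda hit: hit["score"])
    return [{"rank": i + 1, **hit} for i, hit in enumerate(hits[:limit])]
```

Calling `search(records, "iphon pro mx", ["title"])` on a list that contains `{"title": "iPhone 15 Pro Max"}` keeps that record, because the character-level similarity stays high despite the typos, while an unrelated title falls above the threshold and is dropped.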
Why use Fuzzy Search Dataset?
- No search engine needed: works directly on any Apify dataset without Elasticsearch, Algolia, or similar tools
- Typo-tolerant: handles misspellings, abbreviations, and partial queries out of the box
- Multi-field search: search across `title`, `description`, `brand`, nested fields like `product.name`, or any combination
- Tunable relevance: control strictness, field weights, and minimum match length to fit your data
- Automation-ready: trigger via API, schedule it, or chain it after a scraping Actor in a workflow
How to use Fuzzy Search Dataset
- Run a scraping Actor to build the source dataset (or use any existing dataset you already have in Apify Console).
- Copy the dataset ID from the dataset URL or the Storage section of Apify Console.
- Open this Actor and paste the dataset ID into the Dataset ID field.
- Enter your search query and choose which fields to search.
- Run the Actor — results are written to its output dataset, ranked by relevance score.
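The same steps can be automated. The sketch below assumes a client object shaped like the one in the official `apify-client` Python package (`client.actor(...).call(run_input=...)` and `client.dataset(...).list_items().items`). The function name `run_fuzzy_search` is hypothetical, and the Actor ID must be supplied by you because this page does not state it.

```python
def run_fuzzy_search(client, actor_id, run_input):
    """Start the Actor, wait for it to finish, and return its output object.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The Actor run did not produce a result")

    # The Actor writes a single object to its default dataset.
    items = client.dataset(run["defaultDatasetId"]).list_items().items
    return items[0] if items else None
```

With `apify-client` installed, a call might look like `run_fuzzy_search(ApifyClient("<YOUR_API_TOKEN>"), "<username>/<actor-name>", {"datasetId": "<DATASET_ID>", "query": "iphon pro max"})`. All three placeholders are hypothetical and need your own values.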
Input
Configure the Actor in the Input tab or pass a JSON object via the API.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `datasetId` | string | ✅ | - | ID of the Apify dataset to search |
| `query` | string | ✅ | - | Text to search for |
| `fields` | array of strings | | `["title"]` | Dataset fields to search (supports dot notation for nested fields) |
| `limit` | integer | | `20` | Maximum number of results to return (1–1000) |
| `threshold` | number | | `0.35` | Fuzzy strictness: 0.0 = exact only, 1.0 = match anything |
| `ignoreLocation` | boolean | | `true` | Allow matches anywhere in the text, not just at the start |
| `minMatchCharLength` | integer | | `2` | Minimum characters in a token before it counts as a match |
| `includeScore` | boolean | | `true` | Attach a relevance score to each result (lower = better match) |
| `includeMatches` | boolean | | `false` | Include matched text ranges (useful for keyword highlighting) |
| `extendedSearch` | boolean | | `false` | Enable advanced query syntax (`^starts-with`, `!exclude`, `=exact`) |
| `weights` | object | | - | Per-field importance weights, e.g. `{"title": 0.7, "description": 0.3}` |
Example input:
    {
      "datasetId": "UoYaa1QjGdgdJrSHA",
      "query": "iphon pro max",
      "fields": ["title", "description"],
      "limit": 10,
      "threshold": 0.35,
      "weights": {
        "title": 0.8,
        "description": 0.2
      }
    }
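Before sending an input object, it can help to check it locally. The following sketch fills in the defaults listed in the table above and rejects values outside the documented ranges. `DEFAULTS` and `normalize_input` are hypothetical names, and the Actor's own validation may differ; in particular, treating `threshold` as limited to 0.0 through 1.0 is an assumption based on the table's description.

```python
import copy

# Defaults copied from the input table above.
DEFAULTS = {
    "fields": ["title"],
    "limit": 20,
    "threshold": 0.35,
    "ignoreLocation": True,
    "minMatchCharLength": 2,
    "includeScore": True,
    "includeMatches": False,
    "extendedSearch": False,
}


def normalize_input(raw):
    """Fill in documented defaults and reject out-of-range values."""
    for key in ("datasetId", "query"):
        if not isinstance(raw.get(key), str) or not raw[key].strip():
            raise ValueError(f"'{key}' is required and must be a non-empty string")

    # Values supplied by the caller override the defaults.
    merged = {**copy.deepcopy(DEFAULTS), **raw}

    if not 1 <= merged["limit"] <= 1000:
        raise ValueError("'limit' must be between 1 and 1000")
    if not 0.0 <= merged["threshold"] <= 1.0:
        raise ValueError("'threshold' must be between 0.0 and 1.0")
    return merged
```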
Output
Results are pushed to the Actor's default dataset as a single object containing the query, total result count, and an array of ranked matches.
Example output:
    {
      "query": "iphon pro max",
      "totalResults": 2,
      "results": [
        {
          "rank": 1,
          "score": 0.04,
          "item": {
            "title": "Apple iPhone 15 Pro Max",
            "description": "6.7-inch Super Retina XDR display, A17 Pro chip",
            "brand": "Apple",
            "price": 1199
          }
        },
        {
          "rank": 2,
          "score": 0.18,
          "item": {
            "title": "iPhone 14 Pro Max",
            "description": "48MP main camera, Dynamic Island",
            "brand": "Apple",
            "price": 999
          }
        }
      ]
    }
You can download results in JSON, CSV, Excel, or HTML from the dataset tab in Apify Console or via the Apify API.
Output data fields
| Field | Format | Description |
|---|---|---|
| `query` | text | The search query that was used |
| `totalResults` | number | Number of results returned |
| `results[].rank` | number | 1-based position in results (1 = best match) |
| `results[].score` | number | Relevance score (0.0 = perfect match, 1.0 = no match) |
| `results[].item` | object | Full original record from the source dataset |
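Because the output is one nested object rather than one row per match, downstream code usually needs to unpack it. The sketch below turns the object into flat rows and optionally keeps only matches at or below a score cutoff. The function name `flatten_results` and the `item.` key prefix are hypothetical choices for this illustration, not something the Actor produces.

```python
def flatten_results(output, max_score=1.0):
    """Turn the nested output object into flat rows, best match first."""
    rows = []
    for result in output.get("results", []):
        score = result.get("score")
        # The score is absent when includeScore is false; keep such rows.
        if score is not None and score > max_score:
            continue
        row = {"rank": result["rank"], "score": score}
        # Prefix item fields so they cannot collide with rank/score.
        for key, value in result["item"].items():
            row[f"item.{key}"] = value
        rows.append(row)
    return sorted(rows, key=lambda row: row["rank"])
```

Applied to the example output above with `max_score=0.1`, only the first match (score 0.04) would remain.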
Tips and advanced options
Tuning the threshold:
- `0.2`: strict, good for product codes and exact names
- `0.35`: balanced (default), works well for product titles and descriptions
- `0.6`: loose, useful for free-text fields or short queries
Multi-field search with weights:
To boost title matches above description matches, set weights to {"title": 0.8, "description": 0.2}. Weights must sum to 1.0 across all searched fields.
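The sum-to-1.0 rule is easy to check before a run. The sketch below validates a weights object against the searched fields and then shows one simple way weights could blend per-field scores. The weighted average in `combine_scores` is an assumption made for illustration; this page does not describe the Actor's actual scoring formula, and both function names are hypothetical.

```python
import math


def validate_weights(fields, weights):
    """Check that every searched field has a weight and that they sum to 1.0."""
    missing = [field for field in fields if field not in weights]
    if missing:
        raise ValueError(f"No weight given for: {', '.join(missing)}")
    total = sum(weights[field] for field in fields)
    # Compare with a tolerance because of floating-point rounding.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"Weights sum to {total}, expected 1.0")


def combine_scores(field_scores, weights):
    """Blend per-field scores (0.0 = perfect) into one weighted average.

    ASSUMPTION: a plain weighted average, not the Actor's real formula.
    """
    return sum(weights[field] * score for field, score in field_scores.items())
```

With weights of 0.8 for `title` and 0.2 for `description`, a record scoring 0.1 on its title and 0.5 on its description would blend to roughly 0.18 under this assumed formula, so a strong title match dominates.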
Advanced query syntax (when extendedSearch is enabled):
| Syntax | Meaning | Example |
|---|---|---|
| `=iphone` | Exact match | `=iPhone 15` |
| `^apple` | Starts with | `^Apple` |
| `!samsung` | Exclude | `!Samsung` |
| `'pro` | Includes token | `'pro` |
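To illustrate what the four operators mean, here is a small Python sketch that evaluates them against a single string. It is a simplified model, not the Actor's parser. It assumes terms are separated by whitespace and combined with AND, it compares case-insensitively, it does not handle multi-word operands such as `=iPhone 15`, and it treats unprefixed terms as plain substrings instead of fuzzy matches. The function name `matches_extended` is hypothetical.

```python
def matches_extended(query, text):
    """Return True if `text` satisfies every whitespace-separated term."""
    haystack = text.lower()
    for term in query.lower().split():
        operator, operand = term[0], term[1:]
        if operator == "=":      # exact match of the whole text
            ok = haystack == operand
        elif operator == "^":    # text starts with the operand
            ok = haystack.startswith(operand)
        elif operator == "!":    # text must not contain the operand
            ok = operand not in haystack
        elif operator == "'":    # text must contain the operand
            ok = operand in haystack
        else:                    # ASSUMPTION: plain substring, no fuzziness
            ok = term in haystack
        if not ok:
            return False
    return True
```

For example, the query `^apple !samsung` accepts "Apple iPhone 15 Pro Max" because the text starts with "apple" and does not contain "samsung".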
Performance: The Actor loads the entire dataset into memory. For datasets over 100k records, consider filtering the source dataset first or increasing the Actor's memory allocation.
Pricing / Cost estimation
This Actor processes data in-memory without using a browser or proxy, so compute costs are low. A typical run over 10,000 records completes in under 30 seconds. Apify provides a free tier sufficient for many use cases.
FAQ and support
Is this legal? This Actor only reads from datasets you own or have access to on the Apify platform. It does not scrape any external websites.
Can I use this with datasets from other Actors? Yes — as long as you have access to the dataset ID, this Actor can read it.
The results are empty or not what I expected. Try raising the `threshold` value above the default of 0.35 (e.g. to 0.5), making sure `ignoreLocation` is still enabled (it is on by default), or adding more fields to the `fields` array.
Found a bug or want a feature? Open an issue in the Issues tab on this Actor's page. Custom solutions are also available — reach out via the Apify platform.