
Fuzzy Search Dataset Actor

Pricing: from $0.001 / actor start


Search any Apify dataset using typo-tolerant fuzzy matching.


Developer: Tin (maintained by Community)


Point this Actor at an existing Apify dataset, provide a query, and get back ranked results, even when spellings are imperfect. Try it directly in Apify Console.

What does Fuzzy Search Dataset do?

This Actor loads records from any Apify dataset and runs a fuzzy full-text search across one or more fields. It handles typos, partial matches, and word-order variations automatically. For example, searching "iphon pro mx" can still return results for "iPhone 15 Pro Max".

It's well suited to post-processing scraped data. After collecting a large dataset with another Actor, use this one to search it without setting up any external search infrastructure.
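The page does not say which matching algorithm the Actor uses. The sketch below therefore only illustrates typo tolerance in general, and it uses a swapped-in technique: per-token Levenshtein edit distance, normalized so that 0.0 means a perfect match and 1.0 means no match. That is the same direction as this Actor's score. The function names are hypothetical. Each query token is matched against its best text token independently, so word order does not affect the score.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def token_score(query_token: str, text_tokens: list[str]) -> float:
    """Best normalized distance of one query token against any text token."""
    return min(
        levenshtein(query_token, t) / max(len(query_token), len(t))
        for t in text_tokens
    )


def fuzzy_score(query: str, text: str) -> float:
    """Average per-token distance: 0.0 = all tokens found exactly, 1.0 = nothing similar."""
    q_tokens = query.lower().split()
    t_tokens = text.lower().split()
    if not q_tokens or not t_tokens:
        return 1.0
    return sum(token_score(q, t_tokens) for q in q_tokens) / len(q_tokens)


print(fuzzy_score("iphon pro mx", "iPhone 15 Pro Max"))
print(fuzzy_score("iphon pro mx", "Samsung Galaxy S24"))
```

"iphon" is one edit from "iphone" and "mx" is one edit from "max", so the iPhone title scores far lower (better) than the Samsung one.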

Why use Fuzzy Search Dataset?

  • No search engine needed — works directly on any Apify dataset without Elasticsearch, Algolia, or similar tools
  • Typo-tolerant — handles misspellings, abbreviations, and partial queries out of the box
  • Multi-field search — search across title, description, brand, nested fields like product.name, or any combination
  • Tunable relevance — control strictness, field weights, and minimum match length to fit your data
  • Automation-ready — trigger via API, schedule it, or chain it after a scraping Actor in a workflow

How to use Fuzzy Search Dataset

  1. Run a scraping Actor to build the source dataset (or use any existing dataset you already have in Apify Console).
  2. Copy the dataset ID from the dataset URL or the Storage section of Apify Console.
  3. Open this Actor and paste the dataset ID into the Dataset ID field.
  4. Enter your search query and choose which fields to search.
  5. Run the Actor — results are written to its output dataset, ranked by relevance score.
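The run in step 5 can also be started from code, which is what makes the scheduling and chaining mentioned earlier possible. This sketch only prepares a request for Apify's `run-sync-get-dataset-items` API endpoint. That endpoint starts a run, waits for it to finish, and returns the items of the run's default dataset as a JSON array. The sketch uses only Python's standard library. `build_run_request` is a hypothetical helper, and the Actor ID and token are placeholders you must replace.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Prepare (without sending) a POST that runs an Actor synchronously
    and returns the items of its default dataset."""
    # Apify API paths write "username/actor-name" as "username~actor-name".
    actor_path = urllib.parse.quote(actor_id.replace("/", "~"), safe="~")
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{actor_path}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: replace with the real Actor ID and your API token.
request = build_run_request(
    "<username>/<actor-name>",
    "<YOUR_API_TOKEN>",
    {"datasetId": "UoYaa1QjGdgdJrSHA", "query": "iphon pro max", "fields": ["title", "description"]},
)
# Sending the request starts a real, billed run:
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)  # items[0] is the result object described under Output
```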

Input

Configure the Actor in the Input tab or pass a JSON object via the API.

Field | Type | Required | Default | Description
--- | --- | --- | --- | ---
datasetId | string | Yes | none | ID of the Apify dataset to search
query | string | Yes | none | Text to search for
fields | array of strings | No | ["title"] | Dataset fields to search (supports dot notation for nested fields, e.g. product.name)
limit | integer | No | 20 | Maximum number of results to return (1–1000)
threshold | number | No | 0.35 | Fuzzy strictness: 0.0 = exact matches only, 1.0 = match anything
ignoreLocation | boolean | No | true | Allow matches anywhere in the text, not just at the start
minMatchCharLength | integer | No | 2 | Minimum number of characters in a token before it counts as a match
includeScore | boolean | No | true | Attach a relevance score to each result (lower = better match)
includeMatches | boolean | No | false | Include matched text ranges, useful for keyword highlighting
extendedSearch | boolean | No | false | Enable advanced query syntax (^starts-with, !exclude, =exact)
weights | object | No | none | Per-field importance weights, e.g. {"title": 0.7, "description": 0.3}

Example input:

{
  "datasetId": "UoYaa1QjGdgdJrSHA",
  "query": "iphon pro max",
  "fields": ["title", "description"],
  "limit": 10,
  "threshold": 0.35,
  "weights": {
    "title": 0.8,
    "description": 0.2
  }
}
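Filling in the documented defaults and catching out-of-range values locally can save a wasted run. The defaults and ranges in this sketch are copied from the Input table. `prepare_input` is a hypothetical helper, not part of the Actor. The check that every `weights` key also appears in `fields` is an added assumption, not a documented rule.

```python
from typing import Any

# Defaults copied from the Input table (datasetId and query have no default).
DEFAULTS: dict[str, Any] = {
    "fields": ["title"],
    "limit": 20,
    "threshold": 0.35,
    "ignoreLocation": True,
    "minMatchCharLength": 2,
    "includeScore": True,
    "includeMatches": False,
    "extendedSearch": False,
}


def prepare_input(user_input: dict[str, Any]) -> dict[str, Any]:
    """Merge user values over the defaults and validate the documented ranges."""
    # Copy the default list so callers cannot mutate DEFAULTS by accident.
    merged = {**DEFAULTS, "fields": list(DEFAULTS["fields"]), **user_input}
    for key in ("datasetId", "query"):
        value = merged.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{key} is required and must be a non-empty string")
    if not 1 <= merged["limit"] <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    if not 0.0 <= merged["threshold"] <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    # Assumption (not stated in the table): weights should only name searched fields.
    extra = set(merged.get("weights") or {}) - set(merged["fields"])
    if extra:
        raise ValueError(f"weights name fields that are not searched: {sorted(extra)}")
    return merged


config = prepare_input({"datasetId": "UoYaa1QjGdgdJrSHA", "query": "iphon pro max"})
print(config["fields"], config["limit"], config["threshold"])
```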

Output

Results are pushed to the Actor's default dataset as a single object containing the query, total result count, and an array of ranked matches.

Example output (only the first two of the three results are shown):

{
  "query": "iphon pro max",
  "totalResults": 3,
  "results": [
    {
      "rank": 1,
      "score": 0.04,
      "item": {
        "title": "Apple iPhone 15 Pro Max",
        "description": "6.7-inch Super Retina XDR display, A17 Pro chip",
        "brand": "Apple",
        "price": 1199
      }
    },
    {
      "rank": 2,
      "score": 0.18,
      "item": {
        "title": "iPhone 14 Pro Max",
        "description": "48MP main camera, Dynamic Island",
        "brand": "Apple",
        "price": 999
      }
    }
  ]
}

You can download results in JSON, CSV, Excel, or HTML from the dataset tab in Apify Console or via the Apify API.

Output data fields

Field | Format | Description
--- | --- | ---
query | text | The search query that was used
totalResults | number | Number of results returned
results[].rank | number | 1-based position in results (1 = best match)
results[].score | number | Relevance score (0.0 = perfect match, 1.0 = no match)
results[].item | object | Full original record from the source dataset
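Downstream code usually needs only a few of these fields. The sketch below uses hypothetical helper names. It checks that a result object follows the documented ordering: ranks run 1, 2, 3 and so on, and scores never decrease down the list because lower means better. It also keeps only the original records whose score is at or below a cutoff you choose.

```python
from typing import Any


def is_well_ranked(output: dict[str, Any]) -> bool:
    """True if ranks run 1, 2, 3, ... and scores never decrease (rank 1 has the lowest)."""
    results = output.get("results", [])
    ranks = [r["rank"] for r in results]
    scores = [r["score"] for r in results]
    return ranks == list(range(1, len(results) + 1)) and all(
        a <= b for a, b in zip(scores, scores[1:])
    )


def items_within(output: dict[str, Any], max_score: float) -> list[dict[str, Any]]:
    """Return the original records whose relevance score is <= max_score."""
    return [r["item"] for r in output.get("results", []) if r["score"] <= max_score]


# Shape taken from the example output above (item fields trimmed).
sample = {
    "query": "iphon pro max",
    "totalResults": 2,
    "results": [
        {"rank": 1, "score": 0.04, "item": {"title": "Apple iPhone 15 Pro Max"}},
        {"rank": 2, "score": 0.18, "item": {"title": "iPhone 14 Pro Max"}},
    ],
}
print(is_well_ranked(sample))                           # True
print([i["title"] for i in items_within(sample, 0.1)])  # ['Apple iPhone 15 Pro Max']
```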

Tips and advanced options

Tuning the threshold:

  • 0.2 — strict, good for product codes and exact names
  • 0.35 — balanced (default), works well for product titles and descriptions
  • 0.6 — loose, useful for free-text fields or short queries
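Converting a threshold into a rough number of typos helps in choosing between these values. The page does not name the matching library. Its options (threshold, ignoreLocation, minMatchCharLength, extendedSearch, per-field weights) do match those of Fuse.js, so the following rests on an assumption: a Fuse.js-style Bitap matcher with ignoreLocation enabled. Under that assumption, a candidate match is kept while errors / query length stays at or below the threshold, with the whole query treated as one pattern. `max_edits` is a hypothetical helper. The final relevance score also depends on field length and weights, so treat the result as an approximation.

```python
def max_edits(query: str, threshold: float) -> int:
    """Largest edit count e (fewer than len(query)) with e / len(query) <= threshold.

    Assumes a Fuse.js-style Bitap matcher with ignoreLocation enabled and the
    whole query as one pattern; an approximation, not this Actor's code.
    """
    if not query:
        return 0
    # Bitap tries 0, 1, 2, ... errors and keeps those whose ratio stays within the threshold.
    allowed = [e for e in range(len(query)) if e / len(query) <= threshold]
    return max(allowed)


for t in (0.2, 0.35, 0.6):
    print(f"threshold {t}: up to {max_edits('iphon pro max', t)} edits")
```

For the 13-character query "iphon pro max", this gives 2, 4 and 7 edits at thresholds 0.2, 0.35 and 0.6.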

Multi-field search with weights:

To boost title matches above description matches, set weights to {"title": 0.8, "description": 0.2}. Weights must sum to 1.0 across all searched fields.
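It is often easier to think in ratios, such as "title matters four times as much as description". A small helper can turn such ratios into weights that satisfy the sum-to-1.0 rule. `normalize_weights` is hypothetical. Rejecting zero or negative weights is an added assumption rather than a documented rule.

```python
def normalize_weights(raw: dict[str, float]) -> dict[str, float]:
    """Scale positive per-field ratios so the resulting weights sum to 1.0."""
    if not raw:
        raise ValueError("at least one field weight is required")
    if any(w <= 0 for w in raw.values()):
        raise ValueError("weights must be positive")
    total = sum(raw.values())
    return {field: w / total for field, w in raw.items()}


print(normalize_weights({"title": 4, "description": 1}))  # {'title': 0.8, 'description': 0.2}
```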

Advanced query syntax (when extendedSearch is enabled):

Syntax | Meaning | Example
--- | --- | ---
=iphone | Exact match | =iPhone 15
^apple | Starts with | ^Apple
!samsung | Exclude | !Samsung
'pro | Includes token | 'pro
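When queries are generated by code, a small builder keeps the prefixes straight. This sketch assumes Fuse.js-style extended search, in which spaces separate terms that must all match. A multi-word term is wrapped in double quotes after its prefix, for example ="iPhone 15". The page does not document whether this Actor parses quotes that way, so verify it with a test run. `extended_query` and its operator names are hypothetical.

```python
# Prefixes from the advanced-syntax table; "fuzzy" (no prefix) is an added assumption.
PREFIXES = {"exact": "=", "starts_with": "^", "exclude": "!", "includes": "'", "fuzzy": ""}


def extended_query(*terms: tuple[str, str]) -> str:
    """Join (operator, text) pairs into one extended-search query string."""
    parts = []
    for operator, text in terms:
        if operator not in PREFIXES:
            raise ValueError(f"unknown operator: {operator!r}")
        if not text.strip():
            raise ValueError("search terms must not be empty")
        # Assumed Fuse.js-style quoting so a multi-word term stays a single term.
        body = f'"{text}"' if " " in text else text
        parts.append(PREFIXES[operator] + body)
    return " ".join(parts)


print(extended_query(("starts_with", "Apple"), ("exclude", "Samsung"), ("includes", "pro")))
# ^Apple !Samsung 'pro
```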

Performance: The Actor loads the entire dataset into memory. For datasets over 100k records, consider filtering the source dataset first or increasing the Actor's memory allocation.
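One way to filter a very large source dataset first is a three-step process. Download it in pages, keep only the records you need, and push them into a new, smaller dataset for this Actor to search. The sketch below covers only the first step. It builds page URLs for Apify's dataset items API endpoint, where `offset`, `limit`, `fields` and `format` are parameters of that endpoint. `page_urls` is a hypothetical helper. Private datasets also need your API token. Requesting only some fields means results[].item will later contain only those fields.

```python
from urllib.parse import urlencode


def page_urls(dataset_id: str, total_items: int, page_size: int, fields: list[str]) -> list[str]:
    """Build one dataset-items URL per page, each returning only `fields`."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    base = f"https://api.apify.com/v2/datasets/{dataset_id}/items"
    urls = []
    for offset in range(0, total_items, page_size):
        params = {"format": "json", "offset": offset, "limit": page_size, "fields": ",".join(fields)}
        urls.append(f"{base}?{urlencode(params)}")
    return urls


# total_items is the dataset's item count, shown in the Storage section of Apify Console.
for url in page_urls("UoYaa1QjGdgdJrSHA", 250_000, 100_000, ["title", "description"]):
    print(url)
```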

Pricing / Cost estimation

This Actor processes data in-memory without using a browser or proxy, so compute costs are low. A typical run over 10,000 records completes in under 30 seconds. Apify provides a free tier sufficient for many use cases.

FAQ and support

Is this legal? This Actor only reads from datasets you own or have access to on the Apify platform. It does not scrape any external websites.

Can I use this with datasets from other Actors? Yes — as long as you have access to the dataset ID, this Actor can read it.

The results are empty or not what I expected. Try raising the threshold value (e.g. to 0.5), enabling ignoreLocation, or adding more fields to the fields array.
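If runs are automated, the first suggestion (raising the threshold) can be applied step by step. In this sketch, `run_search` is a hypothetical callable that starts a run with the given threshold and returns its results list. Each retry is a separate, billed Actor run, so keep the list of thresholds short.

```python
from typing import Any, Callable, Sequence


def search_with_fallback(
    run_search: Callable[[float], list[Any]],
    thresholds: Sequence[float] = (0.35, 0.5, 0.6),
) -> tuple[float, list[Any]]:
    """Try progressively looser thresholds and stop at the first non-empty result list."""
    if not thresholds:
        raise ValueError("at least one threshold is required")
    results: list[Any] = []
    for threshold in thresholds:
        results = run_search(threshold)
        if results:
            return threshold, results
    return thresholds[-1], results


# Stand-in for a real run: pretend matches appear only at threshold >= 0.5.
def fake_run(threshold: float) -> list[Any]:
    return [{"rank": 1}] if threshold >= 0.5 else []


print(search_with_fallback(fake_run))  # (0.5, [{'rank': 1}])
```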

Found a bug or want a feature? Open an issue in the Issues tab on this Actor's page. Custom solutions are also available — reach out via the Apify platform.

Resources