Pricing

from $0.001 / actor start

Fuzzy Search Dataset Actor

Search any Apify dataset using typo-tolerant fuzzy matching.

Developer

Tin

Last modified

15 days ago

What does Fuzzy Search Dataset do?

This Actor loads records from any Apify dataset and runs a fuzzy full-text search across one or more fields. It handles typos, partial matches, and word-order variations automatically. For example, searching "iphon pro mx" can still return results for "iPhone 15 Pro Max".

It is well suited to post-processing scraped data: after collecting a large dataset with another Actor, run this one to add a search layer on top of it without any external infrastructure.

Why use Fuzzy Search Dataset?

- No search engine needed: works directly on any Apify dataset without Elasticsearch, Algolia, or similar tools
- Typo-tolerant: handles misspellings, abbreviations, and partial queries out of the box
- Multi-field search: search across title, description, brand, nested fields like product.name, or any combination
- Tunable relevance: control strictness, field weights, and minimum match length to fit your data
- Automation-ready: trigger via API, schedule it, or chain it after a scraping Actor in a workflow

How to use Fuzzy Search Dataset

1. Run a scraping Actor to build the source dataset (or use any existing dataset you already have in Apify Console).
2. Copy the dataset ID from the dataset URL or the Storage section of Apify Console.
3. Open this Actor and paste the dataset ID into the Dataset ID field.
4. Enter your search query and choose which fields to search.
5. Run the Actor. Results are written to its output dataset, ranked by relevance score.

Input

Configure the Actor in the Input tab or pass a JSON object via the API.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `datasetId` | string | yes | none | ID of the Apify dataset to search |
| `query` | string | yes | none | Text to search for |
| `fields` | array of strings | no | `["title"]` | Dataset fields to search (supports dot notation for nested fields) |
| `limit` | integer | no | `20` | Maximum number of results to return (1–1000) |
| `threshold` | number | no | `0.35` | Fuzzy strictness: `0.0` = exact only, `1.0` = match anything |
| `ignoreLocation` | boolean | no | `true` | Allow matches anywhere in the text, not just at the start |
| `minMatchCharLength` | integer | no | `2` | Minimum characters in a token before it counts as a match |
| `includeScore` | boolean | no | `true` | Attach a relevance score to each result (lower = better match) |
| `includeMatches` | boolean | no | `false` | Include matched text ranges, useful for keyword highlighting |
| `extendedSearch` | boolean | no | `false` | Enable advanced query syntax (`^starts-with`, `!exclude`, `=exact`) |
| `weights` | object | no | none | Per-field importance weights, e.g. `{"title": 0.7, "description": 0.3}` |
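The dot notation accepted by `fields` can be pictured as a path lookup through nested objects. The following is a minimal Python sketch of that idea, not the Actor's actual implementation: `get_by_path` is a hypothetical helper name, and returning `None` for a missing path is an assumption made for illustration.

```python
from typing import Any, Optional


def get_by_path(record: dict, path: str) -> Optional[Any]:
    """Resolve a dot-separated path such as "product.name" in a nested dict.

    Returns None when any segment is missing (an assumption of this sketch).
    """
    current: Any = record
    for segment in path.split("."):
        if not isinstance(current, dict) or segment not in current:
            return None
        current = current[segment]
    return current


# Hypothetical record with one nested field.
record = {"title": "Phone", "product": {"name": "iPhone 15 Pro Max"}}

print(get_by_path(record, "product.name"))
print(get_by_path(record, "product.sku"))
```

With a lookup like this, `"fields": ["title", "product.name"]` would search both the top-level title and the nested product name.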

Example input:

{
    "datasetId": "UoYaa1QjGdgdJrSHA",
    "query": "iphon pro max",
    "fields": ["title", "description"],
    "limit": 10,
    "threshold": 0.35,
    "weights": {
        "title": 0.8,
        "description": 0.2
    }
}
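To pass the same input via the API, one option is Apify's synchronous run endpoint, which starts an Actor and returns the items of its default dataset. The sketch below uses only the Python standard library and builds the request without sending it. The Actor ID `username~actor-name` and the token are placeholders, and `build_run_request` is a hypothetical helper, so treat this as an outline rather than a reference client.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(token: str, actor_id: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs an Actor synchronously
    and returns the items of its default dataset."""
    for key in ("datasetId", "query"):  # the two required input fields
        if not run_input.get(key):
            raise ValueError(f"missing required input field: {key}")
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


run_input = {
    "datasetId": "UoYaa1QjGdgdJrSHA",
    "query": "iphon pro max",
    "fields": ["title", "description"],
    "limit": 10,
    "threshold": 0.35,
    "weights": {"title": 0.8, "description": 0.2},
}

# Placeholders: substitute your own API token and this Actor's real ID.
request = build_run_request("YOUR_APIFY_TOKEN", "username~actor-name", run_input)
print(request.get_method(), request.full_url)

# To actually start the run (needs a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```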

Output

Results are pushed to the Actor's default dataset as a single object containing the query, total result count, and an array of ranked matches.

Example output:

{
    "query": "iphon pro max",
    "totalResults": 2,
    "results": [
        {
            "rank": 1,
            "score": 0.04,
            "item": {
                "title": "Apple iPhone 15 Pro Max",
                "description": "6.7-inch Super Retina XDR display, A17 Pro chip",
                "brand": "Apple",
                "price": 1199
            }
        },
        {
            "rank": 2,
            "score": 0.18,
            "item": {
                "title": "iPhone 14 Pro Max",
                "description": "48MP main camera, Dynamic Island",
                "brand": "Apple",
                "price": 999
            }
        }
    ]
}

You can download results in JSON, CSV, Excel, or HTML from the dataset tab in Apify Console or via the Apify API.

Output data fields

| Field | Format | Description |
| --- | --- | --- |
| `query` | text | The search query that was used |
| `totalResults` | number | Number of results returned |
| `results[].rank` | number | 1-based position in results (1 = best match) |
| `results[].score` | number | Relevance score (0.0 = perfect match, 1.0 = no match) |
| `results[].item` | object | Full original record from the source dataset |
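Once the output object has been downloaded, the fields above can be consumed with a few lines of code. The sketch below is one possible way to do that in Python. It assumes `includeScore` was left enabled so every match carries a `score`, the sample object is abbreviated from the example in this README, and `iter_matches` is a hypothetical helper name.

```python
from typing import Iterator, Tuple

# Sample output object, abbreviated from the example in this README.
output = {
    "query": "iphon pro max",
    "totalResults": 2,
    "results": [
        {"rank": 1, "score": 0.04, "item": {"title": "Apple iPhone 15 Pro Max", "price": 1199}},
        {"rank": 2, "score": 0.18, "item": {"title": "iPhone 14 Pro Max", "price": 999}},
    ],
}


def iter_matches(output: dict, max_score: float = 1.0) -> Iterator[Tuple[int, float, dict]]:
    """Yield (rank, score, item) for matches whose score is at most max_score.

    Lower scores are better matches, so a smaller max_score keeps fewer results.
    """
    for match in output.get("results", []):
        if match["score"] <= max_score:
            yield match["rank"], match["score"], match["item"]


# Keep only very close matches and print their titles.
for rank, score, item in iter_matches(output, max_score=0.1):
    print(rank, score, item["title"])
```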

Tips and advanced options

Tuning the threshold:

- 0.2: strict, good for product codes and exact names
- 0.35: balanced (default), works well for product titles and descriptions
- 0.6: loose, useful for free-text fields or short queries
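To build intuition for how a threshold acts as a cutoff on a 0.0 to 1.0 distance, here is a small Python sketch. It uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure. This is not the Actor's scoring algorithm, and the numbers it produces will differ from the Actor's scores; only the direction of the effect is the same (a higher threshold lets more candidates through).

```python
from difflib import SequenceMatcher
from typing import List


def distance(query: str, text: str) -> float:
    """Return 0.0 for identical strings and values nearer 1.0 as they diverge."""
    return 1.0 - SequenceMatcher(None, query.lower(), text.lower()).ratio()


def filter_by_threshold(query: str, candidates: List[str], threshold: float) -> List[str]:
    """Keep candidates whose distance is at most threshold, best match first."""
    scored = sorted((distance(query, candidate), candidate) for candidate in candidates)
    return [candidate for score, candidate in scored if score <= threshold]


candidates = ["iphone", "iphon", "samsung"]

for threshold in (0.0, 0.35, 1.0):
    print(threshold, filter_by_threshold("iphone", candidates, threshold))
```

At `0.0` only the identical string survives, at `0.35` the one-letter typo is also accepted, and at `1.0` everything is returned.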

Multi-field search with weights:

To boost title matches above description matches, set weights to {"title": 0.8, "description": 0.2}. Weights must sum to 1.0 across all searched fields.
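Because the weights are expected to sum to 1.0, it can help to check or normalize them before starting a run. The Python sketch below shows one way to do both. `check_weights` and `normalize_weights` are hypothetical helpers, and requiring a weight for every searched field is an assumption of this sketch rather than documented Actor behavior.

```python
import math
from typing import Dict, List


def check_weights(fields: List[str], weights: Dict[str, float]) -> None:
    """Raise ValueError unless every searched field has a weight and they sum to 1.0."""
    missing = [field for field in fields if field not in weights]
    if missing:
        raise ValueError(f"no weight given for: {missing}")
    total = sum(weights[field] for field in fields)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total}, expected 1.0")


def normalize_weights(raw: Dict[str, float]) -> Dict[str, float]:
    """Scale arbitrary positive importances so that they sum to 1.0."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {field: value / total for field, value in raw.items()}


# Title is four times as important as description.
weights = normalize_weights({"title": 4, "description": 1})
check_weights(["title", "description"], weights)
print(weights)
```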

Advanced query syntax (when extendedSearch is enabled):

| Syntax | Meaning | Example |
| --- | --- | --- |
| `=iphone` | Exact match | `=iPhone 15` |
| `^apple` | Starts with | `^Apple` |
| `!samsung` | Exclude | `!Samsung` |
| `'pro` | Includes token | `'pro` |
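The prefixes in the table can be read as small operators attached to each query token. The Python sketch below classifies the tokens of a query by prefix, purely to illustrate how the syntax reads. It assumes tokens are separated by whitespace and that unprefixed tokens are matched fuzzily; both are assumptions of this sketch, and `classify_tokens` is a hypothetical helper, not part of the Actor.

```python
from typing import List, Tuple

# Prefix characters taken from the syntax table in this README.
OPERATORS = {
    "=": "exact",
    "^": "starts_with",
    "!": "exclude",
    "'": "include",
}


def classify_tokens(query: str) -> List[Tuple[str, str]]:
    """Split a query on whitespace and label each token by its prefix.

    Tokens without a known prefix are labeled "fuzzy" (an assumption).
    """
    labeled = []
    for token in query.split():
        kind = OPERATORS.get(token[0])
        if kind and len(token) > 1:
            labeled.append((kind, token[1:]))
        else:
            labeled.append(("fuzzy", token))
    return labeled


print(classify_tokens("^Apple 'pro !Samsung max"))
```

Read this way, the query `^Apple 'pro !Samsung max` asks for records that start with "Apple", include "pro", do not match "Samsung", and fuzzily match "max".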

Performance: The Actor loads the entire dataset into memory. For datasets over 100k records, consider filtering the source dataset first or increasing the Actor's memory allocation.

Pricing / Cost estimation

This Actor processes data in-memory without using a browser or proxy, so compute costs are low. A typical run over 10,000 records completes in under 30 seconds. Apify provides a free tier sufficient for many use cases.

FAQ and support

Is this legal? This Actor only reads from datasets you own or have access to on the Apify platform. It does not scrape any external websites.

Can I use this with datasets from other Actors? Yes — as long as you have access to the dataset ID, this Actor can read it.

The results are empty or not what I expected. Try raising the threshold value (e.g. to 0.5), checking that ignoreLocation is still enabled (it is on by default), or adding more fields to the fields array.

Found a bug or want a feature? Open an issue in the Issues tab on this Actor's page. Custom solutions are also available — reach out via the Apify platform.
