Pricing

from $0.01 / 1,000 results

Content Similarity Finder

Find duplicate and similar content with advanced fuzzy matching algorithms. Perfect for data cleaning and deduplication.

Developer

Cody Churchwell

Last modified

6 months ago

Content Similarity & Duplicate Finder

🎯 What It Does

Content Similarity Finder detects duplicate and near-duplicate content using multiple similarity algorithms: cosine similarity, Levenshtein distance, fuzzy matching, and Jaccard similarity.

✨ Key Features

Multiple Algorithms: Cosine, Levenshtein, Fuzzy, Jaccard
Configurable Threshold: Set the minimum similarity from 0 to 1 (for example, 0.8 = 80%)
Smart Normalization: Case-insensitive comparison and whitespace normalization, both configurable
Duplicate Grouping: Cluster similar items together
Fast Processing: Optimized for large datasets

🚀 Quick Start

{
  "content": [
    {"id": "1", "text": "The quick brown fox jumps"},
    {"id": "2", "text": "A quick brown fox jumps"},
    {"id": "3", "text": "Completely different text"}
  ],
  "similarityThreshold": 0.8,
  "algorithms": {
    "cosine": true,
    "levenshtein": true,
    "fuzzy": true,
    "jaccard": true
  }
}

📥 Input

content: Array of items with id and text fields
similarityThreshold: Number from 0 to 1; only pairs scoring at or above it are reported (0.8 = at least 80% similar)
algorithms: Enable/disable cosine, levenshtein, fuzzy, jaccard
caseSensitive: Treat case as significant (default: false)
ignoreWhitespace: Normalize whitespace (default: true)
minLength: Skip texts shorter than this
groupByDuplicate: Cluster similar items (default: true)
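
The normalization options above can be pictured with a short Python sketch. It is an illustration only, not the Actor's own code: the helper names `normalize` and `prepare` are hypothetical, `ignoreWhitespace` is assumed to collapse runs of whitespace into single spaces, and `minLength` is assumed to count characters after normalization.

```python
import re

def normalize(text, case_sensitive=False, ignore_whitespace=True):
    """Apply the caseSensitive / ignoreWhitespace options to one text."""
    if not case_sensitive:
        text = text.lower()
    if ignore_whitespace:
        # Assumption: collapse whitespace runs to one space and trim the ends.
        text = re.sub(r"\s+", " ", text).strip()
    return text

def prepare(content, min_length=0, **options):
    """Normalize every item and skip texts shorter than min_length characters."""
    prepared = []
    for item in content:
        text = normalize(item["text"], **options)
        if len(text) >= min_length:
            prepared.append({"id": item["id"], "text": text})
    return prepared

items = [
    {"id": "1", "text": "The  Quick\tBrown Fox "},
    {"id": "2", "text": "ok"},
]
print(prepare(items, min_length=5))
# prints [{'id': '1', 'text': 'the quick brown fox'}]
```

Normalizing before comparison means that differences in letter case or spacing alone do not lower the similarity score.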

📤 Output

Similarity Matches

{
  "item1": "1",
  "item2": "2",
  "text1": "The quick brown fox",
  "text2": "A quick brown fox",
  "similarity": 0.89,
  "algorithm": "cosine"
}
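
To show how records of this shape could be produced, here is a minimal Python sketch that compares every pair once and keeps the pairs at or above the threshold. The function `find_matches` is hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for the "fuzzy" algorithm; the Actor's real scoring may differ, so the numbers will not necessarily match its output.

```python
from difflib import SequenceMatcher
from itertools import combinations

def fuzzy_ratio(a, b):
    """Stand-in scorer: difflib's similarity ratio in the range 0 to 1."""
    return SequenceMatcher(None, a, b).ratio()

def find_matches(content, threshold, scorers):
    """Score each unordered pair with each scorer; keep scores >= threshold."""
    matches = []
    for first, second in combinations(content, 2):
        for name, scorer in scorers.items():
            score = scorer(first["text"], second["text"])
            if score >= threshold:
                matches.append({
                    "item1": first["id"],
                    "item2": second["id"],
                    "text1": first["text"],
                    "text2": second["text"],
                    "similarity": round(score, 2),
                    "algorithm": name,
                })
    return matches

content = [
    {"id": "1", "text": "The quick brown fox jumps"},
    {"id": "2", "text": "A quick brown fox jumps"},
    {"id": "3", "text": "Completely different text"},
]
matches = find_matches(content, threshold=0.8, scorers={"fuzzy": fuzzy_ratio})
for match in matches:
    print(match["item1"], match["item2"], match["similarity"], match["algorithm"])
```

With the Quick Start data, only items 1 and 2 clear the 0.8 threshold under this stand-in scorer. Comparing all pairs is quadratic in the number of items, which is worth keeping in mind for large inputs.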

Duplicate Groups (if groupByDuplicate: true)

{
  "totalGroups": 1,
  "groups": [
    {
      "groupId": "group_1",
      "members": ["1", "2"],
      "size": 2
    }
  ]
}
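
One plausible way to build such groups from the match records is a union-find pass, sketched below in Python. This is an assumption about the approach, not a description of the Actor's internals: `group_duplicates` is a hypothetical name, and grouping is taken to be transitive (if 1 matches 2 and 2 matches 3, all three share a group).

```python
def group_duplicates(matches):
    """Cluster item ids that are linked, directly or indirectly, by a match."""
    parent = {}

    def find(item):
        # Follow parent links to the group representative (with path halving).
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for match in matches:
        root1, root2 = find(match["item1"]), find(match["item2"])
        if root1 != root2:
            parent[root2] = root1  # merge the two groups

    clusters = {}
    for item in list(parent):
        clusters.setdefault(find(item), []).append(item)

    groups = [
        {"groupId": f"group_{index}", "members": sorted(members), "size": len(members)}
        for index, members in enumerate(clusters.values(), start=1)
    ]
    return {"totalGroups": len(groups), "groups": groups}

example_matches = [
    {"item1": "1", "item2": "2"},
    {"item1": "2", "item2": "3"},
    {"item1": "4", "item2": "5"},
]
print(group_duplicates(example_matches))
```

Transitive grouping can chain loosely related items together, so a stricter threshold may be preferable when groups are used for automatic merging.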

🛠 Use Cases

Data Deduplication: Remove duplicate entries from databases
Plagiarism Detection: Find copied content
Content Moderation: Detect spam or repeated messages
SEO Analysis: Find duplicate website content
Data Cleaning: Merge similar records

📊 Algorithms

Cosine Similarity: Best for overall topical similarity, especially in longer texts (TF-IDF based, so it compares weighted term vectors rather than meaning)
Levenshtein Distance: Best for typos, minor edits
Fuzzy Matching: Best for approximate string matching
Jaccard Similarity: Best for word overlap comparison
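
For readers who want to see what these four measures compute, the following Python sketch gives a textbook version of each. All function names are hypothetical and the details are assumptions: the cosine variant uses raw term counts without IDF weighting, Levenshtein distance is converted to a 0 to 1 score by dividing by the longer length, and `difflib.SequenceMatcher` stands in for fuzzy matching. The Actor's own implementations may differ.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

def cosine_similarity(a, b):
    """Cosine of the angle between term-count vectors (no IDF weighting here)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[term] * vb[term] for term in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(count * count for count in va.values()))
    norm_b = math.sqrt(sum(count * count for count in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b)  # substitution
            ))
        previous = current
    return previous[-1]

def levenshtein_similarity(a, b):
    """Scale the edit distance into 0..1 by the longer string's length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein_distance(a, b) / longest

def fuzzy_ratio(a, b):
    """Approximate string match via difflib's matching-blocks ratio."""
    return SequenceMatcher(None, a, b).ratio()

def jaccard_similarity(a, b):
    """Shared words divided by all distinct words across both texts."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

print(levenshtein_distance("kitten", "sitting"))                 # prints 3
print(jaccard_similarity("quick brown fox", "quick brown dog"))  # prints 0.5
```

The character-level measures (Levenshtein, fuzzy) react to typos and small edits, while the word-level measures (cosine, Jaccard) ignore word order and respond to shared vocabulary.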

📄 License

MIT License

Clean data, better insights 🔍

CRM Deduplication Tool

enosgb/crm-deduplication-tool

Detects and merges duplicate contacts in CRM databases using advanced fuzzy matching algorithms

Enos Melo

Duplicate Content Checker

automation-lab/duplicate-content-checker

This actor compares the text content of two or more web pages to detect duplicate or near-duplicate content. It uses w-shingling (5-word n-grams) with Jaccard similarity to calculate the percentage of shared content between every pair of URLs. Pages with 90%+ similarity are flagged as...

Stas Persiianenko

Advanced Product Matcher Pro

datawhisperers/advanced-product-matcher-pro

A powerful AI Apify Actor that intelligently matches products between two datasets using advanced machine learning algorithms and configurable similarity scoring. Perfect for e-commerce catalog matching, product deduplication, and inventory reconciliation.

Whisperers

5.0

SEO Duplicate Content Detector

gr_59017/seo-duplicate-content-detector

Detects duplicate or identical content across multiple webpages by analyzing visible page text. Helps identify SEO duplicate content issues, content reuse, and potential ranking risks using simple content comparison and scoring.

Gautam Rana

Fuzzy Search Dataset Actor

dtrungtin/fuzzy-search-dataset-actor

Search any Apify dataset using typo-tolerant fuzzy matching.

Tin

Similar Finder

tomba-io/similar-finder

Find similar domains based on a specific domain using the Tomba API.

Tomba io

E-commerce Product Matching Tool

tri_angle/e-commerce-product-matching-tool

Quickly find and rank matching products from two sources using intelligent similarity search. This actor works with pre-built product data to identify the best matches. Use it after uploading your dataset to the vector database with the Product Matching Vectorizer.

Tri⟁angle

Color Palette Fashion Finder

wild_yapok/color-palette-fashion-finder

Find clothing items that match your color palette from top fashion retailers. Specify colors by name or hex codes, and this Actor will search Zara, H&M, ASOS, and Shein for matching products using advanced color similarity algorithms.

Dominik Hajczuk

126

HubSpot Company Enrichment & Fuzzy Matcher for Clay

alizarin_refrigerator-owner/hubspot-company-enrichment-fuzzy-matcher-for-clay

Fuzzy match and enrich companies against your HubSpot CRM using multi-signal matching (domain, company name, phone, location). Returns HubSpot ID, lifecycle stage, deal status & confidence scores. Perfect for Clay workflows, lead deduplication, and outbound enrichment.