Pricing

Pay per usage

SEO Duplicate Content Detector

Detects duplicate or near-identical content across multiple webpages by analyzing visible page text. Helps identify SEO duplicate-content issues, content reuse, and potential ranking risks through simple content comparison and scoring.

Developer

Gautam Rana

Description

SEO Duplicate Content Detector analyzes the visible textual content of multiple webpages and compares them to identify duplicate or highly similar content. It helps uncover SEO issues related to content reuse, near-duplicate pages, and potential ranking risks caused by content redundancy.

The API performs basic text normalization, hashing, and similarity comparison to determine whether pages share identical or substantially similar content, and reports duplication metrics in a structured dataset.
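The normalize, hash, and compare pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: the normalization rules, the choice of SHA-1 (the 40-character sample `contentHash` shown later has the length of a SHA-1 digest), and the use of `difflib.SequenceMatcher` as the similarity metric are all assumptions, and the function names are hypothetical.

```python
import hashlib
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse whitespace runs and lowercase (assumed normalization rules)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def content_hash(text: str) -> str:
    """SHA-1 hex digest of the normalized text (hash algorithm is an assumption)."""
    return hashlib.sha1(normalize(text).encode("utf-8")).hexdigest()


def duplication_score(a: str, b: str) -> int:
    """Similarity of two texts as a 0-100 percentage (stand-in metric)."""
    if content_hash(a) == content_hash(b):
        return 100  # identical after normalization
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return round(ratio * 100)


page_a = "Welcome to  Example.\nWe sell widgets."
page_b = "welcome to example. we sell widgets."
page_c = "A completely different page about gardening."

print(content_hash(page_a) == content_hash(page_b))  # exact duplicate after normalization
print(duplication_score(page_a, page_b))
print(duplication_score(page_a, page_c))
```

Comparing hashes first gives a cheap check for exact duplicates, so the slower character-level comparison only runs for pages whose normalized text differs.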

This API is useful for:

SEO audits and website quality checks
Identifying content reuse across pages
Detecting thin or duplicated pages
Developers building SEO monitoring tools
Automation workflows for large-scale content analysis

Features

Extracts visible text content from webpages
Generates a content hash for comparison
Detects exact and near-duplicate pages
Identifies which URLs share duplicated content
Calculates content length
Computes a duplication score (percentage)
Returns structured JSON output
Supports multiple URLs per request
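As an illustration of the first feature, visible text can be extracted with Python's standard-library `html.parser`. This is only a sketch under assumptions: the set of skipped tags and the class and function names are hypothetical, and the Actor may use a different parser or different rules.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping elements that browsers do not render."""

    # Assumed set of non-visible containers; the Actor's own list is unknown.
    HIDDEN_TAGS = {"script", "style", "noscript", "template", "head"}

    def __init__(self) -> None:
        super().__init__()
        self._hidden_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every hidden element.
        words = data.split()
        if self._hidden_depth == 0 and words:
            self._chunks.append(" ".join(words))

    def text(self) -> str:
        return " ".join(self._chunks)


def visible_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = """
<html><head><title>Demo</title><style>p { color: red; }</style></head>
<body><h1>Hello</h1><script>var x = 1;</script><p>Visible   text</p></body></html>
"""
text = visible_text(sample)
print(text)       # Hello Visible text
print(len(text))  # 18
```

The length of the extracted string is one plausible reading of the content length feature, although the unit the Actor reports is not stated here.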

Tech Stack

Platform: Apify Actor
Language: (your implementation language)
HTTP Client / Crawler: (e.g., Crawlee, Axios, Requests)
HTML Parsing: (e.g., Cheerio, BeautifulSoup)
Similarity Logic: Hashing / basic text comparison

Input Format

The API accepts a JSON input with a list of webpage URLs to analyze.

Example `input.json`

{
  "startUrls": [
    { "url": "https://github.com/Gautamrana14" },
    { "url": "https://github.com/Gautamrana14?tab=repositories" },
    { "url": "https://github.com/Gautamrana14?tab=overview" }
  ]
}
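When the URL list comes from another system, the input object can be built programmatically. The helper below is a hypothetical sketch: `build_input` is not part of the Actor, and the rules it enforces (absolute HTTP(S) URLs, at least two distinct URLs) are assumptions about what a comparison run reasonably needs.

```python
import json
from urllib.parse import urlsplit


def build_input(urls):
    """Build the input dict from raw URL strings, dropping exact repeats."""
    seen = set()
    start_urls = []
    for raw in urls:
        url = raw.strip()
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"Not an absolute HTTP(S) URL: {raw!r}")
        if url not in seen:
            seen.add(url)
            start_urls.append({"url": url})
    # Assumption: comparing pages only makes sense with two or more URLs.
    if len(start_urls) < 2:
        raise ValueError("Comparison needs at least two distinct URLs")
    return {"startUrls": start_urls}


payload = build_input([
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page1",  # repeated entry is dropped
])
print(json.dumps(payload, indent=2))
```

The resulting dictionary has the same shape as the example input and can be written to `input.json` or sent directly as the request body.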

Usage

Base Endpoint (Apify Actor)

https://api.apify.com/v2/acts/<your-actor-id>/run-sync-get-dataset-items

Example Request (cURL)

curl -X POST \
  -H "Content-Type: application/json" \
  -d @input.json \
  "https://api.apify.com/v2/acts/<your-actor-id>/run-sync-get-dataset-items?token=YOUR_API_TOKEN"

Example using JavaScript

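// Run as an ES module (top-level await). Node.js 18+ provides a global fetch,
// so this import is only needed on older versions.
import fetch from "node-fetch";

const response = await fetch(
  "https://api.apify.com/v2/acts/<your-actor-id>/run-sync-get-dataset-items?token=YOUR_API_TOKEN",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      startUrls: [
        { url: "https://example.com/page1" },
        { url: "https://example.com/page2" }
      ]
    })
  }
);

// Fail loudly on HTTP errors instead of parsing an error body as results.
if (!response.ok) {
  throw new Error(`Actor run failed with HTTP ${response.status}`);
}

const data = await response.json();
console.log(data);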

Example using Python

import requests

url = "https://api.apify.com/v2/acts/<your-actor-id>/run-sync-get-dataset-items"
params = {"token": "YOUR_API_TOKEN"}

payload = {
    "startUrls": [
        {"url": "https://example.com/page1"},
        {"url": "https://example.com/page2"}
    ]
}

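# The synchronous endpoint waits for the run to finish, so allow a generous
# timeout (the value here is an arbitrary choice; adjust it to your runs).
response = requests.post(url, params=params, json=payload, timeout=300)
response.raise_for_status()  # raise on HTTP errors instead of printing an error body
print(response.json())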

Output Format

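The Actor's output contains the URL of the dataset that holds the analysis results. When the Actor is called through the run-sync-get-dataset-items endpoint shown above, the response body contains the dataset items directly.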

Output Schema

{
  "dataset": "https://api.apify.com/v2/datasets/xxxx/items"
}

Dataset Schema

Each dataset item has the following structure:

{
  "url": "https://example.com/page1",
  "contentHash": "a94a8fe5ccb19ba61c4c0873d391e987982fbbd3",
  "isDuplicate": true,
  "duplicateWith": [
    "https://example.com/page2"
  ],
  "contentLength": 3450,
  "duplicationScore": 92
}

Field Description

| Field | Description |
| --- | --- |
| url | Website URL |
| contentHash | Hash generated from visible text content |
| isDuplicate | Indicates whether duplicate content was detected |
| duplicateWith | List of URLs with matching or similar content |
| contentLength | Length of extracted text content |
| duplicationScore | Similarity score in percentage (0–100) |
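Once the items are downloaded, the fields described above can be combined into groups of mutually duplicated pages. The sketch below is one possible post-processing step, not something the Actor does itself: `duplicate_groups` and its `min_score` threshold are hypothetical, and the sample items are made up to match the schema.

```python
from collections import defaultdict


def duplicate_groups(items, min_score=90):
    """Group URLs linked through `duplicateWith` (connected components)."""
    graph = defaultdict(set)
    for item in items:
        if not item.get("isDuplicate") or item.get("duplicationScore", 0) < min_score:
            continue
        for other in item.get("duplicateWith", []):
            graph[item["url"]].add(other)
            graph[other].add(item["url"])

    groups, seen = [], set()
    for start in list(graph):
        if start in seen:
            continue
        # Depth-first walk over everything reachable from this URL.
        stack, group = [start], set()
        while stack:
            url = stack.pop()
            if url in group:
                continue
            group.add(url)
            stack.extend(graph.get(url, set()) - group)
        seen |= group
        groups.append(sorted(group))
    return groups


items = [
    {"url": "https://example.com/page1", "isDuplicate": True,
     "duplicateWith": ["https://example.com/page2"], "duplicationScore": 92},
    {"url": "https://example.com/page2", "isDuplicate": True,
     "duplicateWith": ["https://example.com/page1"], "duplicationScore": 92},
    {"url": "https://example.com/page3", "isDuplicate": False,
     "duplicateWith": [], "duplicationScore": 10},
]
print(duplicate_groups(items))
```

Each group is a candidate set for consolidation, for example by choosing one canonical URL per group during an SEO audit.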

Limitations

Only analyzes visible text content
Does not detect plagiarism or semantic similarity
Pages that rely on dynamic or JavaScript-rendered content may be analyzed less accurately
Not a replacement for full SEO audit tools

Rate Limits

Rate limits depend on your Apify plan and Actor configuration.

Roadmap

More robust near-duplicate detection using dedicated similarity algorithms
Semantic similarity scoring
Content clustering
Domain-wide crawling support
Export reports in CSV and JSON formats

Contributing

Contributions are welcome.

Fork the repository
Create a feature branch
Commit your changes
Open a pull request

License

MIT License

Author

Gautam Rana
GitHub: https://github.com/Gautamrana14
