Pricing

from $10.00 / 1,000 analysed pages

ChatGPT Detector

ChatGPT Detector analyses web pages and estimates whether visible text is AI-generated, human-written, mixed, or insufficient for review. It provides probability scores, confidence bands, review priority, and explainable signals for editorial QA, moderation, compliance, and SEO audits.

Rating

5.0

(4)

Developer

Sovanza

Last modified

2 months ago

AI Content Detector — ChatGPT, Claude & AI Text Analyzer with Scoring

Detect whether web page content reads as human-written or AI-generated, including writing patterns commonly associated with ChatGPT, Claude, Gemini, Llama, and other LLMs. Instead of a binary label, this actor returns structured probability and confidence scores plus explainable reasons for every analysis, so teams can make defensible decisions at scale.

Built for educators, publishers, content agencies, and compliance teams who need reliable, explainable AI content detection — not a black box.

Overview

The AI Content Detector analyzes one or more URLs, extracts the main readable text, and estimates whether the content is:

likely_ai
likely_human
mixed
insufficient_text

For every page, it outputs:

AI probability / human probability
A confidence score and confidence band
A reviewPriority label for triage
A reasons breakdown (topReasons, warningFlags, and signals)

This is designed for real-world content: messy pages, mixed edits, SEO templates, and editorial posts — not just clean demo text.

Important disclaimer

AI detection is probabilistic, not definitive. This actor provides risk scoring and review guidance, not proof of authorship. Use it as a screening layer that routes borderline cases to human review.

Key benefits

Save hours of manual review with automated AI detection across large volumes of pages
Make defensible decisions with scored, reasoned output — not opaque labels
Scale screening across many URLs or internal crawls with repeatable runs
Improve auditability with structured signal breakdowns and exportable reports
Reduce reputational and compliance risk by flagging AI-like pages before they go live

Features

AI vs human-like classification for extracted web page text
Explainable scoring: probabilities, confidence, and top reasons
Detection modes: heuristic, model, or hybrid
Crawl support: optionally follow internal links with caps on depth and pages per domain
Content extraction pipeline: semantic containers + readability + fallbacks to reduce boilerplate noise
Long-form template signals tuned for SEO-style patterns: heading repetition, section similarity, CTA/FAQ templates
Batch processing: multiple start URLs in one run
Structured output for dashboards and downstream automation

Export formats

Results are written to the Apify dataset and can be exported as JSON, CSV, Excel, and (where supported by Apify export options) XML.

Use cases (high-value)

Academic integrity: Screen student submissions hosted online (or LMS-exported pages) for AI-like authorship patterns with scored evidence.
Publishing & editorial: Audit freelance or user-submitted articles before publication to enforce authenticity policies.
Content agency QA: Verify vendors are delivering human-written work, or flag drafts for additional review.
SEO & compliance: Identify templated, AI-like pages in a publishing pipeline before indexing.
HR & recruitment: Screen public application pages/portfolios for AI-like writing patterns (use responsibly).
Legal & contract review: Flag AI-drafted public disclosures/communications that warrant additional verification.

Why this tool stands out

Most detectors return “AI” vs “Human” with no explanation. This actor provides:

A confidence score and confidence band
A reviewPriority label to operationalize triage
A structured signals object (so you can audit and tune thresholds)
Clear topReasons so reviewers understand what triggered the verdict

It’s built to support defensible workflows — not just labels.

How to use on Apify

Using the Actor

Open the Actor on Apify and go to the Input tab.
Add startUrls and set crawl/detection options.
Start the run.
Open the Dataset tab to inspect scores and explanations.
Export JSON/CSV/Excel or pull via API for automation.

Input Configuration

Full schema: INPUT_SCHEMA.json. Example:

{
  "startUrls": [
    { "url": "https://blog.apify.com/" },
    { "url": "https://en.wikipedia.org/wiki/Natural_language_processing" },
    { "url": "https://openai.com/research/" }
  ],
  "crawlLinkedPages": false,
  "maxPagesPerDomain": 1,
  "maxDepth": 0,
  "includeSubdomains": false,
  "detectionMode": "hybrid",
  "languageHint": "en",
  "minTextLength": 300,
  "includeRawText": false,
  "includeHtmlMetadata": true,
  "maxConcurrency": 5,
  "requestTimeoutSecs": 60,
  "saveMarkdown": false,
  "blockAssets": true,
  "includeDebugFields": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

startUrls (required): URL list to analyze
crawlLinkedPages: follow internal links
maxDepth, maxPagesPerDomain, includeSubdomains: crawl controls
detectionMode: heuristic, model, hybrid
languageHint: optional language hint
minTextLength: minimum cleaned text length for analysis
includeRawText, saveMarkdown, includeHtmlMetadata: output detail controls
blockAssets, requestTimeoutSecs, maxConcurrency: performance controls
includeDebugFields: include extraction and threshold diagnostics in output
proxyConfiguration, userAgent: access and request customization
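
As a rough sketch of how the input above might be assembled in a script, the helper below builds the same structure from a list of URLs. The function name `build_run_input` and its default values are illustrative assumptions and not part of the Actor; only the field names come from the example input.

```python
import json


def build_run_input(urls, crawl=False, max_depth=0, max_pages_per_domain=1,
                    detection_mode="hybrid"):
    """Build an input dict using the field names from the example input.

    build_run_input is a hypothetical helper, not something the Actor ships.
    """
    if not urls:
        raise ValueError("startUrls is required and must not be empty")
    if detection_mode not in {"heuristic", "model", "hybrid"}:
        raise ValueError(f"unknown detectionMode: {detection_mode!r}")
    return {
        "startUrls": [{"url": u} for u in urls],
        "crawlLinkedPages": crawl,
        "maxDepth": max_depth,
        "maxPagesPerDomain": max_pages_per_domain,
        "detectionMode": detection_mode,
        "minTextLength": 300,
        "proxyConfiguration": {"useApifyProxy": True},
    }


# Example: analyse one start URL and follow internal links one level deep.
run_input = build_run_input(["https://blog.apify.com/"], crawl=True,
                            max_depth=1, max_pages_per_domain=5)
print(json.dumps(run_input, indent=2))
```

Validating `detectionMode` before the run starts is a design choice of this sketch; the Actor's own schema validation may behave differently.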

Output

Results are stored in the Actor’s default dataset.

Each analyzed page can include:

Identity & metadata: inputUrl, finalUrl, domain, statusCode, title, metaDescription, canonicalUrl, language
Text stats: wordCount, paragraphCount, sentenceCount
Scoring: classification, aiProbability, humanProbability, confidence, confidenceBand, reviewPriority
Signals: lexical/structure/repetition/specificity/citation + long-form template signals
Explainability: topReasons, warningFlags, thresholdDecisionReason
Debug fields (optional): extractionSource, rawExtractedLength, cleanedExtractedLength
Optional content fields: rawText, markdown
Meta: timestamp
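
To illustrate how these fields might be consumed downstream, here is a minimal sketch that orders page rows into a review queue. The sample rows are made up, `triage_queue` is a hypothetical helper, and treating topReasons as a list of strings is an assumption about the output shape.

```python
# Illustrative rows: the values are invented, only the field names come from
# the output description.
rows = [
    {"inputUrl": "https://example.com/a", "classification": "likely_human",
     "aiProbability": 0.21, "reviewPriority": "low", "topReasons": []},
    {"inputUrl": "https://example.com/b", "classification": "likely_ai",
     "aiProbability": 0.91, "reviewPriority": "high",
     "topReasons": ["formulaic transitions"]},
    {"inputUrl": "https://example.com/c", "classification": "mixed",
     "aiProbability": 0.62, "reviewPriority": "medium", "topReasons": []},
]

PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def triage_queue(rows):
    """Sort page rows by reviewPriority, then by aiProbability (highest first)."""
    pages = [r for r in rows if "reviewPriority" in r]
    return sorted(
        pages,
        key=lambda r: (
            PRIORITY_ORDER.get(r["reviewPriority"], len(PRIORITY_ORDER)),
            -r.get("aiProbability", 0.0),
        ),
    )


for row in triage_queue(rows):
    reasons = "; ".join(row.get("topReasons", [])) or "no reasons listed"
    print(f'{row["reviewPriority"]:<6} {row["aiProbability"]:.2f} '
          f'{row["inputUrl"]} ({reasons})')
```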

Triage helpers:

confidenceBand
- low for confidence < 0.40
- medium for 0.40 <= confidence < 0.70
- high for confidence >= 0.70
reviewPriority
- high: likely_ai with high confidence
- medium: mixed or uncertain results
- low: likely_human with acceptable confidence
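
The triage rules above can be sketched in a few lines, which may help when reproducing the labels in a dashboard. This is an approximation under stated assumptions, not the Actor's code: "acceptable confidence" is not quantified in this documentation, so `min_human_confidence` is a hypothetical parameter with an assumed default.

```python
def confidence_band(confidence):
    """Map a confidence score in [0, 1] to the documented band."""
    if confidence < 0.40:
        return "low"
    if confidence < 0.70:
        return "medium"
    return "high"


def review_priority(classification, confidence, min_human_confidence=0.40):
    """Approximate the documented reviewPriority rules.

    min_human_confidence is an assumed stand-in for "acceptable confidence".
    """
    if classification == "likely_ai" and confidence_band(confidence) == "high":
        return "high"
    if classification == "likely_human" and confidence >= min_human_confidence:
        return "low"
    # Mixed, insufficient, or uncertain results go to medium priority.
    return "medium"


print(confidence_band(0.55), review_priority("likely_ai", 0.85))
```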

Example summary row:

{
  "type": "__summary__",
  "totalUrls": 3,
  "processedUrls": 3,
  "skippedUrls": 0,
  "likelyAiCount": 1,
  "mixedCount": 1,
  "likelyHumanCount": 1,
  "insufficientTextCount": 0,
  "timestamp": "2026-03-26T10:31:45.000000+00:00"
}
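
When processing exported results, it can help to separate the summary row from page rows and cross-check the counts. The sketch below assumes the summary row is stored in the same dataset as the page rows and is identified by "type": "__summary__" as in the example; `split_summary` and `summary_matches` are hypothetical helper names.

```python
from collections import Counter

SUMMARY_TYPE = "__summary__"


def split_summary(rows):
    """Return (summary_row_or_None, page_rows)."""
    summary = None
    pages = []
    for row in rows:
        if row.get("type") == SUMMARY_TYPE:
            summary = row
        else:
            pages.append(row)
    return summary, pages


def summary_matches(summary, pages):
    """Check that the summary counts agree with the page classifications."""
    counts = Counter(p.get("classification") for p in pages)
    return (
        summary["likelyAiCount"] == counts["likely_ai"]
        and summary["mixedCount"] == counts["mixed"]
        and summary["likelyHumanCount"] == counts["likely_human"]
        and summary["insufficientTextCount"] == counts["insufficient_text"]
    )
```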

Detection methodology (explainability)

The detector combines:

Heuristic signals
- lexical diversity, burstiness proxies, repetition patterns
- generic/explanatory phrasing and templated conclusions
- specificity and citation grounding proxies
- long-form template signals for SEO-style structure detection
Rule-based cues
- formulaic transitions/conclusions
- low source-grounding markers
- repetitive stylistic patterns
Model-style layer (detect_with_model)
- local calibrated scorer over engineered features
- no paid external dependency
- pluggable for future model upgrades
Hybrid scoring
- weighted combination of heuristic and model-style signals
- weights calibrated via benchmark separation diagnostics
- reference/docs signal dampening for selected template cues
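
For intuition only, a weighted combination of two scores can look like the sketch below. The Actor's actual weights, feature set, and dampening logic are not documented here, so `hybrid_ai_probability` and the default `model_weight` of 0.5 are placeholders rather than the calibrated values.

```python
def hybrid_ai_probability(heuristic_score, model_score, model_weight=0.5):
    """Blend two scores in [0, 1] with a convex weight.

    model_weight is an assumed placeholder, not the Actor's calibrated weight.
    """
    if not 0.0 <= model_weight <= 1.0:
        raise ValueError("model_weight must be between 0 and 1")
    blended = (1.0 - model_weight) * heuristic_score + model_weight * model_score
    # Clamp to guard against inputs slightly outside [0, 1].
    return min(1.0, max(0.0, blended))


print(round(hybrid_ai_probability(0.8, 0.6), 2))
```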

Classification thresholds (conservative defaults):

likely_ai if aiProbability >= 0.80 and confidence >= 0.60
likely_human if aiProbability < 0.45 with acceptable confidence
mixed otherwise
insufficient_text when cleaned text is below minTextLength
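
The thresholds above can be expressed as a small decision function. This is a sketch under assumptions: the order of the checks is inferred, text length is measured in characters as described in the FAQ, and `min_human_confidence` is a hypothetical stand-in for "acceptable confidence", which is not quantified.

```python
def classify(ai_probability, confidence, text_length,
             min_text_length=300, min_human_confidence=0.40):
    """Apply the documented conservative thresholds to one page.

    min_human_confidence is an assumed value; the documentation only says
    "acceptable confidence".
    """
    if text_length < min_text_length:
        return "insufficient_text"
    if ai_probability >= 0.80 and confidence >= 0.60:
        return "likely_ai"
    if ai_probability < 0.45 and confidence >= min_human_confidence:
        return "likely_human"
    return "mixed"


print(classify(0.86, 0.72, text_length=2400))
```

Note that a page with aiProbability of 0.90 but confidence of 0.50 falls through to mixed, which is the conservative behaviour the defaults aim for.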

Error Handling

A single failing URL does not fail the whole run.

Handled cases include:

invalid URLs
timeout/network failures
blocked/challenged pages
JS-heavy pages with little readable text
short or insufficient content

Failure rows still include structured fields like inputUrl, classification, error, and timestamp.
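
Because failure rows share the dataset with successful rows, a consumer may want to separate them before scoring. The sketch below assumes a failure row is any row with a non-empty error field; `split_failures` is a hypothetical helper, and the classification value shown on the sample failure row is a guess.

```python
def split_failures(rows):
    """Return (ok_rows, failed_rows) based on a non-empty 'error' field."""
    ok, failed = [], []
    for row in rows:
        (failed if row.get("error") else ok).append(row)
    return ok, failed


# Illustrative rows with invented values.
rows = [
    {"inputUrl": "https://example.com/ok", "classification": "mixed",
     "timestamp": "2026-03-26T10:31:40+00:00"},
    {"inputUrl": "https://example.com/slow",
     "classification": "insufficient_text",  # assumed value for failures
     "error": "timeout", "timestamp": "2026-03-26T10:31:44+00:00"},
]

ok_rows, failed_rows = split_failures(rows)
retry_urls = [row["inputUrl"] for row in failed_rows]
print(retry_urls)
```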

Performance and Anti-Blocking

Parallel processing via maxConcurrency
Capped retry logic per page
Optional heavy asset blocking (blockAssets)
Apify proxy support (proxyConfiguration)
URL deduplication and crawl depth/domain constraints

Benchmarking and Calibration

Reusable benchmark suites in benchmarks/:

benchmark-editorial.json
benchmark-reference.json
benchmark-seo.json
benchmark-thin.json

Run benchmark diagnostics:

python scripts/benchmark_runner.py

Runner output includes:

per-URL calibration rows (classification, aiProbability, confidence, bands, priority)
grouped totals by benchmark family
average template signals per family
delta vs editorial baseline and ranked signal separation diagnostics
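
The runner's output format is not shown here, so the following is only a sketch of what "delta vs editorial baseline" could mean in practice: average a metric per benchmark family and subtract the editorial average. The `family` key, the row shape, and the function name `family_deltas` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def family_deltas(rows, metric="aiProbability", baseline="editorial"):
    """Average a metric per family and report the difference to a baseline.

    The 'family' key is an assumed field name, not confirmed runner output.
    """
    by_family = defaultdict(list)
    for row in rows:
        by_family[row["family"]].append(row[metric])
    averages = {family: mean(values) for family, values in by_family.items()}
    if baseline not in averages:
        raise KeyError(f"no rows for baseline family {baseline!r}")
    base = averages[baseline]
    return {family: avg - base for family, avg in averages.items()}


# Invented calibration rows for illustration.
calibration_rows = [
    {"family": "editorial", "aiProbability": 0.2},
    {"family": "editorial", "aiProbability": 0.4},
    {"family": "seo", "aiProbability": 0.7},
    {"family": "seo", "aiProbability": 0.9},
]
for family, delta in sorted(family_deltas(calibration_rows).items()):
    print(f"{family}: {delta:+.2f}")
```

A larger positive delta for a family suggests the metric separates that family from editorial content more strongly, which is the kind of signal one would look at before reweighting.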

Integrations & API

Run and export through the Apify platform
Retrieve results via Apify API (dataset endpoints)
Integrate with Python/Node workflows, webhooks, schedules, and automation tools
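
As one possible way to pull results into a Python workflow, the sketch below reads dataset items through the Apify API dataset items endpoint using only the standard library. The helper names are hypothetical, and the dataset ID and token must come from your own run and account.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id, fmt="json"):
    """Build the dataset items URL for a given export format (json, csv, ...)."""
    query = urllib.parse.urlencode({"format": fmt})
    return f"{API_BASE}/datasets/{urllib.parse.quote(dataset_id)}/items?{query}"


def fetch_items(dataset_id, token):
    """Download dataset items as parsed JSON (performs a network request)."""
    request = urllib.request.Request(
        dataset_items_url(dataset_id, "json"),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


# "YOUR_DATASET_ID" is a placeholder, not a real dataset.
print(dataset_items_url("YOUR_DATASET_ID", "csv"))
```

The official Apify client libraries wrap the same endpoints and are likely a better fit for production pipelines; the standard-library version is shown to keep the example dependency-free.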

Run Locally

pip install -r requirements.txt
python main.py

Use local Apify storage input (storage/key_value_stores/default/INPUT.json) or platform input.
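
For local runs, the input file can be written by a short script. This sketch assumes the storage path given above is resolved relative to the project directory; `write_local_input` is a hypothetical helper.

```python
import json
from pathlib import Path


def write_local_input(run_input, project_dir="."):
    """Write run_input to storage/key_value_stores/default/INPUT.json."""
    path = (Path(project_dir) / "storage" / "key_value_stores"
            / "default" / "INPUT.json")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(run_input, indent=2), encoding="utf-8")
    return path
```

Calling `write_local_input({"startUrls": [{"url": "https://blog.apify.com/"}]})` from the project root and then running python main.py should pick up that input, assuming the Actor reads the default key-value store as described.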

Run on Apify

Create/upload Actor as chatgpt-detector
Configure input in the Actor UI
Start a run
Read/export dataset results

Why choose this actor?

Conservative by default (false-positive resistant)
Explainable scoring with actionable priority labels
Benchmark-driven calibration loop built in
Production-oriented crawling, output, and failure handling

FAQ

What data does this actor return for each analysis?

For each analyzed URL, the actor returns: classification, aiProbability, humanProbability, confidence, confidenceBand, reviewPriority, plus topReasons, warningFlags, and a structured signals breakdown. Optional fields include rawText and markdown when enabled.

Which AI models can this detector identify?

It does not rely on model “fingerprints”. Instead, it scores statistical and structural writing signals that often correlate with LLM-generated text. This approach generally carries over to ChatGPT, Claude, Gemini, Llama, and other models, but edited or mixed content can reduce certainty.

How does the confidence score work?

Confidence reflects how strongly the extracted signals support the predicted class. Middle-range confidence often indicates mixed or heavily edited text, or pages with limited signal density.

Can I analyze multiple items in a single run?

Yes — provide multiple entries in startUrls. You can also enable crawlLinkedPages to analyze additional internal pages with caps via maxDepth and maxPagesPerDomain.

Is technical experience required?

No — run it from the Apify UI. For pipelines, use the Apify API to automate runs and consume datasets.

How accurate is AI detection?

Accuracy depends on length, language, domain style, and how edited the text is. The actor performs best on longer, unedited or lightly edited text. Use confidenceBand, reviewPriority, and topReasons to guide manual review for borderline cases.

What content lengths are supported?

Any length, but pages below minTextLength (default 300 characters of cleaned text) are labeled insufficient_text. Very short text has weaker signals.

Is this a definitive AI detector?

No. It is a probabilistic detector and review-priority tool.

Why do many pages show mixed?

Conservative thresholds intentionally reduce overconfident labels, especially on reference and editorial content.

Why do some pages return insufficient_text?

The page may be very short, blocked, dynamic, or otherwise not extractable above minTextLength.

Can I tune behavior for my domain?

Yes. Use benchmark files + scripts/benchmark_runner.py and reweight signals/thresholds for your content distribution.

SEO Keywords

chatgpt detector
ai content detector apify
gpt text detection
llm generated text checker
ai writing probability tool
website ai text classifier
content authenticity scoring
ai text risk scoring
editorial ai moderation tool
seo template content detector

Actor Permissions

This Actor is designed to run with limited permissions: read input and write dataset output, with optional proxy/KV usage as configured.

Limitations

Highly dynamic/blocked pages can reduce extraction quality
Mixed human+AI writing remains hard to separate perfectly
Edited AI text can resemble human writing
Short content has limited signal strength
Signal behavior varies by language/domain style

Get Started

Add URLs, run the Actor, inspect confidenceBand + reviewPriority, and iterate with benchmark diagnostics for your domain. 🚀
