Chatgpt Detector

Pricing

from $10.00 / 1,000 analysed pages

ChatGPT Detector analyses web pages and estimates whether visible text is AI-generated, human-written, mixed, or insufficient for review. It provides probability scores, confidence bands, review priority, and explainable signals for editorial QA, moderation, compliance, and SEO audits.

Rating

5.0

(4)

Developer

Sovanza

Maintained by Community

Actor stats

Bookmarked: 0

Total users: 2

Monthly active users: 1

Last modified: 6 days ago

AI Content Detector — ChatGPT, Claude & AI Text Analyzer with Scoring

Detect whether web page content reads as human-written or AI-generated, including writing patterns commonly associated with ChatGPT, Claude, Gemini, Llama, and other LLMs. Instead of a binary label, this actor returns AI and human probabilities, a confidence score, and explainable reasons for every analysis, so teams can make defensible decisions at scale.

Built for educators, publishers, content agencies, and compliance teams who need reliable, explainable AI content detection — not a black box.

Overview

The AI Content Detector analyzes one or more URLs, extracts the main readable text, and estimates whether the content is:

  • likely_ai
  • likely_human
  • mixed
  • insufficient_text

For every page, it outputs:

  • AI probability / human probability
  • A confidence score and confidence band
  • A reviewPriority label for triage
  • A reasons breakdown (topReasons, warningFlags, and signals)

This is designed for real-world content: messy pages, mixed edits, SEO templates, and editorial posts — not just clean demo text.

Important disclaimer

AI detection is probabilistic, not definitive. This actor provides risk scoring and review guidance, not proof of authorship. Use it as a screening layer that routes borderline cases to human review.

Key benefits

  • Save hours of manual review with automated AI detection across large volumes of pages
  • Make defensible decisions with scored, reasoned output — not opaque labels
  • Scale screening across many URLs or internal crawls with repeatable runs
  • Improve auditability with structured signal breakdowns and exportable reports
  • Reduce reputational and compliance risk by flagging AI-like pages before they go live

Features

  • AI vs human-like classification for extracted web page text
  • Explainable scoring: probabilities, confidence, and top reasons
  • Three detection modes: heuristic, model, or hybrid
  • Crawl support: optionally follow internal links with caps on depth and pages per domain
  • Content extraction pipeline: semantic containers, readability extraction, and fallbacks that reduce boilerplate noise
  • Long-form template signals tuned for SEO-style patterns: heading repetition, section similarity, CTA/FAQ templates
  • Batch processing: multiple start URLs in one run
  • Structured output for dashboards and downstream automation

Export formats

Results are written to the Apify dataset and can be exported as JSON, CSV, Excel, and (where supported by Apify export options) XML.

Use cases (high-value)

  • Academic integrity: Screen student submissions hosted online (or LMS-exported pages) for AI-like authorship patterns with scored evidence.
  • Publishing & editorial: Audit freelance or user-submitted articles before publication to enforce authenticity policies.
  • Content agency QA: Verify vendors are delivering human-written work, or flag drafts for additional review.
  • SEO & compliance: Identify templated, AI-like pages in a publishing pipeline before indexing.
  • HR & recruitment: Screen public application pages/portfolios for AI-like writing patterns (use responsibly).
  • Legal & contract review: Flag AI-drafted public disclosures/communications that warrant additional verification.

Why this tool stands out

Most detectors return “AI” vs “Human” with no explanation. This actor provides:

  • A confidence score and confidence band
  • A reviewPriority label to operationalize triage
  • A structured signals object (so you can audit and tune thresholds)
  • Clear topReasons so reviewers understand what triggered the verdict

It’s built to support defensible workflows — not just labels.

How to use on Apify

Using the Actor

  1. Open the Actor on Apify and go to the Input tab.
  2. Add startUrls and set crawl/detection options.
  3. Start the run.
  4. Open the Dataset tab to inspect scores and explanations.
  5. Export JSON/CSV/Excel or pull via API for automation.

Input Configuration

Full schema: INPUT_SCHEMA.json. Example:

{
  "startUrls": [
    { "url": "https://blog.apify.com/" },
    { "url": "https://en.wikipedia.org/wiki/Natural_language_processing" },
    { "url": "https://openai.com/research/" }
  ],
  "crawlLinkedPages": false,
  "maxPagesPerDomain": 1,
  "maxDepth": 0,
  "includeSubdomains": false,
  "detectionMode": "hybrid",
  "languageHint": "en",
  "minTextLength": 300,
  "includeRawText": false,
  "includeHtmlMetadata": true,
  "maxConcurrency": 5,
  "requestTimeoutSecs": 60,
  "saveMarkdown": false,
  "blockAssets": true,
  "includeDebugFields": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Input fields:

  • startUrls (required): URL list to analyze
  • crawlLinkedPages: follow internal links
  • maxDepth, maxPagesPerDomain, includeSubdomains: crawl controls
  • detectionMode: heuristic, model, hybrid
  • languageHint: optional language hint
  • minTextLength: minimum cleaned text length for analysis
  • includeRawText, saveMarkdown, includeHtmlMetadata: output detail controls
  • blockAssets, requestTimeoutSecs, maxConcurrency: performance controls
  • includeDebugFields: include extraction and threshold diagnostics in output
  • proxyConfiguration, userAgent: access and request customization
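Before starting a run, it can help to sanity-check the input locally. The sketch below is a hypothetical helper (validate_input is not part of the Actor). It enforces the constraints stated above (startUrls is required; detectionMode is heuristic, model, or hybrid); the http(s) URL check and the non-negative integer check on numeric fields are added assumptions, since the Actor itself also reports invalid URLs at run time.

```python
from typing import Any

# detectionMode values listed in this README.
DETECTION_MODES = {"heuristic", "model", "hybrid"}
# Numeric fields assumed to require non-negative integers (assumption).
NUMERIC_FIELDS = ("maxPagesPerDomain", "maxDepth", "minTextLength",
                  "maxConcurrency", "requestTimeoutSecs")


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of problems found in run_input; an empty list means OK.

    Hypothetical helper, not part of the Actor.
    """
    problems: list[str] = []
    start_urls = run_input.get("startUrls")
    if not start_urls:
        problems.append("startUrls is required and must not be empty")
    else:
        for i, entry in enumerate(start_urls):
            url = entry.get("url", "") if isinstance(entry, dict) else ""
            if not url.startswith(("http://", "https://")):
                problems.append(f"startUrls[{i}] has no http(s) url")
    mode = run_input.get("detectionMode")
    if mode is not None and mode not in DETECTION_MODES:
        problems.append(f"detectionMode must be one of {sorted(DETECTION_MODES)}")
    for field in NUMERIC_FIELDS:
        value = run_input.get(field)
        if value is not None and (not isinstance(value, int) or value < 0):
            problems.append(f"{field} must be a non-negative integer")
    return problems
```

For example, validate_input({}) returns a single problem about the missing startUrls, while the example input above passes with an empty list.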

Output

Results are stored in the Actor’s default dataset.

Each analyzed page can include:

  • Identity & metadata: inputUrl, finalUrl, domain, statusCode, title, metaDescription, canonicalUrl, language
  • Text stats: wordCount, paragraphCount, sentenceCount
  • Scoring: classification, aiProbability, humanProbability, confidence, confidenceBand, reviewPriority
  • Signals: lexical/structure/repetition/specificity/citation + long-form template signals
  • Explainability: topReasons, warningFlags, thresholdDecisionReason
  • Debug fields (optional): extractionSource, rawExtractedLength, cleanedExtractedLength
  • Optional content fields: rawText, markdown
  • Meta: timestamp

Triage helpers:

  • confidenceBand
    • low: confidence < 0.40
    • medium: 0.40 <= confidence < 0.70
    • high: confidence >= 0.70
  • reviewPriority
    • high: likely_ai with high confidence
    • medium: mixed or uncertain results
    • low: likely_human with acceptable confidence
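The triage rules above can be sketched as two small functions. The band cut-offs follow the listed values; the README does not define "acceptable confidence", so this sketch assumes it means the medium or high band, and it routes insufficient_text to medium. Both functions are illustrative, not the Actor's code.

```python
def confidence_band(confidence: float) -> str:
    """Map a 0-1 confidence score to the bands listed above."""
    if confidence < 0.40:
        return "low"
    if confidence < 0.70:
        return "medium"
    return "high"


def review_priority(classification: str, confidence: float) -> str:
    """Approximate the reviewPriority rules described above.

    Assumptions: "acceptable confidence" for likely_human means the medium
    or high band; every other case (mixed, insufficient_text, low-confidence
    verdicts) is routed to medium.
    """
    band = confidence_band(confidence)
    if classification == "likely_ai" and band == "high":
        return "high"
    if classification == "likely_human" and band != "low":
        return "low"
    return "medium"
```

Using half-open intervals (< 0.70 rather than <= 0.69) avoids a gap for scores such as 0.695.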

Example summary row:

{
  "type": "__summary__",
  "totalUrls": 3,
  "processedUrls": 3,
  "skippedUrls": 0,
  "likelyAiCount": 1,
  "mixedCount": 1,
  "likelyHumanCount": 1,
  "insufficientTextCount": 0,
  "timestamp": "2026-03-26T10:31:45.000000+00:00"
}
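If you post-process exported rows yourself, a summary row of the same shape can be rebuilt from the per-page rows. This is a sketch under assumptions: one row per URL, and a row counts as skipped when it carries an "error" field. The README does not define exactly how processedUrls and skippedUrls are computed, and build_summary is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timezone

# Classification label -> summary field name, as in the example row above.
COUNT_KEYS = {
    "likely_ai": "likelyAiCount",
    "mixed": "mixedCount",
    "likely_human": "likelyHumanCount",
    "insufficient_text": "insufficientTextCount",
}


def build_summary(rows: list[dict]) -> dict:
    """Aggregate per-page rows into a summary row shaped like the example.

    Assumption: a row is "skipped" when it has a non-empty "error" field.
    """
    page_rows = [r for r in rows if r.get("type") != "__summary__"]
    skipped = sum(1 for r in page_rows if r.get("error"))
    counts = Counter(r.get("classification") for r in page_rows if not r.get("error"))
    summary = {
        "type": "__summary__",
        "totalUrls": len(page_rows),
        "processedUrls": len(page_rows) - skipped,
        "skippedUrls": skipped,
    }
    for label, key in COUNT_KEYS.items():
        summary[key] = counts[label]  # Counter returns 0 for missing labels
    summary["timestamp"] = datetime.now(timezone.utc).isoformat()
    return summary
```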

Detection methodology (explainability)

The detector combines:

  1. Heuristic signals

    • lexical diversity, burstiness proxies, repetition patterns
    • generic/explanatory phrasing and templated conclusions
    • specificity and citation grounding proxies
    • long-form template signals for SEO-style structure detection
  2. Rule-based cues

    • formulaic transitions/conclusions
    • low source-grounding markers
    • repetitive stylistic patterns
  3. Model-style layer (detect_with_model)

    • local calibrated scorer over engineered features
    • no paid external dependency
    • pluggable for future model upgrades
  4. Hybrid scoring

    • weighted combination of heuristic and model-style signals
    • weights calibrated via benchmark separation diagnostics
    • dampening of selected template cues on reference and documentation pages
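The README does not publish the calibrated weights or the dampening factor, so the sketch below only illustrates the idea of a weighted heuristic/model combination. heuristic_weight and factor are hypothetical parameters with placeholder defaults, not the Actor's values.

```python
def hybrid_ai_probability(
    heuristic_p: float,
    model_p: float,
    heuristic_weight: float = 0.5,  # placeholder, not the calibrated weight
) -> float:
    """Weighted combination of heuristic and model-style AI probabilities."""
    if not 0.0 <= heuristic_weight <= 1.0:
        raise ValueError("heuristic_weight must be in [0, 1]")
    combined = heuristic_weight * heuristic_p + (1.0 - heuristic_weight) * model_p
    # Clamp so the result stays a valid probability even if inputs drift.
    return min(1.0, max(0.0, combined))


def dampen_template_signal(value: float, is_reference_page: bool, factor: float = 0.5) -> float:
    """Scale down a template cue on reference/docs pages (hypothetical factor)."""
    return value * factor if is_reference_page else value
```

Dampening a cue before it enters the combination lowers its influence on pages such as documentation, where repeated headings and templated sections are normal for human-written text.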

Classification thresholds (conservative defaults):

  • likely_ai if aiProbability >= 0.80 and confidence >= 0.60
  • likely_human if aiProbability < 0.45 with acceptable confidence
  • mixed otherwise
  • insufficient_text when cleaned text is below minTextLength
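The thresholds above translate directly into a small decision function. This is a sketch, not the Actor's code: it assumes the length check runs first, and because "acceptable confidence" is not quantified, human_min_confidence is a hypothetical parameter with a placeholder value.

```python
def classify(
    ai_probability: float,
    confidence: float,
    cleaned_text_length: int,
    min_text_length: int = 300,
    human_min_confidence: float = 0.40,  # assumption for "acceptable confidence"
) -> str:
    """Apply the conservative thresholds listed above."""
    # Too little cleaned text: no verdict is attempted.
    if cleaned_text_length < min_text_length:
        return "insufficient_text"
    if ai_probability >= 0.80 and confidence >= 0.60:
        return "likely_ai"
    if ai_probability < 0.45 and confidence >= human_min_confidence:
        return "likely_human"
    # Everything between the two bands, or with weak confidence, is mixed.
    return "mixed"
```

The wide gap between 0.45 and 0.80 is what makes the defaults conservative: a page needs strong, confident evidence before it is labeled likely_ai.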

Error Handling

A single failing URL does not fail the run.

Handled cases include:

  • invalid URLs
  • timeout/network failures
  • blocked/challenged pages
  • JS-heavy pages with little readable text
  • short or insufficient content

Failure rows still include structured fields like inputUrl, classification, error, and timestamp.
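One way to get this behavior is to wrap each per-URL analysis so that any exception becomes a structured row instead of propagating. In this sketch, analyze is a hypothetical callable standing in for the real extraction and scoring step, and the "error" classification value is an assumption; the README lists the fields but not their values for failures.

```python
from datetime import datetime, timezone
from typing import Callable


def analyze_safely(input_url: str, analyze: Callable[[str], dict]) -> dict:
    """Run one URL through `analyze`; turn any exception into a failure row."""
    try:
        return analyze(input_url)
    except Exception as exc:  # broad on purpose: one bad URL must not stop the run
        return {
            "inputUrl": input_url,
            "classification": "error",  # hypothetical label for failed rows
            "error": f"{type(exc).__name__}: {exc}",
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }


def analyze_all(urls: list[str], analyze: Callable[[str], dict]) -> list[dict]:
    """Produce exactly one row per input URL, successful or not."""
    return [analyze_safely(url, analyze) for url in urls]
```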

Performance and Anti-Blocking

  • Parallel processing via maxConcurrency
  • Capped retry logic per page
  • Optional heavy asset blocking (blockAssets)
  • Apify proxy support (proxyConfiguration)
  • URL deduplication and crawl depth/domain constraints
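The concurrency, retry, and deduplication points above can be sketched with asyncio: a semaphore sized by maxConcurrency bounds in-flight fetches, each page gets a capped number of retries, and duplicate URLs are dropped before fetching. The fetch callable, the retry cap of 2, and the backoff delays are assumptions; the Actor's actual crawler and limits are not documented here.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Union


async def fetch_with_retries(
    url: str,
    fetch: Callable[[str], Awaitable[str]],
    semaphore: asyncio.Semaphore,
    max_retries: int = 2,  # hypothetical cap; the Actor's real cap is not documented
) -> str:
    """Fetch one URL under a shared concurrency limit with capped retries."""
    last_error: Optional[Exception] = None
    for attempt in range(max_retries + 1):
        async with semaphore:  # at most max_concurrency fetches in flight
            try:
                return await fetch(url)
            except Exception as exc:
                last_error = exc
        if attempt < max_retries:
            # Short linear backoff, taken outside the semaphore so other pages proceed.
            await asyncio.sleep(0.01 * (attempt + 1))
    assert last_error is not None
    raise last_error


async def fetch_all(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> dict[str, Union[str, BaseException]]:
    """Deduplicate URLs, fetch them concurrently, and keep failures as values."""
    semaphore = asyncio.Semaphore(max_concurrency)
    unique_urls = list(dict.fromkeys(urls))  # order-preserving deduplication
    results = await asyncio.gather(
        *(fetch_with_retries(u, fetch, semaphore) for u in unique_urls),
        return_exceptions=True,  # one failed URL does not cancel the others
    )
    return dict(zip(unique_urls, results))
```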

Benchmarking and Calibration

Reusable benchmark suites in benchmarks/:

  • benchmark-editorial.json
  • benchmark-reference.json
  • benchmark-seo.json
  • benchmark-thin.json

Run benchmark diagnostics:

python scripts/benchmark_runner.py

Runner output includes:

  • per-URL calibration rows (classification, aiProbability, confidence, bands, priority)
  • grouped totals by benchmark family
  • average template signals per family
  • delta vs editorial baseline and ranked signal separation diagnostics
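The runner's internals are not shown in this README. As an illustration of the last two bullets, the sketch below averages a signal per benchmark family, computes each family's delta against the editorial baseline, and ranks signals by absolute separation. The row shape (a "family" key plus numeric signal fields such as the hypothetical "headingRepetition") is an assumption.

```python
from collections import defaultdict
from statistics import mean


def family_averages(rows: list[dict], signal: str) -> dict[str, float]:
    """Average one numeric signal per benchmark family."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        grouped[row["family"]].append(row[signal])
    return {family: mean(values) for family, values in grouped.items()}


def delta_vs_baseline(averages: dict[str, float], baseline: str = "editorial") -> dict[str, float]:
    """Difference between each family's average and the baseline family's average."""
    base = averages[baseline]
    return {family: avg - base for family, avg in averages.items() if family != baseline}


def rank_signals(rows: list[dict], signals: list[str], family: str,
                 baseline: str = "editorial") -> list[str]:
    """Rank signals by absolute separation between `family` and the baseline."""
    separation: dict[str, float] = {}
    for signal in signals:
        avgs = family_averages(rows, signal)
        separation[signal] = abs(avgs[family] - avgs[baseline])
    return sorted(separation, key=separation.get, reverse=True)
```

Signals near the top of the ranking separate a family (for example SEO pages) from editorial text most strongly, which makes them the natural candidates when reweighting.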

Integrations & API

  • Run and export through the Apify platform
  • Retrieve results via Apify API (dataset endpoints)
  • Integrate with Python/Node workflows, webhooks, schedules, and automation tools
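A minimal sketch of calling the Actor from Python with the official apify-client package (pip install apify-client), reading the token from an APIFY_TOKEN environment variable. The Actor ID sovanza/chatgpt-detector is an assumption based on the developer and Actor names on this page; check the Store URL for the exact ID. rows_needing_review is a hypothetical helper that drops the summary row and keeps high and medium priority pages.

```python
import os
from typing import Iterable


def rows_needing_review(items: Iterable[dict]) -> list[dict]:
    """Keep page rows whose reviewPriority is high or medium; skip the summary row."""
    return [
        item for item in items
        if item.get("type") != "__summary__"
        and item.get("reviewPriority") in {"high", "medium"}
    ]


def run_detector(run_input: dict, actor_id: str = "sovanza/chatgpt-detector") -> list[dict]:
    """Start a run, wait for it to finish, and return all dataset items.

    actor_id is an assumed ID; verify it on the Apify Store page.
    """
    from apify_client import ApifyClient  # imported lazily: only needed for real runs

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Usage: rows_needing_review(run_detector({"startUrls": [{"url": "https://blog.apify.com/"}]})) returns only the pages a reviewer should look at.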

Run Locally

pip install -r requirements.txt
python main.py

Provide input through local Apify storage (storage/key_value_stores/default/INPUT.json) or, when running on the platform, through the Actor's Input tab.
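For local runs, the input file can be generated rather than edited by hand. This sketch writes a run input to the local storage path named above; write_local_input is a hypothetical helper, and project_dir is assumed to be the directory from which you run python main.py.

```python
import json
from pathlib import Path


def write_local_input(run_input: dict, project_dir: str = ".") -> Path:
    """Write INPUT.json to the local Apify storage path used for local runs."""
    path = Path(project_dir) / "storage" / "key_value_stores" / "default" / "INPUT.json"
    path.parent.mkdir(parents=True, exist_ok=True)  # create storage dirs if missing
    path.write_text(json.dumps(run_input, indent=2), encoding="utf-8")
    return path
```

For example, write_local_input({"startUrls": [{"url": "https://en.wikipedia.org/wiki/Natural_language_processing"}], "detectionMode": "hybrid"}) followed by python main.py analyzes that single page.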

Run on Apify

  1. Create/upload Actor as chatgpt-detector
  2. Configure input in the Actor UI
  3. Start a run
  4. Read/export dataset results

Why choose this actor?

  • Conservative by default, tuned to resist false positives
  • Explainable scoring with actionable priority labels
  • Benchmark-driven calibration loop built in
  • Production-oriented crawling, output, and failure handling

FAQ

What data does this actor return for each analysis?

For each analyzed URL, the actor returns: classification, aiProbability, humanProbability, confidence, confidenceBand, reviewPriority, plus topReasons, warningFlags, and a structured signals breakdown. Optional fields include rawText and markdown when enabled.

Which AI models can this detector identify?

It does not rely on model “fingerprints”. Instead, it scores statistical and structural writing signals that often correlate with LLM-generated text, so results usually carry over across ChatGPT, Claude, Gemini, Llama, and other models. Edited or mixed content can reduce certainty.

How does the confidence score work?

Confidence reflects how strongly the extracted signals support the predicted class. Middle-range confidence often indicates mixed or heavily edited text, or pages with limited signal density.

Can I analyze multiple items in a single run?

Yes — provide multiple entries in startUrls. You can also enable crawlLinkedPages to analyze additional internal pages with caps via maxDepth and maxPagesPerDomain.

Is technical experience required?

No — run it from the Apify UI. For pipelines, use the Apify API to automate runs and consume datasets.

How accurate is AI detection?

Accuracy depends on length, language, domain style, and how edited the text is. The actor performs best on longer, unedited or lightly edited text. Use confidenceBand, reviewPriority, and topReasons to guide manual review for borderline cases.

What content lengths are supported?

Any length, but pages below minTextLength (default 300 characters of cleaned text) are labeled insufficient_text. Very short text has weaker signals.

Is this a definitive AI detector?

No. It is a probabilistic detector and review-priority tool.

Why do many pages show mixed?

Conservative thresholds intentionally reduce overconfident labels, especially on reference and editorial content.

Why do some pages return insufficient_text?

The page may be very short, blocked, dynamic, or otherwise not extractable above minTextLength.

Can I tune behavior for my domain?

Yes. Run scripts/benchmark_runner.py against the benchmark files, then reweight signals and thresholds to fit your content distribution.

SEO Keywords

chatgpt detector
ai content detector apify
gpt text detection
llm generated text checker
ai writing probability tool
website ai text classifier
content authenticity scoring
ai text risk scoring
editorial ai moderation tool
seo template content detector

Actor Permissions

This Actor is designed to run with limited permissions: it reads input and writes dataset output, with optional proxy and key-value store usage as configured.

Limitations

  • Highly dynamic/blocked pages can reduce extraction quality
  • Mixed human+AI writing remains hard to separate perfectly
  • Edited AI text can resemble human writing
  • Short content has limited signal strength
  • Signal behavior varies by language/domain style

Get Started

Add URLs, run the Actor, inspect confidenceBand + reviewPriority, and iterate with benchmark diagnostics for your domain. 🚀