ChatGPT Detector
Pricing
from $10.00 / 1,000 analysed pages
ChatGPT Detector analyses web pages and estimates whether visible text is AI-generated, human-written, mixed, or insufficient for review. It provides probability scores, confidence bands, review priority, and explainable signals for editorial QA, moderation, compliance, and SEO audits.
Rating: 5.0 (4)
Developer: Sovanza
Maintained by Community
Actor stats
Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 6 days ago
AI Content Detector — ChatGPT, Claude & AI Text Analyzer with Scoring
Instantly detect whether web page content reads as human-written or AI-generated — including writing patterns commonly associated with ChatGPT, Claude, Gemini, Llama, and other LLMs. Instead of a binary label, this actor returns structured probability + confidence plus explainable reasons for every analysis, so teams can make defensible decisions at scale.
Built for educators, publishers, content agencies, and compliance teams who need reliable, explainable AI content detection — not a black box.
Overview
The AI Content Detector analyzes one or more URLs, extracts the main readable text, and estimates whether the content is:
- likely_ai
- likely_human
- mixed
- insufficient_text
For every page, it outputs:
- AI probability / human probability
- A confidence score and confidence band
- A reviewPriority label for triage
- A reasons breakdown (topReasons, warningFlags, and signals)
This is designed for real-world content: messy pages, mixed edits, SEO templates, and editorial posts — not just clean demo text.
Important disclaimer
AI detection is probabilistic, not definitive. This actor provides risk scoring and review guidance, not proof of authorship. Use it as a screening layer that routes borderline cases to human review.
Key benefits
- Save hours of manual review with automated AI detection across large volumes of pages
- Make defensible decisions with scored, reasoned output — not opaque labels
- Scale screening across many URLs or internal crawls with repeatable runs
- Improve auditability with structured signal breakdowns and exportable reports
- Reduce reputational and compliance risk by flagging AI-like pages before they go live
Features
- AI vs human-like classification for extracted web page text
- Explainable scoring: probabilities, confidence, and top reasons
- Hybrid detection modes: heuristic, model, or hybrid
- Crawl support: optionally follow internal links with caps on depth and pages per domain
- Content extraction pipeline: semantic containers + readability + fallbacks to reduce boilerplate noise
- Long-form template signals tuned for SEO-style patterns: heading repetition, section similarity, CTA/FAQ templates
- Batch processing: multiple start URLs in one run
- Structured output for dashboards and downstream automation
Export formats
Results are written to the Apify dataset and can be exported as JSON, CSV, Excel, and (where supported by Apify export options) XML.
Use cases (high-value)
- Academic integrity: Screen student submissions hosted online (or LMS-exported pages) for AI-like authorship patterns with scored evidence.
- Publishing & editorial: Audit freelance or user-submitted articles before publication to enforce authenticity policies.
- Content agency QA: Verify vendors are delivering human-written work, or flag drafts for additional review.
- SEO & compliance: Identify templated, AI-like pages in a publishing pipeline before indexing.
- HR & recruitment: Screen public application pages/portfolios for AI-like writing patterns (use responsibly).
- Legal & contract review: Flag AI-drafted public disclosures/communications that warrant additional verification.
Why this tool stands out
Most detectors return “AI” vs “Human” with no explanation. This actor provides:
- A confidence score and confidence band
- A reviewPriority label to operationalize triage
- A structured signals object (so you can audit and tune thresholds)
- Clear topReasons so reviewers understand what triggered the verdict
It’s built to support defensible workflows — not just labels.
How to use on Apify
Using the Actor
- Open the Actor on Apify and go to the Input tab.
- Add startUrls and set crawl/detection options.
- Start the run.
- Open the Dataset tab to inspect scores and explanations.
- Export JSON/CSV/Excel or pull via API for automation.
Input Configuration
Full schema: INPUT_SCHEMA.json. Example:
{
  "startUrls": [
    { "url": "https://blog.apify.com/" },
    { "url": "https://en.wikipedia.org/wiki/Natural_language_processing" },
    { "url": "https://openai.com/research/" }
  ],
  "crawlLinkedPages": false,
  "maxPagesPerDomain": 1,
  "maxDepth": 0,
  "includeSubdomains": false,
  "detectionMode": "hybrid",
  "languageHint": "en",
  "minTextLength": 300,
  "includeRawText": false,
  "includeHtmlMetadata": true,
  "maxConcurrency": 5,
  "requestTimeoutSecs": 60,
  "saveMarkdown": false,
  "blockAssets": true,
  "includeDebugFields": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
- startUrls (required): URL list to analyze
- crawlLinkedPages: follow internal links
- maxDepth, maxPagesPerDomain, includeSubdomains: crawl controls
- detectionMode: heuristic, model, hybrid
- languageHint: optional language hint
- minTextLength: minimum cleaned text length for analysis
- includeRawText, saveMarkdown, includeHtmlMetadata: output detail controls
- blockAssets, requestTimeoutSecs, maxConcurrency: performance controls
- includeDebugFields: include extraction and threshold diagnostics in output
- proxyConfiguration, userAgent: access and request customization
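As a rough illustration of the options above, the following Python sketch assembles an input object and rejects the two mistakes that are easiest to catch up front. The helper name build_run_input and its defaults are assumptions made for this example (the defaults simply mirror the sample input); INPUT_SCHEMA.json remains the authoritative reference.

```python
from typing import Any

# Detection modes as listed in the input options above.
ALLOWED_MODES = {"heuristic", "model", "hybrid"}


def build_run_input(
    urls: list[str],
    detection_mode: str = "hybrid",
    crawl_linked_pages: bool = False,
    max_depth: int = 0,
    max_pages_per_domain: int = 1,
    min_text_length: int = 300,
) -> dict[str, Any]:
    """Hypothetical helper: build an input dict with the documented field names."""
    if not urls:
        raise ValueError("startUrls is required and must not be empty")
    if detection_mode not in ALLOWED_MODES:
        raise ValueError(f"detectionMode must be one of {sorted(ALLOWED_MODES)}")
    return {
        "startUrls": [{"url": url} for url in urls],
        "crawlLinkedPages": crawl_linked_pages,
        "maxDepth": max_depth,
        "maxPagesPerDomain": max_pages_per_domain,
        "detectionMode": detection_mode,
        "minTextLength": min_text_length,
    }


run_input = build_run_input(["https://blog.apify.com/"], detection_mode="heuristic")
```

Fields left out of the dict (for example proxyConfiguration) would fall back to whatever the Actor's schema defines.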
Output
Results are stored in the Actor’s default dataset.
Each analyzed page can include:
- Identity & metadata: inputUrl, finalUrl, domain, statusCode, title, metaDescription, canonicalUrl, language
- Text stats: wordCount, paragraphCount, sentenceCount
- Scoring: classification, aiProbability, humanProbability, confidence, confidenceBand, reviewPriority
- Signals: lexical/structure/repetition/specificity/citation + long-form template signals
- Explainability: topReasons, warningFlags, thresholdDecisionReason
- Debug fields (optional): extractionSource, rawExtractedLength, cleanedExtractedLength
- Optional content fields: rawText, markdown
- Meta: timestamp
Triage helpers:
- confidenceBand
  - low for confidence < 0.40
  - medium for 0.40-0.69
  - high for >= 0.70
- reviewPriority
  - high: likely_ai with high confidence
  - medium: mixed or uncertain results
  - low: likely human with acceptable confidence
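The banding rules above can be sketched in a few lines of Python, which may help when re-deriving the labels in a downstream dashboard. This is a minimal sketch, not the Actor's own code: treating values between 0.69 and 0.70 as medium, and reading "acceptable confidence" as "any band other than low", are both assumptions made here.

```python
def confidence_band(confidence: float) -> str:
    """Map a confidence score to the documented low/medium/high bands."""
    if confidence < 0.40:
        return "low"
    if confidence < 0.70:  # assumption: the 0.69-0.70 gap counts as medium
        return "medium"
    return "high"


def review_priority(classification: str, confidence: float) -> str:
    """Derive a triage label from classification and confidence."""
    band = confidence_band(confidence)
    if classification == "likely_ai" and band == "high":
        return "high"
    # Assumption: "acceptable confidence" means the band is not low.
    if classification == "likely_human" and band != "low":
        return "low"
    # Mixed, uncertain, and everything else goes to medium.
    return "medium"


print(review_priority("likely_ai", 0.82))
print(review_priority("mixed", 0.55))
```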
Example summary row:
{
  "type": "__summary__",
  "totalUrls": 3,
  "processedUrls": 3,
  "skippedUrls": 0,
  "likelyAiCount": 1,
  "mixedCount": 1,
  "likelyHumanCount": 1,
  "insufficientTextCount": 0,
  "timestamp": "2026-03-26T10:31:45.000000+00:00"
}
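When consuming the dataset, the summary row usually needs to be separated from the per-page rows. The Python sketch below does that and runs a simple sanity check. The helper names are hypothetical, and the check assumes the four class counts add up to processedUrls, as they do in the example above; that relationship is not stated as a guarantee and may not hold when failure rows are present.

```python
from typing import Optional


def split_summary(items: list[dict]) -> tuple[list[dict], Optional[dict]]:
    """Separate page rows from the row whose type is "__summary__"."""
    pages = [row for row in items if row.get("type") != "__summary__"]
    summaries = [row for row in items if row.get("type") == "__summary__"]
    return pages, (summaries[0] if summaries else None)


def summary_is_consistent(summary: dict) -> bool:
    """Assumed invariant: class counts sum to processedUrls,
    and processed + skipped equals totalUrls."""
    class_total = (
        summary["likelyAiCount"]
        + summary["mixedCount"]
        + summary["likelyHumanCount"]
        + summary["insufficientTextCount"]
    )
    return (
        class_total == summary["processedUrls"]
        and summary["processedUrls"] + summary["skippedUrls"] == summary["totalUrls"]
    )
```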
Detection methodology (explainability)
The detector combines:
- Heuristic signals
  - lexical diversity, burstiness proxies, repetition patterns
  - generic/explanatory phrasing and templated conclusions
  - specificity and citation grounding proxies
  - long-form template signals for SEO-style structure detection
- Rule-based cues
  - formulaic transitions/conclusions
  - low source-grounding markers
  - repetitive stylistic patterns
- Model-style layer (detect_with_model)
  - local calibrated scorer over engineered features
  - no paid external dependency
  - pluggable for future model upgrades
- Hybrid scoring
  - weighted combination of heuristic and model-style signals
  - weights calibrated via benchmark separation diagnostics
  - reference/docs signal dampening for selected template cues
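To make the signal names above more concrete, here is a small Python sketch of two common heuristics (a type-token ratio for lexical diversity and a sentence-length variation measure as a burstiness proxy) plus a weighted blend. These are generic textbook formulations chosen for illustration; the Actor's actual features, the function names, and the 0.5 weight are all assumptions, not the implementation behind detect_with_model or the hybrid scorer.

```python
import re
import statistics


def lexical_diversity(text: str) -> float:
    """Type-token ratio: unique words divided by total words."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths (in words).
    Low values mean very uniform sentences."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


def hybrid_score(heuristic: float, model: float, heuristic_weight: float = 0.5) -> float:
    """Weighted combination of two scores in [0, 1]; the weight is hypothetical."""
    if not 0.0 <= heuristic_weight <= 1.0:
        raise ValueError("heuristic_weight must be between 0 and 1")
    return heuristic_weight * heuristic + (1.0 - heuristic_weight) * model
```

In a setup like this, the weight would be the knob that benchmark diagnostics help calibrate.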
Classification thresholds (conservative defaults):
- likely_ai if aiProbability >= 0.80 and confidence >= 0.60
- likely_human if aiProbability < 0.45 with acceptable confidence
- mixed otherwise
- insufficient_text when cleaned text is below minTextLength
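The threshold rules above translate into a short decision function. The Python sketch below follows the documented cut-offs; the min_human_confidence parameter is a hypothetical stand-in for "acceptable confidence", whose actual value is not documented, and the order in which the checks run is likewise an assumption.

```python
def classify(
    ai_probability: float,
    confidence: float,
    cleaned_text_length: int,
    min_text_length: int = 300,
    min_human_confidence: float = 0.40,  # hypothetical "acceptable confidence"
) -> str:
    """Apply the documented conservative thresholds to one page."""
    if cleaned_text_length < min_text_length:
        return "insufficient_text"
    if ai_probability >= 0.80 and confidence >= 0.60:
        return "likely_ai"
    if ai_probability < 0.45 and confidence >= min_human_confidence:
        return "likely_human"
    return "mixed"


print(classify(0.85, 0.72, cleaned_text_length=1200))
print(classify(0.60, 0.90, cleaned_text_length=1200))
```

Note how a high aiProbability with weak confidence still lands in mixed, which is what makes the defaults conservative.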
Error Handling
A single failing URL does not fail the whole run.
Handled cases include:
- invalid URLs
- timeout/network failures
- blocked/challenged pages
- JS-heavy pages with little readable text
- short or insufficient content
Failure rows still include structured fields like inputUrl, classification, error, and timestamp.
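The per-URL isolation described above can be sketched as a loop that converts exceptions into structured rows instead of letting them abort the run. This is an illustrative Python pattern, not the Actor's code: analyze_all, the analyze callback, and the classification value used for failures are assumptions (the source lists the fields a failure row carries but not the value classification takes).

```python
from datetime import datetime, timezone
from typing import Callable


def analyze_all(
    urls: list[str],
    analyze: Callable[[str], dict],
    failure_classification: str = "insufficient_text",  # assumed value
) -> list[dict]:
    """Run analyze() on each URL; a failure yields a row, never a crash."""
    rows = []
    for url in urls:
        try:
            rows.append(analyze(url))
        except Exception as exc:  # deliberately broad: one bad page must not stop the run
            rows.append({
                "inputUrl": url,
                "classification": failure_classification,
                "error": f"{type(exc).__name__}: {exc}",
                "timestamp": datetime.now(timezone.utc).isoformat(),
            })
    return rows
```

Downstream code can then filter on the presence of the error field to separate failed pages from analyzed ones.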
Performance and Anti-Blocking
- Parallel processing via maxConcurrency
- Capped retry logic per page
- Optional heavy asset blocking (blockAssets)
- Apify proxy support (proxyConfiguration)
- URL deduplication and crawl depth/domain constraints
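As one way to picture how deduplication, bounded concurrency, and capped retries fit together, the Python sketch below uses asyncio with a semaphore. It is a generic pattern under stated assumptions: the function names, the normalization rule (drop the fragment and trailing slash), the default of two retries, and returning None after the last failed attempt are all choices made for this example.

```python
import asyncio
from typing import Any, Awaitable, Callable
from urllib.parse import urldefrag


def dedupe_urls(urls: list[str]) -> list[str]:
    """Keep the first occurrence of each URL, ignoring fragments and trailing slashes."""
    seen: set[str] = set()
    unique = []
    for url in urls:
        key = urldefrag(url).url.rstrip("/")
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


async def run_bounded(
    urls: list[str],
    worker: Callable[[str], Awaitable[Any]],
    max_concurrency: int = 5,
    max_retries: int = 2,
) -> list[Any]:
    """Process deduplicated URLs with at most max_concurrency workers at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url: str) -> Any:
        async with semaphore:
            for attempt in range(max_retries + 1):
                try:
                    return await worker(url)
                except Exception:
                    if attempt == max_retries:
                        return None  # retries exhausted; caller decides what to record

    return await asyncio.gather(*(guarded(url) for url in dedupe_urls(urls)))
```

A caller would invoke it with asyncio.run(run_bounded(urls, fetch_page)), where fetch_page is whatever coroutine performs the request.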
Benchmarking and Calibration
Reusable benchmark suites in benchmarks/:
- benchmark-editorial.json
- benchmark-reference.json
- benchmark-seo.json
- benchmark-thin.json
Run benchmark diagnostics:
$ python scripts/benchmark_runner.py
Runner output includes:
- per-URL calibration rows (classification, aiProbability, confidence, bands, priority)
- grouped totals by benchmark family
- average template signals per family
- delta vs editorial baseline and ranked signal separation diagnostics
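The "delta vs editorial baseline" diagnostic can be illustrated with a small Python sketch that averages one signal per benchmark family and ranks the other families by how far they sit from the editorial mean. The row shape (a family key plus a signals dict) and the signal name headingRepetition are hypothetical; the real runner's output format is not shown in this document.

```python
from collections import defaultdict
from statistics import mean


def family_means(rows: list[dict], signal: str) -> dict[str, float]:
    """Average one signal per benchmark family."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        grouped[row["family"]].append(row["signals"][signal])
    return {family: mean(values) for family, values in grouped.items()}


def delta_vs_baseline(
    rows: list[dict], signal: str, baseline: str = "editorial"
) -> list[tuple[str, float]]:
    """Rank non-baseline families by absolute distance from the baseline mean."""
    means = family_means(rows, signal)
    base = means[baseline]
    deltas = {f: m - base for f, m in means.items() if f != baseline}
    return sorted(deltas.items(), key=lambda item: abs(item[1]), reverse=True)


rows = [
    {"family": "editorial", "signals": {"headingRepetition": 0.1}},
    {"family": "editorial", "signals": {"headingRepetition": 0.3}},
    {"family": "seo", "signals": {"headingRepetition": 0.8}},
    {"family": "thin", "signals": {"headingRepetition": 0.3}},
]
ranking = delta_vs_baseline(rows, "headingRepetition")
```

A signal with a large delta separates that family well from editorial content, which makes it a candidate for a higher weight.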
Integrations & API
- Run and export through the Apify platform
- Retrieve results via Apify API (dataset endpoints)
- Integrate with Python/Node workflows, webhooks, schedules, and automation tools
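For the dataset endpoints mentioned above, a minimal standard-library Python sketch looks like the following. It builds a request for the Apify API's dataset items endpoint and authenticates with a bearer token; the dataset ID and token are placeholders, and the helper name is an assumption. Check the Apify API reference for the full list of query parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"


def dataset_items_request(dataset_id: str, token: str, fmt: str = "json") -> Request:
    """Build a GET request for a dataset's items in the given export format."""
    query = urlencode({"format": fmt})
    url = f"{API_BASE}/datasets/{dataset_id}/items?{query}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


# Placeholder values; replace with a real dataset ID and API token.
request = dataset_items_request("YOUR_DATASET_ID", "YOUR_API_TOKEN")

# To actually fetch the rows (requires network access and valid credentials):
#   import json
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       items = json.load(response)
```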
Run Locally
pip install -r requirements.txt
python main.py
Use local Apify storage input (storage/key_value_stores/default/INPUT.json) or platform input.
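For local runs, the input file can be generated rather than edited by hand. The Python sketch below writes an input dict to the storage path named above, relative to a project root; the helper name and the project_root parameter are assumptions for this example.

```python
import json
from pathlib import Path


def write_local_input(run_input: dict, project_root: str = ".") -> Path:
    """Write run_input to storage/key_value_stores/default/INPUT.json."""
    path = Path(project_root) / "storage" / "key_value_stores" / "default" / "INPUT.json"
    path.parent.mkdir(parents=True, exist_ok=True)  # create the store folders if missing
    path.write_text(json.dumps(run_input, indent=2), encoding="utf-8")
    return path
```

Calling write_local_input({"startUrls": [{"url": "https://blog.apify.com/"}], "detectionMode": "hybrid"}) before python main.py would make the local run pick up that input.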
Run on Apify
- Create/upload Actor as chatgpt-detector
- Configure input in the Actor UI
- Start a run
- Read/export dataset results
Why choose this actor?
- Conservative by default (false-positive resistant)
- Explainable scoring with actionable priority labels
- Benchmark-driven calibration loop built in
- Production-oriented crawling, output, and failure handling
FAQ
What data does this actor return for each analysis?
For each analyzed URL, the actor returns: classification, aiProbability, humanProbability, confidence, confidenceBand, reviewPriority, plus topReasons, warningFlags, and a structured signals breakdown. Optional fields include rawText and markdown when enabled.
Which AI models can this detector identify?
It does not rely on model “fingerprints”. Instead, it scores statistical and structural writing signals that often correlate with LLM-generated text. The approach generally carries over to ChatGPT, Claude, Gemini, Llama, and other models, but edited or mixed content can reduce certainty.
How does the confidence score work?
Confidence reflects how strongly the extracted signals support the predicted class. Middle-range confidence often indicates mixed or heavily edited text, or pages with limited signal density.
Can I analyze multiple items in a single run?
Yes — provide multiple entries in startUrls. You can also enable crawlLinkedPages to analyze additional internal pages with caps via maxDepth and maxPagesPerDomain.
Is technical experience required?
No — run it from the Apify UI. For pipelines, use the Apify API to automate runs and consume datasets.
How accurate is AI detection?
Accuracy depends on length, language, domain style, and how edited the text is. The actor performs best on longer, unedited or lightly edited text. Use confidenceBand, reviewPriority, and topReasons to guide manual review for borderline cases.
What content lengths are supported?
Any length, but pages below minTextLength (default 300 characters of cleaned text) are labeled insufficient_text. Very short text has weaker signals.
Is this a definitive AI detector?
No. It is a probabilistic detector and review-priority tool.
Why do many pages show mixed?
Conservative thresholds intentionally reduce overconfident labels, especially on reference and editorial content.
Why do some pages return insufficient_text?
The page may be very short, blocked, dynamic, or otherwise not extractable above minTextLength.
Can I tune behavior for my domain?
Yes. Use benchmark files + scripts/benchmark_runner.py and reweight signals/thresholds for your content distribution.
SEO Keywords
chatgpt detector
ai content detector apify
gpt text detection
llm generated text checker
ai writing probability tool
website ai text classifier
content authenticity scoring
ai text risk scoring
editorial ai moderation tool
seo template content detector
Actor Permissions
This Actor is designed to run with limited permissions: read input and write dataset output, with optional proxy/KV usage as configured.
Limitations
- Highly dynamic/blocked pages can reduce extraction quality
- Mixed human+AI writing remains hard to separate perfectly
- Edited AI text can resemble human writing
- Short content has limited signal strength
- Signal behavior varies by language/domain style
Get Started
Add URLs, run the Actor, inspect confidenceBand + reviewPriority, and iterate with benchmark diagnostics for your domain. 🚀
