Robots.txt Checker - CMS-Aware Analysis with AI Recommendations

Pricing

from $0.01 / 1,000 results



Developer

John Rippy

Maintained by Community


Validate robots.txt syntax, detect CMS patterns, and get AI-powered optimization advice by John Rippy | johnrippy.link


What This Actor Does

The Robots.txt Checker provides comprehensive analysis of your robots.txt file:

  1. Syntax Validation - Detect parsing errors and malformed directives
  2. CMS Detection - Identify WordPress, Shopify, Drupal, and 6+ other CMS platforms
  3. Best Practice Checks - Verify sitemap declarations, crawl delays, blocked paths
  4. Companion File Checks - Validate sitemap.xml, llms.txt, security.txt
  5. AI Recommendations - CMS-specific optimization suggestions
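
To illustrate the kind of check step 1 performs, here is a minimal line-by-line syntax validator. This is a simplified sketch of the idea, not the actor's internal code; the directive list and error messages are my own.

```python
# Minimal robots.txt syntax checker -- an illustrative sketch,
# not the actor's actual validation logic.
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay", "host"}

def find_syntax_errors(robots_txt: str) -> list[str]:
    errors = []
    for lineno, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # blank or comment-only lines are fine
        if ":" not in line:
            errors.append(f"line {lineno}: missing ':' separator")
            continue
        directive = line.split(":", 1)[0].strip().lower()
        if directive not in KNOWN_DIRECTIVES:
            errors.append(f"line {lineno}: unknown directive '{directive}'")
    return errors
```

A well-formed file yields an empty list; each malformed line produces one message with its line number.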

Why Use This Actor?

The Problem with Manual Checking

Most developers paste their robots.txt into a validator, which catches syntax errors but misses:

  • CMS-specific paths that should be blocked
  • Missing sitemap declarations
  • Accidental blocking of important content
  • Security and AI crawler considerations

CMS-Aware Intelligence

This actor detects your CMS and provides targeted recommendations:

| CMS Detected | Smart Recommendations |
| --- | --- |
| WordPress | Block /wp-admin/, /wp-json/, /?s= search pages |
| Shopify | Block /cart/, /checkout/, /admin/, /search? |
| Drupal | Block /node/, /user/, /admin/, filter paths |
| Magento | Block /checkout/, /customer/, /catalogsearch/ |
| Wix | Block /_api/, /_partials/, internal paths |

Use Cases

1. SEO Audits

Verify clients' robots.txt files don't accidentally block important content.

2. Pre-Launch Checks

Ensure robots.txt is properly configured before launching a new site.

3. Competitor Analysis

Compare robots.txt configurations across competitor sites.

4. Security Compliance

Check for security.txt and ensure proper crawler access controls.


Quick Start Examples

Example 1: Single URL Analysis

{
  "url": "https://example.com",
  "includeAIRecommendations": true
}

Example 2: Batch Analysis with All Checks

{
  "urls": [
    "https://yoursite.com",
    "https://competitor1.com",
    "https://competitor2.com"
  ],
  "includeSitemapCheck": true,
  "includeLlmsTxtCheck": true,
  "includeSecurityTxtCheck": true
}

Example 3: Demo Mode (Free Testing)

{
  "demoMode": true
}

Example 4: With AI Enhancement (BYOK)

{
  "url": "https://example.com",
  "includeAIRecommendations": true,
  "anthropicApiKey": "sk-ant-..."
}

Input Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| demoMode | boolean | No | false | Run with sample data (free, no URL fetching) |
| url | string | No* | - | Single URL to analyze |
| urls | array | No* | - | Array of URLs to analyze |
| includeSitemapCheck | boolean | No | true | Verify sitemap.xml exists |
| includeLlmsTxtCheck | boolean | No | false | Check for llms.txt |
| includeSecurityTxtCheck | boolean | No | false | Check for security.txt |
| includeAIRecommendations | boolean | No | true | Generate AI recommendations |
| anthropicApiKey | string | No | - | BYOK for enhanced AI recommendations |
| webhookUrl | string | No | - | Webhook URL for integrations |

*Either url or urls required unless using demoMode
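
Client code can enforce the url/urls rule before starting a run. The helper below is a hypothetical convenience, not part of the actor or the Apify SDK; it simply assembles a run-input dict matching the parameters above:

```python
def build_input(url=None, urls=None, demo_mode=False, **options):
    """Assemble a run input for the actor, enforcing the url/urls/demoMode rule.
    A client-side convenience sketch -- not part of the actor itself."""
    if not demo_mode and not url and not urls:
        raise ValueError("Either 'url' or 'urls' is required unless demoMode is set")
    run_input = {"demoMode": demo_mode}
    if url:
        run_input["url"] = url
    if urls:
        run_input["urls"] = list(urls)
    run_input.update(options)  # e.g. includeSitemapCheck, includeAIRecommendations
    return run_input
```

For example, `build_input(url="https://example.com", includeSitemapCheck=True)` produces a valid input, while `build_input()` raises before any compute is spent.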


Output Format

{
  "url": "https://example.com",
  "robotsTxtUrl": "https://example.com/robots.txt",
  "timestamp": "2024-12-25T12:00:00.000Z",
  "status": "found",
  "score": 85,
  "rules": [
    {
      "userAgent": "*",
      "disallow": ["/admin/", "/private/"],
      "allow": ["/admin/login"]
    }
  ],
  "sitemaps": ["https://example.com/sitemap.xml"],
  "hasWildcardUserAgent": true,
  "syntaxErrors": [],
  "warnings": [],
  "bestPractices": {
    "hasSitemapDeclaration": true,
    "hasReasonableCrawlDelay": true,
    "blocksImportantPaths": [],
    "allowsSearchEngines": true
  },
  "detectedCms": "WordPress",
  "cmsRecommendations": [
    "Consider adding Disallow: /wp-json/ to prevent REST API indexing"
  ],
  "sitemapXml": {
    "exists": true,
    "url": "https://example.com/sitemap.xml",
    "urlCount": 245
  },
  "llmsTxt": {
    "exists": false,
    "url": "https://example.com/llms.txt"
  },
  "securityTxt": {
    "exists": true,
    "url": "https://example.com/.well-known/security.txt",
    "hasContact": true,
    "hasExpires": true
  },
  "recommendations": [
    {
      "priority": 1,
      "category": "cms_specific",
      "issue": "WordPress optimization opportunity",
      "recommendation": "Block /wp-json/ to prevent REST API indexing",
      "impact": "medium"
    }
  ]
}
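
Downstream code can consume these dataset items directly. As a sketch (field names taken from the output format above), here is a helper that pulls out the highest-priority recommendation texts from one result:

```python
def top_recommendations(result: dict, max_items: int = 3) -> list[str]:
    """Return the highest-priority recommendation texts from one result item.
    Assumes the 'recommendations' shape shown in the output format above."""
    recs = sorted(result.get("recommendations", []), key=lambda r: r["priority"])
    return [r["recommendation"] for r in recs[:max_items]]
```

Lower priority numbers sort first, so priority 1 items lead the list; an item with no recommendations yields an empty list.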

Scoring System

The actor calculates a 0-100 score based on:

| Factor | Impact |
| --- | --- |
| Syntax errors | -10 each (max -30) |
| Missing sitemap declaration | -15 |
| Unreasonable crawl delay (>10s) | -10 |
| Blocks important paths | -5 each |
| Blocks search engines | -20 |
| Has sitemap.xml | +5 (bonus) |
| Has llms.txt | +2 (bonus) |
| Has security.txt | +3 (bonus) |
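
The scoring rules translate directly into a simple accumulator. This is my reconstruction from the table, not the actor's source code:

```python
def compute_score(syntax_errors=0, has_sitemap_declaration=True,
                  crawl_delay=0, blocked_important_paths=0,
                  blocks_search_engines=False, has_sitemap_xml=False,
                  has_llms_txt=False, has_security_txt=False) -> int:
    """Reconstruct the 0-100 score from the factor table (a sketch)."""
    score = 100
    score -= min(10 * syntax_errors, 30)   # -10 each, capped at -30
    if not has_sitemap_declaration:
        score -= 15
    if crawl_delay > 10:                   # unreasonable crawl delay
        score -= 10
    score -= 5 * blocked_important_paths   # -5 per important path blocked
    if blocks_search_engines:
        score -= 20
    score += 5 if has_sitemap_xml else 0   # companion-file bonuses
    score += 2 if has_llms_txt else 0
    score += 3 if has_security_txt else 0
    return max(0, min(100, score))         # clamp to 0-100
```

A clean file with no companion files scores 100; the clamp keeps bonus points from pushing the score above 100 and heavy penalties from going below 0.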

AI Recommendations

Without Anthropic API Key

Falls back to rule-based recommendations derived from:

  • Detected CMS patterns
  • Common SEO best practices
  • Security standards

With Anthropic API Key (BYOK)

Enhanced analysis using Claude to:

  • Identify subtle configuration issues
  • Provide context-aware suggestions
  • Prioritize recommendations by impact

CMS Detection

Detects these platforms by analyzing robots.txt patterns:

  • WordPress - /wp-admin/, /wp-content/, /wp-includes/
  • Shopify - /admin/, /cart/, /checkout/, /collections/
  • Drupal - /node/, /user/, /sites/
  • Joomla - /administrator/, /components/, /modules/
  • Magento - /admin/, /checkout/, /customer/, /catalog/
  • Wix - /_api/, /_files/, /_partials/
  • Squarespace - /config/, /api/, /static/
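
A simplified version of this pattern matching can score each CMS by how many of its signature paths appear in the file. The heuristic below (most hits wins, minimum two matches, shared paths like /admin/ and /checkout/ omitted to keep it unambiguous) is my own sketch, not the actor's detector:

```python
# Signature paths per CMS, taken from the list above; ambiguous paths
# shared across platforms are deliberately left out of this sketch.
CMS_SIGNATURES = {
    "WordPress": ["/wp-admin/", "/wp-content/", "/wp-includes/"],
    "Shopify": ["/cart/", "/checkout/", "/collections/"],
    "Drupal": ["/node/", "/user/", "/sites/"],
    "Joomla": ["/administrator/", "/components/", "/modules/"],
    "Magento": ["/customer/", "/catalogsearch/"],
    "Wix": ["/_api/", "/_files/", "/_partials/"],
    "Squarespace": ["/config/", "/static/"],
}

def detect_cms(robots_txt: str):
    """Return the best-matching CMS name, or None below two signature hits."""
    text = robots_txt.lower()
    scores = {cms: sum(path in text for path in paths)
              for cms, paths in CMS_SIGNATURES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= 2 else None
```

Requiring at least two signature hits avoids false positives from a single generic path.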

Webhook Integration

Webhook Payload

{
  "event": "robots_txt_analysis_complete",
  "timestamp": "2024-12-25T12:00:00.000Z",
  "actor": "robots-txt-checker",
  "status": "success",
  "urlsAnalyzed": 3,
  "avgScore": 82,
  "results": [...]
}
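
On the receiving end, a handler can sanity-check the payload by recomputing the summary fields from the individual results. This sketch assumes each entry in results carries the score field shown in the output format:

```python
def summarize_payload(payload: dict) -> dict:
    """Recompute summary fields from a webhook payload's results list.
    Assumes each result item carries a 'score' field (see output format)."""
    results = payload.get("results", [])
    scores = [r["score"] for r in results if "score" in r]
    return {
        "urlsAnalyzed": len(results),
        "avgScore": round(sum(scores) / len(scores)) if scores else None,
    }
```

Comparing the recomputed values against the payload's own urlsAnalyzed and avgScore is a cheap integrity check before acting on a webhook.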

Perfect For

SEO Agencies

  • Client onboarding audits
  • Competitor analysis
  • Pre-launch checklists

Web Developers

  • CI/CD integration for robots.txt validation
  • CMS migration checks
  • Security compliance

Marketing Teams

  • Ensure content is indexable
  • Verify proper crawler access

Pricing

  • Demo Mode: Free (sample data)
  • Standard Usage: Apify compute units only
  • AI Recommendations: Rule-based free, Claude BYOK for enhanced

Related Actors

  • Technical SEO Auditor - Full on-page SEO analysis
  • Sitemap Generator - Create valid sitemaps
  • PageSpeed Intelligence - Performance + Tech Stack analysis

Built by John Rippy | johnrippy.link


Keywords

robots.txt checker, robots.txt analyzer, robots.txt validator, wordpress robots.txt, shopify robots.txt, seo audit, sitemap validation, llms.txt, security.txt, crawl directives, search engine crawler, googlebot, cms detection, technical seo, ai recommendations