Noindex Directive Validator

Noindex checker that scans URLs for meta robots and X-Robots-Tag headers, so SEO teams can find pages accidentally blocked from indexing before they drop out of search results.

Pricing: $2.99/month + usage
Developer: ZeroBreak (Maintained by Community)
Rating: 0.0 (0 reviews)
Actor stats: 0 bookmarks · 2 total users · 1 monthly active user · last modified 17 days ago
Noindex Directive Validator: Check Any URL for Noindex Tags and Headers

The noindex directive validator checks URLs for indexing blocks that might be hiding pages from Google. Feed it a list of URLs and it reads the meta robots tag and X-Robots-Tag HTTP header on each page, then tells you which ones have noindex set, along with the HTTP status code and the raw directive content.

Most useful after a site migration, CMS update, or new deployment. A stray noindex on a production page can silently drop it from search results, and checking dozens of pages by hand is slow. This actor does it in bulk.

Use cases

  • SEO audits: verify that no important pages are accidentally blocked from search indexing across an entire site section
  • Post-migration checks: confirm that noindex tags used on staging have been removed before or after launch
  • CMS validation: catch cases where a CMS update or plugin added noindex to pages it should not have
  • Developer QA: run a quick crawlability check on a list of pages before publishing
  • Ongoing monitoring: schedule regular runs to catch noindex regressions before they affect rankings

What data does this actor extract?

Each checked URL produces a dataset record like this:

```json
{
  "url": "https://apify.com/about",
  "finalUrl": "https://apify.com/about",
  "httpStatus": 200,
  "noindex": false,
  "noindexInMetaRobots": false,
  "noindexInXRobotsTag": false,
  "metaRobotsContent": "index, follow",
  "xRobotsTagContent": "",
  "pageTitle": "About Apify",
  "checkedAt": "2025-06-15T10:23:45.123456+00:00",
  "error": ""
}
```
| Field | Type | Description |
|---|---|---|
| url | string | The original URL submitted for checking |
| finalUrl | string | The URL after following any redirects |
| httpStatus | integer | HTTP status code (200, 301, 404, etc.) |
| noindex | boolean | True if noindex was found anywhere on the page |
| noindexInMetaRobots | boolean | True if noindex is in a `<meta name="robots">` or `<meta name="googlebot">` tag |
| noindexInXRobotsTag | boolean | True if noindex is in the X-Robots-Tag HTTP response header |
| metaRobotsContent | string | Full content of the meta robots tag, if present |
| xRobotsTagContent | string | Full value of the X-Robots-Tag header, if present |
| pageTitle | string | Page title from the `<title>` element |
| checkedAt | string | ISO 8601 timestamp of when the check ran |
| error | string | Error message if the request failed; empty on success |
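Once a run finishes, the exported records can be post-processed however you like. A minimal sketch that flags blocked pages, assuming the records follow the schema above (`records` stands in for items downloaded from the dataset; the function name is illustrative):

```python
def flag_noindexed(records):
    """Return URLs that were fetched successfully but carry a noindex directive."""
    flagged = []
    for rec in records:
        if rec.get("error"):    # skip failed requests; their noindex flags are meaningless
            continue
        if rec.get("noindex"):  # true when meta robots or X-Robots-Tag blocks indexing
            flagged.append(rec["finalUrl"] or rec["url"])
    return flagged


sample = [
    {"url": "https://example.com/a", "finalUrl": "https://example.com/a",
     "noindex": True, "error": ""},
    {"url": "https://example.com/b", "finalUrl": "https://example.com/b",
     "noindex": False, "error": ""},
]
print(flag_noindexed(sample))  # ['https://example.com/a']
```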

Input

| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | – | Single URL to check |
| urls | array | – | List of URLs to check, one per line |
| maxUrls | integer | 100 | Maximum number of URLs to process (up to 1,000) |
| requestTimeoutSecs | integer | 30 | Timeout per request in seconds |
| proxyConfiguration | object | Datacenter (Anywhere) | Proxy type and location to use for requests. Optional. |

Example input

```json
{
  "urls": [
    "https://apify.com",
    "https://apify.com/about",
    "https://apify.com/pricing"
  ],
  "maxUrls": 100,
  "requestTimeoutSecs": 30,
  "proxyConfiguration": { "useApifyProxy": true }
}
```

How it works

  1. Takes the submitted URLs, deduplicates them, and normalizes missing schemes to https://
  2. Fetches each URL using an HTTP client that follows redirects
  3. Reads the X-Robots-Tag response header and checks it for noindex or none
  4. Parses the HTML and looks for <meta name="robots"> and <meta name="googlebot"> tags with noindex or none in their content
  5. Pushes a result record per URL with the noindex status, raw directive values, HTTP status code, and page title
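In pure Python, the normalization and detection steps (1, 3, and 4) might look roughly like this. This is a stdlib-only sketch, not the actor's actual implementation, and the meta-tag regex assumes `name` appears before `content` in the tag:

```python
import re
from urllib.parse import urlparse


def normalize_url(url):
    """Step 1: default a missing scheme to https://."""
    url = url.strip()
    if not urlparse(url).scheme:
        url = "https://" + url
    return url


def header_has_noindex(x_robots_tag):
    """Step 3: check the X-Robots-Tag header for a noindex or none token."""
    tokens = {t.strip().lower() for t in x_robots_tag.split(",")}
    return bool(tokens & {"noindex", "none"})


# Matches <meta name="robots" ...> and <meta name="googlebot" ...> tags.
META_RE = re.compile(
    r'<meta[^>]+name=["\'](robots|googlebot)["\'][^>]+content=["\']([^"\']*)["\']',
    re.IGNORECASE,
)


def meta_has_noindex(html):
    """Step 4: look for noindex or none in meta robots/googlebot content."""
    for _name, content in META_RE.findall(html):
        tokens = {t.strip().lower() for t in content.split(",")}
        if tokens & {"noindex", "none"}:
            return True
    return False
```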

FAQ

Does this check the robots.txt file? No. This actor checks page-level noindex directives only: the meta robots tag and X-Robots-Tag header. Robots.txt controls crawling access, not indexing, and is a separate concern.

What counts as noindex? Both noindex and none directives (as defined by Google) trigger the noindex flag. none means noindex and nofollow combined.
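The `none` expansion can be made explicit with a small helper (illustrative only, not the actor's code):

```python
def effective_directives(content):
    """Split a robots directive string into tokens; per Google, none == noindex + nofollow."""
    tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
    if "none" in tokens:
        tokens.discard("none")
        tokens.update({"noindex", "nofollow"})
    return tokens


print(effective_directives("none"))  # {'noindex', 'nofollow'}
```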

Does it check Google-specific noindex tags? Yes. The actor checks both <meta name="robots"> and <meta name="googlebot"> tags.

What happens if a URL redirects? The actor follows redirects and checks the final destination page. Both the original URL and the final URL are recorded in the output.

How many URLs can I check per run? Up to 1,000 URLs per run. Set the maxUrls input to control the limit.

Can I run this on a schedule? Yes. Use Apify's scheduling feature to run the actor automatically at regular intervals to catch noindex regressions over time.

Integrations

Connect Noindex Directive Validator with other apps and services using Apify integrations. You can integrate with Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and many more. You can also use webhooks to trigger actions whenever results are available.

Run the noindex checker before a site launch to confirm every page you want indexed is actually indexable.