Noindex Directive Validator
Pricing
$2.99/month + usage
Noindex checker that scans URLs for meta robots and X-Robots-Tag headers, so SEO teams can find pages accidentally blocked from indexing before they drop out of search results.
Noindex Directive Validator: Check Any URL for Noindex Tags and Headers
The noindex directive validator checks URLs for indexing blocks that might be hiding pages from Google. Feed it a list of URLs and it reads the meta robots tag and the X-Robots-Tag HTTP header on each page, then tells you which ones have noindex set, along with the HTTP status code and the raw directive content.
It is most useful after a site migration, CMS update, or new deployment, when a stray noindex on a production page can silently drop it from search results. Catching these manually across dozens of pages is slow; this actor does it in bulk.
Use cases
- SEO audits: verify that no important pages are accidentally blocked from search indexing across an entire site section
- Post-migration checks: confirm that noindex tags used on staging have been removed before or after launch
- CMS validation: catch cases where a CMS update or plugin added noindex to pages it should not have
- Developer QA: run a quick crawlability check on a list of pages before publishing
- Ongoing monitoring: schedule regular runs to catch noindex regressions before they affect rankings
What data does this actor extract?
Each URL in the dataset includes:
```json
{
  "url": "https://apify.com/about",
  "finalUrl": "https://apify.com/about",
  "httpStatus": 200,
  "noindex": false,
  "noindexInMetaRobots": false,
  "noindexInXRobotsTag": false,
  "metaRobotsContent": "index, follow",
  "xRobotsTagContent": "",
  "pageTitle": "About Apify",
  "checkedAt": "2025-06-15T10:23:45.123456+00:00",
  "error": ""
}
```
| Field | Type | Description |
|---|---|---|
| url | string | The original URL submitted for checking |
| finalUrl | string | The URL after following any redirects |
| httpStatus | integer | HTTP status code (200, 301, 404, etc.) |
| noindex | boolean | True if noindex was found anywhere on the page |
| noindexInMetaRobots | boolean | True if noindex is in a `<meta name="robots">` or `<meta name="googlebot">` tag |
| noindexInXRobotsTag | boolean | True if noindex is in the X-Robots-Tag HTTP response header |
| metaRobotsContent | string | Full content of the meta robots tag, if present |
| xRobotsTagContent | string | Full value of the X-Robots-Tag header, if present |
| pageTitle | string | Page title from the `<title>` element |
| checkedAt | string | ISO 8601 timestamp of when the check ran |
| error | string | Error message if the request failed; empty on success |
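Once a run finishes, the dataset can be filtered on the boolean fields above to isolate problem pages. A minimal sketch (the sample records below are illustrative, not real output):

```python
# Sample dataset records in the shape documented above (values are made up).
records = [
    {"url": "https://example.com/a", "noindex": False, "httpStatus": 200},
    {"url": "https://example.com/b", "noindex": True,  "httpStatus": 200},
    {"url": "https://example.com/c", "noindex": False, "httpStatus": 404},
]

# Pages that carry a noindex directive and will drop out of the index.
blocked = [r["url"] for r in records if r["noindex"]]

# Pages that failed to load at all (4xx/5xx) and need a separate look.
errors = [r["url"] for r in records if r["httpStatus"] >= 400]

print(blocked)  # ['https://example.com/b']
print(errors)   # ['https://example.com/c']
```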
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | – | Single URL to check |
| urls | array | – | List of URLs to check, one per line |
| maxUrls | integer | 100 | Maximum number of URLs to process (up to 1,000) |
| requestTimeoutSecs | integer | 30 | Timeout per request in seconds |
| proxyConfiguration | object | Datacenter (Anywhere) | Optional. Proxy type and location to use for requests |
Example input
```json
{
  "urls": [
    "https://apify.com",
    "https://apify.com/about",
    "https://apify.com/pricing"
  ],
  "maxUrls": 100,
  "requestTimeoutSecs": 30,
  "proxyConfiguration": { "useApifyProxy": true }
}
```
How it works
- Takes the submitted URLs, deduplicates them, and normalizes missing schemes to `https://`
- Fetches each URL using an HTTP client that follows redirects
- Reads the `X-Robots-Tag` response header and checks it for `noindex` or `none`
- Parses the HTML and looks for `<meta name="robots">` and `<meta name="googlebot">` tags with `noindex` or `none` in their content
- Pushes a result record per URL with the noindex status, raw directive values, HTTP status code, and page title
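The detection step above can be sketched as a pure function. This is an illustrative reimplementation using only the standard library, not the actor's actual code; `check_page` and `RobotsMetaParser` are names invented for this example:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if a.get("name", "").lower() in ("robots", "googlebot"):
            self.contents.append(a.get("content", ""))


def _has_noindex(directive_value: str) -> bool:
    # Directives are comma-separated tokens; "none" is shorthand for
    # "noindex, nofollow", so both tokens block indexing.
    tokens = {t.strip().lower() for t in directive_value.split(",")}
    return bool(tokens & {"noindex", "none"})


def check_page(html: str, x_robots_tag: str = "") -> dict:
    """Return the noindex flags for one page, given its HTML and header value."""
    parser = RobotsMetaParser()
    parser.feed(html)
    meta_hit = any(_has_noindex(c) for c in parser.contents)
    header_hit = _has_noindex(x_robots_tag)
    return {
        "noindex": meta_hit or header_hit,
        "noindexInMetaRobots": meta_hit,
        "noindexInXRobotsTag": header_hit,
    }


print(check_page('<meta name="robots" content="noindex, follow">'))
```

Note this sketch ignores user-agent-scoped header values such as `googlebot: noindex`, which a production checker would also need to handle.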
FAQ
Does this check the robots.txt file?
No. This actor checks page-level noindex directives only: the meta robots tag and the X-Robots-Tag header. Robots.txt controls crawling access, not indexing, and is a separate concern.
What counts as noindex?
Both noindex and none directives (as defined by Google) trigger the noindex flag. none means noindex and nofollow combined.
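In token terms, the rule above comes down to splitting the directive string on commas and checking for either token. A minimal sketch (the function name is invented for this example):

```python
def directive_blocks_indexing(value: str) -> bool:
    # Per Google's robots meta documentation, "noindex" blocks indexing
    # and "none" is shorthand for "noindex, nofollow".
    tokens = {t.strip().lower() for t in value.split(",")}
    return "noindex" in tokens or "none" in tokens


print(directive_blocks_indexing("index, follow"))  # False
print(directive_blocks_indexing("none"))           # True
```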
Does it check Google-specific noindex tags?
Yes. The actor checks both <meta name="robots"> and <meta name="googlebot"> tags.
What happens if a URL redirects?
The actor follows redirects and checks the final destination page. Both the original URL and the final URL are recorded in the output.
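The relationship between `url` and `finalUrl` can be illustrated with a toy resolver; the redirect map below stands in for real HTTP 3xx responses and the function name is invented for this example:

```python
# Hypothetical redirect map standing in for real HTTP 3xx responses.
REDIRECTS = {
    "http://apify.com": "https://apify.com",
    "https://apify.com/old-about": "https://apify.com/about",
}


def resolve_final_url(url: str, max_hops: int = 10) -> str:
    """Follow the redirect chain until a non-redirecting URL is reached."""
    while url in REDIRECTS and max_hops > 0:
        url = REDIRECTS[url]
        max_hops -= 1
    return url


print(resolve_final_url("https://apify.com/old-about"))  # https://apify.com/about
```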
How many URLs can I check per run?
Up to 1,000 URLs per run. Set the `maxUrls` input to control the limit.
Can I run this on a schedule?
Yes. Use Apify's scheduling feature to run the actor automatically at regular intervals and catch noindex regressions over time.
Integrations
Connect Noindex Directive Validator with other apps and services using Apify integrations. You can integrate with Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and many more. You can also use webhooks to trigger actions whenever results are available.
Run the noindex checker before a site launch to confirm every page you want indexed is actually indexable.
