Bulk SEO Data Extractor
Pricing
from $1.20 / 1,000 results
Extract every on-page SEO signal from any URL: title, meta tags, canonical, OG/Twitter cards, JSON-LD schema, heading hierarchy, alt-text gaps, internal/external link counts, word count, text-to-HTML ratio.
Developer
Thirdwatch
Bulk SEO Audit - Titles, Meta, Headings, Schema, Links
The fastest bulk SEO meta tags and headings extractor — audit titles, meta descriptions, headings, canonicals, social cards, and schema across hundreds of URLs in one pass.
What you get
Paste a list of URLs and get a complete on-page SEO snapshot for each one: title, meta description, heading hierarchy, canonical, robots directive, Open Graph and Twitter social cards, schema markup, image alt-text gaps, internal/external link counts, and word count. Designed for SEO consultants, content auditors, and in-house SEO teams who need real on-page data across hundreds of pages without firing up a desktop crawler.
Output fields
| Field | Description |
|---|---|
| `url` | The URL you submitted. |
| `final_url` | The URL after following redirects. |
| `status_code` | HTTP status code (200, 301, 404, etc.). |
| `title` | The `<title>` tag text. |
| `title_length` | Character count of the title (Google typically truncates over ~60). |
| `meta_description` | The meta description tag. |
| `meta_description_length` | Character count (Google typically truncates over ~160). |
| `meta_keywords` | The meta keywords tag (rarely used today). |
| `canonical` | The canonical URL declared by the page. |
| `robots_meta` | The meta robots directive (e.g. `index,follow` or `noindex`). |
| `viewport` | The viewport meta tag (mobile-friendliness signal). |
| `lang` | The `lang` attribute on the `<html>` tag. |
| `charset` | Character set declared by the page. |
| `h1_tags` | List of unique H1 headings on the page. |
| `h2_tags` | List of unique H2 headings on the page. |
| `h1_count` | Total H1 tags (more than one is an SEO smell). |
| `h2_count` | Total H2 tags. |
| `h3_count` | Total H3 tags. |
| `h4_count` | Total H4 tags. |
| `og` | Open Graph tags (`og:title`, `og:description`, `og:image`, etc.) as a flat dict. |
| `twitter` | Twitter card tags (`twitter:card`, `twitter:title`, etc.) as a flat dict. |
| `json_ld` | Parsed schema markup blocks (Product, Article, FAQ, Breadcrumb, etc.). |
| `images_total` | Total `<img>` tags on the page. |
| `images_missing_alt` | Images with no `alt` attribute (accessibility + SEO gap). |
| `internal_links` | Links pointing back to the same domain. |
| `external_links` | Links pointing to other domains. |
| `nofollow_links` | Links marked `rel="nofollow"`. |
| `word_count` | Visible word count (script and style content excluded). |
| `text_to_html_ratio` | Word count divided by HTML byte size; low values flag thin or template-heavy pages. |
| `response_time_ms` | Time to fetch the page in milliseconds. |
| `content_length` | HTML byte size. |
| `error` | Error message on failure (null on success). |
| `checked_at` | ISO timestamp when the audit ran. |
Example output
{
  "url": "https://example.com",
  "final_url": "https://example.com/",
  "status_code": 200,
  "title": "Example Domain",
  "title_length": 14,
  "meta_description": "Illustrative example for documentation.",
  "meta_description_length": 39,
  "meta_keywords": "",
  "canonical": "https://example.com/",
  "robots_meta": "index,follow",
  "viewport": "width=device-width, initial-scale=1",
  "lang": "en",
  "charset": "utf-8",
  "h1_tags": ["Example Domain"],
  "h2_tags": ["More information"],
  "h1_count": 1,
  "h2_count": 1,
  "h3_count": 0,
  "h4_count": 0,
  "og": {
    "og:title": "Example Domain",
    "og:description": "Illustrative example for documentation.",
    "og:image": "https://example.com/og.png",
    "og:type": "website",
    "og:url": "https://example.com/",
    "og:site_name": "Example"
  },
  "twitter": {
    "twitter:card": "summary_large_image",
    "twitter:title": "Example Domain",
    "twitter:description": "Illustrative example for documentation.",
    "twitter:image": "https://example.com/og.png"
  },
  "json_ld": [
    {
      "@context": "https://schema.org",
      "@type": "WebSite",
      "name": "Example",
      "url": "https://example.com/"
    }
  ],
  "images_total": 4,
  "images_missing_alt": 1,
  "internal_links": 17,
  "external_links": 3,
  "nofollow_links": 2,
  "word_count": 412,
  "text_to_html_ratio": 0.0421,
  "response_time_ms": 184,
  "content_length": 9786,
  "error": null,
  "checked_at": "2026-05-04T12:00:00Z"
}
When a URL fails, the same shape is returned with error set and the numeric fields zeroed out.
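Because failed records keep the same shape with zeroed numbers, it helps to check `error` before reading anything else. The sketch below shows one way to turn a single result record into a list of audit flags. The function name `audit_flags`, the flag wording, and the hard thresholds of 60 and 160 characters are illustrative assumptions taken from the approximate limits in the field table; they are not part of the actor.

```python
from typing import Any

TITLE_MAX = 60          # approximate truncation point from the field table
DESCRIPTION_MAX = 160   # approximate truncation point from the field table


def audit_flags(result: dict[str, Any]) -> list[str]:
    """Return human-readable issue labels for one result record."""
    # Failed fetches carry zeroed numeric fields, so report the error and stop.
    if result.get("error"):
        return [f"fetch failed: {result['error']}"]

    flags = []
    if not result.get("title"):
        flags.append("missing title")
    elif result.get("title_length", 0) > TITLE_MAX:
        flags.append("title may be truncated")
    if not result.get("meta_description"):
        flags.append("missing meta description")
    elif result.get("meta_description_length", 0) > DESCRIPTION_MAX:
        flags.append("meta description may be truncated")
    if not result.get("canonical"):
        flags.append("missing canonical")
    h1_count = result.get("h1_count", 0)
    if h1_count == 0:
        flags.append("no H1")
    elif h1_count > 1:
        flags.append("multiple H1s")
    if result.get("images_missing_alt", 0) > 0:
        flags.append("images missing alt text")
    return flags
```

Applied to every item in a run's dataset, this gives a per-URL issue list that can be sorted or grouped in a spreadsheet.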
Input parameters
| Parameter | Required | Description |
|---|---|---|
| `urls` | Yes | List of page URLs to audit. Each URL produces one result. |
| `timeoutSecs` | No (default 20) | Maximum seconds to wait for each page (1–120). Slow pages report a timeout error. |
| `concurrency` | No (default 10) | How many pages to fetch in parallel (1–50). |
| `userAgent` | No | The User-Agent header sent with each request. Override to mimic a specific bot or browser. |
| `proxyConfiguration` | No | Optional Apify proxy. Most public pages don't need one. |
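As a rough illustration of the parameters above, the following sketch builds a run input and checks it against the documented ranges before anything is submitted. The helper `build_run_input` is hypothetical, and it assumes `urls` is a plain list of strings and that an empty list should be rejected; the actor's real input schema may differ. `proxyConfiguration` is left out because its shape is not described here.

```python
from typing import Any, Optional


def build_run_input(
    urls: list[str],
    timeout_secs: int = 20,
    concurrency: int = 10,
    user_agent: Optional[str] = None,
) -> dict[str, Any]:
    """Validate arguments against the documented ranges and build the input."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    if not 1 <= timeout_secs <= 120:
        raise ValueError("timeoutSecs must be between 1 and 120")
    if not 1 <= concurrency <= 50:
        raise ValueError("concurrency must be between 1 and 50")

    run_input: dict[str, Any] = {
        "urls": urls,
        "timeoutSecs": timeout_secs,
        "concurrency": concurrency,
    }
    # Only send userAgent when overriding, so the actor's default stays in effect.
    if user_agent is not None:
        run_input["userAgent"] = user_agent
    return run_input
```

The resulting dict can be pasted into the actor's JSON input editor or passed to whichever Apify client you already use.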
Use cases
- SEO consultants running on-page audits — surface every page missing a title, meta description, or canonical, and rank H1 issues across the whole site.
- Content auditors cleaning up legacy content — find thin pages by word count and text-to-HTML ratio, then prioritize rewrites.
- In-house SEO teams monitoring schema coverage — verify every product page has Product schema, every article has Article schema, and every FAQ page has FAQ schema.
- Accessibility leads finding images missing alt text — get a per-page count and prioritize the worst offenders.
- Competitive researchers comparing social previews — pull Open Graph and Twitter cards from a competitor's blog to see how they package shares.
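The schema-coverage use case can be sketched as a small check over the `json_ld` field. This is a minimal example, assuming `json_ld` is a list of parsed objects as in the example output; the helper names `schema_types` and `missing_schema` are made up for illustration, and blocks that are not JSON objects are simply ignored.

```python
from typing import Any


def schema_types(result: dict[str, Any]) -> set[str]:
    """Collect every @type value found in a result's json_ld blocks."""
    found: set[str] = set()
    for block in result.get("json_ld") or []:
        if not isinstance(block, dict):
            continue
        # Some pages nest their entities under @graph; include those too.
        graph = block.get("@graph") or []
        nodes = [block] + [n for n in graph if isinstance(n, dict)]
        for node in nodes:
            declared = node.get("@type")
            # @type may be a single string or a list of strings.
            if isinstance(declared, str):
                found.add(declared)
            elif isinstance(declared, list):
                found.update(t for t in declared if isinstance(t, str))
    return found


def missing_schema(results: list[dict[str, Any]], required: str) -> list[str]:
    """Return the URLs whose json_ld does not declare the required @type."""
    return [r["url"] for r in results if required not in schema_types(r)]
```

Calling `missing_schema(product_page_results, "Product")` would then list the product URLs that still need markup.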
Limitations
- Captures the HTML the server returns. Pages that render their content entirely in the browser (heavy single-page apps) may show empty title, missing headings, or zero links — for those sites, use a browser-based scraper instead.
- Some sites rate-limit aggressive scraping. Use the proxy option if you start hitting 403s.
- Word count is computed from visible text after removing script and style content. Inline SVG and obscure HTML may slightly inflate or reduce the count.
- Schema markup blocks that are syntactically broken are skipped rather than failing the whole audit.
- Authenticated pages are not supported — this is for public URLs only.
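To make the word-count limitation concrete, here is a rough sketch of counting visible words after dropping script and style content, then dividing by the HTML byte size. It uses only Python's standard `html.parser` and is not the actor's implementation; the class and function names are hypothetical, rounding to four decimals is an assumption based on the example output, and this simple version also counts text inside `<title>`.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def word_count_and_ratio(html: str) -> tuple[int, float]:
    """Return (visible word count, words divided by HTML byte size)."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    words = len(" ".join(parser.chunks).split())
    size = len(html.encode("utf-8"))  # HTML byte size
    return words, (round(words / size, 4) if size else 0.0)
```

Anything the parser treats as ordinary text, such as labels inside inline SVG, still lands in the count, which is the kind of drift the limitation describes.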
Compared to alternatives
| Feature | Thirdwatch Bulk SEO Audit | Typical "SEO On-Page Analyzer" actors on Apify Store |
|---|---|---|
| Open Graph + Twitter cards as structured dicts | Yes | Often title/description only |
| Parsed schema markup blocks | Yes (full JSON) | Often missing |
| Heading lists (not just counts) | Yes | Often only counts |
| Internal vs external vs nofollow link counts | Yes | Often "total links" only |
| Image alt-text gap count | Yes | Sometimes |
| Text-to-HTML ratio (thin-content signal) | Yes | Rarely included |
| Concurrency control | Up to 50 parallel | Often 5–10 |
FAQ
Does this respect robots.txt?
This actor sends one request per URL and does not crawl. By default it identifies itself with the User-Agent ThirdwatchSEO/1.0 so site owners can recognize and rate-limit it if they want.
What's the rate limit?
You control it via concurrency (max 50). For sites you don't own, keep it low and enable the proxy option if you hit 429 or 403.
Can I audit pages behind a login?
No. This actor does not handle authentication, cookies, or session headers. It's designed for public URLs.
Why are some fields empty on JavaScript-heavy sites?
Single-page apps render content in the browser after the initial HTML loads. This actor reads the server-rendered HTML only. For React/Vue/Angular sites with no server-side rendering, use a browser-based audit tool instead.
Does it parse every kind of schema markup?
No, only JSON-LD. Every syntactically valid `<script type="application/ld+json">` block is parsed and returned as structured data; broken blocks are skipped. Microdata and RDFa are not parsed.
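For readers curious what JSON-LD extraction of this kind looks like, the sketch below pulls every `application/ld+json` script out of an HTML string and skips blocks that fail to parse. It is a minimal standard-library illustration under the behavior described in this FAQ, not the actor's actual code; `JsonLdParser` and `extract_json_ld` are hypothetical names.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Buffer the text of each <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.raw_blocks: list[str] = []
        self._buffer = None  # list of text chunks while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.raw_blocks.append("".join(self._buffer))
            self._buffer = None


def extract_json_ld(html: str) -> list:
    """Return the parsed JSON-LD blocks found in the HTML, in document order."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    blocks = []
    for raw in parser.raw_blocks:
        try:
            blocks.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # skip syntactically broken blocks instead of failing
    return blocks
```

Microdata and RDFa live in ordinary element attributes rather than in script blocks, so an approach like this never sees them.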
SEO meta tags and headings extractor for site-wide audits
Run a full on-page SEO audit across an entire site in minutes. Pair this with the Sitemap URL Extractor to pull every URL on the site, then pipe the list into this actor for a complete title/meta/heading/schema audit.
For uptime, redirect, and SSL checks across the same URL list, combine with the Bulk URL Status Checker.
Built and maintained by Thirdwatch.
Last verified: 2026-05