SEO Fields Scraper
Pricing
from $0.85 / 1,000 URLs
Extracts website SEO metadata with titles, descriptions, canonicals, robots tags, headings, Open Graph fields, and audit issues. Export data, run via API, schedule and monitor runs, or integrate with other tools.
Developer: Trove Vault
SEO Fields Scraper extracts page-level SEO metadata from public websites and turns it into an audit-ready dataset. Give it a homepage, a list of URLs, or a sitemap URL and it returns titles, meta descriptions, canonicals, robots directives, headings, Open Graph fields, Twitter Card fields, and practical issue flags.
Use it when you need a fast metadata inventory for site QA, content migrations, competitor checks, agency reporting, or scheduled monitoring. The actor is HTTP-first, so it is cheap to run and suitable for repeated audits where you want clear fields instead of screenshots or bulky crawl exports.
Why Use This Actor
- Audit title tags and meta descriptions across important pages.
- Catch missing or weak canonicals, noindex directives, and H1 problems.
- Review Open Graph and Twitter Card metadata for social sharing previews.
- Sample pages from a site without running a heavy browser crawler.
- Export structured metadata to Apify datasets, API clients, spreadsheets, or downstream workflows.
What It Extracts
For each processed page, the actor can return:
- `title`, `titleLength`, and `titleStatus`
- `metaDescription`, `metaDescriptionLength`, and `metaDescriptionStatus`
- `canonicalUrl` and `canonicalStatus`
- `robotsMeta`, `isNoindex`, and `isNofollow`
- `h1`, `h1Count`, and `h2Sample`
- Open Graph fields such as `openGraphTitle`, `openGraphDescription`, and `openGraphImage`
- Twitter Card fields such as `twitterTitle`, `twitterDescription`, and `twitterImage`
- `renderingUsed`, showing whether the row came from `http` or `browser`
- `seoScore`, `issues`, and `warnings`
- structured error fields when a URL cannot be fetched
The seoScore is a simple completeness score from 0 to 100. It is meant for triage and prioritization, not as a replacement for a full SEO strategy. Use renderingUsed to understand whether a row came from the fast HTTP path or from Playwright browser rendering.
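The actor's exact scoring formula is not documented here, but a completeness score of this kind can be sketched as follows. The field weights below are assumptions for illustration, not the actor's actual formula:

```python
# Illustrative completeness score. The weights are assumptions chosen
# for this sketch, not the actor's documented scoring rules.
FIELD_WEIGHTS = {
    "title": 25,
    "metaDescription": 25,
    "canonicalUrl": 20,
    "h1": 20,
    "openGraphTitle": 5,
    "twitterTitle": 5,
}

def completeness_score(row: dict) -> int:
    """Return 0-100 based on which metadata fields are present and non-empty."""
    score = sum(weight for field, weight in FIELD_WEIGHTS.items() if row.get(field))
    return min(score, 100)
```

A row with every watched field present would score 100 under these weights, while a page with only a title would score 25, which is enough to sort a dataset for triage.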
Use Cases
Content teams can check whether newly published pages have complete search and social metadata.
SEO consultants can produce a quick metadata export for audits, migration QA, or retainer reporting.
Growth teams can compare competitor landing pages and identify common metadata patterns.
Developers can run the actor after releases to catch missing titles, incorrect canonicals, or accidental noindex tags.
Automation teams can schedule runs and append results to an existing dataset for monitoring.
Input Example
```json
{
  "startUrls": [{ "url": "https://apify.com/" }],
  "maxPages": 10,
  "crawlDepth": 1,
  "requestTimeoutSecs": 30,
  "renderingMode": "BROWSER_FALLBACK",
  "browserWaitSecs": 5,
  "sameDomainOnly": true,
  "includeOpenGraph": true,
  "includeTwitterCards": true,
  "includeHeadings": true
}
```
Use crawlDepth: 0 when you already have the exact URLs you want to audit. Use crawlDepth: 1 for a fast same-domain sample from a homepage. Use a sitemap URL when the site exposes one and you want broader coverage.
Input Reference
| Field | Type | Description |
|---|---|---|
| startUrls | array | Website URLs or sitemap URLs to audit. |
| maxPages | integer | Maximum number of successful page rows to create. |
| crawlDepth | integer | Number of HTML link levels to follow from each start URL. |
| requestTimeoutSecs | integer | HTTP timeout for each page or sitemap request. |
| renderingMode | string | HTTP_ONLY, BROWSER_FALLBACK, or BROWSER_ONLY. |
| browserWaitSecs | integer | Extra wait time after browser page load when Playwright is used. |
| sameDomainOnly | boolean | Keeps discovered links on the same hostname as the start URL. |
| includeOpenGraph | boolean | Extracts Open Graph social preview fields. |
| includeTwitterCards | boolean | Extracts Twitter Card social preview fields. |
| includeHeadings | boolean | Extracts H1 and H2 heading signals. |
| proxyConfiguration | object | Optional Apify Proxy settings for blocked sites. |
| datasetId | string | Optional existing dataset to append results to. |
| runId | string | Optional upstream run ID copied into output rows. |
Output Example
```json
{
  "url": "https://apify.com/",
  "finalUrl": "https://apify.com/",
  "statusCode": 200,
  "title": "Apify: Full-stack web scraping and data extraction platform",
  "titleLength": 61,
  "titleStatus": "ok",
  "metaDescription": "Apify is a full-stack web scraping and browser automation platform.",
  "metaDescriptionLength": 70,
  "metaDescriptionStatus": "ok",
  "canonicalUrl": "https://apify.com/",
  "canonicalStatus": "self",
  "renderingUsed": "http",
  "robotsMeta": null,
  "isNoindex": false,
  "h1": "Web scraping, automation, and AI agents",
  "h1Count": 1,
  "seoScore": 88,
  "issues": [],
  "warningCount": 2,
  "warnings": ["Missing Twitter Card image", "Missing Open Graph image"],
  "discoveredVia": "input",
  "scrapedAt": "2026-04-27T12:00:00.000Z",
  "error": false
}
```
API Usage
```bash
curl "https://api.apify.com/v2/acts/TroveVault~seo-fields-scraper/runs" \
  -X POST \
  -H "Authorization: Bearer $APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "startUrls": [{ "url": "https://apify.com/" }],
    "maxPages": 10,
    "crawlDepth": 1,
    "renderingMode": "BROWSER_FALLBACK",
    "sameDomainOnly": true
  }'
```
After the run finishes, download results from the default dataset URL in the run output or from the Apify Console.
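As a sketch using only the Python standard library, the finished run's rows can be fetched from the dataset items endpoint. The dataset ID placeholder below is something you would copy from the run's `defaultDatasetId`:

```python
import json
import os
import urllib.request

def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the Apify dataset items URL for a finished run."""
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?format={fmt}"

def fetch_items(dataset_id: str, token: str) -> list:
    """Download all dataset rows as a list of dicts."""
    req = urllib.request.Request(
        dataset_items_url(dataset_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__" and os.environ.get("APIFY_TOKEN"):
    # Replace YOUR_DATASET_ID with the run's defaultDatasetId.
    rows = fetch_items("YOUR_DATASET_ID", os.environ["APIFY_TOKEN"])
    print(len(rows), "rows")
```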
How to Use Browser Rendering
Use HTTP_ONLY for most websites. It is the default, fastest, and cheapest mode. It works when SEO tags are present in the raw HTML returned by the server, which is common for well-built marketing sites, content sites, and many server-rendered apps.
Use BROWSER_FALLBACK when you are not sure. The actor first tries HTTP, checks whether core metadata is present, and only opens Playwright when the raw HTML looks sparse. This is the best setting for mixed crawls because normal pages stay cheap while JavaScript-rendered pages get a second pass.
Use BROWSER_ONLY when you know the site renders metadata or headings in JavaScript. This opens a browser for every page, so it is slower and more expensive. Start with maxPages: 1 to 5, keep crawlDepth: 0 for tests, and increase browserWaitSecs only if the site is slow to hydrate.
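Following that advice, a small BROWSER_ONLY test input might look like this (the URL and values are illustrative):

```json
{
  "startUrls": [{ "url": "https://example.com/" }],
  "maxPages": 3,
  "crawlDepth": 0,
  "renderingMode": "BROWSER_ONLY",
  "browserWaitSecs": 5
}
```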
For protected websites such as Amazon, browser mode may still return empty or blocked data. Amazon often serves bot checks, alternate HTML, regional pages, or sparse responses to automation. Try BROWSER_FALLBACK or BROWSER_ONLY with Apify Proxy enabled and a very small page limit, but do not expect guaranteed extraction from strongly protected domains.
Limitations
Browser rendering is available through Playwright, but it should be used deliberately because it costs more than HTTP extraction.
It does not perform keyword research, backlink analysis, Core Web Vitals testing, search ranking checks, or screenshot analysis. It focuses on metadata extraction and lightweight page-level QA.
Some websites block automated HTTP clients. If you see BLOCKED, retry with Apify Proxy enabled and keep maxPages small.
Troubleshooting
If results are empty, check that the URL returns public HTML or XML and is not a PDF, image, or login page.
If many pages are blocked, enable Apify Proxy and reduce concurrency by using a smaller maxPages and crawlDepth.
If a canonical appears different from the final URL, inspect redirects and trailing slash behavior before treating it as an error.
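A simple normalization pass before flagging mismatches might look like this. It is a sketch, not the actor's internal logic; adjust the rules to your own canonicalization policy:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url: str) -> str:
    """Lowercase scheme and host and drop a trailing slash on the path."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))

def canonical_mismatch(row: dict) -> bool:
    """True only when canonicalUrl and finalUrl still differ after normalization."""
    canonical, final = row.get("canonicalUrl"), row.get("finalUrl")
    if not canonical or not final:
        return False
    return normalize(canonical) != normalize(final)
```

With this check, `https://apify.com/` and `https://apify.com` compare as equal, so only substantive differences survive for manual review.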
If social metadata is missing, verify whether the site uses Open Graph, Twitter Card tags, or JavaScript-rendered metadata. Retry with renderingMode: "BROWSER_FALLBACK" before assuming the tags do not exist.
If the actor finds too many irrelevant URLs, keep sameDomainOnly enabled and start from a narrower section URL or sitemap.
FAQ
Can it crawl a whole website?
It can crawl same-domain links up to the maxPages and crawlDepth limits. For very large websites, use a sitemap and a deliberate page cap.
Does it support sitemap XML?
Yes. Add a sitemap URL to startUrls and the actor will enqueue URLs from <loc> entries when they match the domain rules.
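The <loc> extraction can be approximated like this; it is a sketch of the general technique, not the actor's internal code:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Standard sitemap XML namespace from sitemaps.org.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def locs_from_sitemap(xml_text, allowed_host=None):
    """Return <loc> URLs, optionally filtered to one hostname (sameDomainOnly)."""
    root = ET.fromstring(xml_text)
    urls = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    if allowed_host:
        urls = [u for u in urls if urlsplit(u).hostname == allowed_host]
    return urls
```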
Does it use a browser?
Only when you ask it to. HTTP_ONLY never opens a browser. BROWSER_FALLBACK opens a browser only when the raw HTML is missing useful metadata. BROWSER_ONLY opens a browser for every page.
Can I monitor metadata changes?
Yes. Schedule the actor and compare datasets over time in your own workflow or append runs to a shared datasetId.
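A minimal diff between two exported runs might look like this sketch, which keys rows by `url` and compares a few watched fields (the field list is a reasonable default, not a requirement):

```python
# Fields worth watching between scheduled runs; adjust to taste.
WATCHED_FIELDS = ("title", "metaDescription", "canonicalUrl", "isNoindex")

def metadata_changes(old_rows, new_rows, fields=WATCHED_FIELDS):
    """Return {url: {field: (old, new)}} for rows whose watched fields changed."""
    old_by_url = {row["url"]: row for row in old_rows}
    changes = {}
    for row in new_rows:
        before = old_by_url.get(row["url"])
        if before is None:
            continue  # new URL, nothing to compare against
        diff = {f: (before.get(f), row.get(f))
                for f in fields if before.get(f) != row.get(f)}
        if diff:
            changes[row["url"]] = diff
    return changes
```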
What does seoScore mean?
It is a lightweight completeness score based on missing or weak metadata fields. Use it to prioritize QA, not as a universal ranking metric.
Will it follow external links?
By default, no. sameDomainOnly keeps the crawl focused on the start URL hostname.
Can it scrape blocked websites?
Sometimes. Enable Apify Proxy when a site blocks datacenter traffic, but always follow the target website's terms and applicable laws.
Should I use browser mode for Amazon?
You can try it, but Amazon is heavily protected and may still return sparse or blocked pages. Use Apify Proxy, maxPages: 1, crawlDepth: 0, and BROWSER_ONLY for a small test before running a larger job.
Related Actors
Use this actor with product, catalog, review, or competitor intelligence actors when you need both page metadata and business data in the same workflow.
Changelog
0.1: Initial release with HTTP-first metadata extraction, social fields, headings, scoring, structured errors, and dataset append support.
Support
Open an issue from the Apify actor page or contact TroveVault with the run ID, input, and a short description of the page behavior you expected.