Pricing

$20.00/month + usage

Ai SEO Content Curator

Developed by

AI_Builder

The SEO Actor performs a full SEO audit of each URL, extracting key SEO metrics such as titles, meta descriptions, and keywords. It also retrieves network information and combines it with the audit data into one structured record per URL, stored in a dataset for further use.

5.0 (1)

Runs succeeded

>99%

Last modified

a month ago

SEO tools

The Selenium SEO Scraper is an Apify actor that uses Selenium and a headless Chrome browser to scrape websites, extract SEO-related data, and store it in a structured format. Users provide starting URLs and optional parameters via an input schema, and the actor outputs detailed metadata, network information, SEO audits, and page content to the default Apify dataset.

This documentation explains the input you need to provide and the output you’ll receive.

Input

To run the actor, provide input in JSON format through the Apify console’s “Input” tab or via the API. The input defines the URLs to scrape and controls the scraping scope.

Input Schema

{
    "title": "Selenium SEO Scraper",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "start_urls": {
            "title": "Start URLs",
            "type": "array",
            "description": "The URLs where scraping begins. Can be a list of strings or objects with a 'url' field.",
            "prefill": [{"url": "https://example.com"}],
            "editor": "requestListSources"
        },
        "max_depth": {
            "title": "Maximum Depth",
            "type": "integer",
            "description": "How deep to follow links (0 = only start URLs, 1 = one level of links, etc.).",
            "default": 1,
            "minimum": 0
        },
        "max_urls": {
            "title": "Max URLs",
            "type": "integer",
            "description": "The maximum number of URLs to scrape.",
            "default": 10,
            "minimum": 1
        },
        "search_engine": {
            "title": "Search Engine",
            "type": "string",
            "description": "Optional identifier for future features (e.g., search engine-specific scraping).",
            "enum": ["Google", "Bing", "DuckDuckGo"],
            "default": "Google"
        }
    },
    "required": ["start_urls"]
}

Input Fields Explained
start_urls (required):
A list of URLs to start scraping from.

Format: Either ["https://example.com"] or [{"url": "https://example.com"}].

Example: [{"url": "https://www.girlsinparis.com/fr/"}].

max_depth (optional, default: 1):
Controls how many levels of links to follow.

0: Scrape only the start URLs.

1: Scrape start URLs and their direct links.

2: Include links from those links, and so on.

Example: 2.

max_urls (optional, default: 10):
Limits the total number of URLs scraped.

Example: 100.

search_engine (optional, default: "Google"):
Currently informational; reserved for future enhancements (e.g., search engine-specific behavior).

Options: "Google", "Bing", "DuckDuckGo".
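As a rough illustration of the field rules above, the following sketch applies the documented defaults and accepts both documented shapes of start_urls. The function name normalize_input and the exact error messages are hypothetical; the actor's own validation may differ.

```python
from typing import Any

# Values listed under "Options" for search_engine.
ALLOWED_ENGINES = {"Google", "Bing", "DuckDuckGo"}


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Apply the documented defaults and flatten start_urls to plain strings.

    Hypothetical helper for illustration; not part of the actor.
    """
    if not raw.get("start_urls"):
        raise ValueError("start_urls is required")

    urls = []
    for entry in raw["start_urls"]:
        # Both documented shapes are accepted:
        # "https://example.com" or {"url": "https://example.com"}.
        urls.append(entry["url"] if isinstance(entry, dict) else entry)

    max_depth = int(raw.get("max_depth", 1))      # documented default: 1
    max_urls = int(raw.get("max_urls", 10))       # documented default: 10
    engine = raw.get("search_engine", "Google")   # documented default: "Google"

    if max_depth < 0:
        raise ValueError("max_depth must be >= 0")
    if max_urls < 1:
        raise ValueError("max_urls must be >= 1")
    if engine not in ALLOWED_ENGINES:
        raise ValueError(f"search_engine must be one of {sorted(ALLOWED_ENGINES)}")

    return {
        "start_urls": urls,
        "max_depth": max_depth,
        "max_urls": max_urls,
        "search_engine": engine,
    }
```

Running a check like this locally before starting a run can catch a malformed input without spending a run on it.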

Example Inputs
Basic Example
Scrape one URL and its direct links:
json

{
    "start_urls": ["https://www.girlsinparis.com/fr/"],
    "max_depth": 1,
    "max_urls": 10
}

Advanced Example
Deeper crawl with multiple URLs:
json

{
    "start_urls": [
        {"url": "https://www.girlsinparis.com/fr/"},
        {"url": "https://example.com"}
    ],
    "max_depth": 2,
    "max_urls": 100,
    "search_engine": "Google"
}
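
To show how max_depth and max_urls interact in the inputs above, here is a minimal sketch of a depth-limited crawl plan. It assumes a breadth-first order, which this documentation does not confirm, and it takes a hypothetical get_links callable in place of the actor's real link extraction.

```python
from collections import deque
from typing import Callable, Iterable


def plan_crawl(
    start_urls: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 1,
    max_urls: int = 10,
) -> list[str]:
    """Return the URLs a breadth-first crawl would visit, in order.

    get_links is a stand-in (assumption) for whatever extracts links from a page.
    """
    unique_starts = list(dict.fromkeys(start_urls))  # drop duplicates, keep order
    seen = set(unique_starts)
    queue = deque((url, 0) for url in unique_starts)  # start URLs are depth 0
    visited: list[str] = []

    while queue and len(visited) < max_urls:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: do not follow this page's links
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))

    return visited
```

With max_depth set to 0 only the start URLs are returned; max_urls caps the total regardless of depth.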

How to Provide Input
Apify Console:
Go to your actor in the Apify console.

Open the “Input” tab.

Paste your JSON input or use the form (it matches the schema).

Save and run the actor.

API:
Use the Apify API with a POST request to /v2/acts/<actor-id>/runs, including your JSON input in the body.

Refer to the Apify API Docs for details.
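
The API call described above could look roughly like this in Python, using only the standard library. The helper names build_run_request and start_run are hypothetical, and passing the token as a Bearer header is one of the authentication options the Apify API supports; check the API docs for the form you prefer.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a run."""
    return urllib.request.Request(
        url=f"{API_BASE}/v2/acts/{actor_id}/runs",
        data=json.dumps(actor_input).encode("utf-8"),  # JSON input goes in the body
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def start_run(actor_id: str, token: str, actor_input: dict) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_run_request(actor_id, token, actor_input)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the first part easy to inspect or log before any network call is made.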

Output
The actor stores results in the default Apify dataset, which you can access via the console’s “Dataset” tab or API. Each scraped URL generates a JSON object containing metadata, network stats, SEO audit data, and page content.

Output Structure
json

{
    "url": "https://www.girlsinparis.com/fr/",
    "info": {
        "status": "complete",
        "title": "Girls in Paris - Lingerie & Swimwear",
        "description": "Explore our collection of lingerie and swimwear designed for comfort and style.",
        "firstH1": "Welcome to Girls in Paris",
        "pageSize": 12345,
        "metaCanonical": "https://www.girlsinparis.com/fr/",
        "metaLang": "",
        "metaLanguage": "",
        "htmlLang": "fr",
        "wordCount": 150,
        "linksCount": 20,
        "linksExternalCount": 5,
        "linksInternalCount": 15
    },
    "network": {
        "Ip": "unavailable",
        "IpReverse": "unavailable",
        "pageSizeCompressed": 12345,
        "fileSize": 12345,
        "connectTime": 0.5,
        "loadTime": 1.2,
        "HttpResponseCode": 200,
        "HttpContentType": "text/html; charset=UTF-8",
        "HttpResponse": "Content-Type: text/html; charset=UTF-8, ...",
        "HttpRequest": "User-Agent: Mozilla/5.0, ..."
    },
    "seoAudit": {
        "structuredDataPresent": "ok",
        "titleLength": 30,
        "titlePresent": "ok",
        "descriptionLength": 50,
        "descriptionPresent": "ok",
        "keywordsPresent": "absent",
        "h1Count": 1,
        "h2Count": 3,
        "headingStructureOk": "ok",
        "inlineCssCount": 2,
        "jsFilesCount": 5,
        "styleFilesCount": 3,
        "iframeCount": 0,
        "canonicalPresent": "ok",
        "htmlLangPresent": "ok",
        "metaViewportPresent": "ok",
        "robotsMetaPresent": "ok",
        "ogTagsPresent": "ok",
        "twitterTagsPresent": "absent"
    },
    "content": "# Welcome to Girls in Paris\nExplore our collection...",
    "timestamp": "2025-03-19T06:04:49Z",
    "search_engine": "Google"
}

Output Fields Explained
url (string):
The URL that was scraped.

info (object):
Metadata and statistics about the page:
status: Page load status (e.g., "complete").

title: The page’s title.

description: Meta description, if present.

firstH1: Text of the first <h1> tag.

pageSize: Size of the HTML source in bytes.

metaCanonical: Canonical URL from <link rel="canonical">.

metaLang, metaLanguage, htmlLang: Language attributes from meta tags or <html>.

wordCount: Total words in the page text.

linksCount: Total number of <a> tags.

linksExternalCount: Number of external links.

linksInternalCount: Number of internal links.

network (object):
HTTP request and response details:
Ip, IpReverse: IP address and reverse DNS (currently "unavailable" due to Apify environment limitations).

pageSizeCompressed, fileSize: Size of the response content in bytes.

connectTime: Time to first byte in seconds.

loadTime: Total request time in seconds.

HttpResponseCode: HTTP status code (e.g., 200 for success).

HttpContentType: MIME type (e.g., "text/html; charset=UTF-8").

HttpResponse: Full response headers as a string.

HttpRequest: Full request headers as a string.

seoAudit (object):
SEO analysis metrics:
structuredDataPresent: "ok" if structured data (e.g., schema.org) is found, else "missing".

titleLength: Character length of the title.

titlePresent: "ok" if a title exists, else "absent".

descriptionLength: Character length of the meta description.

descriptionPresent: "ok" if a description exists, else "absent".

keywordsPresent: "ok" if meta keywords exist, else "absent".

h1Count, h2Count: Number of <h1> and <h2> tags.

headingStructureOk: "ok" if exactly one <h1> is present, else "problematic".

inlineCssCount: Number of elements with inline CSS.

jsFilesCount: Number of external <script> tags.

styleFilesCount: Number of external <link rel="stylesheet"> tags.

iframeCount: Number of <iframe> tags.

canonicalPresent, htmlLangPresent, metaViewportPresent, robotsMetaPresent, ogTagsPresent, twitterTagsPresent: "ok" if present, else "absent".

content (string):
The main page content converted to Markdown, with scripts and unwanted elements removed.

timestamp (string):
UTC timestamp of when the data was scraped (e.g., "2025-03-19T06:04:49Z").

search_engine (string):
The value provided in the input (e.g., "Google"), currently for informational purposes.
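
One way to use the seoAudit flags described above is to list every check that did not come back as "ok". This is a small sketch over the documented output shape; audit_issues is a hypothetical helper name, and it treats any string value other than "ok" (for example "absent", "missing", or "problematic") as an issue.

```python
def audit_issues(item: dict) -> list[str]:
    """Return the names of seoAudit checks whose value is not "ok".

    Numeric fields such as h1Count or titleLength are skipped, since they
    are counts rather than pass/fail flags.
    """
    audit = item.get("seoAudit", {})
    return sorted(
        name
        for name, value in audit.items()
        if isinstance(value, str) and value != "ok"
    )
```

For the sample output shown earlier, this would report keywordsPresent and twitterTagsPresent.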

Accessing the Output
Apify Console:
After the actor runs, go to the “Dataset” tab in the Apify console.

Preview the data online or download it as JSON or CSV.

API:
Use the Apify API to fetch the dataset with a GET request to /v2/datasets/<dataset-id>/items.

Example:
bash

curl "https://api.apify.com/v2/datasets/<dataset-id>/items?token=<your-api-token>"

Replace <dataset-id> with the ID from the run and <your-api-token> with your Apify API token.
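
The same request can be made from Python. The sketch below mirrors the curl command, passing the token as a query parameter; build_items_url and fetch_items are hypothetical helper names, and the code assumes the endpoint returns JSON.

```python
import json
import urllib.parse
import urllib.request


def build_items_url(dataset_id: str, token: str) -> str:
    """Build the dataset items URL used in the curl example."""
    query = urllib.parse.urlencode({"token": token})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"


def fetch_items(dataset_id: str, token: str):
    """Download the dataset items and return the parsed JSON."""
    with urllib.request.urlopen(build_items_url(dataset_id, token)) as response:
        return json.load(response)
```

urlencode escapes the token, so special characters in it do not break the URL.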

Notes
IP Information: The Ip and IpReverse fields are marked "unavailable" because direct DNS lookups are restricted in the Apify environment. Other network data (e.g., HttpResponseCode, loadTime) is still provided.

Dynamic Pages: Because pages are loaded in a headless Chrome browser through Selenium, the actor can capture JavaScript-rendered content, not just the initial HTML response.

Error Handling: If a URL fails to load or data extraction encounters issues, check the “Log” tab for details.
