Ai SEO Content Curator

Under maintenance

Pricing

$20.00/month + usage

Developed by

AI_Builder

Maintained by Community

The SEO Actor performs a full SEO audit of each URL, extracting key SEO metrics such as titles, meta descriptions, and keywords. It also retrieves network information and combines it with the SEO audit data, storing the result in a structured dataset for further use.

5.0 (1)

6

Total users

79

Monthly users

2

Runs succeeded

>99%

Last modified

2 months ago

AI SEO Content Scraper

The Selenium SEO Scraper is an Apify actor that uses Selenium and a headless Chrome browser to scrape websites, extract SEO-related data, and store it in a structured format. Users provide starting URLs and optional parameters via an input schema, and the actor outputs detailed metadata, network information, SEO audits, and page content to the default Apify dataset.

This documentation explains the input you need to provide and the output you’ll receive.

Input

To run the actor, provide input in JSON format through the Apify console’s “Input” tab or via the API. The input defines the URLs to scrape and controls the scraping scope.

Input Schema

{
  "title": "Selenium SEO Scraper",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "start_urls": {
      "title": "Start URLs",
      "type": "array",
      "description": "The URLs where scraping begins. Can be a list of strings or objects with a 'url' field.",
      "prefill": [{"url": "https://example.com"}],
      "editor": "requestListSources"
    },
    "max_depth": {
      "title": "Maximum Depth",
      "type": "integer",
      "description": "How deep to follow links (0 = only start URLs, 1 = one level of links, etc.).",
      "default": 1,
      "minimum": 0
    },
    "max_urls": {
      "title": "Max URLs",
      "type": "integer",
      "description": "The maximum number of URLs to scrape.",
      "default": 10,
      "minimum": 1
    },
    "search_engine": {
      "title": "Search Engine",
      "type": "string",
      "description": "Optional identifier for future features (e.g., search engine-specific scraping).",
      "enum": ["Google", "Bing", "DuckDuckGo"],
      "default": "Google"
    }
  },
  "required": ["start_urls"]
}
Input Fields Explained
start_urls (required):
  A list of URLs to start scraping from.
  Format: Either ["https://example.com"] or [{"url": "https://example.com"}].
  Example: [{"url": "https://www.girlsinparis.com/fr/"}].
max_depth (optional, default: 1):
  Controls how many levels of links to follow.
    0: Scrape only the start URLs.
    1: Scrape start URLs and their direct links.
    2: Include links from those links, and so on.
  Example: 2.
max_urls (optional, default: 10):
  Limits the total number of URLs scraped.
  Example: 100.
search_engine (optional, default: "Google"):
  Currently informational; reserved for future enhancements (e.g., search engine-specific behavior).
  Options: "Google", "Bing", "DuckDuckGo".
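The field rules above can be checked locally before a run is started. The following is a minimal sketch, not the actor's own validation code: `normalize_input`, `DEFAULTS`, and `SEARCH_ENGINES` are hypothetical names, and the defaults and limits are copied from the schema above.

```python
from typing import Any

# Defaults and allowed values copied from the input schema above.
DEFAULTS = {"max_depth": 1, "max_urls": 10, "search_engine": "Google"}
SEARCH_ENGINES = {"Google", "Bing", "DuckDuckGo"}


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: validate actor input and fill in schema defaults."""
    if not raw.get("start_urls"):
        raise ValueError("start_urls is required and must not be empty")

    urls = []
    for entry in raw["start_urls"]:
        # Accept both "https://..." and {"url": "https://..."} entries.
        urls.append(entry["url"] if isinstance(entry, dict) else entry)

    result = {**DEFAULTS, **raw, "start_urls": urls}
    if result["max_depth"] < 0:
        raise ValueError("max_depth must be >= 0")
    if result["max_urls"] < 1:
        raise ValueError("max_urls must be >= 1")
    if result["search_engine"] not in SEARCH_ENGINES:
        raise ValueError("search_engine must be one of: Google, Bing, DuckDuckGo")
    return result
```

Normalizing `start_urls` to plain strings up front means later code does not need to handle both accepted formats.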
Example Inputs
Basic Example
Scrape one URL and its direct links:
json
{
  "start_urls": ["https://www.girlsinparis.com/fr/"],
  "max_depth": 1,
  "max_urls": 10
}
Advanced Example
Deeper crawl with multiple URLs:
json
{
  "start_urls": [
    {"url": "https://www.girlsinparis.com/fr/"},
    {"url": "https://example.com"}
  ],
  "max_depth": 2,
  "max_urls": 100,
  "search_engine": "Google"
}
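To illustrate how max_depth and max_urls together bound a crawl, here is a small sketch that plans the visit order over an in-memory link map. It assumes breadth-first traversal, which this documentation does not confirm; `plan_crawl` and the `links` map are hypothetical and stand in for real page fetches.

```python
from collections import deque


def plan_crawl(start_urls, links, max_depth=1, max_urls=10):
    """Return (url, depth) pairs in the order a breadth-first crawl would visit them.

    `links` is a hypothetical map of page URL -> list of outgoing link URLs.
    """
    seen = set()
    queue = deque()
    for url in start_urls:
        if url not in seen:  # skip duplicate start URLs
            seen.add(url)
            queue.append((url, 0))

    visited = []
    while queue and len(visited) < max_urls:  # max_urls caps the total
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth >= max_depth:  # do not follow links past max_depth
            continue
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Toy link graph: a -> b, c; b -> d; c -> a; d -> e
links = {"a": ["b", "c"], "b": ["d"], "c": ["a"], "d": ["e"]}
print(plan_crawl(["a"], links, max_depth=2, max_urls=100))
```

With max_depth set to 0 only the start URLs are returned, matching the behavior described for that field.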
How to Provide Input
Apify Console:
Go to your actor in the Apify console.
Open the “Input” tab.
Paste your JSON input or use the form (it matches the schema).
Save and run the actor.
API:
Use the Apify API with a POST request to /v2/acts/<actor-id>/runs, including your JSON input in the body.
Refer to the Apify API Docs for details.
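As a rough sketch of the API route, the request can be built with Python's standard library. `build_run_request` is a hypothetical helper, the actor ID and token are placeholders, and sending the token in an `Authorization: Bearer` header is an assumption here; check the Apify API docs for the authentication options and the response format.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Hypothetical helper: build the POST request that starts an actor run."""
    return urllib.request.Request(
        url=f"{API_BASE}/v2/acts/{actor_id}/runs",
        data=json.dumps(actor_input).encode("utf-8"),  # JSON input goes in the body
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumption: header-based auth
        },
        method="POST",
    )


request = build_run_request(
    "<actor-id>",
    "<your-api-token>",
    {"start_urls": ["https://example.com"], "max_depth": 1, "max_urls": 10},
)
# To actually start the run (requires a real actor ID and token):
# with urllib.request.urlopen(request) as response:
#     run_info = json.load(response)
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything is billed.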
Output
The actor stores results in the default Apify dataset, which you can access via the console’s “Dataset” tab or API. Each scraped URL generates a JSON object containing metadata, network stats, SEO audit data, and page content.
Output Structure
json
{
  "url": "https://www.girlsinparis.com/fr/",
  "info": {
    "status": "complete",
    "title": "Girls in Paris - Lingerie & Swimwear",
    "description": "Explore our collection of lingerie and swimwear designed for comfort and style.",
    "firstH1": "Welcome to Girls in Paris",
    "pageSize": 12345,
    "metaCanonical": "https://www.girlsinparis.com/fr/",
    "metaLang": "",
    "metaLanguage": "",
    "htmlLang": "fr",
    "wordCount": 150,
    "linksCount": 20,
    "linksExternalCount": 5,
    "linksInternalCount": 15
  },
  "network": {
    "Ip": "unavailable",
    "IpReverse": "unavailable",
    "pageSizeCompressed": 12345,
    "fileSize": 12345,
    "connectTime": 0.5,
    "loadTime": 1.2,
    "HttpResponseCode": 200,
    "HttpContentType": "text/html; charset=UTF-8",
    "HttpResponse": "Content-Type: text/html; charset=UTF-8, ...",
    "HttpRequest": "User-Agent: Mozilla/5.0, ..."
  },
  "seoAudit": {
    "structuredDataPresent": "ok",
    "titleLength": 30,
    "titlePresent": "ok",
    "descriptionLength": 50,
    "descriptionPresent": "ok",
    "keywordsPresent": "absent",
    "h1Count": 1,
    "h2Count": 3,
    "headingStructureOk": "ok",
    "inlineCssCount": 2,
    "jsFilesCount": 5,
    "styleFilesCount": 3,
    "iframeCount": 0,
    "canonicalPresent": "ok",
    "htmlLangPresent": "ok",
    "metaViewportPresent": "ok",
    "robotsMetaPresent": "ok",
    "ogTagsPresent": "ok",
    "twitterTagsPresent": "absent"
  },
  "content": "# Welcome to Girls in Paris\nExplore our collection...",
  "timestamp": "2025-03-19T06:04:49Z",
  "search_engine": "Google"
}
Output Fields Explained
url (string):
  The URL that was scraped.
info (object):
  Metadata and statistics about the page:
    status: Page load status (e.g., "complete").
    title: The page’s title.
    description: Meta description, if present.
    firstH1: Text of the first <h1> tag.
    pageSize: Size of the HTML source in bytes.
    metaCanonical: Canonical URL from <link rel="canonical">.
    metaLang, metaLanguage, htmlLang: Language attributes from meta tags or <html>.
    wordCount: Total words in the page text.
    linksCount: Total number of <a> tags.
    linksExternalCount: Number of external links.
    linksInternalCount: Number of internal links.
network (object):
  HTTP request and response details:
    Ip, IpReverse: IP address and reverse DNS (currently "unavailable" due to Apify environment limitations).
    pageSizeCompressed, fileSize: Size of the response content in bytes.
    connectTime: Time to first byte in seconds.
    loadTime: Total request time in seconds.
    HttpResponseCode: HTTP status code (e.g., 200 for success).
    HttpContentType: MIME type (e.g., "text/html; charset=UTF-8").
    HttpResponse: Full response headers as a string.
    HttpRequest: Full request headers as a string.
seoAudit (object):
  SEO analysis metrics:
    structuredDataPresent: "ok" if structured data (e.g., schema.org) is found, else "missing".
    titleLength: Character length of the title.
    titlePresent: "ok" if a title exists, else "absent".
    descriptionLength: Character length of the meta description.
    descriptionPresent: "ok" if a description exists, else "absent".
    keywordsPresent: "ok" if meta keywords exist, else "absent".
    h1Count, h2Count: Number of <h1> and <h2> tags.
    headingStructureOk: "ok" if exactly one <h1> is present, else "problematic".
    inlineCssCount: Number of elements with inline CSS.
    jsFilesCount: Number of external <script> tags.
    styleFilesCount: Number of external <link rel="stylesheet"> tags.
    iframeCount: Number of <iframe> tags.
    canonicalPresent, htmlLangPresent, metaViewportPresent, robotsMetaPresent, ogTagsPresent, twitterTagsPresent: "ok" if present, else "absent".
content (string):
  The main page content converted to Markdown, with scripts and unwanted elements removed.
timestamp (string):
  UTC timestamp of when the data was scraped (e.g., "2025-03-19T06:04:49Z").
search_engine (string):
  The value provided in the input (e.g., "Google"), currently for informational purposes.
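Because the seoAudit status fields use "ok" for a passing check and another string ("absent", "missing", or "problematic") otherwise, a record can be reduced to a list of failing checks. The sketch below assumes the output structure shown above; `audit_issues` is a hypothetical helper and is not part of the actor.

```python
def audit_issues(record: dict) -> list[str]:
    """Return the names of seoAudit checks whose status string is not "ok".

    Numeric fields such as h1Count or titleLength are skipped, since they
    are counts and lengths rather than pass/fail statuses.
    """
    audit = record.get("seoAudit", {})
    return sorted(
        name
        for name, value in audit.items()
        if isinstance(value, str) and value != "ok"
    )


record = {
    "url": "https://example.com",
    "seoAudit": {
        "titlePresent": "ok",
        "keywordsPresent": "absent",
        "h1Count": 1,
        "twitterTagsPresent": "absent",
    },
}
print(audit_issues(record))
```

Running the same helper over every item in the dataset gives a quick per-URL overview of what needs attention.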
Accessing the Output
Apify Console:
After the actor runs, go to the “Dataset” tab in the Apify console.
Preview the data online or download it as JSON or CSV.
API:
Use the Apify API to fetch the dataset with a GET request to /v2/datasets/<dataset-id>/items.
Example:
bash
curl "https://api.apify.com/v2/datasets/<dataset-id>/items?token=<your-api-token>"
Replace <dataset-id> with the ID from the run and <your-api-token> with your Apify API token.
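The same request can be made from Python with the standard library. This is a minimal sketch mirroring the curl call above: `dataset_items_url` and `fetch_items` are hypothetical helper names, and it assumes the endpoint returns a JSON array of items by default.

```python
import json
import urllib.parse
import urllib.request


def dataset_items_url(dataset_id: str, token: str) -> str:
    """Build the dataset items URL used in the curl example."""
    query = urllib.parse.urlencode({"token": token})
    safe_id = urllib.parse.quote(dataset_id, safe="")
    return f"https://api.apify.com/v2/datasets/{safe_id}/items?{query}"


def fetch_items(dataset_id: str, token: str) -> list:
    """Download all items; assumes the response body is a JSON array."""
    with urllib.request.urlopen(dataset_items_url(dataset_id, token)) as response:
        return json.load(response)


# Example (requires a real dataset ID and API token):
# for item in fetch_items("<dataset-id>", "<your-api-token>"):
#     print(item["url"], item["info"]["title"])
```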
Notes
IP Information: The Ip and IpReverse fields are marked "unavailable" because direct DNS lookups are restricted in the Apify environment. Other network data (e.g., HttpResponseCode, loadTime) is still provided.
Dynamic Pages: Because pages are loaded in a headless Chrome browser through Selenium, the actor can extract content that is rendered by JavaScript.
Error Handling: If a URL fails to load or data extraction encounters issues, check the “Log” tab for details.