
Ai SEO Content Curator
Pay $10.00 for 1,000 results

The SEO Actor performs a full SEO audit of each URL, extracting key SEO metrics such as titles, meta descriptions, and keywords. It also retrieves network information and combines it with the SEO audit data, storing the resulting analysis in the run's default Apify dataset for further use.
Actor Metrics
10 monthly users
>99% runs succeeded
Created in Sep 2024
AI SEO Content Scraper
The Selenium SEO Scraper is an Apify actor that uses Selenium and a headless Chrome browser to scrape websites, extract SEO-related data, and store it in a structured format. Users provide starting URLs and optional parameters via an input schema, and the actor outputs detailed metadata, network information, SEO audits, and page content to the default Apify dataset.
This documentation explains the input you need to provide and the output you’ll receive.
Input
To run the actor, provide input in JSON format through the Apify console’s “Input” tab or via the API. The input defines the URLs to scrape and controls the scraping scope.
Input Schema
```json
{
  "title": "Selenium SEO Scraper",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "start_urls": {
      "title": "Start URLs",
      "type": "array",
      "description": "The URLs where scraping begins. Can be a list of strings or objects with a 'url' field.",
      "prefill": [{"url": "https://example.com"}],
      "editor": "requestListSources"
    },
    "max_depth": {
      "title": "Maximum Depth",
      "type": "integer",
      "description": "How deep to follow links (0 = only start URLs, 1 = one level of links, etc.).",
      "default": 1,
      "minimum": 0
    },
    "max_urls": {
      "title": "Max URLs",
      "type": "integer",
      "description": "The maximum number of URLs to scrape.",
      "default": 10,
      "minimum": 1
    },
    "search_engine": {
      "title": "Search Engine",
      "type": "string",
      "description": "Optional identifier for future features (e.g., search engine-specific scraping).",
      "enum": ["Google", "Bing", "DuckDuckGo"],
      "default": "Google"
    }
  },
  "required": ["start_urls"]
}
```

Input Fields Explained
start_urls (required):
A list of URLs to start scraping from.

Format: either ["https://example.com"] or [{"url": "https://example.com"}].

Example: [{"url": "https://www.girlsinparis.com/fr/"}]

max_depth (optional, default: 1):
Controls how many levels of links to follow.

0: Scrape only the start URLs.

1: Scrape the start URLs and the pages they link to directly.

2: Also follow the links found on those pages, and so on.

Example: 2

max_urls (optional, default: 10):
Limits the total number of URLs scraped.

Example: 100

search_engine (optional, default: "Google"):
Currently informational; reserved for future enhancements (e.g., search engine-specific behavior).

Options: "Google", "Bing", "DuckDuckGo".
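To make the interaction of max_depth and max_urls concrete, here is a minimal Python sketch of a depth-limited, breadth-first crawl planner. It is not the actor's code. The function names `normalize_start_urls` and `plan_crawl` are hypothetical, and the `links` map stands in for the links a real crawler would extract from each rendered page. The breadth-first order and URL deduplication are assumptions, because the documentation does not say how the actor traverses links.

```python
from collections import deque
from typing import Dict, List, Union

# start_urls entries may be plain strings or objects such as {"url": "..."}.
StartUrl = Union[str, Dict[str, str]]


def normalize_start_urls(start_urls: List[StartUrl]) -> List[str]:
    """Return plain URL strings, accepting both documented start_urls formats."""
    urls = []
    for item in start_urls:
        url = item if isinstance(item, str) else item.get("url")
        if not url:
            raise ValueError(f"start_urls entry has no URL: {item!r}")
        urls.append(url)
    return list(dict.fromkeys(urls))  # drop duplicates, keep input order


def plan_crawl(actor_input: dict, links: Dict[str, List[str]]) -> List[str]:
    """Return the URLs a breadth-first crawl would visit, in visit order.

    `links` is a hypothetical page -> outgoing-links map standing in for the
    links a real crawler would extract from each page.
    """
    start = normalize_start_urls(actor_input["start_urls"])  # required field
    max_depth = actor_input.get("max_depth", 1)  # schema default
    max_urls = actor_input.get("max_urls", 10)  # schema default

    queue = deque((url, 0) for url in start)
    seen = set(start)
    visited: List[str] = []
    while queue and len(visited) < max_urls:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # links found at the depth limit are not followed
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


if __name__ == "__main__":
    site = {
        "https://example.com": ["https://example.com/a", "https://example.com/b"],
        "https://example.com/a": ["https://example.com/a/1"],
    }
    print(plan_crawl({"start_urls": [{"url": "https://example.com"}]}, site))
```

With the default max_depth of 1, the example visits the start page plus /a and /b. Raising max_depth to 2 adds /a/1, and max_urls caps the list regardless of depth.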

Example Inputs
Basic Example
Scrape one URL and its direct links:
```json
{
  "start_urls": ["https://www.girlsinparis.com/fr/"],
  "max_depth": 1,
  "max_urls": 10
}
```

Advanced Example
Deeper crawl with multiple URLs:
```json
{
  "start_urls": [
    {"url": "https://www.girlsinparis.com/fr/"},
    {"url": "https://example.com"}
  ],
  "max_depth": 2,
  "max_urls": 100,
  "search_engine": "Google"
}
```

How to Provide Input
Apify Console:
Go to your actor in the Apify console.

Open the “Input” tab.

Paste your JSON input or fill in the form (it mirrors the schema).

Save and run the actor.

API:
Send a POST request to https://api.apify.com/v2/acts/<actor-id>/runs with your JSON input as the request body and the header Content-Type: application/json.

Refer to the Apify API docs for details.

Output
The actor stores results in the default Apify dataset, which you can access via the console’s “Dataset” tab or the API. Each scraped URL produces one JSON object containing metadata, network stats, SEO audit data, and page content.
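The following hypothetical Python helper illustrates the shape of one dataset item. It assembles a record from parts that have already been computed. It is not the actor's code. The function name and the `now` parameter are assumptions, and the timestamp format is inferred from the example value "2025-03-19T06:04:49Z". In an Apify Python actor, an item like this would typically be stored with `Actor.push_data` from the Apify Python SDK, but the documentation does not say which SDK this actor uses.

```python
from datetime import datetime, timezone
from typing import Optional


def make_record(url: str, info: dict, network: dict, seo_audit: dict,
                content: str, search_engine: str = "Google",
                now: Optional[datetime] = None) -> dict:
    """Assemble one dataset item with the documented top-level keys (hypothetical helper)."""
    scraped_at = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "url": url,
        "info": info,              # page metadata and link statistics
        "network": network,        # HTTP status, headers, timings
        "seoAudit": seo_audit,     # "ok"/"absent" flags and counts
        "content": content,        # page body converted to Markdown
        # UTC with second precision and a trailing "Z", e.g. 2025-03-19T06:04:49Z
        "timestamp": scraped_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "search_engine": search_engine,  # echoed from the input, informational
    }
```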

Output Structure
```json
{
  "url": "https://www.girlsinparis.com/fr/",
  "info": {
    "status": "complete",
    "title": "Girls in Paris - Lingerie & Swimwear",
    "description": "Explore our collection of lingerie and swimwear designed for comfort and style.",
    "firstH1": "Welcome to Girls in Paris",
    "pageSize": 12345,
    "metaCanonical": "https://www.girlsinparis.com/fr/",
    "metaLang": "",
    "metaLanguage": "",
    "htmlLang": "fr",
    "wordCount": 150,
    "linksCount": 20,
    "linksExternalCount": 5,
    "linksInternalCount": 15
  },
  "network": {
    "Ip": "unavailable",
    "IpReverse": "unavailable",
    "pageSizeCompressed": 12345,
    "fileSize": 12345,
    "connectTime": 0.5,
    "loadTime": 1.2,
    "HttpResponseCode": 200,
    "HttpContentType": "text/html; charset=UTF-8",
    "HttpResponse": "Content-Type: text/html; charset=UTF-8, ...",
    "HttpRequest": "User-Agent: Mozilla/5.0, ..."
  },
  "seoAudit": {
    "structuredDataPresent": "ok",
    "titleLength": 36,
    "titlePresent": "ok",
    "descriptionLength": 79,
    "descriptionPresent": "ok",
    "keywordsPresent": "absent",
    "h1Count": 1,
    "h2Count": 3,
    "headingStructureOk": "ok",
    "inlineCssCount": 2,
    "jsFilesCount": 5,
    "styleFilesCount": 3,
    "iframeCount": 0,
    "canonicalPresent": "ok",
    "htmlLangPresent": "ok",
    "metaViewportPresent": "ok",
    "robotsMetaPresent": "ok",
    "ogTagsPresent": "ok",
    "twitterTagsPresent": "absent"
  },
  "content": "# Welcome to Girls in Paris\nExplore our collection...",
  "timestamp": "2025-03-19T06:04:49Z",
  "search_engine": "Google"
}
```

Output Fields Explained
url (string):
The URL that was scraped.

info (object):
Metadata and statistics about the page:
status: Page load status (e.g., "complete").

title: The page’s title.

description: Meta description, if present.

firstH1: Text of the first <h1> tag.

pageSize: Size of the HTML source in bytes.

metaCanonical: Canonical URL from <link rel="canonical">.

metaLang, metaLanguage, htmlLang: Language attributes from meta tags or the <html> element.

wordCount: Total number of words in the page text.

linksCount: Total number of <a> tags.

linksExternalCount: Number of external links.

linksInternalCount: Number of internal links.

network (object):
HTTP request and response details:
Ip, IpReverse: IP address and reverse DNS name (currently "unavailable" due to Apify environment limitations).

pageSizeCompressed, fileSize: Size of the response content in bytes.

connectTime: Time to first byte, in seconds.

loadTime: Total request time, in seconds.

HttpResponseCode: HTTP status code (e.g., 200 for success).

HttpContentType: Value of the Content-Type header (e.g., "text/html; charset=UTF-8").

HttpResponse: Full response headers as a string.

HttpRequest: Full request headers as a string.

seoAudit (object):
SEO analysis metrics:
structuredDataPresent: "ok" if structured data (e.g., schema.org markup) is found, else "missing".

titleLength: Character length of the title.

titlePresent: "ok" if a title exists, else "absent".

descriptionLength: Character length of the meta description.

descriptionPresent: "ok" if a meta description exists, else "absent".

keywordsPresent: "ok" if a meta keywords tag exists, else "absent".

h1Count, h2Count: Number of <h1> and <h2> tags.

headingStructureOk: "ok" if exactly one <h1> is present, else "problematic".

inlineCssCount: Number of elements with inline CSS.

jsFilesCount: Number of external <script> tags.

styleFilesCount: Number of external <link rel="stylesheet"> tags.

iframeCount: Number of <iframe> tags.

canonicalPresent, htmlLangPresent, metaViewportPresent, robotsMetaPresent, ogTagsPresent, twitterTagsPresent: "ok" if the corresponding tag or attribute is present, else "absent".
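Many of the seoAudit checks, and the internal/external link counts under info, can be approximated from raw HTML. The sketch below does this with Python's built-in html.parser. It is not the actor's implementation, and `AuditParser` and `seo_audit` are hypothetical names. The sketch makes several assumptions:

- "Inline CSS" means elements with a style attribute.
- "Structured data" means JSON-LD scripts or microdata itemscope attributes.
- A link is internal when it resolves to the page's own host.
- Only <a> tags that have an href are counted.

The actor works on the Selenium-rendered DOM, so counts for JavaScript-heavy pages may differ from a parse of the raw source.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AuditParser(HTMLParser):
    """Collect the raw counts and metadata behind a subset of the audit fields."""

    def __init__(self):
        super().__init__()
        self.counts = {"h1": 0, "h2": 0, "iframe": 0, "js": 0, "css": 0, "inline": 0}
        self.meta = {}          # lower-cased meta name/property -> content
        self.hrefs = []         # href values of <a> tags
        self.canonical = None
        self.html_lang = None
        self.structured = False
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        if "style" in a:
            self.counts["inline"] += 1  # element with inline CSS (assumed meaning)
        if "itemscope" in a:
            self.structured = True  # microdata
        if tag in ("h1", "h2", "iframe"):
            self.counts[tag] += 1
        elif tag == "html":
            self.html_lang = a.get("lang") or None
        elif tag == "title":
            self._in_title = True
        elif tag == "script":
            if a.get("src"):
                self.counts["js"] += 1  # external script file
            if a.get("type", "").lower() == "application/ld+json":
                self.structured = True  # JSON-LD
        elif tag == "link":
            rel = a.get("rel", "").lower().split()
            if "stylesheet" in rel:
                self.counts["css"] += 1
            if "canonical" in rel:
                self.canonical = a.get("href") or None
        elif tag == "meta":
            key = (a.get("name") or a.get("property") or "").lower()
            if key:
                self.meta[key] = a.get("content", "")
        elif tag == "a" and "href" in a:
            self.hrefs.append(a["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def seo_audit(html: str, page_url: str) -> dict:
    """Return a subset of the documented info and seoAudit fields for one page."""
    def flag(present) -> str:
        return "ok" if present else "absent"

    p = AuditParser()
    p.feed(html)
    p.close()
    title = p.title.strip()
    desc = p.meta.get("description", "").strip()
    host = urlparse(page_url).netloc
    # Resolve relative hrefs; same host = internal (mailto:, tel:, etc. count as external).
    link_hosts = [urlparse(urljoin(page_url, h)).netloc for h in p.hrefs]
    internal = sum(h == host for h in link_hosts)
    return {
        "info": {
            "linksCount": len(link_hosts),
            "linksInternalCount": internal,
            "linksExternalCount": len(link_hosts) - internal,
        },
        "seoAudit": {
            "structuredDataPresent": "ok" if p.structured else "missing",
            "titleLength": len(title),
            "titlePresent": flag(title),
            "descriptionLength": len(desc),
            "descriptionPresent": flag(desc),
            "keywordsPresent": flag("keywords" in p.meta),
            "h1Count": p.counts["h1"],
            "h2Count": p.counts["h2"],
            "headingStructureOk": "ok" if p.counts["h1"] == 1 else "problematic",
            "inlineCssCount": p.counts["inline"],
            "jsFilesCount": p.counts["js"],
            "styleFilesCount": p.counts["css"],
            "iframeCount": p.counts["iframe"],
            "canonicalPresent": flag(p.canonical),
            "htmlLangPresent": flag(p.html_lang),
            "metaViewportPresent": flag("viewport" in p.meta),
            "robotsMetaPresent": flag("robots" in p.meta),
            "ogTagsPresent": flag(any(k.startswith("og:") for k in p.meta)),
            "twitterTagsPresent": flag(any(k.startswith("twitter:") for k in p.meta)),
        },
    }
```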

content (string):
The main page content converted to Markdown, with scripts and other unwanted elements removed.

timestamp (string):
UTC timestamp of when the data was scraped, in ISO 8601 format (e.g., "2025-03-19T06:04:49Z").

search_engine (string):
The value provided in the input (e.g., "Google"); currently informational only.

Accessing the Output
Apify Console:
After the run finishes, open the “Dataset” tab in the Apify console.

Preview the items in the browser or download them as JSON or CSV.

API:
Fetch the items with a GET request to /v2/datasets/<dataset-id>/items.

Example:
```bash
curl "https://api.apify.com/v2/datasets/<dataset-id>/items?token=<your-api-token>"
```

Replace <dataset-id> with the ID of the run's default dataset and <your-api-token> with your Apify API token.

Notes
IP Information: The Ip and IpReverse fields are reported as "unavailable" because direct DNS lookups are restricted in the Apify environment. Other network data (e.g., HttpResponseCode, loadTime) is still provided.

Dynamic Pages: Because each page is rendered in headless Chrome through Selenium, content generated by JavaScript is included in the extracted data.

Error Handling: If a URL fails to load or data extraction runs into problems, check the run's “Log” tab for details.
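The API routes for starting a run and reading its dataset can be scripted with Python's standard library. The sketch below is hypothetical and rests on these assumptions:

- The helper names `build_run_request`, `build_items_request`, `send`, and `flag_issues` are illustrative.
- The token is sent in an `Authorization: Bearer` header rather than the `token` query parameter used in the curl example.
- Any string value other than "ok" in seoAudit counts as an issue.
- `<actor-id>` may be the actor's ID or `username~actor-name`.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """POST /v2/acts/<actor-id>/runs with the actor input as a JSON body."""
    return urllib.request.Request(
        f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(actor_input).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
    )


def build_items_request(dataset_id: str, token: str) -> urllib.request.Request:
    """GET /v2/datasets/<dataset-id>/items as a JSON array of scraped pages."""
    return urllib.request.Request(
        f"{API_BASE}/datasets/{dataset_id}/items?format=json",
        headers={"Authorization": f"Bearer {token}"},
    )


def send(req: urllib.request.Request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def flag_issues(items: list) -> dict:
    """Map each URL to the seoAudit checks whose string value is not "ok"."""
    issues = {}
    for item in items:
        audit = item.get("seoAudit", {})
        failed = sorted(k for k, v in audit.items() if isinstance(v, str) and v != "ok")
        if failed:
            issues[item["url"]] = failed
    return issues


# Example flow (needs network access and a real token):
# run = send(build_run_request("<actor-id>", TOKEN, {"start_urls": ["https://example.com"]}))["data"]
# ...wait for the run to finish, then:
# items = send(build_items_request(run["defaultDatasetId"], TOKEN))
# print(flag_issues(items))
```

Only `flag_issues` touches the data. Counts such as titleLength are skipped because they are not pass/fail flags.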