Google SERP Intelligence Scraper

Pricing

from $5.00 / 1,000 serp pages

Track Google and AI search visibility with structured SERP pages, AI answers, citations, ads, brand share of voice, and report-ready outputs for apps, MCP agents, and internal monitoring.

Developer

Netdesignr

Maintained by Community

Track search visibility across Google and AI search engines from one actor.

This actor is built for two audiences:

  • teams that want a production-ready Apify actor they can call from apps, workflows, or dashboards
  • AI agents and MCP-style tools that need clear inputs, stable outputs, and simple downstream parsing

It supports classic Google SERPs, Google AI surfaces, Perplexity, ChatGPT Search, run-level visibility reporting, and optional lead enrichment from cited domains.

Use it when you need one stable interface for:

  • raw SERP evidence
  • AI answer and citation tracking
  • report-grade search visibility summaries
  • scheduled monitoring inside your own products

What You Can Do With It

  • scrape Google web, images, news, and shopping
  • capture Google AI Overview and Google AI Mode when available
  • compare brand visibility across Google, Perplexity, and ChatGPT Search
  • generate a normalized visibility-report for dashboards and internal tools
  • enrich cited domains into business lead records
  • use one actor for both public workflows and internal scheduled monitoring

Good Fit

Use this actor if you need:

  • SEO, GEO, or AEO visibility monitoring
  • brand or competitor share-of-voice tracking
  • citation tracking across AI search engines
  • structured SERP evidence for internal tools
  • report-ready outputs instead of just raw HTML

Common Use Cases

1. SEO rank monitoring

Use the actor to track classic Google result pages for:

  • important commercial queries
  • branded vs non-branded terms
  • category and product landing pages
  • changes in visible competitors over time

Typical output to use:

  • default dataset google-page records
  • REPORT.json for trend summaries and monitoring

2. GEO / AI visibility tracking

Use the actor to compare how often your brand appears in:

  • Google AI Mode
  • Perplexity answers
  • ChatGPT Search answers

Typical output to use:

  • default dataset ai-engine-result records when emitAiResultsToDataset is enabled
  • AI_ENGINE_RESULTS.json
  • REPORT.json

3. Brand share-of-voice reporting

Use brandDomains to see whether your brand is visible across:

  • Google organic results
  • ads
  • AI answer citations
  • related SERP features

Typical output to use:

  • REPORT.json
  • default dataset for evidence review

4. Competitor intelligence

Use the actor to identify:

  • which domains keep appearing for your target queries
  • who shows up in AI citations
  • which result surfaces dominate a query category

Typical output to use:

  • REPORT.json
  • default dataset ai-engine-result records
  • AI_ENGINE_RESULTS.json
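
One way to see which domains keep appearing in AI citations is to tally hosts across ai-engine-result records. The sketch below is illustrative only: the shape of a citation is not specified in this document, so it assumes each citation is either a URL string or an object with a "url" key, and the helper name cited_domains is hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def cited_domains(records):
    """Count how often each domain is cited across ai-engine-result records."""
    counts = Counter()
    for record in records:
        if record.get("recordType") != "ai-engine-result":
            continue
        for citation in record.get("citations") or []:
            # Assumption: a citation is a URL string or an object with a "url" key.
            url = citation if isinstance(citation, str) else citation.get("url", "")
            host = urlparse(url).netloc.lower()
            if host.startswith("www."):
                host = host[4:]
            if host:
                counts[host] += 1
    return counts


# Hand-written sample records, not real actor output.
sample_records = [
    {"recordType": "ai-engine-result", "engine": "perplexity",
     "citations": [{"url": "https://www.hubspot.com/crm"}, "https://pipedrive.com/pricing"]},
    {"recordType": "ai-engine-result", "engine": "chatgpt",
     "citations": [{"url": "https://hubspot.com/products"}]},
    {"recordType": "google-page", "organicResults": []},
]
print(cited_domains(sample_records).most_common())
```

The brandAnalysis field and REPORT.json remain the primary sources for share-of-voice numbers; a tally like this is mainly useful for spotting domains you have not listed in brandDomains.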

5. Internal monitoring backends

Use the actor as a backend service for:

  • dashboards
  • alerting
  • SEO/GEO reporting jobs
  • MCP or agent-based research workflows

Typical output to use:

  • OUTPUT_INDEX.json
  • REPORT.json
  • default dataset for raw Google evidence

Quick “Can I Use This For…” Guide

You can use this actor if you want to:

  • monitor where your brand appears on Google
  • compare Google vs Perplexity vs ChatGPT visibility
  • track citation sources used by AI search tools
  • collect structured SERP evidence for your own app
  • generate run-level visibility summaries for teams or agents
  • enrich cited domains into lead candidates

This actor is probably not the right fit if you need:

  • raw browser session replay
  • generic website crawling outside search visibility use cases
  • guaranteed stable Google AI Mode capture on every query
  • a full CRM enrichment platform by itself

What Makes It Easy To Integrate

  • one actor input for Google and AI search monitoring
  • one dataset that can include both Google page records and AI engine records, each clearly labeled with recordType
  • one run-level summary object in REPORT.json
  • explicit feature flags so agents can enable only what they need
  • additive output design so you can start simple and expand later

Record Types

The actor produces three record families:

google-page

One Google result page with:

  • query metadata
  • organic results
  • images, news, or shopping results depending on searchType
  • ads
  • AI Overview
  • featured snippet
  • People Also Ask
  • knowledge panel
  • related searches
  • videos
  • local pack
  • brand analysis

ai-engine-result

One AI-engine answer record for:

  • Google AI Mode
  • Perplexity
  • ChatGPT Search

Each record includes:

  • answer
  • citations
  • relatedQuestions
  • queryVariants
  • warnings
  • confidence
  • brandAnalysis

visibility-report

One run-level normalized report with:

  • overall run summary
  • per-query visibility
  • per-engine visibility
  • warning breakdown
  • optional lead enrichment

Storage layout:

  • default dataset: google-page records plus ai-engine-result records when emitAiResultsToDataset = true
  • default key-value store: AI_ENGINE_RESULTS.json for AI answer records when enabled
  • default key-value store: REPORT.json for the run-level visibility report when requested
  • default key-value store: OUTPUT_INDEX.json with stable pointers to generated artifacts

How To Think About The Output

Use the actor in two layers:

  1. visibility-report for decisions, dashboards, alerts, and internal workflows
  2. raw evidence records for drill-down, audits, and debugging

That means most agent or application flows should read REPORT.json first, not scan every raw item on every run.

Input Guide

Core search input

  • queries: search strings or full Google search URLs
  • searchType: web, images, news, or shopping
  • scrapeMode: the examples in this document use auto and full
  • maxPagesPerQuery: number of Google pages to scrape per query
  • mobileResults: use a mobile-like result view

Google visibility options

  • includeAiOverview
  • includeAds
  • includePeopleAlsoAsk
  • includeKnowledgePanel
  • includeVideoResults
  • includeLocalPack
  • includeIcons

AI search options

  • googleAiMode: off, withSearchResults, or only
  • enablePerplexity
  • perplexityReturnImages
  • perplexityReturnRelatedQuestions
  • perplexitySearchRecency
  • enableChatGptSearch
  • chatGptSearchContextSize
  • emitAiResultsToDataset: when true, AI answer records are also written to the default dataset for easier review in Apify

Brand and enrichment options

  • brandDomains: domains to track across search and AI results
  • enableBusinessLeadsEnrichment
  • maxLeadDomains

Output options

  • outputMode: raw, report, or both
  • saveHtml
  • saveHtmlToKeyValueStore
  • emptyRunPolicy
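
Because several inputs accept only a fixed set of values, a caller can reject typos before starting a run. The following is a minimal sketch, assuming only the enumerations listed in this guide; the helper name build_input and the ALLOWED table are hypothetical and not part of the actor.

```python
# Enumerated values as listed in the Input Guide above.
ALLOWED = {
    "searchType": {"web", "images", "news", "shopping"},
    "googleAiMode": {"off", "withSearchResults", "only"},
    "outputMode": {"raw", "report", "both"},
}


def build_input(queries, **options):
    """Return an actor input dict, rejecting values outside the documented enumerations."""
    if not queries:
        raise ValueError("queries must contain at least one search string or URL")
    for name, allowed in ALLOWED.items():
        if name in options and options[name] not in allowed:
            raise ValueError(
                f"{name} must be one of {sorted(allowed)}, got {options[name]!r}"
            )
    return {"queries": list(queries), **options}


run_input = build_input(
    ["best crm for startups"],
    searchType="web",
    googleAiMode="withSearchResults",
    brandDomains=["hubspot.com"],
    outputMode="both",
)
print(run_input)
```

Options without a documented value list (for example emptyRunPolicy) pass through unchecked, since this guide does not enumerate them.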

Common Input Recipes

Track classic Google rankings only

{
  "queries": ["best project management software"],
  "searchType": "web",
  "scrapeMode": "auto",
  "outputMode": "raw"
}

Track Google plus AI search engines for one brand

{
  "queries": ["best ai seo platform"],
  "searchType": "web",
  "googleAiMode": "withSearchResults",
  "enablePerplexity": true,
  "enableChatGptSearch": true,
  "brandDomains": ["yourdomain.com"],
  "outputMode": "both"
}

Use the actor as a monitoring backend

{
  "queries": [
    "best crm for startups",
    "best crm for agencies",
    "best crm for saas"
  ],
  "searchType": "web",
  "includeAiOverview": true,
  "includeAds": true,
  "googleAiMode": "withSearchResults",
  "enablePerplexity": true,
  "enableChatGptSearch": true,
  "brandDomains": ["hubspot.com", "pipedrive.com"],
  "outputMode": "both",
  "emptyRunPolicy": "retry_then_warn"
}

Track AI citation visibility for one brand

{
  "queries": ["best accounting software for agencies"],
  "searchType": "web",
  "enablePerplexity": true,
  "enableChatGptSearch": true,
  "brandDomains": ["xero.com", "quickbooks.intuit.com"],
  "outputMode": "both"
}

Collect lead candidates from AI citations

{
  "queries": ["best payroll software for small business"],
  "searchType": "web",
  "enablePerplexity": true,
  "enableChatGptSearch": true,
  "enableBusinessLeadsEnrichment": true,
  "maxLeadDomains": 15,
  "outputMode": "report"
}

Quick Start Examples

1. Google web SERP only

{
  "queries": ["best running shoes"],
  "searchType": "web",
  "scrapeMode": "auto",
  "outputMode": "raw"
}

2. Google + report output for monitoring

{
  "queries": ["nike pegasus 41 review", "brooks ghost 16 review"],
  "searchType": "web",
  "scrapeMode": "full",
  "includeAiOverview": true,
  "includeAds": true,
  "brandDomains": ["nike.com", "brooksrunning.com"],
  "outputMode": "both"
}

3. Google + AI search engines

{
  "queries": ["best crm for startups"],
  "searchType": "web",
  "scrapeMode": "full",
  "googleAiMode": "withSearchResults",
  "enablePerplexity": true,
  "enableChatGptSearch": true,
  "brandDomains": ["hubspot.com", "pipedrive.com"],
  "outputMode": "both"
}

4. AI-engine-only visibility run

{
  "queries": ["best llm observability tool"],
  "searchType": "web",
  "googleAiMode": "only",
  "enablePerplexity": true,
  "enableChatGptSearch": true,
  "outputMode": "both"
}

5. Lead enrichment from cited domains

{
  "queries": ["best seo tools for agencies"],
  "searchType": "web",
  "enablePerplexity": true,
  "enableChatGptSearch": true,
  "enableBusinessLeadsEnrichment": true,
  "maxLeadDomains": 10,
  "outputMode": "report"
}

API Examples

JavaScript

import { ApifyClient } from 'apify-client';

// Authenticate with the API token from the environment.
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const input = {
    queries: ['nike pegasus 41 review'],
    searchType: 'web',
    scrapeMode: 'full',
    googleAiMode: 'withSearchResults',
    includeAiOverview: true,
    includeAds: true,
    enablePerplexity: true,
    enableChatGptSearch: true,
    brandDomains: ['nike.com'],
    outputMode: 'both',
};

// Start the actor and wait for the run to finish.
const run = await client.actor('netdesignr/google-serp-scraper').call(input);

// Fetch the records written to the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

JavaScript: read only the report

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Request only the run-level report.
const run = await client.actor('netdesignr/google-serp-scraper').call({
    queries: ['best ai seo tool'],
    searchType: 'web',
    googleAiMode: 'withSearchResults',
    enablePerplexity: true,
    enableChatGptSearch: true,
    outputMode: 'report',
});

const report = await client
    .keyValueStore(run.defaultKeyValueStoreId)
    .getRecord('REPORT.json');

// getRecord resolves to undefined when the record does not exist.
console.log(report?.value);

Python

import os

from apify_client import ApifyClient

# Authenticate with the API token from the environment.
client = ApifyClient(os.environ["APIFY_TOKEN"])

run_input = {
    "queries": ["best running shoes for beginners"],
    "searchType": "web",
    "scrapeMode": "auto",
    "googleAiMode": "withSearchResults",
    "enablePerplexity": True,
    "enableChatGptSearch": True,
    "brandDomains": ["nike.com"],
    "outputMode": "both",
}

# Start the actor and wait for the run to finish.
run = client.actor("netdesignr/google-serp-scraper").call(run_input=run_input)

# Fetch the records written to the run's default dataset.
items = client.dataset(run["defaultDatasetId"]).list_items().items
print(items)

Python: read REPORT.json from the key-value store

import os

from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_TOKEN"])

run = client.actor("netdesignr/google-serp-scraper").call(
    run_input={
        "queries": ["best llm monitoring tool"],
        "searchType": "web",
        "enablePerplexity": True,
        "enableChatGptSearch": True,
        "outputMode": "both",
    }
)

report = client.key_value_store(run["defaultKeyValueStoreId"]).get_record("REPORT.json")

# get_record returns None when the record does not exist.
print(report["value"] if report else None)

cURL

# Starts a run and returns the run object immediately, without waiting for it to finish.
curl -X POST "https://api.apify.com/v2/acts/netdesignr~google-serp-scraper/runs?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "queries": ["best project management software"],
    "searchType": "web",
    "googleAiMode": "withSearchResults",
    "enablePerplexity": true,
    "enableChatGptSearch": true,
    "outputMode": "both"
  }'

MCP / Agent Usage Pattern

This actor is designed to be easy for agents to call because:

  • inputs are explicit and bounded
  • output types are stable
  • recordType identifies each result family
  • REPORT.json gives a single summary object for downstream reasoning

Recommended agent pattern:

  1. run the actor with outputMode: "both" for full evidence + report
  2. read REPORT.json first for decision-making
  3. inspect the default dataset and AI_ENGINE_RESULTS.json only when the report indicates something interesting
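
The report-first pattern above can be sketched as a small decision helper. This is an illustration under stated assumptions: the internal shape of warningBreakdown is not documented here, so the sketch only checks whether it is non-empty, and treating that as "something interesting" is a hypothetical trigger you would replace with your own rule. The name next_reads is likewise hypothetical.

```python
RAW_ARTIFACTS = ["default dataset", "AI_ENGINE_RESULTS.json"]


def next_reads(report):
    """Decide which raw artifacts to read after looking at REPORT.json."""
    if report is None:
        # No report available (for example outputMode "raw"): go straight to raw evidence.
        return list(RAW_ARTIFACTS)
    # Hypothetical trigger: a non-empty warningBreakdown warrants a drill-down.
    if report.get("warningBreakdown"):
        return list(RAW_ARTIFACTS)
    # Report looks clean: no need to scan raw items on this run.
    return []


# Hand-written sample reports, not real actor output.
clean_report = {"reportSummary": {}, "warningBreakdown": {}}
noisy_report = {"reportSummary": {}, "warningBreakdown": {"perplexity": 1}}
print(next_reads(clean_report))
print(next_reads(noisy_report))
```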

Example agent-oriented input:

{
  "queries": ["best ai seo tool"],
  "searchType": "web",
  "googleAiMode": "withSearchResults",
  "enablePerplexity": true,
  "enableChatGptSearch": true,
  "brandDomains": ["yourdomain.com"],
  "outputMode": "both",
  "emptyRunPolicy": "retry_then_warn"
}

Output Reading Guide

If you only need dashboard data

Use outputMode: "report" and read:

  • key-value store record REPORT.json

If you need Google-only page evidence

Read the default dataset and filter items where:

  • recordType = "google-page"
  • optionally searchMetadata.searchType = "web" | "images" | "news" | "shopping"
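
The filter above might look like this once the dataset items are loaded into a list of dicts. A minimal sketch: the helper name google_pages is hypothetical, and the sample items are hand-written rather than real actor output.

```python
def google_pages(items, search_type=None):
    """Return google-page records, optionally restricted to one searchType."""
    pages = [item for item in items if item.get("recordType") == "google-page"]
    if search_type is not None:
        pages = [
            page for page in pages
            if (page.get("searchMetadata") or {}).get("searchType") == search_type
        ]
    return pages


page_items = [
    {"recordType": "google-page",
     "searchMetadata": {"query": "best crm", "searchType": "web", "pageNumber": 1}},
    {"recordType": "google-page",
     "searchMetadata": {"query": "best crm", "searchType": "news", "pageNumber": 1}},
    {"recordType": "ai-engine-result", "engine": "perplexity"},
]
print(len(google_pages(page_items)))          # 2
print(len(google_pages(page_items, "news")))  # 1
```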

If you need AI-engine answer evidence

Read the default dataset when emitAiResultsToDataset = true, or fall back to key-value store record AI_ENGINE_RESULTS.json, and filter items where:

  • recordType = "ai-engine-result"
  • engine = "google-ai-mode" | "perplexity" | "chatgpt"
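
The dataset-first, key-value-store-second lookup described above could be sketched as follows. It assumes both sources have already been fetched into Python lists; the function name ai_results and its parameters are hypothetical.

```python
def ai_results(dataset_items, kv_records=None, engine=None):
    """Return ai-engine-result records, preferring the dataset over the key-value store."""
    records = [
        item for item in dataset_items
        if item.get("recordType") == "ai-engine-result"
    ]
    if not records:
        # Nothing in the dataset (emitAiResultsToDataset may be off):
        # fall back to the array stored in AI_ENGINE_RESULTS.json.
        records = [
            record for record in (kv_records or [])
            if record.get("recordType") == "ai-engine-result"
        ]
    if engine is not None:
        records = [record for record in records if record.get("engine") == engine]
    return records


# Hand-written samples, not real actor output.
dataset_sample = [{"recordType": "google-page"}]
kv_sample = [
    {"recordType": "ai-engine-result", "engine": "perplexity", "answer": "..."},
    {"recordType": "ai-engine-result", "engine": "chatgpt", "answer": "..."},
]
print([r["engine"] for r in ai_results(dataset_sample, kv_sample)])
print([r["engine"] for r in ai_results(dataset_sample, kv_sample, engine="chatgpt")])
```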

Output Contract Highlights

google-page

Important fields:

  • searchMetadata.query
  • searchMetadata.searchType
  • searchMetadata.pageNumber
  • organicResults
  • paidResults
  • aiOverview
  • relatedSearches
  • brandAnalysis

ai-engine-result

Important fields:

  • engine
  • query
  • answer
  • citations
  • relatedQuestions
  • queryVariants
  • warnings
  • confidence
  • brandAnalysis

visibility-report

Important fields:

  • reportMetadata
  • reportSummary
  • engineVisibility
  • warningBreakdown
  • leadEnrichment

Key-Value Store Artifacts

REPORT.json

The run-level normalized report used for:

  • dashboards
  • alerts
  • internal monitoring
  • MCP or agent decision-making

AI_ENGINE_RESULTS.json

An array of ai-engine-result records containing:

  • Google AI Mode results when captured
  • Perplexity results
  • ChatGPT Search results

OUTPUT_INDEX.json

A lightweight index that tells consumers which key-value records were generated for the run.

Integration Notes For Internal Tools

  • prefer outputMode: "both" for scheduled monitoring
  • treat REPORT.json as the contract for dashboards and alerting
  • use the default dataset for Google page evidence
  • use the default dataset for fast AI-answer review and AI_ENGINE_RESULTS.json for stable AI-answer drill-down
  • use warnings and confidence markers to avoid overstating weak captures
  • keep brand domains stable across runs if you want comparable share-of-voice reporting

Error Handling Expectations

  • the actor prefers partial success over silent failure
  • provider-specific features can fail while the run still returns other useful data
  • warnings are part of the output contract and should be consumed, not ignored
  • higher-volatility search surfaces may produce lower-confidence records or no record when a valid capture is not available

If you need evidence + report

Use outputMode: "both" and:

  • read REPORT.json first
  • then inspect the default dataset and AI_ENGINE_RESULTS.json

If you only need raw extraction

Use outputMode: "raw"

FAQ

Does this actor scrape only Google?

No. Google is the core search surface, but the actor can also normalize AI-search visibility from Perplexity and ChatGPT Search when those options are enabled.

What should I read first after a run?

Read OUTPUT_INDEX.json or REPORT.json first. They tell you what was generated and give the fastest summary of the run.

Where do I find Google page results?

In the default dataset. Those records use recordType = "google-page".

Where do I find Perplexity, ChatGPT, or Google AI Mode answers?

In the default dataset when emitAiResultsToDataset = true, or in AI_ENGINE_RESULTS.json in the default key-value store.

Where do I find the normalized summary for dashboards or agents?

In REPORT.json in the default key-value store.

Do I need API keys for every feature?

No. Google page extraction does not require Perplexity or OpenAI credentials. Those credentials are only needed when you enable the corresponding AI-search options.

Can I use this for scheduled monitoring?

Yes. That is one of the main intended use cases. Use stable brandDomains, consistent query sets, and outputMode: "both" for scheduled runs.

Can I use this for one-off research tasks?

Yes. For quick one-off runs, outputMode: "raw" is usually enough. For decision-ready results, use outputMode: "report" or "both".

Does the actor always return Google AI Mode?

No. Google AI Mode is a high-drift surface. When a valid capture is not available, the actor will fail that specific capture path safely and preserve diagnostics instead of pretending it succeeded.

Is this suitable for MCP or AI agents?

Yes. The input is explicit, outputs are stable, and the actor exposes summary artifacts that are easier for agents to consume than raw pages alone.

Environment Requirements

Some features require provider credentials:

  • Perplexity support requires PERPLEXITY_API_KEY
  • ChatGPT Search support requires OPENAI_API_KEY

If those options are enabled without credentials, the actor will log warnings and continue with the rest of the run where possible.
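
A preflight check can surface missing credentials before a run instead of relying on warnings afterwards. The sketch below only encodes the flag-to-variable mapping stated above; it assumes the check runs somewhere that sees the same environment variables the actor reads, and missing_credentials is a hypothetical helper.

```python
import os

# Mapping taken from the requirements listed above.
REQUIRED_CREDENTIALS = {
    "enablePerplexity": "PERPLEXITY_API_KEY",
    "enableChatGptSearch": "OPENAI_API_KEY",
}


def missing_credentials(run_input, env=None):
    """Return the credential variables an input needs but the environment lacks."""
    env = os.environ if env is None else env
    return [
        variable
        for flag, variable in REQUIRED_CREDENTIALS.items()
        if run_input.get(flag) and not env.get(variable)
    ]


example_input = {"queries": ["best crm"], "enablePerplexity": True, "enableChatGptSearch": True}
# An explicit env mapping keeps the example independent of the local machine.
print(missing_credentials(example_input, env={"PERPLEXITY_API_KEY": "set"}))
```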

Security And Documentation Policy

Public documentation focuses on:

  • what the actor accepts
  • what it returns
  • how to integrate it
  • how to interpret the results

It does not document internal implementation details that are not required to use the product.

Notes

  • Google layouts and AI surfaces change often, so the actor emits warnings instead of failing silently whenever possible.
  • Some Google queries may hit interstitials, consent screens, or anti-bot responses. Those runs still produce diagnostics when possible.
  • Google AI Mode is more volatile than classic web SERPs and should be treated as a higher-drift surface.
  • The actor is built for practical use and stable outputs; public documentation intentionally focuses on inputs, outputs, and workflows rather than implementation internals.

Validation

pnpm --filter @apify-actors/shared build
pnpm --filter google-serp-scraper build
pnpm --filter google-serp-scraper test
pnpm --filter google-serp-scraper run type-check
pnpm --filter google-serp-scraper run test:smoke:live