Pricing

from $2.00 / 1,000 results

Google Search Results Scraper

Google Search Results Scraper collects the organic results for a Google search query (position, title, URL and snippet) and returns them as dataset items, billed per result.

Rating

0.0

(0)

Developer

Virtual Footprint LLC

Last modified

13 hours ago

Dockerfile

FROM apify/actor-python:3.12

COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . ./
CMD ["python", "main.py"]

icon.svg

<svg xmlns="http://www.w3.org/2000/svg" width="512" height="512" viewBox="0 0 512 512" role="img" aria-label="Google Search Results Scraper icon">
  <rect width="512" height="512" rx="92" fill="#172554"/>
  <circle cx="154" cy="156" r="72" fill="#84cc16" opacity="0.95"/>
  <path d="M128 332c48-94 112-142 192-142 52 0 96 20 132 60-42 68-98 102-168 102-58 0-110-7-156-20z" fill="#ffffff" opacity="0.92"/>
  <path d="M108 386h296" stroke="#84cc16" stroke-width="38" stroke-linecap="round"/>
  <circle cx="342" cy="168" r="34" fill="#ffffff" opacity="0.98"/>
</svg>

main.py

"""
Google Search Results API - Real-time SERP Scraping
Scrapes Google search results with TLS fingerprint impersonation (curl_cffi) and optional Apify proxy support.
"""
from __future__ import annotations

import asyncio
import json
import random
import re
from datetime import datetime, timezone
from html import unescape
from typing import Any, List, Optional
from urllib.parse import quote_plus, unquote

try:
    # curl_cffi impersonates a real browser's TLS fingerprint.
    from curl_cffi import requests
    HAS_CURL = True
except ImportError:
    # Fallback: plain requests (no TLS impersonation).
    import requests
    HAS_CURL = False

from apify import Actor


USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.1 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
]

# Google serves about 10 organic results per page; at most this many pages are fetched per query.
MAX_PAGES = 3


def sanitize_text(text: str, max_len: int = 500) -> str:
    """Collapse runs of whitespace and truncate to max_len characters."""
    if not text:
        return ''
    return re.sub(r'\s+', ' ', text.strip())[:max_len]


def strip_tags(fragment: str) -> str:
    """Remove HTML tags, decode entities such as &amp;, and normalize whitespace."""
    return sanitize_text(unescape(re.sub(r'<[^>]+>', '', fragment)))


def extract_url(href: str) -> str:
    """Resolve Google redirect links (/url?q=... or ?url=...) to an absolute target URL."""
    if not href:
        return ''
    href = unescape(href)  # attribute values in raw HTML encode '&' as '&amp;'
    m = re.search(r'/url\?q=([^&]+)', href) or re.search(r'[?&]url=([^&]+)', href)
    target = unquote(m.group(1)) if m else href
    # Skip relative links such as /search?q=... (Google-internal navigation).
    return target if target.startswith('http') else ''


def parse_results_soup(html: str) -> List[dict[str, Any]]:
    """Parse Google search results with regular expressions (no HTML parser is used)."""
    results: List[dict[str, Any]] = []
    seen_urls: set[str] = set()

    # Pattern 1: standard result blocks (div.g > div > a > h3).
    # The class list must contain the exact token "g", not just any class containing the letter g.
    blocks = re.findall(
        r'<div[^>]*class="(?:[^"]*\s)?g(?:\s[^"]*)?"[^>]*>.*?</div>\s*</div>\s*</div>',
        html, re.DOTALL
    )[:50]

    for block in blocks:
        h3 = re.search(r'<h3[^>]*>(.*?)</h3>', block, re.DOTALL)
        link = re.search(r'<a[^>]+href="([^"]+)"[^>]*>', block, re.DOTALL)
        if not (h3 and link):
            continue
        title = strip_tags(h3.group(1))
        url = extract_url(link.group(1))
        if not url or not title or url in seen_urls:
            continue
        seen_urls.add(url)

        # Snippet: try the div-based containers first, then the span-based fallback.
        snippet = ''
        snip = re.search(r'<div[^>]*class="[^"]*(?:VwiC3b|lEBKkf)[^"]*"[^>]*>(.*?)</div>', block, re.DOTALL)
        if snip:
            snippet = strip_tags(snip.group(1))
        if not snippet:
            snip2 = re.search(r'<span[^>]*class="[^"]*(?:aCOpRe|st)[^"]*"[^>]*>(.*?)</span>', block, re.DOTALL)
            if snip2:
                snippet = strip_tags(snip2.group(1))

        results.append({'title': title, 'url': url, 'snippet': snippet})

    # Pattern 2: modern Google markup without div.g (anchor wrapping the h3).
    if not results:
        for match in re.finditer(
            r'<a[^>]+href="(https?://[^"]+)"[^>]*><h3[^>]*>(.*?)</h3></a>',
            html
        ):
            url = unescape(match.group(1))
            title = strip_tags(match.group(2))
            if url and title and url not in seen_urls:
                seen_urls.add(url)
                results.append({'title': title, 'url': url, 'snippet': ''})

    return results[:30]


def fetch_page(url: str, headers: dict[str, str], proxy_url: Optional[str]) -> Any:
    """Blocking GET; called through asyncio.to_thread so it does not stall the event loop."""
    proxies = {'http': proxy_url, 'https': proxy_url} if proxy_url else None
    if HAS_CURL:
        return requests.get(url, headers=headers, proxies=proxies, impersonate='chrome131', timeout=20)
    return requests.get(url, headers=headers, proxies=proxies, timeout=20)


async def scrape_google(query: str, num_results: int = 20, proxy_cfg: Any = None) -> List[dict[str, Any]]:
    results: List[dict[str, Any]] = []
    seen_urls: set[str] = set()
    page_idx = 0

    while len(results) < num_results and page_idx < MAX_PAGES:
        start = page_idx * 10
        search_url = f"https://www.google.com/search?q={quote_plus(query)}&num=10&start={start}&hl=en"
        headers = {
            'User-Agent': random.choice(USER_AGENTS),
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.9',
            'Referer': 'https://www.google.com/',
            'Cache-Control': 'no-cache',
        }
        retries = 2
        page_ok = False
        while retries > 0 and not page_ok:
            try:
                # Request a proxy URL per attempt (None when no proxy is configured).
                proxy_url = await proxy_cfg.new_url() if proxy_cfg else None
                resp = await asyncio.to_thread(fetch_page, search_url, headers, proxy_url)
                if resp.status_code == 200:
                    page = parse_results_soup(resp.text)
                    if page:
                        for r in page:
                            if r['url'] not in seen_urls:
                                seen_urls.add(r['url'])
                                results.append(r)
                        page_ok = True
                        Actor.log.info(f"Page {page_idx + 1}: {len(page)} results")
                    else:
                        Actor.log.warning(f"Page {page_idx + 1}: no results parsed")
                        retries -= 1
                        await asyncio.sleep(3)
                elif resp.status_code == 429:
                    Actor.log.warning("429 rate limited, retrying...")
                    # Wait longer on each successive 429: 10 s, then 15 s.
                    await asyncio.sleep(5 * (4 - retries))
                    retries -= 1
                else:
                    Actor.log.warning(f"HTTP {resp.status_code}")
                    retries -= 1
                    await asyncio.sleep(2)
            except Exception as e:
                Actor.log.error(f"Request failed: {e}")
                retries -= 1
                await asyncio.sleep(3)
        if not page_ok:
            # Stop paginating once a page fails after all retries.
            break
        page_idx += 1
        await asyncio.sleep(2)

    return results[:num_results]


async def main() -> None:
    async with Actor:
        inp = await Actor.get_input() or {}
        Actor.log.info(f"Input: {json.dumps(inp)}")
        # Accept a flat input object, one nested under "input", or a bare query string.
        payload = inp.get('input', inp) if isinstance(inp, dict) else inp
        if isinstance(payload, str):
            payload = {'query': payload}
        elif not isinstance(payload, dict):
            payload = {}
        top_level = inp if isinstance(inp, dict) else {}

        query = str(payload.get('query') or top_level.get('query') or '').strip()
        # "queries" (batch list) is used only when "query" is empty.
        queries = [query] if query else [str(q).strip() for q in payload.get('queries') or [] if str(q).strip()]
        # "maxResults" is the input-schema field; "num_results" and "limit" are accepted aliases.
        raw_num = payload.get('maxResults', payload.get('num_results', payload.get('limit', 25)))
        try:
            num = max(1, int(raw_num))
        except (TypeError, ValueError):
            num = 25

        if not queries:
            Actor.log.error('No query provided')
            await Actor.push_data({
                'error': True, 'message': 'query (or queries) field is required',
                'received_input': str(inp)[:200], 'source': 'google',
                'scrapedAt': datetime.now(timezone.utc).isoformat()
            })
            return

        # Only build a proxy configuration when the input asks for one.
        proxy_input = payload.get('proxyConfiguration')
        proxy_cfg = await Actor.create_proxy_configuration(actor_proxy_input=proxy_input) if proxy_input else None

        for q in queries:
            Actor.log.info(f'Query: "{q}" max={num}')
            results = await scrape_google(q, num, proxy_cfg)

            for i, r in enumerate(results, 1):
                await Actor.push_data({
                    'position': i, 'title': r['title'], 'url': r['url'],
                    'snippet': r['snippet'], 'query': q, 'source': 'google',
                    'scrapedAt': datetime.now(timezone.utc).isoformat()
                })

            Actor.log.info(f'Returned {len(results)} results for "{q}"')


if __name__ == '__main__':
    asyncio.run(main())
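The link and class matching in parse_results_soup is easy to get subtly wrong, so the standalone sketch below exercises both rules on a hand-written HTML fragment. It is a simplified restatement for illustration, not the actor's code: G_CLASS is a hypothetical stricter class test that accepts only the exact token "g" (what the "div.g" comment describes), extract_url here entity-decodes the href and drops non-http targets, and the SAMPLE markup is invented. Real Google markup is far larger and changes often.

```python
import re
from html import unescape
from urllib.parse import unquote

# Hypothetical stricter class test: the class list must contain the exact token "g".
G_CLASS = re.compile(r'class="(?:[^"]*\s)?g(?:\s[^"]*)?"')


def extract_url(href: str) -> str:
    """Resolve a Google redirect href (/url?q=...) to an absolute URL, or '' for internal links."""
    if not href:
        return ''
    href = unescape(href)  # raw HTML attributes encode '&' as '&amp;'
    m = re.search(r'/url\?q=([^&]+)', href) or re.search(r'[?&]url=([^&]+)', href)
    target = unquote(m.group(1)) if m else href
    return target if target.startswith('http') else ''


# Invented sample markup; real SERP HTML is much larger and changes frequently.
SAMPLE = (
    '<div class="g"><a href="/url?q=https://example.com/a%3Fb%3D1&amp;sa=U">'
    '<h3>Example &amp; Co</h3></a></div>'
    '<div class="MjjYud"><a href="/search?q=next"><h3>Internal</h3></a></div>'
)

hrefs = re.findall(r'href="([^"]+)"', SAMPLE)
titles = [unescape(t) for t in re.findall(r'<h3>(.*?)</h3>', SAMPLE)]
print([extract_url(h) for h in hrefs])  # ['https://example.com/a?b=1', '']
print(titles)                           # ['Example & Co', 'Internal']
print([bool(G_CLASS.search(c)) for c in ('class="g"', 'class="abc g xyz"', 'class="MjjYud"', 'class="g-blk"')])
# [True, True, False, False]
```

A looser pattern such as `class="[^"]*g[^"]*"` would also accept classes like `g-blk` or any class name containing a lowercase g, which is why the token-based test is shown here.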

requirements.txt

apify>=2.0.0
requests>=2.31.0
beautifulsoup4>=4.12.0
curl-cffi>=0.7.0
lxml>=5.0.0

.actor/actor.json

{
  "$schema": "https://apify.com/schemas/v1/actor.schema.json",
  "actorSpecification": 1,
  "name": "google-search-results-scraper",
  "title": "Google Search Results Scraper",
  "description": "Google Search Results Scraper returns normalized, review-ready API data with robust inputs, consistent outputs, and pay-per-result pricing.",
  "version": "1.0",
  "buildTag": "latest",
  "dockerfile": "../Dockerfile",
  "input": "./input_schema.json",
  "readme": "../README.md",
  "categories": [
    "SEARCH"
  ],
  "pricingInfos": [
    {
      "pricingModel": "PAY_PER_EVENT",
      "apifyMarginPercentage": 0.2,
      "pricingPerEvent": {
        "actorChargeEvents": {
          "apify-actor-start": {
            "eventTitle": "Actor Start",
            "eventDescription": "Charged once when the actor starts after input validation succeeds.",
            "eventPriceUsd": 5e-05,
            "isOneTimeEvent": true
          },
          "apify-default-dataset-item": {
            "eventTitle": "Result",
            "eventDescription": "Charged for each normalized dataset item successfully pushed by the actor.",
            "eventPriceUsd": 0.002,
            "isPrimaryEvent": true
          }
        }
      },
      "reasonForChange": "Pre-live PAY_PER_EVENT pricing for Google Search Results Scraper: $2.00/1K results."
    }
  ]
}
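Under this pricing, a run is charged the one-time apify-actor-start event plus one apify-default-dataset-item event per pushed item, which is where "from $2.00 / 1,000 results" comes from. The sketch below does that arithmetic with Decimal to avoid float rounding, and inverts it to find how many items fit a budget such as the input schema's maxCostPerRun. run_cost_usd and max_items_for_budget are hypothetical helpers: main.py does not enforce a budget, and the sketch ignores apifyMarginPercentage and any platform usage costs.

```python
from decimal import Decimal, ROUND_FLOOR

# Event prices copied from pricingPerEvent in .actor/actor.json (USD).
START_PRICE = Decimal('0.00005')  # apify-actor-start, charged once per run
ITEM_PRICE = Decimal('0.002')     # apify-default-dataset-item, charged per pushed item


def run_cost_usd(items: int) -> Decimal:
    """Event charges for one run that pushes `items` dataset items."""
    return START_PRICE + ITEM_PRICE * items


def max_items_for_budget(budget_usd: float) -> int:
    """Largest item count whose event charges fit in budget_usd (0 if the start event alone does not)."""
    remaining = Decimal(str(budget_usd)) - START_PRICE
    if remaining <= 0:
        return 0
    return int((remaining / ITEM_PRICE).to_integral_value(rounding=ROUND_FLOOR))


print(run_cost_usd(25))         # 0.05005
print(run_cost_usd(1000))       # 2.00005
print(max_items_for_budget(5))  # 2499
```

With the default maxCostPerRun of 5, 2,500 items would cost $5.00005, one start-event fee over budget, so the cap lands at 2,499.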

.actor/input_schema.json

{
  "title": "Google Search Results Scraper Input",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "query": {
      "title": "Query",
      "type": "string",
      "description": "Search query to run on Google, for example a keyword or phrase.",
      "editor": "textfield"
    },
    "queries": {
      "title": "Queries",
      "type": "array",
      "description": "Optional batch list of query strings. Used when query is empty or when batching is desired.",
      "editor": "stringList",
      "items": {
        "type": "string"
      }
    },
    "urls": {
      "title": "Direct URLs",
      "type": "array",
      "description": "Optional direct URLs to process. These take priority over discovery when provided.",
      "editor": "stringList",
      "items": {
        "type": "string"
      }
    },
    "maxResults": {
      "title": "Maximum results",
      "type": "integer",
      "description": "Maximum number of dataset items to emit.",
      "default": 25,
      "minimum": 1,
      "maximum": 1000
    },
    "maxCostPerRun": {
      "title": "Maximum cost per run",
      "type": "number",
      "description": "Optional guardrail in USD. The actor caps output before exceeding this amount.",
      "default": 5,
      "minimum": 0.01
    },
    "includeRaw": {
      "title": "Include raw metadata",
      "type": "boolean",
      "description": "Include collection diagnostics and raw source metadata where available.",
      "default": false
    },
    "proxyConfiguration": {
      "title": "Proxy configuration",
      "type": "object",
      "description": "Apify proxy settings for production runs.",
      "editor": "proxy",
      "default": {
        "useApifyProxy": true
      }
    }
  },
  "required": []
}
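The schema declares defaults and numeric bounds but leaves "required" empty, so an input with no query at all is schema-valid; main.py handles that case itself and does not read maxCostPerRun or includeRaw. The sketch below shows one way to apply the schema's defaults and bounds in code, for example for local runs where nothing fills them in. apply_schema, DEFAULTS and BOUNDS are hypothetical names, not part of main.py or the Apify SDK.

```python
from __future__ import annotations

import copy
from typing import Any

# Defaults and numeric bounds copied from .actor/input_schema.json.
DEFAULTS: dict[str, Any] = {
    'maxResults': 25,
    'maxCostPerRun': 5,
    'includeRaw': False,
    'proxyConfiguration': {'useApifyProxy': True},
}
BOUNDS: dict[str, tuple[float, float | None]] = {
    'maxResults': (1, 1000),
    'maxCostPerRun': (0.01, None),  # the schema sets no maximum
}


def apply_schema(raw: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of raw with schema defaults filled in; raise ValueError on invalid input."""
    inp = {**copy.deepcopy(DEFAULTS), **raw}
    for key, (low, high) in BOUNDS.items():
        value = inp[key]
        if value < low or (high is not None and value > high):
            raise ValueError(f'{key}={value!r} is outside [{low}, {high}]')
    # "required" is empty in the schema, so an input with no query at all passes schema validation.
    if not inp.get('query') and not inp.get('queries'):
        raise ValueError('provide "query" or a non-empty "queries" list')
    return inp


print(apply_schema({'query': 'coffee grinders'})['maxResults'])  # 25
```

The defaults are deep-copied so that callers mutating the returned proxyConfiguration dict cannot alter DEFAULTS for later calls.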

.actor/output_schema.json

{
  "actorOutputSchemaVersion": 2,
  "title": "Google Search Results Scraper Output Schema",
  "description": "Template output schema for google-search-results-scraper based on SEO, Web & Market Intelligence category.",
  "schema": {
    "type": "object",
    "required": [
      "query",
      "position",
      "title",
      "url",
      "source",
      "scrapedAt"
    ],
    "properties": {
      "query": {
        "type": [
          "string",
          "null"
        ],
        "description": "The search query used to retrieve these results."
      },
      "position": {
        "type": [
          "number",
          "null"
        ],
        "description": "Position in the search results (1-indexed)."
      },
      "title": {
        "type": [
          "string",
          "null"
        ],
        "description": "Title or headline of the item."
      },
      "url": {
        "type": [
          "string",
          "null"
        ],
        "description": "Full URL of the item or profile."
      },
      "snippet": {
        "type": [
          "string",
          "null"
        ],
        "description": "Text snippet or excerpt from the result."
      },
      "source": {
        "type": [
          "string",
          "null"
        ],
        "description": "Platform or source the data was collected from."
      },
      "content": {
        "type": [
          "string",
          "null"
        ],
        "description": "Full page content or article body."
      },
      "metaTitle": {
        "type": [
          "string",
          "null"
        ],
        "description": "HTML meta title of the page."
      },
      "metaDescription": {
        "type": [
          "string",
          "null"
        ],
        "description": "HTML meta description of the page."
      },
      "wordCount": {
        "type": [
          "number",
          "null"
        ],
        "description": "Word count of the content."
      },
      "readingTime": {
        "type": [
          "number",
          "null"
        ],
        "description": "Estimated reading time in seconds."
      },
      "markdown": {
        "type": [
          "string",
          "null"
        ],
        "description": "Page content converted to clean markdown format."
      },
      "backlinks": {
        "type": [
          "number",
          "null"
        ],
        "description": "Number of backlinks pointing to the domain."
      },
      "competition": {
        "type": [
          "string",
          "null"
        ],
        "description": "Estimated competition level for the search term."
      },
      "searchVolume": {
        "type": [
          "number",
          "null"
        ],
        "description": "Estimated monthly search volume for the keyword."
      },
      "scrapedAt": {
        "type": [
          "string",
          "null"
        ],
        "description": "ISO-8601 timestamp when the data was collected."
      },
      "runId": {
        "type": [
          "string",
          "null"
        ],
        "description": "Internal Apify run identifier for audit and traceability."
      }
    }
  }
}
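The schema lists six required keys, but every field's type also allows null, so "required" only guarantees that the key is present. The dependency-free sketch below checks an item against the fields main.py actually emits; check_item, REQUIRED and STRING_FIELDS are hypothetical names, and a full JSON Schema validator would be the more thorough option.

```python
from datetime import datetime
from typing import Any

# Field lists mirror "required" and the declared string types in .actor/output_schema.json.
REQUIRED = ('query', 'position', 'title', 'url', 'source', 'scrapedAt')
STRING_FIELDS = ('query', 'title', 'url', 'snippet', 'source', 'scrapedAt')


def check_item(item: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the item fits the schema."""
    problems = [f'missing {k}' for k in REQUIRED if k not in item]
    for k in STRING_FIELDS:
        if item.get(k) is not None and not isinstance(item[k], str):
            problems.append(f'{k} should be a string or null')
    pos = item.get('position')
    # bool is a subclass of int in Python, so exclude it explicitly.
    if pos is not None and (isinstance(pos, bool) or not isinstance(pos, (int, float))):
        problems.append('position should be a number or null')
    if isinstance(item.get('scrapedAt'), str):
        try:
            datetime.fromisoformat(item['scrapedAt'])
        except ValueError:
            problems.append('scrapedAt is not ISO-8601')
    return problems


item = {
    'position': 1, 'title': 'Example', 'url': 'https://example.com',
    'snippet': '', 'query': 'example', 'source': 'google',
    'scrapedAt': '2025-01-01T12:00:00+00:00',
}
print(check_item(item))  # []
```

Applied to the error record main.py pushes when no query is given, this check reports query, position, title and url as missing, since that record carries only error, message, received_input, source and scrapedAt.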

actor_package/__init__.py

actor_package/main.py

from pathlib import Path
import runpy

runpy.run_path(str(Path(__file__).resolve().parents[1] / 'main.py'), run_name='__main__')

actor_package/main.py

from pathlib import Path
import runpy

runpy.run_path(str(Path(__file__).resolve().parents[1] / 'main.py'), run_name='__main__')

Google Search Results Scraper

apify/google-search-scraper

Scrape Google Search Engine Results Pages (SERPs). Select the country or language and extract organic and paid results, AI Mode, AI overviews, ads, queries, People Also Ask, prices, reviews, like a Google SERP API. Export data, run the scraper via API, schedule runs, or integrate with other tools.

Apify

145K

4.5

(152)

Fast Google Search Results Scraper

6sigmag/fast-google-search-results-scraper

Paste keywords in bulk → get clean, clickable URLs. This ultra-lightweight Google SERP scraper is built for non-technical teams who need links fast for lead prospecting and market research. No giant payloads, no complex setup

David

283

5.0

(4)

Google Search Results Scraper (Pay Per Result)

vtrdev/google-search-results-serp-scraper

Google SERP scraper with dual parsing, smart title recovery, and proxy support. Scrape multiple pages with localized results. Ideal for SEO tracking, content research, and brand monitoring — billed only per result.

VTRDEV

101

Google Search Results Scraper (SERP)

apidojo/google-search-scraper

SERP - Google Search Scraper with unbeatable pricing! $0.002/query gets you 10 results FREE + $0.0002/extra item. Event-based billing = pay only for what you need. Ideal for SEO monitoring, keyword research & market analysis. No proxy required!

API Dojo

606

3.2

(10)

Google Search Result Scraper

getdataforu/google-search-result-scraper

Google Search Result Scraper extracts organic and paid results, ads, Related, People Also Ask, News, Videos and Images Answers. Supports all available markets and languages. Download data as HTML table, JSON, CSV, Excel, XML or RSS

EMT Crawler

5.0

(2)

Google Search Results Scraper

crawlerbros/google-search-results-scraper

Scrape Google Search result pages (SERPs) and extract structured data: organic results, paid ads, related queries, and People Also Ask. Supports country/language targeting, time filters, pagination, and CSV-friendly output.

Crawler Bros

Google Search Results Scraper

scraperforge/google-search-results-scraper

🔎 Google Search Results Scraper extracts live SERP data—titles, snippets, URLs, rankings, ads & featured snippets—by location, language and device. ⚙️ Built for SEO, keyword research, competitor analysis & content planning. 🚀 Fast, scalable, proxy-ready.

ScraperForge

Google Search Results (SERP) Scraper

scrapebase/google-search-results-serp-scraper

🔎 Google Search Results (SERP) Scraper extracts organic results, titles, URLs, snippets, ads, PAA, featured snippets, local pack & more by keyword, location, language & device. ⚙️ Ideal for SEO monitoring, rank tracking & competitor intel. 🚀 Fast, scalable, anti-blocking.

ScrapeBase

Google Search Results Scraper

futurizerush/google-search-results-scraper

Scrape Google search results with optional deep crawl for SEO research, competitor analysis, and content preparation.

Rush

115

5.0

(1)

Google Search Results Scraper

scrapepilotapi/google-search-results-scraper

🔎 Google Search Results Scraper captures SERP data at scale—titles, URLs, snippets, featured snippets, PAA, ads & more. 📊 Perfect for SEO, PPC, market intel, content research & competitor monitoring. ⚡ Fast, reliable, proxy-ready with geo & device targeting; JSON/CSV output.