Google Patents Scraper

Pricing

from $5.99 / 1,000 results

🔎 Google Patents Scraper extracts structured patent data from Google Patents: titles, abstracts, inventors, assignees, CPC/IPC, citations, claims, dates & PDF links. ⚡ Fast, reliable, and bulk-ready for IP research, competitive intel & R&D landscaping. 📊 CSV/JSON/API.

Developer

Scrapier

Maintained by Community

The Google Patents Scraper extracts structured patent records from Google Patents at scale. Each record includes titles, abstracts, inventors, assignees, dates, page links, and PDF links. Optional deep fields add claims, citations, CPC/IPC classifications, and patent family. The scraper replaces error-prone manual copy-and-paste by automating extraction through Google's xhr/query and xhr/result endpoints. That makes it a production-ready Google Patents crawler for marketers, developers, data analysts, and researchers who run bulk jobs. With CSV/JSON output and Apify API access, it supports end-to-end workflows, from quick lookups to large datasets for competitive intelligence, IP monitoring, and R&D landscaping.
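For orientation, here is a minimal sketch of how a search could be turned into a request URL for the xhr/query endpoint. It rests on one assumption that the actor's documentation does not confirm: the endpoint takes the whole search as a single URL-encoded `url` parameter, holding the same query string that appears after `?` on a patents.google.com search page. The helper name `build_xhr_query_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumed endpoint path; not taken from the actor's code.
XHR_QUERY = "https://patents.google.com/xhr/query"


def build_xhr_query_url(params: dict[str, str]) -> str:
    """Build an xhr/query URL from Google Patents search parameters.

    Assumption: the search is passed as one URL-encoded string in the
    `url` parameter, e.g. "q=graph+neural+network&num=10".
    """
    inner = urlencode(params)  # "q=graph+neural+network&num=10"
    # safe="" also encodes "=", "&", and "+" so the inner query stays intact.
    return f"{XHR_QUERY}?url={quote(inner, safe='')}&exp="


print(build_xhr_query_url({"q": "graph neural network", "num": "10"}))
```

Encoding the inner query as a whole keeps its own `&` separators from being confused with the outer URL's parameters.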

What data / output can you get?

Data field | Description | Example value
--- | --- | ---
patentNumber | Publication number parsed from search results | "US12438891B1"
title | Cleaned patent title text | "Systems and methods for machine learning–based inference"
abstract | Cleaned snippet/abstract from results | "A system includes a model trained to..."
inventors | List of inventor names | ["Jane Doe", "John Smith"]
assignee | Assignee/owner from results | "Example Corp."
filingDate | Filing date (YYYY-MM-DD) | "2021-06-01"
publicationDate | Publication date (YYYY-MM-DD) | "2023-09-14"
grantDate | Grant date, when available (YYYY-MM-DD) | "2024-05-10"
url | Canonical Google Patents URL | "https://patents.google.com/patent/US12438891B1"
pdfUrl | Direct link to the patent PDF on Google's storage | "https://patentimages.storage.googleapis.com/US12438891B1.pdf"
scrapedAt | ISO 8601 extraction timestamp in UTC (Z suffix) | "2026-04-27T05:27:42Z"
classifications.cpc | CPC codes (when classification enrichment is enabled) | ["G06N20/00"]
classifications.ipc | IPC codes, derived when present in the source (may be empty) | ["G06N 20/00"]
citations.citedBy | Forward citations (when citation enrichment is enabled) | ["US2020123456A1", "EP3456789B1"]
citations.references | Backward citations (when citation enrichment is enabled) | ["US9876543B2"]
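The table describes title and abstract as "cleaned" text. Search-result snippets on Google Patents may contain highlight tags such as `<b>` and HTML entities. The sketch below shows one plausible cleaning step. It is an assumption about what "cleaned" means, not the actor's actual code, and `clean_text` is a hypothetical name.

```python
import html
import re

_TAG = re.compile(r"<[^>]+>")
_SPACE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Strip HTML tags, decode entities, and collapse whitespace."""
    no_tags = _TAG.sub("", raw)       # drop highlight tags like <b>...</b>
    decoded = html.unescape(no_tags)  # &amp; -> &, &hellip; -> …
    return _SPACE.sub(" ", decoded).strip()


print(clean_text("Graph <b>neural</b> network &amp; method\n  for inference"))
```

Tags are stripped before entities are decoded. That way an escaped `&lt;b&gt;` in the source text survives as a literal `<b>` instead of being deleted as markup.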

Optional fields added when enrichment is enabled:

  • fullText.description, fullText.claims: long-form description text (includeFullText) and claims text (includeClaims).
  • patentFamily: list of related publications in the same family (includePatentFamily).

Results are stored in an Apify Dataset, which you can export as CSV or JSON or read through the Apify API for automations and pipelines.
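Nested fields such as classifications.cpc have to become flat columns in a CSV file. Apify's own CSV export flattens nested data by itself. If you post-process the JSON yourself, a step like the hypothetical `flatten` helper below produces the dotted names used in the field table. Joining list values with "; " is an arbitrary choice for this sketch.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys; join lists into one string."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))  # recurse into objects
        elif isinstance(value, list):
            flat[name] = "; ".join(str(v) for v in value)  # one CSV cell per list
        else:
            flat[name] = value
    return flat


row = flatten({
    "patentNumber": "US12438891B1",
    "classifications": {"cpc": ["G06N20/00", "G06F17/18"], "ipc": []},
})
print(row)
```

The resulting dicts can be passed directly to `csv.DictWriter`.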

Key features

  • โšก๏ธ Smart proxy ladder & resilience
    Automatically tries direct requests first, then falls back to Apify datacenter proxies and finally residential proxies (up to 3 attempts). Successful responses โ€œstickโ€ to the working proxy tier for subsequent requests.

  • ๐Ÿง  OR-merged query builder
    Combines keywords, publication numbers, and q= terms from search URLs into a single OR-union so you can scrape Google Patents broadly without missing results when mixing inputs.

  • ๐Ÿ“ฆ Enrichment on demand
    Toggle includeFullText, includeClaims, includeCitations, includePatentFamily, and includeClassifications to tailor each run. Get exactly the depth you need for IP research and Google Patents text mining.

  • ๐Ÿ“ˆ Bulk-ready pagination
    Handles search pagination while respecting your maxResults cap โ€” perfect for Google Patents bulk download workflows and building large datasets.

  • ๐Ÿ’พ CSV/JSON and API-friendly
    Export data to CSV or JSON and access programmatically via the Apify API โ€” ideal for developers and teams building ingestion pipelines or a Google Patents API alternative.

  • ๐Ÿ“Ž PDF links included
    Each record includes pdfUrl, enabling Google Patents PDF downloader workflows in your own system.

  • ๐Ÿ’ป Python-powered for reliability
    Built on Python (aiohttp + Apify SDK), with concurrent detail fetching for efficient Google Patents data extraction.
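The actor is described as fetching detail pages concurrently and saving rows as soon as they are ready. Below is a minimal asyncio sketch of that pattern, bounded by a semaphore. `fetch_detail` is a hypothetical stand-in for the real HTTP call; aiohttp is left out to keep the example self-contained. The concurrency limit of 5 is an arbitrary assumption.

```python
import asyncio


async def fetch_detail(number: str) -> dict:
    """Hypothetical stand-in for an HTTP request to a patent detail page."""
    await asyncio.sleep(0.01)
    return {"patentNumber": number, "claims": f"claims of {number}"}


async def enrich_all(numbers: list[str], limit: int = 5) -> list[dict]:
    semaphore = asyncio.Semaphore(limit)  # cap simultaneous requests

    async def bounded(number: str) -> dict:
        async with semaphore:
            return await fetch_detail(number)

    tasks = [asyncio.create_task(bounded(n)) for n in numbers]
    rows = []
    # as_completed yields in finish order, so each row can be saved
    # as soon as its detail page has loaded.
    for next_done in asyncio.as_completed(tasks):
        row = await next_done
        rows.append(row)  # a real actor would push the row to its dataset here
    return rows


rows = asyncio.run(enrich_all(["US12438891B1", "EP1234567B1", "WO2020123456A1"]))
print(len(rows))
```

The semaphore limits load on the source. `as_completed` is what lets rows appear incrementally instead of all at once at the end.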

How to use Google Patents Scraper - step by step

  1. Sign in to your Apify account.
  2. Open the “Google Patents Scraper” actor.
  3. Add input data:
    • Use urls for patent page links, search links with q=, or plain keyword lines (plain lines are treated as search phrases).
    • You can also set searchQuery and/or patentNumbers; all inputs are OR-merged so you get a broad combined result.
  4. Narrow results (optional): set assignee, inventor, country, dateFrom/dateTo (absolute or relative, like “30 days”), and patentType to filter your query.
  5. Set limits and enrichment: choose maxResults (use 0 for no cap), and toggle includeFullText, includeClaims, includeCitations, includePatentFamily, includeClassifications.
  6. Configure proxyConfiguration only if your workspace requires Apify Proxy for large or repeated runs.
  7. Run the actor. It paginates results and logs progress. If enrichment is enabled, details load in parallel and rows appear in the dataset as soon as they're ready.
  8. Download your dataset as CSV or JSON or connect via the Apify API for downstream processing and Google Patents CSV export workflows.
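Step 3 says that urls, searchQuery, and patentNumbers are OR-merged. The sketch below shows one way that merge could work. It relies on assumptions the actor's code does not confirm:

  • A /patent/<number> URL contributes its publication number.
  • A search URL contributes its q= value.
  • Any other line is a keyword phrase.
  • Terms are joined with the OR operator of Google Patents search.

`merge_query` is a hypothetical helper. It returns None for the "no query" case described in the note under the parameter list.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def merge_query(urls: list[str], search_query: str = "",
                patent_numbers: Optional[list[str]] = None) -> Optional[str]:
    """OR-merge keywords, search-URL q= terms, and publication numbers."""
    terms: list[str] = []
    for line in urls:
        line = line.strip()
        if not line:
            continue
        parsed = urlparse(line)
        if parsed.netloc.endswith("patents.google.com"):
            if parsed.path.startswith("/patent/"):
                # /patent/US12438891B1 -> "US12438891B1"
                terms.append(parsed.path.split("/")[2])
            else:
                terms.extend(parse_qs(parsed.query).get("q", []))
        else:
            terms.append(line)  # plain line = search phrase
    if search_query.strip():
        terms.append(search_query.strip())
    terms.extend(patent_numbers or [])

    unique = list(dict.fromkeys(terms))  # de-duplicate, keep first-seen order
    if not unique:
        return None  # nothing to search for; the caller should exit with a log
    return " OR ".join(f"({t})" for t in unique)


print(merge_query(
    ["https://patents.google.com/patent/US12438891B1", "machine learning",
     "https://patents.google.com/?q=graph+neural+network"],
    "computer vision",
    ["EP1234567B1", "WO2020123456A1"],
))
```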

Pro Tip: Automate end-to-end pipelines by pulling results via the Apify API into your Python scripts for further analysis, modeling, or Google Patents text mining.
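As a starting point for such a pipeline, the sketch below reads a finished run's dataset through the Apify API's "get dataset items" endpoint, using only the standard library. You must supply the dataset ID (a run reports it as defaultDatasetId) and your API token. `dataset_items_url` and `fetch_items` are hypothetical helper names, and only the first runs without network access.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "json",
                      limit: int = 1000, offset: int = 0) -> str:
    """URL for the Apify 'get dataset items' endpoint (one page)."""
    query = urlencode({"format": fmt, "clean": "true", "limit": limit, "offset": offset})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"


def fetch_items(dataset_id: str, token: str) -> list[dict]:
    """Download the first page of items as JSON (network call; needs a token)."""
    request = urllib.request.Request(
        dataset_items_url(dataset_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


print(dataset_items_url("YOUR_DATASET_ID"))
```

For datasets larger than one page, call `fetch_items` repeatedly with an increasing offset. Pass `fmt="csv"` to get a CSV export instead.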

Use cases

Use case | Description
--- | ---
Competitive intelligence for R&D | Analyze assignees, claims, and citations to map competitor focus areas and technology trajectories.
Patent landscaping for IP teams | Build comprehensive datasets by country, date range, and document kind to identify white spaces and clusters.
Academic research & text mining | Build corpora of abstracts, optional descriptions, and claims for NLP pipelines.
Rapid patent metadata extraction | Extract titles, inventors, dates, and links at scale for dashboards and reporting.
Google Patents search results scraper | Turn keyword searches into structured datasets without manual export.
Dataset creation & bulk download | Use maxResults and CSV/JSON export for Google Patents dataset download workflows.
PDF library building | Use pdfUrl to fetch and archive PDFs in your own storage, tied to each patent record.

Why choose Google Patents Scraper?

The Google Patents Scraper is built for precision, scale, and real-world reliability.

  • 🎯 Structured accuracy: Normalizes Google's xhr/query results into clean fields for analytics and modeling.
  • 🔀 Smarter queries: OR-merge keywords, search URLs, and publication numbers to avoid missed matches.
  • 🚀 Scalable runs: Handles pagination and parallel detail loading for large batches.
  • 🛠️ Developer access: Export CSV/JSON and integrate via the Apify API, which fits Python-based workflows.
  • 🛡️ Robust connectivity: Direct-first requests with automatic fallback to datacenter and residential proxies.
  • ✅ Public-data focus: Designed to extract publicly available records from patents.google.com responsibly.
  • 💸 Operational efficiency: Reduce manual effort and streamline Google Patents data extraction across teams.

In short: a production-grade Google Patents crawler that replaces brittle manual or browser-extension-based workflows.

Is it legal to scrape Google Patents?

Yes, when used responsibly. This actor extracts public data from patents.google.com and does not access private or authenticated content.

Guidelines for compliant use:

  • Use only publicly available information and respect Google's terms of service.
  • Ensure your usage aligns with applicable data protection laws (e.g., GDPR, CCPA).
  • Avoid collecting or processing personal data beyond what is publicly provided in patent records.
  • Validate your specific use case with your legal team, especially for redistribution or commercial reuse.

Input parameters & output format

Example JSON input

{
  "urls": [
    "https://patents.google.com/patent/US12438891B1",
    "machine learning",
    "https://patents.google.com/?q=graph+neural+network"
  ],
  "searchQuery": "computer vision",
  "patentNumbers": ["EP1234567B1", "WO2020123456A1"],
  "assignee": "Example Corp",
  "inventor": "Jane Doe",
  "country": "US",
  "dateFrom": "6 months",
  "dateTo": "",
  "patentType": "ANY",
  "maxResults": 25,
  "includeFullText": false,
  "includeClaims": true,
  "includeCitations": true,
  "includePatentFamily": true,
  "includeClassifications": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Parameters

  • urls (array)
    Description: One entry per line, each a patent URL, a search URL (with q=), or plain keywords. Plain lines are treated as search phrases. All entries are OR-merged with the Keywords field (searchQuery).
    Default: n/a (prefill shown in UI)
    Required: No

  • searchQuery (string)
    Description: Main keyword search. OR-merged with any q= from search URLs and with publication numbers from patent URLs.
    Default: ""
    Required: No

  • patentNumbers (array)
    Description: Specific patent IDs. Combined with keywords using OR.
    Default: []
    Required: No

  • assignee (string)
    Description: Focus on patents owned by a particular organization.
    Default: ""
    Required: No

  • inventor (string)
    Description: Find patents listing a specific inventor.
    Default: ""
    Required: No

  • country (string)
    Description: Patent office / region filter (e.g., US, EP, WO) or ANY to search everywhere.
    Default: "ANY"
    Required: No

  • dateFrom (string)
    Description: Earliest publication date: absolute (YYYY-MM-DD) or relative (e.g., 30 days, 6 months).
    Default: ""
    Required: No

  • dateTo (string)
    Description: Latest publication date: absolute or relative.
    Default: ""
    Required: No

  • patentType (string)
    Description: Limit to grants, applications, or designs, or use ANY for all types.
    Default: "ANY"
    Required: No

  • maxResults (integer)
    Description: Cap how many patents to collect; use 0 for no limit.
    Default: 10
    Required: No

  • includeFullText (boolean)
    Description: Adds the full written description (large text).
    Default: false
    Required: No

  • includeClaims (boolean)
    Description: Adds the patent claims text.
    Default: true
    Required: No

  • includeCitations (boolean)
    Description: Adds backward and forward citation lists where available.
    Default: true
    Required: No

  • includePatentFamily (boolean)
    Description: Adds related publications in the same family.
    Default: true
    Required: No

  • includeClassifications (boolean)
    Description: Adds CPC/IPC classification codes.
    Default: true
    Required: No

  • proxyConfiguration (object)
    Description: Optional Apify Proxy settings. Leave it at the default (off) if you don't need it.
    Default: {} (UI may prefill useApifyProxy)
    Required: No
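dateFrom and dateTo accept either an absolute date or a relative span such as 30 days or 6 months. The actor does not document how it resolves relative spans. The sketch below is one plausible approach, assuming a span counts back from today and that a month is approximated as 30 days and a year as 365. `resolve_date` is a hypothetical name.

```python
import re
from datetime import date, timedelta
from typing import Optional

_RELATIVE = re.compile(r"^\s*(\d+)\s*(day|week|month|year)s?\s*$", re.IGNORECASE)
# Approximations (assumption): calendar months and leap years are ignored.
_DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}


def resolve_date(value: str, today: Optional[date] = None) -> Optional[date]:
    """Turn 'YYYY-MM-DD' or a relative span like '6 months' into a date."""
    value = value.strip()
    if not value:
        return None  # empty string means no bound
    match = _RELATIVE.match(value)
    if match:
        amount, unit = int(match.group(1)), match.group(2).lower()
        base = today or date.today()
        return base - timedelta(days=amount * _DAYS_PER_UNIT[unit])
    return date.fromisoformat(value)  # raises ValueError on malformed input


print(resolve_date("30 days", today=date(2026, 4, 27)))
```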

Note: At least one of urls, searchQuery, or patentNumbers must yield a query; otherwise the run exits and logs a message explaining why.

Example JSON output

{
  "patentNumber": "US12438891B1",
  "title": "Systems and methods for machine learning–based inference",
  "abstract": "A system includes a model trained to...",
  "inventors": ["Jane Doe", "John Smith"],
  "assignee": "Example Corp.",
  "filingDate": "2021-06-01",
  "publicationDate": "2023-09-14",
  "grantDate": "2024-05-10",
  "classifications": {
    "cpc": ["G06N20/00", "G06F17/18"],
    "ipc": ["G06N 20/00", "G06F 17/18"]
  },
  "url": "https://patents.google.com/patent/US12438891B1",
  "pdfUrl": "https://patentimages.storage.googleapis.com/US12438891B1.pdf",
  "scrapedAt": "2026-04-27T05:27:42Z",
  "citations": {
    "citedBy": ["EP3456789B1"],
    "references": ["US9876543B2", "WO2020123456A1"]
  },
  "fullText": {
    "claims": "1. A method comprising ...",
    "description": "In some implementations, the system comprises ..."
  },
  "patentFamily": ["EP1234567B1", "WO2020123456A1"]
}

Notes:

  • Optional fields (fullText, patentFamily) appear only when the corresponding include* flags are enabled.
  • Some fields may be empty strings or empty arrays when the source doesn't provide them or when a detail page can't be retrieved. The item is still saved with whatever data is available.
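Because optional fields may be missing or empty, downstream code should read records defensively. The sketch below maps one dataset item onto a Python dataclass and defaults absent fields to empty values. `PatentRecord` and `from_item` are hypothetical names. The keys they read come from the example output above, and only a subset of fields is modeled.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PatentRecord:
    patent_number: str
    title: str = ""
    inventors: list[str] = field(default_factory=list)
    publication_date: str = ""
    cpc: list[str] = field(default_factory=list)
    cited_by: list[str] = field(default_factory=list)
    claims: str = ""


def from_item(item: dict[str, Any]) -> PatentRecord:
    """Build a PatentRecord, tolerating missing or empty optional fields."""
    # `or {}` also covers keys that are present but null/empty.
    classifications = item.get("classifications") or {}
    citations = item.get("citations") or {}
    full_text = item.get("fullText") or {}
    return PatentRecord(
        patent_number=item["patentNumber"],
        title=item.get("title") or "",
        inventors=item.get("inventors") or [],
        publication_date=item.get("publicationDate") or "",
        cpc=classifications.get("cpc") or [],
        cited_by=citations.get("citedBy") or [],
        claims=full_text.get("claims") or "",
    )


minimal = from_item({"patentNumber": "US12438891B1", "title": "Example"})
print(minimal.cpc, repr(minimal.claims))
```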

FAQ

Do I need to log in or provide cookies to scrape Google Patents?

No. The actor works with publicly available endpoints on patents.google.com and does not require login or cookies. It uses direct requests first and only falls back to proxies if needed.

Can I use this with Python or an API?

Yes. Results are stored in an Apify Dataset, which you can access via the Apify API. This makes it easy to integrate into Python scripts or downstream automation.

How many patents can I scrape in one run?

You control this with maxResults. Set a specific number to cap output or use 0 for no limit. The actor paginates results automatically and enriches details in parallel when requested.
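The interplay between pagination and maxResults can be sketched as a loop that requests pages until the cap is reached or a page comes back empty, with 0 meaning no cap. `fetch_page` is a hypothetical callable standing in for one search request. The page size of 10 in the fake source is an assumption for the demo.

```python
from typing import Callable


def collect(fetch_page: Callable[[int], list[dict]], max_results: int) -> list[dict]:
    """Page through results until max_results is hit (0 = no cap) or pages run out."""
    results: list[dict] = []
    page = 0
    while True:
        batch = fetch_page(page)
        if not batch:
            break  # source exhausted
        results.extend(batch)
        if max_results and len(results) >= max_results:
            return results[:max_results]  # trim the last page to the cap
        page += 1
    return results


def fake_page(page: int, total: int = 25, size: int = 10) -> list[dict]:
    """Fake source with 25 results served 10 per page."""
    start = page * size
    return [{"n": i} for i in range(start, min(start + size, total))]


print(len(collect(fake_page, 12)), len(collect(fake_page, 0)))
```

Stopping as soon as the cap is reached avoids requesting pages whose results would be discarded anyway.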

Can I export to CSV or JSON?

Yes. You can export your dataset as CSV or JSON directly from Apify. This supports Google Patents CSV export and broader Google Patents dataset download use cases.

Does it include patent PDFs?

Each record includes a pdfUrl pointing to Google's patent image storage (patentimages.storage.googleapis.com). Use this link to download the PDFs yourself, for example as part of a Google Patents PDF downloader workflow.
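A minimal downloader for the pdfUrl field might look like the sketch below. It names each file after the record's patentNumber and skips files that already exist. `pdf_path` and `download_pdf` are hypothetical helpers, and only the path logic runs without network access.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Optional


def pdf_path(record: dict, folder: Path) -> Path:
    """Local file path for a record's PDF, named after its patent number."""
    safe_name = "".join(c for c in record["patentNumber"] if c.isalnum())
    return folder / f"{safe_name}.pdf"


def download_pdf(record: dict, folder: Path) -> Optional[Path]:
    """Download record['pdfUrl'] once; return None if the record has no link."""
    url = record.get("pdfUrl")
    if not url:
        return None
    target = pdf_path(record, folder)
    if target.exists():
        return target  # already archived
    folder.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk without loading into memory
    return target


print(pdf_path({"patentNumber": "US12438891B1"}, Path("pdfs")))
```

A production version would write to a temporary file and rename it on success, so an interrupted download is not later mistaken for a complete one.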

Can it extract claims, citations, classifications, and family?

Yes. Toggle includeClaims, includeCitations, includeClassifications, and includePatentFamily to add these fields to your output. You can also enable includeFullText to add the long-form description.

What filters are available?

You can filter by assignee, inventor, country (office/region), dateFrom/dateTo (absolute or relative), and patentType (grant, application, design, or ANY). The keyword inputs (urls, searchQuery, and patentNumbers) are OR-merged with each other for broad coverage, and the filters then narrow that combined query.
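The filters narrow the search, so they would be sent as separate constraints alongside the merged keyword term. The sketch below turns them into URL parameters. The parameter names (assignee, inventor, country, after, before) and the publication:YYYYMMDD date syntax are assumptions based on how patents.google.com search URLs commonly look. The actor does not document them. patentType is left out because its mapping is not documented either. `filter_params` is a hypothetical name.

```python
from datetime import date
from typing import Optional


def filter_params(assignee: str = "", inventor: str = "", country: str = "ANY",
                  date_from: Optional[date] = None,
                  date_to: Optional[date] = None) -> dict[str, str]:
    """Map the actor's filter inputs to assumed Google Patents URL parameters."""
    params: dict[str, str] = {}
    if assignee:
        params["assignee"] = assignee
    if inventor:
        params["inventor"] = inventor
    if country and country.upper() != "ANY":
        params["country"] = country.upper()  # "ANY" means no office filter
    if date_from:
        params["after"] = f"publication:{date_from:%Y%m%d}"
    if date_to:
        params["before"] = f"publication:{date_to:%Y%m%d}"
    return params


print(filter_params(assignee="Example Corp", country="us", date_from=date(2025, 10, 29)))
```

These parameters would sit next to the merged q= term in the same query string.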

How does it avoid getting blocked?

The actor tries direct requests first. On a block or failure, it automatically climbs a proxy ladder: datacenter proxies (e.g., SHADER), then residential proxies with up to 3 retries. After a successful proxy response, it “sticks” to the working tier.
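The proxy ladder can be modeled as an ordered list of tiers. Each request tries the current tier, moves one step up on failure, and the ladder remembers the tier that last worked. The sketch below is a simplified model that gives direct and datacenter one attempt each and residential up to 3. That reading of "up to 3 retries" is an assumption. `ProxyLadder` and the `send` callable are hypothetical; a real implementation would pass Apify proxy URLs to its HTTP client.

```python
from typing import Callable, Optional


class ProxyLadder:
    """Escalate direct -> datacenter -> residential; stick to the tier that works."""

    # (tier name, attempts) -- attempt counts are an assumption.
    TIERS = [("direct", 1), ("datacenter", 1), ("residential", 3)]

    def __init__(self) -> None:
        self.start = 0  # index of the tier to try first (the "sticky" tier)

    def fetch(self, url: str, send: Callable[[str, str], bytes]) -> bytes:
        last_error: Optional[Exception] = None
        for index in range(self.start, len(self.TIERS)):
            tier, attempts = self.TIERS[index]
            for _ in range(attempts):
                try:
                    body = send(url, tier)
                except Exception as error:  # blocked, timeout, bad status, ...
                    last_error = error
                    continue
                self.start = index  # later requests begin at this tier
                return body
        raise RuntimeError(f"all proxy tiers failed for {url}") from last_error


calls: list[str] = []


def fake_send(url: str, tier: str) -> bytes:
    """Fake transport: direct requests are blocked, proxies succeed."""
    calls.append(tier)
    if tier == "direct":
        raise ConnectionError("blocked")
    return b"ok"


ladder = ProxyLadder()
ladder.fetch("https://patents.google.com/xhr/query", fake_send)
ladder.fetch("https://patents.google.com/xhr/query", fake_send)
print(calls)
```

The second request skips the direct tier because the ladder remembered that datacenter proxies worked. This model never steps back down to a cheaper tier, and the documentation does not say whether the actor does.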

Closing thoughts

The Google Patents Scraper is built for structured, scalable Google Patents data extraction, from quick keyword pulls to large, enriched datasets. With smart query merging, optional deep fields, CSV/JSON exports, and Apify API access, it serves marketers, developers, data analysts, and researchers alike. Build pipelines, power dashboards, or kick off Google Patents text mining with a dependable Google Patents search results scraper. Start extracting smarter patent insights, at scale and on your terms.