Google Patents Scraper
Pricing: from $5.99 / 1,000 results
Google Patents Scraper extracts structured patent data from Google Patents: titles, abstracts, inventors, assignees, CPC/IPC codes, citations, claims, dates & PDFs. Fast, reliable, and bulk-ready for IP research, competitive intel & R&D landscaping. Export to CSV or JSON, or pull results via the API.
Developer: Scrapier (Maintained by Community)
The Google Patents Scraper is a fast, reliable Google Patents data scraper that extracts structured patent records at scale: titles, abstracts, inventors, assignees, dates, links, PDFs, and optional deep fields like claims, citations, CPC/IPC codes, and patent family. It replaces repetitive, error-prone copy-and-paste work by automating Google Patents data extraction through Google's xhr/query and xhr/result endpoints. That makes it a good fit for marketers, developers, data analysts, and researchers who need a production-ready Google Patents crawler for bulk runs. With CSV/JSON output and Apify API access, it supports end-to-end workflows, from quick lookups to large datasets for competitive intel, IP monitoring, and R&D landscaping.
What data / output can you get?
| Data field | Description | Example value |
|---|---|---|
| patentNumber | Publication number parsed from search results | "US12438891B1" |
| title | Cleaned patent title text | "Systems and methods for machine learning-based inference" |
| abstract | Cleaned snippet/abstract from results | "A system includes a model trained to..." |
| inventors | List of inventor names | ["Jane Doe", "John Smith"] |
| assignee | Assignee/owner from results | "Example Corp." |
| filingDate | Filing date (YYYY-MM-DD) | "2021-06-01" |
| publicationDate | Publication date (YYYY-MM-DD) | "2023-09-14" |
| grantDate | Grant date when available (YYYY-MM-DD) | "2024-05-10" |
| url | Canonical Google Patents URL | "https://patents.google.com/patent/US12438891B1" |
| pdfUrl | Direct link to the patent PDF on Google's storage | "https://patentimages.storage.googleapis.com/US12438891B1.pdf" |
| scrapedAt | ISO-8601 timestamp of extraction (UTC, Z) | "2026-04-27T05:27:42Z" |
| classifications.cpc | CPC codes (when classifications enrichment is enabled) | ["G06N20/00"] |
| classifications.ipc | IPC codes (derived when present; may be empty) | ["G06N 20/00"] |
| citations.citedBy | Forward citations (when citations enrichment is enabled) | ["US2020123456A1", "EP3456789B1"] |
| citations.references | Backward citations (when citations enrichment is enabled) | ["US9876543B2"] |
Bonus/optional fields when enrichment is toggled on:
- fullText.description, fullText.claims: adds long-form description and claims text.
- patentFamily: list of related publications in the same family.
Exports are available via the Apify Dataset in CSV or JSON, and accessible through the Apify API for automations and pipelines.
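In the JSON output, classifications and citations are nested objects. The field table refers to their contents with dotted names such as classifications.cpc. If you build CSV yourself from JSON items, you have to flatten them first. Below is a minimal sketch. The helper names flatten_record and records_to_csv are hypothetical, and joining list values with "; " is our own choice. Apify's built-in CSV export may format these columns differently.

```python
import csv
import io
from typing import Any


def flatten_record(record: dict[str, Any], prefix: str = "", sep: str = "; ") -> dict[str, str]:
    """Flatten nested dicts into dotted keys and join lists into one string.

    Hypothetical helper: dotted names mirror the field table (classifications.cpc),
    but the "; " list separator is our own choice, not the actor's export format.
    """
    flat: dict[str, str] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse so {"classifications": {"cpc": [...]}} becomes "classifications.cpc"
            flat.update(flatten_record(value, prefix=f"{name}.", sep=sep))
        elif isinstance(value, list):
            flat[name] = sep.join(str(v) for v in value)
        else:
            flat[name] = "" if value is None else str(value)
    return flat


def records_to_csv(records: list[dict[str, Any]]) -> str:
    """Write flattened records to CSV text, using the sorted union of all columns."""
    rows = [flatten_record(r) for r in records]
    columns = sorted({col for row in rows for col in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Records that lack an enrichment field get an empty cell for it, because the header is the union of all columns.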
Key features
- Smart proxy ladder & resilience: Automatically tries direct requests first, then falls back to Apify datacenter proxies and finally residential proxies (up to 3 attempts). Once a tier succeeds, subsequent requests "stick" to that working tier.
- OR-merged query builder: Combines keywords, publication numbers, and q= terms from search URLs into a single OR union, so mixing inputs broadens the search instead of dropping results.
- Enrichment on demand: Toggle includeFullText, includeClaims, includeCitations, includePatentFamily, and includeClassifications to tailor each run and get exactly the depth you need for IP research and Google Patents text mining.
- Bulk-ready pagination: Handles search pagination while respecting your maxResults cap, which suits Google Patents bulk download workflows and large dataset builds.
- CSV/JSON and API-friendly: Export data to CSV or JSON and access it programmatically via the Apify API. This is useful for developers and teams building ingestion pipelines or looking for a Google Patents API alternative.
- PDF links included: Each record includes pdfUrl, enabling Google Patents PDF downloader workflows in your own system.
- Python-powered for reliability: Built on Python (aiohttp + Apify SDK), with concurrent detail fetching for efficient Google Patents data extraction.
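The "concurrent detail fetching" item describes a common asyncio pattern: bounded parallel requests, with each row saved as soon as it is ready. The sketch below is not the actor's code. The names enrich_all and fetch_detail, and the default limit of 5, are hypothetical. The actor's real concurrency limit is not documented. A stub fetcher stands in for the aiohttp request, so the sketch runs offline.

```python
import asyncio
from typing import Any, Awaitable, Callable

Record = dict[str, Any]


async def enrich_all(
    records: list[Record],
    fetch_detail: Callable[[str], Awaitable[Record]],
    concurrency: int = 5,
) -> list[Record]:
    """Fetch detail fields with at most `concurrency` requests in flight.

    Sketch only: `fetch_detail(patent_number)` is a hypothetical stand-in for an
    aiohttp call to a detail endpoint. Results are collected in completion order,
    and a failed detail fetch keeps the base search-result record.
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def enrich_one(record: Record) -> Record:
        async with semaphore:
            try:
                extra = await fetch_detail(record["patentNumber"])
            except Exception:
                # Keep whatever the search result already provided
                return record
        return {**record, **extra}

    tasks = [asyncio.create_task(enrich_one(r)) for r in records]
    done: list[Record] = []
    for finished in asyncio.as_completed(tasks):
        done.append(await finished)  # in a real actor, push each row here
    return done
```

The semaphore, rather than a fixed worker pool, lets every record start as a task while still capping how many requests hit the server at once.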
How to use Google Patents Scraper - step by step
- Sign in to your Apify account.
- Open the "Google Patents Scraper" actor.
- Add input data:
  - Use urls for patent page links, search links with q=, or plain keyword lines (plain lines are treated as search phrases).
  - You can also set searchQuery and/or patentNumbers; all inputs are OR-merged so you get a broad combined result.
- Narrow results (optional): set assignee, inventor, country, dateFrom/dateTo (absolute, or relative like "30 days"), and patentType to filter your query.
- Set limits and enrichment: choose maxResults (use 0 for no cap), and toggle includeFullText, includeClaims, includeCitations, includePatentFamily, and includeClassifications.
- Configure proxyConfiguration only if your workspace requires Apify Proxy for large or repeated runs.
- Run the actor. It paginates results and logs progress. If enrichment is enabled, details load in parallel and rows appear in the dataset as soon as they're ready.
- Download your dataset as CSV or JSON, or connect via the Apify API for downstream processing and Google Patents CSV export workflows.
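The steps above accept relative dates such as "30 days" or "6 months". How the actor resolves them is not documented, so the sketch below shows one plausible reading. It assumes that "N days/weeks/months/years" counts back from today in UTC. It also assumes that months are calendar months, with the day clamped to the end of the month. resolve_date is a hypothetical helper, useful for previewing which window a relative value would cover.

```python
import calendar
import re
from datetime import date, datetime, timedelta, timezone
from typing import Optional

# Assumed grammar for relative values; the actor's exact rules are not documented
_RELATIVE = re.compile(r"^\s*(\d+)\s*(day|week|month|year)s?\s*$", re.IGNORECASE)


def _months_back(start: date, months: int) -> date:
    """Step back whole calendar months, clamping the day (e.g. Mar 31 -> Feb 29 in 2024)."""
    total = start.year * 12 + (start.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def resolve_date(value: str, today: Optional[date] = None) -> Optional[str]:
    """Return YYYY-MM-DD for an absolute or relative date string, or None if empty."""
    value = value.strip()
    if not value:
        return None
    today = today or datetime.now(timezone.utc).date()
    match = _RELATIVE.match(value)
    if match is None:
        # Absolute input: validate the YYYY-MM-DD format and normalise it
        return date.fromisoformat(value).isoformat()
    amount, unit = int(match.group(1)), match.group(2).lower()
    if unit == "day":
        result = today - timedelta(days=amount)
    elif unit == "week":
        result = today - timedelta(weeks=amount)
    elif unit == "month":
        result = _months_back(today, amount)
    else:  # year
        result = _months_back(today, 12 * amount)
    return result.isoformat()
```

For example, resolve_date("6 months", today=date(2024, 1, 15)) gives "2023-07-15" under these assumptions.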
Pro Tip: Automate end-to-end pipelines by pulling results via the Apify API into your Python scripts for further analysis, modeling, or Google Patents text mining.
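The Pro Tip can be followed with only the Python standard library, by calling the Apify API v2 dataset-items endpoint (GET /v2/datasets/{datasetId}/items). This is a minimal sketch. The dataset ID and API token are placeholders you supply; the dataset ID is the run's defaultDatasetId. The helper names dataset_items_url and fetch_patents are our own. The clean=true parameter asks the API to skip empty items and hidden fields.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "json", **params: Any) -> str:
    """Build the Apify API v2 URL that returns a dataset's items in `fmt` (json, csv, ...)."""
    query = {"format": fmt, **params}
    return f"{API_BASE}/datasets/{urllib.parse.quote(dataset_id)}/items?{urllib.parse.urlencode(query)}"


def fetch_patents(dataset_id: str, token: str) -> list[dict[str, Any]]:
    """Download every item of a dataset as JSON (needs network access and an API token).

    Helper names are our own; the endpoint is the public Apify API v2.
    """
    request = urllib.request.Request(
        dataset_items_url(dataset_id, clean="true"),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The official apify-client package for Python wraps the same endpoint, for example through a dataset client's iterate_items(), if you prefer not to build URLs yourself.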
Use cases
| Use case | Description |
|---|---|
| Competitive intelligence for R&D | Analyze assignees, claims, and citations to map competitor focus areas and technology trajectories. |
| Patent landscaping for IP teams | Build comprehensive datasets by country, date range, and document kind to identify white spaces and clusters. |
| Academic research & text mining | Enable corpus creation with abstracts, optional descriptions, and claims for NLP pipelines. |
| Rapid patent metadata extraction | Extract titles, inventors, dates, and links at scale for dashboards and reporting. |
| Google Patents search results scraper | Turn keyword searches into structured datasets without manual export. |
| Dataset creation & bulk download | Use maxResults and CSV/JSON export for Google Patents dataset download workflows. |
| PDF library building | Leverage pdfUrl to fetch and archive PDFs in your own storage, tied to each patent record. |
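For the landscaping and clustering use cases, a common first step is to count CPC subclasses across a dataset. This is a minimal sketch. It assumes records are shaped like the example JSON output, with includeClassifications enabled. Grouping by the four-character subclass (section, class, subclass, e.g. "G06N") is an analysis choice on our side, not something the actor does. cpc_subclass_counts is a hypothetical name.

```python
from collections import Counter
from typing import Any, Iterable


def cpc_subclass_counts(records: Iterable[dict[str, Any]]) -> Counter:
    """Count CPC subclasses (first four characters, e.g. "G06N") across records.

    Analysis choice (ours): each patent counts once per subclass, even with several
    codes in it. Records without classifications (enrichment off or data missing)
    contribute nothing.
    """
    counts: Counter = Counter()
    for record in records:
        codes = (record.get("classifications") or {}).get("cpc") or []
        subclasses = {code.replace(" ", "")[:4] for code in codes if code}
        counts.update(subclasses)
    return counts
```

Calling counts.most_common(10) on the result gives a quick view of where a portfolio or competitor concentrates.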
Why choose Google Patents Scraper?
The Google Patents Scraper is built for precision, scale, and real-world reliability.
- Structured accuracy: Normalizes Google's xhr/query results into clean fields for analytics and modeling.
- Smarter queries: OR-merge keywords, search URLs, and publication numbers to avoid missed matches.
- Scalable runs: Handles pagination and parallel detail loading for large batches.
- Developer access: Export CSV/JSON and integrate via the Apify API, well suited to Google Patents scraper Python workflows.
- Robust connectivity: Direct-first requests with automatic fallback to datacenter and residential proxies.
- Public-data focus: Designed to extract publicly available records from patents.google.com responsibly.
- Operational efficiency: Reduce manual effort and streamline Google Patents data extraction across teams.
In short: a production-grade Google Patents crawler built to be more dependable than brittle, manual, or extension-based alternatives.
Is it legal / ethical to use Google Patents Scraper?
Yes, when used responsibly. This actor extracts public data from patents.google.com and does not access private or authenticated content.
Guidelines for compliant use:
- Use only publicly available information and respect Google's terms of service.
- Ensure your usage aligns with applicable data protection laws (e.g., GDPR, CCPA).
- Avoid collecting or processing personal data beyond what is publicly provided in patent records.
- Validate your specific use case with your legal team, especially for redistribution or commercial reuse.
Input parameters & output format
Example JSON input
{
  "urls": [
    "https://patents.google.com/patent/US12438891B1",
    "machine learning",
    "https://patents.google.com/?q=graph+neural+network"
  ],
  "searchQuery": "computer vision",
  "patentNumbers": ["EP1234567B1", "WO2020123456A1"],
  "assignee": "Example Corp",
  "inventor": "Jane Doe",
  "country": "US",
  "dateFrom": "6 months",
  "dateTo": "",
  "patentType": "ANY",
  "maxResults": 25,
  "includeFullText": false,
  "includeClaims": true,
  "includeCitations": true,
  "includePatentFamily": true,
  "includeClassifications": true,
  "proxyConfiguration": {"useApifyProxy": true}
}
Parameters
- urls (array)
  Description: One entry per line: patent URLs, search URLs (q=), or plain keywords. Plain lines are treated as search phrases. All entries are OR-merged with the Keywords field (searchQuery).
  Default: n/a (prefill shown in UI)
  Required: No
- searchQuery (string)
  Description: Main keyword search. OR-merged with any q= terms from search URLs and with publication numbers from patent URLs.
  Default: ""
  Required: No
- patentNumbers (array)
  Description: Specific patent IDs. Combined with keywords using OR.
  Default: []
  Required: No
- assignee (string)
  Description: Focus on patents owned by a particular organization.
  Default: ""
  Required: No
- inventor (string)
  Description: Find patents listing a specific inventor.
  Default: ""
  Required: No
- country (string)
  Description: Patent office / region filter (e.g., US, EP, WO), or ANY to search everywhere.
  Default: "ANY"
  Required: No
- dateFrom (string)
  Description: Published after this date: absolute (YYYY-MM-DD) or relative (e.g., 30 days, 6 months).
  Default: ""
  Required: No
- dateTo (string)
  Description: Published before this date: absolute or relative.
  Default: ""
  Required: No
- patentType (string)
  Description: Limit to grants, applications, or designs, or use ANY for all types.
  Default: "ANY"
  Required: No
- maxResults (integer)
  Description: Cap on how many patents to collect; use 0 for no limit.
  Default: 10
  Required: No
- includeFullText (boolean)
  Description: Adds the full written description (large text).
  Default: false
  Required: No
- includeClaims (boolean)
  Description: Adds the patent claims text.
  Default: true
  Required: No
- includeCitations (boolean)
  Description: Adds backward and forward citation lists where available.
  Default: true
  Required: No
- includePatentFamily (boolean)
  Description: Adds related publications in the same family.
  Default: true
  Required: No
- includeClassifications (boolean)
  Description: Adds CPC/IPC classification codes.
  Default: true
  Required: No
- proxyConfiguration (object)
  Description: Optional Apify Proxy settings. Leave it off if you don't need it.
  Default: {} (UI may prefill useApifyProxy)
  Required: No
Note: At least one of urls, searchQuery, or patentNumbers must yield a query; otherwise the run exits with a helpful log message.
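A run exits early when no query can be built. The sketch below shows one way to OR-merge the three inputs. Following the urls description, it assumes that patent URLs contribute their publication number, search URLs their q= value, and plain lines the line itself. It also assumes that multi-word phrases are parenthesised and joined with OR. The actor's exact query syntax is not documented, and build_or_query is a hypothetical name.

```python
from typing import Iterable, Optional
from urllib.parse import parse_qs, urlparse


def _terms_from_url_entry(entry: str) -> list[str]:
    """Turn one `urls` line into query terms: patent URL, search URL, or plain phrase."""
    entry = entry.strip()
    if not entry:
        return []
    parsed = urlparse(entry)
    if parsed.netloc.endswith("patents.google.com"):
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) >= 2 and parts[0] == "patent":
            return [parts[1]]  # e.g. /patent/US12438891B1/en -> US12438891B1
        return parse_qs(parsed.query).get("q", [])  # search URL: keep its q= terms
    return [entry]  # plain line: treated as a search phrase


def build_or_query(urls: Iterable[str] = (), search_query: str = "",
                   patent_numbers: Iterable[str] = ()) -> Optional[str]:
    """OR-merge every input into one query string; None means there is nothing to search.

    Assumed syntax (hypothetical): terms joined with " OR ", multi-word phrases in parentheses.
    """
    terms: list[str] = []
    for entry in urls:
        terms.extend(_terms_from_url_entry(entry))
    if search_query.strip():
        terms.append(search_query.strip())
    terms.extend(n.strip() for n in patent_numbers if n.strip())
    unique = list(dict.fromkeys(terms))  # drop duplicates, keep first-seen order
    if not unique:
        return None
    return " OR ".join(f"({t})" if " " in t else t for t in unique)
```

Returning None instead of an empty string makes the "nothing to search" case explicit, mirroring the early exit described in the note.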
Example JSON output
{
  "patentNumber": "US12438891B1",
  "title": "Systems and methods for machine learning-based inference",
  "abstract": "A system includes a model trained to...",
  "inventors": ["Jane Doe", "John Smith"],
  "assignee": "Example Corp.",
  "filingDate": "2021-06-01",
  "publicationDate": "2023-09-14",
  "grantDate": "2024-05-10",
  "classifications": {
    "cpc": ["G06N20/00", "G06F17/18"],
    "ipc": ["G06N 20/00", "G06F 17/18"]
  },
  "url": "https://patents.google.com/patent/US12438891B1",
  "pdfUrl": "https://patentimages.storage.googleapis.com/US12438891B1.pdf",
  "scrapedAt": "2026-04-27T05:27:42Z",
  "citations": {
    "citedBy": ["EP3456789B1"],
    "references": ["US9876543B2", "WO2020123456A1"]
  },
  "fullText": {
    "claims": "1. A method comprising ...",
    "description": "In some implementations, the system comprises ..."
  },
  "patentFamily": ["EP1234567B1", "WO2020123456A1"]
}
Notes:
- Optional fields (fullText, patentFamily) appear only when the corresponding include* flags are enabled.
- Some fields may be empty strings or empty arrays when not available from the source or when detail pages can't be retrieved (the item is still saved with available data).
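Because fields can be empty strings or arrays, and optional objects can be missing, downstream code should read items defensively. This is a minimal sketch using a hypothetical PatentRecord dataclass that covers only a subset of the documented fields. It assumes dates arrive as YYYY-MM-DD strings, as the field table states, and it maps empty values to None or an empty list instead of raising.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Optional


def _parse_date(value: Any) -> Optional[date]:
    """Empty strings or missing keys become None instead of raising."""
    return date.fromisoformat(value) if isinstance(value, str) and value else None


@dataclass
class PatentRecord:
    """Hypothetical typed view over a subset of the documented output fields."""
    patent_number: str
    title: str = ""
    inventors: list[str] = field(default_factory=list)
    publication_date: Optional[date] = None
    grant_date: Optional[date] = None
    cpc: list[str] = field(default_factory=list)
    claims: Optional[str] = None  # None when claims were not requested or unavailable

    @classmethod
    def from_item(cls, item: dict[str, Any]) -> "PatentRecord":
        full_text = item.get("fullText") or {}
        return cls(
            patent_number=item.get("patentNumber", ""),
            title=item.get("title") or "",
            inventors=list(item.get("inventors") or []),
            publication_date=_parse_date(item.get("publicationDate")),
            grant_date=_parse_date(item.get("grantDate")),
            cpc=list((item.get("classifications") or {}).get("cpc") or []),
            claims=full_text.get("claims") or None,
        )
```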
FAQ
Do I need to log in or provide cookies to scrape Google Patents?
No. The actor works with publicly available endpoints on patents.google.com and does not require login or cookies. It uses direct requests first and only falls back to proxies if needed.
Can I use this with Python or an API?
Yes. Results are stored in an Apify Dataset, which you can access via the Apify API. This makes it easy to integrate into Google Patents scraper Python workflows or downstream automation.
How many patents can I scrape in one run?
You control this with maxResults. Set a specific number to cap output or use 0 for no limit. The actor paginates results automatically and enriches details in parallel when requested.
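The maxResults rule (a positive number caps output, 0 means no cap) can be pictured as a paging loop like the one below. This is a sketch only. fetch_page and page indexing are hypothetical stand-ins for one xhr/query request, and the actor's real page size and request format are not documented here. The loop checks the cap before each request, so it does not fetch a page it will not use.

```python
from typing import Any, Callable, Iterator

Page = list[dict[str, Any]]


def paginate(fetch_page: Callable[[int], Page], max_results: int = 10) -> Iterator[dict[str, Any]]:
    """Yield results page by page until the source runs dry or max_results is reached.

    max_results == 0 means no cap, matching the documented maxResults semantics.
    `fetch_page(page_index)` is a hypothetical stand-in for one search request.
    """
    if max_results < 0:
        raise ValueError("max_results must be >= 0")
    yielded = 0
    page_index = 0
    while not (max_results and yielded >= max_results):
        page = fetch_page(page_index)
        if not page:
            return  # no more results
        for item in page:
            yield item
            yielded += 1
            if max_results and yielded >= max_results:
                return  # cap reached mid-page
        page_index += 1
```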
Can I export to CSV or JSON?
Yes. You can export your dataset as CSV or JSON directly from Apify. This supports Google Patents CSV export and broader Google Patents dataset download use cases.
Does it include patent PDFs?
It includes links rather than the files themselves: each record has a pdfUrl pointing to Google's patent images storage (patentimages.storage.googleapis.com). You can use this link to download PDFs externally as part of a Google Patents PDF downloader workflow.
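A downloader that follows pdfUrl can be very small. In the sketch below, naming files after patentNumber, skipping files already on disk, and skipping records whose pdfUrl is empty (the notes say fields can be empty) are our own choices. pdf_target and download_pdf are hypothetical helpers. Only download_pdf needs network access.

```python
import re
import urllib.request
from pathlib import Path
from typing import Any, Optional


def pdf_target(record: dict[str, Any], out_dir: Path) -> Optional[Path]:
    """Pick a file path named after the publication number; None if there is no PDF link."""
    if not record.get("pdfUrl"):
        return None
    # Naming scheme is our own: keep letters, digits, "_" and "-", replace the rest
    safe_name = re.sub(r"[^A-Za-z0-9_-]", "_", record.get("patentNumber") or "unknown")
    return out_dir / f"{safe_name}.pdf"


def download_pdf(record: dict[str, Any], out_dir: Path) -> Optional[Path]:
    """Download one record's PDF unless it is already on disk (needs network access)."""
    target = pdf_target(record, out_dir)
    if target is None or target.exists():
        return target
    out_dir.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(record["pdfUrl"]) as response:
        target.write_bytes(response.read())
    return target
```

Skipping existing files makes reruns over the same dataset cheap, which helps when building a PDF library incrementally.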
Can it extract claims, citations, classifications, and family?
Yes. Toggle includeClaims, includeCitations, includeClassifications, and includePatentFamily to add these fields to your output. You can also enable includeFullText to add the long-form description.
What filters are available?
You can filter by assignee, inventor, country (office/region), dateFrom/dateTo (absolute or relative), and patentType (grant, application, design, or ANY). Keywords, search URLs, and publication numbers are OR-merged into one broad query, and these filters then narrow that query.
How does it avoid getting blocked?
The actor tries direct requests first. On a block or failure, it automatically climbs a proxy ladder: datacenter proxies (e.g., SHADER) and then residential proxies with up to 3 retries. After a successful proxy response, it "sticks" to the working tier.
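The proxy ladder can be modelled as a small state machine. The sketch below is not the actor's implementation. The tier names, the send callable, and applying the three attempts to the residential tier only are assumptions. In a real Apify actor, each tier would map to a different proxy configuration, and a block would surface as an HTTP error or a CAPTCHA page.

```python
from typing import Callable, Optional

# Assumed tier order, per the description above (direct -> datacenter -> residential)
TIERS = ["direct", "datacenter", "residential"]


class ProxyLadder:
    """Try tiers in order and remember the first one that works (the "sticky" tier).

    Not the actor's code: `send(url, tier)` is a hypothetical request function that
    raises on a block or network failure. Residential gets `residential_attempts` tries.
    """

    def __init__(self, send: Callable[[str, str], str], residential_attempts: int = 3) -> None:
        self.send = send
        self.residential_attempts = residential_attempts
        self.sticky: Optional[int] = None  # index into TIERS once a tier succeeds

    def fetch(self, url: str) -> str:
        start = self.sticky if self.sticky is not None else 0
        last_error: Optional[Exception] = None
        for index in range(start, len(TIERS)):
            tier = TIERS[index]
            attempts = self.residential_attempts if tier == "residential" else 1
            for _ in range(attempts):
                try:
                    body = self.send(url, tier)
                except Exception as exc:  # blocked or failed: retry, then climb
                    last_error = exc
                    continue
                self.sticky = index  # later requests start from this tier
                return body
        raise RuntimeError(f"all proxy tiers failed for {url}") from last_error
```

Starting later requests at the sticky tier avoids paying for a known-blocked direct attempt on every page.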
Closing thoughts
The Google Patents Scraper is built for structured, scalable Google Patents data extraction, from quick keyword pulls to large, enriched datasets. With smart query merging, optional deep fields, CSV/JSON exports, and Apify API access, it serves marketers, developers, data analysts, and researchers alike. Build pipelines, power dashboards, or kick off Google Patents text mining with a dependable Google Patents search results scraper. Start extracting smarter patent insights at scale and on your terms.