Google Scholar Scraper

Pricing

Pay per usage

Scrape Google Scholar for academic papers, citations, author profiles. No API key needed. Extract titles, authors, abstracts, citation counts, PDF links, h-index, i10-index. Export JSON, CSV, Excel. Anti-bot protection with residential proxies, UA rotation, CAPTCHA detection.

Developer

George Kioko

Maintained by Community

Google Scholar Scraper -- Scrape Academic Papers, Citations & Author Profiles

The most reliable Google Scholar scraper on Apify. Scrape Google Scholar for academic papers, citation data, author profiles, h-index, and research metadata -- all without an API key. Export Google Scholar data as JSON, CSV, or Excel with full abstracts, citation counts, PDF links, and author metrics.

Use this academic paper scraper to extract structured research data at scale. Whether you need a citation scraper for bibliometric analysis, an author profile scraper for researcher tracking, or a research paper extractor for building literature databases -- this actor handles it all.

How It Works

flowchart LR
A["Your Input\n(query or author ID)"] --> B["Google Scholar\nScraper"]
B --> C{"Scraping Mode"}
C -->|Paper Search| D["Search Results\nPagination"]
C -->|Author Profile| E["Profile + Papers\nExtraction"]
D --> F["Structured Data\n(JSON/CSV/Excel)"]
E --> F
D -->|"includeCitations: true"| G["Citation Details\nExtraction"]
G --> F

Architecture

flowchart TB
subgraph Input
Q["Search Query"]
AU["Author ID"]
FI["Filters\n(year, sort, language)"]
end
subgraph Anti-Bot Layer
PR["Residential Proxies\n(US-based)"]
UA["User Agent\nRotation"]
DL["Random Delays\n(2-5 seconds)"]
ST["Browser Fingerprint\nMasking"]
CP["CAPTCHA Detection\n& Auto-Retry"]
end
subgraph Google Scholar
SP["Search Pages\n(/scholar)"]
AP["Author Pages\n(/citations)"]
CT["Citation Pages\n(/scholar?cites=...)"]
end
subgraph Data Extraction
PP["Paper Parser\n(title, authors, year,\nabstract, citations, PDF)"]
AE["Author Parser\n(h-index, i10-index,\ncitations, co-authors)"]
CE["Citation Parser\n(citing papers,\nauthors, years)"]
end
subgraph Output
DS["Apify Dataset\n(JSON / CSV / Excel)"]
end
Q & AU & FI --> PR
PR --> UA --> DL --> ST --> CP
CP --> SP & AP
SP --> PP
AP --> AE
AE -->|"Paper list"| PP
PP -->|"includeCitations"| CT
CT --> CE
PP & AE & CE --> DS

Features

  • Scrape Google Scholar search results -- Extract papers by any keyword, topic, or phrase with full pagination support
  • Author profile scraper -- Get h-index, i10-index, total citations, research interests, co-authors, and complete publication lists from any Google Scholar author profile
  • Citation scraper -- Discover which papers cite a given paper, with full metadata for each citing work
  • Research paper extractor -- Pull titles, authors, years, abstracts, publication venues, and direct PDF links
  • No API key required -- Works without any Google Scholar API key, credentials, or authentication
  • Export Google Scholar data -- Download results as JSON, CSV, or Excel from the Apify dataset
  • Year filtering -- Narrow results by publication date range for targeted academic data extraction
  • Sort by date or relevance -- Find the latest papers or the most relevant ones
  • Multi-language support -- Scrape Google Scholar in any supported language
  • Anti-bot protection built in -- Residential proxies, user agent rotation, random delays, fingerprint masking, and automatic CAPTCHA retry
  • H-index scraper -- Extract h-index and i10-index metrics directly from author profiles
  • PDF link extraction -- Get direct links to open-access PDFs when available
  • Pay per result -- Only pay for the papers you actually scrape; there is no monthly subscription

Use Cases

  • Literature reviews -- Quickly gather all papers on a topic with citation counts and abstracts for systematic reviews
  • Citation analysis -- Track how many times papers are cited and by whom for bibliometric research
  • Research monitoring -- Monitor new publications in a field by filtering by year
  • Author analysis -- Get complete publication profiles and h-index data for researchers
  • AI/RAG pipelines -- Feed structured academic data into LLM knowledge bases and retrieval-augmented generation systems
  • Academic databases -- Build custom research databases with structured Google Scholar data
  • Competitive research -- Track publications from specific authors or institutions
  • Grant applications -- Quickly compile publication lists and citation metrics for funding proposals
  • Trend analysis -- Identify emerging research topics by tracking publication volume over time

Input

| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | - | Search query for Google Scholar (e.g., "machine learning healthcare") |
| authorId | string | - | Google Scholar author ID for profile scraping |
| maxResults | integer | 50 | Maximum number of papers to scrape |
| language | string | "en" | Language code for results |
| yearFrom | integer | - | Filter papers from this year |
| yearTo | integer | - | Filter papers to this year |
| sortBy | string | "relevance" | Sort by "relevance" or "date" |
| includeAbstracts | boolean | true | Include paper abstracts in output |
| includeCitations | boolean | false | Include citing paper details |

You must provide either query or authorId (or both).
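If you generate inputs from your own code, it can help to check them before starting a run. The sketch below mirrors the table above as a Python dataclass and enforces the "query or authorId" rule. The class name, the function name, and the extra checks (sortBy values, positive maxResults, yearFrom not after yearTo) are hypothetical; they are not the actor's own validation logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScholarInput:
    """Hypothetical mirror of the documented input; defaults match the Input table."""
    query: Optional[str] = None
    authorId: Optional[str] = None
    maxResults: int = 50
    language: str = "en"
    yearFrom: Optional[int] = None
    yearTo: Optional[int] = None
    sortBy: str = "relevance"
    includeAbstracts: bool = True
    includeCitations: bool = False


def validate_input(raw: dict) -> ScholarInput:
    """Fill in defaults and reject inputs that cannot produce a useful run."""
    cfg = ScholarInput(**raw)  # an unknown key raises TypeError
    if not cfg.query and not cfg.authorId:
        raise ValueError("provide query, authorId, or both")
    if cfg.sortBy not in ("relevance", "date"):
        raise ValueError('sortBy must be "relevance" or "date"')
    if cfg.maxResults < 1:
        raise ValueError("maxResults must be at least 1")
    if (cfg.yearFrom is not None and cfg.yearTo is not None
            and cfg.yearFrom > cfg.yearTo):
        raise ValueError("yearFrom must not be later than yearTo")
    return cfg


# Usage: a year-range search with the remaining defaults filled in.
cfg = validate_input({"query": "CRISPR gene editing", "yearFrom": 2022, "yearTo": 2025})
print(cfg.maxResults, cfg.sortBy)  # 50 relevance
```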

How to Find a Google Scholar Author ID

Go to a researcher's Google Scholar profile. The URL will look like:

https://scholar.google.com/citations?user=JicYPdAAAAAJ

The author ID is the value after user= (in this case JicYPdAAAAAJ).
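When you collect profile URLs in bulk, the ID can be pulled out with the standard library instead of by hand. This is a small sketch; `author_id_from_url` is a hypothetical helper name, not part of the actor. Parsing the query string (rather than slicing text after `user=`) also copes with extra parameters such as `hl=en`.

```python
from urllib.parse import parse_qs, urlparse


def author_id_from_url(profile_url: str) -> str:
    """Return the value of the `user` query parameter from a Scholar profile URL."""
    params = parse_qs(urlparse(profile_url).query)
    values = params.get("user")
    if not values:
        raise ValueError(f"no user= parameter in {profile_url!r}")
    return values[0]


print(author_id_from_url("https://scholar.google.com/citations?user=JicYPdAAAAAJ&hl=en"))
# JicYPdAAAAAJ
```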

Example Inputs

Search for papers on a topic

{
  "query": "large language models healthcare",
  "maxResults": 100,
  "yearFrom": 2023,
  "sortBy": "date",
  "includeAbstracts": true
}

Scrape an author profile (h-index, citations, publications)

{
  "authorId": "JicYPdAAAAAJ",
  "maxResults": 200
}

Search with citation details

{
  "query": "transformer architecture attention mechanism",
  "maxResults": 20,
  "includeCitations": true
}

Filter papers by year range

{
  "query": "CRISPR gene editing",
  "yearFrom": 2022,
  "yearTo": 2025,
  "maxResults": 50,
  "sortBy": "date"
}

Output

Paper search result

Each paper scraped from Google Scholar includes:

{
  "type": "paper",
  "title": "Attention Is All You Need",
  "url": "https://arxiv.org/abs/1706.03762",
  "authors": "A Vaswani, N Shazeer, N Parmar, J Uszkoreit...",
  "year": 2017,
  "publicationVenue": "Advances in neural information processing systems",
  "abstract": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...",
  "citationCount": 120543,
  "citationUrl": "https://scholar.google.com/scholar?cites=...",
  "pdfLink": "https://arxiv.org/pdf/1706.03762",
  "pdfSource": "[PDF] arxiv.org",
  "scholarId": "12345678",
  "relatedUrl": "https://scholar.google.com/scholar?related=...",
  "versionCount": 15,
  "scrapedAt": "2026-03-12T10:00:00.000Z",
  "searchQuery": "transformer architecture"
}
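Note that `authors` is a single comma-separated string, and in the sample it ends with "..." after the fourth name. If you need one name per row (for co-author counts, deduplication, and so on), a sketch like the one below splits the string and flags a trailing ellipsis. Treating that ellipsis as "Scholar shortened the author list" is an assumption based on the sample, and `split_authors` is a hypothetical helper.

```python
def split_authors(authors: str) -> tuple[list[str], bool]:
    """Split the comma-separated `authors` field into individual names.

    Returns (names, truncated). A trailing "..." or Unicode ellipsis is
    assumed to mean Google Scholar shortened the author list.
    """
    text = authors.strip()
    truncated = False
    for ellipsis in ("...", "\u2026"):
        if text.endswith(ellipsis):
            text, truncated = text[: -len(ellipsis)], True
            break
    names = [name.strip() for name in text.split(",") if name.strip()]
    return names, truncated


names, truncated = split_authors("A Vaswani, N Shazeer, N Parmar, J Uszkoreit...")
print(names, truncated)
# ['A Vaswani', 'N Shazeer', 'N Parmar', 'J Uszkoreit'] True
```

A truncated list means the scraped names are incomplete; the full author list is only available from the paper's own page.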

Author profile result

Author profile data includes h-index, i10-index, total citations, and co-authors:

{
  "type": "author_profile",
  "authorId": "JicYPdAAAAAJ",
  "name": "Geoffrey Hinton",
  "affiliation": "University of Toronto",
  "interests": ["Machine Learning", "Neural Networks", "Deep Learning"],
  "totalCitations": 834291,
  "hIndex": 186,
  "i10Index": 499,
  "imageUrl": "https://scholar.google.com/citations?view_op=medium_photo&user=...",
  "homepage": "https://www.cs.toronto.edu/~hinton/",
  "coauthors": [
    {
      "name": "Yann LeCun",
      "affiliation": "NYU / Meta",
      "url": "https://scholar.google.com/citations?user=..."
    }
  ],
  "scrapedAt": "2026-03-12T10:00:00.000Z"
}
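The actor reads `hIndex` and `i10Index` from the profile page rather than computing them. If you want the same metrics for a subset of an author's papers (for example, only those in a given year range), you can recompute them from the scraped `citationCount` values. The sketch below uses the standard definitions: the h-index is the largest h such that h papers each have at least h citations, and the i10-index is the number of papers with at least 10 citations. The function names are hypothetical.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations: list[int]) -> int:
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


counts = [10, 8, 5, 4, 3]
print(h_index(counts), i10_index(counts))  # 4 1
```

Values recomputed this way will only match the profile figures if the run scraped the author's complete publication list, so set maxResults at or above the author's paper count.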

Citation details result

When includeCitations is enabled, you get citing papers for each result:

{
  "type": "citation_details",
  "parentTitle": "Attention Is All You Need",
  "parentScholarId": "12345678",
  "citingPapers": [
    {
      "title": "BERT: Pre-training of Deep Bidirectional Transformers",
      "url": "https://arxiv.org/abs/1810.04805",
      "authors": "J Devlin, MW Chang, K Lee, K Toutanova",
      "year": 2018
    }
  ],
  "citingPapersCount": 10,
  "scrapedAt": "2026-03-12T10:00:00.000Z"
}
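Because paper, author_profile, and citation_details records share one dataset, a common post-processing step is to attach each citation_details record to its parent paper. The sketch below joins on `scholarId` (paper) and `parentScholarId` (citation details), as in the sample records above. It assumes every citation_details item carries `parentScholarId`; `attach_citations` is a hypothetical helper.

```python
from collections import defaultdict


def attach_citations(items: list[dict]) -> list[dict]:
    """Return paper records, each with a `citingPapers` list joined in.

    Joins citation_details.parentScholarId to paper.scholarId. The input
    records are not modified; each paper is copied before the join.
    """
    citing_by_parent: dict[str, list[dict]] = defaultdict(list)
    for item in items:
        if item.get("type") == "citation_details":
            citing_by_parent[item["parentScholarId"]].extend(item.get("citingPapers", []))

    papers = []
    for item in items:
        if item.get("type") == "paper":
            merged = dict(item)
            merged["citingPapers"] = citing_by_parent.get(item.get("scholarId"), [])
            papers.append(merged)
    return papers
```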

Pricing

This actor uses pay-per-event pricing. You only pay for the papers you scrape -- no monthly fees, no API key costs.

| Event | Price |
|---|---|
| paper-found | $0.004 per paper |

Cost examples

| Papers scraped | Cost |
|---|---|
| 50 papers | $0.20 |
| 200 papers | $0.80 |
| 500 papers | $2.00 |
| 1,000 papers | $4.00 |

Author profile metadata (h-index, i10-index, citations, co-authors) is included free. You only pay for papers scraped.
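The cost examples are simply the paper count multiplied by the paper-found price. A sketch for budgeting runs is below. It uses `Decimal` so that cent values are exact, and it assumes, per the note above, that only paper-found events are charged. The rounding to cents is for display only; Apify's actual billing may round differently.

```python
from decimal import ROUND_HALF_UP, Decimal

PRICE_PER_PAPER = Decimal("0.004")  # paper-found event price, USD


def estimate_cost(papers: int) -> Decimal:
    """Estimated charge in USD for a number of paper-found events."""
    if papers < 0:
        raise ValueError("papers must be non-negative")
    return (PRICE_PER_PAPER * papers).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(500))  # 2.00
```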

Anti-Bot Protection

Google Scholar is known for aggressive anti-bot measures. This Google Scholar crawler counters them with:

  • Residential proxies -- Uses Apify's US-based residential proxy group for reliable access
  • Random delays -- 2-5 second random delays between requests to mimic human browsing
  • User agent rotation -- Rotates through 6 real browser user agents (Chrome, Firefox, Safari, Edge)
  • Browser fingerprint masking -- Hides navigator.webdriver and other automation signals
  • CAPTCHA detection -- Detects CAPTCHA/block pages and automatically retries with a different proxy session
  • Single concurrency -- Makes requests one at a time to avoid triggering Google rate limits
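The request loop described in this list can be sketched as follows: one request at a time, a random 2-5 second pause before each attempt, a rotated user agent, and a fresh proxy session whenever the response looks like a block page. This is an illustration under stated assumptions, not the actor's code. The actor drives a headless browser with fingerprint masking, which this sketch leaves out. The `fetch` callable is injected so you can plug in your own HTTP or proxy client, and the user-agent strings, block markers, session naming, and function names are all hypothetical.

```python
import random
import time
from typing import Callable, Optional

# Placeholder user agents (hypothetical; not the actor's actual list of six).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) AppleWebKit/605.1.15 Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

# Assumed block-page markers; real detection may check other signals.
BLOCK_MARKERS = ("unusual traffic", "captcha")

# fetch(url, user_agent, session_id) -> HTML; the caller maps session_id to a proxy session.
Fetch = Callable[[str, str, str], str]


def looks_blocked(html: str) -> bool:
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


def polite_get(url: str, fetch: Fetch, max_attempts: int = 4,
               sleep: Callable[[float], None] = time.sleep,
               rng: Optional[random.Random] = None) -> str:
    """Fetch one page; retry block pages with a new user agent and proxy session."""
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts):
        sleep(rng.uniform(2.0, 5.0))          # random 2-5 s pause before every request
        user_agent = rng.choice(USER_AGENTS)  # rotate the user agent
        session_id = f"scholar_{attempt}_{rng.randrange(10**6)}"  # distinct session per attempt
        html = fetch(url, user_agent, session_id)
        if not looks_blocked(html):
            return html
    raise RuntimeError(f"still blocked after {max_attempts} attempts: {url}")
```

Calling `polite_get` in a plain loop keeps requests strictly sequential, which matches the single-concurrency point above.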

Tips for Best Results

  1. Start small -- Test with maxResults: 10 first to verify your query returns what you need
  2. Use year filters -- Narrow your search with yearFrom and yearTo for more targeted academic data extraction
  3. Sort by date -- Use sortBy: "date" to find the most recent papers in any field
  4. Citation details are expensive -- Enable includeCitations only when needed, as it makes additional requests per paper
  5. Author profiles -- Use authorId to get a complete publication list, h-index, and citation metrics for any researcher
  6. Combine modes -- Provide both query and authorId to search papers AND scrape an author profile in one run

Limitations

  • Google Scholar may temporarily block requests if too many are made. The actor handles this with automatic retries and proxy rotation.
  • Enabling citation details (includeCitations: true) significantly increases run time and cost, since the actor makes additional page requests for each paper.
  • Google Scholar does not provide an official API, so the actor parses the website's HTML, whose structure may change without notice.
  • The maximum practical limit is approximately 1,000 results per search query (a Google Scholar limitation, not an actor limitation).

FAQ

Can I scrape Google Scholar without an API key?

Yes. This actor scrapes Google Scholar directly through a headless browser with residential proxies. No Google Scholar API key, no Semantic Scholar API, and no third-party credentials are required. Just provide your search query or author ID and run it.

How do I export Google Scholar data to CSV or Excel?

After the actor finishes, go to the dataset tab in the Apify Console. You can download results as JSON, CSV, Excel, XML, or RSS. You can also access the data programmatically via the Apify API.
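The console export flattens every record type into one table. If you download the dataset as JSON and want a paper-only CSV with columns you choose, the standard `csv` module is enough. In this sketch the column selection is an assumption drawn from the sample paper output, and `papers_to_csv` is a hypothetical helper. Author profiles and citation details are skipped because their nested fields do not fit a flat row.

```python
import csv
import io

PAPER_COLUMNS = ["title", "authors", "year", "publicationVenue",
                 "citationCount", "url", "pdfLink", "scholarId"]


def papers_to_csv(items: list[dict]) -> str:
    """Write `type == "paper"` items as CSV text, one row per paper.

    Missing fields become empty cells; fields not in PAPER_COLUMNS are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=PAPER_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        if item.get("type") == "paper":
            writer.writerow(item)
    return buffer.getvalue()
```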

Can I scrape author h-index and citation metrics?

Yes. Provide an authorId and the actor will extract the author's h-index, i10-index, total citations, research interests, affiliation, homepage, co-authors, and their full publication list.

How does this compare to the Google Scholar API?

Google Scholar does not offer an official public API. This scraper fills that gap by extracting the same data you see on the Google Scholar website, structured as clean JSON. You get papers, authors, citations, and PDF links without needing to manage API quotas or keys.

Can I use this for systematic literature reviews?

Absolutely. Set your search query, keep includeAbstracts enabled, use year filters to define your review period, and set maxResults high enough to capture the relevant papers (up to Google Scholar's practical limit of about 1,000 results per query). The structured output can be loaded into analysis tools or converted for import into reference managers.

Is this a Google Scholar crawler that handles pagination?

Yes. The actor automatically paginates through Google Scholar search results (10 per page) and author publication lists (100 per page) until it reaches your maxResults limit.
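With 10 results per search page and 100 per author page, the number of page requests follows from maxResults. The sketch below builds those page URLs. The query parameter names (`q`, `hl`, `start`, `as_ylo`, `as_yhi` for search pages; `user`, `cstart`, `pagesize` for author pages) are assumptions about Google Scholar's public URL scheme, not confirmed details of how this actor builds its requests, and the 1,000-result cap comes from the Limitations section.

```python
from typing import Optional
from urllib.parse import urlencode

SEARCH_PAGE_SIZE = 10     # results per Scholar search page
AUTHOR_PAGE_SIZE = 100    # publications per author-profile page
SEARCH_RESULT_CAP = 1000  # approximate Scholar limit per query


def search_page_urls(query: str, max_results: int, language: str = "en",
                     year_from: Optional[int] = None,
                     year_to: Optional[int] = None) -> list[str]:
    """One URL per search page needed to reach max_results (capped)."""
    wanted = min(max_results, SEARCH_RESULT_CAP)
    urls = []
    for start in range(0, wanted, SEARCH_PAGE_SIZE):
        params = {"q": query, "hl": language, "start": start}
        if year_from is not None:
            params["as_ylo"] = year_from
        if year_to is not None:
            params["as_yhi"] = year_to
        urls.append("https://scholar.google.com/scholar?" + urlencode(params))
    return urls


def author_page_urls(author_id: str, max_results: int) -> list[str]:
    """One URL per 100-paper slice of an author's publication list."""
    return ["https://scholar.google.com/citations?"
            + urlencode({"user": author_id, "cstart": start, "pagesize": AUTHOR_PAGE_SIZE})
            for start in range(0, max_results, AUTHOR_PAGE_SIZE)]
```

For example, maxResults: 50 on a search needs 5 pages, while maxResults: 200 on an author profile needs 2.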

What data fields are extracted from each paper?

Each paper includes: title, URL, authors, publication year, publication venue, abstract (optional), citation count, citation URL, PDF link, PDF source, Scholar ID, related articles URL, and version count.

Can I scrape citation data to see who cited a paper?

Yes. Set includeCitations: true and the actor will follow citation links for each paper and extract the citing papers with their title, URL, authors, and year.

Integrations

You can connect this Google Scholar scraper with other tools using the Apify API, official API clients for Python and JavaScript, or integrate with platforms like Zapier, Make, Google Sheets, and Slack.

Use this actor programmatically via the Apify API:

curl "https://api.apify.com/v2/acts/george.the.developer~google-scholar-scraper/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{"query": "artificial intelligence", "maxResults": 50}'
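The same call can be made from Python with only the standard library, which is handy where installing the official API client is not an option. This sketch builds the identical POST request (endpoint, headers, and JSON body taken from the curl example) without sending it; `build_run_request` is a hypothetical helper name, and the commented-out lines show how you might send it with a real token.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "george.the.developer~google-scholar-scraper"  # as in the curl example


def build_run_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an actor run."""
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{ACTOR_ID}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


request = build_run_request("YOUR_API_TOKEN", {"query": "artificial intelligence", "maxResults": 50})
# To start the run (requires network access and a valid token):
# with urllib.request.urlopen(request) as response:
#     run = json.load(response)
```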

Looking for scrapers for other platforms? Check out these actors by the same developer:

Support