Pricing

from $4.00 / 1,000 papers found

Google Scholar Scraper

Scrape Google Scholar for academic papers, citations, and author profiles. No API key needed. Extract titles, authors, abstracts, citation counts, PDF links, h-index, and i10-index. Export to JSON, CSV, or Excel. Anti-bot protection with residential proxies, user-agent rotation, and CAPTCHA detection.

Rating

5.0

(2)

Developer

George Kioko

Actor stats

Issues response: 0.4 hours

Last modified: a month ago

Google Scholar Scraper -- Scrape Academic Papers, Citations & Author Profiles

Scrape Google Scholar for academic papers, citation data, author profiles, h-index, and research metadata -- all without an API key. Export Google Scholar data as JSON, CSV, or Excel with full abstracts, citation counts, PDF links, and author metrics.

Use this academic paper scraper to extract structured research data at scale. Whether you need a citation scraper for bibliometric analysis, an author profile scraper for researcher tracking, or a research paper extractor for building literature databases -- this actor handles it all.

How It Works

flowchart LR
    A["Your Input\n(query or author ID)"] --> B["Google Scholar\nScraper"]
    B --> C{"Scraping Mode"}
    C -->|Paper Search| D["Search Results\nPagination"]
    C -->|Author Profile| E["Profile + Papers\nExtraction"]
    D --> F["Structured Data\n(JSON/CSV/Excel)"]
    E --> F
    D -->|"includeCitations: true"| G["Citation Details\nExtraction"]
    G --> F

Architecture

flowchart TB
    subgraph Input
        Q["Search Query"]
        AU["Author ID"]
        FI["Filters\n(year, sort, language)"]
    end

    subgraph Anti-Bot Layer
        PR["Residential Proxies\n(US-based)"]
        UA["User Agent\nRotation"]
        DL["Random Delays\n(2-5 seconds)"]
        ST["Browser Fingerprint\nMasking"]
        CP["CAPTCHA Detection\n& Auto-Retry"]
    end

    subgraph Google Scholar
        SP["Search Pages\n(/scholar)"]
        AP["Author Pages\n(/citations)"]
        CT["Citation Pages\n(/scholar?cites=...)"]
    end

    subgraph Data Extraction
        PP["Paper Parser\n(title, authors, year,\nabstract, citations, PDF)"]
        AE["Author Parser\n(h-index, i10-index,\ncitations, co-authors)"]
        CE["Citation Parser\n(citing papers,\nauthors, years)"]
    end

    subgraph Output
        DS["Apify Dataset\n(JSON / CSV / Excel)"]
    end

    Q & AU & FI --> PR
    PR --> UA --> DL --> ST --> CP
    CP --> SP & AP
    SP --> PP
    AP --> AE
    AE -->|"Paper list"| PP
    PP -->|"includeCitations"| CT
    CT --> CE
    PP & AE & CE --> DS

Features

Scrape Google Scholar search results -- Extract papers by any keyword, topic, or phrase with full pagination support
Author profile scraper -- Get h-index, i10-index, total citations, research interests, co-authors, and complete publication lists from any Google Scholar author profile
Citation scraper -- Discover which papers cite a given paper, with full metadata for each citing work
Research paper extractor -- Pull titles, authors, years, abstracts, publication venues, and direct PDF links
No API key required -- Works without any Google Scholar API key, credentials, or authentication
Export Google Scholar data -- Download results as JSON, CSV, or Excel from the Apify dataset
Year filtering -- Narrow results by publication date range for targeted academic data extraction
Sort by date or relevance -- Find the latest papers or the most relevant ones
Multi-language support -- Scrape Google Scholar in any supported language
Anti-bot protection built in -- Residential proxies, user agent rotation, random delays, fingerprint masking, and automatic CAPTCHA retry
H-index scraper -- Extract h-index and i10-index metrics directly from author profiles
PDF link extraction -- Get direct links to open-access PDFs when available
Pay per result -- Only pay for the papers you actually scrape, no monthly subscription

Use Cases

Literature reviews -- Quickly gather all papers on a topic with citation counts and abstracts for systematic reviews
Citation analysis -- Track how many times papers are cited and by whom for bibliometric research
Research monitoring -- Monitor new publications in a field by filtering by year
Author analysis -- Get complete publication profiles and h-index data for researchers
AI/RAG pipelines -- Feed structured academic data into LLM knowledge bases and retrieval-augmented generation systems
Academic databases -- Build custom research databases with structured Google Scholar data
Competitive research -- Track publications from specific authors or institutions
Grant applications -- Quickly compile publication lists and citation metrics for funding proposals
Trend analysis -- Identify emerging research topics by tracking publication volume over time

Input

Parameter	Type	Default	Description
`query`	string	-	Search query for Google Scholar (e.g., "machine learning healthcare")
`authorId`	string	-	Google Scholar author ID for profile scraping
`maxResults`	integer	50	Maximum number of papers to scrape
`language`	string	"en"	Language code for results
`yearFrom`	integer	-	Filter papers from this year
`yearTo`	integer	-	Filter papers to this year
`sortBy`	string	"relevance"	Sort by "relevance" or "date"
`includeAbstracts`	boolean	true	Include paper abstracts in output
`includeCitations`	boolean	false	Include citing paper details

You must provide either query or authorId (or both).
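
The rule above can be checked locally before a run is started. The following is a minimal sketch, assuming the input is assembled as a plain dictionary; `validate_input` is a hypothetical helper and not part of the actor, and the check that `yearFrom` is not later than `yearTo` is an added assumption rather than documented behavior.

```python
def validate_input(run_input: dict) -> dict:
    """Return a copy of run_input with the documented defaults applied.

    Raises ValueError when neither `query` nor `authorId` is given.
    """
    if not run_input.get("query") and not run_input.get("authorId"):
        raise ValueError("Provide query, authorId, or both")

    # Defaults copied from the parameter table above.
    defaults = {
        "maxResults": 50,
        "language": "en",
        "sortBy": "relevance",
        "includeAbstracts": True,
        "includeCitations": False,
    }
    merged = {**defaults, **run_input}

    if merged["sortBy"] not in ("relevance", "date"):
        raise ValueError('sortBy must be "relevance" or "date"')

    # Assumption: an inverted year range is treated as a mistake.
    year_from, year_to = merged.get("yearFrom"), merged.get("yearTo")
    if year_from is not None and year_to is not None and year_from > year_to:
        raise ValueError("yearFrom must not be later than yearTo")
    return merged
```

Calling `validate_input({"query": "CRISPR gene editing", "yearFrom": 2022})` returns the same input with `maxResults`, `language`, `sortBy`, and the two boolean flags filled in.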

How to Find a Google Scholar Author ID

Go to a researcher's Google Scholar profile. The URL will look like:

https://scholar.google.com/citations?user=JicYPdAAAAAJ

The author ID is the value after user= (in this case JicYPdAAAAAJ).
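
When author IDs need to be collected from many profile URLs, the same lookup can be scripted. This is a small sketch using only the Python standard library; `extract_author_id` is a hypothetical helper name.

```python
from urllib.parse import urlparse, parse_qs


def extract_author_id(profile_url: str) -> str:
    """Return the value of the `user` query parameter from a profile URL."""
    params = parse_qs(urlparse(profile_url).query)
    if "user" not in params:
        raise ValueError(f"No user= parameter in {profile_url!r}")
    return params["user"][0]


print(extract_author_id("https://scholar.google.com/citations?user=JicYPdAAAAAJ"))
# prints JicYPdAAAAAJ
```

Parsing the query string rather than splitting on `user=` keeps the result correct when the URL carries further parameters.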

Example Inputs

Search for papers on a topic

{
    "query": "large language models healthcare",
    "maxResults": 100,
    "yearFrom": 2023,
    "sortBy": "date",
    "includeAbstracts": true
}

Scrape an author profile (h-index, citations, publications)

{
    "authorId": "JicYPdAAAAAJ",
    "maxResults": 200
}

Search with citation details

{
    "query": "transformer architecture attention mechanism",
    "maxResults": 20,
    "includeCitations": true
}

Filter papers by year range

{
    "query": "CRISPR gene editing",
    "yearFrom": 2022,
    "yearTo": 2025,
    "maxResults": 50,
    "sortBy": "date"
}

Output

Paper search result

Each paper scraped from Google Scholar includes:

{
    "type": "paper",
    "title": "Attention Is All You Need",
    "url": "https://arxiv.org/abs/1706.03762",
    "authors": "A Vaswani, N Shazeer, N Parmar, J Uszkoreit...",
    "year": 2017,
    "publicationVenue": "Advances in neural information processing systems",
    "abstract": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...",
    "citationCount": 120543,
    "citationUrl": "https://scholar.google.com/scholar?cites=...",
    "pdfLink": "https://arxiv.org/pdf/1706.03762",
    "pdfSource": "[PDF] arxiv.org",
    "scholarId": "12345678",
    "relatedUrl": "https://scholar.google.com/scholar?related=...",
    "versionCount": 15,
    "scrapedAt": "2026-03-12T10:00:00.000Z",
    "searchQuery": "transformer architecture"
}

Author profile result

Author profile data includes h-index, i10-index, total citations, and co-authors:

{
    "type": "author_profile",
    "authorId": "JicYPdAAAAAJ",
    "name": "Geoffrey Hinton",
    "affiliation": "University of Toronto",
    "interests": ["Machine Learning", "Neural Networks", "Deep Learning"],
    "totalCitations": 834291,
    "hIndex": 186,
    "i10Index": 499,
    "imageUrl": "https://scholar.google.com/citations?view_op=medium_photo&user=...",
    "homepage": "https://www.cs.toronto.edu/~hinton/",
    "coauthors": [
        {
            "name": "Yann LeCun",
            "affiliation": "NYU / Meta",
            "url": "https://scholar.google.com/citations?user=..."
        }
    ],
    "scrapedAt": "2026-03-12T10:00:00.000Z"
}

Citation details result

When includeCitations is enabled, you get citing papers for each result:

{
    "type": "citation_details",
    "parentTitle": "Attention Is All You Need",
    "parentScholarId": "12345678",
    "citingPapers": [
        {
            "title": "BERT: Pre-training of Deep Bidirectional Transformers",
            "url": "https://arxiv.org/abs/1810.04805",
            "authors": "J Devlin, MW Chang, K Lee, K Toutanova",
            "year": 2018
        }
    ],
    "citingPapersCount": 10,
    "scrapedAt": "2026-03-12T10:00:00.000Z"
}
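
Since the three record shapes above are distinguished by their `type` field, downstream code will usually want to separate them first. The sketch below assumes the dataset has already been downloaded as a list of dictionaries; `split_by_type` is a hypothetical helper, and the sample records are shortened copies of the examples above.

```python
from collections import defaultdict


def split_by_type(items: list) -> dict:
    """Group dataset items by their `type` field."""
    groups = defaultdict(list)
    for item in items:
        groups[item.get("type", "unknown")].append(item)
    return dict(groups)


# Shortened records modelled on the examples above.
items = [
    {"type": "paper", "title": "Attention Is All You Need", "citationCount": 120543},
    {"type": "author_profile", "name": "Geoffrey Hinton", "hIndex": 186},
    {"type": "citation_details", "parentTitle": "Attention Is All You Need", "citingPapersCount": 10},
]
groups = split_by_type(items)
print({kind: len(records) for kind, records in groups.items()})
# prints {'paper': 1, 'author_profile': 1, 'citation_details': 1}
```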

Pricing

This actor uses pay-per-event pricing. You only pay for the papers you scrape -- no monthly fees, no API key costs.

Event	Price
`paper-found`	$0.004 per paper

Cost examples

Papers scraped	Cost
50 papers	$0.20
200 papers	$0.80
500 papers	$2.00
1,000 papers	$4.00

Author profile metadata (h-index, i10-index, citations, co-authors) is included free. You only pay for papers scraped.
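
The cost examples follow directly from the per-paper price. Here is a minimal sketch of that arithmetic, covering only the `paper-found` event listed above; `estimate_cost` is a hypothetical helper, and amounts are rounded to cents for display.

```python
from decimal import Decimal

PRICE_PER_PAPER = Decimal("0.004")  # `paper-found` event price from the table above


def estimate_cost(papers: int) -> Decimal:
    """Return the estimated charge in USD for a number of scraped papers."""
    if papers < 0:
        raise ValueError("papers must be non-negative")
    return (PRICE_PER_PAPER * papers).quantize(Decimal("0.01"))


for count in (50, 200, 500, 1000):
    print(count, estimate_cost(count))
```

The loop reproduces the rows of the cost table. `Decimal` is used instead of `float` so that the per-paper price is represented exactly.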

Anti-Bot Protection

Google Scholar is known for aggressive anti-bot measures. This actor counters them with:

Residential proxies -- Uses Apify's US-based residential proxy group for reliable access
Random delays -- 2-5 second random delays between requests to mimic human browsing
User agent rotation -- Rotates through 6 real browser user agents (Chrome, Firefox, Safari, Edge)
Browser fingerprint masking -- Hides navigator.webdriver and other automation signals
CAPTCHA detection -- Detects CAPTCHA/block pages and automatically retries with a different proxy session
Single concurrency -- Makes requests one at a time to avoid triggering Google rate limits
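
To illustrate how random delays and user agent rotation combine under single concurrency, here is a simplified sketch. It is not the actor's implementation: `request_plan` is a hypothetical function, the user agent strings are placeholders (and fewer than the six mentioned above), and proxies, fingerprint masking, and CAPTCHA handling are left out.

```python
import itertools
import random

# Placeholder strings; the actor's real user agent list is not shown in this document.
USER_AGENTS = ["ua-chrome", "ua-firefox", "ua-safari", "ua-edge"]


def request_plan(urls, min_delay=2.0, max_delay=5.0, seed=None):
    """Yield (url, user_agent, delay_seconds) for strictly sequential requests."""
    rng = random.Random(seed)
    agents = itertools.cycle(USER_AGENTS)
    for url in urls:
        # A fresh delay in the 2-5 second window is drawn for every request.
        yield url, next(agents), rng.uniform(min_delay, max_delay)


for url, agent, delay in request_plan(["page-1", "page-2", "page-3"], seed=42):
    print(f"wait {delay:.1f}s, then fetch {url} as {agent}")
```

A caller would sleep for `delay` seconds before each fetch. Because the generator yields one request at a time, requests never overlap.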

Tips for Best Results

Start small -- Test with maxResults: 10 first to verify your query returns what you need
Use year filters -- Narrow your search with yearFrom and yearTo for more targeted academic data extraction
Sort by date -- Use sortBy: "date" to find the most recent papers in any field
Citation details are expensive -- Enable includeCitations only when needed, as it makes additional requests per paper
Author profiles -- Use authorId to get a complete publication list, h-index, and citation metrics for any researcher
Combine modes -- Provide both query and authorId to search papers AND scrape an author profile in one run

Limitations

Google Scholar may temporarily block requests when too many arrive in a short period. The actor handles this with automatic retries and proxy rotation.
Enabling citation details (includeCitations: true) significantly increases run time and cost, since the actor makes additional page requests for each paper.
Google Scholar does not provide an official API, so the actor depends on the site's page structure, which may change occasionally.
The practical maximum is approximately 1,000 results per search query (this is a Google Scholar limitation, not an actor limitation).

FAQ

Can I scrape Google Scholar without an API key?

Yes. This actor scrapes Google Scholar directly through a headless browser with residential proxies. No Google Scholar API key, no Semantic Scholar API, and no third-party credentials are required. Just provide your search query or author ID and run it.

How do I export Google Scholar data to CSV or Excel?

After the actor finishes, go to the dataset tab in the Apify Console. You can download results as JSON, CSV, Excel, XML, or RSS. You can also access the data programmatically via the Apify API.
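
The Console export needs no code. For cases where the JSON has already been downloaded and only a few columns are wanted, the sketch below converts paper records to CSV locally with the standard library. `papers_to_csv` and the chosen column list are illustrative assumptions.

```python
import csv
import io

# Columns to keep; field names follow the paper output example in this document.
FIELDS = ["title", "authors", "year", "citationCount", "pdfLink"]


def papers_to_csv(items: list) -> str:
    """Return CSV text for the `paper` records in a list of dataset items."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops fields that are not listed in FIELDS.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        if item.get("type") == "paper":
            writer.writerow(item)
    return buffer.getvalue()
```

Records of other types, such as `author_profile`, are skipped, and fields missing from a paper record are written as empty cells.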

Can I scrape author h-index and citation metrics?

Yes. Provide an authorId and the actor will extract the author's h-index, i10-index, total citations, research interests, affiliation, homepage, co-authors, and their full publication list.

How does this compare to the Google Scholar API?

Google Scholar does not offer an official public API. This scraper fills that gap by extracting the same data you see on the Google Scholar website, structured as clean JSON. You get papers, authors, citations, and PDF links without needing to manage API quotas or keys.

Can I use this for systematic literature reviews?

Yes. Set your search query, enable includeAbstracts, use year filters to define your review period, and set maxResults high enough to capture the relevant papers, keeping in mind the practical ceiling of roughly 1,000 results per search query. The structured output is ready to import into reference managers or analysis tools.

Is this a Google Scholar crawler that handles pagination?

Yes. The actor automatically paginates through Google Scholar search results (10 per page) and author publication lists (100 per page) until it reaches your maxResults limit.
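
The page sizes above give a rough idea of how many page requests a run makes. This is a back-of-the-envelope sketch with hypothetical helper names; it ignores retries and the extra requests made when `includeCitations` is enabled.

```python
import math

SEARCH_PAGE_SIZE = 10   # results per search page, as stated above
AUTHOR_PAGE_SIZE = 100  # publications per author page, as stated above


def pages_needed(max_results: int, page_size: int) -> int:
    """Return how many pages must be fetched to reach max_results."""
    return math.ceil(max_results / page_size)


def page_offsets(max_results: int, page_size: int) -> list:
    """Return the zero-based index of the first result on each page."""
    return list(range(0, max_results, page_size))


print(pages_needed(100, SEARCH_PAGE_SIZE))  # prints 10
print(pages_needed(200, AUTHOR_PAGE_SIZE))  # prints 2
print(page_offsets(25, SEARCH_PAGE_SIZE))   # prints [0, 10, 20]
```

With 2-5 second delays between requests, a 100-result search therefore spends at least 20 seconds on delays alone.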

What data fields are extracted from each paper?

Each paper includes: type, title, URL, authors, publication year, publication venue, abstract (optional), citation count, citation URL, PDF link, PDF source, Scholar ID, related articles URL, version count, scrape timestamp, and the search query that produced it.

Can I scrape citation data to see who cited a paper?

Yes. Set includeCitations: true and the actor will follow citation links for each paper and extract the citing papers with their title, URL, authors, and year.

Integrations

You can connect this Google Scholar scraper with other tools using the Apify API, official API clients for Python and JavaScript, or integrate with platforms like Zapier, Make, Google Sheets, and Slack.

Use this actor programmatically via the Apify API:

curl "https://api.apify.com/v2/acts/george.the.developer~google-scholar-scraper/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{"query": "artificial intelligence", "maxResults": 50}'
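
The same request can be prepared in Python. The sketch below mirrors the curl command using only the standard library; `build_run_request` is a hypothetical helper, and the request is built but not sent, since sending it requires a valid token and network access.

```python
import json
import urllib.request

ACTOR_ID = "george.the.developer~google-scholar-scraper"  # taken from the curl example


def build_run_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build the POST request that starts an actor run."""
    return urllib.request.Request(
        url=f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    "YOUR_API_TOKEN",
    {"query": "artificial intelligence", "maxResults": 50},
)

# To send it (requires a real token):
# with urllib.request.urlopen(request) as response:
#     run = json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or test before any run is billed.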

Looking for scrapers for other platforms? Check out these actors by the same developer:

Threads Scraper -- Scrape Meta Threads posts, profiles, and comments
Telegram Channel Scraper -- Extract messages, media, and metadata from public Telegram channels
YouTube Transcript Extractor -- Extract full transcripts from YouTube videos
Google News Scraper & Brand Monitor -- Scrape Google News articles and monitor brand mentions

Support

For bugs or feature requests, open an issue on this actor's page
Built by George The Developer
