Google Scholar Scraper
Pricing
Pay per usage
Scrape Google Scholar for academic papers, citations, author profiles. No API key needed. Extract titles, authors, abstracts, citation counts, PDF links, h-index, i10-index. Export JSON, CSV, Excel. Anti-bot protection with residential proxies, UA rotation, CAPTCHA detection.
Developer

George Kioko
Google Scholar Scraper -- Scrape Academic Papers, Citations & Author Profiles
The most reliable Google Scholar scraper on Apify. Scrape Google Scholar for academic papers, citation data, author profiles, h-index, and research metadata -- all without an API key. Export Google Scholar data as JSON, CSV, or Excel with full abstracts, citation counts, PDF links, and author metrics.
Use this academic paper scraper to extract structured research data at scale. Whether you need a citation scraper for bibliometric analysis, an author profile scraper for researcher tracking, or a research paper extractor for building literature databases -- this actor handles it all.
How It Works
flowchart LR
A["Your Input\n(query or author ID)"] --> B["Google Scholar\nScraper"]
B --> C{"Scraping Mode"}
C -->|Paper Search| D["Search Results\nPagination"]
C -->|Author Profile| E["Profile + Papers\nExtraction"]
D --> F["Structured Data\n(JSON/CSV/Excel)"]
E --> F
D -->|"includeCitations: true"| G["Citation Details\nExtraction"]
G --> F
Architecture
flowchart TB
subgraph Input
Q["Search Query"]
AU["Author ID"]
FI["Filters\n(year, sort, language)"]
end
subgraph Anti-Bot Layer
PR["Residential Proxies\n(US-based)"]
UA["User Agent\nRotation"]
DL["Random Delays\n(2-5 seconds)"]
ST["Browser Fingerprint\nMasking"]
CP["CAPTCHA Detection\n& Auto-Retry"]
end
subgraph Google Scholar
SP["Search Pages\n(/scholar)"]
AP["Author Pages\n(/citations)"]
CT["Citation Pages\n(/scholar?cites=...)"]
end
subgraph Data Extraction
PP["Paper Parser\n(title, authors, year,\nabstract, citations, PDF)"]
AE["Author Parser\n(h-index, i10-index,\ncitations, co-authors)"]
CE["Citation Parser\n(citing papers,\nauthors, years)"]
end
subgraph Output
DS["Apify Dataset\n(JSON / CSV / Excel)"]
end
Q & AU & FI --> PR
PR --> UA --> DL --> ST --> CP
CP --> SP & AP
SP --> PP
AP --> AE
AE -->|"Paper list"| PP
PP -->|"includeCitations"| CT
CT --> CE
PP & AE & CE --> DS
Features
- Scrape Google Scholar search results -- Extract papers by any keyword, topic, or phrase with full pagination support
- Author profile scraper -- Get h-index, i10-index, total citations, research interests, co-authors, and complete publication lists from any Google Scholar author profile
- Citation scraper -- Discover which papers cite a given paper, with full metadata for each citing work
- Research paper extractor -- Pull titles, authors, years, abstracts, publication venues, and direct PDF links
- No API key required -- Works without any Google Scholar API key, credentials, or authentication
- Export Google Scholar data -- Download results as JSON, CSV, or Excel from the Apify dataset
- Year filtering -- Narrow results by publication date range for targeted academic data extraction
- Sort by date or relevance -- Find the latest papers or the most relevant ones
- Multi-language support -- Scrape Google Scholar in any supported language
- Anti-bot protection built in -- Residential proxies, user agent rotation, random delays, fingerprint masking, and automatic CAPTCHA retry
- H-index scraper -- Extract h-index and i10-index metrics directly from author profiles
- PDF link extraction -- Get direct links to open-access PDFs when available
- Pay per result -- Only pay for the papers you actually scrape, no monthly subscription
Use Cases
- Literature reviews -- Quickly gather all papers on a topic with citation counts and abstracts for systematic reviews
- Citation analysis -- Track how many times papers are cited and by whom for bibliometric research
- Research monitoring -- Monitor new publications in a field by filtering by year
- Author analysis -- Get complete publication profiles and h-index data for researchers
- AI/RAG pipelines -- Feed structured academic data into LLM knowledge bases and retrieval-augmented generation systems
- Academic databases -- Build custom research databases with structured Google Scholar data
- Competitive research -- Track publications from specific authors or institutions
- Grant applications -- Quickly compile publication lists and citation metrics for funding proposals
- Trend analysis -- Identify emerging research topics by tracking publication volume over time
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | - | Search query for Google Scholar (e.g., "machine learning healthcare") |
| authorId | string | - | Google Scholar author ID for profile scraping |
| maxResults | integer | 50 | Maximum number of papers to scrape |
| language | string | "en" | Language code for results |
| yearFrom | integer | - | Filter papers from this year |
| yearTo | integer | - | Filter papers to this year |
| sortBy | string | "relevance" | Sort by "relevance" or "date" |
| includeAbstracts | boolean | true | Include paper abstracts in output |
| includeCitations | boolean | false | Include citing paper details |
You must provide either query or authorId (or both).
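The rules in the table above can be checked before starting a run. The following is a minimal sketch, not part of the actor: `validate_input` is a hypothetical helper, and the checks on `maxResults` and on the year range are assumptions about what counts as a sensible input rather than documented actor behavior.

```python
from typing import Any, Dict, List

ALLOWED_SORT = {"relevance", "date"}


def validate_input(actor_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an input object (empty if none)."""
    problems = []
    # At least one of `query` / `authorId` must be present and non-empty.
    if not actor_input.get("query") and not actor_input.get("authorId"):
        problems.append("provide query, authorId, or both")
    # `sortBy` defaults to "relevance" when omitted.
    if actor_input.get("sortBy", "relevance") not in ALLOWED_SORT:
        problems.append('sortBy must be "relevance" or "date"')
    # `maxResults` defaults to 50; a non-positive value is assumed invalid.
    max_results = actor_input.get("maxResults", 50)
    if not isinstance(max_results, int) or max_results < 1:
        problems.append("maxResults must be a positive integer")
    # An inverted year range is assumed to be a mistake.
    year_from = actor_input.get("yearFrom")
    year_to = actor_input.get("yearTo")
    if year_from is not None and year_to is not None and year_from > year_to:
        problems.append("yearFrom must not be later than yearTo")
    return problems
```

An empty list means the input passed every check; otherwise each string names one problem to fix before submitting the run.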
How to Find a Google Scholar Author ID
Go to a researcher's Google Scholar profile. The URL will look like:
https://scholar.google.com/citations?user=JicYPdAAAAAJ
The author ID is the value after user= (in this case JicYPdAAAAAJ).
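If you have many profile URLs, the ID can be pulled out programmatically. This is a small sketch using only the Python standard library; `extract_author_id` is a hypothetical helper name, not something the actor provides.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_author_id(profile_url: str) -> Optional[str]:
    """Return the value of the `user` query parameter, or None if absent."""
    params = parse_qs(urlparse(profile_url).query)
    values = params.get("user")
    return values[0] if values else None


author_id = extract_author_id("https://scholar.google.com/citations?user=JicYPdAAAAAJ")
```

Parsing the query string rather than splitting on `user=` keeps the result correct when the URL carries extra parameters after the ID.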
Example Inputs
Search for papers on a topic
{
  "query": "large language models healthcare",
  "maxResults": 100,
  "yearFrom": 2023,
  "sortBy": "date",
  "includeAbstracts": true
}
Scrape an author profile (h-index, citations, publications)
{
  "authorId": "JicYPdAAAAAJ",
  "maxResults": 200
}
Search with citation details
{
  "query": "transformer architecture attention mechanism",
  "maxResults": 20,
  "includeCitations": true
}
Filter papers by year range
{
  "query": "CRISPR gene editing",
  "yearFrom": 2022,
  "yearTo": 2025,
  "maxResults": 50,
  "sortBy": "date"
}
Output
Paper search result
Each paper scraped from Google Scholar includes:
{
  "type": "paper",
  "title": "Attention Is All You Need",
  "url": "https://arxiv.org/abs/1706.03762",
  "authors": "A Vaswani, N Shazeer, N Parmar, J Uszkoreit...",
  "year": 2017,
  "publicationVenue": "Advances in neural information processing systems",
  "abstract": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...",
  "citationCount": 120543,
  "citationUrl": "https://scholar.google.com/scholar?cites=...",
  "pdfLink": "https://arxiv.org/pdf/1706.03762",
  "pdfSource": "[PDF] arxiv.org",
  "scholarId": "12345678",
  "relatedUrl": "https://scholar.google.com/scholar?related=...",
  "versionCount": 15,
  "scrapedAt": "2026-03-12T10:00:00.000Z",
  "searchQuery": "transformer architecture"
}
Author profile result
Author profile data includes h-index, i10-index, total citations, and co-authors:
{
  "type": "author_profile",
  "authorId": "JicYPdAAAAAJ",
  "name": "Geoffrey Hinton",
  "affiliation": "University of Toronto",
  "interests": ["Machine Learning", "Neural Networks", "Deep Learning"],
  "totalCitations": 834291,
  "hIndex": 186,
  "i10Index": 499,
  "imageUrl": "https://scholar.google.com/citations?view_op=medium_photo&user=...",
  "homepage": "https://www.cs.toronto.edu/~hinton/",
  "coauthors": [
    {
      "name": "Yann LeCun",
      "affiliation": "NYU / Meta",
      "url": "https://scholar.google.com/citations?user=..."
    }
  ],
  "scrapedAt": "2026-03-12T10:00:00.000Z"
}
Citation details result
When includeCitations is enabled, you get citing papers for each result:
{
  "type": "citation_details",
  "parentTitle": "Attention Is All You Need",
  "parentScholarId": "12345678",
  "citingPapers": [
    {
      "title": "BERT: Pre-training of Deep Bidirectional Transformers",
      "url": "https://arxiv.org/abs/1810.04805",
      "authors": "J Devlin, MW Chang, K Lee, K Toutanova",
      "year": 2018
    }
  ],
  "citingPapersCount": 10,
  "scrapedAt": "2026-03-12T10:00:00.000Z"
}
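Because papers and citation details arrive as separate dataset items, downstream code usually has to join them. The sketch below assumes that `parentScholarId` on a `citation_details` item matches `scholarId` on the corresponding `paper` item, as the sample values above suggest; `attach_citations` is a hypothetical helper, not part of the actor.

```python
from typing import Any, Dict, Iterable


def attach_citations(items: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Map each paper's scholarId to the paper plus its citing papers."""
    items = list(items)
    # First pass: index every paper that carries a scholarId.
    papers = {
        item["scholarId"]: {"paper": item, "citingPapers": []}
        for item in items
        if item.get("type") == "paper" and item.get("scholarId")
    }
    # Second pass: attach citation details to their parent paper.
    for item in items:
        if item.get("type") != "citation_details":
            continue
        parent = papers.get(item.get("parentScholarId"))
        if parent is not None:  # skip details whose parent is not in this run
            parent["citingPapers"].extend(item.get("citingPapers", []))
    return papers
```

Items of type `author_profile` are ignored here, so the same function can be applied to the full dataset of a combined run.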
Pricing
This actor uses pay-per-event pricing. You only pay for the papers you scrape -- no monthly fees, no API key costs.
| Event | Price |
|---|---|
| paper-found | $0.004 per paper |
Cost examples
| Papers scraped | Cost |
|---|---|
| 50 papers | $0.20 |
| 200 papers | $0.80 |
| 500 papers | $2.00 |
| 1,000 papers | $4.00 |
Author profile metadata (h-index, i10-index, citations, co-authors) is included free. You only pay for papers scraped.
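The cost table follows directly from the per-paper price. As a rough sketch for budgeting other run sizes, the hypothetical helper below multiplies the listed `paper-found` price by the number of papers; it covers only that event and uses `Decimal` to avoid floating-point rounding.

```python
from decimal import Decimal

PRICE_PER_PAPER = Decimal("0.004")  # the paper-found price listed above


def estimate_cost(papers_scraped: int) -> Decimal:
    """Return the estimated paper-found charge in US dollars."""
    if papers_scraped < 0:
        raise ValueError("papers_scraped must be non-negative")
    return PRICE_PER_PAPER * papers_scraped


print(f"${estimate_cost(200):.2f}")  # prints "$0.80", matching the table
```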
Anti-Bot Protection
Google Scholar is known for aggressive anti-bot measures. This Google Scholar crawler handles them with:
- Residential proxies -- Uses Apify's US-based residential proxy group for reliable access
- Random delays -- 2-5 second random delays between requests to mimic human browsing
- User agent rotation -- Rotates through 6 real browser user agents (Chrome, Firefox, Safari, Edge)
- Browser fingerprint masking -- Hides `navigator.webdriver` and other automation signals
- CAPTCHA detection -- Detects CAPTCHA/block pages and automatically retries with a different proxy session
- Single concurrency -- Makes requests one at a time to avoid triggering Google rate limits
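To make the pacing idea concrete, here is an illustrative sketch of the general technique (sequential requests, a rotating user agent, and a random 2-5 second delay). It is not the actor's source code: `request_plan` is a hypothetical name, and the user agent strings are placeholders standing in for full browser headers.

```python
import itertools
import random
from typing import Iterable, Iterator, Optional, Tuple

# Placeholder values; a real list would hold complete User-Agent headers.
USER_AGENTS = ["ua-chrome", "ua-firefox", "ua-safari", "ua-edge"]


def request_plan(
    urls: Iterable[str],
    rng: Optional[random.Random] = None,
    min_delay: float = 2.0,
    max_delay: float = 5.0,
) -> Iterator[Tuple[str, str, float]]:
    """Yield (url, user_agent, delay_seconds) for one request at a time."""
    rng = rng or random.Random()
    agents = itertools.cycle(USER_AGENTS)  # rotate through the list in order
    for url in urls:
        # Each request gets the next user agent and its own random pause.
        yield url, next(agents), rng.uniform(min_delay, max_delay)
```

A caller would sleep for `delay_seconds` before each request and issue them strictly one after another, which corresponds to the single-concurrency point above.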
Tips for Best Results
- Start small -- Test with `maxResults: 10` first to verify your query returns what you need
- Use year filters -- Narrow your search with `yearFrom` and `yearTo` for more targeted academic data extraction
- Sort by date -- Use `sortBy: "date"` to find the most recent papers in any field
- Citation details are expensive -- Enable `includeCitations` only when needed, as it makes additional requests per paper
- Author profiles -- Use `authorId` to get a complete publication list, h-index, and citation metrics for any researcher
- Combine modes -- Provide both `query` and `authorId` to search papers AND scrape an author profile in one run
Limitations
- Google Scholar may temporarily block requests if too many are made. The actor handles this with automatic retries and proxy rotation.
- Citation details (`includeCitations: true`) significantly increase run time and cost, since the actor makes additional page requests per paper.
- Google Scholar does not provide an official API, so page structure may change occasionally.
- Maximum practical limit is approximately 1,000 results per search query (this is a Google Scholar limitation, not an actor limitation).
FAQ
Can I scrape Google Scholar without an API key?
Yes. This actor scrapes Google Scholar directly through a headless browser with residential proxies. No Google Scholar API key, no Semantic Scholar API, and no third-party credentials are required. Just provide your search query or author ID and run it.
How do I export Google Scholar data to CSV or Excel?
After the actor finishes, go to the dataset tab in the Apify Console. You can download results as JSON, CSV, Excel, XML, or RSS. You can also access the data programmatically via the Apify API.
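If you have already downloaded the items as JSON and only want a compact CSV of papers, a local conversion is also possible. The following is a minimal sketch with the standard `csv` module; `papers_to_csv` and the chosen column list are assumptions for illustration, using field names from the paper output shown earlier.

```python
import csv
import io
from typing import Any, Dict, Iterable

FIELDS = ["title", "authors", "year", "publicationVenue", "citationCount", "pdfLink"]


def papers_to_csv(items: Iterable[Dict[str, Any]]) -> str:
    """Write the selected fields of every `paper` item to a CSV string."""
    buffer = io.StringIO()
    # Fields outside FIELDS are ignored; missing fields become empty cells.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        if item.get("type") == "paper":
            writer.writerow(item)
    return buffer.getvalue()
```

Author profile and citation detail items are skipped because their fields do not fit the paper columns.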
Can I scrape author h-index and citation metrics?
Yes. Provide an authorId and the actor will extract the author's h-index, i10-index, total citations, research interests, affiliation, homepage, co-authors, and their full publication list.
How does this compare to the Google Scholar API?
Google Scholar does not offer an official public API. This scraper fills that gap by extracting the same data you see on the Google Scholar website, structured as clean JSON. You get papers, authors, citations, and PDF links without needing to manage API quotas or keys.
Can I use this for systematic literature reviews?
Absolutely. Set your search query, enable includeAbstracts, use year filters to define your review period, and set maxResults to capture all relevant papers. The structured output is ready to import into reference managers or analysis tools.
Is this a Google Scholar crawler that handles pagination?
Yes. The actor automatically paginates through Google Scholar search results (10 per page) and author publication lists (100 per page) until it reaches your maxResults limit.
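The page sizes above give a quick way to estimate how many page requests a run needs. This is a back-of-the-envelope sketch; `pages_needed` is a hypothetical helper, and the count ignores retries and any extra requests made for citation details.

```python
SEARCH_PAGE_SIZE = 10   # search results per page, as stated above
AUTHOR_PAGE_SIZE = 100  # author publications per page, as stated above


def pages_needed(max_results: int, page_size: int) -> int:
    """Return how many pages must be fetched to collect max_results items."""
    if max_results < 0 or page_size < 1:
        raise ValueError("max_results must be >= 0 and page_size >= 1")
    # Integer ceiling division: a partly filled last page still costs a request.
    return (max_results + page_size - 1) // page_size


search_pages = pages_needed(50, SEARCH_PAGE_SIZE)    # 5 pages
author_pages = pages_needed(200, AUTHOR_PAGE_SIZE)   # 2 pages
```

Since requests are made one at a time with a pause between them, the page count is a reasonable proxy for minimum run time.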
What data fields are extracted from each paper?
Each paper includes: title, URL, authors, publication year, publication venue, abstract (optional), citation count, citation URL, PDF link, PDF source, Scholar ID, related articles URL, and version count.
Can I scrape citation data to see who cited a paper?
Yes. Set includeCitations: true and the actor will follow citation links for each paper and extract the citing papers with their title, URL, authors, and year.
Integrations
You can connect this Google Scholar scraper with other tools using the Apify API, official API clients for Python and JavaScript, or integrate with platforms like Zapier, Make, Google Sheets, and Slack.
Use this actor programmatically via the Apify API:
curl "https://api.apify.com/v2/acts/george.the.developer~google-scholar-scraper/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{"query": "artificial intelligence", "maxResults": 50}'
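The same request can be prepared from Python with only the standard library. This sketch mirrors the curl call (same URL, headers, and JSON body) and stops short of sending it; `build_run_request` is a hypothetical helper name, and `YOUR_API_TOKEN` remains a placeholder for your own token.

```python
import json
import urllib.request
from typing import Any, Dict

RUN_URL = "https://api.apify.com/v2/acts/george.the.developer~google-scholar-scraper/runs"


def build_run_request(api_token: str, actor_input: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an actor run."""
    return urllib.request.Request(
        RUN_URL,
        data=json.dumps(actor_input).encode("utf-8"),  # JSON body, as in -d
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


request = build_run_request("YOUR_API_TOKEN", {"query": "artificial intelligence", "maxResults": 50})
```

Passing `request` to `urllib.request.urlopen` would perform the actual call; that step is left out here so the example runs without network access or a real token.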
Related Tools
Looking for scrapers for other platforms? Check out these actors by the same developer:
- Threads Scraper -- Scrape Meta Threads posts, profiles, and comments
- Telegram Channel Scraper -- Extract messages, media, and metadata from public Telegram channels
- YouTube Transcript Extractor -- Extract full transcripts from YouTube videos
- Google News Scraper & Brand Monitor -- Scrape Google News articles and monitor brand mentions
Support
- For bugs or feature requests, open an issue on this actor's page
- Built by George The Developer