Pricing

Pay per usage

Google Scholar Scraper: Articles, Citations & PDFs

Extract academic data from Google Scholar: titles, authors, years, citation counts, abstracts, and PDF links. Supports keyword queries, year filters (1900-2100), and pagination (up to 5 pages), with built-in rate limiting. Ideal for research, citation tracking, dataset building, and AI workflows. Produces clean JSON output and runs on Apify with proxy support.

Rating

0.0

(0)

Developer

PrimeParse

Last modified

7 months ago

🔬 Google Scholar Scraper: Academic Research Data Extractor

Enterprise-grade Google Scholar scraper for academic research and data analysis. It collects structured data from Google Scholar search results, including titles, authors, citation counts, abstracts, and PDF links. Ideal for literature reviews, citation analysis, and building academic datasets. Features include result parsing, rate limiting, and year filtering.

High-quality Google Scholar Data Extractor for Researchers, Academics, and Data Scientists

Automatically searches Google Scholar, extracts article metadata, filters by publication year, and collects citation data, producing clean, structured output ready for analysis or academic research.

Built for:

Academic researchers conducting literature reviews
Data scientists building research datasets
PhD students tracking citations and publications
Librarians organizing academic resources
Research teams monitoring publication trends
AI/ML engineers collecting training data from academic sources

✅ Smart search with keyword queries
✅ Year range filtering (1900-2100)
✅ Rich metadata extraction (title, authors, year, citations, abstract, PDF links)
✅ Automatic pagination support (up to 5 pages)
✅ Rate limiting & respectful crawling
✅ AI-ready structured output

👉 Runs on Apify • No code required

🚀 Why This Scraper

✔ Purpose-Built for Academic Research

Intelligently extracts structured data from Google Scholar search results — perfect for literature reviews, citation analysis, and academic research.

✔ Comprehensive Metadata Extraction

Extracts all essential academic metadata: article titles, author lists, publication years, citation counts, abstracts, PDF links, and the URL each search result links to.

✔ Clean & Structured Output

Produces clean, structured JSON output ready for analysis, database import, or further processing. Perfect for academic datasets and research workflows.

✔ Smart Year Filtering

Filter results by publication year range to focus on recent research or historical publications. Supports years from 1900 to 2100.

✔ AI & ML Ready

Structured JSON output perfect for RAG systems, LLM fine-tuning, academic knowledge bases, or training datasets for research applications.
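For example, a RAG pipeline usually needs each record flattened into a text chunk plus metadata. The sketch below assumes records shaped like the output examples in this README. `to_rag_document` is a hypothetical helper, not part of the Actor.

```python
from typing import Any, Dict


# Hypothetical helper (not part of the Actor): turns one output record
# into a text chunk plus metadata, a common shape for RAG ingestion.
def to_rag_document(record: Dict[str, Any]) -> Dict[str, Any]:
    parts = [record["title"]]
    authors = record.get("authors") or []
    if authors:
        parts.append("Authors: " + ", ".join(authors))
    if record.get("year") is not None:
        parts.append(f"Year: {record['year']}")
    if record.get("abstract"):
        parts.append(f"Abstract: {record['abstract']}")
    return {
        "text": "\n".join(parts),
        "metadata": {
            "link": record.get("scholarLink"),
            "pdf": record.get("pdfLink"),
            "citations": record.get("citations"),
        },
    }


# Record taken from the "Multiple Authors Example" in this README.
example = {
    "title": "A guide to machine learning for biologists",
    "authors": ["JG Greener", "SM Kandathil"],
    "year": 2022,
    "citations": 2020,
    "abstract": "… A machine learning task is an objective specification for what we want a machine learning model to accomplish…",
    "pdfLink": "https://discovery.ucl.ac.uk/id/eprint/10134478/1/NRMCB-review-accepted-forRPS.pdf",
    "scholarLink": "https://www.nature.com/articles/s41580-021-00407-0",
}
print(to_rag_document(example)["text"])
```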

✔ Fast & Efficient

Processes each results page in roughly 3-4 seconds, depending on the results returned.

✔ Safe & Controlled Processing

Built-in rate limiting (1-2 second delays), configurable pagination limits, and graceful error handling to respect Google Scholar's infrastructure.
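The rate-limiting idea can be sketched as sequential requests with a randomized 1-2 second pause between them. This illustrates the idea under stated assumptions and is not the Actor's internal code. `crawl_politely` and the `fetch_page` callable are hypothetical. `sleep` is injectable so the delay logic can be tested without waiting.

```python
import random
import time
from typing import Callable, Iterable, List


def crawl_politely(
    urls: Iterable[str],
    fetch_page: Callable[[str], str],  # hypothetical fetcher you supply
    min_delay: float = 1.0,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch URLs one at a time, pausing a random 1-2 s between requests."""
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            # Jittered delay so requests do not arrive at a fixed rhythm.
            sleep(random.uniform(min_delay, max_delay))
        pages.append(fetch_page(url))
    return pages
```

Running requests one at a time matches the "single request at a time" behavior described under Performance. The jitter avoids a perfectly regular request pattern.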

💼 Use Cases

Literature reviews — Collect and analyze academic papers for systematic reviews
Citation tracking — Monitor citation counts and track research impact
Publication monitoring — Track new publications in specific research areas
Dataset building — Create structured datasets for academic research or AI training
Competitive research — Monitor competitor publications and research trends
Academic analysis — Analyze publication patterns, author networks, and citation trends
PDF collection — Automatically collect PDF links for offline research

📊 Supported Data

Article titles — Full publication titles
Authors — Author lists (up to 10 authors per article)
Publication years — Extracted from metadata
Citation counts — Number of citations for each article
Abstracts — Article abstracts when available
PDF links — Direct links to PDF files when available
Article links — The page each Google Scholar result links to (for example, a publisher page or a Google Books entry)

⚙️ How It Works

Enter your search query (e.g., "machine learning", "quantum computing")
Optionally set year range filters and pagination limits
Configure proxy settings for reliable access
Run the Actor
Download clean, structured academic datasets

🧩 Input Configuration

Example JSON Input

{
  "query": "machine learning",
  "maxPages": 1,
  "startYear": 2020,
  "endYear": 2026,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Key Options

query — Search query string (required, e.g., "machine learning", "neural networks")
maxPages — Maximum number of result pages to scrape (default: 1, recommended: 1-5)
startYear — Filter results by minimum publication year (optional, 1900-2100)
endYear — Filter results by maximum publication year (optional, 1900-2100)
proxyConfiguration — Proxy settings for anti-bot protection (default: uses Apify Proxy)
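If you generate inputs programmatically, a pre-flight check against the documented constraints can catch mistakes before a run. This is a minimal sketch. `validate_input` is hypothetical, and the Actor's own input schema may apply these rules differently (for example, treating 5 pages as a recommendation rather than a hard limit).

```python
from typing import Any, Dict

YEAR_MIN, YEAR_MAX = 1900, 2100  # documented year range
MAX_PAGES = 5  # documented pagination limit (assumed to be enforced)


def validate_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical pre-flight check mirroring the options documented above."""
    query = raw.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required and must be a non-empty string")

    max_pages = raw.get("maxPages", 1)  # documented default: 1
    if type(max_pages) is not int or not 1 <= max_pages <= MAX_PAGES:
        raise ValueError(f"maxPages must be an integer from 1 to {MAX_PAGES}")

    for key in ("startYear", "endYear"):
        year = raw.get(key)
        if year is None:
            continue  # both year filters are optional
        if type(year) is not int or not YEAR_MIN <= year <= YEAR_MAX:
            raise ValueError(f"{key} must be an integer from {YEAR_MIN} to {YEAR_MAX}")

    start, end = raw.get("startYear"), raw.get("endYear")
    if start is not None and end is not None and start > end:
        raise ValueError("startYear must not be later than endYear")

    # Return a normalized copy with the default page count filled in.
    return {**raw, "query": query.strip(), "maxPages": max_pages}
```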

Search Query Tips

Use specific terms for better results (e.g., "deep learning neural networks" instead of "AI")
Wrap words in quotes to match an exact phrase: "transfer learning"
Use Boolean operators: machine learning AND computer vision
Filter by author: author:"John Smith" machine learning
Filter by publication: source:"Nature" quantum computing
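A small helper can assemble these operators into a single `query` value. The hypothetical `build_query` below only builds a string. How each operator is interpreted is up to Google Scholar.

```python
from typing import Optional, Sequence


def build_query(
    keywords: Sequence[str] = (),
    exact_phrase: Optional[str] = None,
    author: Optional[str] = None,
    source: Optional[str] = None,
) -> str:
    """Hypothetical helper: combine the search tips above into one query string."""
    parts = list(keywords)
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')  # exact-phrase match
    if author:
        parts.append(f'author:"{author}"')  # author filter
    if source:
        parts.append(f'source:"{source}"')  # publication filter
    return " ".join(parts)


print(build_query(["quantum computing"], source="Nature"))
```

Pass the returned string as the `query` field in the Actor input.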

📂 Output Dataset

All articles are stored in the default Apify dataset with the following structure:

Example Output Record

{
  "title": "Machine learning",
  "authors": [
    "ZH Zhou"
  ],
  "year": 2021,
  "citations": 3301,
  "abstract": "… from data is called learning or training. The … machine learning is to find or approximate ground-truth. In this book, models are sometimes called learners, which are machine learning …",
  "pdfLink": null,
  "scholarLink": "https://books.google.com/books?hl=en&lr=&id=ctM-EAAAQBAJ&oi=fnd&pg=PR6&dq=machine+learning&ots=o_OnT7Rv3p&sig=bH9TGnw_ZdZYH4lSLmKun7xX6Cs"
}

Output Fields

title (string, required) — Article title
authors (array, required) — List of author names (up to 10 authors)
year (integer|null) — Publication year
citations (integer|null) — Number of citations
abstract (string|null) — Article abstract when available
pdfLink (string|null) — Direct link to PDF file when available
scholarLink (string, required) — URL that the Google Scholar result links to (for example, a publisher or Google Books page)
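To work with these fields in Python, you might map each record to a typed object. This sketch assumes the field names and nullability listed above. `Article` and `most_cited` are hypothetical names, not part of the Actor.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Article:
    title: str
    authors: List[str]
    year: Optional[int]
    citations: Optional[int]
    abstract: Optional[str]
    pdf_link: Optional[str]
    scholar_link: str

    @classmethod
    def from_record(cls, rec: dict) -> "Article":
        # Required fields use rec[...] and raise KeyError if missing;
        # nullable fields use rec.get(...) and default to None.
        return cls(
            title=rec["title"],
            authors=list(rec["authors"]),
            year=rec.get("year"),
            citations=rec.get("citations"),
            abstract=rec.get("abstract"),
            pdf_link=rec.get("pdfLink"),
            scholar_link=rec["scholarLink"],
        )


def most_cited(records: Iterable[dict], n: int = 10) -> List[Article]:
    """Return the n most-cited articles, treating null citation counts as 0."""
    articles = [Article.from_record(r) for r in records]
    return sorted(articles, key=lambda a: a.citations or 0, reverse=True)[:n]
```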

Multiple Authors Example

{
  "title": "A guide to machine learning for biologists",
  "authors": [
    "JG Greener",
    "SM Kandathil"
  ],
  "year": 2022,
  "citations": 2020,
  "abstract": "… A machine learning task is an objective specification for what we want a machine learning model to accomplish…",
  "pdfLink": "https://discovery.ucl.ac.uk/id/eprint/10134478/1/NRMCB-review-accepted-forRPS.pdf",
  "scholarLink": "https://www.nature.com/articles/s41580-021-00407-0"
}
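For spreadsheet or database import, you can flatten the records to CSV and join the `authors` array into a single cell. This sketch assumes a list of records like the examples above (for instance, loaded from a dataset exported as JSON). `records_to_csv` is hypothetical.

```python
import csv
import io
from typing import Any, Dict, Iterable

FIELDS = ["title", "authors", "year", "citations", "abstract", "pdfLink", "scholarLink"]


def records_to_csv(records: Iterable[Dict[str, Any]]) -> str:
    """Hypothetical helper: flatten output records into CSV text."""
    buf = io.StringIO()
    # extrasaction="ignore" skips unexpected keys; missing keys become "".
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for rec in records:
        row = dict(rec)
        # Join the authors array into a single cell.
        row["authors"] = "; ".join(rec.get("authors") or [])
        writer.writerow(row)  # None values are written as empty cells
    return buf.getvalue()
```

When saving the result, open the output file with `newline=""` so the CSV line endings are preserved as written.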

Input File Example

For local runs (for example, with the Apify CLI), create storage/key_value_stores/default/INPUT.json:

{
  "query": "quantum computing",
  "maxPages": 2,
  "startYear": 2020,
  "endYear": 2024
}
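The same file can be written from a script. This is a minimal sketch using only the standard library. `write_local_input` is a hypothetical helper, and the path mirrors the one above.

```python
import json
from pathlib import Path


def write_local_input(run_input: dict, storage_dir: str = "storage") -> Path:
    """Hypothetical helper: write INPUT.json into the default local key-value store."""
    path = Path(storage_dir) / "key_value_stores" / "default" / "INPUT.json"
    path.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    path.write_text(json.dumps(run_input, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    write_local_input({
        "query": "quantum computing",
        "maxPages": 2,
        "startYear": 2020,
        "endYear": 2024,
    })
```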

📈 Performance

Processing Speed — ~3-4 seconds per page (depending on results)
Rate Limiting — Built-in 1-2 second delays between requests
Concurrency — Single request at a time for reliability
Scalability — Designed for 1-5 pages per run (up to 50 articles at 10 results per page)
Success Rate — High reliability with proper proxy configuration
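The figures above support a quick back-of-envelope estimate. This sketch assumes 10 results per page (consistent with 5 pages yielding up to 50 articles) and the ~3-4 seconds per page quoted above. Actual timings depend on proxies and results.

```python
from typing import Dict, Tuple

RESULTS_PER_PAGE = 10  # assumption: Google Scholar's default page size


def estimate_run(max_pages: int, sec_per_page: Tuple[float, float] = (3.0, 4.0)) -> Dict[str, float]:
    """Rough upper bound on articles and runtime for a given maxPages."""
    low, high = sec_per_page
    return {
        "max_articles": max_pages * RESULTS_PER_PAGE,
        "seconds_low": max_pages * low,
        "seconds_high": max_pages * high,
    }


print(estimate_run(5))
```

Under these assumptions, `estimate_run(5)` gives up to 50 articles in roughly 15-20 seconds.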

🔧 Advanced Configuration

Year Range Filtering

Filter results by publication year:

{
  "query": "artificial intelligence",
  "startYear": 2020,
  "endYear": 2026
}
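Because `year` can be `null` in the output, you may want to re-apply a year range locally, for example when merging runs that used different ranges. This is a hypothetical sketch. Records without a year are dropped unless `keep_unknown` is set.

```python
from typing import Iterable, List, Optional


def filter_by_year(
    records: Iterable[dict],
    start: Optional[int] = None,
    end: Optional[int] = None,
    keep_unknown: bool = False,
) -> List[dict]:
    """Hypothetical local filter: keep records whose year lies in [start, end]."""
    kept = []
    for rec in records:
        year = rec.get("year")
        if year is None:
            # Year could not be extracted; keep only if explicitly requested.
            if keep_unknown:
                kept.append(rec)
            continue
        if start is not None and year < start:
            continue
        if end is not None and year > end:
            continue
        kept.append(rec)
    return kept
```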

Multiple Pages

Scrape multiple pages for comprehensive results:

{
  "query": "deep learning",
  "maxPages": 5
}
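To show what multi-page, year-filtered scraping involves, the sketch below builds one Google Scholar results URL per page. It assumes Google Scholar's public URL parameters (`q`, `start` as a result offset, and `as_ylo`/`as_yhi` for the year range) and 10 results per page. This README does not document how the Actor builds its requests, so treat this as an illustration only. `page_urls` is hypothetical.

```python
from typing import List, Optional
from urllib.parse import urlencode

BASE_URL = "https://scholar.google.com/scholar"


def page_urls(
    query: str,
    max_pages: int = 1,
    start_year: Optional[int] = None,
    end_year: Optional[int] = None,
    per_page: int = 10,  # assumption: default Google Scholar page size
) -> List[str]:
    """Hypothetical: one results URL per page, offset by per_page each time."""
    urls = []
    for page in range(max_pages):
        params = {"q": query, "start": page * per_page}
        if start_year is not None:
            params["as_ylo"] = start_year  # lower bound of the year range
        if end_year is not None:
            params["as_yhi"] = end_year  # upper bound of the year range
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


for url in page_urls("deep learning", max_pages=5):
    print(url)
```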

Proxy Configuration

Use Apify Proxy for reliable access:

{
  "query": "neural networks",
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

📧 Support

Issues — Use the Issues tab on the Actor's Apify page for bug reports
Documentation — Check the Apify documentation for platform features
Community — Join the Apify community for discussions

Tags: Google Scholar, academic research, literature review, citation analysis, research data, paper scraping, academic scraping, research automation, citation tracking, publication monitoring, academic dataset, research tools, scholarly articles, PDF extraction

Built with ❤️ on Apify

Similar Actors

Google Scholar Scraper - Academic Papers & Citations

klondikeking/google-scholar-scraper-v2

Extract academic papers, citations, authors, and PDF links from Google Scholar.

Pierrick McD0nald

Google Scholar Scraper

crawlerbros/google-scholar-scraper

Scrape academic papers, articles, and citations from Google Scholar. Search by keywords with filters for year range, document type, sort order, and article type. Extract titles, authors, citations, links, and more.

Crawler Bros

5.0

Google Scholar Article Scraper

agenscrape/google-scholar-article-scraper

Extract academic articles, citations, authors, and publication data from Google Scholar. Perfect for research analysis and literature reviews with fast, reliable scraping.

Agenscrape

Free Google Scholar Scraper — Papers + Citations

s-r/free-google-scholar-scraper

Google Scholar Scraper

kawsar/google-scholar-scraper

Google Scholar scraper that collects paper titles, authors, citations, and PDF links from search results, so you get structured academic data without the manual work.

Kawsar

Google Scholar Scraper

automation-lab/google-scholar-scraper

Search Google Scholar and extract academic papers. Get titles, authors, citation counts, abstracts, PDF links, and publication details. Supports year filtering.

Stas Persiianenko

Google Scholar Scraper — Papers & Citations

muhammadafzal/google-scholar-scraper

Scrape Google Scholar results with paper titles, authors, publication details, citation counts, related links, and research metadata.

Muhammad Afzal

Google Scholar Scraper

lulzasaur/google-scholar-scraper

Scrape Google Scholar search results with titles, authors, citations, abstracts, and PDF links. Also supports author profile mode to extract h-index, i10-index, and publication lists.

lulz bot

Google Scholar Scraper

johnlenflure/google-scholar-scraper

Scrape Google Scholar search results. Extract paper titles, authors, abstracts, citation counts, years, PDF links, and related article URLs.

Sinan Donmez

Semantic Scholar Scraper

solidcode/semanticscholar-scraper

[💰 $6 / 1K] Extract academic papers, abstracts, citations, references, authors, and open-access PDF links from Semantic Scholar's 200M+ database. Search by keyword, paper ID/DOI/URL, or author. Filter by year, field, and citations. No API key.