Google Scholar Scraper

Pricing

from $1.50 / 1,000 results

Extract academic papers from Google Scholar: title, authors, year, journal, citation count, abstract snippet, and PDF links. Search by keyword with year-range filters. Built-in rate limiting (5-10 second delays) improves reliability. Suited to literature reviews, research trend analysis, and citation tracking.

Developer

cloud9

Maintained by Community

Apify Actor to scrape Google Scholar search results with advanced filtering options.

Features

  • Search by keyword: Find academic papers, articles, and books
  • Author filtering: Filter results by specific authors
  • Year range: Limit results to specific publication years
  • Sort options: Sort by relevance or date
  • Citation data: Extract citation counts and related articles
  • PDF links: Automatically detect available PDF downloads
  • Rate limiting: Built-in 5-10 second delays between requests to avoid overloading Google Scholar
  • Robust parsing: Handles various result formats (articles, books, citations)

Input Parameters

| Field | Type | Required | Description |
|---|---|---|---|
| searchQuery | String | | Search query (e.g., "machine learning") |
| author | String | | Filter by author name |
| yearFrom | Number | | Publication year start (1900-2100) |
| yearTo | Number | | Publication year end (1900-2100) |
| sortBy | Select | | Sort by "relevance" or "date" (default: "relevance") |
| includePatents | Boolean | | Include patents in results (default: true) |
| includeCitations | Boolean | | Include citations in results (default: true) |
| maxResults | Number | | Maximum results to scrape (default: 100, max: 1000) |
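
To make the defaults and ranges above concrete, here is a minimal sketch of how a caller might validate an input object before starting a run. The `ScholarInput` and `NormalizedInput` types and the `normalizeInput` helper are hypothetical names introduced for illustration, not part of the Actor; treating `searchQuery` as mandatory is also an assumption.

```typescript
// Hypothetical input shape mirroring the parameter list; not taken from the Actor's source.
interface ScholarInput {
  searchQuery: string; // assumed to be mandatory
  author?: string;
  yearFrom?: number;
  yearTo?: number;
  sortBy?: "relevance" | "date";
  includePatents?: boolean;
  includeCitations?: boolean;
  maxResults?: number;
}

// Same fields, with every documented default filled in.
interface NormalizedInput extends ScholarInput {
  sortBy: "relevance" | "date";
  includePatents: boolean;
  includeCitations: boolean;
  maxResults: number;
}

// Rejects years outside the documented 1900-2100 range.
function checkYear(name: string, year: number | undefined): void {
  if (year === undefined) return;
  if (!Number.isInteger(year) || year < 1900 || year > 2100) {
    throw new RangeError(`${name} must be an integer between 1900 and 2100`);
  }
}

// Applies the documented defaults and range checks; returns a new object.
function normalizeInput(input: ScholarInput): NormalizedInput {
  const searchQuery = input.searchQuery.trim();
  if (searchQuery === "") throw new Error("searchQuery must not be empty");

  checkYear("yearFrom", input.yearFrom);
  checkYear("yearTo", input.yearTo);
  if (input.yearFrom !== undefined && input.yearTo !== undefined && input.yearFrom > input.yearTo) {
    throw new RangeError("yearFrom must not be later than yearTo");
  }

  const maxResults = input.maxResults ?? 100;
  if (!Number.isInteger(maxResults) || maxResults < 1 || maxResults > 1000) {
    throw new RangeError("maxResults must be an integer between 1 and 1000");
  }

  return {
    ...input,
    searchQuery,
    sortBy: input.sortBy ?? "relevance",
    includePatents: input.includePatents ?? true,
    includeCitations: input.includeCitations ?? true,
    maxResults,
  };
}
```

Validating on the client side like this fails fast, before any request is sent to Google Scholar.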

Output Format

Each result contains:

{
  "title": "Paper title",
  "articleUrl": "https://example.com/paper.pdf",
  "pdfUrl": "https://example.com/download.pdf",
  "authors": "John Doe, Jane Smith",
  "year": 2023,
  "journal": "Journal of Machine Learning Research",
  "abstract": "This paper presents...",
  "citationCount": 42,
  "citedByUrl": "https://scholar.google.com/scholar?cites=...",
  "relatedArticlesUrl": "https://scholar.google.com/scholar?q=related:...",
  "allVersionsCount": 3,
  "isBook": false,
  "isCitation": false,
  "isPdf": true
}
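
As an illustration of working with records in this shape, the sketch below types a subset of the fields and picks the most-cited results that have a PDF link. The `ScholarResult` interface and `topCitedWithPdf` function are hypothetical names; which fields may be `null` or missing in real output is an assumption.

```typescript
// Subset of the output fields shown above; nullability is assumed, not documented.
interface ScholarResult {
  title: string;
  pdfUrl: string | null;
  year: number | null;
  citationCount: number;
  isBook: boolean;
  isCitation: boolean;
}

// Returns up to `limit` results that have a PDF link, most cited first.
// Citation-only entries are skipped because they have no document behind them.
function topCitedWithPdf(results: ScholarResult[], limit: number): ScholarResult[] {
  return results
    .filter((r) => r.pdfUrl !== null && !r.isCitation)
    .sort((a, b) => b.citationCount - a.citationCount)
    .slice(0, limit);
}
```

Because `filter` returns a new array, the `sort` call does not reorder the caller's original list.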

Usage Example

Input

{
  "searchQuery": "deep learning natural language processing",
  "author": "Yoshua Bengio",
  "yearFrom": 2020,
  "yearTo": 2024,
  "sortBy": "date",
  "maxResults": 50
}
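
For intuition about what such an input means on the Google Scholar side, the sketch below maps it onto a search URL. This is an assumption about how a scraper could build the request, not the Actor's confirmed implementation: `buildScholarUrl` is a hypothetical helper, and the query parameters (`as_ylo`, `as_yhi`, `scisbd`, `start`) and the `author:"..."` operator are the ones Google Scholar's own search pages are commonly observed to use. Patent and citation toggles are left out.

```typescript
// Hypothetical subset of the Actor input used for URL building.
interface SearchInput {
  searchQuery: string;
  author?: string;
  yearFrom?: number;
  yearTo?: number;
  sortBy?: "relevance" | "date";
}

// Builds a Google Scholar search URL; `start` is the zero-based result offset.
function buildScholarUrl(input: SearchInput, start = 0): string {
  const params = new URLSearchParams();
  // Assumed: the author filter is expressed with Scholar's author:"..." operator.
  const query = input.author
    ? `${input.searchQuery} author:"${input.author}"`
    : input.searchQuery;
  params.set("q", query);
  if (input.yearFrom !== undefined) params.set("as_ylo", String(input.yearFrom));
  if (input.yearTo !== undefined) params.set("as_yhi", String(input.yearTo));
  if (input.sortBy === "date") params.set("scisbd", "1"); // assumed sort-by-date flag
  if (start > 0) params.set("start", String(start));
  return `https://scholar.google.com/scholar?${params.toString()}`;
}
```

`URLSearchParams` handles the percent-encoding, so quotes and spaces in the query need no manual escaping.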

Run Locally

# Install dependencies
npm install
# Build TypeScript
npm run build
# Run actor (requires input.json in root or Apify environment)
npm start

Important Notes

Rate Limiting

Google Scholar is very strict about automated access, so the Actor takes these precautions:

  • 5-10 second delays between requests
  • Realistic User-Agent rotation
  • Proper HTTP headers to mimic browser behavior
  • Automatic CAPTCHA detection and graceful shutdown
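
The randomized delay between requests can be sketched as follows. This is a minimal illustration under stated assumptions, not the Actor's code: `randomDelayMs`, `sleep`, and `forEachPolitely` are hypothetical names, and the injectable `rng` parameter exists only to make the helper testable.

```typescript
// Picks an integer delay in [minMs, maxMs]; defaults match the documented 5-10 seconds.
function randomDelayMs(
  minMs = 5_000,
  maxMs = 10_000,
  rng: () => number = Math.random,
): number {
  return Math.floor(minMs + rng() * (maxMs - minMs + 1));
}

// Resolves after the given number of milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `handler` on each item in order, pausing a random interval between items.
async function forEachPolitely<T>(
  items: T[],
  handler: (item: T) => Promise<void>,
  delay: () => number = randomDelayMs,
): Promise<void> {
  let first = true;
  for (const item of items) {
    if (!first) await sleep(delay()); // no pause before the very first request
    first = false;
    await handler(item);
  }
}
```

Randomizing the interval, rather than sleeping a fixed amount, avoids a perfectly regular request pattern.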

Recommendation:

  • Keep maxResults under 100 for reliability
  • Use longer delays for larger scrapes
  • Consider using Google Scholar API alternatives for production use

CAPTCHA/Blocking

If Google Scholar detects automation:

  • The Actor logs a warning and stops gracefully
  • Already scraped data is saved, so no results are lost
  • You can retry with longer delays or from a different IP

Responsible use:

  • Respect Google Scholar's Terms of Service
  • Use for research/academic purposes
  • Do not overload their servers
  • Consider API alternatives for commercial use
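
The detect-and-stop behavior described in this section could look roughly like the sketch below. The marker strings, status codes, and the `looksBlocked` and `scrapeUntilBlocked` names are assumptions made for illustration; the Actor's actual detection logic is not documented here.

```typescript
// Assumed block-page markers; none of these are confirmed by the Actor's source.
const BLOCK_MARKERS: readonly string[] = ["unusual traffic", "gs_captcha", "recaptcha"];

interface PageResponse {
  status: number;
  html: string;
}

// Heuristic: treat 403/429 responses or known marker text as a block.
function looksBlocked(page: PageResponse): boolean {
  if (page.status === 403 || page.status === 429) return true;
  const lower = page.html.toLowerCase();
  return BLOCK_MARKERS.some((marker) => lower.includes(marker));
}

// Fetches pages in order and stops at the first blocked one,
// returning everything parsed so far instead of throwing it away.
async function scrapeUntilBlocked<T>(
  urls: string[],
  fetchPage: (url: string) => Promise<PageResponse>,
  parsePage: (html: string) => T[],
): Promise<{ results: T[]; blocked: boolean }> {
  const results: T[] = [];
  for (const url of urls) {
    const page = await fetchPage(url);
    if (looksBlocked(page)) {
      console.warn(`Blocked at ${url}; stopping with ${results.length} results kept`);
      return { results, blocked: true };
    }
    results.push(...parsePage(page.html));
  }
  return { results, blocked: false };
}
```

Returning a `blocked` flag alongside the partial results lets the caller decide whether to retry later.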

Development

Build

npm run build

Local Testing

npm run dev

Docker Build

docker build -t google-scholar-scraper .
docker run -e APIFY_INPUT='{"searchQuery":"machine learning"}' google-scholar-scraper

Troubleshooting

No Results Found

  • Check the query for typos
  • Try broader search terms
  • Verify that the year range is valid (yearFrom no later than yearTo)

CAPTCHA Detected

  • Reduce maxResults
  • Run the Actor less frequently
  • Use a different IP address
  • Consider a Google Scholar API alternative

Parser Errors

  • Google Scholar's HTML structure may change without notice
  • Open an issue with an example query
  • The Actor skips results it cannot parse
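
Skipping unparseable results can be sketched as a parser that returns `null` instead of throwing. The input strings below follow the way Google Scholar typically renders a result's byline ("authors - venue, year - source") and its "Cited by N" link, but that format and the names `parseByline`, `parseCitedBy`, and `parseAll` are assumptions, not the Actor's code.

```typescript
interface Byline {
  authors: string;
  journal: string | null;
  year: number | null;
}

// Extracts the count from text such as "Cited by 42"; returns 0 when absent.
function parseCitedBy(text: string): number {
  const match = /Cited by (\d+)/i.exec(text);
  return match ? Number(match[1]) : 0;
}

// Splits "authors - venue, year - source" into fields; null means "unparseable".
function parseByline(byline: string): Byline | null {
  const [authors, venue] = byline.split(" - ").map((part) => part.trim());
  if (!authors) return null;
  if (!venue) return { authors, journal: null, year: null };

  const yearMatch = /\b(?:19|20)\d{2}\b/.exec(venue);
  const year = yearMatch ? Number(yearMatch[0]) : null;
  // Drop the year and any trailing comma or whitespace to leave the venue name.
  const journal = (yearMatch ? venue.replace(yearMatch[0], "") : venue)
    .replace(/[,\s]+$/, "")
    .trim();
  return { authors, journal: journal === "" ? null : journal, year };
}

// Parses every byline and silently drops the ones that cannot be parsed.
function parseAll(bylines: string[]): Byline[] {
  return bylines
    .map(parseByline)
    .filter((b): b is Byline => b !== null);
}
```

Returning `null` for a bad record keeps one malformed result from aborting the whole run.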

License

Apache-2.0

Support

For issues or questions, please open a GitHub issue or contact the Apify support team.