Google Scholar Scraper
Pricing
from $1.50 / 1,000 results
Extract academic papers from Google Scholar: title, authors, year, journal, citation count, abstract snippet, PDF links. Search by keyword with year range filters. Stricter rate limiting for reliability. Perfect for literature review, research trend analysis, citation tracking.
Developer

cloud9
5 days ago
Last modified
Apify Actor to scrape Google Scholar search results with advanced filtering options.
Features
- Search by keyword: Find academic papers, articles, and books
- Author filtering: Filter results by specific authors
- Year range: Limit results to specific publication years
- Sort options: Sort by relevance or date
- Citation data: Extract citation counts and related articles
- PDF links: Automatically detect available PDF downloads
- Rate limiting: Built-in 5-10 second delays to respect Google Scholar
- Robust parsing: Handles various result formats (articles, books, citations)
Input Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| searchQuery | String | ✅ | Search query (e.g., "machine learning") |
| author | String | ❌ | Filter by author name |
| yearFrom | Number | ❌ | Publication year start (1900-2100) |
| yearTo | Number | ❌ | Publication year end (1900-2100) |
| sortBy | Select | ❌ | Sort by "relevance" or "date" (default: "relevance") |
| includePatents | Boolean | ❌ | Include patents in results (default: true) |
| includeCitations | Boolean | ❌ | Include citations in results (default: true) |
| maxResults | Number | ❌ | Maximum results to scrape (default: 100, max: 1000) |
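The defaults and ranges in the table can be checked before a run starts. The following is a minimal sketch, not the Actor's own validation code: `ScholarInput`, `NormalizedInput`, `checkYear`, and `normalizeInput` are hypothetical names, and requiring whole-number years, a minimum of 1 for `maxResults`, and `yearFrom <= yearTo` are assumptions that go beyond what the table states.

```typescript
// Shape of the Actor input, mirroring the parameter table.
interface ScholarInput {
  searchQuery: string;
  author?: string;
  yearFrom?: number;
  yearTo?: number;
  sortBy?: "relevance" | "date";
  includePatents?: boolean;
  includeCitations?: boolean;
  maxResults?: number;
}

// Same shape after the documented defaults have been filled in.
interface NormalizedInput extends ScholarInput {
  sortBy: "relevance" | "date";
  includePatents: boolean;
  includeCitations: boolean;
  maxResults: number;
}

// Years must fall in the documented 1900-2100 range (integer check is an assumption).
function checkYear(name: string, year: number | undefined): void {
  if (year === undefined) return;
  if (!Number.isInteger(year) || year < 1900 || year > 2100) {
    throw new RangeError(`${name} must be an integer between 1900 and 2100`);
  }
}

// Hypothetical helper: validates the input and applies the documented defaults.
function normalizeInput(input: ScholarInput): NormalizedInput {
  const searchQuery = input.searchQuery.trim();
  if (searchQuery.length === 0) {
    throw new Error("searchQuery is required");
  }
  checkYear("yearFrom", input.yearFrom);
  checkYear("yearTo", input.yearTo);
  if (
    input.yearFrom !== undefined &&
    input.yearTo !== undefined &&
    input.yearFrom > input.yearTo
  ) {
    throw new RangeError("yearFrom must not be later than yearTo");
  }
  const maxResults = input.maxResults ?? 100;
  if (!Number.isInteger(maxResults) || maxResults < 1 || maxResults > 1000) {
    throw new RangeError("maxResults must be an integer between 1 and 1000");
  }
  return {
    ...input,
    searchQuery,
    sortBy: input.sortBy ?? "relevance",
    includePatents: input.includePatents ?? true,
    includeCitations: input.includeCitations ?? true,
    maxResults,
  };
}
```

Calling `normalizeInput({ searchQuery: "machine learning" })` fills in `sortBy`, `includePatents`, `includeCitations`, and `maxResults` with the defaults listed in the table.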
Output Format
Each result contains:
{
  "title": "Paper title",
  "articleUrl": "https://example.com/paper.pdf",
  "pdfUrl": "https://example.com/download.pdf",
  "authors": "John Doe, Jane Smith",
  "year": 2023,
  "journal": "Journal of Machine Learning Research",
  "abstract": "This paper presents...",
  "citationCount": 42,
  "citedByUrl": "https://scholar.google.com/scholar?cites=...",
  "relatedArticlesUrl": "https://scholar.google.com/scholar?q=related:...",
  "allVersionsCount": 3,
  "isBook": false,
  "isCitation": false,
  "isPdf": true
}
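For downstream processing, the record can be described with a type. This is a sketch inferred from the single sample record: `ScholarResult` and `topCited` are hypothetical names, and whether any field can be missing or null in real output is not stated here, so every field is assumed to be present.

```typescript
// Type inferred from the sample record; all fields assumed to be always present.
interface ScholarResult {
  title: string;
  articleUrl: string;
  pdfUrl: string;
  authors: string;
  year: number;
  journal: string;
  abstract: string;
  citationCount: number;
  citedByUrl: string;
  relatedArticlesUrl: string;
  allVersionsCount: number;
  isBook: boolean;
  isCitation: boolean;
  isPdf: boolean;
}

// Hypothetical helper: the most-cited results, leaving out citation-only entries.
// filter() returns a new array, so the caller's array is not reordered by sort().
function topCited(results: ScholarResult[], limit: number): ScholarResult[] {
  return results
    .filter((r) => !r.isCitation)
    .sort((a, b) => b.citationCount - a.citationCount)
    .slice(0, limit);
}
```

A call such as `topCited(items, 10)` gives a quick shortlist for citation tracking or a literature review.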
Usage Example
Input
{
  "searchQuery": "deep learning natural language processing",
  "author": "Yoshua Bengio",
  "yearFrom": 2020,
  "yearTo": 2024,
  "sortBy": "date",
  "maxResults": 50
}
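To make the filters concrete, here is one way such an input could map onto a Google Scholar search URL. This is a sketch under assumptions, not the Actor's actual request logic: `SearchInput` and `buildScholarUrl` are hypothetical names, the query parameters (`q`, `as_ylo`, `as_yhi`, `scisbd`, `start`) are the ones visible in Google Scholar's own result URLs, and expressing the author filter with the `author:"..."` search operator is a guess at how the Actor might do it.

```typescript
// Subset of the Actor input that affects the search URL in this sketch.
interface SearchInput {
  searchQuery: string;
  author?: string;
  yearFrom?: number;
  yearTo?: number;
  sortBy?: "relevance" | "date";
}

// Hypothetical helper: builds a search URL for the result page starting at offset `start`.
function buildScholarUrl(input: SearchInput, start: number = 0): string {
  // Assumption: the author filter is expressed with the author:"..." operator.
  const query = input.author
    ? `${input.searchQuery} author:"${input.author}"`
    : input.searchQuery;
  const params = new URLSearchParams({ q: query });
  if (input.yearFrom !== undefined) params.set("as_ylo", String(input.yearFrom));
  if (input.yearTo !== undefined) params.set("as_yhi", String(input.yearTo));
  // Relevance is the default ordering, so only date sorting needs a parameter.
  if (input.sortBy === "date") params.set("scisbd", "1");
  if (start > 0) params.set("start", String(start));
  return `https://scholar.google.com/scholar?${params.toString()}`;
}
```

Passing the example input produces a URL whose `q` parameter carries both the keywords and the author filter, with `as_ylo=2020`, `as_yhi=2024`, and `scisbd=1` appended.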
Run Locally
# Install dependencies
npm install
# Build TypeScript
npm run build
# Run actor (requires input.json in root or Apify environment)
npm start
Important Notes
Rate Limiting
Google Scholar is very strict about automated access:
- Actor uses 5-10 second delays between requests
- Realistic User-Agent rotation
- Proper HTTP headers to mimic browser behavior
- Automatic CAPTCHA detection and graceful shutdown
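The delay and User-Agent rotation described in this list could look roughly like the sketch below. It is an illustration, not the Actor's source: `randomDelayMs`, `sleep`, `nextUserAgent`, and `politeRequest` are hypothetical names, the User-Agent strings are placeholders, and round-robin rotation is an assumption (the Actor may pick agents differently).

```typescript
const MIN_DELAY_MS = 5000;
const MAX_DELAY_MS = 10000;

// Picks a delay in [5000, 10000) ms. The random source is injectable for testing.
function randomDelayMs(random: () => number = Math.random): number {
  return Math.floor(MIN_DELAY_MS + random() * (MAX_DELAY_MS - MIN_DELAY_MS));
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Placeholder values; a real pool would hold complete, current browser User-Agent strings.
const USER_AGENTS: string[] = ["example-agent-a", "example-agent-b", "example-agent-c"];

// Round-robin rotation over the pool (an assumption about the rotation strategy).
function nextUserAgent(requestIndex: number): string {
  return USER_AGENTS[requestIndex % USER_AGENTS.length];
}

// Waits before every request except the first, then delegates to the caller's request function.
async function politeRequest(
  requestIndex: number,
  doRequest: (userAgent: string) => Promise<string>,
): Promise<string> {
  if (requestIndex > 0) {
    await sleep(randomDelayMs());
  }
  return doRequest(nextUserAgent(requestIndex));
}
```

Randomizing the pause, rather than sleeping a fixed interval, avoids a perfectly regular request rhythm.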
Recommendation:
- Keep maxResults under 100 for reliability
- Use longer delays for larger scrapes
- Consider using Google Scholar API alternatives for production use
CAPTCHA/Blocking
If Google Scholar detects automation:
- Actor logs a warning and stops gracefully
- No partial results are lost (already scraped data is saved)
- You can retry with longer delays or from a different IP
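The stop-and-keep-results behavior can be sketched as follows. This is a simplified illustration rather than the Actor's implementation: `looksBlocked`, `collectUntilBlocked`, and `ScrapeOutcome` are hypothetical names, the marker strings are heuristic assumptions about what a block page contains, and pages are passed in as already-fetched HTML strings to keep the example synchronous.

```typescript
// Heuristic markers (assumptions): text a block or CAPTCHA page is likely to contain.
const BLOCK_MARKERS: string[] = ["unusual traffic", "captcha"];

function looksBlocked(html: string): boolean {
  const lower = html.toLowerCase();
  return BLOCK_MARKERS.some((marker) => lower.includes(marker));
}

interface ScrapeOutcome<T> {
  results: T[];      // everything parsed before the block, if any
  blocked: boolean;  // true when the run stopped early
}

// Processes pages in order and stops at the first blocked page,
// returning the results gathered so far instead of discarding them.
function collectUntilBlocked<T>(
  pages: string[],
  parsePage: (html: string) => T[],
): ScrapeOutcome<T> {
  const results: T[] = [];
  for (const html of pages) {
    if (looksBlocked(html)) {
      console.warn(`Blocked after ${results.length} results; stopping early.`);
      return { results, blocked: true };
    }
    results.push(...parsePage(html));
  }
  return { results, blocked: false };
}
```

A caller can check `blocked` to decide whether to retry later with longer delays or from a different IP.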
Legal Considerations
- Respect Google Scholar's Terms of Service
- Use for research/academic purposes
- Do not overload their servers
- Consider API alternatives for commercial use
Development
Build
$ npm run build
Local Testing
$ npm run dev
Docker Build
docker build -t google-scholar-scraper .
docker run -e APIFY_INPUT='{"searchQuery":"machine learning"}' google-scholar-scraper
Troubleshooting
No Results Found
- Check if query has typos
- Try broader search terms
- Verify year range is valid
CAPTCHA Detected
- Reduce maxResults
- Run actor less frequently
- Use different IP address
- Consider Google Scholar API alternatives
Parser Errors
- Google Scholar HTML structure may change
- Open an issue with example query
- Actor will skip unparseable results
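Skipping unparseable results, rather than failing the whole run, might look like the sketch below. It is illustrative only: `Byline`, `parseByline`, and `parseAllBylines` are hypothetical names, and the assumed input format ("authors - journal, year - source") is a simplification that the Actor's real parser may not share.

```typescript
interface Byline {
  authors: string;
  journal: string;
  year: number;
}

// Returns null when the line does not match the assumed "authors - journal, year - source" shape.
function parseByline(line: string): Byline | null {
  const parts = line.split(" - ");
  if (parts.length < 2) return null;
  // The second segment is expected to end with ", YYYY".
  const match = /^(.*),\s*(\d{4})$/.exec(parts[1].trim());
  if (match === null) return null;
  return {
    authors: parts[0].trim(),
    journal: match[1].trim(),
    year: Number(match[2]),
  };
}

// Parses every line, counting the ones that had to be skipped.
function parseAllBylines(lines: string[]): { parsed: Byline[]; skipped: number } {
  const parsed: Byline[] = [];
  let skipped = 0;
  for (const line of lines) {
    const byline = parseByline(line);
    if (byline === null) {
      skipped += 1;
      continue;
    }
    parsed.push(byline);
  }
  return { parsed, skipped };
}
```

Reporting the `skipped` count alongside the parsed records makes it easier to notice when a change in Google Scholar's HTML structure starts dropping results.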
License
Apache-2.0
Support
For issues or questions, please open a GitHub issue or contact the Apify support team.