Google Scholar Scraper
Pricing
from $3.99 / 1,000 results
Developer
Scrapium
Maintained by Community
A blazing-fast, production-grade Apify Actor that pulls academic papers from the global Scholar knowledge graph (OpenAlex + Semantic Scholar) and delivers clean, structured JSON ready for analysis, citation review, or literature dashboards.
Bulk in. Citations out. Throw in a list of keywords or Google Scholar URLs and walk away; the Actor does the heavy lifting.
Why Choose This Actor?
- Multi-source intelligence: combines OpenAlex (250 M+ works) and Semantic Scholar for broader coverage than either source alone.
- Smart auto-escalating proxy: starts direct, then falls back to Datacenter and finally Residential only when needed. You don't have to think about it.
- Live streaming results: each paper hits the dataset the moment it's scraped. A crash mid-run still leaves you with rows.
- Built-in deduplication, filters, and sort: citations, recency, open-access, and article-type filters out of the box.
- Light & fast: no headless browser, no Playwright overhead, just well-engineered HTTP calls.
- Pay only for what you use: no hidden compute time waste.
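The escalation behaviour in the proxy bullet can be pictured roughly as follows. This is a minimal sketch, not the Actor's actual code: the tier names, the `RateLimited` exception, and the `fetch` callable are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Optional

# Hypothetical tier order taken from the description above:
# direct first, then Datacenter, then Residential.
TIERS = ["direct", "datacenter", "residential"]


class RateLimited(Exception):
    """Raised by the (hypothetical) fetch callable when a host pushes back."""


def fetch_with_escalation(url: str, fetch: Callable[[str, str], str],
                          start_tier: int = 0) -> tuple[str, int]:
    """Try each tier in order, moving up only after a rate-limit error.

    Returns the response body and the index of the tier that succeeded,
    so the caller can stay on that tier for later requests.
    """
    last_error: Optional[Exception] = None
    for tier_index in range(start_tier, len(TIERS)):
        try:
            return fetch(url, TIERS[tier_index]), tier_index
        except RateLimited as exc:
            last_error = exc  # escalate to the next tier
    raise RuntimeError("all proxy tiers were rate-limited") from last_error
```

Passing the returned tier index back in as `start_tier` on the next call is one way to model the "sticky" behaviour after escalation.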
Key Features
- Bulk search: submit dozens of queries / Scholar URLs at once.
- Up to 5,000 papers per query with cursor-based pagination.
- Rich metadata: title, authors, year, citations, source, PDF link, abstract snippet, etc.
- Auto-rotating proxies with sticky residential mode after escalation.
- Two pre-configured dataset views: Overview (essentials) and Full Details (everything).
- Per-query sectioning: every record carries a `query` field so you can split results by topic in seconds.
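To illustrate how cursor-based pagination, a per-query cap, and deduplication fit together, here is a hedged sketch. It assumes a hypothetical `fetch_page` callable that returns a page of records plus the next cursor, and a hypothetical `id` key on each record; the Actor's real internals are not documented here.

```python
from typing import Callable, Iterator, Optional

# A page fetcher takes a cursor (None for the first page) and returns
# (records, next_cursor); next_cursor is None when no pages remain.
PageFetcher = Callable[[Optional[str]], tuple[list[dict], Optional[str]]]


def iter_papers(fetch_page: PageFetcher, max_items: int = 100) -> Iterator[dict]:
    """Yield up to max_items unique records, following cursors page by page."""
    seen_ids: set[str] = set()
    cursor: Optional[str] = None
    emitted = 0
    while emitted < max_items:
        records, cursor = fetch_page(cursor)
        for record in records:
            record_id = record["id"]  # hypothetical stable identifier
            if record_id in seen_ids:
                continue  # skip duplicates that reappear on later pages
            seen_ids.add(record_id)
            yield record
            emitted += 1
            if emitted >= max_items:
                return
        if cursor is None or not records:
            return  # no more pages, or an empty page
```

Because the function is a generator, each record can be pushed to storage as soon as it is yielded, which matches the streaming behaviour described earlier.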
Input
| Field | Type | Description |
|---|---|---|
| `searchQueries` | array of strings | Search keywords or Scholar URLs (e.g. https://scholar.google.com/scholar?q=...). Required. |
| `maxItems` | integer (1 to 5000) | Max papers per query. Default 100. |
| `sortBy` | enum | `relevance` (default) or `cited_by_count`. |
| `filter` | enum | `all` (default), `has_pdf`, `open_access`, or `recent_5_years`. |
| `articleType` | enum | `any` (default), `journal`, `conference`, `book`, or `preprint`. |
| `proxyConfiguration` | object | Optional. Defaults to no proxy; the Actor will auto-escalate to Datacenter/Residential on rate-limits. |
Example input
    {
      "searchQueries": [
        "Tomato Shelf Life Prediction using IoT and Machine Learning",
        "Federated learning healthcare"
      ],
      "maxItems": 100,
      "sortBy": "cited_by_count",
      "filter": "open_access",
      "articleType": "journal",
      "proxyConfiguration": { "useApifyProxy": false }
    }
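If you build the input programmatically, a small client-side check can catch typos before a run starts. The sketch below only encodes the ranges and enum values listed in the input table; `build_input` is a hypothetical helper, not part of the Actor or the Apify SDK.

```python
# Enum values and defaults copied from the input table above.
ALLOWED = {
    "sortBy": {"relevance", "cited_by_count"},
    "filter": {"all", "has_pdf", "open_access", "recent_5_years"},
    "articleType": {"any", "journal", "conference", "book", "preprint"},
}
DEFAULTS = {"maxItems": 100, "sortBy": "relevance", "filter": "all", "articleType": "any"}


def build_input(search_queries: list[str], **options) -> dict:
    """Return an Actor input dict, raising ValueError on invalid values."""
    if not search_queries or not all(
        isinstance(q, str) and q.strip() for q in search_queries
    ):
        raise ValueError("searchQueries must be a non-empty list of non-empty strings")
    unknown = set(options) - set(DEFAULTS) - {"proxyConfiguration"}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    actor_input = {"searchQueries": list(search_queries), **DEFAULTS, **options}
    max_items = actor_input["maxItems"]
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or not 1 <= max_items <= 5000:
        raise ValueError("maxItems must be an integer between 1 and 5000")
    for field, allowed in ALLOWED.items():
        if actor_input[field] not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}")
    return actor_input
```

For example, `build_input(["Federated learning healthcare"], sortBy="cited_by_count")` fills in the remaining defaults, while `maxItems=6000` raises a `ValueError`.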
Output
Each dataset row matches the well-known Scholar / SerpAPI-style shape:
    {
      "query": "Tomato Shelf Life Prediction using IoT and Machine Learning",
      "cidCode": "W4409060190",
      "didCode": "W4409060190",
      "lidCode": "",
      "aidCode": "W4409060190",
      "resultIndex": 0,
      "type": "ARTICLE",
      "title": "Tomato Shelf Life Prediction using IoT and Machine Learning",
      "link": "https://doi.org/10.1109/iciset62123.2024.10939467",
      "documentLink": "",
      "documentType": "",
      "fullAttribution": "Nazmul Arafin Naim, Raisul Islam, Mohammed Saifuddin, ... - , 2024",
      "authors": "Nazmul Arafin Naim, Raisul Islam, Mohammed Saifuddin, ...",
      "publication": "",
      "year": 2024,
      "source": "",
      "searchMatch": "Predicting tomato shelf life is crucial for ...",
      "citations": 1,
      "citationsLink": "https://openalex.org/W4409060190",
      "relatedArticlesLink": "https://openalex.org/W4409060190",
      "versions": 1,
      "versionsLink": "https://openalex.org/W4409060190"
    }
| Field | Meaning |
|---|---|
| `query` | Original query that produced this row (lets you group sections). |
| `cidCode` / `didCode` / `aidCode` | Stable record identifiers (OpenAlex ID or hash). |
| `resultIndex` | Position within that query's result set. |
| `title` | Paper title. |
| `authors` | Up to five lead authors. |
| `publication` / `source` | Journal / venue name. |
| `year` | Publication year. |
| `citations` | Total citation count. |
| `documentLink` / `documentType` | Direct PDF/OA URL when available. |
| `searchMatch` | Abstract snippet (first ~300 chars). |
| `citationsLink` / `relatedArticlesLink` / `versionsLink` | Apify-friendly clickable links. |
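Because every row carries its originating `query`, splitting a mixed export back into per-topic sections takes only a few lines. The following sketch assumes the rows have already been loaded into a list of dicts with the fields documented above; `top_cited_by_query` is a hypothetical helper name.

```python
from collections import defaultdict


def top_cited_by_query(rows: list[dict], limit: int = 3) -> dict[str, list[dict]]:
    """Group dataset rows by their query field and keep the most-cited rows."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        grouped[row["query"]].append(row)
    return {
        # Treat a missing or null citation count as zero when ranking.
        query: sorted(items, key=lambda r: r.get("citations") or 0, reverse=True)[:limit]
        for query, items in grouped.items()
    }
```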
How to Use (Apify Console)
- Log in at https://console.apify.com and go to Actors.
- Open Google Scholar Scraper.
- Paste your queries (or Scholar URLs) into Search Queries.
- Tune `maxItems`, `sortBy`, `filter`, `articleType` to taste.
- Leave Proxy on its default (no proxy); the Actor auto-escalates on rate-limits.
- Click ▶ Start.
- Watch the live log; every section reports progress in real time.
- Open the Output tab and export to JSON / CSV / XLSX.
Use via API
    curl -X POST "https://api.apify.com/v2/acts/<ACTOR_ID>/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"searchQueries": ["Federated learning healthcare"], "maxItems": 50, "sortBy": "cited_by_count"}'
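The same call can be made from Python with only the standard library. This is a sketch under a few assumptions: the endpoint and payload mirror the curl example, `<ACTOR_ID>` remains a placeholder you must fill in, the helper names are hypothetical, and the response is assumed to be a JSON array of dataset rows.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2/acts"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/{actor_id}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor_sync(actor_id: str, token: str, actor_input: dict,
                   timeout: float = 300.0) -> list:
    """Send the request and return the parsed response body."""
    request = build_run_request(actor_id, token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect or log the URL and body before spending any credits.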
Best Use Cases
- Literature reviews: pull a full corpus on a research topic in minutes.
- Citation tracking: monitor how a paper or author cluster grows over time.
- Trend detection: slice by `recent_5_years` to spot emerging directions.
- Library / EdTech tools: feed clean, normalised records into your platform.
- AI agents: give RAG/LLM pipelines high-quality academic context.
Pricing
This Actor is best deployed under the Pay-per-event (PPE) model:
- One event = one paper pushed to the dataset (`apify-default-dataset-item`).
- No surprise compute charges, no rental; you pay for results, not waiting.
- Free 5-second startup included by Apify on every run.
Configure the exact event prices in the Apify Console under Publication > Monetization.
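For budgeting, a rough upper bound follows from one event per row. The sketch below assumes the listed "from $3.99 / 1,000 results" rate applies uniformly to every row, which may not hold for your plan or for the event prices actually configured; `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: the listed "from $3.99 / 1,000 results" rate applies to every row.
PRICE_PER_1000_RESULTS = Decimal("3.99")


def estimate_cost(queries: int, max_items: int,
                  price_per_1000: Decimal = PRICE_PER_1000_RESULTS) -> Decimal:
    """Upper-bound cost in USD if every query returns max_items rows."""
    rows = queries * max_items
    cost = price_per_1000 * rows / Decimal(1000)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Under that assumption, two queries capped at 100 rows each cost at most $0.80, and a single query at the 5,000-row maximum costs at most $19.95. Queries that return fewer rows cost less.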
Frequently Asked Questions
Q: Do I need a Google Scholar account? No. We connect to OpenAlex + Semantic Scholar; both are open scholarly knowledge graphs.
Q: How fresh is the data? OpenAlex syncs daily with Crossref, DOAJ, PubMed and others. Most papers appear within 24 to 48 h of publication.
Q: Will I get blocked? Unlikely. The Actor uses official, rate-limit-friendly APIs and auto-escalates from Datacenter to Residential proxies if a host ever pushes back.
Q: Can I pass full Scholar URLs instead of keywords? Yes. URLs like https://scholar.google.com/scholar?q=... are auto-parsed for the q= term.
Q: Why two views in the output? The Overview view is great for quick scanning. The Full Details view is the complete record: same data, more columns.
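As an illustration of the Scholar URL answer above, extracting the `q=` term from a URL might look like the sketch below. The function name and the exact host check are assumptions; the Actor's own parser may accept more URL variants.

```python
from urllib.parse import parse_qs, urlparse


def normalise_query(entry: str) -> str:
    """Return the search term for a keyword string or a Scholar URL."""
    text = entry.strip()
    parsed = urlparse(text)
    # Assumption: only scholar.google.com URLs are treated as Scholar links.
    if parsed.scheme in ("http", "https") and parsed.hostname == "scholar.google.com":
        terms = parse_qs(parsed.query).get("q", [])
        if terms:
            return terms[0].strip()
        raise ValueError(f"Scholar URL has no q= term: {entry!r}")
    return text  # plain keywords pass through unchanged
```

`parse_qs` decodes `+` and percent-escapes, so `?q=federated+learning&hl=en` yields `federated learning`.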
Support & Feedback
Found a bug or have a feature request? Open an issue or message us through the Apify Store page. We respond fast.
Cautions / Legal
- Data is collected only from publicly available sources (OpenAlex, Semantic Scholar).
- You are responsible for ensuring that your downstream use complies with GDPR/CCPA, the target sites' ToS, and copyright.
- Respect rate limits and `robots.txt`; being a good citizen reduces blocks too.