πŸ” Google Scholar Scraper avatar

πŸ” Google Scholar Scraper

Pricing

from $3.99 / 1,000 results

Developer

Scrapium

Maintained by Community


📚 Google Scholar Scraper

A blazing-fast, production-grade Apify Actor that pulls academic papers from open scholarly knowledge graphs (OpenAlex + Semantic Scholar) and delivers clean, structured JSON ready for analysis, citation review, or literature dashboards.

Bulk in. Citations out. Throw in a list of keywords or Google Scholar URLs and walk away: the Actor does the heavy lifting.


🚀 Why Choose This Actor?

  • 🧠 Multi-source intelligence: combines OpenAlex (250 M+ works) and Semantic Scholar so you never miss a paper.
  • 🌐 Smart auto-escalating proxy: starts direct, falls back to Datacenter → Residential only when needed. You don't have to think about it.
  • ⚡ Live streaming results: each paper hits the dataset the moment it's scraped. A crash mid-run still leaves you with rows.
  • 🧹 Built-in deduplication, filters, and sort: citations, recency, open-access, and article-type filters out of the box.
  • 🪶 Light & fast: no headless browser, no Playwright overhead, just well-engineered HTTP calls.
  • 💸 Pay only for what you use: no compute time wasted on your bill.
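The auto-escalating proxy behaviour described above can be pictured with a small sketch. This is not the Actor's actual code: the tier names, the `fetch` callback, and the set of status codes treated as rate-limiting are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence

# Hypothetical tier order mirroring the description: direct, Datacenter, Residential.
PROXY_TIERS: Sequence[Optional[str]] = (None, "DATACENTER", "RESIDENTIAL")

# Assumption: these HTTP statuses mean the host is pushing back.
RATE_LIMIT_STATUSES = {403, 429}


def fetch_with_escalation(
    url: str,
    fetch: Callable[[str, Optional[str]], tuple[int, str]],
    tiers: Sequence[Optional[str]] = PROXY_TIERS,
) -> tuple[Optional[str], str]:
    """Try each tier in order and return the first response that is not rate-limited.

    `fetch(url, tier)` is a caller-supplied function returning (status, body);
    `tier` is None for a direct connection. Returns (tier_used, body).
    """
    last_status = None
    for tier in tiers:
        status, body = fetch(url, tier)
        if status not in RATE_LIMIT_STATUSES:
            return tier, body
        last_status = status  # rate-limited: move on to the next tier
    raise RuntimeError(f"all proxy tiers rate-limited (last status {last_status})")
```

Passing the transport in as `fetch` keeps the escalation logic separate from any particular HTTP client, so the order of tiers can be exercised without network access.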

✨ Key Features

  • 🔎 Bulk search: submit dozens of queries / Scholar URLs at once.
  • 📥 Up to 5 000 papers per query with cursor-based pagination.
  • 🏷️ Rich metadata: title, authors, year, citations, source, PDF link, abstract snippet, etc.
  • 🛡️ Auto-rotating proxies with sticky residential mode after escalation.
  • 📊 Two pre-configured dataset views: Overview (essentials) + Full Details (everything).
  • πŸ“ Per-query sectioning β€” every record carries a query field so you can split results by topic in seconds.

βš™οΈ Input

| Field | Type | Description |
| --- | --- | --- |
| searchQueries ✱ | array of strings | Search keywords or Scholar URLs (e.g. https://scholar.google.com/scholar?q=...). Required. |
| maxItems | integer (1–5000) | Max papers per query. Default 100. |
| sortBy | enum | relevance (default), cited_by_count |
| filter | enum | all (default), has_pdf, open_access, recent_5_years |
| articleType | enum | any (default), journal, conference, book, preprint |
| proxyConfiguration | object | Optional. Defaults to no proxy; the Actor auto-escalates to Datacenter/Residential on rate limits. |

Example input

{
  "searchQueries": [
    "Tomato Shelf Life Prediction using IoT and Machine Learning",
    "Federated learning healthcare"
  ],
  "maxItems": 100,
  "sortBy": "cited_by_count",
  "filter": "open_access",
  "articleType": "journal",
  "proxyConfiguration": { "useApifyProxy": false }
}
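If you build the input programmatically, it can help to check it against the documented fields before starting a run. The helper below is a minimal sketch: `build_input` is a hypothetical name, and the allowed values and defaults are copied from the input table above rather than from the Actor's actual schema.

```python
# Allowed enum values and defaults, transcribed from the input table.
ALLOWED = {
    "sortBy": {"relevance", "cited_by_count"},
    "filter": {"all", "has_pdf", "open_access", "recent_5_years"},
    "articleType": {"any", "journal", "conference", "book", "preprint"},
}
DEFAULTS = {"maxItems": 100, "sortBy": "relevance", "filter": "all", "articleType": "any"}


def build_input(search_queries, **options):
    """Return an input dict, raising ValueError on values the table does not allow."""
    queries = [q.strip() for q in search_queries if q and q.strip()]
    if not queries:
        raise ValueError("searchQueries is required and must not be empty")

    unknown = set(options) - set(DEFAULTS) - {"proxyConfiguration"}
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")

    payload = {"searchQueries": queries, **DEFAULTS, **options}

    max_items = payload["maxItems"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or not 1 <= max_items <= 5000:
        raise ValueError("maxItems must be an integer between 1 and 5000")

    for field, allowed in ALLOWED.items():
        if payload[field] not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}")
    return payload
```

For example, `build_input(["Federated learning healthcare"], sortBy="cited_by_count")` fills in the remaining defaults, while `maxItems=6000` is rejected before any request is made.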

📦 Output

Each dataset row matches the well-known Scholar / SerpAPI-style shape:

{
  "query": "Tomato Shelf Life Prediction using IoT and Machine Learning",
  "cidCode": "W4409060190",
  "didCode": "W4409060190",
  "lidCode": "",
  "aidCode": "W4409060190",
  "resultIndex": 0,
  "type": "ARTICLE",
  "title": "Tomato Shelf Life Prediction using IoT and Machine Learning",
  "link": "https://doi.org/10.1109/iciset62123.2024.10939467",
  "documentLink": "",
  "documentType": "",
  "fullAttribution": "Nazmul Arafin Naim, Raisul Islam, Mohammed Saifuddin, ... - , 2024",
  "authors": "Nazmul Arafin Naim, Raisul Islam, Mohammed Saifuddin, ...",
  "publication": "",
  "year": 2024,
  "source": "",
  "searchMatch": "Predicting tomato shelf life is crucial for ...",
  "citations": 1,
  "citationsLink": "https://openalex.org/W4409060190",
  "relatedArticlesLink": "https://openalex.org/W4409060190",
  "versions": 1,
  "versionsLink": "https://openalex.org/W4409060190"
}

| Field | Meaning |
| --- | --- |
| query | Original query that produced this row (lets you group sections). |
| cidCode / didCode / aidCode | Stable record identifiers (OpenAlex ID or hash). |
| resultIndex | Position within that query's result set. |
| title | Paper title. |
| authors | Up to five lead authors. |
| publication / source | Journal / venue name. |
| year | Publication year. |
| citations | Total citation count. |
| documentLink / documentType | Direct PDF/OA URL and its document type, when available. |
| searchMatch | Abstract snippet (first ~300 chars). |
| citationsLink / relatedArticlesLink / versionsLink | Clickable links; in the example above all three point to the paper's OpenAlex record. |
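Since every row carries a `query` field, splitting an exported dataset into per-topic sections takes only a few lines. The sketch below assumes rows shaped like the example above; `group_by_query` is a hypothetical helper, and the extra duplicate check on `cidCode` is a defensive step on the consumer side, not a description of the Actor's own deduplication.

```python
from collections import defaultdict


def group_by_query(rows):
    """Group rows by `query`, most-cited first, skipping repeated `cidCode`s per query."""
    sections = defaultdict(list)
    seen = set()
    for row in rows:
        key = (row["query"], row["cidCode"])
        if key in seen:
            continue  # same paper already recorded for this query
        seen.add(key)
        sections[row["query"]].append(row)

    for papers in sections.values():
        # Rows without a citation count are treated as having zero citations.
        papers.sort(key=lambda r: r.get("citations", 0), reverse=True)
    return dict(sections)
```

The same paper may legitimately appear under two different queries, which is why the duplicate key includes the query and not just the identifier.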

🚀 How to Use (Apify Console)

  1. Log in at https://console.apify.com → Actors.
  2. Open Google Scholar Scraper.
  3. Paste your queries (or Scholar URLs) into Search Queries.
  4. Tune maxItems, sortBy, filter, articleType to taste.
  5. Leave Proxy on its default (no proxy); the Actor auto-escalates on rate limits.
  6. Click ▶ Start.
  7. Watch the live log: every section reports progress in real time.
  8. Open the Output tab and export to JSON / CSV / XLSX.

🤖 Use via API

curl -X POST "https://api.apify.com/v2/acts/<ACTOR_ID>/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "searchQueries": ["Federated learning healthcare"],
    "maxItems": 50,
    "sortBy": "cited_by_count"
  }'
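The same call can be made from Python using only the standard library. This is a sketch mirroring the curl command above: the endpoint path and the `token` query parameter are taken from that command, while the function names and the 300-second timeout are assumptions.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2/acts"


def build_run_request(actor_id, token, payload):
    """Build (but do not send) the POST request equivalent to the curl call."""
    query = urllib.parse.urlencode({"token": token})
    actor = urllib.parse.quote(actor_id, safe="")
    url = f"{API_BASE}/{actor}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(actor_id, token, payload, timeout=300):
    """Send the request and return the parsed JSON response (the dataset items)."""
    request = build_run_request(actor_id, token, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `run_actor(actor_id, os.environ["APIFY_TOKEN"], {"searchQueries": ["Federated learning healthcare"], "maxItems": 50})` (after `import os`) would then return the rows as a list of dicts, assuming the run finishes within the timeout.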

🎯 Best Use Cases

  • 🔬 Literature reviews: pull a full corpus on a research topic in minutes.
  • 📈 Citation tracking: monitor how a paper or author cluster grows over time.
  • 🧪 Trend detection: slice by recent_5_years to spot emerging directions.
  • 📚 Library / EdTech tools: feed clean, normalised records into your platform.
  • 🤖 AI agents: give RAG/LLM pipelines high-quality academic context.

💸 Pricing

This Actor is best deployed under the Pay-per-event (PPE) model:

  • One event = one paper pushed to the dataset (apify-default-dataset-item).
  • No surprise compute charges, no rental: you pay for results, not for waiting.
  • Free 5-second startup included by Apify on every run.

Configure the exact event prices in the Apify Console → Publication → Monetization tab.
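Because one event corresponds to one paper, a run's result cost can be estimated up front. The sketch below assumes the "from $3.99 / 1,000 results" figure shown on the store listing; the actual event price may differ, so the rate is a parameter, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(papers: int, price_per_1000: float = 3.99) -> float:
    """Estimate the result charge in USD for `papers` dataset items.

    The default rate is the listing's "from" price and is only an assumption.
    """
    if papers < 0:
        raise ValueError("papers must not be negative")
    return papers * price_per_1000 / 1000
```

As an upper bound, two queries with maxItems set to 100 yield at most 200 papers, which comes to roughly $0.80 at that rate.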


❓ Frequently Asked Questions

Q: Do I need a Google Scholar account? No. We connect to OpenAlex + Semantic Scholar; both are open scholarly knowledge graphs.

Q: How fresh is the data? OpenAlex syncs daily with Crossref, DOAJ, PubMed and others. Most papers appear within 24 to 48 hours of publication.

Q: Will I get blocked? Unlikely. The Actor uses official, rate-limit-friendly APIs and auto-escalates through Datacenter → Residential proxies if a host ever pushes back.

Q: Can I pass full Scholar URLs instead of keywords? Yes. URLs like https://scholar.google.com/scholar?q=... are auto-parsed for the q= term.

Q: Why two views in the output? The Overview view is great for quick scanning. The Full Details view is the complete record: same data, more columns.
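The URL handling mentioned in the FAQ can be approximated as follows. This is a sketch of one way to extract the q= term, not the Actor's own parser; `query_from_input` is a hypothetical name, and restricting the check to the `scholar.google.com` host is an assumption.

```python
from urllib.parse import parse_qs, urlsplit


def query_from_input(value: str) -> str:
    """Return the search term: the q= parameter for Scholar URLs, else the text itself."""
    text = value.strip()
    parts = urlsplit(text)
    if parts.scheme in ("http", "https") and parts.hostname == "scholar.google.com":
        terms = parse_qs(parts.query).get("q", [])
        if terms and terms[0].strip():
            return terms[0].strip()  # parse_qs has already decoded '+' and %XX
        raise ValueError(f"no q= term found in {value!r}")
    return text  # plain keywords pass through unchanged
```

With this approach a URL and its plain-keyword equivalent resolve to the same search term, so mixed lists of keywords and Scholar URLs can be treated uniformly.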


🛟 Support & Feedback

Found a bug or have a feature request? Open an issue or message us through the Apify Store page. We respond fast.


  • Data is collected only from publicly available sources (OpenAlex, Semantic Scholar).
  • You are responsible for downstream use that complies with GDPR/CCPA, target ToS, and copyright.
  • Respect rate limits and robots.txt; being a good citizen reduces blocks too.