Google Scholar Scraper

Pricing

from $1.00 / 1,000 results

Scrape academic papers, articles, and citations from Google Scholar. Search by keywords with filters for year range, document type, sort order, and article type. Extract titles, authors, citations, links, and more.

Rating: 5.0 (6)

Developer: Crawler Bros

Maintained by Community

Actor stats

  • Bookmarked: 7
  • Total users: 3
  • Monthly active users: 1
  • Last modified: 3 days ago

Extract academic papers, research articles, and citation data from Google Scholar. Search by keywords with filters for year range, document type, sort order, and article type. No login or API key is required. The scraper can serve as a Google Scholar API alternative for researchers, academics, and data analysts.

What can this scraper do?

  • Search by keywords — Enter any research topic and get structured data for each result
  • Filter by year range — Limit results to specific publication years
  • Sort by relevance or date — Choose how results are ordered
  • Filter by document type — Get only PDF or HTML documents
  • Filter by article type — Search for review articles specifically
  • Extract citation data — Get citation counts with links to citing articles
  • Pagination support — Automatically fetch multiple pages of results

Input

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| Search Queries | string[] | Yes | | Keywords to search on Google Scholar |
| Max Results | integer | No | 100 | Maximum articles per query (1–1,000) |
| Sort By | enum | No | Relevance | Sort by relevance or publication date |
| Document Format | enum | No | All | Filter: all formats, PDF only, or HTML only |
| Article Type | enum | No | Any | Filter: all types or review articles only |
| Published After | integer | No | | Only articles from this year onward |
| Published Before | integer | No | | Only articles up to this year |
| Proxy Configuration | object | No | | Proxy settings (often not needed) |

Example input

{
  "queries": ["machine learning", "deep learning"],
  "maxItems": 50,
  "sortBy": "relevance",
  "newerThan": 2020
}

{
  "queries": ["cancer treatment review"],
  "maxItems": 30,
  "articleType": "review",
  "filter": "pdfOnly"
}
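
As a minimal sketch, the input objects above could be assembled and sanity-checked in Python before a run. The helper name `build_run_input` is hypothetical and not part of the scraper. The keys (`queries`, `maxItems`, `newerThan`, `sortBy`) come from the examples above, and the 1–1,000 range comes from the input table. Only values that appear on this page are used; other enum values are not documented here and are not guessed.

```python
import json


def build_run_input(queries, max_items=100, newer_than=None, **extra):
    """Assemble an input object using the keys shown in the examples above.

    Hypothetical helper: it only checks what this page documents
    (a non-empty query list and the 1-1,000 range for maxItems).
    """
    cleaned = [q.strip() for q in queries if q and q.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty search query is required")
    if not 1 <= max_items <= 1000:
        raise ValueError("maxItems must be between 1 and 1,000")

    run_input = {"queries": cleaned, "maxItems": max_items}
    if newer_than is not None:
        run_input["newerThan"] = int(newer_than)
    # Pass other documented keys through unchanged, e.g. sortBy or filter.
    run_input.update(extra)
    return run_input


if __name__ == "__main__":
    example = build_run_input(
        ["machine learning", "deep learning"],
        max_items=50,
        newer_than=2020,
        sortBy="relevance",
    )
    print(json.dumps(example, indent=2))
```

The resulting dictionary has the same shape as the first example input and can be serialized with `json.dumps` wherever the scraper's input is supplied.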

Output

Each row in the dataset represents one academic article or paper found in search results.

Output fields

| Field | Type | Example |
|---|---|---|
| title | string | "Deep Learning" |
| link | string | "https://link.springer.com/article/..." |
| documentLink | string | "https://example.com/paper.pdf" |
| documentType | string | "PDF", "HTML", or empty |
| authors | string | "Y LeCun, Y Bengio, G Hinton" |
| publication | string | "Nature" |
| year | integer | 2015 |
| source | string | "springer.com" |
| fullAttribution | string | "Y LeCun, Y Bengio, G Hinton - Nature, 2015 - springer.com" |
| searchMatch | string | Snippet or excerpt from the article |
| citations | integer | 65432 |
| citationsLink | string | Link to view all citing articles |
| relatedArticlesLink | string | Link to related articles on Scholar |
| versions | integer | 12 |
| versionsLink | string | Link to all versions of this article |
| type | string | "ARTICLE" or "CITATION" |
| resultIndex | integer | 0 (position in results) |
| searchQuery | string | "deep learning" |
| scrapeTimestamp | string | "2026-03-09T12:00:00+00:00" |
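
The `fullAttribution` example appears to combine `authors`, `publication`, `year`, and `source` in one string. The sketch below shows how those parts relate, assuming the " - " separated layout of the documented example holds. Real attribution lines may deviate from it, so this is an illustration rather than the scraper's own parsing logic. The function name `split_attribution` is hypothetical.

```python
import re


def split_attribution(full_attribution):
    """Split 'authors - publication, year - source' into its parts.

    Assumes the layout of the documented example; parts that are
    missing come back as None (or an empty list for authors).
    """
    parts = [p.strip() for p in full_attribution.split(" - ")]
    authors = [a.strip() for a in parts[0].split(",") if a.strip()]

    publication, year = None, None
    if len(parts) >= 2:
        match = re.search(r"\b(\d{4})\s*$", parts[1])
        if match:
            year = int(match.group(1))
            publication = parts[1][: match.start()].rstrip(", ") or None
        else:
            publication = parts[1] or None

    source = parts[2] if len(parts) >= 3 else None
    return {
        "authors": authors,
        "publication": publication,
        "year": year,
        "source": source,
    }


if __name__ == "__main__":
    print(split_attribution(
        "Y LeCun, Y Bengio, G Hinton - Nature, 2015 - springer.com"
    ))
```

Because the dataset already provides the individual fields, a split like this is mainly useful as a consistency check or for turning the comma-separated `authors` string into a list.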

Sample output

{
  "title": "Deep Learning",
  "link": "https://www.nature.com/articles/nature14539",
  "documentLink": "https://creativecoding.soe.ucsc.edu/courses/cs523/slides/week3/DeepLearning_LeCun.pdf",
  "documentType": "PDF",
  "authors": "Y LeCun, Y Bengio, G Hinton",
  "publication": "Nature",
  "year": 2015,
  "source": "nature.com",
  "fullAttribution": "Y LeCun, Y Bengio, G Hinton - Nature, 2015 - nature.com",
  "searchMatch": "Deep learning allows computational models composed of multiple processing layers to learn representations of data...",
  "citations": 65432,
  "citationsLink": "https://scholar.google.com/scholar?cites=...",
  "relatedArticlesLink": "https://scholar.google.com/scholar?q=related:...",
  "versions": 12,
  "versionsLink": "https://scholar.google.com/scholar?cluster=...",
  "type": "ARTICLE",
  "resultIndex": 0,
  "searchQuery": "deep learning",
  "scrapeTimestamp": "2026-03-09T12:00:00+00:00"
}
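
Once records like the one above are exported as JSON, a common next step is ranking them by citation count. The following is a minimal sketch under the assumption that `citations` can be missing or null for some records. The function name `top_cited` is hypothetical, and the two placeholder records are made up purely to give the sort something to order.

```python
import json


def top_cited(records, limit=10):
    """Return up to `limit` records, most-cited first.

    Records without a citation count are treated as having zero citations.
    """
    ranked = sorted(records, key=lambda r: r.get("citations") or 0, reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    # First record: abridged from the sample output. The other two are
    # made-up placeholders, not real scraper results.
    records = json.loads("""[
      {"title": "Placeholder paper A", "citations": 12, "year": 2021},
      {"title": "Deep Learning", "citations": 65432, "year": 2015},
      {"title": "Placeholder paper B", "citations": null, "year": 2023}
    ]""")
    for record in top_cited(records, limit=2):
        print(record["citations"], record["title"])
```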

FAQs

Do I need a Google Scholar account?

No. Google Scholar is publicly accessible and the scraper works without any authentication.

Do I need a proxy?

Often not. Google Scholar is more accessible than regular Google Search from datacenter IPs. Try running without a proxy first. If you are blocked by a CAPTCHA, enable Apify Proxy.

How many results can I get?

Up to 1,000 results per search query. Google Scholar shows 10 results per page, and the scraper automatically paginates through multiple pages.
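
The arithmetic behind this is straightforward. As a rough sketch, using the 10-results-per-page figure and the 1,000-result cap stated above, the number of pages fetched for one query can be estimated as follows. The function name `pages_needed` is hypothetical, and how the scraper actually schedules its requests is not described here.

```python
import math

RESULTS_PER_PAGE = 10   # as stated above
MAX_RESULTS = 1000      # documented per-query cap


def pages_needed(max_items):
    """Estimate how many result pages cover `max_items` results."""
    if not 1 <= max_items <= MAX_RESULTS:
        raise ValueError("max_items must be between 1 and 1,000")
    return math.ceil(max_items / RESULTS_PER_PAGE)


if __name__ == "__main__":
    for requested in (25, 50, 1000):
        print(requested, "results ->", pages_needed(requested), "pages")
```

A request for the full 1,000 results therefore implies about 100 page loads per query, which is the kind of volume where a CAPTCHA becomes more likely.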

Can I filter by publication year?

Yes. Use the Published After and Published Before fields to limit results to a specific year range. For example, set "Published After" to 2020 to get only recent articles.

What is the difference between "link" and "documentLink"?

  • link is the main article URL (journal page, abstract, etc.)
  • documentLink is a direct link to the document file (PDF or HTML) when available
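
Since `documentLink` is only present when a file is available, downstream code usually needs a fallback. This is a small sketch of one way to pick a URL per record, assuming a missing `documentLink` shows up as an absent key, null, or an empty string. The function name `preferred_url` is hypothetical.

```python
def preferred_url(record):
    """Prefer the direct document file, fall back to the article page."""
    return record.get("documentLink") or record.get("link") or None


if __name__ == "__main__":
    with_file = {
        "link": "https://www.nature.com/articles/nature14539",
        "documentLink": "https://example.com/paper.pdf",
    }
    page_only = {"link": "https://www.nature.com/articles/nature14539", "documentLink": ""}
    print(preferred_url(with_file))
    print(preferred_url(page_only))
```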

What does the "citations" field contain?

The number of times this article has been cited by other papers, as reported by Google Scholar. The citationsLink field provides a direct link to see all citing articles.

Can I search for review articles only?

Yes. Set the Article Type to "Review articles only" to filter results to review papers.

What is the "type" field?

Results are either "ARTICLE" (full papers with links) or "CITATION" (references without direct links, typically older works only available as citations).
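
Because "CITATION" records lack direct links, it can help to separate the two kinds before further processing. The sketch below groups records by their `type` value. The function name `group_by_type` and the "UNKNOWN" bucket for records without a `type` are assumptions added for illustration.

```python
from collections import defaultdict


def group_by_type(records):
    """Group records by their `type` field ("ARTICLE" or "CITATION")."""
    groups = defaultdict(list)
    for record in records:
        # "UNKNOWN" is a hypothetical bucket for records lacking a type.
        groups[record.get("type") or "UNKNOWN"].append(record)
    return dict(groups)


if __name__ == "__main__":
    # Titles other than the documented sample are placeholders.
    records = [
        {"title": "Deep Learning", "type": "ARTICLE"},
        {"title": "Placeholder reference", "type": "CITATION"},
    ]
    for kind, items in group_by_type(records).items():
        print(kind, len(items))
```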

Limitations

  • Google Scholar may show a CAPTCHA for high-volume requests from datacenter IPs; enable a proxy if this happens
  • Maximum 1,000 results per query (Google Scholar pagination limit)
  • Year filters have no effect when results are sorted by date
  • Citation counts and version numbers are as reported by Google Scholar and may not be fully up to date
  • The scraper extracts publicly visible search results only
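
Given the limitation on year filters when sorting by date, one possible workaround is to filter on the `year` output field after the run. The sketch below treats both bounds as inclusive, matching the "from this year onward" and "up to this year" wording in the input table. The function name `filter_by_year` and the `keep_undated` flag are hypothetical, and it assumes `year` may be missing on some records.

```python
def filter_by_year(records, newer_than=None, older_than=None, keep_undated=False):
    """Keep records whose `year` lies within the inclusive bounds."""
    kept = []
    for record in records:
        year = record.get("year")
        if year is None:
            # Records without a year cannot be checked against the range.
            if keep_undated:
                kept.append(record)
            continue
        if newer_than is not None and year < newer_than:
            continue
        if older_than is not None and year > older_than:
            continue
        kept.append(record)
    return kept


if __name__ == "__main__":
    # Placeholder records; only the `year` field matters here.
    records = [{"year": 2015}, {"year": 2020}, {"year": 2023}, {"year": None}]
    print(filter_by_year(records, newer_than=2020))
```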