Google Scholar Scraper

marco.gullo/google-scholar-scraper

Scrape publication details from scholar.google.com. Add your query, time range, and optionally document type (PDF or HTML only). Extract information about articles such as titles, authors, links, related articles, and more.

🎓 What is Google Scholar Scraper?

Google Scholar Scraper is a web scraping tool that enables you to quickly extract publication data from scholar.google.com. Just enter your search query and scrape publication details such as authors, article titles, citations, dates, and more.

📖 What can this Google Scholar Scraper do?

Google Scholar Scraper is a data extraction tool created to fill the gap left by the lack of an official Google Scholar API. With this scraping tool, you can:

🔍 Extract publications metadata by search query

⌛️ Specify the time range for your search

📄 Filter results by type: all documents, PDFs only, HTML only, or reviews only

📒 Set up sorting by date or relevance

⬇️ Export data in formats such as Excel, CSV, JSON, HTML

🦾 Access the scraper through the API from Python or Node.js, API endpoints, webhooks, and integrations with other apps

📕 What data can this Google Scholar Scraper extract?

Google Scholar Scraper is capable of extracting publication details such as:

📚 Document type
📝 Title
🔗 Document link
📄 Additional document link
🔍 Full attribution
👥 Authors
📅 Publication
📆 Publication year
🔍 Source
🔎 Search match
📖 Citations
🔗 Link to citations
🔗 Link to related articles
🥉 Versions

💸 How much does it cost to scrape articles from Google Scholar?

When it comes to scraping, it can be challenging to estimate the resources needed to extract data, as use cases may vary significantly. That's why the best course of action is to run a test scrape with a small sample of input data and limited output. You’ll get your price per scrape, which you’ll then multiply by the number of scrapes you intend to do.

Apify provides you with $5 free usage credits to use every month on the Apify Free plan. That should be enough to give this scraper a test drive.
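
The estimate described above is a single multiplication, sketched here in Python. The $0.25 price is a hypothetical figure chosen for illustration, not a measured or published price; substitute the cost you observe in your own test scrape.

```python
def estimate_total_cost(price_per_scrape_usd: float, planned_scrapes: int) -> float:
    """Multiply the measured price of one test scrape by the number of planned scrapes."""
    if price_per_scrape_usd < 0 or planned_scrapes < 0:
        raise ValueError("price and scrape count must be non-negative")
    return price_per_scrape_usd * planned_scrapes


def scrapes_within_credit(price_per_scrape_usd: float, credit_usd: float = 5.0) -> int:
    """Count how many scrapes fit into a credit (default: the $5 monthly free credit)."""
    if price_per_scrape_usd <= 0:
        raise ValueError("price must be positive")
    return int(credit_usd // price_per_scrape_usd)


# Hypothetical figure: suppose one test scrape cost $0.25.
print(estimate_total_cost(0.25, 40))  # 10.0
print(scrapes_within_credit(0.25))    # 20
```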

Watch this video for a few helpful tips. And don't forget that choosing a higher plan will save you money in the long run.

👨🏻‍🏫 How do I use Google Scholar Scraper to extract data?

This Google Scholar Scraper was designed for an easy start even if you've never extracted article data from the web before. Here's how you can scrape data from Google Scholar search with this tool:

  1. Create a free Apify account using your email.
  2. Open Google Scholar Scraper.
  3. Enter your search queries.
  4. Customize your search parameters, such as time range or document type.
  5. Click "Start" and wait for the data to be extracted.
  6. Export your Google Scholar data in Excel, CSV, JSON, or other formats.

You can also follow this guide on scraping Google Scholar.

⬇️ Input

The input for Google Scholar Scraper is a single search query. You can also specify additional parameters such as the time range, document type (PDFs or HTML only), sort order, or whether to scrape review articles only.

Here's a simple input example of scraping research papers about COVID published after 2020 and sorted by date:

{
  "articleType": "any",
  "enableDebugDumps": false,
  "filter": "all",
  "keyword": "COVID-19",
  "maxItems": 100,
  "newerThan": 2020,
  "proxyOptions": {
    "useApifyProxy": true
  },
  "sortBy": "date"
}

Click on the input tab for a full explanation of input parameters.
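
If you generate inputs in code, a small helper can keep them consistent. The sketch below is only an illustration: `build_input` is a hypothetical helper, the keys and the `"any"` and `"all"` values are copied from the sample input, and leaving out `newerThan` when no year is given is an assumption. Check the input tab for the accepted values of each field.

```python
import json
from typing import Optional


def build_input(keyword: str, newer_than: Optional[int] = None,
                max_items: int = 100, sort_by: str = "date") -> dict:
    """Build an input object with the same keys as the sample input."""
    if not keyword.strip():
        raise ValueError("keyword must not be empty")
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    run_input = {
        "articleType": "any",       # value copied from the sample input
        "enableDebugDumps": False,
        "filter": "all",            # value copied from the sample input
        "keyword": keyword,
        "maxItems": max_items,
        "proxyOptions": {"useApifyProxy": True},
        "sortBy": sort_by,
    }
    if newer_than is not None:      # assumption: the field may be omitted
        run_input["newerThan"] = newer_than
    return run_input


print(json.dumps(build_input("COVID-19", newer_than=2020), indent=2))
```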

⬆️ Output sample

The extracted Google Scholar data will be shown as a dataset which you can find in the Output tab. Note that the output will first be organized as a table for viewing convenience.

You can preview all the fields in the Storage and Output tabs and choose the format in which to export the Google Scholar data you've extracted: JSON, CSV, Excel, or HTML table. Below is a sample dataset in JSON:

[
  {
    "cidCode": "rvTRXmkWdSIJ",
    "didCode": "rvTRXmkWdSIJ",
    "lidCode": "",
    "aidCode": "rvTRXmkWdSIJ",
    "resultIndex": 3,
    "type": "ARTICLE",
    "title": "… OF THREE-TIME POINT ESTIMATION OF INFLAMMATORY MARKERS WITH THE SEVERITY AND OUTCOME IN PATIENTS OF COVID-19 IN A TERTIARY CARE …",
    "link": "https://www.jpmi.org.pk/index.php/jpmi/article/view/3251",
    "documentLink": "N/A",
    "documentType": "N/A",
    "fullAttribution": "M Hussain, S Orakzai, MM Dawood, A Ijaz… - Journal of Postgraduate …, 2024 - jpmi.org.pk",
    "authors": "M Hussain, S Orakzai, MM Dawood, A Ijaz…",
    "publication": "Journal of Postgraduate …",
    "year": 2024,
    "source": "jpmi.org.pk",
    "searchMatch": "2 days ago - … COVID-19 Quality & Clinical Research Collaborative. C-reactive protein as a \nprognostic indicator in hospitalized patients with COVID-19. … fatalities caused by COVID-19: a …",
    "citations": 0,
    "citationsLink": "N/A",
    "relatedArticlesLink": "https://scholar.google.com/scholar?q=related:rvTRXmkWdSIJ:scholar.google.com/&scioq=COVID-19&hl=en&scisbd=1&as_sdt=0,33",
    "versions": 2,
    "versionsLink": "https://scholar.google.com/scholar?cluster=2482915411382891694&hl=en&scisbd=1&as_sdt=0,33"
  },
  {
    "cidCode": "UZ71Uw_IxggJ",
    "didCode": "UZ71Uw_IxggJ",
    "lidCode": "",
    "aidCode": "UZ71Uw_IxggJ",
    "resultIndex": 4,
    "type": "ARTICLE",
    "title": "Environmental Impact of Covid-19 Pandemic in Owerri Metropolis, Imo State of Nigeria",
    "link": "https://hspublishing.org/GRES/article/view/363",
    "documentLink": "N/A",
    "documentType": "N/A",
    "fullAttribution": "CV Amadi, RF Njoku-Tony - Global Research in Environment and …, 2024 - hspublishing.org",
    "authors": "CV Amadi, RF Njoku-Tony",
    "publication": "Global Research in Environment and …",
    "year": 2024,
    "source": "hspublishing.org",
    "searchMatch": "2 days ago - … environmental impact of COVID-19 in Owerri metropolis … environmental impacts \nof COVID-19 pandemic in Owerri … environmental impact of COVID-19 pandemic in Owerri …",
    "citations": 0,
    "citationsLink": "N/A",
    "relatedArticlesLink": "https://scholar.google.com/scholar?q=related:UZ71Uw_IxggJ:scholar.google.com/&scioq=COVID-19&hl=en&scisbd=1&as_sdt=0,33",
    "versions": 0,
    "versionsLink": "N/A"
  },
  {
    "cidCode": "M3C8n-b4NGsJ",
    "didCode": "M3C8n-b4NGsJ",
    "lidCode": "",
    "aidCode": "M3C8n-b4NGsJ",
    "resultIndex": 5,
    "type": "HTML",
    "title": "Identification of factors affecting student academic burnout in online education during the COVID-19 pandemic using grey Delphi and grey-DEMATEL …",
    "link": "https://www.nature.com/articles/s41598-024-53233-7",
    "documentLink": "https://www.nature.com/articles/s41598-024-53233-7",
    "documentType": "HTML",
    "fullAttribution": "A Aria, P Jafari, M Behifar - Scientific Reports, 2024 - nature.com",
    "authors": "A Aria, P Jafari, M Behifar",
    "publication": "Scientific Reports",
    "year": 2024,
    "source": "nature.com",
    "searchMatch": "2 days ago - … Although after the end of Covid-19, most educational institutions have returned \nto the … online education in the post-Covid-19 era by gaining valuable experience during the …",
    "citations": 0,
    "citationsLink": "N/A",
    "relatedArticlesLink": "https://scholar.google.com/scholar?q=related:M3C8n-b4NGsJ:scholar.google.com/&scioq=COVID-19&hl=en&scisbd=1&as_sdt=0,33",
    "versions": 0,
    "versionsLink": "N/A"
  },
  {
    "cidCode": "X68f7LOXWUoJ",
    "didCode": "X68f7LOXWUoJ",
    "lidCode": "",
    "aidCode": "X68f7LOXWUoJ",
    "resultIndex": 6,
    "type": "ARTICLE",
    "title": "Reframing the Service Environment in Collegiate Sport: A Transformative Sport Service Research Approach",
    "link": "https://journals.ku.edu/jis/article/view/19739",
    "documentLink": "N/A",
    "documentType": "N/A",
    "fullAttribution": "Y Yang, E Gray, K Kinoshita… - Journal of Intercollegiate …, 2024 - journals.ku.edu",
    "authors": "Y Yang, E Gray, K Kinoshita…",
    "publication": "Journal of Intercollegiate …",
    "year": 2024,
    "source": "journals.ku.edu",
    "searchMatch": "2 days ago - This study applies a transformative sport service research approach to \nexamine student-athletes’ wellness within a collegiate sport setting. Sixteen semi-structured …",
    "citations": 0,
    "citationsLink": "N/A",
    "relatedArticlesLink": "https://scholar.google.com/scholar?q=related:X68f7LOXWUoJ:scholar.google.com/&scioq=COVID-19&hl=en&scisbd=1&as_sdt=0,33",
    "versions": 0,
    "versionsLink": "N/A"
  },
  {
    "cidCode": "1fxjSu8kPT4J",
    "didCode": "1fxjSu8kPT4J",
    "lidCode": "",
    "aidCode": "1fxjSu8kPT4J",
    "resultIndex": 7,
    "type": "ARTICLE",
    "title": "THE LEADERSHIP OF THE MADRASA PRINCIPAL IN ENHANCING LEARNING QUALITY AMIDST COVID-19 PANDEMIC IN CENTRAL ACEH REGENCY",
    "link": "https://jurnal-assalam.org/index.php/JAS/article/view/703",
    "documentLink": "N/A",
    "documentType": "N/A",
    "fullAttribution": "B Mizal, T Tathahira, RI Basith - Jurnal As-Salam, 2024 - jurnal-assalam.org",
    "authors": "B Mizal, T Tathahira, RI Basith",
    "publication": "Jurnal As-Salam",
    "year": 2024,
    "source": "jurnal-assalam.org",
    "searchMatch": "2 days ago - … The COVID-19 pandemic is a scourge for education actors, especially school \nand … the quality of learning during the COVID-19 pandemic. This research is classified as …",
    "citations": 0,
    "citationsLink": "N/A",
    "relatedArticlesLink": "https://scholar.google.com/scholar?q=related:1fxjSu8kPT4J:scholar.google.com/&scioq=COVID-19&hl=en&scisbd=1&as_sdt=0,33",
    "versions": 2,
    "versionsLink": "https://scholar.google.com/scholar?cluster=4484781414094732501&hl=en&scisbd=1&as_sdt=0,33"
  },
  ...
]
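
The sample uses the string "N/A" (and an empty string for `lidCode`) where a value is missing, which is awkward to filter on. As a minimal post-processing sketch, the Python below turns those placeholders into `None` and writes selected fields to CSV. The helper names are hypothetical, and the two records are trimmed copies of items from the sample.

```python
import csv
import io

MISSING = "N/A"  # placeholder the sample output uses for absent values


def clean_item(item: dict) -> dict:
    """Return a copy of one result with "N/A" and empty strings replaced by None."""
    return {key: (None if value in (MISSING, "") else value)
            for key, value in item.items()}


def to_csv(items: list, fields: list) -> str:
    """Serialize the chosen fields of each cleaned item as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        writer.writerow(clean_item(item))  # None is written as an empty cell
    return buffer.getvalue()


# Two records trimmed from the sample output.
sample = [
    {"title": "Environmental Impact of Covid-19 Pandemic in Owerri Metropolis, Imo State of Nigeria",
     "year": 2024, "documentLink": "N/A", "citations": 0},
    {"title": "Identification of factors affecting student academic burnout in online education during the COVID-19 pandemic using grey Delphi and grey-DEMATEL …",
     "year": 2024, "documentLink": "https://www.nature.com/articles/s41598-024-53233-7", "citations": 0},
]

# Keep only results that come with a direct document link.
with_document = [item for item in map(clean_item, sample) if item["documentLink"]]
print(len(with_document))
print(to_csv(sample, ["title", "year", "documentLink"]))
```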

📚 What are other tools for scraping Google?

If you need to scrape other kinds of data from Google, you can try these tools:

📍 Google Maps Extractor
🔍 Google Search Scraper
📉 Google Trending Searches
📈 Google Trends Scraper
👁 Google Lens Actor
🎑 Google Image Scraper
📩 Google Maps Email Extractor
🤟 Google Datasets Translator

❓FAQ

Is there an official Google Scholar API?

No. Google does not provide an official API for Google Scholar, so researchers cannot access its data through Google's APIs. Instead, people rely on alternatives such as web scraping or open-source APIs. Web scraping tools like Google Scholar Scraper do what an API would: they visit the Google Scholar website, run a search, and extract article and author information from the result pages.

Can I integrate Google Scholar Scraper with other apps?

Yes. This Google Scholar Scraper can be connected with almost any cloud service or web app thanks to integrations on the Apify platform. You can integrate with Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, LangChain and more.

Or you can use webhooks to carry out an action whenever an event occurs, e.g. get a notification whenever Google Scholar Scraper successfully finishes a run.
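
As a rough sketch of the receiving side, the Python below pulls the dataset ID out of a webhook request body. It assumes the default Apify payload shape, with an `eventType` field and a `resource` object carrying `defaultDatasetId`; confirm this against the webhook documentation, since payload templates can be customized. The function name and the IDs in the example are hypothetical.

```python
import json
from typing import Optional


def dataset_id_from_webhook(body: str) -> Optional[str]:
    """Return the dataset ID of a successfully finished run, or None otherwise."""
    payload = json.loads(body)
    # Assumption: successful runs are reported as "ACTOR.RUN.SUCCEEDED".
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    # Assumption: "resource" holds the run object, including its default dataset.
    return (payload.get("resource") or {}).get("defaultDatasetId")


# Hypothetical request body with made-up IDs.
example_body = json.dumps({
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "eventData": {"actorId": "exampleActorId", "actorRunId": "exampleRunId"},
    "resource": {"id": "exampleRunId", "defaultDatasetId": "exampleDatasetId"},
})
print(dataset_id_from_webhook(example_body))  # exampleDatasetId
```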

Can I use Google Scholar Scraper as its own API?

Yes, you can use the Apify API to access Google Scholar Scraper programmatically. The API allows you to manage, schedule, and run Apify Actors, access datasets, monitor performance, get results, create and update Actor versions, and more.

To access the API using Node.js, you can use the apify-client NPM package. To access the API using Python, you can use the apify-client PyPI package.

For detailed information and code examples, refer to the Apify API documentation.

Can I use this Google Scholar API in Python?

Yes, you can use the Apify API with Python. To access the Google Scholar API with Python, use the apify-client PyPI package. You can find more details about the client in our Python Client documentation.
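
A minimal sketch of such a call is shown below. It assumes `apify-client` is installed (`pip install apify-client`) and that your API token is stored in an environment variable named `APIFY_TOKEN`, a name chosen here for illustration. `scrape_scholar` and `main` are hypothetical helpers, and the input is the sample from the Input section. The client is passed in as an argument so the function can be exercised with a stub, without network access.

```python
import os

ACTOR_ID = "marco.gullo/google-scholar-scraper"


def scrape_scholar(client, run_input: dict, actor_id: str = ACTOR_ID) -> list:
    """Run the Actor, wait for it to finish, and return its dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return a run object")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main() -> None:
    # Imported here so the helper above stays usable without the package.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed variable name
    run_input = {
        "articleType": "any",
        "enableDebugDumps": False,
        "filter": "all",
        "keyword": "COVID-19",
        "maxItems": 100,
        "newerThan": 2020,
        "proxyOptions": {"useApifyProxy": True},
        "sortBy": "date",
    }
    for item in scrape_scholar(client, run_input):
        print(item["year"], item["title"])
```

Call `main()` to start a real run. Note that this consumes platform credits.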

Not your cup of tea? Build your own Google Scholar scraper.

Google Scholar Scraper doesn’t do exactly what you need? You can always build one of your own! We have various web scraping templates in Python, JavaScript, and TypeScript to get you started. Alternatively, you can write it from scratch using our open-source library Crawlee. You can keep the scraper to yourself or make it public by adding it to Apify Store (and find users for it).

Your feedback

We’re always working on improving the performance of our Actors. So if you’ve got any technical feedback for Google Scholar Scraper or simply found a bug, please create an issue on the Actor’s Issues tab in Apify Console.

Developer
Maintained by Community

Actor Metrics

  • 117 monthly users

  • 14 stars

  • >99% runs succeeded

  • 2.1 days response time

  • Created in Feb 2024

  • Modified 2 months ago
