OpenAlex Scraper

Extract scholarly data from OpenAlex—titles, authors, institutions, venues, concepts—using this fast Apify actor. Get academic research in bulk via API, and export results as CSV, Excel, or HTML datasets for research, analytics, or discovery.

Pricing: Pay per usage
Rating: 5.0 (1)
Developer: Shahid Irfan
Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 13
  • Monthly active users: 2
  • Last modified: 20 days ago

OpenAlex Scholarly Data Scraper

Extract comprehensive academic data from OpenAlex, the largest open database of scholarly works, authors, institutions, venues, and concepts. Gather research papers, profiles, and metadata at scale to power your bibliometric analysis, literature reviews, and research intelligence.


Features

  • Multi-Entity Extraction: Scrape works, authors, institutions, venues, and concepts with a single actor
  • Flexible Search Capabilities: Search by title, author name, keyword, or institution details
  • Automated Pagination: Retrieve thousands of records without paging through results manually
  • Polite Data Scraping: Paces requests to respect rate limits while keeping runs fast
  • Clean Data Formatting: Structured output that is ready for spreadsheets and analysis tools

Use Cases

Bibliometric Analysis

Track citation patterns, monitor research impact, and visualize academic networks over time. Perfect for university departments and institutional analysis.

Literature Reviews

Systematically collect research papers, abstracts, and metadata on specific subjects to accelerate research projects and reviews.

Talent Scouting

Discover top researchers, authors, and industry experts in specific scientific domains by analyzing publication counts and citation records.

Institutional Tracking

Analyze and compare the research output, collaboration networks, and geographic distribution of academic institutions.


Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| search | String | No | "machine learning" | Search query (title, author, institution name, etc.) |
| entity | String | No | "works" | The entity type to scrape (works, authors, institutions, venues, sources, concepts) |
| results_wanted | Integer | No | 20 | Maximum number of results to collect |
| max_pages | Integer | No | 10 | Maximum number of pages to fetch |
| sort | String | No | "relevance_score:desc" | Sort order (e.g., 'relevance_score:desc' or 'cited_by_count:desc') |
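
The defaults in the table can be captured in a small helper that fills in anything you leave out and catches obvious mistakes before a run is started. This is a minimal sketch, not part of the actor: `build_input` is a hypothetical helper, and the validation rules are assumptions based only on the table above (the actor's own validation is not documented here).

```python
# Defaults copied from the parameter table above.
DEFAULTS = {
    "search": "machine learning",
    "entity": "works",
    "results_wanted": 20,
    "max_pages": 10,
    "sort": "relevance_score:desc",
}

# Entity names listed in the table. Assumption: other values are not accepted.
ENTITIES = {"works", "authors", "institutions", "venues", "sources", "concepts"}


def build_input(**overrides):
    """Merge overrides into the documented defaults and run basic sanity checks."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    run_input = {**DEFAULTS, **overrides}

    if run_input["entity"] not in ENTITIES:
        raise ValueError(f"entity must be one of {sorted(ENTITIES)}")

    for key in ("results_wanted", "max_pages"):
        value = run_input[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be a positive integer")

    return run_input
```

For example, `build_input(entity="authors", results_wanted=50)` returns a complete input object with the remaining three parameters set to their documented defaults.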

Output Data

Each item in the dataset contains clean, structured metadata. Below are the key fields extracted depending on the entity type:

| Field | Type | Description |
|---|---|---|
| id | String | Unique OpenAlex identifier |
| url | String | Web URL for the entity |
| title | String | Title of the work or publication |
| authors | Array | Names of the authors involved |
| institutions | Array | Affiliated institutions and organizations |
| publication_year | Integer | Year of publication |
| doi | String | Digital Object Identifier for the work |
| abstract | String | Reconstructed abstract/summary text |
| cited_by_count | Integer | Total number of citations |
| display_name | String | Display name for authors, institutions, or venues |
| works_count | Integer | Number of publications for an author/institution |
| last_known_institution | String | Last known institution for an author |
| orcid | String | ORCID identifier for researchers |
| country_code | String | Country code for institutions |

Usage Examples

Search for scholarly articles related to machine learning:

{
  "search": "machine learning",
  "entity": "works",
  "results_wanted": 20
}

High-Impact Author Extraction

Find leading authors sorted by citation count:

{
  "search": "artificial intelligence",
  "entity": "authors",
  "sort": "cited_by_count:desc",
  "results_wanted": 50
}

Institutional Profiles

Retrieve specific institution details:

{
  "search": "Stanford University",
  "entity": "institutions",
  "results_wanted": 5
}
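
Any of the inputs above can also be submitted from code rather than from the Apify Console. The sketch below assumes the official `apify-client` Python package and wraps its `actor(...).call(...)` and `dataset(...).iterate_items()` calls. `run_scraper` is a hypothetical helper, and `<ACTOR_ID>` and `<YOUR_APIFY_TOKEN>` are placeholders, since this page does not state the actor's identifier.

```python
def run_scraper(client, actor_id, run_input):
    """Start the actor, wait for it to finish, and return the dataset items as a list.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return a result")
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())


# Usage sketch (needs `pip install apify-client` and a real token):
#
#   from apify_client import ApifyClient
#
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   items = run_scraper(
#       client,
#       "<ACTOR_ID>",  # placeholder: copy the real ID from the actor's page
#       {"search": "Stanford University", "entity": "institutions", "results_wanted": 5},
#   )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, so the input can be checked without starting a paid run.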

Sample Output

Here is an example of a research work item extracted from the dataset:

{
  "id": "https://openalex.org/W2741809807",
  "url": "https://openalex.org/W2741809807",
  "source": "openalex.org",
  "title": "Deep Learning",
  "authors": [
    "Yann LeCun",
    "Yoshua Bengio",
    "Geoffrey Hinton"
  ],
  "institutions": [
    "New York University",
    "Université de Montréal",
    "University of Toronto"
  ],
  "publication_year": 2015,
  "doi": "https://doi.org/10.1038/nature14539",
  "abstract": "Deep learning allows computational models that are composed of multiple processing layers...",
  "concepts": [
    "Deep learning",
    "Artificial intelligence",
    "Machine learning"
  ],
  "cited_by_count": 48512,
  "type": "journal-article"
}
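
Array fields such as authors and institutions do not map onto a single spreadsheet cell by themselves. If you post-process downloaded JSON locally, one option is to join each array into a delimited string before writing CSV. The sketch below uses only the Python standard library. `to_csv`, the column selection, and the "; " separator are illustrative choices, not the format the platform's own CSV export produces.

```python
import csv
import io

# Fields that hold arrays in the sample item above.
LIST_FIELDS = ("authors", "institutions", "concepts")

# Illustrative column selection; adjust to the fields you need.
COLUMNS = (
    "id", "title", "authors", "institutions",
    "publication_year", "doi", "cited_by_count", "type",
)


def to_csv(items):
    """Render work items as CSV text, joining array fields with '; '."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not listed in COLUMNS.
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = dict(item)
        for field in LIST_FIELDS:
            # A missing or null array becomes an empty cell.
            row[field] = "; ".join(row.get(field) or [])
        writer.writerow(row)
    return buffer.getvalue()
```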

Tips for Best Results

Refine Your Search Queries

  • Use specific terminology rather than general keywords to get highly relevant matches.
  • Double check names and spelling when searching for authors and institutions.

Balance Results and Pages

  • Set results_wanted to a smaller value (e.g., 20-50) for fast tests.
  • For large runs, raise max_pages to match: each page retrieves up to 100 items, so a run can collect at most max_pages × 100 results regardless of results_wanted.
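
The relationship between the two settings is simple arithmetic. The sketch below assumes a fixed page size of 100, taken from the tip above; the helper names are hypothetical, and the real page size may be smaller for some queries.

```python
import math

PAGE_SIZE = 100  # "up to 100 items" per page, per the tip above


def pages_needed(results_wanted, page_size=PAGE_SIZE):
    """Smallest max_pages value that can cover results_wanted."""
    return math.ceil(results_wanted / page_size)


def max_reachable(results_wanted, max_pages, page_size=PAGE_SIZE):
    """Upper bound on the number of items a run can return with these settings."""
    return min(results_wanted, max_pages * page_size)
```

With the default max_pages of 10, `max_reachable(2500, 10)` is 1000, while `pages_needed(2500)` is 25, so a 2,500-result run needs max_pages raised to at least 25.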

Integrations

Connect your extracted academic data with:

  • Google Sheets — Export for spreadsheets and manual analysis
  • Airtable — Build research databases
  • Slack — Get alerts when new papers match your criteria
  • Webhooks — Automate downstream data workflows

Export Formats

Download data in multiple standard formats:

  • JSON — Ready for developers and APIs
  • CSV — For spreadsheet calculations
  • Excel — For business reporting
  • XML — For custom integrations

Frequently Asked Questions

Can I scrape journals and publication venues?

Yes, set the entity parameter to venues or sources to extract journals, conferences, and publishers.

How are abstracts reconstructed?

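OpenAlex stores abstracts as an inverted index (a map from each word to the positions where it occurs) rather than as plain text. The scraper automatically converts that index back into readable text and returns it in the abstract field where an abstract is available.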
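
To illustrate the idea, here is a minimal sketch of rebuilding text from an inverted index of the kind OpenAlex exposes as `abstract_inverted_index`. It shows the general technique only; `reconstruct_abstract` is a hypothetical name, and the actor's actual implementation is not published on this page.

```python
def reconstruct_abstract(inverted_index):
    """Turn {"word": [positions, ...]} into the original word sequence."""
    if not inverted_index:
        return None  # no abstract available for this record

    # Expand the map into (position, word) pairs, one per occurrence.
    positioned = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    # Sorting by position restores the original word order.
    positioned.sort()
    return " ".join(word for _, word in positioned)
```

A word that appears several times simply lists several positions, so `{"Deep": [0], "learning": [1, 4], "allows": [2], "machine": [3]}` becomes "Deep learning allows machine learning".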

What is the limit of results I can extract?

You can retrieve thousands of records. The scraper paginates automatically and stops once it has collected results_wanted items or fetched max_pages pages, whichever comes first.
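
The stopping behaviour can be pictured as a loop like the one below. This is a sketch of the general pattern under the assumption that both limits apply as described in the Input Parameters table. `collect` and `fetch_page` are hypothetical names, and the actor's real loop may differ.

```python
def collect(fetch_page, results_wanted, max_pages):
    """Gather items page by page until a limit is hit or the data runs out.

    `fetch_page(page_number)` is a caller-supplied function that returns
    a list of items for that 1-based page, or an empty list when no more exist.
    """
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break  # the source has no more results
        items.extend(batch)
        if len(items) >= results_wanted:
            break  # enough collected; skip the remaining pages
    # The last page may overshoot, so trim to the requested count.
    return items[:results_wanted]
```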

Is an API key required to use this scraper?

No, the scraper handles all connections natively without requiring user-provided API credentials.

What should I do if a field is empty?

Some academic records may have incomplete details in the public catalog. In those cases, the corresponding fields will display as empty or null.
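
When downstream analysis depends on particular fields, it can help to check each item for gaps before using it. A minimal sketch, assuming that missing values arrive as absent keys, null, empty strings, or empty arrays; `missing_fields` is a hypothetical helper.

```python
def missing_fields(item, expected):
    """Return the names from `expected` that are absent or empty in `item`."""
    # A count of 0 is a real value, so only None, "" and [] count as empty.
    return [name for name in expected if item.get(name) in (None, "", [])]
```

Records can then be filtered or flagged, for example by keeping only works where `missing_fields(item, ["doi", "abstract"])` returns an empty list.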


Support

For issues or feature requests, contact support through the Apify Console.

This actor is designed for legitimate data collection purposes. Users are responsible for ensuring compliance with website terms of service and applicable laws. Use data responsibly and respect rate limits.