OpenAlex Scraper
Pricing
Pay per usage
Extract scholarly data from OpenAlex—titles, authors, institutions, venues, concepts—using this fast Apify actor. Get academic research in bulk via API, and export results as CSV, Excel, or HTML datasets for research, analytics, or discovery.
Rating
5.0
(1)
Developer
Shahid Irfan
Maintained by Community
Actor stats
Bookmarked: 0
Total users: 13
Monthly active users: 2
Last modified: 20 days ago
OpenAlex Scholarly Data Scraper
Extract comprehensive academic data from OpenAlex, the largest open database of scholarly works, authors, institutions, venues, and concepts. Gather research papers, profiles, and metadata at scale to power your bibliometric analysis, literature reviews, and research intelligence.
Features
- Multi-Entity Extraction — Scrape works, authors, institutions, venues, and concepts in one place
- Flexible Search Capabilities — Search by title, author name, keyword, or institution details
- Automated Pagination — Retrieve thousands of records without manual page shifting
- Polite Data Scraping — Optimized performance and respectful rate limit management
- Clean Data Formatting — Get data ready in structured formats for spreadsheet and analysis tools
Use Cases
Bibliometric Analysis
Track citation patterns, monitor research impact, and visualize academic networks over time. Perfect for university departments and institutional analysis.
Literature Reviews
Systematically collect research papers, abstracts, and metadata on specific subjects to accelerate research projects and reviews.
Talent Scouting
Discover top researchers, authors, and industry experts in specific scientific domains by analyzing publication counts and citation records.
Institutional Tracking
Analyze and compare the research output, collaboration networks, and geographic distribution of academic institutions.
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| search | String | No | "machine learning" | Search query (title, author, institution name, etc.) |
| entity | String | No | "works" | The entity type to scrape (works, authors, institutions, venues, sources, concepts) |
| results_wanted | Integer | No | 20 | Maximum number of results to collect |
| max_pages | Integer | No | 10 | Maximum number of pages to fetch |
| sort | String | No | "relevance_score:desc" | Sort order (e.g., 'relevance_score:desc' or 'cited_by_count:desc') |
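As a rough sketch of how these parameters fit together, the helper below builds an input object from the defaults in the table and rejects unknown entity types. The name `build_input` and its validation rules are illustrative assumptions, not part of the actor.

```python
# Defaults and entity types copied from the parameter table above.
DEFAULTS = {
    "search": "machine learning",
    "entity": "works",
    "results_wanted": 20,
    "max_pages": 10,
    "sort": "relevance_score:desc",
}
ENTITIES = {"works", "authors", "institutions", "venues", "sources", "concepts"}


def build_input(**overrides):
    """Return an actor input dict: table defaults plus caller overrides.

    Hypothetical helper; the checks below are assumptions, not documented behaviour.
    """
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    run_input = {**DEFAULTS, **overrides}
    if run_input["entity"] not in ENTITIES:
        raise ValueError(f"unsupported entity: {run_input['entity']!r}")
    for key in ("results_wanted", "max_pages"):
        if not isinstance(run_input[key], int) or run_input[key] < 1:
            raise ValueError(f"{key} must be a positive integer")
    return run_input


print(build_input(entity="authors", sort="cited_by_count:desc"))
```

Validating locally like this catches a misspelled entity before a paid run starts.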
Output Data
Each item in the dataset contains clean, structured metadata. The key fields are listed below; which of them appear depends on the entity type:
| Field | Type | Description |
|---|---|---|
| id | String | Unique OpenAlex identifier |
| url | String | Web URL for the entity |
| title | String | Title of the work or publication |
| authors | Array | Names of the authors involved |
| institutions | Array | Affiliated institutions and organizations |
| publication_year | Integer | Year of publication |
| doi | String | Digital Object Identifier for the work |
| abstract | String | Reconstructed abstract/summary text |
| cited_by_count | Integer | Total number of citations |
| display_name | String | Display name for authors, institutions, or venues |
| works_count | Integer | Number of publications for an author/institution |
| last_known_institution | String | Last known institution for an author |
| orcid | String | ORCID identifier for researchers |
| country_code | String | Country code for institutions |
Usage Examples
Basic Research Paper Search
Search for scholarly articles related to machine learning:
{
  "search": "machine learning",
  "entity": "works",
  "results_wanted": 20
}
High-Impact Author Extraction
Find leading authors sorted by citation count:
{
  "search": "artificial intelligence",
  "entity": "authors",
  "sort": "cited_by_count:desc",
  "results_wanted": 50
}
Institutional Profiles
Retrieve specific institution details:
{
  "search": "Stanford University",
  "entity": "institutions",
  "results_wanted": 5
}
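The same inputs can be submitted programmatically. The sketch below assumes the official `apify-client` Python package is installed and that you supply your own API token and this actor's ID from its Apify Store page. `run_scraper` is a hypothetical helper name, and the actor ID is deliberately left as a parameter rather than guessed.

```python
def run_scraper(token, actor_id, run_input):
    """Start the actor, wait for it to finish, and return its dataset items.

    Assumes `pip install apify-client`; `actor_id` is whatever ID the
    Apify Store page shows for this actor (not hard-coded here).
    """
    from apify_client import ApifyClient  # imported lazily: third-party dependency

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# The "High-Impact Author Extraction" input from above.
AUTHOR_INPUT = {
    "search": "artificial intelligence",
    "entity": "authors",
    "sort": "cited_by_count:desc",
    "results_wanted": 50,
}

# Example call (needs network access and a valid token):
# items = run_scraper("<your-apify-token>", "<actor-id>", AUTHOR_INPUT)
# for item in items[:5]:
#     print(item.get("display_name"), item.get("cited_by_count"))
```

Using `.get()` when reading items keeps the loop from failing on records where a field is missing.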
Sample Output
Here is an example of a research work item extracted from the dataset:
{
  "id": "https://openalex.org/W2741809807",
  "url": "https://openalex.org/W2741809807",
  "source": "openalex.org",
  "title": "Deep Learning",
  "authors": ["Yann LeCun", "Yoshua Bengio", "Geoffrey Hinton"],
  "institutions": ["New York University", "Université de Montréal", "University of Toronto"],
  "publication_year": 2015,
  "doi": "https://doi.org/10.1038/nature14539",
  "abstract": "Deep learning allows computational models that are composed of multiple processing layers...",
  "concepts": ["Deep learning", "Artificial intelligence", "Machine learning"],
  "cited_by_count": 48512,
  "type": "journal-article"
}
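Because fields such as authors, institutions, and concepts are arrays, a flat export needs them joined into single cells. A minimal sketch using only the standard library follows. The `flatten_item` helper and the `"; "` separator are illustrative choices, not the actor's own export logic.

```python
import csv
import io


def flatten_item(item, sep="; "):
    """Join list values into one string and turn None into '' for CSV cells."""
    flat = {}
    for key, value in item.items():
        if isinstance(value, list):
            flat[key] = sep.join(str(v) for v in value)
        elif value is None:
            flat[key] = ""
        else:
            flat[key] = value
    return flat


# Trimmed copy of the sample work item shown above.
sample = {
    "id": "https://openalex.org/W2741809807",
    "title": "Deep Learning",
    "authors": ["Yann LeCun", "Yoshua Bengio", "Geoffrey Hinton"],
    "publication_year": 2015,
    "cited_by_count": 48512,
}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(sample))
writer.writeheader()
writer.writerow(flatten_item(sample))
print(buffer.getvalue())
```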
Tips for Best Results
Refine Your Search Queries
- Use specific terminology rather than general keywords to get highly relevant matches.
- Double check names and spelling when searching for authors and institutions.
Balance Results and Pages
- Set results_wanted to a smaller value (e.g., 20-50) for fast tests.
- Ensure max_pages is sufficiently high if you want to retrieve a large number of results, as each page retrieves up to 100 items.
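The interaction between the two limits comes down to simple arithmetic. The sketch below assumes the 100-items-per-page figure stated above and that a run stops at whichever limit is reached first; the function names are hypothetical.

```python
import math

PAGE_SIZE = 100  # "up to 100 items" per page, as stated above


def pages_needed(results_wanted, page_size=PAGE_SIZE):
    """Smallest max_pages that can still deliver results_wanted items."""
    return math.ceil(results_wanted / page_size)


def max_collectable(results_wanted, max_pages, page_size=PAGE_SIZE):
    """Upper bound on items collected, assuming the first limit hit ends the run."""
    return min(results_wanted, max_pages * page_size)


print(pages_needed(2500))         # 25
print(max_collectable(2500, 10))  # 1000
```

Under these assumptions, the default max_pages of 10 caps a run at 1,000 items, so asking for 2,500 results would need max_pages raised to at least 25.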
Integrations
Connect your extracted academic data with:
- Google Sheets — Export for spreadsheets and manual analysis
- Airtable — Build research databases
- Slack — Get alerts when new papers match your criteria
- Webhooks — Automate downstream data workflows
Export Formats
Download data in multiple standard formats:
- JSON — Ready for developers and APIs
- CSV — For spreadsheet calculations
- Excel — For business reporting
- XML — For custom integrations
Frequently Asked Questions
Can I scrape journals and publication venues?
Yes, set the entity parameter to venues or sources to extract journals, conferences, and publishers.
How are abstracts reconstructed?
OpenAlex stores abstracts as an inverted index, a map from each word to the positions where it occurs. The scraper automatically converts this format back into readable abstract text where an abstract is available.
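For illustration, an inverted index of this kind maps each word to the positions where it occurs, so rebuilding the text means placing every word at its positions and reading them in order. The sketch below is one way to do that, not the actor's actual code; `reconstruct_abstract` is a hypothetical name.

```python
def reconstruct_abstract(inverted_index):
    """Rebuild text from a {word: [positions]} inverted index."""
    if not inverted_index:
        return None  # no abstract available for this record
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    # Sorting by position restores the original word order.
    return " ".join(word for _, word in sorted(positioned))


index = {"Deep": [0], "learning": [1, 5], "allows": [2], "models": [3], "of": [4]}
print(reconstruct_abstract(index))  # Deep learning allows models of learning
```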
What is the limit of results I can extract?
You can retrieve thousands of records. The scraper paginates automatically until it has collected the number set in results_wanted, within the max_pages limit.
Is an API key required to use this scraper?
No, the scraper handles all connections natively without requiring user-provided API credentials.
What should I do if a field is empty?
Some academic records may have incomplete details in the public catalog. In those cases, the corresponding fields will display as empty or null.
Support
For issues or feature requests, contact support through the Apify Console.
Legal Notice
This actor is designed for legitimate data collection purposes. Users are responsible for ensuring compliance with website terms of service and applicable laws. Use data responsibly and respect rate limits.