OpenAlex Scraper
Pricing
Pay per usage
Extract scholarly data from OpenAlex (titles, authors, institutions, venues, concepts) using this fast Apify actor. Get academic research in bulk via API, and export results as CSV, Excel, or HTML datasets for research, analytics, or discovery.
Developer: Shahid Irfan
Extract comprehensive academic data from OpenAlex, the largest open database of scholarly works, authors, institutions, venues, and concepts. This powerful scraper enables researchers, analysts, and developers to access millions of records for bibliometric analysis, literature reviews, and data-driven insights.
🚀 Key Features
- Multi-Entity Support: Scrape works, authors, institutions, venues, and concepts from OpenAlex
- Advanced Search & Filtering: Use powerful search queries with custom filters and sorting options
- High-Volume Data Collection: Retrieve thousands of records with automatic pagination
- Rate Limit Optimization: Polite pool access for maximum API throughput (up to 100,000 requests/day)
- Automatic Error Handling: Built-in retries and rate limit management
- Structured Data Output: Clean, consistent JSON output ready for analysis
📊 What You Can Scrape
- Works: Research papers, articles, books with full metadata, abstracts, and citations
- Authors: Researcher profiles with publication counts and institutional affiliations
- Institutions: University and research organization data with country information
- Venues: Journals, conferences, and publishers with impact metrics
- Concepts: Research topics and keywords with hierarchical relationships
🔧 Input Configuration
Configure your scraping job with these parameters:
| Parameter | Type | Description | Default |
|---|---|---|---|
| `search` | string | Search query (title, author, institution name, etc.) | `""` |
| `entity` | select | Entity type to scrape | `"works"` |
| `results_wanted` | integer | Maximum results to collect | `100` |
| `max_pages` | integer | Maximum API pages to fetch | `10` |
| `email` | string | Email for polite pool (higher rate limits) | `""` |
| `filters` | object | Additional API filters | `{}` |
| `sort` | string | Sort order | `"relevance_score:desc"` |
Entity Options
- `works` - Scholarly publications
- `authors` - Researcher profiles
- `institutions` - Academic organizations
- `venues` - Publication outlets
- `concepts` - Research topics
Example Filters
{
  "publication_year": "2023",
  "cited_by_count": ">100",
  "country_code": "US"
}
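As a rough illustration of how a filters object like the one above could translate into an OpenAlex request, the sketch below joins the key/value pairs into the comma-separated `filter` query parameter that the OpenAlex API accepts. The helper name `to_filter_param` is hypothetical, and whether the actor performs exactly this mapping internally is an assumption.

```python
def to_filter_param(filters: dict[str, str]) -> str:
    """Join {"key": "value"} pairs into the `key:value,key:value` form.

    Hypothetical helper: the actor's real mapping is not documented here.
    """
    return ",".join(f"{key}:{value}" for key, value in filters.items())


filters = {
    "publication_year": "2023",
    "cited_by_count": ">100",
    "country_code": "US",
}
print(to_filter_param(filters))
# prints: publication_year:2023,cited_by_count:>100,country_code:US
```

Comparison operators such as `>` travel inside the value, so a simple join is enough for this kind of filter.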
📤 Output Data Structure
Works Entity Example
{
  "id": "https://openalex.org/W123456789",
  "title": "Machine Learning in Healthcare: A Comprehensive Review",
  "authors": ["Dr. Jane Smith", "Prof. John Doe"],
  "institutions": ["Harvard University", "MIT"],
  "publication_year": 2023,
  "doi": "10.1234/health-ml-2023",
  "url": "https://openalex.org/W123456789",
  "abstract": "This paper explores the applications of machine learning...",
  "concepts": ["Machine Learning", "Healthcare", "Artificial Intelligence"],
  "cited_by_count": 245,
  "type": "journal-article",
  "source": "openalex.org"
}
Authors Entity Example
{
  "id": "https://openalex.org/A123456789",
  "display_name": "Dr. Jane Smith",
  "works_count": 87,
  "cited_by_count": 1250,
  "last_known_institution": "Harvard University",
  "orcid": "0000-0001-2345-6789",
  "source": "openalex.org"
}
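For local analysis of the JSON output, list-valued fields such as `authors` and `concepts` usually need flattening before they fit a spreadsheet row. The sketch below is one possible approach using only the standard library. The helper names `flatten_record` and `to_csv` are hypothetical, and the sample record is a shortened copy of the works example above.

```python
import csv
import io


def flatten_record(record: dict) -> dict:
    """Join list-valued fields with '; ' so each record fits one CSV row."""
    return {
        key: "; ".join(map(str, value)) if isinstance(value, list) else value
        for key, value in record.items()
    }


def to_csv(records: list[dict]) -> str:
    """Serialize records to CSV text, using the union of all keys as columns."""
    rows = [flatten_record(record) for record in records]
    # dict.fromkeys keeps first-seen order while removing duplicate keys
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = {
    "id": "https://openalex.org/W123456789",
    "title": "Machine Learning in Healthcare: A Comprehensive Review",
    "authors": ["Dr. Jane Smith", "Prof. John Doe"],
    "cited_by_count": 245,
}
print(to_csv([sample]))
```

Records with differing keys (for example, works and authors mixed together) still serialize, because missing columns are left empty.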
🎯 Usage Examples
Basic Research Paper Search
{
  "search": "machine learning healthcare",
  "entity": "works",
  "results_wanted": 500,
  "email": "your-email@example.com"
}
Top Cited Authors in AI
{
  "entity": "authors",
  "search": "artificial intelligence",
  "sort": "cited_by_count:desc",
  "results_wanted": 100,
  "filters": {"works_count": ">50"}
}
University Research Output
{
  "entity": "institutions",
  "search": "Stanford University",
  "results_wanted": 1,
  "email": "researcher@university.edu"
}
Trending Research Topics
{
  "entity": "concepts",
  "sort": "works_count:desc",
  "results_wanted": 50,
  "filters": {"level": "1"}
}
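The inputs above can also be assembled and submitted from code. The following is a minimal sketch, assuming the `apify-client` Python package is installed. `build_run_input` and `run_actor` are hypothetical helper names, the defaults mirror the Input Configuration table, and the actor ID and API token must be supplied by the caller because neither is stated in this README.

```python
ENTITY_TYPES = {"works", "authors", "institutions", "venues", "concepts"}


def build_run_input(search="", entity="works", results_wanted=100,
                    max_pages=10, email="", filters=None,
                    sort="relevance_score:desc") -> dict:
    """Build an input object using the documented parameters and defaults."""
    if entity not in ENTITY_TYPES:
        raise ValueError(f"entity must be one of {sorted(ENTITY_TYPES)}")
    if results_wanted < 1 or max_pages < 1:
        raise ValueError("results_wanted and max_pages must be positive")
    return {
        "search": search,
        "entity": entity,
        "results_wanted": results_wanted,
        "max_pages": max_pages,
        "email": email,
        "filters": filters or {},
        "sort": sort,
    }


def run_actor(token: str, actor_id: str, run_input: dict) -> list[dict]:
    """Start the actor and collect its dataset items (needs apify-client)."""
    # Imported lazily so build_run_input works without the package installed
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input(
    search="machine learning healthcare",
    results_wanted=500,
    email="your-email@example.com",
)
print(run_input)
```

Validating the entity type before starting a run avoids spending a paid run on a typo.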
⚙️ Advanced Configuration
Optimizing for Large Datasets
- Use `email` parameter for polite pool access
- Set appropriate `max_pages` to control API usage
- Apply filters to narrow results before pagination
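To see how `results_wanted` and `max_pages` interact, the sketch below estimates how many pages a run would fetch and how many results it could collect. The page size is an assumption: OpenAlex allows up to 200 results per page and defaults to 25, but the size this actor actually requests is not documented here. `plan_pages` is a hypothetical helper.

```python
import math


def plan_pages(results_wanted: int, max_pages: int,
               per_page: int = 200) -> tuple[int, int]:
    """Return (pages_to_fetch, max_results_collected) for a run.

    per_page is an assumed page size, not a documented actor setting.
    """
    pages_needed = math.ceil(results_wanted / per_page)
    pages = min(pages_needed, max_pages)
    return pages, min(results_wanted, pages * per_page)


print(plan_pages(500, 10))       # (3, 500)
print(plan_pages(5000, 10))      # (10, 2000)
print(plan_pages(500, 10, 25))   # (10, 250)
```

In the second and third cases `max_pages` is the binding limit, so raising `results_wanted` alone would not return more records.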
Rate Limiting
- Free tier: 100,000 requests/day
- Polite pool (with email): Higher priority access
- Automatic handling of rate limits with retry logic
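Retry logic for rate limits is commonly implemented as exponential backoff. The sketch below shows that general technique, not the actor's actual code: `RateLimited`, `fetch_with_retry`, and the delay values are all assumptions chosen for illustration. The `sleep` argument is injectable so the example runs without waiting.

```python
import time
from typing import Callable


class RateLimited(Exception):
    """Raised by a fetch function when the API answers HTTP 429."""


def fetch_with_retry(fetch: Callable[[], dict], max_retries: int = 5,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch(), doubling the wait after each rate-limit error."""
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    raise RuntimeError("max_retries must be non-negative")


# Simulated endpoint that is rate limited on its first two calls
calls = {"n": 0}


def flaky_fetch() -> dict:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise RateLimited()
    return {"results": []}


delays: list[float] = []
result = fetch_with_retry(flaky_fetch, sleep=delays.append)
print(result, delays)  # {'results': []} [1.0, 2.0]
```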
Data Filtering Tips
- Use `publication_year` for time-based analysis
- Filter by `cited_by_count` for impact studies
- Country codes for geographical research
- Concept IDs for topic-specific queries
📈 Use Cases
- Bibliometric Analysis: Track citation patterns and research impact
- Literature Reviews: Systematic collection of papers on specific topics
- Researcher Profiling: Build comprehensive author databases
- Institutional Rankings: Compare research output across organizations
- Trend Analysis: Identify emerging research areas and concepts
- Academic Network Mapping: Discover collaborations and affiliations
🔍 API Integration
This scraper uses the official OpenAlex REST API:
- Base URL: `https://api.openalex.org`
- Documentation: OpenAlex API Guide
- Rate Limits: 100,000 requests/day per user
- No authentication required (email optional for polite pool)
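For orientation, a request against the base URL above might be assembled as follows. This is a sketch rather than the actor's implementation: `build_url` is a hypothetical helper, while `search`, `sort`, `per-page`, `page`, and `mailto` are query parameters of the OpenAlex API (`mailto` is how an email is passed for the polite pool).

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org"


def build_url(entity: str, search: str = "", sort: str = "",
              per_page: int = 25, page: int = 1, email: str = "") -> str:
    """Build a list-endpoint URL; empty optional parameters are omitted."""
    params: dict[str, object] = {"per-page": per_page, "page": page}
    if search:
        params["search"] = search
    if sort:
        params["sort"] = sort
    if email:
        params["mailto"] = email  # opts the request into the polite pool
    return f"{BASE_URL}/{entity}?{urlencode(params)}"


print(build_url("works", search="machine learning healthcare",
                email="your-email@example.com"))
```

Passing the resulting URL to `urllib.request.urlopen` would perform the request. The sketch stops at URL construction so it runs without network access.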
📋 Limits & Considerations
- Rate Limits: 100,000 API calls per day (supplying an email for the polite pool gives higher-priority access)
- Result Limits: Up to 10,000 results per entity type
- Data Freshness: OpenAlex updates data regularly
- Data Coverage: Over 200 million works, 15 million authors, 100,000 institutions
🤝 Contributing
Found a bug or have a feature request? Open an issue on our GitHub repository.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
Keywords: OpenAlex scraper, academic data extraction, scholarly works API, bibliometric data, research papers scraper, author profiles, institution data, academic analytics, citation analysis, research trends


