OpenAlex Scraper

Pricing

Pay per usage

Extract scholarly data from OpenAlex (titles, authors, institutions, venues, concepts) with this Apify actor. Retrieve academic records in bulk via the API and export the results as CSV, Excel, or HTML datasets for research, analytics, or discovery.

Developer

Shahid Irfan

Maintained by Community

Extract academic data from OpenAlex, an open catalog of scholarly works, authors, institutions, venues, and concepts. The scraper gives researchers, analysts, and developers bulk access to these records for bibliometric analysis, literature reviews, and other data-driven work.

🚀 Key Features

  • Multi-Entity Support: Scrape works, authors, institutions, venues, and concepts from OpenAlex
  • Advanced Search & Filtering: Use powerful search queries with custom filters and sorting options
  • High-Volume Data Collection: Retrieve thousands of records with automatic pagination
  • Rate Limit Optimization: Polite pool access via the optional email parameter, within the OpenAlex limit of 100,000 requests/day
  • Automatic Error Handling: Built-in retries and rate limit management
  • Structured Data Output: Clean, consistent JSON output ready for analysis

📊 What You Can Scrape

  • Works: Research papers, articles, books with full metadata, abstracts, and citations
  • Authors: Researcher profiles with publication counts and institutional affiliations
  • Institutions: University and research organization data with country information
  • Venues: Journals, conferences, and publishers with impact metrics
  • Concepts: Research topics and keywords with hierarchical relationships

🔧 Input Configuration

Configure your scraping job with these parameters:

Parameter | Type | Description | Default
search | string | Search query (title, author, institution name, etc.) | ""
entity | select | Entity type to scrape | "works"
results_wanted | integer | Maximum results to collect | 100
max_pages | integer | Maximum API pages to fetch | 10
email | string | Email for polite pool access | ""
filters | object | Additional API filters | {}
sort | string | Sort order | "relevance_score:desc"

Entity Options

  • works - Scholarly publications
  • authors - Researcher profiles
  • institutions - Academic organizations
  • venues - Publication outlets
  • concepts - Research topics
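
The parameters and entity options above can be checked before a run is started. The following is a minimal sketch, not the actor's own validation: the defaults and entity names are copied from this README, while `build_input`, `DEFAULTS`, and `ENTITIES` are hypothetical names introduced for illustration.

```python
from typing import Any

# Defaults and entity names copied from the tables in this README.
DEFAULTS: dict[str, Any] = {
    "search": "",
    "entity": "works",
    "results_wanted": 100,
    "max_pages": 10,
    "email": "",
    "filters": {},
    "sort": "relevance_score:desc",
}
ENTITIES = {"works", "authors", "institutions", "venues", "concepts"}


def build_input(**overrides: Any) -> dict[str, Any]:
    """Merge overrides into the documented defaults and sanity-check them."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    run_input = {**DEFAULTS, **overrides}
    # Copy the filters so callers never share the dict stored in DEFAULTS.
    run_input["filters"] = dict(run_input["filters"])
    if run_input["entity"] not in ENTITIES:
        raise ValueError(f"entity must be one of {sorted(ENTITIES)}")
    for key in ("results_wanted", "max_pages"):
        if not isinstance(run_input[key], int) or run_input[key] < 1:
            raise ValueError(f"{key} must be a positive integer")
    return run_input


run_input = build_input(entity="authors", search="artificial intelligence")
```

Catching a misspelled entity or parameter name locally is cheaper than discovering it after a paid run has started.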

Example Filters

{
  "publication_year": "2023",
  "cited_by_count": ">100",
  "country_code": "US"
}
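
The OpenAlex API itself takes filters as a single `filter` query parameter of comma-separated `key:value` pairs. How the actor translates the `filters` object is not documented here, so the sketch below is an assumption about that mapping, and `to_filter_param` is a hypothetical helper. Which keys are valid depends on the entity type being queried.

```python
def to_filter_param(filters: dict[str, str]) -> str:
    """Join {"key": "value"} pairs into the key:value,key:value form."""
    return ",".join(f"{key}:{value}" for key, value in filters.items())


filters = {"publication_year": "2023", "cited_by_count": ">100", "country_code": "US"}
print(to_filter_param(filters))
# publication_year:2023,cited_by_count:>100,country_code:US
```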

📤 Output Data Structure

Works Entity Example

{
  "id": "https://openalex.org/W123456789",
  "title": "Machine Learning in Healthcare: A Comprehensive Review",
  "authors": ["Dr. Jane Smith", "Prof. John Doe"],
  "institutions": ["Harvard University", "MIT"],
  "publication_year": 2023,
  "doi": "10.1234/health-ml-2023",
  "url": "https://openalex.org/W123456789",
  "abstract": "This paper explores the applications of machine learning...",
  "concepts": ["Machine Learning", "Healthcare", "Artificial Intelligence"],
  "cited_by_count": 245,
  "type": "journal-article",
  "source": "openalex.org"
}
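
Because `authors` and `concepts` are lists, a record like the one above needs flattening before it fits a spreadsheet row. The sketch below shows one way to do that with the standard library, assuming items shaped like the example; `works_to_csv` and the `"; "` separator are illustrative choices, not part of the actor.

```python
import csv
import io


def works_to_csv(items: list[dict]) -> str:
    """Flatten work records into CSV text, joining list fields with '; '."""
    fields = ["id", "title", "publication_year", "doi",
              "cited_by_count", "authors", "concepts"]
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not listed in `fields`.
    writer = csv.DictWriter(buffer, fieldnames=fields,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for item in items:
        row = dict(item)
        for key in ("authors", "concepts"):
            row[key] = "; ".join(item.get(key) or [])
        writer.writerow(row)
    return buffer.getvalue()


sample = [{
    "id": "https://openalex.org/W123456789",
    "title": "Machine Learning in Healthcare: A Comprehensive Review",
    "authors": ["Dr. Jane Smith", "Prof. John Doe"],
    "publication_year": 2023,
    "cited_by_count": 245,
}]
print(works_to_csv(sample))
```

Fields missing from a record, such as `doi` in the sample, are written as empty cells.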

Authors Entity Example

{
  "id": "https://openalex.org/A123456789",
  "display_name": "Dr. Jane Smith",
  "works_count": 87,
  "cited_by_count": 1250,
  "last_known_institution": "Harvard University",
  "orcid": "0000-0001-2345-6789",
  "source": "openalex.org"
}

🎯 Usage Examples

Search Works by Keyword

{
  "search": "machine learning healthcare",
  "entity": "works",
  "results_wanted": 500,
  "email": "your-email@example.com"
}

Top Cited Authors in AI

{
  "entity": "authors",
  "search": "artificial intelligence",
  "sort": "cited_by_count:desc",
  "results_wanted": 100,
  "filters": {
    "works_count": ">50"
  }
}

University Research Output

{
  "entity": "institutions",
  "search": "Stanford University",
  "results_wanted": 1,
  "email": "researcher@university.edu"
}

Largest Level-1 Concepts

{
  "entity": "concepts",
  "sort": "works_count:desc",
  "results_wanted": 50,
  "filters": {
    "level": "1"
  }
}
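
Any of these inputs can also be submitted from code. The sketch below assumes the `apify-client` Python package and its `actor(...).call(run_input=...)` and `dataset(...).iterate_items()` methods; the actor ID is a placeholder because this page does not state it, and `run_and_collect` is a hypothetical helper name.

```python
from typing import Any

# Hypothetical placeholder: copy the real ID from the actor's page in Apify Store.
ACTOR_ID = "<username>/<actor-name>"


def run_and_collect(client: Any, actor_id: str, run_input: dict[str, Any]) -> list[dict]:
    """Start a run, wait for it to finish, and return the dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage with the real client (needs `pip install apify-client` and an API token):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_API_TOKEN>")
#   items = run_and_collect(client, ACTOR_ID, {"entity": "works", "search": "machine learning healthcare"})
```

Passing the client in as an argument keeps the helper free of credentials and makes it easy to exercise with a stub.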

⚙️ Advanced Configuration

Optimizing for Large Datasets

  • Set the email parameter to get polite pool access
  • Set max_pages high enough to reach results_wanted, but no higher, to control API usage
  • Apply filters to narrow the result set before pagination begins
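
Since results_wanted and max_pages both cap a run, it helps to check which one will bind. The arithmetic below is a sketch under one assumption: a page size of 25, which is the OpenAlex API default. The page size this actor actually requests is not documented here, so treat `per_page` as a value to adjust; `plan_run` is a hypothetical helper.

```python
import math


def plan_run(results_wanted: int, max_pages: int, per_page: int = 25) -> dict[str, int]:
    """Estimate how many pages a run fetches and how many results it can return."""
    pages_needed = math.ceil(results_wanted / per_page)
    pages_fetched = min(pages_needed, max_pages)
    return {
        "pages_fetched": pages_fetched,
        "max_results": min(results_wanted, pages_fetched * per_page),
    }


print(plan_run(100, 10))  # {'pages_fetched': 4, 'max_results': 100}
print(plan_run(500, 10))  # {'pages_fetched': 10, 'max_results': 250}
```

Under that assumption, asking for 500 results with the default max_pages of 10 would stop at 250, so max_pages would need raising alongside results_wanted.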

Rate Limiting

  • Free tier: 100,000 requests/day
  • Polite pool (with email): Higher priority access
  • Automatic handling of rate limits with retry logic
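
The retry logic mentioned above is not shown in this README. A common pattern for it is exponential backoff on HTTP 429 and 5xx responses, sketched below; this illustrates the general technique and is not the actor's actual implementation. `fetch_with_retry` and its parameters are hypothetical, and `fetch` stands in for whatever function performs the request and returns a status code with a parsed body.

```python
import time
from typing import Callable


def fetch_with_retry(
    fetch: Callable[[], tuple[int, dict]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until it stops returning 429/5xx, doubling the wait each time."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body
        retryable = status == 429 or 500 <= status < 600
        if not retryable or attempt == max_attempts - 1:
            raise RuntimeError(f"request failed with HTTP {status}")
        # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
        sleep(base_delay * 2 ** attempt)
    raise RuntimeError("max_attempts must be at least 1")
```

The `sleep` argument is injectable so the function can be tested without real waiting.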

Data Filtering Tips

  • Use publication_year for time-based analysis
  • Filter by cited_by_count for impact studies
  • Filter by country_code for geographical research
  • Filter by concept ID for topic-specific queries

📈 Use Cases

  • Bibliometric Analysis: Track citation patterns and research impact
  • Literature Reviews: Systematic collection of papers on specific topics
  • Researcher Profiling: Build comprehensive author databases
  • Institutional Rankings: Compare research output across organizations
  • Trend Analysis: Identify emerging research areas and concepts
  • Academic Network Mapping: Discover collaborations and affiliations

🔍 API Integration

This scraper uses the official OpenAlex REST API:

  • Base URL: https://api.openalex.org
  • Documentation: OpenAlex API Guide (https://docs.openalex.org)
  • Rate Limits: 100,000 requests/day per user
  • No authentication required (email optional for polite pool)
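
For reference, a request against that base URL can be assembled as follows. The query parameter names (`search`, `filter`, `sort`, `per-page`, `mailto`) are those of the OpenAlex API; how the actor maps its own input onto them is an assumption, and `build_url` is a hypothetical helper. The entity path is taken from the entity names used in this README.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org"


def build_url(entity: str, search: str = "", filter_param: str = "",
              sort: str = "", per_page: int = 25, email: str = "") -> str:
    """Assemble a list-endpoint URL, leaving out parameters that are empty."""
    params: dict[str, str | int] = {}
    if search:
        params["search"] = search
    if filter_param:
        params["filter"] = filter_param
    if sort:
        params["sort"] = sort
    params["per-page"] = per_page
    if email:
        # Supplying an email address opts the request into the polite pool.
        params["mailto"] = email
    return f"{BASE_URL}/{entity}?{urlencode(params)}"


print(build_url("works", search="machine learning healthcare",
                filter_param="publication_year:2023",
                email="your-email@example.com"))
```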

📋 Limits & Considerations

  • Rate Limits: 100,000 API calls per day (the polite pool gives higher-priority access, not a larger quota)
  • Result Limits: Up to 10,000 results per entity type
  • Data Freshness: OpenAlex updates data regularly
  • Data Coverage: Over 200 million works, 15 million authors, 100,000 institutions

🤝 Contributing

Found a bug or have a feature request? Open an issue on our GitHub repository.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

