OpenAlex Scholarly Works Scraper
Pricing
Pay per event
Export academic works, authors, institutions, sources, and concepts from OpenAlex's open catalog of 250M+ scholarly records, the successor to Microsoft Academic Graph. Filter by author, concept, year, open access status, or affiliation.
Developer
ParseForge

🎓 OpenAlex Scholarly Works Scraper
🚀 Export academic works, authors, institutions, and more from OpenAlex in seconds. Filter by search query, entity type, or custom filters. No coding, no API keys required.
🕒 Last updated: 2026-04-16 · 📊 30+ fields · 🔄 Runs on Apify cloud or locally · 📁 Export: JSON, CSV, Excel
The OpenAlex Scholarly Works Scraper connects to OpenAlex, the free and open catalog of 250M+ scholarly records that succeeded Microsoft Academic Graph. It supports 7 entity types: works, authors, institutions, sources, concepts, publishers, and funders. Each record includes 30+ structured fields with titles, DOIs, citation counts, open access status, author details, institutional affiliations, and more. Whether you need 10 papers for a quick lookup or millions of records for a large-scale bibliometric study, this tool handles it efficiently.
Built for researchers conducting literature reviews, bibliometricians analyzing citation networks, university administrators tracking institutional output, and data teams building scholarly knowledge graphs. The scraper uses the OpenAlex API with support for free-text search and the full OpenAlex filter syntax. Providing a contact email puts your requests in the "polite pool" for faster processing.
| Target Audience | Use Cases |
|---|---|
| Academic Researchers | Literature reviews, citation analysis |
| Bibliometricians | Citation network mapping, impact studies |
| University Administrators | Institutional output tracking |
| Data Scientists | Knowledge graph construction, NLP corpus building |
| Funding Agencies | Research output assessment, grant evaluation |
| Library Scientists | Collection development, trend analysis |
📋 What the OpenAlex Scholarly Works Scraper does
- 📝 Extracts scholarly work metadata including titles, abstracts, DOIs, publication dates, and citation counts for bibliometric analysis
- 👥 Collects author profiles with names, ORCID IDs, institutional affiliations, and publication histories
- 🏫 Gathers institution data including names, types, locations, and research output statistics
- 📰 Pulls source information for journals, conferences, and repositories with ISSN, publisher, and open access details
- 🔗 Captures concept and topic data for subject classification and research trend analysis
- 📊 Tracks open access status with OA type, OA URL, and license information for each work
The scraper queries the OpenAlex API with your search terms and optional filters, and follows cursor-based pagination until it has collected maxItems records or the results run out. The OpenAlex filter syntax takes comma-separated field:value pairs, with commas combining conditions as AND, for example publication_year:2024,is_oa:true,authorships.institutions.country_code:US for precise targeting.
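As a rough illustration of what cursor-based pagination involves, here is a minimal Python sketch. It assumes the public OpenAlex response shape (a `results` list plus `meta.next_cursor`, with `cursor=*` requesting the first page). The `fetch_page` callable and the `fake_fetch` stand-in are hypothetical names introduced for this example, and this is not necessarily how the Actor is implemented internally.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_records(
    fetch_page: Callable[[Dict[str, str]], dict],
    filter_str: str,
    max_items: int,
    per_page: int = 200,
) -> Iterator[dict]:
    """Yield up to max_items records by following OpenAlex-style cursors.

    fetch_page is a hypothetical callable: it receives the query parameters
    and returns the decoded JSON body of one API response.
    """
    cursor: Optional[str] = "*"  # "*" asks for the first page
    yielded = 0
    while cursor and yielded < max_items:
        params = {"filter": filter_str, "per-page": str(per_page), "cursor": cursor}
        body = fetch_page(params)
        for record in body.get("results", []):
            if yielded >= max_items:
                return
            yield record
            yielded += 1
        # next_cursor is None once there are no further pages
        cursor = body.get("meta", {}).get("next_cursor")


def fake_fetch(params: Dict[str, str]) -> dict:
    """Stand-in for a real HTTP call, serving two made-up pages."""
    pages = {
        "*": {"meta": {"next_cursor": "abc"}, "results": [{"id": "W1"}, {"id": "W2"}]},
        "abc": {"meta": {"next_cursor": None}, "results": [{"id": "W3"}]},
    }
    return pages[params["cursor"]]


ids = [r["id"] for r in iter_records(fake_fetch, "publication_year:2024", max_items=10)]
print(ids)
```

Passing the fetch function in as a parameter keeps the paging logic separate from the HTTP layer, so the loop can be exercised without network access.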
💡 Why it matters: OpenAlex is the largest free scholarly database, covering 250M+ works, 90M+ authors, and 100K+ institutions. This scraper gives you structured access to this data without writing API integration code.
🎬 Full Demo
🚧 Coming soon...
⚙️ Input
| Field | Type | Required | Description |
|---|---|---|---|
| maxItems | integer | No | Maximum records to collect. Free users: limited to 10. Paid users: up to 1,000,000. |
| entity | string | No | Entity type: works, authors, institutions, sources, concepts, publishers, or funders. |
| search | string | No | Free text search across titles, abstracts, and display names. |
| filter | string | No | OpenAlex filter string (e.g., "publication_year:2024,is_oa:true"). |
| email | string | No | Contact email for OpenAlex "polite pool" (faster processing). Optional. |
Example 1: Search for machine learning papers

```json
{
  "entity": "works",
  "search": "machine learning",
  "maxItems": 100
}
```

Example 2: Open access papers from US institutions in 2024

```json
{
  "entity": "works",
  "search": "climate change",
  "filter": "publication_year:2024,is_oa:true,authorships.institutions.country_code:US",
  "maxItems": 500,
  "email": "researcher@university.edu"
}
```
⚠️ Good to Know: Providing your email address puts your requests in OpenAlex's "polite pool", which generally gets faster and more consistent responses. The filter syntax supports dozens of fields. Free users are automatically limited to 10 items per run.
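If you assemble inputs programmatically, a small helper can keep the filter string well-formed. The sketch below is one possible approach, not part of the Actor: `build_filter` and `build_input` are hypothetical helper names, and the input keys are taken from the input table and examples above.

```python
from typing import Dict, Optional, Union

FilterValue = Union[str, int, bool]


def build_filter(conditions: Dict[str, FilterValue]) -> str:
    """Join field/value pairs into a comma-separated OpenAlex filter string."""
    parts = []
    for field, value in conditions.items():
        # The filter syntax uses lowercase booleans (true/false).
        text = str(value).lower() if isinstance(value, bool) else str(value)
        parts.append(f"{field}:{text}")
    return ",".join(parts)


def build_input(
    entity: str = "works",
    search: Optional[str] = None,
    conditions: Optional[Dict[str, FilterValue]] = None,
    max_items: int = 10,
    email: Optional[str] = None,
) -> dict:
    """Assemble an Actor input, leaving out optional fields that are unset."""
    run_input: dict = {"entity": entity, "maxItems": max_items}
    if search:
        run_input["search"] = search
    if conditions:
        run_input["filter"] = build_filter(conditions)
    if email:
        run_input["email"] = email
    return run_input


# Rebuilds Example 2 from a dictionary of conditions.
example = build_input(
    search="climate change",
    conditions={
        "publication_year": 2024,
        "is_oa": True,
        "authorships.institutions.country_code": "US",
    },
    max_items=500,
    email="researcher@university.edu",
)
print(example["filter"])
```

Keeping conditions in a dictionary makes it easier to add or drop a single filter without editing a long string by hand.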
📊 Output
🧾 Schema
| Emoji | Field | Type | Description |
|---|---|---|---|
| 📝 | title | string | Work title or entity display name |
| 🆔 | id | string | OpenAlex ID |
| 🔗 | doi | string | Digital Object Identifier (works) |
| 🌐 | url | string | OpenAlex URL |
| 📅 | publicationDate | string | Publication date (works) |
| 📅 | publicationYear | number | Publication year |
| 👥 | authors | array | Author names and affiliations |
| 📊 | citationCount | number | Total citations received |
| 📊 | citedByCount | number | Number of citing works |
| 📖 | abstract | string | Article abstract (when available) |
| 📰 | source | string | Journal or venue name |
| 🔓 | isOpenAccess | boolean | Whether the work is open access |
| 🔓 | oaType | string | OA type (gold, green, bronze, hybrid) |
| 🔗 | oaUrl | string | URL to free version |
| ⚖️ | license | string | License type |
| 🏷️ | concepts | array | Associated concepts/topics |
| 🏫 | institutions | array | Author institutions |
| 🌍 | countries | array | Author country codes |
| 📊 | referencedWorksCount | number | Number of references |
| 📊 | relatedWorksCount | number | Number of related works |
| 🔢 | volume | string | Journal volume |
| 🔢 | issue | string | Journal issue |
| 📄 | pages | string | Page range |
| 🏷️ | type | string | Work type (article, book, etc.) |
| 🔢 | orcid | string | Author ORCID ID (authors entity) |
| 🏫 | affiliation | string | Current affiliation (authors) |
| 📊 | worksCount | number | Total works (authors/institutions) |
| 📊 | hIndex | number | H-index (authors) |
| 📅 | scrapedAt | string | Data collection timestamp |
| ❌ | error | string | Error message if extraction failed |
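To show how these fields might be consumed downstream, here is a small Python sketch that separates failed records from usable ones and tallies open access types. It assumes that a missing or empty `error` field means the record was extracted successfully. The helper names are hypothetical and the sample records are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def split_by_error(items: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Separate usable records from ones that carry an error message."""
    ok: List[dict] = []
    failed: List[dict] = []
    for item in items:
        # Assumption: a missing or empty "error" field means success.
        (failed if item.get("error") else ok).append(item)
    return ok, failed


def open_access_breakdown(items: Iterable[dict]) -> Dict[str, int]:
    """Count records per oaType; non-OA records are grouped as 'closed'."""
    counts: Counter = Counter()
    for item in items:
        if item.get("isOpenAccess"):
            counts[item.get("oaType") or "unknown"] += 1
        else:
            counts["closed"] += 1
    return dict(counts)


# Made-up records for illustration only, not real OpenAlex data.
sample = [
    {"title": "Paper A", "isOpenAccess": True, "oaType": "gold", "citedByCount": 12},
    {"title": "Paper B", "isOpenAccess": False, "citedByCount": 3},
    {"title": "Paper C", "isOpenAccess": True, "oaType": "green", "citedByCount": 0},
    {"error": "Request failed"},
]

ok, failed = split_by_error(sample)
print(len(ok), len(failed))
print(open_access_breakdown(ok))
```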
📦 Sample records
✨ Why choose this Actor
| Feature | Details |
|---|---|
| 📊 250M+ records | Access the largest free scholarly database |
| 🔍 7 entity types | Works, authors, institutions, sources, concepts, publishers, funders |
| 🔓 Open access tracking | OA status, type, URL, and license for every work |
| 📊 Citation metrics | Citation counts, h-index, and referenced works |
| 🔧 Advanced filters | Full OpenAlex filter syntax for precise queries |
| 📁 Multiple export formats | JSON, CSV, Excel for any workflow |
| ⚡ Polite pool support | Provide email for faster processing |
📈 Typical performance: Collects 500+ records per minute in polite pool mode. A dataset of 10,000 works takes roughly 20 minutes.
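The arithmetic behind that estimate is simply record count divided by throughput. A tiny sketch, taking the 500 records per minute quoted above as an assumed default (actual rates will vary with filters and API load):

```python
def estimate_minutes(records: int, records_per_minute: float = 500.0) -> float:
    """Rough run-time estimate: records divided by assumed throughput."""
    if records_per_minute <= 0:
        raise ValueError("records_per_minute must be positive")
    return records / records_per_minute


print(estimate_minutes(10_000))  # 20.0
```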
📈 How it compares to alternatives
| Feature | This Actor | Direct API Integration | Generic Scrapers |
|---|---|---|---|
| 30+ structured fields per record | ✅ | ✅ (requires coding) | Partial |
| 7 entity types in one tool | ✅ | ✅ (requires coding) | ❌ |
| No coding required | ✅ | ❌ | ❌ |
| Export to CSV/JSON/Excel | ✅ | ❌ (raw JSON) | Partial |
| Automatic pagination | ✅ | Manual | Partial |
| Scheduled runs | ✅ | Custom setup | Partial |
| Filter syntax support | ✅ | ✅ | ❌ |
All the features of the OpenAlex API, without writing a single line of code.
🚀 How to use
- Create a free Apify account - Sign up at apify.com (includes free credits)
- Open the OpenAlex Scholarly Works Scraper - Navigate to the Actor page and click "Start"
- Choose your entity type - Select works, authors, institutions, or another entity type
- Set your search and filters - Enter a search query and optional OpenAlex filters
- Run and download - Click "Start", wait for completion, then export as JSON, CSV, or Excel
⏱️ First results appear in under 10 seconds. A typical run of 100 records completes in about 30 seconds.
💼 Business use cases
- Academic Research
- University Administration
- Data Science & AI
- Funding & Policy
🔌 Automating OpenAlex Scholarly Works Scraper
Node.js
```javascript
import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the Actor and wait for the run to finish
const run = await client.actor("parseforge/openalex-scraper").call({
    entity: "works",
    search: "machine learning",
    filter: "publication_year:2024,is_oa:true",
    maxItems: 200,
});

// Fetch the results from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Python
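```python
from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient("YOUR_API_TOKEN")

# Start the Actor and wait for the run to finish
run = client.actor("parseforge/openalex-scraper").call(
    run_input={
        "entity": "works",
        "search": "machine learning",
        "filter": "publication_year:2024,is_oa:true",
        "maxItems": 200,
    }
)

# Read every item from the run's default dataset
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
print(items)
```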
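Once the items are in memory, you may want a local CSV rather than a platform export. The following is a minimal sketch using only the Python standard library. `items_to_csv` is a hypothetical helper, and serializing array fields such as `authors` or `concepts` as JSON strings is just one reasonable choice, since their exact structure is not specified here.

```python
import csv
import io
import json
from typing import Iterable, List


def items_to_csv(items: Iterable[dict], fields: List[str]) -> str:
    """Render selected fields of each item as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    for item in items:
        row = {}
        for field in fields:
            value = item.get(field, "")
            # Nested values (arrays, objects) are stored as JSON text.
            if isinstance(value, (list, dict)):
                value = json.dumps(value, ensure_ascii=False)
            row[field] = value
        writer.writerow(row)
    return buffer.getvalue()


# Made-up item for illustration only.
demo_items = [{"title": "Paper A", "doi": "10.1234/example", "concepts": ["ml", "ai"]}]
print(items_to_csv(demo_items, ["title", "doi", "concepts"]))
```

To write a file instead, save the returned text with `open(path, "w", newline="", encoding="utf-8")`.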
Schedules: Set up weekly or monthly runs with Apify Schedules to track new publications, monitor citation growth, or maintain up-to-date researcher profiles.
❓ Frequently Asked Questions
🔌 Integrate with any app
- 🔗 Make (Integromat) - Connect OpenAlex data to 1,000+ apps with visual workflows
- 🔗 Zapier - Trigger actions when new scholarly records match your criteria
- 🔗 Slack - Get notifications when new papers are published in your field
- 🔗 Airbyte - Sync scholarly data to your data warehouse
- 🔗 GitHub - Automate research data pipelines with GitHub Actions
- 🔗 Google Drive - Export scholarly data directly to Google Sheets
🔗 Recommended Actors
| Actor | Description |
|---|---|
| 📚 PubMed Citation Scraper | Extract citation data and metadata from PubMed biomedical literature |
| 📖 PLOS Journals Scraper | Collect article data from PLOS ONE and other PLOS journals |
| 🧬 Crossref Scraper | Collect DOI metadata and citation information from Crossref |
| 📰 medRxiv Scraper | Extract health sciences preprint data from medRxiv |
| 📄 Semantic Scholar Scraper | Query the Semantic Scholar API for academic paper data |
💡 Pro Tip: Use OpenAlex to find papers by topic, then cross-reference with the Crossref Scraper for detailed citation metadata and reference lists.
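Cross-referencing two datasets usually means joining on DOI, and DOIs often appear in different forms (a bare `10.xxxx/...` string in one source, a full `https://doi.org/...` URL in another). The sketch below normalizes both sides before matching. The helper names are hypothetical, and the `DOI` key used for the second dataset is an assumption, so adjust it to whatever field that Actor actually returns.

```python
from typing import Dict, Iterable, List, Optional, Tuple

_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: Optional[str]) -> Optional[str]:
    """Lowercase a DOI and strip any resolver URL or 'doi:' prefix."""
    if not doi:
        return None
    cleaned = doi.strip().lower()
    for prefix in _DOI_PREFIXES:
        if cleaned.startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    return cleaned or None


def join_by_doi(
    openalex_items: Iterable[dict],
    other_items: Iterable[dict],
    other_key: str = "DOI",  # assumed field name in the second dataset
) -> List[Tuple[dict, dict]]:
    """Pair records from both datasets that share a normalized DOI."""
    index: Dict[str, dict] = {}
    for item in other_items:
        key = normalize_doi(item.get(other_key))
        if key:
            index[key] = item
    pairs = []
    for item in openalex_items:
        key = normalize_doi(item.get("doi"))
        if key and key in index:
            pairs.append((item, index[key]))
    return pairs


# Made-up records for illustration only.
left = [{"title": "Paper A", "doi": "https://doi.org/10.1234/ABC"}, {"title": "No DOI"}]
right = [{"DOI": "10.1234/abc", "reference-count": 42}]
print(join_by_doi(left, right))
```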
🆘 Need Help? Open our contact form and we will get back to you within 24 hours. For bug reports, feature requests, or integration help, we are here to assist.
Disclaimer: This Actor is provided as-is, without warranty. It is not affiliated with or endorsed by OpenAlex or OurResearch. Use it responsibly and in compliance with applicable terms of service. The authors are not responsible for how the collected data is used. Always verify data accuracy for critical applications.
