OpenAlex Scholarly Works Scraper

Pricing

Pay per event

Export academic works, authors, institutions, sources, and concepts from OpenAlex's open catalog of 250M+ scholarly records. Successor to Microsoft Academic Graph. Filter by author, concept, year, open access status, or affiliation.

Rating: 5.0 (1)
Developer: ParseForge (Maintained by Community)
Actor stats: 0 bookmarked · 7 total users · 1 monthly active user · last modified 3 hours ago

🎓 OpenAlex Scholarly Works Scraper

🚀 Export academic works, authors, institutions, and more from OpenAlex in seconds. Filter by search query, entity type, or custom filters. No coding, no API keys required.

🕒 Last updated: 2026-04-16 · 📊 30+ fields · 🔄 Runs on Apify cloud or locally · 📁 Export: JSON, CSV, Excel

The OpenAlex Scholarly Works Scraper connects to OpenAlex, the free and open catalog of 250M+ scholarly records that succeeded Microsoft Academic Graph. It supports 7 entity types: works, authors, institutions, sources, concepts, publishers, and funders. Each record includes 30+ structured fields with titles, DOIs, citation counts, open access status, author details, institutional affiliations, and more. Whether you need 10 papers for a quick lookup or millions of records for a large-scale bibliometric study, this tool handles it efficiently.

Built for researchers conducting literature reviews, bibliometricians analyzing citation networks, university administrators tracking institutional output, and data teams building scholarly knowledge graphs. The scraper uses the OpenAlex API with support for free-text search and the full OpenAlex filter syntax. Providing a contact email puts your requests in the "polite pool" for faster processing.

| Target Audience | Use Cases |
| --- | --- |
| Academic Researchers | Literature reviews, citation analysis |
| Bibliometricians | Citation network mapping, impact studies |
| University Administrators | Institutional output tracking |
| Data Scientists | Knowledge graph construction, NLP corpus building |
| Funding Agencies | Research output assessment, grant evaluation |
| Library Scientists | Collection development, trend analysis |

📋 What the OpenAlex Scholarly Works Scraper does

  • 📝 Extracts scholarly work metadata including titles, abstracts, DOIs, publication dates, and citation counts for bibliometric analysis
  • 👥 Collects author profiles with names, ORCID IDs, institutional affiliations, and publication histories
  • 🏫 Gathers institution data including names, types, locations, and research output statistics
  • 📰 Pulls source information for journals, conferences, and repositories with ISSN, publisher, and open access details
  • 🔗 Captures concept and topic data for subject classification and research trend analysis
  • 📊 Tracks open access status with OA type, OA URL, and license information for each work

The scraper queries the OpenAlex API with your search terms and optional filters, handles cursor-based pagination, and processes results efficiently. The OpenAlex filter syntax supports field-level filtering like publication_year:2024,is_oa:true,authorships.institutions.country_code:US for precise targeting.
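As a rough illustration of the cursor-based pagination mentioned above, the sketch below builds request URLs for the public OpenAlex API and walks through pages until the cursor runs out or a record limit is reached. It is not the Actor's actual implementation. The `fetch_json` callable is a hypothetical stand-in for whatever HTTP client you use, and the function names are illustrative only.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org"


def build_url(entity: str, cursor: str, search: Optional[str] = None,
              filter_str: Optional[str] = None, per_page: int = 200,
              mailto: Optional[str] = None) -> str:
    """Build one OpenAlex list request; cursor="*" asks for the first page."""
    params = {"per-page": per_page, "cursor": cursor}
    if search:
        params["search"] = search
    if filter_str:
        params["filter"] = filter_str
    if mailto:
        params["mailto"] = mailto  # contact email for the polite pool
    return f"{BASE_URL}/{entity}?{urlencode(params)}"


def iterate_records(fetch_json: Callable[[str], dict], entity: str,
                    max_items: int, **query) -> Iterator[dict]:
    """Yield up to max_items records, following meta.next_cursor page by page.

    fetch_json is an assumed helper: it takes a URL and returns the parsed
    JSON body as a dict.
    """
    cursor: Optional[str] = "*"
    yielded = 0
    while cursor and yielded < max_items:
        page = fetch_json(build_url(entity, cursor, **query))
        results = page.get("results", [])
        if not results:
            break
        for record in results:
            if yielded >= max_items:
                return
            yield record
            yielded += 1
        # A missing or null next_cursor means there are no more pages.
        cursor = page.get("meta", {}).get("next_cursor")
```

Passing the fetch function in as a parameter keeps the paging logic independent of any particular HTTP library and makes it easy to exercise with canned responses.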

💡 Why it matters: OpenAlex is the largest free scholarly database, covering 250M+ works, 90M+ authors, and 100K+ institutions. This scraper gives you structured access to this data without writing API integration code.


🎬 Full Demo

🚧 Coming soon...


⚙️ Input

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| maxItems | integer | No | Maximum records to collect. Free users: limited to 10. Paid users: up to 1,000,000. |
| entity | string | No | Entity type: works, authors, institutions, sources, concepts, publishers, or funders. |
| search | string | No | Free text search across titles, abstracts, and display names. |
| filter | string | No | OpenAlex filter string (e.g., "publication_year:2024,is_oa:true"). |
| email | string | No | Contact email for OpenAlex "polite pool" (faster processing). Optional. |

Example 1: Search for machine learning papers

{
  "entity": "works",
  "search": "machine learning",
  "maxItems": 100
}

Example 2: Open access papers from US institutions in 2024

{
  "entity": "works",
  "search": "climate change",
  "filter": "publication_year:2024,is_oa:true,authorships.institutions.country_code:US",
  "maxItems": 500,
  "email": "researcher@university.edu"
}

⚠️ Good to Know: Providing your email address puts your requests in OpenAlex's "polite pool", which gives faster and more consistent response times. The filter syntax supports dozens of fields; combine conditions with commas. Free users are automatically limited to 10 items per run.
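If you assemble inputs programmatically, a small helper can keep filter strings consistent. The following is a minimal sketch, assuming you hold your conditions in a dictionary; `build_filter` is a hypothetical helper name, not part of the Actor or of any OpenAlex library. It reproduces the filter string from Example 2.

```python
def build_filter(conditions: dict) -> str:
    """Join field:value pairs with commas, as in the OpenAlex filter syntax.

    Booleans are written as lowercase true/false, matching the examples above.
    """
    parts = []
    for field, value in conditions.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{field}:{value}")
    return ",".join(parts)


# Input equivalent to Example 2, built from a dictionary of conditions.
run_input = {
    "entity": "works",
    "search": "climate change",
    "filter": build_filter({
        "publication_year": 2024,
        "is_oa": True,
        "authorships.institutions.country_code": "US",
    }),
    "maxItems": 500,
}
```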


📊 Output

🧾 Schema

| Emoji | Field | Type | Description |
| --- | --- | --- | --- |
| 📝 | title | string | Work title or entity display name |
| 🆔 | id | string | OpenAlex ID |
| 🔗 | doi | string | Digital Object Identifier (works) |
| 🌐 | url | string | OpenAlex URL |
| 📅 | publicationDate | string | Publication date (works) |
| 📅 | publicationYear | number | Publication year |
| 👥 | authors | array | Author names and affiliations |
| 📊 | citationCount | number | Total citations received |
| 📊 | citedByCount | number | Number of citing works |
| 📖 | abstract | string | Article abstract (when available) |
| 📰 | source | string | Journal or venue name |
| 🔓 | isOpenAccess | boolean | Whether the work is open access |
| 🔓 | oaType | string | OA type (gold, green, bronze, hybrid) |
| 🔗 | oaUrl | string | URL to free version |
| ⚖️ | license | string | License type |
| 🏷️ | concepts | array | Associated concepts/topics |
| 🏫 | institutions | array | Author institutions |
| 🌍 | countries | array | Author country codes |
| 📊 | referencedWorksCount | number | Number of references |
| 📊 | relatedWorksCount | number | Number of related works |
| 🔢 | volume | string | Journal volume |
| 🔢 | issue | string | Journal issue |
| 📄 | pages | string | Page range |
| 🏷️ | type | string | Work type (article, book, etc.) |
| 🔢 | orcid | string | Author ORCID ID (authors entity) |
| 🏫 | affiliation | string | Current affiliation (authors) |
| 📊 | worksCount | number | Total works (authors/institutions) |
| 📊 | hIndex | number | H-index (authors) |
| 📅 | scrapedAt | string | Data collection timestamp |
| | error | string | Error message if extraction failed |
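To show how these fields might be used downstream, here is a minimal sketch that sets aside records carrying an `error` value and tallies open access works by `oaType`. It relies only on field names from the schema above; the function name is hypothetical, and whether failed records appear in your dataset alongside successful ones is an assumption.

```python
from collections import Counter


def summarize_open_access(items: list) -> dict:
    """Count usable records and break open access works down by oaType."""
    # Records with a non-empty "error" field are treated as failed extractions.
    ok = [item for item in items if not item.get("error")]
    oa_types = Counter(
        item.get("oaType") or "unknown"
        for item in ok
        if item.get("isOpenAccess")
    )
    return {
        "total": len(ok),
        "failed": len(items) - len(ok),
        "openAccess": sum(oa_types.values()),
        "byOaType": dict(oa_types),
    }
```

The same pattern extends to other schema fields, for example grouping by `publicationYear` or `type`.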

📦 Sample records


✨ Why choose this Actor

| Feature | Details |
| --- | --- |
| 📊 250M+ records | Access the largest free scholarly database |
| 🔍 7 entity types | Works, authors, institutions, sources, concepts, publishers, funders |
| 🔓 Open access tracking | OA status, type, URL, and license for every work |
| 📊 Citation metrics | Citation counts, h-index, and referenced works |
| 🔧 Advanced filters | Full OpenAlex filter syntax for precise queries |
| 📁 Multiple export formats | JSON, CSV, Excel for any workflow |
| ⚡ Polite pool support | Provide email for faster processing |

📈 Typical performance: Collects 500+ records per minute in polite pool mode. A dataset of 10,000 works takes roughly 20 minutes.


📈 How it compares to alternatives

| Feature | This Actor | Direct API Integration | Generic Scrapers |
| --- | --- | --- | --- |
| 30+ structured fields per record | ✅ | ✅ (requires coding) | Partial |
| 7 entity types in one tool | ✅ | ✅ (requires coding) | |
| No coding required | ✅ | | |
| Export to CSV/JSON/Excel | ✅ | ❌ (raw JSON) | Partial |
| Automatic pagination | ✅ | Manual | Partial |
| Scheduled runs | ✅ | Custom setup | Partial |
| Filter syntax support | ✅ | | |

All the features of the OpenAlex API, without writing a single line of code.


🚀 How to use

  1. Create a free Apify account - Sign up here (includes free credits)
  2. Open the OpenAlex Scholarly Works Scraper - Navigate to the Actor page and click "Start"
  3. Choose your entity type - Select works, authors, institutions, or another entity type
  4. Set your search and filters - Enter a search query and optional OpenAlex filters
  5. Run and download - Click "Start", wait for completion, then export as JSON, CSV, or Excel

⏱️ First results appear in under 10 seconds. A typical run of 100 records completes in about 30 seconds.


💼 Business use cases

Academic Research

  • Build citation network datasets
  • Track research trends by topic over time
  • Find collaborators at specific institutions
  • Monitor open access adoption in your field

University Administration

  • Track institutional research output
  • Benchmark against peer institutions
  • Generate faculty publication reports
  • Monitor author h-indexes and citation impact

Data Science & AI

  • Build scholarly knowledge graphs
  • Create NLP training corpora from abstracts
  • Analyze collaboration patterns
  • Train topic classification models

Funding & Policy

  • Assess research output for grant evaluation
  • Track funded research productivity
  • Analyze open access compliance rates
  • Map research activity by country and institution

🔌 Automating OpenAlex Scholarly Works Scraper

Node.js

import { ApifyClient } from 'apify-client';

// Replace YOUR_API_TOKEN with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the Actor and wait for the run to finish.
const run = await client.actor('parseforge/openalex-scraper').call({
    entity: 'works',
    search: 'machine learning',
    filter: 'publication_year:2024,is_oa:true',
    maxItems: 200,
});

// Read the results from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python

from apify_client import ApifyClient

# Replace YOUR_API_TOKEN with your Apify API token.
client = ApifyClient("YOUR_API_TOKEN")

# Start the Actor and wait for the run to finish.
run = client.actor("parseforge/openalex-scraper").call(run_input={
    "entity": "works",
    "search": "machine learning",
    "filter": "publication_year:2024,is_oa:true",
    "maxItems": 200,
})

# Read the results from the run's default dataset.
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
print(items)

Schedules: Set up weekly or monthly runs with Apify Schedules to track new publications, monitor citation growth, or maintain up-to-date researcher profiles.
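For scheduled runs, one way to track new publications and citation growth is to compare the items from two runs by their OpenAlex `id`. The sketch below assumes you have already loaded both runs' items as lists of dictionaries using the schema fields `id` and `citedByCount`; the function names are hypothetical.

```python
def new_records(previous_items: list, current_items: list) -> list:
    """Return records from the current run whose id was not in the previous run."""
    seen = {item["id"] for item in previous_items if item.get("id")}
    return [
        item for item in current_items
        if item.get("id") and item["id"] not in seen
    ]


def citation_growth(previous_items: list, current_items: list) -> dict:
    """Map each id present in both runs to its change in citedByCount."""
    before = {
        item["id"]: item.get("citedByCount") or 0
        for item in previous_items if item.get("id")
    }
    return {
        item["id"]: (item.get("citedByCount") or 0) - before[item["id"]]
        for item in current_items
        if item.get("id") in before
    }
```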


❓ Frequently Asked Questions


🔌 Integrate with any app

  • 🔗 Make (Integromat) - Connect OpenAlex data to 1,000+ apps with visual workflows
  • 🔗 Zapier - Trigger actions when new scholarly records match your criteria
  • 🔗 Slack - Get notifications when new papers are published in your field
  • 🔗 Airbyte - Sync scholarly data to your data warehouse
  • 🔗 GitHub - Automate research data pipelines with GitHub Actions
  • 🔗 Google Drive - Export scholarly data directly to Google Sheets

| Actor | Description |
| --- | --- |
| 📚 PubMed Citation Scraper | Extract citation data and metadata from PubMed biomedical literature |
| 📖 PLOS Journals Scraper | Collect article data from PLOS ONE and other PLOS journals |
| 🧬 Crossref Scraper | Collect DOI metadata and citation information from Crossref |
| 📰 medRxiv Scraper | Extract health sciences preprint data from medRxiv |
| 📄 Semantic Scholar Scraper | Query the Semantic Scholar API for academic paper data |

💡 Pro Tip: Use OpenAlex to find papers by topic, then cross-reference with the Crossref Scraper for detailed citation metadata and reference lists.
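Cross-referencing two datasets usually comes down to matching on DOI, which can appear as a bare identifier or as a full `https://doi.org/...` URL. The sketch below normalizes both forms and pairs records. The Crossref Scraper's output schema is not described here, so the `crossref_doi_field` default of `"DOI"` is an assumption you should adjust to the actual field name.

```python
from typing import Optional


def normalize_doi(doi: Optional[str]) -> Optional[str]:
    """Lowercase a DOI and strip common URL or "doi:" prefixes."""
    if not doi:
        return None
    doi = doi.strip().lower()
    prefixes = (
        "https://doi.org/", "http://doi.org/",
        "https://dx.doi.org/", "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi or None


def join_by_doi(openalex_items: list, crossref_items: list,
                crossref_doi_field: str = "DOI") -> list:
    """Pair OpenAlex records with Crossref records that share a DOI.

    crossref_doi_field is an assumed field name; check the other dataset.
    """
    index = {
        normalize_doi(item.get(crossref_doi_field)): item
        for item in crossref_items
    }
    index.pop(None, None)  # drop records that had no usable DOI
    pairs = []
    for item in openalex_items:
        key = normalize_doi(item.get("doi"))
        if key in index:
            pairs.append((item, index[key]))
    return pairs
```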


🆘 Need Help? Open our contact form and we will get back to you within 24 hours. For bug reports, feature requests, or integration help, we are here to assist.


Disclaimer: This Actor is provided as-is, without warranty. It is not affiliated with or endorsed by OpenAlex or OurResearch. Use it responsibly and in compliance with applicable terms of service. The authors are not responsible for how the collected data is used. Always verify data accuracy for critical applications.