Openalex Scraper
Pricing
Pay per event
Extract research papers from OpenAlex — titles, authors, citations, institutions, and open access links.
Developer
Stas Persiianenko
OpenAlex Academic Papers Scraper
Search OpenAlex — the world's largest open catalog of academic research — and extract structured data for papers, authors, citations, and institutions across 250M+ works.
What does OpenAlex Academic Papers Scraper do?
This actor searches the OpenAlex database and returns detailed metadata for academic research papers. OpenAlex indexes over 250 million scholarly works from all fields of research. For each paper, it extracts:
- Bibliographic data: title, DOI, publication date, journal, volume, issue, pages
- Author details: names, ORCID IDs, institutional affiliations, country codes
- Citation metrics: cited-by count, number of references
- Open access: OA status, free PDF links, license information
- Abstracts: full abstract text reconstructed from OpenAlex inverted index
- Topics & keywords: research topics and keyword classifications
- Source metadata: journal name, ISSN, publisher
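The abstract reconstruction mentioned above can be sketched as follows. This is a minimal illustration of the general technique, not the actor's own code. It assumes the OpenAlex inverted index is a mapping from each word to the list of zero-based positions where it occurs; the function name `reconstruct_abstract` and the sample data are hypothetical.

```python
def reconstruct_abstract(inverted_index):
    """Rebuild plain text from a word -> [positions] mapping (assumed format)."""
    if not inverted_index:
        return ""
    # Flatten to (position, word) pairs, then order them by position.
    positioned = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    positioned.sort()
    return " ".join(word for _, word in positioned)


# Hypothetical sample: "models" occurs at positions 2 and 5.
sample_index = {
    "The": [0],
    "dominant": [1],
    "models": [2, 5],
    "are": [3],
    "complex": [4],
}
print(reconstruct_abstract(sample_index))
```

If an index skips positions, the joined text simply omits the missing words, which is one way an abstract can come out incomplete.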
Why use OpenAlex Academic Papers Scraper?
- 250M+ works — the largest open academic database, successor to Microsoft Academic Graph
- No API key needed — OpenAlex is completely free and open
- Rich filtering — filter by year, citation count, and open access status
- Full abstracts — reconstructed from OpenAlex's inverted index format
- Citation sorting — find the most influential papers in any field
- Author & institution data — ORCID IDs and institutional affiliations included
- Structured output — clean JSON ready for analysis or integration
Use cases
- Literature reviews: Find the most cited papers on any research topic
- Research trend analysis: Track publication volume and citation patterns over time
- Academic evaluation: Analyze citation impact for researchers and institutions
- Competitive intelligence: Monitor competitor research output and focus areas
- Patent analysis: Cross-reference academic publications with patent portfolios
- Grant applications: Support funding proposals with citation and impact data
- Pharma R&D: Systematic literature reviews for drug development
- VC due diligence: Evaluate research depth behind deep-tech startups
How to scrape academic papers from OpenAlex
1. Go to OpenAlex Academic Papers Scraper on Apify Store.
2. Enter one or more search queries (e.g., "machine learning", "CRISPR gene editing").
3. Optionally filter by publication year, minimum citations, or open access.
4. Choose sort order: relevance, most cited, or newest first.
5. Set maximum results per query (1–500).
6. Click Start and download your data as JSON, CSV, or Excel.
Input parameters
| Parameter | Type | Description |
|---|---|---|
| searchQueries | Array | Search terms for papers (required). Example: "machine learning", "CRISPR" |
| publicationYear | String | Year filter: single year ("2024") or range ("2020-2025") |
| minCitations | Integer | Only papers with at least this many citations (default: 0) |
| openAccessOnly | Boolean | Only return open access papers (default: false) |
| sortBy | String | Sort by "relevance", "cited_by_count", or "publication_date" |
| maxResults | Integer | Max papers per query, 1–500 (default: 50) |
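Checking the input locally before starting a run can catch mistakes early. The sketch below validates values against the constraints listed in the table and builds the input object. The helper name `build_run_input` is hypothetical, and the actor may apply its own validation. Optional fields with no documented default are left out when unset.

```python
import re

# Allowed values and formats taken from the parameter table.
ALLOWED_SORT = {"relevance", "cited_by_count", "publication_date"}
YEAR_PATTERN = re.compile(r"\d{4}(-\d{4})?")  # "2024" or "2020-2025"


def build_run_input(search_queries, publication_year=None, min_citations=0,
                    open_access_only=False, sort_by=None, max_results=50):
    """Hypothetical helper: validate arguments and return the actor input dict."""
    if isinstance(search_queries, str) or not search_queries:
        raise ValueError("searchQueries must be a non-empty list of strings")
    if publication_year is not None and not YEAR_PATTERN.fullmatch(publication_year):
        raise ValueError('publicationYear must look like "2024" or "2020-2025"')
    if min_citations < 0:
        raise ValueError("minCitations must be 0 or greater")
    if sort_by is not None and sort_by not in ALLOWED_SORT:
        raise ValueError(f"sortBy must be one of {sorted(ALLOWED_SORT)}")
    if not 1 <= max_results <= 500:
        raise ValueError("maxResults must be between 1 and 500")

    run_input = {
        "searchQueries": list(search_queries),
        "minCitations": min_citations,
        "openAccessOnly": open_access_only,
        "maxResults": max_results,
    }
    # Optional fields are omitted when unset so the actor applies its own defaults.
    if publication_year is not None:
        run_input["publicationYear"] = publication_year
    if sort_by is not None:
        run_input["sortBy"] = sort_by
    return run_input


print(build_run_input(["CRISPR gene editing"], publication_year="2020-2025",
                      sort_by="cited_by_count", max_results=100))
```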
Output example
Each paper in the dataset contains these fields:
{
  "openAlexId": "W2741809807",
  "doi": "10.48550/arxiv.1706.03762",
  "title": "Attention Is All You Need",
  "publicationYear": 2017,
  "publicationDate": "2017-06-12",
  "type": "article",
  "language": "en",
  "authors": "Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones",
  "authorDetails": [
    {
      "name": "Ashish Vaswani",
      "orcid": "",
      "institution": "Google",
      "country": "US",
      "position": "first"
    }
  ],
  "journalName": "Advances in Neural Information Processing Systems",
  "journalIssn": "1049-5258",
  "publisher": "Neural Information Processing Systems Foundation",
  "volume": "30",
  "issue": "",
  "firstPage": "",
  "lastPage": "",
  "citedByCount": 6494,
  "referencedWorksCount": 38,
  "isOpenAccess": true,
  "openAccessUrl": "https://arxiv.org/pdf/1706.03762",
  "openAccessStatus": "gold",
  "abstractText": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...",
  "topics": ["Natural Language Processing Techniques", "Topic Modeling"],
  "keywords": ["attention mechanism", "transformer", "neural machine translation"],
  "openAlexUrl": "https://openalex.org/W2741809807",
  "doiUrl": "https://doi.org/10.48550/arxiv.1706.03762",
  "pdfUrl": "https://arxiv.org/pdf/1706.03762",
  "searchQuery": "transformer attention mechanism",
  "relevanceScore": 0.95,
  "scrapedAt": "2026-03-03T12:00:00.000Z"
}
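Because authorDetails is a nested list, per-author analysis usually needs one flat row per author. The sketch below shows one way to do that, using only field names from the output example. The helper name `author_rows` is hypothetical, and converting empty strings to None is an illustrative choice rather than actor behavior.

```python
def author_rows(paper):
    """Yield one flat dict per entry in a paper's authorDetails list."""
    for author in paper.get("authorDetails", []):
        yield {
            "openAlexId": paper.get("openAlexId"),
            "title": paper.get("title"),
            "author": author.get("name"),
            # Empty strings (e.g. a missing ORCID) become None for easier filtering.
            "orcid": author.get("orcid") or None,
            "institution": author.get("institution") or None,
            "country": author.get("country") or None,
            "position": author.get("position") or None,
        }


# Trimmed copy of the output example, used as sample data.
sample_paper = {
    "openAlexId": "W2741809807",
    "title": "Attention Is All You Need",
    "authorDetails": [
        {"name": "Ashish Vaswani", "orcid": "", "institution": "Google",
         "country": "US", "position": "first"},
    ],
}

for row in author_rows(sample_paper):
    print(row)
```

Rows in this shape can be written out with csv.DictWriter or loaded into a data frame.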
How much does it cost to scrape OpenAlex?
OpenAlex Academic Papers Scraper uses a pay-per-event pricing model:
| Event | Price |
|---|---|
| Run started | $0.001 |
| Per paper extracted | $0.001 |
Cost examples:
- 50 papers: $0.001 + (50 × $0.001) = $0.051
- 200 papers: $0.001 + (200 × $0.001) = $0.201
- 500 papers: $0.001 + (500 × $0.001) = $0.501
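The same arithmetic as a small helper, using the two prices from the table. This is a sketch: the function name `estimate_cost` is hypothetical, and it assumes each run is billed one start event no matter how many queries it contains. Decimal is used to avoid floating-point rounding in the totals.

```python
from decimal import Decimal

# Prices from the pricing table, in USD.
RUN_START_PRICE = Decimal("0.001")
PER_PAPER_PRICE = Decimal("0.001")


def estimate_cost(papers, runs=1):
    """Estimate the total price for a number of extracted papers and runs."""
    if papers < 0 or runs < 1:
        raise ValueError("papers must be >= 0 and runs must be >= 1")
    return RUN_START_PRICE * runs + PER_PAPER_PRICE * papers


for count in (50, 200, 500):
    print(count, "papers:", estimate_cost(count))
```

Since maxResults applies per query, a single run with three queries at 50 results each could extract up to 150 papers, which `estimate_cost(150)` would price at 0.151.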
API usage
Node.js
import { ApifyClient } from 'apify-client';
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
// Start the actor and wait for the run to finish.
const run = await client.actor('YOUR_USERNAME/openalex-scraper').call({
    searchQueries: ['machine learning'],
    publicationYear: '2023-2025',
    minCitations: 100,
    sortBy: 'cited_by_count',
    maxResults: 50,
});
// Read the extracted papers from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((paper) => {
    console.log(`${paper.citedByCount} cites | ${paper.title}`);
});
Python
from apify_client import ApifyClient
client = ApifyClient('YOUR_API_TOKEN')
# Start the actor and wait for the run to finish.
run = client.actor('YOUR_USERNAME/openalex-scraper').call(run_input={
    'searchQueries': ['machine learning'],
    'publicationYear': '2023-2025',
    'minCitations': 100,
    'sortBy': 'cited_by_count',
    'maxResults': 50,
})
# Read the extracted papers from the run's default dataset.
dataset = client.dataset(run['defaultDatasetId']).list_items().items
for paper in dataset:
    print(f"{paper['citedByCount']} cites | {paper['title']}")
Integrations
Connect OpenAlex Scraper with other tools using Apify integrations:
- Google Sheets — Export citation data to spreadsheets for analysis
- Slack / Email — Get alerts when new papers match your search criteria
- Webhooks — Trigger downstream processing when extraction completes
- Zapier / Make — Connect to 5,000+ apps for automated research workflows
- Amazon S3 / Google Cloud — Archive large literature datasets
Tips and best practices
- Use specific search terms: "transformer attention mechanism" returns more relevant results than just "AI"
- Filter by citations: set minCitations to find influential, well-cited papers
- Year ranges: use the "2020-2025" format to focus on recent research
- Open access filter: enable openAccessOnly to get papers with free PDF downloads
- Sort by citations: "cited_by_count" surfaces the most impactful papers first
- Multiple queries: search for multiple topics in a single run to compare across fields
- Abstracts: OpenAlex stores abstracts as inverted indexes; this actor reconstructs the full text automatically
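When a run covers several queries, the searchQuery field on each item makes it possible to compare topics afterwards. The sketch below groups dataset items by query and keeps the most cited papers in each group. The helper name `top_cited_by_query` and the sample items are hypothetical; only the field names come from the output example.

```python
from collections import defaultdict


def top_cited_by_query(items, n=3):
    """Group papers by searchQuery and keep the n most cited in each group."""
    grouped = defaultdict(list)
    for paper in items:
        grouped[paper.get("searchQuery", "")].append(paper)
    return {
        query: sorted(papers, key=lambda p: p.get("citedByCount", 0), reverse=True)[:n]
        for query, papers in grouped.items()
    }


# Hypothetical items standing in for a downloaded dataset.
sample_items = [
    {"searchQuery": "CRISPR", "title": "Paper A", "citedByCount": 120},
    {"searchQuery": "CRISPR", "title": "Paper B", "citedByCount": 950},
    {"searchQuery": "quantum computing", "title": "Paper C", "citedByCount": 40},
]

for query, papers in top_cited_by_query(sample_items, n=1).items():
    print(query, "->", papers[0]["title"], papers[0]["citedByCount"])
```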
Data source
All data comes from OpenAlex, a free and open catalog of the world's scholarly research. OpenAlex indexes over 250 million works from journals, conference proceedings, preprints, and other academic sources. Data is updated daily and available under CC0 (public domain).
Use with AI agents via MCP
OpenAlex Scraper is available as a tool for AI assistants via the Model Context Protocol (MCP).
Setup for Claude Code
claude mcp add --transport http apify "https://mcp.apify.com"
Setup for Claude Desktop, Cursor, or VS Code
Add this to your MCP config file:
{
  "mcpServers": {
    "apify": {
      "url": "https://mcp.apify.com"
    }
  }
}
Example prompts
- "Search OpenAlex for climate change research papers"
- "Get papers by this author from OpenAlex"
- "Find the top-cited open access papers about quantum computing from the last 3 years"
cURL
curl -X POST "https://api.apify.com/v2/acts/automation-lab~openalex-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "searchQueries": ["machine learning"],
    "sortBy": "cited_by_count",
    "maxResults": 50
  }'
Legality
Scraping publicly available data is generally considered legal in the United States under the Ninth Circuit Court of Appeals ruling in hiQ Labs v. LinkedIn. This actor only accesses publicly available information and does not require authentication. Always review and comply with the target website's Terms of Service before scraping. For personal data, ensure compliance with GDPR, CCPA, and other applicable privacy regulations.
FAQ
Q: How does OpenAlex compare to Google Scholar?
A: OpenAlex provides structured API access to 250M+ works with DOIs, ORCID IDs, and institutional data. Google Scholar has broader web coverage but no structured API.
Q: Are abstracts always available?
A: No. Not every paper in OpenAlex has an abstract. OpenAlex itself signals this through its abstract_inverted_index field; in this actor's output, a missing abstract should show up as an empty abstractText.
Q: Can I search for specific authors?
A: Use the author's name in your search query. OpenAlex's full-text search includes author names.
Q: What is the "relevance score"?
A: A score from 0 to 1 indicating how well the paper matches your search query, calculated by OpenAlex's search engine.
Q: Why are some abstracts empty or garbled?
A: OpenAlex stores abstracts as inverted indexes (word-position maps). This actor reconstructs the full text automatically, but a small number of papers have malformed indexes that produce incomplete abstracts. If the abstract is critical, check the paper via its DOI link.
Q: The scraper returns 0 results for my search query.
A: OpenAlex's search is case-insensitive but sensitive to special characters. Remove quotes, parentheses, and special characters from your query. Also verify your publicationYear filter is not excluding all results.
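Query cleanup along these lines can be automated before starting a run. The sketch below is one possible approach and is not part of the actor. Which characters count as "special" is not specified here, so the choice to keep letters, digits, whitespace, and hyphens is an assumption, and the helper name `sanitize_query` is hypothetical.

```python
import re


def sanitize_query(query):
    """Strip quotes, parentheses, and other punctuation from a search query."""
    # Keep word characters (letters, digits, underscore), whitespace, and hyphens;
    # replace everything else with a space.
    cleaned = re.sub(r"[^\w\s-]", " ", query)
    # Collapse the runs of whitespace left behind by the replacements.
    return " ".join(cleaned.split())


print(sanitize_query('"CRISPR" (gene editing)!'))
print(sanitize_query("COVID-19   vaccines"))
```

Hyphens are kept so that terms such as "COVID-19" stay intact. Drop the hyphen from the pattern if hyphenated queries still return no results.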
Other research and data scrapers on Apify
- arXiv Scraper — scrape preprint papers from arXiv
- CrossRef Scraper — extract scholarly article metadata via CrossRef
- ClinicalTrials Scraper — extract clinical trial data from ClinicalTrials.gov
- NASA Images Scraper — search and extract NASA images with full metadata
- OpenFDA Scraper — extract FDA drug adverse event reports
- Open Food Facts Scraper — scrape food product nutrition data