Openalex Scraper
Search 250M+ academic papers from OpenAlex. Get citations, authors, institutions, open access links, DOIs, and metadata for research papers worldwide.
Pricing
Pay per event
Developer: Stas Persiianenko
OpenAlex Academic Papers Scraper
Search OpenAlex — the world's largest open catalog of academic research — and extract structured data for papers, authors, citations, and institutions across 250M+ works.
What does OpenAlex Academic Papers Scraper do?
This actor searches the OpenAlex database and returns detailed metadata for academic research papers. OpenAlex indexes over 250 million scholarly works from all fields of research. For each paper, it extracts:
- Bibliographic data: title, DOI, publication date, journal, volume, issue, pages
- Author details: names, ORCID IDs, institutional affiliations, country codes
- Citation metrics: cited-by count, number of references
- Open access: OA status, free PDF links, license information
- Abstracts: full abstract text reconstructed from OpenAlex inverted index
- Topics & keywords: research topics and keyword classifications
- Source metadata: journal name, ISSN, publisher
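The abstract reconstruction mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the inverted index is a mapping from each word to the list of positions where it occurs; the function name `reconstruct_abstract` and the sample data are hypothetical and are not taken from the actor's source code.

```python
from typing import Optional


def reconstruct_abstract(inverted_index: Optional[dict]) -> str:
    """Rebuild plain text from a word -> [positions] mapping (assumed shape)."""
    if not inverted_index:
        return ""  # no index means no abstract to rebuild
    # Flatten to (position, word) pairs, then order them by position.
    positioned = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    positioned.sort()
    return " ".join(word for _, word in positioned)


# Made-up sample index for illustration only.
sample = {"The": [0], "dominant": [1], "models": [2, 5], "are": [3], "complex": [4]}
print(reconstruct_abstract(sample))
```

A word that appears several times simply contributes one pair per position, which is why `models` shows up twice in the rebuilt sentence.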
Why use OpenAlex Academic Papers Scraper?
- 250M+ works — the largest open academic database, successor to Microsoft Academic Graph
- No API key needed — OpenAlex is completely free and open
- Rich filtering — filter by year, citation count, and open access status
- Full abstracts — reconstructed from OpenAlex's inverted index format
- Citation sorting — find the most influential papers in any field
- Author & institution data — ORCID IDs and institutional affiliations included
- Structured output — clean JSON ready for analysis or integration
Use cases
- Literature reviews: Find the most cited papers on any research topic
- Research trend analysis: Track publication volume and citation patterns over time
- Academic evaluation: Analyze citation impact for researchers and institutions
- Competitive intelligence: Monitor competitor research output and focus areas
- Patent analysis: Cross-reference academic publications with patent portfolios
- Grant applications: Support funding proposals with citation and impact data
- Pharma R&D: Systematic literature reviews for drug development
- VC due diligence: Evaluate research depth behind deep-tech startups
How to use
- Go to the OpenAlex Academic Papers Scraper page on Apify Store.
- Enter one or more search queries (e.g., "machine learning", "CRISPR gene editing").
- Optionally filter by publication year, minimum citations, or open access.
- Choose sort order: relevance, most cited, or newest first.
- Set maximum results per query (1–500).
- Click Start and download your data as JSON, CSV, or Excel.
Input parameters
| Parameter | Type | Description |
|---|---|---|
| searchQueries | Array | Search terms for papers (required). Example: "machine learning", "CRISPR" |
| publicationYear | String | Year filter: single year ("2024") or range ("2020-2025") |
| minCitations | Integer | Only papers with at least this many citations (default: 0) |
| openAccessOnly | Boolean | Only return open access papers (default: false) |
| sortBy | String | Sort by "relevance", "cited_by_count", or "publication_date" |
| maxResults | Integer | Max papers per query, 1–500 (default: 50) |
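To make the parameters more concrete, here is one way they could translate into a request against the public OpenAlex works endpoint. This is a sketch under assumptions: the actor's real implementation is not shown in this document, so the helper `build_works_url`, the `SORT_KEYS` mapping, and the choice of filters are illustrative guesses rather than the actor's actual logic.

```python
import re
from urllib.parse import parse_qs, urlencode, urlparse

OPENALEX_WORKS_URL = "https://api.openalex.org/works"
MAX_PER_PAGE = 200  # OpenAlex page-size ceiling; larger maxResults would need paging

# Assumed mapping from the actor's sortBy values to OpenAlex sort keys.
SORT_KEYS = {
    "relevance": "relevance_score:desc",
    "cited_by_count": "cited_by_count:desc",
    "publication_date": "publication_date:desc",
}


def build_works_url(query, publication_year=None, min_citations=0,
                    open_access_only=False, sort_by="relevance", max_results=50):
    """Build a first-page OpenAlex URL for one search query (hypothetical helper)."""
    if not 1 <= max_results <= 500:
        raise ValueError("maxResults must be between 1 and 500")
    if publication_year and not re.fullmatch(r"\d{4}(-\d{4})?", publication_year):
        raise ValueError("publicationYear must look like '2024' or '2020-2025'")
    if sort_by not in SORT_KEYS:
        raise ValueError(f"unsupported sortBy: {sort_by}")

    filters = []
    if publication_year:
        filters.append(f"publication_year:{publication_year}")
    if min_citations > 0:
        # OpenAlex filters accept ">" but not ">=", so subtract one.
        filters.append(f"cited_by_count:>{min_citations - 1}")
    if open_access_only:
        filters.append("open_access.is_oa:true")

    params = {
        "search": query,
        "sort": SORT_KEYS[sort_by],
        "per-page": min(max_results, MAX_PER_PAGE),
    }
    if filters:
        params["filter"] = ",".join(filters)
    return f"{OPENALEX_WORKS_URL}?{urlencode(params)}"


url = build_works_url("machine learning", "2023-2025",
                      min_citations=100, sort_by="cited_by_count")
print(parse_qs(urlparse(url).query)["filter"][0])
# prints: publication_year:2023-2025,cited_by_count:>99
```

Validating the input before building the URL keeps malformed year strings or out-of-range limits from reaching the API at all.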
Output example
Each paper in the dataset contains these fields:
```json
{
  "openAlexId": "W2741809807",
  "doi": "10.48550/arxiv.1706.03762",
  "title": "Attention Is All You Need",
  "publicationYear": 2017,
  "publicationDate": "2017-06-12",
  "type": "article",
  "language": "en",
  "authors": "Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones",
  "authorDetails": [
    {
      "name": "Ashish Vaswani",
      "orcid": "",
      "institution": "Google",
      "country": "US",
      "position": "first"
    }
  ],
  "journalName": "Advances in Neural Information Processing Systems",
  "journalIssn": "1049-5258",
  "publisher": "Neural Information Processing Systems Foundation",
  "volume": "30",
  "issue": "",
  "firstPage": "",
  "lastPage": "",
  "citedByCount": 6494,
  "referencedWorksCount": 38,
  "isOpenAccess": true,
  "openAccessUrl": "https://arxiv.org/pdf/1706.03762",
  "openAccessStatus": "gold",
  "abstractText": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...",
  "topics": ["Natural Language Processing Techniques", "Topic Modeling"],
  "keywords": ["attention mechanism", "transformer", "neural machine translation"],
  "openAlexUrl": "https://openalex.org/W2741809807",
  "doiUrl": "https://doi.org/10.48550/arxiv.1706.03762",
  "pdfUrl": "https://arxiv.org/pdf/1706.03762",
  "searchQuery": "transformer attention mechanism",
  "relevanceScore": 0.95,
  "scrapedAt": "2026-03-03T12:00:00.000Z"
}
```
Pricing
OpenAlex Academic Papers Scraper uses a pay-per-event pricing model:
| Event | Price |
|---|---|
| Run started | $0.001 |
| Per paper extracted | $0.001 |
Cost examples:
- 50 papers: $0.001 + (50 × $0.001) = $0.051
- 200 papers: $0.001 + (200 × $0.001) = $0.201
- 500 papers: $0.001 + (500 × $0.001) = $0.501
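The cost examples follow a simple formula: one start fee per run plus a per-paper fee. The sketch below reproduces that arithmetic using the two prices from the table; the function name `estimate_cost` is hypothetical, and `Decimal` is used only to avoid floating-point rounding in the totals.

```python
from decimal import Decimal

# Prices taken from the pricing table above (USD).
RUN_STARTED_FEE = Decimal("0.001")
PER_PAPER_FEE = Decimal("0.001")


def estimate_cost(papers: int, runs: int = 1) -> Decimal:
    """Estimate the charge for extracting `papers` papers across `runs` runs."""
    if papers < 0 or runs < 1:
        raise ValueError("papers must be >= 0 and runs must be >= 1")
    return runs * RUN_STARTED_FEE + papers * PER_PAPER_FEE


for count in (50, 200, 500):
    print(f"{count} papers: ${estimate_cost(count)}")
```

Because `maxResults` applies per query, a run with several search queries can extract more papers than a single `maxResults` value suggests, so pass the expected total to `papers`.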
API usage
Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('YOUR_USERNAME/openalex-scraper').call({
    searchQueries: ['machine learning'],
    publicationYear: '2023-2025',
    minCitations: 100,
    sortBy: 'cited_by_count',
    maxResults: 50,
});

// Read the extracted papers from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((paper) => {
    console.log(`${paper.citedByCount} cites | ${paper.title}`);
});
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

# Start the actor and wait for the run to finish.
run = client.actor('YOUR_USERNAME/openalex-scraper').call(run_input={
    'searchQueries': ['machine learning'],
    'publicationYear': '2023-2025',
    'minCitations': 100,
    'sortBy': 'cited_by_count',
    'maxResults': 50,
})

# Read the extracted papers from the run's default dataset.
dataset = client.dataset(run['defaultDatasetId']).list_items().items
for paper in dataset:
    print(f"{paper['citedByCount']} cites | {paper['title']}")
```
Integrations
Connect OpenAlex Scraper with other tools using Apify integrations:
- Google Sheets — Export citation data to spreadsheets for analysis
- Slack / Email — Get alerts when new papers match your search criteria
- Webhooks — Trigger downstream processing when extraction completes
- Zapier / Make — Connect to 5,000+ apps for automated research workflows
- Amazon S3 / Google Cloud — Archive large literature datasets
Tips and best practices
- Use specific search terms: "transformer attention mechanism" returns more relevant results than just "AI"
- Filter by citations: set minCitations to find influential, well-cited papers
- Year ranges: use the "2020-2025" format to focus on recent research
- Open access filter: enable openAccessOnly to get papers with free PDF downloads
- Sort by citations: "cited_by_count" surfaces the most impactful papers first
- Multiple queries: search for multiple topics in a single run to compare across fields
- Abstracts: OpenAlex stores abstracts as inverted indexes; this actor reconstructs the full text automatically
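When a run covers several queries, the `searchQuery` field on each record makes a side-by-side comparison straightforward. The following is a rough sketch of that post-processing, assuming items shaped like the output example (`searchQuery`, `citedByCount`, `isOpenAccess`); the helper `summarize_by_query` and the sample records are made up for illustration.

```python
from collections import defaultdict
from statistics import median


def summarize_by_query(items):
    """Group dataset items by searchQuery and compute simple per-query stats."""
    groups = defaultdict(list)
    for paper in items:
        groups[paper.get("searchQuery", "")].append(paper)

    summary = {}
    for query, papers in groups.items():
        # Treat a missing or null citation count as zero.
        citations = [paper.get("citedByCount") or 0 for paper in papers]
        open_access = sum(1 for paper in papers if paper.get("isOpenAccess"))
        summary[query] = {
            "papers": len(papers),
            "median_citations": median(citations),
            "open_access_share": open_access / len(papers),
        }
    return summary


# Made-up records standing in for real dataset items.
items = [
    {"searchQuery": "CRISPR", "citedByCount": 10, "isOpenAccess": True},
    {"searchQuery": "CRISPR", "citedByCount": 30, "isOpenAccess": False},
    {"searchQuery": "machine learning", "citedByCount": 5, "isOpenAccess": True},
]
for query, stats in summarize_by_query(items).items():
    print(query, stats)
```

The median is used instead of the mean because citation counts are usually skewed by a handful of very highly cited papers.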
Data source
All data comes from OpenAlex, a free and open catalog of the world's scholarly research. OpenAlex indexes over 250 million works from journals, conference proceedings, preprints, and other academic sources. Data is updated daily and available under CC0 (public domain).
FAQ
Q: How does OpenAlex compare to Google Scholar?
A: OpenAlex provides structured API access to 250M+ works with DOIs, ORCID IDs, and institutional data. Google Scholar has broader web coverage but no structured API.
Q: Are abstracts always available?
A: No. Not all papers have abstracts in OpenAlex. An abstract can only be reconstructed when the work's OpenAlex record includes an inverted index (the abstract_inverted_index field in the OpenAlex API).
Q: Can I search for specific authors?
A: Use the author's name in your search query. OpenAlex's full-text search includes author names.
Q: What is the "relevance score"?
A: A score from 0 to 1 indicating how well the paper matches your search query, calculated by OpenAlex's search engine.