OpenAlex Scraper
Pricing
from $1.00 / 1,000 results
Rating
5.0
(21)
Developer
Crawler Bros
Actor stats
Bookmarked: 21
Total users: 2
Monthly active users: 1
Last modified: 3 days ago
Scrape OpenAlex — the free, open catalog of 250M+ scholarly works, authors, institutions, and concepts. Search papers, authors, or fetch by OpenAlex ID / DOI / PMID. Pulls citations, open-access status, abstracts, authorships, journals, topics. HTTP-only via the public api.openalex.org API. No auth, no proxy, no rate-limit drama (100k req/day in the polite pool).
What this actor does
- Four modes: searchWorks, searchAuthors, byWorkIds, byAuthorIds
- Universal IDs: OpenAlex (W…, A…), DOI, PMID, PMCID, ORCID (all auto-normalized)
- Reconstructs abstracts from OpenAlex's inverted index (zero extra API calls)
- Filters: publication year range, min citation count, open-access only, work type
- Sorts: relevance, most cited, newest publication date / year
- Empty fields are omitted — no nulls reach the dataset
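The abstract reconstruction mentioned in the list above can be sketched as follows. OpenAlex returns abstracts in the `abstract_inverted_index` field, which maps each token to the positions where it occurs. The function name `reconstruct_abstract` is hypothetical; this is a minimal sketch of the technique, not the actor's actual code.

```python
def reconstruct_abstract(inverted_index):
    """Rebuild plain text from an OpenAlex-style inverted index.

    The index maps each token to the list of word positions where it
    appears, e.g. {"learning": [1, 4]}.
    """
    if not inverted_index:
        return None  # no abstract available for this work
    # Flatten to (position, token) pairs, then order by position.
    positioned = [
        (position, token)
        for token, positions in inverted_index.items()
        for position in positions
    ]
    positioned.sort()
    return " ".join(token for _, token in positioned)


example_index = {"Deep": [0], "learning": [1, 4], "is": [2], "machine": [3]}
print(reconstruct_abstract(example_index))  # Deep learning is machine learning
```

Because the index arrives in the same response as the rest of the work record, rebuilding the text needs no further requests.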
Output per work
- openalexId, doi, pmid, pmcid, magId: universal IDs
- title, publicationDate, publicationYear, type, language
- citedByCount, fwci (field-weighted citation impact), hasFulltext
- isOa, openAccessOaUrl, openAccessStatus, bestOaUrl
- venue: {name, issn_l, publisher, type, isOa, license}
- authorships[]: [{authorId, name, orcid, position, institutions}, ...] (when includeAuthorships=true)
- primaryAuthor: first author display name (always present scalar)
- concepts[]: top 10 OpenAlex concept tags (when includeConcepts=true)
- abstract: reconstructed text (when includeAbstract=true and OpenAlex has it)
- relevanceScore: search relevance score (search modes)
- openalexUrl: canonical link
- recordType: "work", scrapedAt
Output per author
- openalexId, name, orcid
- worksCount, citedByCount
- lastKnownInstitutions[]
- hIndex, i10Index
- openalexUrl, recordType: "author", scrapedAt
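Both record types follow the rule that empty fields are omitted rather than emitted as nulls. One way such pruning could work is sketched below. The helper name `prune_empty` and the choice to keep `0` and `False` are assumptions for illustration; the actor's own rules are not documented here.

```python
EMPTY_VALUES = (None, "", [], {})


def prune_empty(value):
    """Recursively drop None, empty strings, and empty containers.

    Falsy-but-meaningful values such as 0 and False are kept.
    """
    if isinstance(value, dict):
        cleaned = {key: prune_empty(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if item not in EMPTY_VALUES}
    if isinstance(value, list):
        cleaned = [prune_empty(item) for item in value]
        return [item for item in cleaned if item not in EMPTY_VALUES]
    return value


raw_record = {
    "openalexId": "W1",
    "doi": None,
    "citedByCount": 0,
    "isOa": False,
    "venue": {"name": "", "issn_l": None},
    "concepts": [],
}
print(prune_empty(raw_record))
# {'openalexId': 'W1', 'citedByCount': 0, 'isOa': False}
```

Keeping `0` and `False` matters here: a work with zero citations or a closed-access work carries real information, unlike a missing DOI.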
Input
| Field | Type | Default | Description |
|---|---|---|---|
| mode | string | searchWorks | searchWorks / searchAuthors / byWorkIds / byAuthorIds |
| searchQuery | string | large language models | For searchWorks / searchAuthors |
| workIds | array | – | OpenAlex IDs / DOIs / PMIDs / PMCIDs (for byWorkIds) |
| authorIds | array | – | OpenAlex author IDs / ORCIDs (for byAuthorIds) |
| publicationYearMin | int | – | Drop works before this year |
| publicationYearMax | int | – | Drop works after this year |
| minCitedBy | int | – | Drop works with fewer citations |
| openAccessOnly | bool | false | Only emit OA works |
| workType | string | any | article/book/preprint/review/dataset/etc. |
| sortBy | string | relevance_score:desc | Search ordering |
| includeAbstract | bool | true | Reconstruct abstract from inverted index |
| includeAuthorships | bool | true | Full authorship array |
| includeConcepts | bool | true | Top concept tags |
| userAgentEmail | string | apify-actor@noreply.apify.com | OpenAlex polite-pool email |
| maxItems | int | 50 | Hard cap (1–10000) |
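To illustrate how the filter inputs above might translate into a request against the public OpenAlex works endpoint, here is a minimal sketch. The mapping is an assumption, not the actor's documented behavior, and `build_works_url` is a hypothetical helper. The filter keys used (`from_publication_date`, `to_publication_date`, `cited_by_count`, `open_access.is_oa`, `type`) are standard OpenAlex works filters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_BASE = "https://api.openalex.org/works"  # public OpenAlex works endpoint


def build_works_url(params):
    """Translate actor-style input into an OpenAlex works query URL (illustrative)."""
    filters = []
    if params.get("publicationYearMin") is not None:
        filters.append(f"from_publication_date:{params['publicationYearMin']}-01-01")
    if params.get("publicationYearMax") is not None:
        filters.append(f"to_publication_date:{params['publicationYearMax']}-12-31")
    if params.get("minCitedBy"):
        # "at least N citations" expressed as "more than N - 1"
        filters.append(f"cited_by_count:>{params['minCitedBy'] - 1}")
    if params.get("openAccessOnly"):
        filters.append("open_access.is_oa:true")
    if params.get("workType", "any") != "any":
        filters.append(f"type:{params['workType']}")

    query = {
        "search": params.get("searchQuery", ""),
        "sort": params.get("sortBy", "relevance_score:desc"),
        # OpenAlex serves at most 200 results per page
        "per-page": min(params.get("maxItems", 50), 200),
    }
    if filters:
        query["filter"] = ",".join(filters)
    return f"{API_BASE}?{urlencode(query)}"


url = build_works_url({
    "searchQuery": "large language models",
    "publicationYearMin": 2024,
    "minCitedBy": 50,
    "sortBy": "cited_by_count:desc",
    "maxItems": 100,
})
print(parse_qs(urlsplit(url).query)["filter"][0])
# from_publication_date:2024-01-01,cited_by_count:>49
```

Pushing the filters to the API rather than filtering locally keeps the number of requests proportional to the results actually emitted.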
Example: top-cited LLM papers from 2024 onward
{"mode": "searchWorks","searchQuery": "large language models","publicationYearMin": 2024,"minCitedBy": 50,"sortBy": "cited_by_count:desc","maxItems": 100}
Example: look up specific papers by DOI or PMID
{"mode": "byWorkIds","workIds": ["10.1145/3442188.3445922","https://doi.org/10.48550/arXiv.2310.06825","pmid:25524000"]}
Example: all works by an author (Geoffrey Hinton)
{"mode": "byAuthorIds","authorIds": ["A1969205038"],"minCitedBy": 100,"maxItems": 200}
Example: open-access ML papers only
{"mode": "searchWorks","searchQuery": "machine learning fairness","openAccessOnly": true,"workType": "article","publicationYearMin": 2020}
Use cases
- Literature reviews — bulk-export every paper matching a topic across all disciplines
- Citation tracking — find the most-cited works on a topic, or all works citing a specific paper
- Author intelligence — track an author's publication record, h-index, institutional affiliations
- Open-access auditing — find OA copies of every paper in a reading list
- Topic monitoring — schedule recurring runs to catch new papers in your area
- Cross-database enrichment — feed DOIs from arXiv / PubMed / Crossref → enrich with OpenAlex citations
FAQ
What's OpenAlex? An open replacement for Microsoft Academic Graph: 250M+ scholarly works, 80M+ authors, free for any use, fully indexed by content+citations. See openalex.org.
Is there a rate limit? Yes — 100k requests/day in the polite pool (anyone with an email in their User-Agent). The actor sets this header automatically.
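For readers calling OpenAlex directly, the polite-pool header can be sketched like this. The application name, version, and exact User-Agent layout are assumptions; what OpenAlex looks for is a `mailto:` address somewhere in the header. The request is only constructed here, not sent.

```python
from urllib.request import Request


def polite_request(url, email, app_name="openalex-example"):
    """Build a request whose User-Agent carries a contact email (hypothetical helper)."""
    user_agent = f"{app_name}/0.1 (mailto:{email})"
    return Request(url, headers={"User-Agent": user_agent})


request = polite_request("https://api.openalex.org/works?search=llm", "you@example.com")
# urllib stores header names capitalized as "User-agent"
print(request.get_header("User-agent"))
# openalex-example/0.1 (mailto:you@example.com)
```

Passing `request` to `urllib.request.urlopen` would send it; the actor's `userAgentEmail` input plays the role of `email` here.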
Why are abstracts sometimes missing? OpenAlex omits an abstract when the publisher's license doesn't permit redistribution. The actor returns whatever is available, so a missing abstract usually means the source publisher doesn't allow it.
How does it differ from arXiv / PubMed? OpenAlex is broader: it covers all disciplines and all source types (preprint servers, journals, books, datasets). arXiv covers only preprints, mostly in physics, math, CS, and neighboring quantitative fields. PubMed covers only biomedical literature.
What ID formats are accepted? OpenAlex IDs (W123…, A123…), full DOI URLs (https://doi.org/10.1145/...), bare DOIs (10.1145/...), pmid:N, pmcid:N, and ORCIDs (0000-0001-…).
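A rough sketch of how these formats could be normalized into the `scheme:value` keys that the OpenAlex API accepts in entity URLs (for example `/works/doi:…` or `/authors/orcid:…`). The regular expressions and the function name `normalize_id` are assumptions for illustration; the actor's own normalization rules may differ.

```python
import re

DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")
OPENALEX_RE = re.compile(r"^[WA]\d+$", re.IGNORECASE)
URL_PREFIXES = (
    "https://openalex.org/",
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "https://orcid.org/",
)


def normalize_id(raw):
    """Map a user-supplied identifier to an OpenAlex lookup key (illustrative)."""
    value = raw.strip()
    for prefix in URL_PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    if OPENALEX_RE.match(value):
        return value.upper()
    if value.lower().startswith("doi:"):
        value = value[4:]
    if DOI_RE.match(value):
        return "doi:" + value.lower()  # DOIs are case-insensitive
    if ORCID_RE.match(value.upper()):
        return "orcid:" + value.upper()
    scheme, sep, rest = value.partition(":")
    if sep and scheme.lower() in ("pmid", "pmcid") and rest.strip():
        return f"{scheme.lower()}:{rest.strip()}"
    raise ValueError(f"Unrecognized identifier: {raw!r}")


for sample in ["10.1145/3442188.3445922", "https://doi.org/10.48550/arXiv.2310.06825",
               "pmid:25524000", "A1969205038"]:
    print(normalize_id(sample))
```

Raising on unrecognized input, rather than guessing, keeps a typo in a reading list from silently turning into a wrong lookup.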
What's fwci? Field-weighted citation impact — a paper's citation count normalized to its field's average. 1.0 = field average, 2.0 = twice field average, etc. Useful for cross-discipline comparison.
Why is concepts capped at 10? OpenAlex assigns dozens of low-confidence concepts per work. We keep the top 10 (already sorted by score) for table display compactness; the full list is in OpenAlex's web UI.
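The cap amounts to a sort-and-slice over the concept list. A minimal sketch, assuming each concept carries the `display_name` and `score` keys found in OpenAlex work records; the output shape and the name `top_concepts` are hypothetical.

```python
def top_concepts(concepts, limit=10):
    """Keep the highest-scoring concepts, best first (illustrative)."""
    ranked = sorted(concepts, key=lambda concept: concept["score"], reverse=True)
    return [
        {"name": concept["display_name"], "score": concept["score"]}
        for concept in ranked[:limit]
    ]


sample_concepts = [{"display_name": f"concept-{i}", "score": i / 20} for i in range(15)]
kept = top_concepts(sample_concepts)
print(len(kept), kept[0]["name"])  # 10 concept-14
```

Sorting explicitly, instead of relying on the API's ordering, keeps the result stable if the upstream order ever changes.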
How fresh is the data? Daily — OpenAlex re-indexes nightly from Crossref, PubMed, ORCID, ROR, etc.