OpenAlex Scraper

Pricing

from $1.00 / 1,000 results

Rating: 5.0 (21)

Developer: Crawler Bros (maintained by community)

Actor stats: 21 bookmarked, 2 total users, 1 monthly active user, last modified 3 days ago

Scrape OpenAlex, the free, open catalog of 250M+ scholarly works, authors, institutions, and concepts. Search papers or authors, or fetch records by OpenAlex ID, DOI, or PMID. Returns citation counts, open-access status, abstracts, authorships, journals, and topics. HTTP-only via the public api.openalex.org API: no authentication and no proxy required, with a polite-pool limit of 100k requests/day.

What this actor does

  • Four modes: searchWorks, searchAuthors, byWorkIds, byAuthorIds
  • Universal IDs: OpenAlex (W…, A…), DOI, PMID, PMCID, ORCID — all auto-normalized
  • Reconstructs abstracts from OpenAlex's inverted index (zero extra API calls)
  • Filters: publication year range, min citation count, open-access only, work type
  • Sorts: relevance, most cited, newest publication date / year
  • Empty fields are omitted — no nulls reach the dataset
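
The abstract reconstruction mentioned above can be sketched as follows. OpenAlex returns abstracts as an inverted index, a mapping from each word to the list of positions where it occurs. This is a minimal illustration, not the actor's actual code; the function name `reconstruct_abstract` and the sample index are made up for the example.

```python
def reconstruct_abstract(inverted_index):
    """Rebuild plain text from a {word: [positions]} mapping.

    Returns None for a missing or empty index, so the caller can
    omit the field instead of emitting an empty string.
    """
    if not inverted_index:
        return None
    # Flatten to (position, word) pairs, then order by position.
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    positioned.sort()
    return " ".join(word for _, word in positioned)


# Hypothetical sample in the shape of OpenAlex's abstract_inverted_index.
example = {"Deep": [0], "learning": [1, 4], "beats": [2], "shallow": [3]}
print(reconstruct_abstract(example))  # Deep learning beats shallow learning
```

Because the index ships with the work record itself, this step needs no additional request.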

Output per work

  • openalexId, doi, pmid, pmcid, magId — universal IDs
  • title, publicationDate, publicationYear, type, language
  • citedByCount, fwci (field-weighted citation impact), hasFulltext
  • isOa, openAccessOaUrl, openAccessStatus, bestOaUrl
  • venue{name, issn_l, publisher, type, isOa, license}
  • authorships[]{authorId, name, orcid, position, institutions} (when includeAuthorships=true)
  • primaryAuthor — first author display name (always present scalar)
  • concepts[] — top 10 OpenAlex concept tags (when includeConcepts=true)
  • abstract — reconstructed text (when includeAbstract=true and OpenAlex has it)
  • relevanceScore — search relevance score (search modes)
  • openalexUrl — canonical link
  • recordType: "work", scrapedAt

Output per author

  • openalexId, name, orcid
  • worksCount, citedByCount
  • lastKnownInstitutions[]
  • hIndex, i10Index
  • openalexUrl, recordType: "author", scrapedAt

Input

| Field | Type | Default | Description |
|---|---|---|---|
| mode | string | searchWorks | searchWorks / searchAuthors / byWorkIds / byAuthorIds |
| searchQuery | string | large language models | For searchWorks / searchAuthors |
| workIds | array | | OpenAlex IDs / DOIs / PMIDs / PMCIDs (for byWorkIds) |
| authorIds | array | | OpenAlex author IDs / ORCIDs (for byAuthorIds) |
| publicationYearMin | int | | Drop works before this year |
| publicationYearMax | int | | Drop works after this year |
| minCitedBy | int | | Drop works with fewer citations |
| openAccessOnly | bool | false | Only emit OA works |
| workType | string | any | article / book / preprint / review / dataset / etc. |
| sortBy | string | relevance_score:desc | Search ordering |
| includeAbstract | bool | true | Reconstruct abstract from inverted index |
| includeAuthorships | bool | true | Full authorship array |
| includeConcepts | bool | true | Top concept tags |
| userAgentEmail | string | apify-actor@noreply.apify.com | OpenAlex polite-pool email |
| maxItems | int | 50 | Hard cap (1–10000) |

Example: top-cited LLM papers from 2024

{
  "mode": "searchWorks",
  "searchQuery": "large language models",
  "publicationYearMin": 2024,
  "minCitedBy": 50,
  "sortBy": "cited_by_count:desc",
  "maxItems": 100
}
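
For orientation, an input like the one above could be expressed as a single request to the public OpenAlex works endpoint. The sketch below is an assumption about how such a mapping might look, not the actor's documented internals. The helper `build_works_url` is hypothetical, and the filter keys (`publication_year`, `cited_by_count`, `open_access.is_oa`, `type`) follow OpenAlex's filter syntax as I understand it.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_works_url(run_input, base="https://api.openalex.org/works"):
    """Translate actor-style input fields into an OpenAlex works query URL."""
    filters = []
    # OpenAlex numeric filters use strict > and <, so shift inclusive bounds by one.
    if run_input.get("publicationYearMin") is not None:
        filters.append(f"publication_year:>{run_input['publicationYearMin'] - 1}")
    if run_input.get("publicationYearMax") is not None:
        filters.append(f"publication_year:<{run_input['publicationYearMax'] + 1}")
    if run_input.get("minCitedBy") is not None:
        filters.append(f"cited_by_count:>{run_input['minCitedBy'] - 1}")
    if run_input.get("openAccessOnly"):
        filters.append("open_access.is_oa:true")
    if run_input.get("workType") not in (None, "any"):
        filters.append(f"type:{run_input['workType']}")

    params = {
        "search": run_input["searchQuery"],
        "sort": run_input.get("sortBy", "relevance_score:desc"),
        # Assumed page-size ceiling of 200 results per request.
        "per-page": min(run_input.get("maxItems", 50), 200),
    }
    if filters:
        params["filter"] = ",".join(filters)  # comma means AND
    return f"{base}?{urlencode(params, safe=':,')}"


example_input = {
    "mode": "searchWorks",
    "searchQuery": "large language models",
    "publicationYearMin": 2024,
    "minCitedBy": 50,
    "sortBy": "cited_by_count:desc",
    "maxItems": 100,
}
url = build_works_url(example_input)
# Decode the query string again to show the filter in readable form.
print(parse_qs(urlparse(url).query)["filter"][0])
```

Fetching more than one page of results would additionally require pagination, which this sketch leaves out.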

Example: lookup specific papers by DOI

{
  "mode": "byWorkIds",
  "workIds": [
    "10.1145/3442188.3445922",
    "https://doi.org/10.48550/arXiv.2310.06825",
    "pmid:25524000"
  ]
}

Example: all works by an author (Geoffrey Hinton)

{
  "mode": "byAuthorIds",
  "authorIds": ["A1969205038"],
  "minCitedBy": 100,
  "maxItems": 200
}

Example: open-access ML papers only

{
  "mode": "searchWorks",
  "searchQuery": "machine learning fairness",
  "openAccessOnly": true,
  "workType": "article",
  "publicationYearMin": 2020
}

Use cases

  • Literature reviews — bulk-export every paper matching a topic across all disciplines
  • Citation tracking — find the most-cited works on a topic, or all works citing a specific paper
  • Author intelligence — track an author's publication record, h-index, institutional affiliations
  • Open-access auditing — find OA copies of every paper in a reading list
  • Topic monitoring — schedule recurring runs to catch new papers in your area
  • Cross-database enrichment — feed DOIs from arXiv / PubMed / Crossref → enrich with OpenAlex citations

FAQ

What's OpenAlex? An open replacement for Microsoft Academic Graph: 250M+ scholarly works, 80M+ authors, free for any use, fully indexed by content+citations. See openalex.org.

Is there a rate limit? Yes — 100k requests/day in the polite pool (anyone with an email in their User-Agent). The actor sets this header automatically.
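
For readers calling OpenAlex directly, a polite-pool request is an ordinary request whose User-Agent carries a contact email. The snippet below only builds the request object and sends nothing. The helper name `polite_request` and the product token `openalex-example/0.1` are placeholders, and the exact header string the actor sends is not documented here.

```python
from urllib.request import Request


def polite_request(url, email):
    """Build a request whose User-Agent identifies a contact email."""
    # Placeholder product token; only the mailto part matters for this sketch.
    user_agent = f"openalex-example/0.1 (mailto:{email})"
    return Request(url, headers={"User-Agent": user_agent})


req = polite_request("https://api.openalex.org/works?search=llm", "you@example.com")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```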

Why are abstracts sometimes missing? OpenAlex omits an abstract when the publisher's license doesn't permit redistribution. The actor returns whatever is available and leaves the field out otherwise.

How does it differ from arXiv / PubMed? OpenAlex is broader: it covers all disciplines and all source types (preprint servers, journals, books, datasets). arXiv covers only preprints, mainly in physics, math, CS, and related quantitative fields. PubMed covers only biomedical literature.

What ID formats are accepted? OpenAlex IDs (W123…, A123…), full DOI URLs (https://doi.org/10.1145/...), bare DOIs (10.1145/...), pmid:N, pmcid:N, and ORCIDs (0000-0001-…).
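
One way such normalization could work is sketched below: strip known URL prefixes, then classify what remains. This is an illustrative assumption rather than the actor's actual logic. The function `normalize_id`, the regular expressions, and the all-zero ORCID in the demo are hypothetical.

```python
import re

OPENALEX_ID = re.compile(r"[WA]\d+", re.IGNORECASE)
ORCID = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", re.IGNORECASE)
URL_PREFIXES = (
    "https://openalex.org/",
    "https://doi.org/",
    "http://doi.org/",
    "https://orcid.org/",
)
SCHEMES = ("doi:", "pmid:", "pmcid:", "orcid:")


def normalize_id(raw):
    """Return a canonical key such as 'W123', 'doi:10.x/y', or 'pmid:123'."""
    value = raw.strip()
    # Drop a leading resolver URL so only the bare identifier remains.
    for prefix in URL_PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    lowered = value.lower()
    if OPENALEX_ID.fullmatch(value):
        return value.upper()
    if ORCID.fullmatch(value):
        return "orcid:" + value.upper()
    if lowered.startswith("10."):  # bare DOI
        return "doi:" + value
    for scheme in SCHEMES:
        if lowered.startswith(scheme):
            return scheme + value[len(scheme):]
    raise ValueError(f"Unrecognized identifier: {raw!r}")


for raw in [
    "10.1145/3442188.3445922",
    "https://doi.org/10.48550/arXiv.2310.06825",
    "pmid:25524000",
    "a1969205038",
    "0000-0000-0000-0000",  # placeholder ORCID, not a real researcher
]:
    print(raw, "->", normalize_id(raw))
```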

What's fwci? Field-weighted citation impact — a paper's citation count normalized to its field's average. 1.0 = field average, 2.0 = twice field average, etc. Useful for cross-discipline comparison.
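
As a worked illustration of that ratio, the sketch below divides a paper's citations by the count expected for comparable works. The numbers are invented, and the way OpenAlex derives the expected count is not shown here.

```python
def fwci(citations, expected_citations):
    """Citations divided by the average expected for comparable works."""
    if expected_citations <= 0:
        raise ValueError("expected_citations must be positive")
    return citations / expected_citations


# Hypothetical paper: 30 citations where comparable works average 15.
print(fwci(30, 15))  # 2.0
```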

Why is concepts capped at 10? OpenAlex assigns dozens of low-confidence concepts per work. We keep the top 10 (already sorted by score) for table display compactness; the full list is in OpenAlex's web UI.

How fresh is the data? Daily — OpenAlex re-indexes nightly from Crossref, PubMed, ORCID, ROR, etc.