OpenAlex Scholarly Data Extractor

Pricing

from $0.15 / 1,000 results

Extract scholarly works, authors, institutions, journals, publishers, and funders from OpenAlex, one record per row. About 316M works. Public data, no API key required.

Rating: 0.0 (0)

Developer: xtractoo (Maintained by Community)

Actor stats: 0 bookmarked, 1 total user, 0 monthly active users, last modified 2 days ago

Extract scholarly data at scale from OpenAlex: works (papers), authors, institutions, journals/sources, publishers, funders, topics, and concepts, one record per row.

Built for research intelligence, R&D competitive analysis, bibliometrics, science-funding analysis, and academic data teams.


Why use this actor

  • Massive corpus: about 316M works (315,965,530 verified on 2026-06-03), plus ~95M authors and the full institution/journal/funder graph.
  • One record per row, with a flat header (openalex_id, display_name, cited_by_count, publication_year) plus the full raw OpenAlex object (authorships, institutions, open-access, IDs, …).
  • No login, no key. Add your email to join the faster "polite pool".

Input

Field    | Type     | Description
entity   | dropdown | Works / Authors / Sources / Institutions / Publishers / Funders / Topics / Concepts.
filter   | text     | OpenAlex filter, e.g. publication_year:2026,is_oa:true.
search   | text     | Full-text keyword search.
mailto   | text     | Your email; joins OpenAlex's faster pool (recommended).
perPage  | int      | 1–200 (default 200).
maxItems | int      | 0 = all matching (up to the 100k cursor window).

entity is a pick-list. filter/search are free-text because OpenAlex's filter grammar is open-ended (documented at docs.openalex.org).
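
As a rough illustration of how these inputs could translate into a single OpenAlex request, here is a minimal sketch. The helper `build_url` and its argument names are hypothetical, and the mapping from the actor's fields to the public API's `filter`, `search`, `mailto`, `per-page`, and `cursor` query parameters is an assumption about the actor's internals, not something the listing confirms.

```python
from urllib.parse import urlencode

# Assumption: requests go to the public OpenAlex API, with the entity name as the path.
BASE_URL = "https://api.openalex.org"


def build_url(entity, filter_expr=None, search=None, mailto=None, per_page=200, cursor="*"):
    """Build one request URL from the input fields (hypothetical helper)."""
    if not 1 <= per_page <= 200:
        raise ValueError("perPage must be between 1 and 200")
    params = {"per-page": per_page, "cursor": cursor}
    if filter_expr:
        params["filter"] = filter_expr
    if search:
        params["search"] = search
    if mailto:
        params["mailto"] = mailto
    # Keep ':' ',' and '*' unescaped so the filter stays readable in logs.
    return f"{BASE_URL}/{entity.lower()}?{urlencode(params, safe=':,*')}"


print(build_url("Works", filter_expr="publication_year:2026,is_oa:true", mailto="you@example.org"))
```

Multiple conditions in `filter` are comma-separated and combined with AND, which is why the example value above selects open-access works published in 2026.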

Output

Each record carries envelope fields (_input, _source, _scrapedAt), a recordType (e.g. WORK, AUTHOR), an entity_type, and the flat header, followed by the raw OpenAlex object:

{
  "_input": "filter=publication_year:2026,is_oa:true",
  "_source": "S1-openalex",
  "_scrapedAt": "2026-06-03T10:00:00Z",
  "recordType": "WORK",
  "entity_type": "works",
  "openalex_id": "https://openalex.org/W...",
  "display_name": "...",
  "cited_by_count": 12,
  "publication_year": 2026,
  "doi": "https://doi.org/...",
  "authorships": [ "..." ],
  "open_access": { "is_oa": true, "...": "..." }
}
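
To make the record layout concrete, the sketch below shows one way a raw OpenAlex object might be wrapped into this shape. The function `to_record` is hypothetical. How the actor derives recordType for entities other than works and authors, and whether the raw `id` key is kept alongside `openalex_id`, are assumptions made only for illustration.

```python
from datetime import datetime, timezone


def to_record(raw, entity_type, input_desc, source="S1-openalex", now=None):
    """Wrap a raw OpenAlex object in the envelope + flat header (hypothetical helper)."""
    scraped_at = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Assumption: recordType is the singular, upper-cased entity name ("works" -> "WORK").
    singular = entity_type[:-1] if entity_type.endswith("s") else entity_type
    record = {
        "_input": input_desc,
        "_source": source,
        "_scrapedAt": scraped_at,
        "recordType": singular.upper(),
        "entity_type": entity_type,
        "openalex_id": raw.get("id"),
        "display_name": raw.get("display_name"),
        "cited_by_count": raw.get("cited_by_count"),
        "publication_year": raw.get("publication_year"),
    }
    # Append the remaining raw fields (doi, authorships, open_access, ...) unchanged.
    for key, value in raw.items():
        if key not in record:
            record[key] = value
    return record
```

Putting the envelope and flat header first keeps the most commonly filtered columns at the front of an exported CSV or spreadsheet, while the nested raw fields remain available for deeper analysis.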

How it works

  1. Your entity choice and filters define what to pull.
  2. The actor automatically pages through all matching results (up to 200 per request).
  3. Each entity streams into the dataset.
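
The paging in step 2 can be sketched as a cursor loop. This is a minimal illustration, not the actor's actual code: `iter_results` and `fetch_page` are hypothetical names, and `fetch_page` stands in for whatever performs the HTTP GET and returns the decoded JSON page. The loop relies on the OpenAlex convention of starting with `cursor=*` and following `meta.next_cursor` until it is empty.

```python
def iter_results(fetch_page, max_items=0):
    """Yield results page by page until the cursor runs out or max_items is reached.

    fetch_page(cursor) is a caller-supplied function (hypothetical) that returns
    one decoded response with "meta" and "results" keys.
    max_items=0 means no cap, mirroring the maxItems input.
    """
    cursor = "*"  # OpenAlex cursor paging starts with an asterisk
    yielded = 0
    while cursor:
        page = fetch_page(cursor)
        results = page.get("results", [])
        if not results:
            return  # an empty page means there is nothing left to read
        for item in results:
            yield item
            yielded += 1
            if max_items and yielded >= max_items:
                return
        cursor = page.get("meta", {}).get("next_cursor")


# Usage with canned pages in place of real HTTP calls:
fake_pages = {
    "*": {"meta": {"next_cursor": "c1"}, "results": ["W1", "W2"]},
    "c1": {"meta": {"next_cursor": None}, "results": ["W3"]},
}
for work in iter_results(fake_pages.__getitem__):
    print(work)
```

Passing the fetch function in as an argument keeps the loop testable without network access. A real run would replace the dictionary lookup with a request to the API.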

Known limits

  • Public data — no account needed, runs from any connection. Backs off on HTTP 429. Add mailto for the faster pool.
  • A single query streams up to ~100,000 results; to pull a larger slice, narrow with filter (e.g. by year) and run per slice.
  • Verified live 2026-06-03: works total 315,965,530; cursor paging (cursor=*, then meta.next_cursor) confirmed.
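
For the slicing approach in the second limit above, a small helper can generate one filter string per publication year, each meant to be used as the `filter` input of a separate run. The function name `year_slices` is hypothetical, and splitting by year is only one possible scheme. A year that still exceeds the window would need a further condition in the filter.

```python
def year_slices(base_filter, start_year, end_year):
    """Yield one filter string per year, combining it with an optional base filter."""
    for year in range(start_year, end_year + 1):
        parts = [p for p in (base_filter, f"publication_year:{year}") if p]
        # Comma-separated conditions are ANDed by the OpenAlex filter grammar.
        yield ",".join(parts)


for filter_expr in year_slices("is_oa:true", 2024, 2026):
    print(filter_expr)
```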