OpenAlex Scholarly Data Extractor
Pricing: from $0.15 / 1,000 results
Extract scholarly works, authors, institutions, journals, publishers, and funders from OpenAlex, one record per row. Covers ~316M works. Public data, no API key.
Extract scholarly data at scale from OpenAlex — works (papers), authors, institutions, journals/sources, publishers, funders, topics, and concepts — one record per row.
Built for research-intelligence and R&D competitive analysis, bibliometrics, science-funding analysis, and academic data teams.
Why use this actor
- Massive corpus: ~316M works (verified), plus ~95M authors and the full institution/journal/funder graph.
- One record per row, with a flat header (openalex_id, display_name, cited_by_count, publication_year) plus the full raw OpenAlex object (authorships, institutions, open access, IDs, …).
- No login, no API key. Add your email to join OpenAlex's faster "polite pool".
Input
| Field | Type | Description |
|---|---|---|
| entity | dropdown | Works / Authors / Sources / Institutions / Publishers / Funders / Topics / Concepts. |
| filter | text | OpenAlex filter, e.g. publication_year:2026,is_oa:true. |
| search | text | Full-text keyword search. |
| mailto | text | Your email; joins OpenAlex's faster "polite pool" (recommended). |
| perPage | int | Results per request, 1–200 (default 200). |
| maxItems | int | Maximum records to return; 0 = all matching (up to the ~100k cursor window). |
entity is a pick-list; filter and search are free text because OpenAlex's filter grammar is open-ended (documented at docs.openalex.org).
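A minimal sketch of how these inputs could map onto one OpenAlex list request, assuming the public REST API at https://api.openalex.org with one path per entity (e.g. /works) and OpenAlex's per-page, cursor, filter, search, and mailto query parameters. The helper names build_filter and build_url are hypothetical; the actor's own request code may differ.

```python
from urllib.parse import urlencode

# Assumption: public OpenAlex REST API; each entity is a list endpoint under this base.
API_BASE = "https://api.openalex.org"
ENTITIES = {"works", "authors", "sources", "institutions",
            "publishers", "funders", "topics", "concepts"}


def build_filter(conditions: dict) -> str:
    """Join key:value conditions with commas (hypothetical helper)."""
    parts = []
    for key, value in conditions.items():
        # Render Python booleans as the lowercase true/false OpenAlex expects.
        if isinstance(value, bool):
            value = str(value).lower()
        parts.append(f"{key}:{value}")
    return ",".join(parts)


def build_url(entity: str, filter_: str = "", search: str = "",
              mailto: str = "", per_page: int = 200, cursor: str = "*") -> str:
    """Build one list request from actor-style inputs (hypothetical helper)."""
    entity = entity.lower()  # dropdown label "Works" -> path "works"
    if entity not in ENTITIES:
        raise ValueError(f"unknown entity: {entity}")
    if not 1 <= per_page <= 200:
        raise ValueError("per_page must be between 1 and 200")
    params = {"per-page": per_page, "cursor": cursor}
    if filter_:
        params["filter"] = filter_
    if search:
        params["search"] = search
    if mailto:
        params["mailto"] = mailto  # opts into the faster "polite pool"
    # Leave ':', ',', '@' and '*' unescaped for readability; all are legal in a query string.
    return f"{API_BASE}/{entity}?{urlencode(params, safe=':,@*')}"
```

For example, build_url("Works", filter_=build_filter({"publication_year": 2026, "is_oa": True}), mailto="you@example.com") yields https://api.openalex.org/works?per-page=200&cursor=*&filter=publication_year:2026,is_oa:true&mailto=you@example.com.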
Output
Each row holds an envelope (_input, _source, _scrapedAt), a recordType (e.g. WORK, AUTHOR), the entity_type, and the flat header fields, followed by the fields of the raw OpenAlex object:
```json
{
  "_input": "filter=publication_year:2026,is_oa:true",
  "_source": "S1-openalex",
  "_scrapedAt": "2026-06-03T10:00:00Z",
  "recordType": "WORK",
  "entity_type": "works",
  "openalex_id": "https://openalex.org/W...",
  "display_name": "...",
  "cited_by_count": 12,
  "publication_year": 2026,
  "doi": "https://doi.org/...",
  "authorships": ["..."],
  "open_access": { "is_oa": true, "...": "..." }
}
```
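The row shape above can be sketched as a small wrapping function. This assumes recordType is the singular, upper-cased entity name (works becomes WORK), that openalex_id is the raw object's id, and that the remaining raw fields are copied to the top level as in the example; to_row is a hypothetical name, not the actor's code.

```python
from datetime import datetime, timezone
from typing import Optional

# Header fields copied from the raw object (taken from the example row above).
HEADER_FIELDS = ("display_name", "cited_by_count", "publication_year")


def to_row(raw: dict, entity: str, input_desc: str,
           scraped_at: Optional[str] = None) -> dict:
    """Wrap one raw OpenAlex object in an envelope plus flat header (hypothetical helper)."""
    if scraped_at is None:
        scraped_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    row = {
        "_input": input_desc,
        "_source": "S1-openalex",
        "_scrapedAt": scraped_at,
        # Assumption: singular, upper-cased entity name, e.g. "works" -> "WORK".
        "recordType": entity[:-1].upper(),
        "entity_type": entity,
        "openalex_id": raw.get("id"),
    }
    # Flat header; entities without a field (e.g. authors have no publication_year) get None.
    for field in HEADER_FIELDS:
        row[field] = raw.get(field)
    # Copy every remaining raw field unchanged; "id" is already exposed as openalex_id.
    for key, value in raw.items():
        if key != "id" and key not in row:
            row[key] = value
    return row
```

Keeping the header fields first and the raw fields after them gives a stable column order for spreadsheet exports while still preserving the full object.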
How it works
- Your entity choice, filter, and search define what to pull.
- The actor pages through the matching results with an OpenAlex cursor (up to 200 results per request), stopping at maxItems or the ~100k window.
- Each record streams into the dataset as one row.
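The paging step can be sketched with OpenAlex's cursor protocol: request with cursor=*, then pass each response's meta.next_cursor into the next request until it comes back empty. Here fetch_page is a hypothetical callback (so the loop can run without network access) that returns one parsed JSON list response; iter_records is likewise an illustrative name.

```python
from typing import Callable, Iterator


def iter_records(fetch_page: Callable[[str], dict], max_items: int = 0) -> Iterator[dict]:
    """Yield records across cursor pages until the cursor runs out or max_items is hit.

    fetch_page(cursor) returns one parsed list response:
    {"meta": {"next_cursor": ...}, "results": [...]}.
    max_items == 0 means "no limit", mirroring the maxItems input.
    """
    cursor = "*"  # cursor paging starts at cursor=*
    emitted = 0
    while cursor:
        page = fetch_page(cursor)
        results = page.get("results", [])
        for record in results:
            yield record
            emitted += 1
            if max_items and emitted >= max_items:
                return
        if not results:
            break  # an empty page also ends the walk
        cursor = page.get("meta", {}).get("next_cursor")
```

In a live run, fetch_page would issue an HTTP GET against the chosen list endpoint with per-page and cursor set and return the decoded JSON. The sketch leaves out retries and the ~100k window; maxItems is the only cap it enforces.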
Known limits
- Public data: no account needed, and the actor runs from any connection. It backs off on HTTP 429 (Too Many Requests). Add mailto to join the faster pool.
- A single query streams up to ~100,000 results. To pull a larger slice, narrow the query with filter (e.g. by year) and run once per slice.
- Verified live 2026-06-03: works total 315,965,530; cursor paging (cursor=* → meta.next_cursor) confirmed.
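Slicing by year, as suggested above, amounts to generating one filter string per year and running the actor once per string. A minimal sketch, assuming comma-separated conditions are ANDed (OpenAlex's filter syntax) and that base_filter does not already constrain publication_year; year_slices is a hypothetical helper.

```python
def year_slices(base_filter: str, first_year: int, last_year: int) -> list[str]:
    """Return one filter string per publication year, ANDed with base_filter."""
    if first_year > last_year:
        raise ValueError("first_year must not exceed last_year")
    slices = []
    for year in range(first_year, last_year + 1):
        condition = f"publication_year:{year}"
        # Comma-separated conditions are combined with AND.
        slices.append(f"{base_filter},{condition}" if base_filter else condition)
    return slices
```

For instance, year_slices("is_oa:true", 2024, 2026) returns ["is_oa:true,publication_year:2024", "is_oa:true,publication_year:2025", "is_oa:true,publication_year:2026"]. If a single year still exceeds the window, the same idea extends to narrower date ranges, e.g. with the from_publication_date and to_publication_date filters.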