OpenAlex Scholarly Works Scraper

Pricing

from $2.00 / 1,000 works returned

Searches OpenAlex (250M+ scholarly works) by keyword and returns structured records: title, authors, institutions, venue, year, citation count, concepts, open-access link, and the full reconstructed abstract for literature reviews.

Developer

Dami's Studio

Maintained by Community

Search the OpenAlex catalog of 250M+ scholarly works and get clean, structured records, with no API key and no login. OpenAlex is a free, open index of scholarship (an open successor to Microsoft Academic Graph and an open alternative to Scopus).

This actor calls the public OpenAlex works endpoint, walks results with cursor pagination (the reliable way to page past the first couple hundred results), reconstructs each abstract from its inverted index into readable text, and returns one flat row per work.
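The cursor walk described above can be sketched roughly as follows. This is an illustration, not the actor's actual source: `iter_works` and `get_page` are hypothetical names, and `get_page` is assumed to be any callable that takes a cursor string and returns a parsed OpenAlex response with `results` and `meta.next_cursor`.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_works(get_page: Callable[[str], Dict], max_items: int) -> Iterator[Dict]:
    """Yield up to max_items unique works by following OpenAlex cursors.

    get_page is a hypothetical callable: cursor -> parsed JSON response.
    """
    cursor: Optional[str] = "*"  # "*" requests the first cursor page
    seen = set()                 # work IDs already yielded (deduplication)
    emitted = 0
    while cursor and emitted < max_items:
        page = get_page(cursor)
        results = page.get("results", [])
        if not results:
            break  # an empty page means there is nothing left to walk
        for work in results:
            work_id = work.get("id")
            if work_id in seen:
                continue
            seen.add(work_id)
            yield work
            emitted += 1
            if emitted >= max_items:
                return
        # next_cursor is null once the result set is exhausted
        cursor = page.get("meta", {}).get("next_cursor")
```

Passing the page fetcher in as a callable keeps the walking logic separate from HTTP concerns, so it can be exercised with canned pages.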

It is a polite API citizen: every request carries a contact mailto (both as a query param and in the User-Agent), which routes traffic to OpenAlex's faster, more reliable "polite pool".
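As a minimal sketch of such a polite request, the helper below attaches the contact address in both places. The function name, the User-Agent product string, and the example address are assumptions for illustration; only the `mailto`, `search`, `per-page`, and `cursor` parameters come from the public OpenAlex API.

```python
import urllib.parse
import urllib.request


def build_polite_request(query: str, cursor: str = "*",
                         mailto: str = "you@example.com",
                         per_page: int = 50) -> urllib.request.Request:
    """Build (but do not send) a works request that identifies its sender."""
    params = urllib.parse.urlencode({
        "search": query,
        "per-page": per_page,
        "cursor": cursor,
        "mailto": mailto,  # contact address as a query parameter
    })
    url = f"https://api.openalex.org/works?{params}"
    # Same contact address repeated in the User-Agent header.
    # "openalex-example/0.1" is a made-up product token.
    headers = {"User-Agent": f"openalex-example/0.1 (mailto:{mailto})"}
    return urllib.request.Request(url, headers=headers)
```

The returned object can be passed to `urllib.request.urlopen` to perform the call.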

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | (required) | Keywords to search (title, abstract, fulltext), e.g. machine learning. |
| sort | string | relevance | relevance, citations (most cited first), or date (newest first). |
| fromDate | string | | Optional YYYY-MM-DD; only works published on/after this date. |
| filter | string | | Optional raw OpenAlex filter, e.g. type:article,is_oa:true. Merged with fromDate. |
| maxItems | integer | 100 | Max works to return (50 fetched per page via cursor). |
| proxyConfiguration | object | { "useApifyProxy": false } | Optional. Not needed, since OpenAlex is a clean public API. |
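One plausible way the input fields could translate into OpenAlex query parameters is sketched below. The mapping is an assumption, not the actor's confirmed behavior: `build_params` and `SORT_KEYS` are hypothetical names, while `relevance_score`, `cited_by_count`, `publication_date`, and `from_publication_date` are sort and filter keys from the public OpenAlex API.

```python
import datetime
from typing import Dict, Optional

# Assumed mapping from the actor's sort values to OpenAlex sort keys.
SORT_KEYS = {
    "relevance": "relevance_score:desc",
    "citations": "cited_by_count:desc",
    "date": "publication_date:desc",
}


def build_params(query: str, sort: str = "relevance",
                 from_date: Optional[str] = None,
                 raw_filter: Optional[str] = None) -> Dict[str, str]:
    """Turn actor-style input into works-endpoint query parameters."""
    if not query or not query.strip():
        raise ValueError("query is required")
    if sort not in SORT_KEYS:
        raise ValueError(f"sort must be one of {sorted(SORT_KEYS)}")
    filters = []
    if raw_filter:
        filters.append(raw_filter.strip().strip(","))
    if from_date:
        datetime.date.fromisoformat(from_date)  # raises ValueError if malformed
        filters.append(f"from_publication_date:{from_date}")
    params = {"search": query.strip(), "sort": SORT_KEYS[sort]}
    if filters:
        # OpenAlex treats comma-separated filters as a logical AND
        params["filter"] = ",".join(filters)
    return params
```

With the example input from this page, the raw filter and the date would end up in a single comma-separated `filter` value.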

Example input

{
  "query": "crispr",
  "sort": "citations",
  "fromDate": "2020-01-01",
  "maxItems": 120
}

Output

One row per work:

{
  "ok": true,
  "openalexId": "https://openalex.org/W...",
  "doi": "https://doi.org/10....",
  "title": "…",
  "authors": ["Jane Doe", "John Roe"],
  "institutions": ["Some University"],
  "year": 2021,
  "publicationDate": "2021-05-03",
  "type": "article",
  "venue": "Nature",
  "citations": 1234,
  "concepts": ["Biology", "Genetics"],
  "isOpenAccess": true,
  "oaUrl": "https://…pdf",
  "abstract": "Reconstructed abstract text…",
  "url": "https://openalex.org/W..."
}

abstract is rebuilt from OpenAlex's abstract_inverted_index; when no abstract is indexed it is null. Results are deduplicated by openalexId.
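To make the reconstruction concrete: OpenAlex stores an abstract as a mapping from each word to the list of positions where it occurs, so rebuilding the text means sorting words by position. The sketch below shows that step plus one plausible flattening into the row shape above. Both function names are hypothetical, and the choice of source fields (for example `primary_location.source.display_name` for the venue) is an assumption about how the actor maps them.

```python
from typing import Dict, List, Optional


def reconstruct_abstract(inverted_index: Optional[Dict[str, List[int]]]) -> Optional[str]:
    """Rebuild readable text from a word -> positions mapping."""
    if not inverted_index:
        return None  # no abstract indexed for this work
    positioned = [(pos, word)
                  for word, positions in inverted_index.items()
                  for pos in positions]
    positioned.sort()  # order words by their position in the abstract
    return " ".join(word for _, word in positioned)


def to_row(work: Dict) -> Dict:
    """Flatten one OpenAlex work object into a single output row."""
    authorships = work.get("authorships") or []
    authors = [a["author"]["display_name"] for a in authorships
               if (a.get("author") or {}).get("display_name")]
    institutions: List[str] = []
    for a in authorships:
        for inst in a.get("institutions") or []:
            name = inst.get("display_name")
            if name and name not in institutions:
                institutions.append(name)
    source = (work.get("primary_location") or {}).get("source") or {}
    oa = work.get("open_access") or {}
    return {
        "ok": True,
        "openalexId": work.get("id"),
        "doi": work.get("doi"),
        "title": work.get("title") or work.get("display_name"),
        "authors": authors,
        "institutions": institutions,
        "year": work.get("publication_year"),
        "publicationDate": work.get("publication_date"),
        "type": work.get("type"),
        "venue": source.get("display_name"),
        "citations": work.get("cited_by_count"),
        "concepts": [c.get("display_name") for c in work.get("concepts") or []],
        "isOpenAccess": oa.get("is_oa"),
        "oaUrl": oa.get("oa_url"),
        "abstract": reconstruct_abstract(work.get("abstract_inverted_index")),
        "url": work.get("id"),
    }
```

A word that appears several times simply carries several positions, which is why the pairs are expanded before sorting.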

Diagnostics & billing

On failure or no results, the actor pushes a single diagnostic row (ok:false) with an errorCode (BAD_INPUT, NO_RESULTS, RATE_LIMITED, SERVER_ERROR, NETWORK) instead of failing silently. Only successful work rows are charged (one work unit each) — diagnostics and empty results are never billed.
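On the consuming side, the ok flag makes it straightforward to separate real works from diagnostics before further processing. The helpers below are a hypothetical sketch: the function names are invented, and the default rate is the listed "from $2.00 per 1,000" starting price, which may differ from what a given plan is actually charged.

```python
from typing import Dict, List, Tuple


def split_rows(rows: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate successful work rows from diagnostic rows."""
    works = [r for r in rows if r.get("ok") is True]
    diagnostics = [r for r in rows if r.get("ok") is not True]
    return works, diagnostics


def estimate_cost(works: List[Dict], price_per_1000: float = 2.00) -> float:
    """Rough cost estimate: only successful work rows count as billable."""
    return len(works) * price_per_1000 / 1000


def describe_failures(diagnostics: List[Dict]) -> List[str]:
    """Collect the errorCode values reported by diagnostic rows."""
    return [d.get("errorCode", "UNKNOWN") for d in diagnostics]
```

A caller could, for instance, retry later when `describe_failures` reports RATE_LIMITED and treat NO_RESULTS as a normal empty outcome.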

Data source

Data comes from OpenAlex, released under CC0. Please cite OpenAlex when you use it.