OpenAlex Scholarly Works Scraper

Pricing

from $4.00 / 1,000 results

Extract publication, author, institution, source, citation, topic, DOI, and open-access signals from the official OpenAlex API.

Developer

naoki anzai

Maintained by Community

OpenAlex Research Intelligence

Extract research intelligence from the official OpenAlex API without browser scraping. Use search terms, OpenAlex work IDs, author IDs, institution IDs, or concept IDs to produce flattened dataset rows for works, authors, and publication sources.

Use cases

Use this actor for academic and bibliometric research: query 250M+ scholarly works by topic, author, institution, or citation. The actor is auth-free and official-API-first, with a stable output schema and documented source compliance.

Inputs

Field               Default               Notes
searchTerms         []                    Research queries for OpenAlex works.
workIds             []                    OpenAlex work IDs or URLs.
authorIds           []                    OpenAlex author IDs.
institutionIds      []                    OpenAlex institution IDs.
conceptIds          []                    OpenAlex concept IDs.
fromDate / toDate   empty                 Publication date filters in YYYY-MM-DD.
sort                cited_by_count:desc   Citation, publication date, or relevance sort.
limitPerSource      25                    Works fetched per search/filter source.
maxWorks            100                   Global unique work cap.
includeAbstract     false                 Reconstruct abstract text from OpenAlex inverted index when available.
mailto              empty                 Optional polite-pool email parameter for OpenAlex.
delivery            dataset               dataset or webhook.
dryRun              false                 Skip dataset/webhook delivery.

At least one of searchTerms, workIds, authorIds, institutionIds, or conceptIds is required.
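
The defaults and the "at least one source" rule above can be sketched as a small input-preparation step. This is a minimal illustration, not the actor's actual code: the helper names `prepareInput` and `normalizeOpenAlexId` are hypothetical, and the assumption that ID inputs given as URLs are reduced to their last path segment is ours.

```javascript
// Defaults copied from the Inputs table; everything else here is illustrative.
const DEFAULTS = {
  searchTerms: [],
  workIds: [],
  authorIds: [],
  institutionIds: [],
  conceptIds: [],
  sort: 'cited_by_count:desc',
  limitPerSource: 25,
  maxWorks: 100,
  includeAbstract: false,
  delivery: 'dataset',
  dryRun: false,
};

const SOURCE_FIELDS = ['searchTerms', 'workIds', 'authorIds', 'institutionIds', 'conceptIds'];
const ID_FIELDS = ['workIds', 'authorIds', 'institutionIds', 'conceptIds'];

// Hypothetical helper: accepts "W123", "w123", or "https://openalex.org/W123"
// and returns the bare upper-case ID "W123".
function normalizeOpenAlexId(value) {
  const lastSegment = String(value).trim().split('/').filter(Boolean).pop() || '';
  return lastSegment.toUpperCase();
}

// Hypothetical helper: merges defaults, enforces the "at least one source"
// rule, and normalizes ID-style inputs.
function prepareInput(rawInput = {}) {
  const input = { ...DEFAULTS, ...rawInput };
  const hasSource = SOURCE_FIELDS.some(
    (field) => Array.isArray(input[field]) && input[field].length > 0,
  );
  if (!hasSource) {
    throw new Error(`Provide at least one of: ${SOURCE_FIELDS.join(', ')}`);
  }
  for (const field of ID_FIELDS) {
    input[field] = (input[field] || []).map(normalizeOpenAlexId);
  }
  return input;
}
```

Calling `prepareInput({ workIds: ['https://openalex.org/W2741809807'] })` would return an object with `workIds` set to `['W2741809807']` and every other field at its default, while `prepareInput({})` throws.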

Dataset Rows

work_summary

  • title, DOI, OpenAlex ID, publication year/date, type
  • citation count, open-access status, retraction flag
  • primary source, publisher, landing page, PDF URL
  • topics, concepts, countries, institutions

author_signal

  • work ID/title, author name and OpenAlex ID
  • author order/position, corresponding author flag
  • affiliated institutions and countries

source_summary

  • work ID/title, journal or source name
  • source type, ISSN, host organization, open-access source flag
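
To make the three row types above concrete, here is a rough sketch of how one OpenAlex work object might be flattened. The input paths (`primary_location`, `authorships`, `open_access.oa_status`, and so on) follow the public OpenAlex work object; the output field names and the `rowType` discriminator are hypothetical and may differ from the actor's real schema.

```javascript
// Illustrative flattening only; output field names are assumptions.
function toWorkSummary(work) {
  const location = work.primary_location || {};
  const source = location.source || {};
  return {
    rowType: 'work_summary',
    workId: work.id,
    title: work.display_name || work.title || null,
    doi: work.doi || null,
    publicationYear: work.publication_year ?? null,
    publicationDate: work.publication_date || null,
    type: work.type || null,
    citedByCount: work.cited_by_count ?? 0,
    oaStatus: work.open_access ? work.open_access.oa_status : null,
    isRetracted: Boolean(work.is_retracted),
    primarySource: source.display_name || null,
    landingPageUrl: location.landing_page_url || null,
    pdfUrl: location.pdf_url || null,
  };
}

// One row per authorship entry, in the order OpenAlex lists them.
function toAuthorSignals(work) {
  return (work.authorships || []).map((authorship, index) => {
    const institutions = authorship.institutions || [];
    return {
      rowType: 'author_signal',
      workId: work.id,
      workTitle: work.display_name || work.title || null,
      authorName: authorship.author ? authorship.author.display_name : null,
      authorId: authorship.author ? authorship.author.id : null,
      authorOrder: index + 1,
      authorPosition: authorship.author_position || null,
      isCorresponding: Boolean(authorship.is_corresponding),
      institutions: institutions.map((inst) => inst.display_name),
      // De-duplicate country codes across the author's institutions.
      countries: [...new Set(institutions.map((inst) => inst.country_code).filter(Boolean))],
    };
  });
}

// Returns null when the work has no primary source.
function toSourceSummary(work) {
  const source = (work.primary_location || {}).source;
  if (!source) return null;
  return {
    rowType: 'source_summary',
    workId: work.id,
    workTitle: work.display_name || work.title || null,
    sourceName: source.display_name || null,
    sourceType: source.type || null,
    issn: source.issn_l || null,
    hostOrganization: source.host_organization_name || null,
    isOpenAccessSource: Boolean(source.is_oa),
  };
}
```

Keeping each row type in its own function means a single work can yield one `work_summary` row, several `author_signal` rows, and at most one `source_summary` row.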

Example Input

{
  "searchTerms": ["large language models", "retrieval augmented generation"],
  "fromDate": "2024-01-01",
  "sort": "cited_by_count:desc",
  "limitPerSource": 10,
  "maxWorks": 20,
  "includeAbstract": false,
  "delivery": "dataset",
  "dryRun": false
}
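
As a rough illustration of what one search term from this input could become on the wire, the sketch below builds a request URL for the public OpenAlex `/works` endpoint. The query parameters (`search`, `filter`, `sort`, `per-page`, `mailto`) and the `from_publication_date` / `to_publication_date` filters are part of the OpenAlex API; the function name `buildWorksUrl`, its option names, and the mapping from actor inputs to parameters are assumptions. The sort fallback mirrors the behavior described under Limitations.

```javascript
const OPENALEX_WORKS_URL = 'https://api.openalex.org/works';

// Hypothetical helper: builds one /works request URL for a single source.
function buildWorksUrl({
  search,
  filters = {},
  fromDate,
  toDate,
  sort = 'cited_by_count:desc',
  perPage = 25,
  mailto,
} = {}) {
  // OpenAlex combines filters as comma-separated "key:value" pairs.
  const filterParts = Object.entries(filters).map(([key, value]) => `${key}:${value}`);
  if (fromDate) filterParts.push(`from_publication_date:${fromDate}`);
  if (toDate) filterParts.push(`to_publication_date:${toDate}`);

  // Relevance sort only makes sense alongside a search query, so fall back
  // to citation sort for filter-only requests.
  const effectiveSort =
    !search && sort.startsWith('relevance_score') ? 'cited_by_count:desc' : sort;

  const url = new URL(OPENALEX_WORKS_URL);
  if (search) url.searchParams.set('search', search);
  if (filterParts.length > 0) url.searchParams.set('filter', filterParts.join(','));
  url.searchParams.set('sort', effectiveSort);
  url.searchParams.set('per-page', String(perPage));
  if (mailto) url.searchParams.set('mailto', mailto);
  return url.toString();
}

// Usage: first search term of the example input.
const exampleUrl = buildWorksUrl({
  search: 'large language models',
  fromDate: '2024-01-01',
  sort: 'cited_by_count:desc',
  perPage: 10,
});
console.log(exampleUrl);
```

An author-based source would pass something like `filters: { 'authorships.author.id': 'A5023888391' }` with no `search` value; the ID shown is a placeholder.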

Sample output

Each run produces structured dataset rows (see the Dataset Rows section above for the field list). Run the actor once with the example input to see a live sample before scheduling.
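
Because maxWorks is described as a global unique work cap, the same work returned by two overlapping sources presumably appears only once in the output. A minimal sketch of that idea, assuming works are de-duplicated by their OpenAlex ID in arrival order (the function name `mergeUniqueWorks` and the first-seen-wins policy are hypothetical):

```javascript
// Hypothetical helper: merges per-source batches, keeps the first occurrence
// of each work ID, and stops once maxWorks unique works are collected.
function mergeUniqueWorks(batches, maxWorks = 100) {
  const seen = new Map();
  for (const batch of batches) {
    for (const work of batch) {
      if (seen.size >= maxWorks) return [...seen.values()];
      if (!seen.has(work.id)) seen.set(work.id, work);
    }
  }
  return [...seen.values()];
}

// Usage: two sources that overlap on W2.
const merged = mergeUniqueWorks(
  [
    [{ id: 'W1' }, { id: 'W2' }],
    [{ id: 'W2' }, { id: 'W3' }],
  ],
  100,
);
console.log(merged.map((work) => work.id)); // [ 'W1', 'W2', 'W3' ]
```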

Local Development

npm install
npm test
node src/index.js

After a local run, output/result.json contains the full payload. On Apify, dataset delivery writes the flattened rows.

Limitations

  • OpenAlex coverage is broad but not identical to Crossref, PubMed, Semantic Scholar, or publisher APIs.
  • Citation counts and metadata can lag source publications.
  • Relevance sort is used only for search sources; non-search filters fall back to citation sort.
  • includeAbstract can increase payload size substantially.
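
For context on includeAbstract: OpenAlex does not return abstracts as plain text but as an inverted index (`abstract_inverted_index`), a map from each word to the positions where it occurs. The sketch below shows one way to rebuild the text from that structure; the function name `reconstructAbstract` is hypothetical and the actor's own implementation may differ.

```javascript
// Rebuilds abstract text from an OpenAlex-style inverted index,
// e.g. { the: [0, 3], cat: [1], saw: [2], dog: [4] }.
function reconstructAbstract(invertedIndex) {
  if (!invertedIndex) return null;
  const words = [];
  for (const [word, positions] of Object.entries(invertedIndex)) {
    for (const position of positions) {
      words[position] = word; // place each word at every position it occupies
    }
  }
  // Skip any positions that were never filled, then join with spaces.
  return words.filter((word) => typeof word === 'string').join(' ');
}

console.log(reconstructAbstract({ the: [0, 3], cat: [1], saw: [2], dog: [4] }));
// prints "the cat saw the dog"
```

Each reconstructed abstract is carried in full on the output row, which is why enabling this option can grow the payload noticeably.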

Input Examples

These examples use the fields targets, maxResultsPerTarget, snapshotKey, and emitChangedOnly, none of which appear in the Inputs table above; confirm them against the actor's input schema before relying on them.

Example: Single-target audit

{
  "targets": [
    "example-target-1"
  ],
  "maxResultsPerTarget": 30
}

Example: Bulk portfolio

{
  "targets": [
    "target-1",
    "target-2",
    "target-3"
  ],
  "maxResultsPerTarget": 50,
  "snapshotKey": "openalex-research-intelligence-state"
}

Example: Recurring delta watch

{
  "targets": [
    "target-1"
  ],
  "snapshotKey": "openalex-research-intelligence-state",
  "emitChangedOnly": true
}