Crossref Scraper

Pricing

from $1.00 / 1,000 results

Scrape Crossref, the world's largest DOI registry. Search 130M+ scholarly works, fetch by DOI, filter by date / type / journal, and pull authors, references, citation counts, ISSN, ORCIDs, and more.

Rating

5.0

(13)

Developer

Crawler Bros

Maintained by Community

Actor stats

13

Bookmarked

2

Total users

1

Monthly active users

a day ago

Last modified

Scrape Crossref, the world's largest DOI registry. Search 130M+ scholarly works, fetch by DOI, and filter by date, type, or publisher. The actor pulls authors, references, citation counts, ISSNs, ORCIDs, and full bibliographic metadata. It is HTTP-only and uses the public api.crossref.org REST API, so it needs no auth and no proxy. Supplying a polite-pool email raises the rate limit to roughly 50 req/s.

What this actor does

  • Four modes: searchWorks, byDois, byMember (publisher), byJournalIssn
  • Universal IDs: accepts bare DOIs, DOI URLs, doi: prefix
  • Filters: publication date range, work type, member (publisher) ID, min citation count
  • Sorts: relevance, published, citation count, deposit / index / issued date
  • Cursor pagination — no 10k result cap (unlike most search APIs)
  • Optional references — full cited references list per work
  • Empty fields are omitted — no nulls in output

Output per work

  • doi, doiUrl, title, subtitle, type
  • publisher, containerTitle (journal/book), shortContainerTitle
  • issn[], isbn[]
  • publishedDate, publishedYear, publishedPrintDate, publishedOnlineDate
  • createdAt, depositedAt, indexedAt — Crossref-internal lifecycle dates
  • isReferencedByCount — how many works cite this one (Crossref's count)
  • referencesCount — number of references in this work
  • authors[]: [{name, family, given, orcid, sequence, affiliations}, ...]
  • primaryAuthor — first author scalar
  • editors[] (when present)
  • subjects[] — Crossref's subject taxonomy
  • abstract (when provided by publisher; usually JATS XML)
  • url, primaryUrl — landing-page URLs
  • page, volume, issue, language
  • licenses[]: [{url, contentVersion, delayInDays}, ...]
  • references[] — when includeReferences=true
  • score — search relevance (search modes)
  • recordType: "work", scrapedAt

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| mode | string | searchWorks | searchWorks / byDois / byMember / byJournalIssn |
| searchQuery | string | large language models | Free-text search |
| queryAuthor | string | | Constrain to author |
| queryTitle | string | | Constrain to title |
| dois | array | | DOIs (mode=byDois) |
| memberId | string | | Numeric Crossref member ID (mode=byMember) |
| journalIssn | string | | Journal ISSN like 1532-4435 (mode=byJournalIssn) |
| fromPubDate | string | | YYYY-MM-DD |
| untilPubDate | string | | YYYY-MM-DD |
| workType | string | any | journal-article / book / preprint / etc. |
| minCitationCount | int | | Drop works cited fewer times than this |
| sortBy | string | relevance | relevance / published / is-referenced-by-count / etc. |
| sortOrder | string | desc | asc / desc |
| includeReferences | bool | false | Include cited references |
| userAgentEmail | string | apify-actor@noreply.apify.com | Crossref polite-pool email |
| maxItems | int | 50 | Hard cap (1–10000) |
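
To make the relationship between these fields and the underlying API more concrete, here is a rough sketch of how a subset of the input could map onto Crossref /works query parameters. The parameter and filter names (query, query.author, filter, from-pub-date, until-pub-date, type, sort, order) come from the Crossref REST API. The mapping itself and the function name build_crossref_params are assumptions for illustration, not the actor's actual code.

```python
from typing import Any


def build_crossref_params(actor_input: dict[str, Any]) -> dict[str, str]:
    """Translate a subset of actor-style input into Crossref /works parameters."""
    params: dict[str, str] = {}
    if actor_input.get("searchQuery"):
        params["query"] = actor_input["searchQuery"]
    if actor_input.get("queryAuthor"):
        params["query.author"] = actor_input["queryAuthor"]

    # Crossref combines filters into one comma-separated "name:value" list.
    filters = []
    if actor_input.get("fromPubDate"):
        filters.append(f"from-pub-date:{actor_input['fromPubDate']}")
    if actor_input.get("untilPubDate"):
        filters.append(f"until-pub-date:{actor_input['untilPubDate']}")
    work_type = actor_input.get("workType", "any")
    if work_type and work_type != "any":
        filters.append(f"type:{work_type}")
    if filters:
        params["filter"] = ",".join(filters)

    # Defaults mirror the input table above.
    params["sort"] = actor_input.get("sortBy", "relevance")
    params["order"] = actor_input.get("sortOrder", "desc")
    return params


print(build_crossref_params({
    "searchQuery": "large language models",
    "fromPubDate": "2024-01-01",
    "sortBy": "is-referenced-by-count",
}))
```

Fields such as minCitationCount are left out of the sketch because it is not clear from this page whether the actor applies them server-side or after fetching.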

Example: top-cited LLM works since 2024

{
  "mode": "searchWorks",
  "searchQuery": "large language models",
  "fromPubDate": "2024-01-01",
  "minCitationCount": 50,
  "sortBy": "is-referenced-by-count",
  "maxItems": 100
}

Example: lookup by DOI list

{
  "mode": "byDois",
  "dois": [
    "10.1145/3442188.3445922",
    "https://doi.org/10.48550/arxiv.2310.06825",
    "doi:10.1126/science.abc1234"
  ],
  "includeReferences": true
}

Example: all journal articles by Springer (member 297) in 2024

{
  "mode": "byMember",
  "memberId": "297",
  "workType": "journal-article",
  "fromPubDate": "2024-01-01",
  "untilPubDate": "2024-12-31",
  "maxItems": 500
}

Example: only datasets in your area

{
  "mode": "searchWorks",
  "searchQuery": "climate temperature anomaly",
  "workType": "dataset",
  "sortBy": "published",
  "maxItems": 200
}
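
Any of the example inputs above can also be submitted from code. The sketch below uses the apify-client package for Python (ApifyClient, actor(...).call(...), dataset(...).iterate_items()). The ACTOR_ID value is a placeholder, since this page does not state the actor's ID, and the APIFY_TOKEN environment variable name is likewise an assumption.

```python
import os
from typing import Any

# Placeholder actor ID (assumption); copy the real one from the Apify Store page.
ACTOR_ID = "<username>/crossref-scraper"


def fetch_works(client: Any, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Run the actor, wait for it to finish, and return its dataset items."""
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main() -> None:
    # Imported here so the helper above stays usable without the package.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    works = fetch_works(client, {
        "mode": "searchWorks",
        "searchQuery": "climate temperature anomaly",
        "workType": "dataset",
        "sortBy": "published",
        "maxItems": 200,
    })
    for work in works:
        print(work.get("doi"), work.get("title"))
```

Call main() once ACTOR_ID and the token are set. Passing the client in as an argument keeps fetch_works easy to exercise with a stub.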

Use cases

  • Citation enrichment — feed any DOI list and get back citation counts, references, journal info
  • Literature reviews — bulk-export every work matching a query, filter by date / type / publisher
  • Publisher monitoring — track all output of a specific Crossref member ID
  • Reference resolution: includeReferences=true returns the full bibliography of any reviewed paper
  • Bibliometrics — author productivity, ORCID-based identification, citation networks

FAQ

What's Crossref? The not-for-profit registration agency for scholarly DOIs. ~130M registered works, sourced from 20k+ publishers (Springer, Elsevier, Wiley, ACM, Nature, etc.). See crossref.org.

Is there a rate limit? Yes. Crossref runs a polite pool: requests that carry a contact email in the User-Agent header get higher limits (~50 req/s). The actor sets this header automatically.
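
For readers calling api.crossref.org directly, a polite-pool header can be built along these lines. The "name/version (mailto:address)" shape follows Crossref's published etiquette. The exact string the actor sends is not documented here, so the app name and version below are hypothetical.

```python
def polite_headers(
    email: str,
    app_name: str = "my-crossref-client",  # hypothetical client name
    version: str = "0.1",                  # hypothetical version
) -> dict[str, str]:
    """Return request headers that identify the caller to Crossref."""
    if "@" not in email:
        raise ValueError(f"not an email address: {email!r}")
    return {"User-Agent": f"{app_name}/{version} (mailto:{email})"}


print(polite_headers("me@example.org"))
```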

How does it differ from OpenAlex? Crossref is the registry: every DOI's authoritative metadata. OpenAlex is the derivative analytics layer: citation counts (computed from references), concepts (computed via NLP), open-access detection (computed via Unpaywall). Use Crossref for canonical metadata, OpenAlex for citation networks + analytics.

What ID formats are accepted? Bare DOIs (10.1145/3442188.3445922), full URLs (https://doi.org/10.1145/3442188.3445922), doi: prefix (doi:10.1145/...).
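
A minimal sketch of how the three formats can be reduced to a bare DOI is shown below. The function name normalize_doi and the choice to lowercase the result are assumptions (DOIs are case-insensitive), not a description of the actor's internals.

```python
import re

# Matches a leading doi.org URL (http or https, optional "dx.") or a "doi:" prefix.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Strip URL or 'doi:' prefixes and return the bare, lowercased DOI."""
    doi = _DOI_PREFIX.sub("", raw.strip())
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi.lower()


for raw in (
    "10.1145/3442188.3445922",
    "https://doi.org/10.1145/3442188.3445922",
    "doi:10.1145/3442188.3445922",
):
    print(normalize_doi(raw))
```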

What does is-referenced-by-count mean? How many other Crossref-registered works cite this one (i.e. citations from works whose own references are deposited at Crossref). It's a lower bound on real citation count.

Why are abstracts often missing? Most publishers don't deposit abstracts at Crossref. When present, they're usually JATS XML. For richer abstract coverage, use the OpenAlex actor in this repo.
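
When an abstract does arrive as JATS XML, a rough plain-text version can be produced by stripping the markup. This is a simple regex-based sketch, not a full JATS parser: it replaces tags with spaces, so inline elements next to punctuation may leave a stray space, and it drops structure such as section titles.

```python
import html
import re


def jats_to_text(abstract: str) -> str:
    """Strip JATS/XML tags from an abstract and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", abstract)  # drop <jats:p>, <jats:italic>, ...
    unescaped = html.unescape(no_tags)           # &amp; -> &, &lt; -> <, ...
    return re.sub(r"\s+", " ", unescaped).strip()


print(jats_to_text(
    "<jats:p>Large models &amp; <jats:italic>small</jats:italic> data.</jats:p>"
))
```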

What's a "member"? A Crossref publisher account. Each publisher has a numeric member ID; you can fetch all their works via mode=byMember. Find IDs at api.crossref.org/members.

Are references resolved to other works? Crossref returns references as nested objects with DOI / unstructured strings. To resolve them, run a byDois mode pass over the DOIs in the references array.
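
The two-pass approach described above could look roughly like this. The key that holds a reference's DOI is an assumption: Crossref's own API uses "DOI", and the sketch also checks a lowercase "doi" in case the actor renames it. The helper names are hypothetical.

```python
from typing import Any


def reference_dois(work: dict[str, Any]) -> list[str]:
    """Collect the distinct DOIs found in one work's references array."""
    seen: set[str] = set()
    dois: list[str] = []
    for ref in work.get("references", []):
        doi = ref.get("DOI") or ref.get("doi")  # key name is an assumption
        if not doi:
            continue  # unstructured-only reference: nothing to look up
        key = doi.lower()  # DOIs are case-insensitive
        if key not in seen:
            seen.add(key)
            dois.append(doi)
    return dois


def build_bydois_input(works: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a follow-up byDois input covering every referenced DOI."""
    dois: list[str] = []
    for work in works:
        for doi in reference_dois(work):
            if doi not in dois:
                dois.append(doi)
    return {"mode": "byDois", "dois": dois}
```

Since maxItems defaults to 50, it may be worth setting it explicitly in the follow-up input when the DOI list is longer than that.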

What's the cursor pagination limit? Effectively unlimited (Crossref doesn't impose the 10k cap that most search APIs use). The actor walks cursors until it hits maxItems or runs out of results.
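
For illustration, cursor walking against the Crossref /works endpoint can be sketched as follows. The cursor=* starting value and the next-cursor response field are part of the Crossref REST API. The function names, the injectable fetch callable, and the stopping rules are assumptions about one reasonable way to do it, not the actor's actual implementation.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Iterator

API_URL = "https://api.crossref.org/works"


def fetch_json(url: str) -> dict[str, Any]:
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_works(
    params: dict[str, str],
    max_items: int,
    rows: int = 100,
    fetch: Callable[[str], dict[str, Any]] = fetch_json,
) -> Iterator[dict[str, Any]]:
    """Yield works page by page until max_items is reached or results run out."""
    cursor = "*"  # Crossref's starting cursor value
    yielded = 0
    while yielded < max_items:
        query = urllib.parse.urlencode({**params, "rows": rows, "cursor": cursor})
        message = fetch(f"{API_URL}?{query}")["message"]
        items = message.get("items", [])
        if not items:
            return  # an empty page means the result set is exhausted
        for item in items:
            yield item
            yielded += 1
            if yielded >= max_items:
                return
        cursor = message.get("next-cursor")
        if not cursor:
            return
```

A call such as iter_works({"query": "large language models"}, max_items=250) would issue three requests of 100 rows each and stop after the 250th item.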