Crossref Scraper
Pricing
from $1.00 / 1,000 results
Scrape Crossref, the world's largest DOI registry. Search 130M+ scholarly works, fetch by DOI, filter by date / type / journal, and pull authors, references, citation counts, ISSN, ORCIDs, and more.
Rating: 5.0 (13)
Developer: Crawler Bros
Actor stats: 13 bookmarked, 2 total users, 1 monthly active user, last modified a day ago
Scrape Crossref, the world's largest DOI registry. Search 130M+ scholarly works, fetch works by DOI, and filter by date, type, or publisher. The actor pulls authors, references, citation counts, ISSNs, ORCIDs, and full bibliographic metadata. It is HTTP-only, using the public api.crossref.org REST API, so no authentication or proxy is needed. Supplying a contact email puts requests in Crossref's polite pool, which allows roughly 50 req/s.
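For readers who want to see what a polite-pool call looks like, here is a minimal Python sketch that builds (but does not send) a search request against `https://api.crossref.org/works`. It assumes the contact email travels both as a `mailto` query parameter and inside the `User-Agent` header, both of which Crossref accepts; the exact headers this actor sends are not documented, and the `crossref-sketch/0.1` agent string is hypothetical.

```python
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.crossref.org"


def build_search_request(query: str, email: str, rows: int = 20) -> urllib.request.Request:
    """Build, but do not send, a polite-pool search request against /works."""
    params = urlencode({"query": query, "rows": rows, "mailto": email})
    req = urllib.request.Request(f"{API_BASE}/works?{params}")
    # The contact email marks the request for Crossref's polite pool.
    # Hypothetical agent name; only the mailto: part matters here.
    req.add_header("User-Agent", f"crossref-sketch/0.1 (mailto:{email})")
    return req
```

Passing the request to `urllib.request.urlopen` would return JSON whose works sit under `message.items`.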
What this actor does
- Four modes: `searchWorks`, `byDois`, `byMember` (publisher), `byJournalIssn`
- Universal IDs: accepts bare DOIs, DOI URLs, and the `doi:` prefix
- Filters: publication date range, work type, member (publisher) ID, minimum citation count
- Sorts: relevance, published, citation count, deposit / index / issued date
- Cursor pagination: no 10k result cap (unlike most search APIs)
- Optional references: the full cited-references list per work
- Empty fields are omitted: no nulls in output
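Each mode plausibly corresponds to one public Crossref route. The mapping below is an assumption (the README does not list the routes the actor calls), but `/works`, `/works/{doi}`, `/members/{id}/works`, and `/journals/{issn}/works` are all documented Crossref REST endpoints. `endpoint_for` is a hypothetical helper name.

```python
from urllib.parse import quote

API_BASE = "https://api.crossref.org"


def endpoint_for(mode: str, *, doi: str = "", member_id: str = "", issn: str = "") -> str:
    """Map an actor mode to the Crossref REST route it most likely uses (assumed)."""
    if mode == "searchWorks":
        return f"{API_BASE}/works"
    if mode == "byDois":
        # One request per DOI; keep '/' literal but escape anything else unsafe.
        return f"{API_BASE}/works/{quote(doi, safe='/')}"
    if mode == "byMember":
        return f"{API_BASE}/members/{quote(member_id)}/works"
    if mode == "byJournalIssn":
        return f"{API_BASE}/journals/{quote(issn)}/works"
    raise ValueError(f"unknown mode: {mode!r}")
```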
Output per work
- `doi`, `doiUrl`, `title`, `subtitle`, `type`
- `publisher`, `containerTitle` (journal/book), `shortContainerTitle`
- `issn[]`, `isbn[]`
- `publishedDate`, `publishedYear`, `publishedPrintDate`, `publishedOnlineDate`
- `createdAt`, `depositedAt`, `indexedAt`: Crossref-internal lifecycle dates
- `isReferencedByCount`: how many works cite this one (Crossref's count)
- `referencesCount`: number of references in this work
- `authors[]`: `[{name, family, given, orcid, sequence, affiliations}, ...]`
- `primaryAuthor`: first author, as a scalar
- `editors[]` (when present)
- `subjects[]`: Crossref's subject taxonomy
- `abstract` (when provided by the publisher; usually JATS XML)
- `url`, `primaryUrl`: landing-page URLs
- `page`, `volume`, `issue`, `language`
- `licenses[]`: `[{url, contentVersion, delayInDays}, ...]`
- `references[]`: when `includeReferences=true`
- `score`: search relevance (search modes)
- `recordType: "work"`, `scrapedAt`
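To show where these fields come from, the sketch below flattens one raw Crossref work (the `message` object returned by `/works/{doi}`) into a subset of the output fields. The input keys (`DOI`, `container-title`, `is-referenced-by-count`, `reference-count`, `published.date-parts`, `author[].ORCID`) are Crossref's own; the helper name `flatten_work` and the exact mapping are assumptions, not the actor's code.

```python
def flatten_work(msg: dict) -> dict:
    """Flatten one raw Crossref work into a subset of this actor's output fields."""

    def first(key: str):
        # Crossref stores titles and container titles as lists of strings.
        values = msg.get(key) or []
        return values[0] if values else None

    authors = []
    for a in msg.get("author", []):
        given, family = a.get("given"), a.get("family")
        author = {
            # Organisational authors carry a single "name" instead of given/family.
            "name": a.get("name") or " ".join(p for p in (given, family) if p),
            "family": family,
            "given": given,
            "orcid": a.get("ORCID"),
            "sequence": a.get("sequence"),
            "affiliations": [af["name"] for af in a.get("affiliation", []) if af.get("name")],
        }
        authors.append({k: v for k, v in author.items() if v not in (None, "", [])})

    # date-parts looks like [[2021, 3, 1]] but may be partial ([[2021]]) or [[None]].
    parts = (msg.get("published") or {}).get("date-parts") or [[]]
    date = [p for p in parts[0] if p is not None]
    doi = msg.get("DOI")

    out = {
        "doi": doi,
        "doiUrl": f"https://doi.org/{doi}" if doi else None,
        "title": first("title"),
        "type": msg.get("type"),
        "publisher": msg.get("publisher"),
        "containerTitle": first("container-title"),
        "issn": msg.get("ISSN"),
        "publishedDate": "-".join(f"{p:02d}" for p in date) if date else None,
        "publishedYear": date[0] if date else None,
        "isReferencedByCount": msg.get("is-referenced-by-count"),
        "referencesCount": msg.get("reference-count"),
        "authors": authors,
        "primaryAuthor": authors[0].get("name") if authors else None,
        "recordType": "work",
    }
    # Mirror the README's rule that empty fields are omitted (0 counts are kept).
    return {k: v for k, v in out.items() if v not in (None, "", [])}
```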
Input
| Field | Type | Default | Description |
|---|---|---|---|
| mode | string | searchWorks | searchWorks / byDois / byMember / byJournalIssn |
| searchQuery | string | large language models | Free-text search |
| queryAuthor | string | – | Constrain to author |
| queryTitle | string | – | Constrain to title |
| dois | array | – | DOIs (mode=byDois) |
| memberId | string | – | Numeric Crossref member ID (mode=byMember) |
| journalIssn | string | – | Journal ISSN, e.g. 1532-4435 (mode=byJournalIssn) |
| fromPubDate | string | – | YYYY-MM-DD |
| untilPubDate | string | – | YYYY-MM-DD |
| workType | string | any | journal-article / book / preprint / etc. |
| minCitationCount | int | – | Drop works cited fewer times than this |
| sortBy | string | relevance | relevance / published / is-referenced-by-count / etc. |
| sortOrder | string | desc | asc / desc |
| includeReferences | bool | false | Include cited references |
| userAgentEmail | string | apify-actor@noreply.apify.com | Crossref polite-pool email |
| maxItems | int | 50 | Hard cap (1–10000) |
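A rough sketch of how these inputs could translate into Crossref query parameters, assuming a direct mapping. `from-pub-date`, `until-pub-date`, and `type` are real Crossref filters, and `query`, `query.author`, `sort`, and `order` are real parameters. Mapping `queryTitle` to `query.bibliographic` and applying `minCitationCount` client-side are guesses about the actor's behavior; `build_params` and `meets_min_citations` are hypothetical names.

```python
from typing import Optional


def build_params(inp: dict, rows: int = 100) -> dict:
    """Translate actor-style input into Crossref /works query parameters (assumed mapping)."""
    params = {"rows": rows}
    if inp.get("searchQuery"):
        params["query"] = inp["searchQuery"]
    if inp.get("queryAuthor"):
        params["query.author"] = inp["queryAuthor"]
    if inp.get("queryTitle"):
        # Assumption: queryTitle maps onto Crossref's query.bibliographic field query.
        params["query.bibliographic"] = inp["queryTitle"]

    # Crossref filters are comma-separated name:value pairs in a single parameter.
    filters = []
    if inp.get("fromPubDate"):
        filters.append(f"from-pub-date:{inp['fromPubDate']}")
    if inp.get("untilPubDate"):
        filters.append(f"until-pub-date:{inp['untilPubDate']}")
    if inp.get("workType") and inp["workType"] != "any":
        filters.append(f"type:{inp['workType']}")
    if filters:
        params["filter"] = ",".join(filters)

    params["sort"] = inp.get("sortBy", "relevance")
    params["order"] = inp.get("sortOrder", "desc")
    return params


def meets_min_citations(work: dict, min_count: Optional[int]) -> bool:
    """Assumed client-side stand-in for minCitationCount, applied to raw Crossref works."""
    return min_count is None or work.get("is-referenced-by-count", 0) >= min_count
```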
Example: top-cited LLM works since 2024
{"mode": "searchWorks","searchQuery": "large language models","fromPubDate": "2024-01-01","minCitationCount": 50,"sortBy": "is-referenced-by-count","maxItems": 100}
Example: lookup by DOI list
{"mode": "byDois","dois": ["10.1145/3442188.3445922","https://doi.org/10.48550/arxiv.2310.06825","doi:10.1126/science.abc1234"],"includeReferences": true}
Example: all journal articles by Springer (member 297) in 2024
{"mode": "byMember","memberId": "297","workType": "journal-article","fromPubDate": "2024-01-01","untilPubDate": "2024-12-31","maxItems": 500}
Example: newest datasets on a topic
{"mode": "searchWorks","searchQuery": "climate temperature anomaly","workType": "dataset","sortBy": "published","maxItems": 200}
Use cases
- Citation enrichment: feed in any DOI list and get back citation counts, references, and journal info
- Literature reviews: bulk-export every work matching a query, filtered by date, type, or publisher
- Publisher monitoring: track all output of a specific Crossref member ID
- Reference lists: `includeReferences: true` returns the full bibliography of any paper you are reviewing
- Bibliometrics: author productivity, ORCID-based identification, citation networks
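As a small bibliometrics example, the following counts works per ORCID across dataset items shaped like the output above. It assumes each item carries `authors[]` with an optional `orcid` string in whatever form the actor emits; `works_per_orcid` is a hypothetical name.

```python
from collections import Counter


def works_per_orcid(items):
    """Count output records per author ORCID; authors without an ORCID are skipped."""
    counts = Counter()
    for item in items:
        # A set avoids double-counting an ORCID listed twice on one work.
        orcids = {a["orcid"] for a in item.get("authors", []) if a.get("orcid")}
        counts.update(orcids)
    return counts
```

Calling `works_per_orcid(items).most_common(10)` then lists the ten most prolific identified authors in the export.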
FAQ
What's Crossref? The not-for-profit registration agency for scholarly DOIs. ~130M registered works, sourced from 20k+ publishers (Springer, Elsevier, Wiley, ACM, Nature, etc.). See crossref.org.
Is there a rate limit? Yes. Crossref runs a "polite pool": requests that include a contact email in the User-Agent header get higher limits (~50 req/s). The actor sets this header automatically from userAgentEmail.
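The actor throttles itself, but if you call the API directly, a minimum-interval limiter is the simplest way to stay under a target rate. This is a generic sketch rather than Crossref-specific code: the 50 req/s figure comes from the answer above, and the injectable clock and sleep exist only to make the class testable.

```python
import time


class MinIntervalThrottle:
    """Space successive calls at least 1/rate seconds apart (simple client-side limiter)."""

    def __init__(self, rate_per_sec: float, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate_per_sec
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = float("-inf")  # the first call never waits

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Calling `throttle.wait()` before each HTTP request keeps a single-threaded client at or below the chosen rate.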
How does it differ from OpenAlex? Crossref is the registry: every DOI's authoritative metadata. OpenAlex is the derivative analytics layer: citation counts (computed from references), concepts (computed via NLP), open-access detection (computed via Unpaywall). Use Crossref for canonical metadata, OpenAlex for citation networks + analytics.
What ID formats are accepted? Bare DOIs (10.1145/3442188.3445922), full URLs (https://doi.org/10.1145/3442188.3445922), doi: prefix (doi:10.1145/...).
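One way to normalize all three formats into a bare DOI is sketched below. How the actor actually parses IDs is not documented; accepting `dx.doi.org` URLs and lowercasing the result (DOI names are case-insensitive) are additions of this sketch, and `normalize_doi` is a hypothetical name.

```python
import re

# Accept https://doi.org/, http://dx.doi.org/ and doi: prefixes, case-insensitively.
_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)\s*", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Reduce a bare DOI, DOI URL, or doi:-prefixed string to a bare lowercase DOI."""
    doi = _PREFIX.sub("", raw.strip())
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {raw!r}")
    # DOI names are case-insensitive, so lowercase gives a stable key for dedup.
    return doi.lower()
```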
What does is-referenced-by-count mean? The number of other Crossref-registered works that cite this one, i.e. citations from works whose own reference lists are deposited at Crossref. It is therefore a lower bound on the true citation count.
Why are abstracts often missing? Most publishers don't deposit abstracts at Crossref. When present, they are usually JATS XML. For richer abstract coverage, query OpenAlex (see the OpenAlex actor in this repo).
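If you do get an abstract, a crude way to turn JATS markup into plain text is to strip the tags and unescape entities. This regex sketch is a shortcut rather than a JATS parser: it flattens section titles and paragraphs into one line and ignores structures such as MathML. `jats_to_text` is a hypothetical name.

```python
import html
import re

_TAG = re.compile(r"<[^>]+>")
_SPACE_BEFORE_PUNCT = re.compile(r"\s+([.,;:!?])")
_WS = re.compile(r"\s+")


def jats_to_text(abstract: str) -> str:
    """Crudely convert a JATS-XML abstract to one line of plain text."""
    text = _TAG.sub(" ", abstract)                # drop tags; a space keeps words apart
    text = html.unescape(text)                    # &amp; -> &, &lt; -> <, ...
    text = _SPACE_BEFORE_PUNCT.sub(r"\1", text)   # undo the gap left before punctuation
    return _WS.sub(" ", text).strip()
```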
What's a "member"? A Crossref publisher account. Each publisher has a numeric member ID; you can fetch all their works via mode=byMember. Find IDs at api.crossref.org/members.
Are references resolved to other works? No. Crossref returns each reference as a nested object carrying a DOI and/or an unstructured citation string. To resolve them, run a second pass in byDois mode over the DOIs in the references array.
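A sketch of that second pass: collect the DOIs from one output item's `references[]` and build a `byDois` input for a new run. Whether the actor emits the reference key as `DOI` (Crossref's spelling) or `doi` isn't stated, so the helper checks both; `references_to_bydois_input` is a hypothetical name.

```python
def references_to_bydois_input(work: dict) -> dict:
    """Collect DOIs from a work's references[] and build a byDois input for a second run."""
    seen = set()
    dois = []
    for ref in work.get("references", []):
        # Unstructured-only references have no DOI and are skipped.
        doi = ref.get("DOI") or ref.get("doi")
        if doi and doi.lower() not in seen:  # DOIs are case-insensitive
            seen.add(doi.lower())
            dois.append(doi)
    return {"mode": "byDois", "dois": dois}
```

Serializing the result with `json.dumps` gives an input you can paste into a new run.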
What's the cursor pagination limit? Effectively unlimited. Crossref caps offset-based paging at 10,000 results, but cursor-based deep paging has no such cap. The actor walks cursors until it hits maxItems or runs out of results.
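In Crossref's API the cursor walk goes like this: the first request sends `cursor=*`, every response carries `message.next-cursor`, and a page with no items marks the end. The sketch below mirrors that loop with an injected `fetch_page` function (hypothetical) so it runs without network access; in real use `fetch_page` would issue a `/works` request with `cursor` and `rows` set.

```python
from typing import Callable, Iterator


def iter_cursor(fetch_page: Callable[[str], dict], max_items: int) -> Iterator[dict]:
    """Walk Crossref deep-paging cursors until max_items or an empty page.

    fetch_page(cursor) must return the decoded JSON body of one /works request
    made with cursor=<cursor>; the first call uses cursor="*".
    """
    cursor = "*"
    yielded = 0
    while yielded < max_items:
        message = fetch_page(cursor)["message"]
        items = message.get("items", [])
        if not items:  # an empty page means the result set is exhausted
            return
        for item in items:
            if yielded >= max_items:
                return
            yield item
            yielded += 1
        cursor = message.get("next-cursor")
        if not cursor:
            return
```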