Crossref Scholarly Works Scraper

Pricing

$1.00 / 1,000 works returned

Searches the Crossref API (150M+ scholarly works) and returns clean records: DOI, title, authors, journal, publisher, date, citation count, subjects, ISSN, abstract. Filter by work type or publication date, and sort by relevance, citation count, or newest first, which is useful for literature reviews.

Developer

Dami's Studio

Maintained by Community

Search the Crossref catalog of 150M+ scholarly works (journal articles, preprints, books, datasets, and more) through its public REST API. No API key or login is required, and there is no anti-bot protection to work around.

The actor is a polite Crossref client: it identifies itself with a contact User-Agent and a mailto query parameter so that Crossref routes it to the faster "polite pool". It pages with deep cursors (cursor=* on the first request, then the next-cursor value returned by each response), which is the only reliable way to page past 1,000 rows.
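The paging loop described above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: the function names, the example User-Agent string, and the contact address are assumptions, while the `cursor`, `rows`, `mailto`, and `next-cursor` names come from the Crossref REST API.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

API_URL = "https://api.crossref.org/works"
CONTACT = "you@example.org"  # hypothetical contact address


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body, sending a contact User-Agent."""
    request = urllib.request.Request(
        url, headers={"User-Agent": f"example-client/0.1 (mailto:{CONTACT})"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_works(query: str, max_items: int,
               fetch: Callable[[str], dict] = http_get_json) -> Iterator[dict]:
    """Yield up to max_items works, following next-cursor between pages."""
    cursor = "*"  # starting value for deep paging
    rows = min(1000, max_items)  # Crossref returns at most 1,000 rows per page
    yielded = 0
    while yielded < max_items:
        params = {"query": query, "rows": rows,
                  "cursor": cursor, "mailto": CONTACT}
        page = fetch(f"{API_URL}?{urllib.parse.urlencode(params)}")["message"]
        items = page.get("items", [])
        if not items:
            return  # an empty page marks the end of the result set
        for item in items:
            yield item
            yielded += 1
            if yielded >= max_items:
                return
        cursor = page.get("next-cursor")
        if not cursor:
            return
```

Passing `fetch` as a parameter keeps the loop testable without network access; a caller would typically just write `for work in iter_works("deep learning", 250): ...`.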

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| query | string (required) | deep learning | Keywords searched across titles, authors, abstracts and metadata. |
| filterType | string | all | Restrict to a Crossref work type, e.g. journal-article. |
| fromDate | string YYYY-MM-DD | none | Only works published on/after this date. |
| sort | enum | relevance | relevance, is-referenced-by-count (most cited), or published (newest). |
| maxItems | integer | 100 | Max works to return (cursor pagination handles >100). |
| proxyConfiguration | object | none | Optional and off by default; Crossref is a public, no-key API with no anti-bot, so a proxy adds no benefit. Only enable it if you hit IP-level rate limits. |
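As an illustration of how these input fields might map onto Crossref query parameters, here is a hedged sketch. The `build_params` helper and its error messages are hypothetical, and the mapping itself is an assumption about the actor's internals; the `type` and `from-pub-date` filters and the `sort`/`order` parameters are part of the Crossref REST API.

```python
import re
from datetime import datetime

ALLOWED_SORTS = {"relevance", "is-referenced-by-count", "published"}


def build_params(actor_input: dict) -> dict:
    """Validate actor input and translate it into Crossref query parameters."""
    query = (actor_input.get("query") or "").strip()
    if not query:
        raise ValueError("BAD_INPUT: query is empty")

    filters = []
    work_type = actor_input.get("filterType") or "all"
    if work_type != "all":
        filters.append(f"type:{work_type}")

    from_date = actor_input.get("fromDate")
    if from_date:
        # Require the exact YYYY-MM-DD shape, then check it is a real date.
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", from_date):
            raise ValueError("BAD_INPUT: fromDate must be YYYY-MM-DD")
        try:
            datetime.strptime(from_date, "%Y-%m-%d")
        except ValueError:
            raise ValueError("BAD_INPUT: fromDate is not a valid date") from None
        filters.append(f"from-pub-date:{from_date}")

    sort = actor_input.get("sort") or "relevance"
    if sort not in ALLOWED_SORTS:
        raise ValueError("BAD_INPUT: unknown sort value")

    params = {"query": query, "sort": sort, "order": "desc"}
    if filters:
        params["filter"] = ",".join(filters)
    return params
```

For example, an input of `{"query": "deep learning", "filterType": "journal-article", "fromDate": "2020-01-01", "sort": "published"}` would yield a `filter` value of `type:journal-article,from-pub-date:2020-01-01`.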

Output

Each successful row:

{
  "ok": true,
  "doi": "10.1038/nature14539",
  "title": "Deep learning",
  "authors": ["Yann LeCun", "Yoshua Bengio", "Geoffrey Hinton"],
  "journal": "Nature",
  "publisher": "Springer Science and Business Media LLC",
  "type": "journal-article",
  "publishedDate": "2015-05-28",
  "citations": 70000,
  "subjects": ["Multidisciplinary"],
  "issn": ["0028-0836", "1476-4687"],
  "abstract": null,
  "url": "https://doi.org/10.1038/nature14539"
}
  • authors are formatted "Given Family" (organizational authors fall back to their name).
  • publishedDate is assembled from Crossref's date-parts (may be year-only or year-month for older records).
  • citations is Crossref's is-referenced-by-count.
  • abstract is the JATS-XML abstract stripped to plain text, or null when Crossref has none.
  • Nullable fields: title, journal, publisher, type, publishedDate, abstract, and url may be null, and authors, subjects, and issn may be empty arrays, depending on what the publisher deposited with Crossref. doi is always present (rows without a DOI are dropped). citations defaults to 0 when absent.
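The field rules above could be implemented along these lines. This is a sketch under assumptions, not the actor's source: the helper names are hypothetical, only the `published` date field is consulted, and the tag stripping uses a simple regular expression rather than a full XML parser. The Crossref field names (`DOI`, `title`, `author`, `container-title`, `is-referenced-by-count`, `subject`, `ISSN`, `URL`) are those of the Crossref works schema.

```python
import html
import re
from typing import Optional


def first(values: Optional[list]) -> Optional[str]:
    """Crossref stores title and container-title as lists; take the first."""
    return values[0] if values else None


def format_author(author: dict) -> Optional[str]:
    """Return "Given Family", falling back to an organizational name."""
    full = " ".join(p for p in (author.get("given"), author.get("family")) if p)
    return full or author.get("name")


def format_date(work: dict) -> Optional[str]:
    """Join date-parts into YYYY, YYYY-MM or YYYY-MM-DD, whichever is known."""
    date_parts = (work.get("published") or {}).get("date-parts") or [[]]
    nums = [n for n in date_parts[0][:3] if isinstance(n, int)]
    if not nums:
        return None
    return "-".join([f"{nums[0]:04d}"] + [f"{n:02d}" for n in nums[1:]])


def strip_jats(abstract: Optional[str]) -> Optional[str]:
    """Drop JATS tags, decode entities and collapse whitespace."""
    if not abstract:
        return None
    text = html.unescape(re.sub(r"<[^>]+>", " ", abstract))
    return re.sub(r"\s+", " ", text).strip() or None


def to_row(work: dict) -> Optional[dict]:
    """Map one Crossref work to an output row; None when it has no DOI."""
    doi = work.get("DOI")
    if not doi:
        return None  # rows without a DOI are dropped
    authors = [format_author(a) for a in work.get("author") or []]
    return {
        "ok": True,
        "doi": doi,
        "title": first(work.get("title")),
        "authors": [a for a in authors if a],
        "journal": first(work.get("container-title")),
        "publisher": work.get("publisher"),
        "type": work.get("type"),
        "publishedDate": format_date(work),
        "citations": work.get("is-referenced-by-count") or 0,
        "subjects": work.get("subject") or [],
        "issn": work.get("ISSN") or [],
        "abstract": strip_jats(work.get("abstract")),
        "url": work.get("URL"),
    }
```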

Results are deduplicated by DOI. Charging is per successful work (work event). Diagnostic / empty / blocked rows (ok: false with an errorCode) are never charged — this includes BAD_INPUT (empty query or malformed fromDate), NO_RESULTS, and any network/block error.
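To make the deduplication and charging rule concrete, here is a small sketch. The function names are hypothetical, and comparing DOIs case-insensitively is an assumption rather than documented actor behavior; the price constant is the listed $1.00 per 1,000 works.

```python
PRICE_PER_1000_WORKS = 1.00  # USD, from the listed pricing


def dedupe_and_count(rows: list) -> tuple:
    """Drop repeated DOIs and count billable (ok: true) rows."""
    seen = set()
    kept = []
    billable = 0
    for row in rows:
        if row.get("ok"):
            key = row["doi"].lower()  # assumption: DOIs compared case-insensitively
            if key in seen:
                continue
            seen.add(key)
            billable += 1
        kept.append(row)  # diagnostic rows are kept but never counted
    return kept, billable


def estimated_cost(billable: int) -> float:
    """Estimated charge in USD for a number of successful works."""
    return billable * PRICE_PER_1000_WORKS / 1000
```

Under these assumptions, a run that returns 250 unique works and one NO_RESULTS row would count 250 billable events, or about $0.25.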

Troubleshooting

  • BAD_INPUT row, no results: you left query empty or fromDate isn't YYYY-MM-DD. Fix the input and re-run — you were not charged.
  • NO_RESULTS row: your query/filter combination matched nothing in Crossref. Try broader keywords or drop the type/date filters.
  • RATE_LIMITED / BLOCKED row: rare for Crossref. The actor already retries with backoff; if it persists, enable a proxy to use a different IP.
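The retry-with-backoff behavior mentioned for RATE_LIMITED rows might look something like the sketch below. The exception class, attempt count, and delay schedule are all assumptions made for illustration; the page does not state the actor's actual retry policy.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised when the API answers with a rate-limit status."""


def with_backoff(call: Callable[[], T], attempts: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on RateLimited with exponentially growing waits."""
    for attempt in range(attempts):
        try:
            return call()
        except RateLimited:
            if attempt == attempts - 1:
                raise  # out of retries; the caller can emit a RATE_LIMITED row
            sleep(base_delay * 2 ** attempt)  # waits of 1s, 2s, 4s, ...
    raise ValueError("attempts must be at least 1")
```

Only after retries like these are exhausted would switching to a proxy, and therefore a different IP, be worth trying.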

Notes

  • Powered entirely by the public Crossref REST API (https://api.crossref.org/works). Please be considerate of the shared, free service.
  • Citation counts and abstracts depend on what publishers deposit with Crossref; coverage varies by record.