CrossRef Academic Metadata Scraper

Pricing

from $2.00 / 1,000 results

Search CrossRef for academic paper metadata. Get DOIs, authors, journals, citations, and publication dates. Essential for research and bibliography building.

Developer

Fortuitous Pirate

Maintained by Community


CrossRef Scholarly Works Scraper

Scrapes scholarly publication metadata from the CrossRef API, which provides access to millions of journal articles, books, conference papers, dissertations, datasets, and more.

Features

  • Massive Database: Access 145M+ scholarly works with DOIs
  • Rich Metadata: Authors, citations, abstracts, funders, licenses, and more
  • Two Modes: Single DOI lookup or bulk search
  • Fast with Polite Pool: Up to 50 requests/second when you supply an email address via mailto
  • No API Key Required: Free access, email recommended for better rate limits

API Source

CrossRef REST API: https://api.crossref.org

Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | No | - | Search query for scholarly works |
| doi | string | No | - | Specific DOI to look up (e.g., 10.1038/nature12373) |
| author | string | No | - | Filter by author name |
| publishedSince | string | No | - | Filter works published after this date (YYYY-MM-DD format) |
| type | enum | No | All | Work type filter (see options below) |
| mailto | string | No | - | Your email for CrossRef's polite pool (faster rate limits) |
| limit | integer | No | 100 | Maximum number of works to return (max: 10000) |

Work Type Options

| Value | Description |
|---|---|
| (empty) | All Types |
| journal-article | Journal Article |
| book | Book |
| book-chapter | Book Chapter |
| proceedings-article | Conference Paper |
| dissertation | Dissertation |
| dataset | Dataset |
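The constraints in the two tables above can be checked before starting a run. The following is a minimal sketch, not the actor's own validation: the function name `validate_input`, the rejection of unknown keys, and the lower bound of 1 for `limit` are assumptions made for illustration.

```python
from datetime import datetime

# Parameters and work types as documented in the tables above.
KNOWN_KEYS = {"query", "doi", "author", "publishedSince", "type", "mailto", "limit"}
ALLOWED_TYPES = {
    "", "journal-article", "book", "book-chapter",
    "proceedings-article", "dissertation", "dataset",
}
MAX_LIMIT = 10000  # documented maximum for `limit`


def validate_input(run_input: dict) -> list:
    """Return a list of problems found in an input dict (empty if none)."""
    problems = []

    # Assumption: keys outside the documented set are treated as mistakes.
    for key in sorted(set(run_input) - KNOWN_KEYS):
        problems.append(f"unknown parameter: {key}")

    work_type = run_input.get("type", "")
    if not isinstance(work_type, str) or work_type not in ALLOWED_TYPES:
        problems.append(f"unsupported type: {work_type!r}")

    since = run_input.get("publishedSince")
    if since is not None:
        try:
            datetime.strptime(since, "%Y-%m-%d")
        except (TypeError, ValueError):
            problems.append("publishedSince must use YYYY-MM-DD format")

    # Assumption: a limit below 1 is not meaningful.
    limit = run_input.get("limit", 100)
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= MAX_LIMIT:
        problems.append(f"limit must be an integer between 1 and {MAX_LIMIT}")

    return problems
```

An empty list means the input matches the documented parameter table; anything else can be shown to the user before a run is started.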

Usage Modes

1. DOI Lookup Mode

Fetch metadata for a single work by its DOI:

    {
        "doi": "10.1038/nature12373",
        "mailto": "your@email.com"
    }
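A DOI lookup most likely corresponds to CrossRef's single-work endpoint, `GET https://api.crossref.org/works/{doi}`, with `mailto` passed as a query parameter; CrossRef returns the record inside a `message` field. The sketch below only builds that URL. The function name `doi_lookup_url` and the stripping of resolver prefixes are illustrative assumptions, not the actor's confirmed behavior.

```python
from urllib.parse import quote, urlencode

CROSSREF_API = "https://api.crossref.org"

# Common ways a DOI is written; stripped so only the bare DOI remains.
DOI_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/", "doi:",
)


def doi_lookup_url(doi: str, mailto: str = None) -> str:
    """Build the CrossRef URL for a single-work lookup."""
    cleaned = doi.strip()
    for prefix in DOI_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break

    # Keep "/" unescaped: it separates the DOI prefix from the suffix.
    url = f"{CROSSREF_API}/works/{quote(cleaned, safe='/')}"
    if mailto:
        url += "?" + urlencode({"mailto": mailto})
    return url


print(doi_lookup_url("10.1038/nature12373", "your@email.com"))
```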

2. Search Mode

Search for works matching your criteria:

    {
        "query": "machine learning",
        "author": "Hinton",
        "type": "journal-article",
        "publishedSince": "2020-01-01",
        "mailto": "your@email.com",
        "limit": 500
    }
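For orientation, here is one way the search input could translate into a CrossRef `/works` request. The parameters `query`, `query.author`, `filter`, `rows`, `cursor`, and `mailto` are part of the CrossRef REST API; how the actor itself maps its input onto them is an assumption, as are the function names `search_params` and `search_url`.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"
MAX_ROWS = 1000  # CrossRef caps `rows` at 1000 per request


def search_params(run_input: dict) -> dict:
    """Translate actor-style input into CrossRef query parameters (assumed mapping)."""
    params = {}
    if run_input.get("query"):
        params["query"] = run_input["query"]
    if run_input.get("author"):
        params["query.author"] = run_input["author"]

    filters = []
    if run_input.get("type"):
        filters.append("type:" + run_input["type"])
    if run_input.get("publishedSince"):
        # CrossRef's from-pub-date filter includes the given date itself.
        filters.append("from-pub-date:" + run_input["publishedSince"])
    if filters:
        params["filter"] = ",".join(filters)

    limit = run_input.get("limit", 100)
    params["rows"] = min(limit, MAX_ROWS)
    if limit > MAX_ROWS:
        # More than one page is needed, so start a deep-paging cursor.
        params["cursor"] = "*"
    if run_input.get("mailto"):
        params["mailto"] = run_input["mailto"]
    return params


def search_url(run_input: dict) -> str:
    """Build the full request URL for the first page of results."""
    return CROSSREF_WORKS + "?" + urlencode(search_params(run_input))
```

Because a single CrossRef request returns at most 1000 rows, any `limit` above that has to be served by following the `next-cursor` value from each response until enough works are collected.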

Output Fields

Each scraped work includes the following fields:

| Field | Type | Description |
|---|---|---|
| doi | string | Digital Object Identifier |
| title | string | Work title |
| subtitle | string | Work subtitle (if any) |
| type | string | Work type (journal-article, book, etc.) |
| authors | array | List of authors (see below) |
| containerTitle | string | Journal/book name containing the work |
| publisher | string | Publisher name |
| publishedDate | string | Publication date (YYYY-MM-DD) |
| volume | string | Volume number |
| issue | string | Issue number |
| page | string | Page range |
| abstract | string | Work abstract (HTML tags stripped) |
| subject | array | Subject categories |
| referencesCount | integer | Number of references cited by this work |
| citedByCount | integer | Number of works citing this work |
| url | string | CrossRef URL for this work |
| license | string | License URL |
| issn | array | ISSN(s) for the journal |
| isbn | array | ISBN(s) for books |
| language | string | Language code |
| funder | array | Funding organizations (name and DOI) |
| score | number | Search relevance score |
| scrapedAt | string | Timestamp when data was scraped |
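Several of these fields differ in shape from the raw CrossRef record, where titles are arrays, counts use hyphenated names, and dates arrive as `date-parts` arrays that may contain only a year or a year and month. The sketch below shows a plausible mapping for a handful of fields. It is an assumption about how such a conversion could work, not the actor's code; in particular, leaving partial dates partial is a choice made here.

```python
def format_date_parts(date_field):
    """Turn {"date-parts": [[2015, 2, 26]]} into "2015-02-26" (partial dates stay partial)."""
    if not date_field:
        return None
    parts_list = date_field.get("date-parts") or []
    if not parts_list or not parts_list[0] or parts_list[0][0] is None:
        return None
    parts = parts_list[0][:3]
    pieces = [f"{int(parts[0]):04d}"] + [f"{int(p):02d}" for p in parts[1:]]
    return "-".join(pieces)


def first(values):
    """CrossRef stores titles as arrays; take the first entry if present."""
    return values[0] if values else None


def normalize_work(message: dict) -> dict:
    """Map a raw CrossRef work record onto a subset of the output fields."""
    return {
        "doi": message.get("DOI"),
        "title": first(message.get("title")),
        "type": message.get("type"),
        "containerTitle": first(message.get("container-title")),
        "publisher": message.get("publisher"),
        "publishedDate": format_date_parts(message.get("published")),
        "referencesCount": message.get("references-count"),
        "citedByCount": message.get("is-referenced-by-count"),
        "url": message.get("URL"),
    }
```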

Author Object Structure

    {
        "given": "Geoffrey",
        "family": "Hinton",
        "name": null,
        "orcid": "https://orcid.org/0000-0001-2345-6789",
        "affiliation": ["University of Toronto", "Google Brain"]
    }
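When consuming author objects, note that `given` and `family` can be empty while `name` is set, which is how CrossRef represents organizational authors. A small sketch of handling both cases follows; the helper names `author_display_name` and `orcid_id` are hypothetical.

```python
def author_display_name(author: dict):
    """Return "Given Family" for people, or `name` for organizational authors."""
    given = (author.get("given") or "").strip()
    family = (author.get("family") or "").strip()
    if given or family:
        return " ".join(part for part in (given, family) if part)
    return (author.get("name") or "").strip() or None


def orcid_id(author: dict):
    """Extract the bare ORCID iD from the full ORCID URL, if one is present."""
    orcid = author.get("orcid")
    return orcid.rstrip("/").rsplit("/", 1)[-1] if orcid else None


hinton = {
    "given": "Geoffrey",
    "family": "Hinton",
    "name": None,
    "orcid": "https://orcid.org/0000-0001-2345-6789",
    "affiliation": ["University of Toronto", "Google Brain"],
}
print(author_display_name(hinton), orcid_id(hinton))
```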

Example Output

    {
        "doi": "10.1038/nature14236",
        "title": "Human-level control through deep reinforcement learning",
        "subtitle": null,
        "type": "journal-article",
        "authors": [
            {
                "given": "Volodymyr",
                "family": "Mnih",
                "name": null,
                "orcid": null,
                "affiliation": ["DeepMind Technologies"]
            }
        ],
        "containerTitle": "Nature",
        "publisher": "Springer Nature",
        "publishedDate": "2015-02-26",
        "volume": "518",
        "issue": "7540",
        "page": "529-533",
        "abstract": "The theory of reinforcement learning provides a normative account...",
        "subject": ["Multidisciplinary"],
        "referencesCount": 35,
        "citedByCount": 12847,
        "url": "http://dx.doi.org/10.1038/nature14236",
        "license": "https://www.springer.com/tdm",
        "issn": ["0028-0836", "1476-4687"],
        "isbn": null,
        "language": "en",
        "funder": null,
        "score": 1.0,
        "scrapedAt": "2026-01-25T12:00:00.000Z"
    }
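Since bibliography building is a stated use case, an output record can be turned into a BibTeX entry with a few lines of code. This is a rough sketch under several assumptions: it handles only journal articles, builds the citation key from the first author's family name plus the year, and does not escape LaTeX special characters. The function `to_bibtex` is hypothetical and not part of the actor.

```python
import re


def _bibtex_author(author: dict) -> str:
    """Format one author as "Family, Given", falling back to `name`."""
    family = (author.get("family") or "").strip()
    given = (author.get("given") or "").strip()
    if family:
        return f"{family}, {given}" if given else family
    return (author.get("name") or "").strip()


def to_bibtex(work: dict) -> str:
    """Render an output record as an @article entry (journal articles only)."""
    authors = [a for a in map(_bibtex_author, work.get("authors") or []) if a]
    year = (work.get("publishedDate") or "")[:4]
    first = authors[0].split(",")[0] if authors else "work"
    key = re.sub(r"[^A-Za-z0-9]", "", first) + year

    fields = [
        ("author", " and ".join(authors)),
        ("title", work.get("title")),
        ("journal", work.get("containerTitle")),
        ("year", year),
        ("volume", work.get("volume")),
        ("number", work.get("issue")),
        # BibTeX convention: page ranges use a double hyphen.
        ("pages", re.sub(r"-+", "--", work.get("page") or "")),
        ("doi", work.get("doi")),
    ]
    # Fields that are null or empty in the record are simply left out.
    body = "\n".join(f"  {name} = {{{value}}}," for name, value in fields if value)
    return f"@article{{{key},\n{body}\n}}"
```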

Tips for Best Results

  1. Always provide mailto: This gets you into CrossRef's "polite pool" with much faster rate limits (50 req/sec vs 1 req/sec)

  2. Combine filters: Use query, author, type, and publishedSince together for precise results

  3. DOI lookup is fastest: If you know the DOI, use that instead of searching

  4. Mind the limits: Maximum 10,000 results per run; for larger datasets, run multiple times with date filters
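When a large dataset is collected over several runs, the same work can appear in more than one result set. A minimal sketch of merging runs follows, assuming each run has been loaded as a list of output records; the function name `merge_runs` is hypothetical. DOIs are case-insensitive, so they are compared in lowercase.

```python
def merge_runs(*runs):
    """Combine result lists from several runs, keeping one record per DOI."""
    merged = {}
    without_doi = []
    for run in runs:
        for work in run:
            doi = work.get("doi")
            if not doi:
                # Nothing to deduplicate on; keep the record as it is.
                without_doi.append(work)
                continue
            # The first occurrence wins; later duplicates are dropped.
            merged.setdefault(doi.lower(), work)
    return list(merged.values()) + without_doi


run_a = [{"doi": "10.1000/ABC", "title": "First"}]
run_b = [{"doi": "10.1000/abc", "title": "First (duplicate)"},
         {"doi": "10.1000/xyz", "title": "Second"}]
print(len(merge_runs(run_a, run_b)))
```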

Local Development

# Install dependencies
npm install

# Run locally with the Apify CLI (-p purges local storage; --input passes the actor input as JSON)
apify run -p --input '{"query": "deep learning", "limit": 10}'

Resources