CrossRef Academic Metadata Scraper
Pricing
from $2.00 / 1,000 results
Search CrossRef for academic paper metadata. Get DOIs, authors, journals, citations, and publication dates. Essential for research and bibliography building.
Developer: Fortuitous Pirate
Last modified: 3 days ago
CrossRef Scholarly Works Scraper
Scrapes scholarly publication metadata from the CrossRef API, giving access to millions of journal articles, books, conference papers, dissertations, datasets, and more.
Features
- Massive Database: Access 145M+ scholarly works with DOIs
- Rich Metadata: Authors, citations, abstracts, funders, licenses, and more
- Two Modes: Single DOI lookup or bulk search
- Fast with Polite Pool: Up to ~50 requests/second when you supply a contact email (`mailto`)
- No API Key Required: Free access; an email is recommended for better rate limits
API Source
- Base URL: https://api.crossref.org
- Documentation: https://api.crossref.org/swagger-ui/index.html
- API Key: Not required
- Rate Limits:
  - With `mailto` (polite pool): ~50 requests/second
  - Without `mailto`: ~1 request/second (slower)
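A client can also pace itself from the responses. CrossRef has historically sent `X-Rate-Limit-Limit` and `X-Rate-Limit-Interval` headers (for example `50` and `1s`); the sketch below turns them into a minimum delay between requests and falls back to the ~1 request/second figure above when they are missing. The header format and the helper names `parseIntervalSeconds` and `minDelayMs` are assumptions for illustration, not part of this actor.

```typescript
// Hypothetical helpers: derive the minimum gap between requests (in ms)
// from CrossRef-style rate-limit headers. Assumes the interval is given in
// seconds with an "s" suffix (e.g. "1s"); anything else falls back to the
// conservative default of ~1 request/second.
function parseIntervalSeconds(raw: string | null): number | null {
  if (raw === null) return null;
  const match = /^(\d+(?:\.\d+)?)s$/.exec(raw.trim());
  return match ? Number(match[1]) : null;
}

function minDelayMs(limitHeader: string | null, intervalHeader: string | null): number {
  const limit = limitHeader === null ? NaN : Number(limitHeader);
  const interval = parseIntervalSeconds(intervalHeader);
  // Missing or malformed headers: assume ~1 request/second.
  if (!Number.isFinite(limit) || limit <= 0 || interval === null) return 1000;
  // Spread `limit` requests evenly across the interval.
  return (interval * 1000) / limit;
}

// Polite-pool style headers "50" and "1s" give a 20 ms spacing.
console.log(minDelayMs("50", "1s")); // 20
```

Spacing requests evenly is the simplest choice; a token bucket would allow short bursts but adds state.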
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | No | - | Search query for scholarly works |
| doi | string | No | - | Specific DOI to look up (e.g., 10.1038/nature12373) |
| author | string | No | - | Filter by author name |
| publishedSince | string | No | - | Filter works published after this date (YYYY-MM-DD format) |
| type | enum | No | All | Work type filter (see options below) |
| mailto | string | No | - | Your email for CrossRef's polite pool (faster rate limits) |
| limit | integer | No | 100 | Maximum number of works to return (max: 10000) |
Work Type Options
| Value | Description |
|---|---|
| (empty) | All Types |
| journal-article | Journal Article |
| book | Book |
| book-chapter | Book Chapter |
| proceedings-article | Conference Paper |
| dissertation | Dissertation |
| dataset | Dataset |
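Malformed input is cheaper to reject before any request is sent. This is a minimal sketch of checks implied by the two tables (the YYYY-MM-DD date format, the allowed `type` values, and the 1 to 10,000 `limit` range); `validateInput` and `ActorInput` are hypothetical names, and the actor's real validation may differ.

```typescript
// Hypothetical validator mirroring the documented input parameters.
// The exact checks and messages are assumptions, not the actor's code.
interface ActorInput {
  query?: string;
  doi?: string;
  author?: string;
  publishedSince?: string; // expected as YYYY-MM-DD
  type?: string;           // "" means "All Types"
  mailto?: string;
  limit?: number;          // defaults to 100
}

const WORK_TYPES: ReadonlySet<string> = new Set([
  "journal-article",
  "book",
  "book-chapter",
  "proceedings-article",
  "dissertation",
  "dataset",
]);

function validateInput(input: ActorInput): string[] {
  const errors: string[] = [];
  if (input.publishedSince !== undefined && !/^\d{4}-\d{2}-\d{2}$/.test(input.publishedSince)) {
    errors.push("publishedSince must use YYYY-MM-DD");
  }
  // An empty string selects all types, so only non-empty values are checked.
  if (input.type !== undefined && input.type !== "" && !WORK_TYPES.has(input.type)) {
    errors.push(`unknown type: ${input.type}`);
  }
  const limit = input.limit ?? 100;
  if (!Number.isInteger(limit) || limit < 1 || limit > 10000) {
    errors.push("limit must be an integer between 1 and 10000");
  }
  return errors;
}
```

Returning a list of messages rather than throwing on the first problem lets a caller report every mistake in one run.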
Usage Modes
1. DOI Lookup Mode
Fetch metadata for a single work by its DOI:
```json
{
  "doi": "10.1038/nature12373",
  "mailto": "your@email.com"
}
```
2. Search Mode
Search for works matching your criteria:
```json
{
  "query": "machine learning",
  "author": "Hinton",
  "type": "journal-article",
  "publishedSince": "2020-01-01",
  "mailto": "your@email.com",
  "limit": 500
}
```
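Both modes map naturally onto CrossRef's `/works` endpoint. The sketch below assumes a DOI lookup becomes `GET /works/{doi}` and a search becomes `GET /works` with `query`, `query.author`, a comma-separated `filter` (`from-pub-date`, `type`), `rows`, and `mailto`. Those are real CrossRef REST parameters, but how this actor actually maps its input onto them is an assumption, and `buildRequestUrl` is a hypothetical helper.

```typescript
// Sketch: turn actor input into a CrossRef REST API URL. The mapping
// (author -> query.author, publishedSince -> from-pub-date, type -> type)
// is assumed, not taken from the actor's source.
const BASE_URL = "https://api.crossref.org";
const MAX_ROWS_PER_REQUEST = 1000; // CrossRef caps `rows` per request

interface ModeInput {
  query?: string;
  doi?: string;
  author?: string;
  publishedSince?: string;
  type?: string;
  mailto?: string;
  limit?: number;
}

// Encode key/value pairs in order, without relying on URL/URLSearchParams.
function toQueryString(params: Array<[string, string]>): string {
  return params.map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`).join("&");
}

function buildRequestUrl(input: ModeInput): string {
  const params: Array<[string, string]> = [];

  // DOI lookup mode: encode each DOI segment but keep the "/" separators.
  if (input.doi) {
    const path = input.doi.split("/").map(encodeURIComponent).join("/");
    if (input.mailto) params.push(["mailto", input.mailto]);
    const qs = toQueryString(params);
    return `${BASE_URL}/works/${path}${qs ? `?${qs}` : ""}`;
  }

  // Search mode: free-text query, author query, and combined filters.
  if (input.query) params.push(["query", input.query]);
  if (input.author) params.push(["query.author", input.author]);
  const filters: string[] = [];
  if (input.publishedSince) filters.push(`from-pub-date:${input.publishedSince}`);
  if (input.type) filters.push(`type:${input.type}`);
  if (filters.length > 0) params.push(["filter", filters.join(",")]);

  // Only the first page is requested here; more rows need cursor paging.
  const limit = input.limit ?? 100;
  params.push(["rows", String(Math.min(limit, MAX_ROWS_PER_REQUEST))]);
  if (input.mailto) params.push(["mailto", input.mailto]);
  return `${BASE_URL}/works?${toQueryString(params)}`;
}
```

For the search example above this yields a `/works?query=machine%20learning&query.author=Hinton&filter=...` URL with `rows=500`. A limit above 1,000 needs more than one request, because a single CrossRef page returns at most 1,000 rows.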
Output Fields
Each scraped work includes the following fields:
| Field | Type | Description |
|---|---|---|
| doi | string | Digital Object Identifier |
| title | string | Work title |
| subtitle | string | Work subtitle (if any) |
| type | string | Work type (journal-article, book, etc.) |
| authors | array | List of authors (see below) |
| containerTitle | string | Journal/book name containing the work |
| publisher | string | Publisher name |
| publishedDate | string | Publication date (YYYY-MM-DD) |
| volume | string | Volume number |
| issue | string | Issue number |
| page | string | Page range |
| abstract | string | Work abstract (HTML tags stripped) |
| subject | array | Subject categories |
| referencesCount | integer | Number of references cited by this work |
| citedByCount | integer | Number of works citing this work |
| url | string | CrossRef URL for this work |
| license | string | License URL |
| issn | array | ISSN(s) for the journal |
| isbn | array | ISBN(s) for books |
| language | string | Language code |
| funder | array | Funding organizations (name and DOI) |
| score | number | Search relevance score |
| scrapedAt | string | Timestamp when data was scraped |
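Two fields in the table need normalization from CrossRef's raw JSON: dates arrive as `{"date-parts": [[year, month, day]]}`, sometimes with the month or day missing, and abstracts arrive as JATS XML (e.g. `<jats:p>...</jats:p>`). This sketch shows one way to produce `publishedDate` and a tag-free `abstract`; keeping partial dates partial (e.g. `2015-02`) is an assumption, since the table documents only the full YYYY-MM-DD form, and both function names are hypothetical.

```typescript
// Sketch of two output normalizations, assuming CrossRef's raw shapes:
// {"date-parts": [[2015, 2, 26]]} for dates and JATS markup for abstracts.
interface CrossrefDate {
  "date-parts": Array<Array<number | null>>;
}

function twoDigits(n: number): string {
  return (n < 10 ? "0" : "") + String(n);
}

function formatDateParts(date: CrossrefDate | undefined): string | null {
  const parts = date?.["date-parts"]?.[0];
  // CrossRef can return [[null]] when the date is unknown.
  if (!parts || parts.length === 0 || parts[0] == null) return null;
  const [year, month, day] = parts;
  let out = String(year);
  if (month != null) {
    out += `-${twoDigits(month)}`;
    if (day != null) out += `-${twoDigits(day)}`;
  }
  return out;
}

function stripTags(abstract: string | undefined): string | null {
  if (!abstract) return null;
  // Drop markup, then collapse whitespace left over from the XML layout.
  return abstract.replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim();
}
```

Removing tags outright (rather than replacing them with spaces) keeps inline markup such as italics from splitting words, at the cost of possibly joining text from adjacent elements that have no whitespace between them.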
Author Object Structure
```json
{
  "given": "Geoffrey",
  "family": "Hinton",
  "name": null,
  "orcid": "https://orcid.org/0000-0001-2345-6789",
  "affiliation": ["University of Toronto", "Google Brain"]
}
```
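CrossRef's raw author entries differ slightly from the structure above: the identifier key is upper-case `ORCID`, `name` is typically used for organizational authors, and each affiliation is an object with a `name` field. A hypothetical `toOutputAuthor` mapping that flattens them could look like this; the actor's real conversion is not documented here.

```typescript
// Hypothetical mapping from a raw CrossRef author entry to the output
// author object. Missing fields become null; affiliations are flattened
// from [{ name: "..." }] to plain strings.
interface CrossrefAuthor {
  given?: string;
  family?: string;
  name?: string; // typically set for organizations/consortia
  ORCID?: string;
  affiliation?: Array<{ name: string }>;
}

interface OutputAuthor {
  given: string | null;
  family: string | null;
  name: string | null;
  orcid: string | null;
  affiliation: string[];
}

function toOutputAuthor(raw: CrossrefAuthor): OutputAuthor {
  return {
    given: raw.given ?? null,
    family: raw.family ?? null,
    name: raw.name ?? null,
    orcid: raw.ORCID ?? null,
    affiliation: (raw.affiliation ?? []).map((a) => a.name),
  };
}
```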
Example Output
```json
{
  "doi": "10.1038/nature14236",
  "title": "Human-level control through deep reinforcement learning",
  "subtitle": null,
  "type": "journal-article",
  "authors": [
    {
      "given": "Volodymyr",
      "family": "Mnih",
      "name": null,
      "orcid": null,
      "affiliation": ["DeepMind Technologies"]
    }
  ],
  "containerTitle": "Nature",
  "publisher": "Springer Nature",
  "publishedDate": "2015-02-26",
  "volume": "518",
  "issue": "7540",
  "page": "529-533",
  "abstract": "The theory of reinforcement learning provides a normative account...",
  "subject": ["Multidisciplinary"],
  "referencesCount": 35,
  "citedByCount": 12847,
  "url": "http://dx.doi.org/10.1038/nature14236",
  "license": "https://www.springer.com/tdm",
  "issn": ["0028-0836", "1476-4687"],
  "isbn": null,
  "language": "en",
  "funder": null,
  "score": 1.0,
  "scrapedAt": "2026-01-25T12:00:00.000Z"
}
```
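Since the output is aimed at bibliography building, here is an illustrative way to turn one record into a one-line reference. The layout is ad hoc, not APA, MLA, or any official citation style, and `formatReference` and `WorkRecord` are hypothetical names covering only the fields they use.

```typescript
// Illustrative only: build a plain one-line reference from an output record.
interface WorkRecord {
  doi: string;
  title: string | null;
  authors: Array<{ given: string | null; family: string | null; name?: string | null }>;
  containerTitle: string | null;
  publishedDate: string | null;
  volume: string | null;
  issue: string | null;
  page: string | null;
}

function formatReference(w: WorkRecord): string {
  // "Family, Given" for people; fall back to `name` for organizations.
  const names = w.authors
    .map((a) => (a.family ? (a.given ? `${a.family}, ${a.given}` : a.family) : a.name ?? ""))
    .filter((n) => n.length > 0);
  const year = w.publishedDate ? w.publishedDate.slice(0, 4) : "n.d.";
  let source = w.containerTitle ?? "";
  if (w.volume) source += ` ${w.volume}`;
  if (w.issue) source += `(${w.issue})`;
  if (w.page) source += `, ${w.page}`;
  return `${names.join("; ")} (${year}). ${w.title ?? "Untitled"}. ${source}. https://doi.org/${w.doi}`;
}
```

For the example record this produces a line of the form `Mnih, Volodymyr (2015). <title>. Nature 518(7540), 529-533. https://doi.org/<doi>`.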
Tips for Best Results
- Always provide `mailto`: This gets you into CrossRef's "polite pool" with much faster rate limits (50 req/sec vs 1 req/sec).
- Combine filters: Use `query`, `author`, `type`, and `publishedSince` together for precise results.
- DOI lookup is fastest: If you know the DOI, use it instead of searching.
- Mind the limits: Maximum 10,000 results per run; for larger datasets, run multiple times with date filters.
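Beyond a single page of up to 1,000 rows, CrossRef supports cursor-based deep paging: the first request sends `cursor=*`, and each following request passes back the `next-cursor` value from the previous response. This is a sketch of that loop with the 10,000-result cap from the tip above; `fetchPage` is a hypothetical injected function (so the loop can run without a network), and the stop conditions are assumptions rather than the actor's confirmed logic.

```typescript
// Sketch of cursor-based deep paging. `fetchPage` stands in for a real
// HTTP call such as GET /works?...&cursor=<cursor>&rows=<rows>.
interface CrossrefPage {
  message: {
    items: unknown[];
    "next-cursor"?: string;
  };
}

type FetchPage = (cursor: string, rows: number) => Promise<CrossrefPage>;

const MAX_ROWS = 1000;     // CrossRef per-request cap
const MAX_RESULTS = 10000; // per-run maximum stated in the tips

async function collectWorks(fetchPage: FetchPage, limit: number): Promise<unknown[]> {
  const target = Math.min(limit, MAX_RESULTS);
  const results: unknown[] = [];
  let cursor = "*"; // "*" starts a new cursor session
  while (results.length < target) {
    const rows = Math.min(MAX_ROWS, target - results.length);
    const page = await fetchPage(cursor, rows);
    const items = page.message.items;
    if (items.length === 0) break; // result set exhausted
    results.push(...items.slice(0, target - results.length));
    const next = page.message["next-cursor"];
    if (!next) break;
    cursor = next;
  }
  return results;
}
```

Injecting `fetchPage` keeps the paging logic separate from rate limiting and retries, which a real client would add around the HTTP call.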
Local Development
```bash
# Install dependencies
npm install

# Run locally with Apify CLI
apify run -p '{"query": "deep learning", "limit": 10}'
```