Crossref Scholarly Metadata — 150M+ Works
Pricing
from $2.00 / 1,000 results
Search 150M+ scholarly works from 20,000+ publishers via Crossref API. Extract DOIs, citations, authors, abstracts, journals, funding data, and publication metadata. Essential for bibliometrics, research impact analysis, citation network studies, and academic data science. No API key required.
Developer
kettledrum
Crossref Scholarly Metadata
Extract scholarly metadata from the Crossref REST API — the largest open database of academic works with 150M+ records from 20,000+ publishers.
How much does it cost?
This Actor uses pay-per-event pricing. You are charged per result item returned.
No proxy costs. No API key costs. Crossref data is freely accessible from any location.
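Because you pay per result item, a run's cost can be estimated with simple arithmetic. The sketch below assumes the listed starting rate of $2.00 per 1,000 results applies as a flat rate; `estimate_cost` is a hypothetical helper, and the actual charge may differ depending on your plan.

```python
from decimal import Decimal

# Assumption: the listed starting rate ($2.00 per 1,000 results) applies flat.
PRICE_PER_1000_RESULTS = Decimal("2.00")


def estimate_cost(result_count: int,
                  price_per_1000: Decimal = PRICE_PER_1000_RESULTS) -> Decimal:
    """Return the estimated charge in USD for a run returning `result_count` items."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    return (price_per_1000 * result_count / 1000).quantize(Decimal("0.01"))


print(estimate_cost(200))   # 0.40
print(estimate_cost(5000))  # 10.00
```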
What data is available?
- Articles: Journal articles, conference papers, preprints, dissertations
- Books: Books, book chapters, monographs, edited volumes
- Datasets: Research datasets with DOIs
- Other: Reports, peer reviews, standards, grants
Each record includes DOI, title, authors (with ORCIDs), abstract, citation counts, publication date, journal/book info, subjects, license, and PDF links when available.
Modes
1. Works Search (default)
Search scholarly works by keyword, author, topic, or any combination.
Example — Find highly-cited machine learning papers:
- Query: machine learning
- Sort: Citation Count
- Order: Descending
- Max Results: 100
Example — Recent journal articles with abstracts:
- Query: CRISPR gene editing
- Work Type: Journal Article
- From Date: 2024-01-01
- Has Abstract: true
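When the Actor is run through the API instead of the form, the same settings are passed as JSON input. The sketch below builds that input for the two examples above. The key names (`mode`, `query`, `workType`, `fromDate`, `hasAbstract`, `sort`, `order`, `maxResults`) and the values `citationCount` and `desc` come from the Python integration example in this README; `build_works_input` is a hypothetical helper, so check the Actor's input schema for anything not shown there.

```python
from typing import Any, Optional


def build_works_input(
    query: str,
    work_type: Optional[str] = None,
    from_date: Optional[str] = None,
    has_abstract: Optional[bool] = None,
    sort: Optional[str] = None,
    order: Optional[str] = None,
    max_results: Optional[int] = None,
) -> dict[str, Any]:
    """Build a Works Search input, leaving out any option that was not set."""
    run_input: dict[str, Any] = {"mode": "works", "query": query}
    optional = {
        "workType": work_type,
        "fromDate": from_date,
        "hasAbstract": has_abstract,
        "sort": sort,
        "order": order,
        "maxResults": max_results,
    }
    run_input.update({key: value for key, value in optional.items() if value is not None})
    return run_input


# Example 1: highly cited machine learning papers
highly_cited = build_works_input(
    "machine learning", sort="citationCount", order="desc", max_results=100
)

# Example 2: recent journal articles with abstracts
recent_with_abstracts = build_works_input(
    "CRISPR gene editing",
    work_type="journal-article",
    from_date="2024-01-01",
    has_abstract=True,
)
```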
2. DOI Lookup
Get complete metadata for a single work by its DOI, including the full reference list.
Example:
- DOI: 10.1038/nature12373
Returns the work's metadata plus up to 500 references with their DOIs, titles, and authors.
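Crossref work records list cited works in a `reference` array. The sketch below shows one way those entries could be flattened into the DOI, title, and author fields described above. The source field names (`DOI`, `article-title`, `volume-title`, `unstructured`, `author`) follow Crossref's response format; the 500 cap mirrors the limit stated above. `extract_references` and its output keys are hypothetical and not the Actor's actual code, and the sample record is made up.

```python
from typing import Any

MAX_REFERENCES = 500  # cap stated in this README


def extract_references(work: dict[str, Any],
                       limit: int = MAX_REFERENCES) -> list[dict[str, Any]]:
    """Flatten a Crossref work's `reference` array into simple dicts."""
    references = []
    for ref in work.get("reference", [])[:limit]:
        references.append({
            "doi": ref.get("DOI"),
            # Not every reference has a structured title; fall back step by step.
            "title": (ref.get("article-title")
                      or ref.get("volume-title")
                      or ref.get("unstructured")),
            "author": ref.get("author"),
        })
    return references


# Made-up sample record for illustration only
sample_work = {
    "DOI": "10.0000/example.1",
    "reference": [
        {"key": "ref1", "DOI": "10.0000/example.2",
         "article-title": "A structured reference", "author": "Doe"},
        {"key": "ref2", "unstructured": "Roe, J. An unstructured citation string."},
    ],
}
refs = extract_references(sample_work)
```

References that the publisher deposited without a DOI come back with `doi` set to `None`, so downstream code should not assume every entry is resolvable.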
3. Journals Search
Search journals by name. Returns coverage statistics, ISSN, publisher info, and DOI counts.
Example:
- Query: nature
4. Funders Search
Search funding organizations from the Open Funder Registry.
Example:
- Query: national science foundation
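The four modes plausibly correspond to four routes of the Crossref REST API. The sketch below builds the request URL for each. The routes (`/works`, `/works/{doi}`, `/journals`, `/funders`) and the `query` and `rows` parameters are Crossref's; the mode names other than `works` and the mapping itself are assumptions, and `crossref_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.crossref.org"

# Assumed mode names; only "works" appears in this README's Python example.
SEARCH_ROUTES = {"works": "/works", "journals": "/journals", "funders": "/funders"}


def crossref_url(mode: str, value: str, rows: int = 20) -> str:
    """Return the Crossref REST API URL for a mode and its query or DOI."""
    if mode == "doi":
        # The "/" in a DOI stays part of the path; other unsafe characters are escaped.
        return f"{BASE_URL}/works/{quote(value, safe='/')}"
    if mode not in SEARCH_ROUTES:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"{BASE_URL}{SEARCH_ROUTES[mode]}?{urlencode({'query': value, 'rows': rows})}"


print(crossref_url("doi", "10.1038/nature12373"))
print(crossref_url("journals", "nature"))
print(crossref_url("funders", "national science foundation"))
```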
Filters (Works mode)
| Filter | Description |
|---|---|
| Work Type | Filter by content type (journal-article, book, dataset, etc.) |
| Sort By | Sort by relevance, publication date, citation count, or reference count |
| From/Until Date | Filter by publication date range (YYYY-MM-DD or YYYY) |
| Has Abstract | Only return works with abstracts |
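These filters line up with the `filter`, `sort`, and `order` query parameters of the Crossref REST API. The sketch below shows one possible translation. The filter names (`type`, `from-pub-date`, `until-pub-date`, `has-abstract`) and sort values (`relevance`, `published`, `is-referenced-by-count`, `references-count`) are Crossref's; how the Actor translates its input internally is an assumption, and the input names `publishedDate` and `referenceCount` are hypothetical (only `citationCount` appears in this README).

```python
from typing import Optional

# Left: assumed Actor sort values. Right: Crossref sort fields.
SORT_FIELDS = {
    "relevance": "relevance",
    "publishedDate": "published",
    "citationCount": "is-referenced-by-count",
    "referenceCount": "references-count",
}


def build_query_params(
    query: str,
    work_type: Optional[str] = None,
    from_date: Optional[str] = None,
    until_date: Optional[str] = None,
    has_abstract: bool = False,
    sort: str = "relevance",
    order: str = "desc",
) -> dict[str, str]:
    """Translate Works-mode filters into Crossref query parameters."""
    filters = []
    if work_type:
        filters.append(f"type:{work_type}")
    if from_date:
        filters.append(f"from-pub-date:{from_date}")
    if until_date:
        filters.append(f"until-pub-date:{until_date}")
    if has_abstract:
        filters.append("has-abstract:true")

    params = {"query": query, "sort": SORT_FIELDS[sort], "order": order}
    if filters:
        # Crossref expects multiple filters as one comma-separated value.
        params["filter"] = ",".join(filters)
    return params


params = build_query_params(
    "CRISPR gene editing",
    work_type="journal-article",
    from_date="2024-01-01",
    has_abstract=True,
    sort="citationCount",
)
```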
Output
Each result includes (when available):
| Field | Description |
|---|---|
| `doi` | Digital Object Identifier |
| `title` | Work title |
| `authors` | Author names, ORCIDs, and affiliations |
| `abstract` | Abstract text (JATS tags removed) |
| `type` | Content type (journal-article, book-chapter, etc.) |
| `publisher` | Publisher name |
| `containerTitle` | Journal or book title |
| `publishedDate` | Publication date |
| `year` | Publication year |
| `citationCount` | Number of times cited by other works |
| `referenceCount` | Number of references in the work |
| `subjects` | Subject categories |
| `licenseUrl` | License URL (e.g., Creative Commons) |
| `pdfUrl` | Direct link to PDF |
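Crossref delivers abstracts with JATS XML markup such as `<jats:p>`, which is why the `abstract` field is described as having its tags removed. The sketch below shows one simple way to do that with a regular expression. It is an illustration under that assumption, not the Actor's implementation; `strip_jats` is a hypothetical helper and the sample abstract is made up.

```python
import html
import re

_TAG = re.compile(r"<[^>]+>")   # any XML tag, including <jats:...> elements
_WHITESPACE = re.compile(r"\s+")


def strip_jats(abstract: str) -> str:
    """Remove JATS/XML tags from an abstract and tidy the whitespace."""
    # Replace tags with a space so adjacent paragraphs do not run together.
    # This can leave a space before punctuation that directly follows a tag.
    text = _TAG.sub(" ", abstract)
    # Decode entities only after tag removal so "&lt;" is never mistaken for a tag.
    text = html.unescape(text)
    return _WHITESPACE.sub(" ", text).strip()


raw = (
    "<jats:title>Abstract</jats:title>"
    "<jats:p>CRISPR enables <jats:italic>precise</jats:italic> editing.</jats:p>"
)
print(strip_jats(raw))  # Abstract CRISPR enables precise editing.
```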
Use cases
- Literature reviews — systematically collect papers on a topic with citation counts and metadata
- Bibliometric analysis — study publication patterns, citation networks, and research impact
- Research monitoring — track new publications in a field by date range
- Journal analysis — compare journals by DOI count, coverage depth, and publisher info
- Funding analysis — identify funding organizations and their research portfolios
- Academic data pipelines — feed structured scholarly data into research databases or AI models
FAQ
Q: How large is the Crossref database?
A: Over 150 million works with DOIs from 20,000+ publishers. This includes journal articles, books, datasets, conference papers, preprints, and more. New records are added continuously.
Q: Can I get the full text of papers?
A: Crossref provides metadata, not full text. However, many records include a pdfUrl link to the publisher's full-text PDF (especially for open access articles). You can filter for records with abstracts using hasAbstract: true.
Q: How do I get all papers by a specific author?
A: Use works search with the author's name as the query. For more precise results, use the author's ORCID if available. Results include author ORCID identifiers when registered.
Q: What's the maximum number of results per run?
A: There is no fixed upper limit on maxResults. The Actor uses cursor-based pagination with no depth limit, so you can retrieve thousands of records in a single run. Cost scales linearly with result count.
Q: Can I search by journal or publisher?
A: Yes. Use journals mode to find journal metadata by name, or include the journal name in your works search query. You can also filter by work type (journal-article, book-chapter, etc.).
Q: How often is the data updated?
A: Crossref data updates in near real-time as publishers register new DOIs. Recently published papers typically appear within days of publication.
Integration with Python
```python
from apify_client import ApifyClient
import pandas as pd

client = ApifyClient("YOUR_API_TOKEN")

# Search for recent machine learning papers
run = client.actor("aligned_kettledrum/crossref-scholarly-data").call(
    run_input={
        "mode": "works",
        "query": "large language models",
        "fromDate": "2025-01-01",
        "workType": "journal-article",
        "hasAbstract": True,
        "sort": "citationCount",
        "order": "desc",
        "maxResults": 200,
    }
)

# Analyze with pandas
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
df = pd.DataFrame(items)

# Top cited papers
print(df[["title", "citationCount", "year", "containerTitle"]].head(20))

# Citation distribution
print(f"Total papers: {len(df)}")
print(f"Mean citations: {df['citationCount'].mean():.1f}")
print(f"Median citations: {df['citationCount'].median():.0f}")
```
Technical details
- Data source: Crossref REST API — free, no API key required
- Database size: 150M+ works with DOIs
- Rate limiting: Uses Crossref's polite pool (10 req/sec) with automatic request throttling
- Pagination: Cursor-based deep pagination (no depth limit)
- Memory: 128-512 MB (API calls only, no browser needed)
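Cursor-based deep paging in the Crossref REST API works by sending `cursor=*` on the first request and then passing back the `next-cursor` value from each response's `message`. The sketch below shows that loop. The page-fetching function is injected so the example runs without network access; `iter_works` and `fake_fetch` are hypothetical names, the fake DOIs are made up, and this is not the Actor's actual code.

```python
from typing import Any, Callable, Iterator

Page = dict[str, Any]


def iter_works(
    fetch_page: Callable[[dict[str, str]], Page],
    query: str,
    max_results: int,
    rows: int = 100,
) -> Iterator[dict[str, Any]]:
    """Yield up to `max_results` items, following `next-cursor` between pages."""
    cursor = "*"  # Crossref's marker for "start a new deep-paging session"
    yielded = 0
    while yielded < max_results:
        page = fetch_page({"query": query, "rows": str(rows), "cursor": cursor})
        message = page["message"]
        items = message.get("items", [])
        if not items:  # an empty page means the result set is exhausted
            return
        for item in items:
            yield item
            yielded += 1
            if yielded >= max_results:
                return
        cursor = message["next-cursor"]


def fake_fetch(params: dict[str, str]) -> Page:
    """Stand-in for an HTTP call: serves 250 fake records in pages."""
    start = 0 if params["cursor"] == "*" else int(params["cursor"])
    size = int(params["rows"])
    items = [{"DOI": f"10.0000/fake.{i}"} for i in range(start, min(start + size, 250))]
    return {"message": {"items": items, "next-cursor": str(start + size)}}


works = list(iter_works(fake_fetch, "machine learning", max_results=230))
print(len(works))  # 230
```

In real use, `fetch_page` would issue a GET request to `https://api.crossref.org/works` with these parameters, plus a `mailto` parameter to identify the client to the polite pool, and would wait between calls to stay under the rate limit.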