Crossref Scholarly Metadata — 150M+ Works

Pricing

from $2.00 / 1,000 results

Search 150M+ scholarly works from 20,000+ publishers via the Crossref API. Extract DOIs, citations, authors, abstracts, journals, funding data, and publication metadata. Useful for bibliometrics, research impact analysis, citation network studies, and academic data science. No API key required.

Developer

kettledrum

Maintained by Community

Crossref Scholarly Metadata

Extract scholarly metadata from the Crossref REST API — the largest open database of academic works with 150M+ records from 20,000+ publishers.

How much does it cost?

This Actor uses pay-per-event pricing. You are charged per result item returned.

No proxy costs. No API key costs. Crossref data is freely accessible from any location.
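
As a rough worked example, the listed starting price of $2.00 per 1,000 results comes to $0.002 per item. The sketch below assumes that this flat rate applies to every item returned. The helper name `estimate_cost` is illustrative and not part of the Actor, and actual charges may differ depending on your Apify plan.

```python
from decimal import Decimal

# Assumption: a flat rate of $2.00 per 1,000 results (the listed "from" price).
PRICE_PER_1000 = Decimal("2.00")


def estimate_cost(num_results: int, price_per_1000: Decimal = PRICE_PER_1000) -> Decimal:
    """Return the estimated charge in USD for a given number of result items."""
    if num_results < 0:
        raise ValueError("num_results must be non-negative")
    return (price_per_1000 * num_results / 1000).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for n in (100, 200, 5000):
        print(f"{n} results: about ${estimate_cost(n)}")
```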

What data is available?

  • Articles: Journal articles, conference papers, preprints, dissertations
  • Books: Books, book chapters, monographs, edited volumes
  • Datasets: Research datasets with DOIs
  • Other: Reports, peer reviews, standards, grants

Each record includes DOI, title, authors (with ORCIDs), abstract, citation counts, publication date, journal/book info, subjects, license, and PDF links when available.

Modes

1. Works Search (default)

Search scholarly works by keyword, author, topic, or any combination.

Example — Find highly-cited machine learning papers:

  • Query: machine learning
  • Sort: Citation Count
  • Order: Descending
  • Max Results: 100

Example — Recent journal articles with abstracts:

  • Query: CRISPR gene editing
  • Work Type: Journal Article
  • From Date: 2024-01-01
  • Has Abstract: true
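
The two examples above can be written as Actor input objects. The field names and the `citationCount` and `desc` values follow the Python integration example in this README. The helper `works_input` is a hypothetical convenience wrapper, not something the Actor provides.

```python
from typing import Any, Dict, Optional


def works_input(
    query: str,
    *,
    work_type: Optional[str] = None,
    from_date: Optional[str] = None,
    has_abstract: Optional[bool] = None,
    sort: Optional[str] = None,
    order: Optional[str] = None,
    max_results: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a works-mode input, leaving out any option that was not set."""
    options = {
        "workType": work_type,
        "fromDate": from_date,
        "hasAbstract": has_abstract,
        "sort": sort,
        "order": order,
        "maxResults": max_results,
    }
    run_input: Dict[str, Any] = {"mode": "works", "query": query}
    run_input.update({key: value for key, value in options.items() if value is not None})
    return run_input


# Example 1: highly-cited machine learning papers
highly_cited = works_input(
    "machine learning", sort="citationCount", order="desc", max_results=100
)

# Example 2: recent journal articles with abstracts
recent_with_abstracts = works_input(
    "CRISPR gene editing",
    work_type="journal-article",
    from_date="2024-01-01",
    has_abstract=True,
)
```

Either dictionary can be passed as `run_input` when calling the Actor.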

2. DOI Lookup

Get complete metadata for a single work by its DOI, including the full reference list.

Example:

  • DOI: 10.1038/nature12373

Returns the work's metadata plus up to 500 references with their DOIs, titles, and authors.
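
For orientation, a DOI lookup corresponds to the Crossref REST API route `/works/{doi}`. The sketch below only builds that URL and makes no request. It assumes the Actor calls this public endpoint (its internals are not documented here), and `normalize_doi` and `crossref_work_url` are illustrative names.

```python
from urllib.parse import quote

CROSSREF_API = "https://api.crossref.org"

# Common ways a DOI is written; all are reduced to the bare "10.xxxx/..." form.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Strip whitespace and any resolver prefix from a DOI string."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in _DOI_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def crossref_work_url(raw_doi: str) -> str:
    """Return the Crossref REST API URL for a single work."""
    # Keep "/" unescaped so the DOI suffix stays readable in the path.
    return f"{CROSSREF_API}/works/{quote(normalize_doi(raw_doi), safe='/')}"
```

Normalizing first means that `10.1038/nature12373`, `doi:10.1038/nature12373`, and `https://doi.org/10.1038/nature12373` all resolve to the same URL.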

3. Journals

Search journals by name. Returns coverage statistics, ISSN, publisher info, and DOI counts.

Example:

  • Query: nature

4. Funders

Search funding organizations from the Open Funder Registry.

Example:

  • Query: national science foundation
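
The journal and funder searches map onto two Crossref REST API list routes, `/journals` and `/funders`, which accept `query` and `rows` parameters. The sketch below builds those URLs without sending a request. That the Actor issues exactly these requests is an assumption, and `search_url` is an illustrative helper.

```python
from urllib.parse import urlencode

CROSSREF_API = "https://api.crossref.org"


def search_url(resource: str, query: str, rows: int = 20) -> str:
    """Build a Crossref search URL for the 'journals' or 'funders' route."""
    if resource not in ("journals", "funders"):
        raise ValueError(f"unsupported resource: {resource!r}")
    return f"{CROSSREF_API}/{resource}?{urlencode({'query': query, 'rows': rows})}"


journals_url = search_url("journals", "nature")
funders_url = search_url("funders", "national science foundation")
```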

Filters (Works mode)

Filter           Description
Work Type        Filter by content type (journal-article, book, dataset, etc.)
Sort By          Sort by relevance, publication date, citation count, or reference count
From/Until Date  Filter by publication date range (YYYY-MM-DD or YYYY)
Has Abstract     Only return works with abstracts
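
On the Crossref side, these options correspond to the `filter`, `sort`, and `order` query parameters of the `/works` route. The sketch below shows one way the Actor's input might translate. The Crossref parameter names are real, but the translation itself and every `SORT_MAP` key other than `citationCount` are assumptions.

```python
from typing import Dict, List, Optional

# Actor sort option -> Crossref `sort` value.
# Only "citationCount" appears in this README; the other keys are assumed names.
SORT_MAP = {
    "relevance": "relevance",
    "publishedDate": "published",
    "citationCount": "is-referenced-by-count",
    "referenceCount": "references-count",
}


def crossref_params(
    query: str,
    work_type: Optional[str] = None,
    from_date: Optional[str] = None,
    until_date: Optional[str] = None,
    has_abstract: bool = False,
    sort: str = "relevance",
    order: str = "desc",
) -> Dict[str, str]:
    """Translate Actor-style filters into Crossref /works query parameters."""
    filters: List[str] = []
    if work_type:
        filters.append(f"type:{work_type}")
    if from_date:
        filters.append(f"from-pub-date:{from_date}")
    if until_date:
        filters.append(f"until-pub-date:{until_date}")
    if has_abstract:
        filters.append("has-abstract:true")

    params = {"query": query, "sort": SORT_MAP[sort], "order": order}
    if filters:
        # Crossref expects multiple filters as one comma-separated string.
        params["filter"] = ",".join(filters)
    return params
```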

Output

Each result includes (when available):

Field           Description
doi             Digital Object Identifier
title           Work title
authors         Author names, ORCIDs, and affiliations
abstract        Abstract text (JATS tags removed)
type            Content type (journal-article, book-chapter, etc.)
publisher       Publisher name
containerTitle  Journal or book title
publishedDate   Publication date
year            Publication year
citationCount   Number of times cited by other works
referenceCount  Number of references in the work
subjects        Subject categories
licenseUrl      License URL (e.g., Creative Commons)
pdfUrl          Direct link to PDF
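
To illustrate where these fields could come from, the sketch below maps a raw Crossref work record onto the output schema. The Crossref keys (`DOI`, `container-title`, `is-referenced-by-count`, and so on) are those of the public REST API. The mapping logic, the regex-based tag stripping, and the made-up sample record are assumptions for illustration, not the Actor's actual code.

```python
import html
import re
from typing import Any, Dict, List, Optional

_TAG_RE = re.compile(r"<[^>]+>")


def strip_jats(abstract: Optional[str]) -> Optional[str]:
    """Remove JATS/XML tags, decode entities, and collapse whitespace."""
    if not abstract:
        return None
    text = html.unescape(_TAG_RE.sub(" ", abstract))
    return " ".join(text.split())


def first(values: Optional[List[Any]]) -> Optional[Any]:
    """Return the first element of a list, or None if it is empty or missing."""
    return values[0] if values else None


def to_output(work: Dict[str, Any]) -> Dict[str, Any]:
    """Map a raw Crossref work record to the flat output fields."""
    date_parts = (work.get("published") or {}).get("date-parts") or [[]]
    parts = [p for p in date_parts[0] if p is not None]
    pdf_links = [
        link["URL"]
        for link in work.get("link", [])
        if link.get("content-type") == "application/pdf"
    ]
    return {
        "doi": work.get("DOI"),
        "title": first(work.get("title")),
        "authors": [
            {
                "name": " ".join(p for p in (a.get("given"), a.get("family")) if p),
                "orcid": a.get("ORCID"),
                "affiliations": [aff.get("name") for aff in a.get("affiliation", [])],
            }
            for a in work.get("author", [])
        ],
        "abstract": strip_jats(work.get("abstract")),
        "type": work.get("type"),
        "publisher": work.get("publisher"),
        "containerTitle": first(work.get("container-title")),
        "publishedDate": "-".join(f"{p:02d}" for p in parts) or None,
        "year": parts[0] if parts else None,
        "citationCount": work.get("is-referenced-by-count", 0),
        "referenceCount": work.get("references-count", 0),
        "subjects": work.get("subject", []),
        "licenseUrl": first([lic["URL"] for lic in work.get("license", [])]),
        "pdfUrl": first(pdf_links),
    }


# Made-up record for illustration only (not a real DOI or author entry).
sample = {
    "DOI": "10.1234/example",
    "title": ["An example title"],
    "author": [{"given": "Ada", "family": "Lovelace",
                "affiliation": [{"name": "Example University"}]}],
    "abstract": "<jats:p>Gene editing &amp; repair.</jats:p>",
    "type": "journal-article",
    "container-title": ["Example Journal"],
    "published": {"date-parts": [[2024, 3, 5]]},
    "is-referenced-by-count": 7,
}
record = to_output(sample)
```

Fields that are absent in the raw record come back as `None`, `0`, or an empty list, which matches the "when available" wording above.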

Use cases

  • Literature reviews — systematically collect papers on a topic with citation counts and metadata
  • Bibliometric analysis — study publication patterns, citation networks, and research impact
  • Research monitoring — track new publications in a field by date range
  • Journal analysis — compare journals by DOI count, coverage depth, and publisher info
  • Funding analysis — identify funding organizations and their research portfolios
  • Academic data pipelines — feed structured scholarly data into research databases or AI models

FAQ

Q: How large is the Crossref database? A: Over 150 million works with DOIs from 20,000+ publishers. This includes journal articles, books, datasets, conference papers, preprints, and more. New records are added continuously.

Q: Can I get the full text of papers? A: Crossref provides metadata, not full text. However, many records include a pdfUrl link to the publisher's full-text PDF (especially for open access articles). You can filter for records with abstracts using hasAbstract: true.

Q: How do I get all papers by a specific author? A: Use works search with the author's name as the query. For more precise results, use the author's ORCID if available. Results include author ORCID identifiers when registered.

Q: What's the maximum number of results per run? A: There is no fixed cap: set maxResults to the number of records you need. The Actor uses cursor-based pagination with no depth limit, so you can retrieve thousands of records in a single run. Cost scales linearly with result count.

Q: Can I search by journal or publisher? A: Yes. Use journals mode to find journal metadata by name, or include the journal name in your works search query. You can also filter by work type (journal-article, book-chapter, etc.).

Q: How often is the data updated? A: Crossref data updates in near real-time as publishers register new DOIs. Recently published papers typically appear within days of publication.

Integration with Python

import pandas as pd
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Search for recent machine learning papers
run = client.actor("aligned_kettledrum/crossref-scholarly-data").call(
    run_input={
        "mode": "works",
        "query": "large language models",
        "fromDate": "2025-01-01",
        "workType": "journal-article",
        "hasAbstract": True,
        "sort": "citationCount",
        "order": "desc",
        "maxResults": 200,
    }
)

# Load the run's dataset into a pandas DataFrame
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
df = pd.DataFrame(items)

# Top cited papers
print(df[["title", "citationCount", "year", "containerTitle"]].head(20))

# Citation distribution
print(f"Total papers: {len(df)}")
print(f"Mean citations: {df['citationCount'].mean():.1f}")
print(f"Median citations: {df['citationCount'].median():.0f}")
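
If pandas is not installed, a similar summary can be computed with the standard library. The sketch below assumes `items` is a list of result dictionaries like those returned from the dataset and treats a missing `citationCount` as 0. `summarize` is an illustrative helper, and the sample items are made up.

```python
from statistics import mean, median
from typing import Any, Dict, List


def summarize(items: List[Dict[str, Any]]) -> Dict[str, float]:
    """Return total, mean, and median citation counts for a list of results."""
    counts = [int(item.get("citationCount") or 0) for item in items]
    if not counts:
        return {"total": 0, "mean": 0.0, "median": 0.0}
    return {"total": len(counts), "mean": mean(counts), "median": median(counts)}


# Made-up items for illustration; the third has no citationCount field.
sample_items = [
    {"title": "A", "citationCount": 10},
    {"title": "B", "citationCount": 4},
    {"title": "C"},
]
summary = summarize(sample_items)
print(f"Total papers: {summary['total']}")
print(f"Mean citations: {summary['mean']:.1f}")
print(f"Median citations: {summary['median']:.0f}")
```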

Technical details

  • Data source: Crossref REST API — free, no API key required
  • Database size: 150M+ works with DOIs
  • Rate limiting: Uses Crossref's polite pool (10 req/sec) with automatic throttling
  • Pagination: Cursor-based deep pagination (no depth limit)
  • Memory: 128-512 MB (API calls only, no browser needed)
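
Cursor-based deep pagination in the Crossref API works by sending `cursor=*` on the first request and then passing back the `next-cursor` value from each response. The sketch below shows that loop with a simple delay between requests. `fetch_page` is an injected callable that returns the `message` object of a Crossref response (an assumption, so the example runs without network access), and the 0.1 s default interval mirrors the 10 req/sec figure above. This is not the Actor's actual implementation.

```python
import time
from typing import Any, Callable, Dict, Iterator


def iter_works(
    fetch_page: Callable[[Dict[str, Any]], Dict[str, Any]],
    params: Dict[str, Any],
    max_results: int,
    rows: int = 100,
    min_interval: float = 0.1,
) -> Iterator[Dict[str, Any]]:
    """Yield up to max_results works, following Crossref cursors page by page."""
    cursor = "*"  # Crossref's marker for "start a new cursor"
    yielded = 0
    last_call = None
    while yielded < max_results:
        # Throttle: wait until at least min_interval has passed since the last call.
        if last_call is not None:
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()

        page_size = min(rows, max_results - yielded)
        message = fetch_page({**params, "rows": page_size, "cursor": cursor})
        items = message.get("items", [])
        if not items:
            return  # an empty page means the result set is exhausted
        for item in items:
            yield item
            yielded += 1
            if yielded >= max_results:
                return
        cursor = message.get("next-cursor")
        if not cursor:
            return
```

Because the cursor is opaque and returned by the server, the loop never computes offsets, which is why there is no depth limit of the kind that offset-based paging imposes.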