
Crossref Academic Paper Search

Pricing: from $2.00 per 1,000 papers fetched

Search 150M+ scholarly papers via Crossref API. Filter by keywords, author, journal, DOI prefix, publication type, and year range. Returns DOIs, citations, authors with ORCID, abstracts, funding data, and publisher metadata. Free, no API key needed.

Rating: 0.0 (0 ratings)
Developer: ryan clinton (Maintained by Community)
Actor stats: 0 bookmarked, 2 total users, 0 monthly active users, last modified 4 hours ago

Search over 150 million scholarly works indexed by Crossref -- the largest open registry of DOI metadata in the world. Retrieve structured publication data including titles, authors with ORCID identifiers, citation counts, journal names, funding information, abstracts, and more. No API key required.

What does Crossref Academic Paper Search do?

Crossref Academic Paper Search lets you programmatically query the Crossref REST API to find and extract metadata from academic papers, journal articles, book chapters, conference proceedings, preprints, datasets, and reports. Crossref indexes metadata from over 20,000 publishers and maintains records for more than 150 million scholarly works, making it one of the most comprehensive sources of academic publication data available.

The actor accepts flexible search filters -- free-text queries, author names, journal or conference titles, DOI prefixes, publication types, and date ranges -- then returns clean, structured JSON for each matching work. Each result includes the DOI, full title, author list with affiliations and ORCID IDs, citation count, reference count, journal name, publisher, volume/issue/page details, subject classifications, funding sources, abstract text, and license URL.

This is ideal for literature reviews, bibliometric analysis, research trend tracking, institutional reporting, and building academic datasets at scale.

Why use Crossref Academic Paper Search on Apify?

Running this actor on Apify gives you several advantages over calling the Crossref API directly:

  • No infrastructure to manage. The actor runs in the cloud, handles pagination automatically, and stores results in a dataset you can export in JSON, CSV, or Excel format.
  • No API key required. Crossref is a free, open API. This actor uses polite pool access for reliable rate limits without any registration on your part.
  • Pagination handled for you. The actor requests results from the Crossref API in pages of 100 and keeps fetching until your maxResults limit is reached or no more results remain. Crossref caps offset-based paging at 10,000 results. That cap is well above the actor's maximum of 1,000 results per run.
  • Structured, clean output. Raw Crossref responses contain deeply nested JSON. Dates come as date-parts arrays instead of date strings, and abstracts carry embedded JATS/HTML markup. This actor normalizes everything into a consistent schema with top-level fields, ready for analysis or import into spreadsheets and databases.
  • Schedule and automate. Set the actor to run on a schedule to track new publications on a topic, monitor an author's output, or watch for papers from a specific publisher or funder.
  • Integrate with anything. Connect results to Google Sheets, Slack, webhooks, or other Apify actors using built-in integrations.

Key features

  • Free-text search across titles, abstracts, and other metadata for any topic or keyword.
  • Author filtering to find all works by a specific researcher.
  • Journal/source filtering to scope results to a particular publication venue.
  • DOI prefix filtering to retrieve works from a specific publisher (e.g., 10.1038 for Nature).
  • Publication type filtering for journal articles, book chapters, conference papers, preprints, books, datasets, or reports.
  • Date range filtering with from-year and to-year parameters.
  • Flexible sorting by relevance, citation count, or publication date.
  • Rich author details including given name, family name, sequence position, institutional affiliations, and ORCID identifiers.
  • Funding data with funder names and grant/award identifiers.
  • Up to 1,000 results per run with automatic multi-page fetching.

How to use Crossref Academic Paper Search

  1. Navigate to the actor's input page on Apify Console.
  2. Enter a Search Query such as "machine learning" or "CRISPR gene editing". You can also leave this blank and search by author, journal, or DOI prefix instead.
  3. Optionally set filters:
    • Author Name -- e.g., "Jennifer Doudna"
    • Journal/Source Name -- e.g., "Nature", "The Lancet"
    • DOI Prefix -- e.g., "10.1038" for Nature Publishing Group
    • Publication Type -- choose from journal article, book chapter, conference paper, preprint, book, dataset, or report
    • From Year / To Year -- restrict results to a date range
  4. Choose a Sort By option: Relevance (default), Most Cited, or Newest First.
  5. Set Max Results (1 to 1,000, default 50).
  6. Click Start and wait for the run to complete.
  7. Download your results from the Dataset tab in JSON, CSV, or Excel format.

Input parameters

Parameter | Type | Required | Default | Description
query | String | No | - | Free-text search across titles, abstracts, and other metadata
authorName | String | No | - | Filter by author name (e.g., "Einstein", "Jennifer Doudna")
containerTitle | String | No | - | Filter by journal or conference name (e.g., "Nature", "Science")
doiPrefix | String | No | - | Filter by DOI prefix (e.g., 10.1038 for Nature Publishing)
publicationType | String | No | - | Filter by type: journal-article, book-chapter, proceedings-article, posted-content, book, dataset, report
fromYear | Integer | No | - | Earliest publication year
toYear | Integer | No | - | Latest publication year
sortBy | String | No | relevance | Sort order: relevance, is-referenced-by-count (most cited), published (newest)
maxResults | Integer | No | 50 | Maximum papers to return (1 to 1,000)

At least one of query, authorName, containerTitle, or doiPrefix must be provided.
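The input rules above can be expressed as a small validation step. The following is a hypothetical Python sketch, not the actor's source code. The function name `validate_input`, the check on sortBy values, and the rejection of a reversed year range are assumptions drawn from the parameter table.

```python
# Hypothetical sketch of the input rules in the parameter table above;
# the actor's real validation code may differ.
SEARCH_FIELDS = ("query", "authorName", "containerTitle", "doiPrefix")
SORT_OPTIONS = ("relevance", "is-referenced-by-count", "published")


def validate_input(run_input: dict) -> dict:
    """Return a copy of run_input with defaults applied, or raise ValueError."""
    cleaned = dict(run_input)

    # At least one search field must be a non-empty string.
    if not any(str(cleaned.get(field) or "").strip() for field in SEARCH_FIELDS):
        raise ValueError("Provide at least one of: " + ", ".join(SEARCH_FIELDS))

    # Apply the documented defaults, then range-check them.
    cleaned.setdefault("sortBy", "relevance")
    cleaned.setdefault("maxResults", 50)
    if cleaned["sortBy"] not in SORT_OPTIONS:
        raise ValueError(f"Unsupported sortBy: {cleaned['sortBy']}")
    if not 1 <= cleaned["maxResults"] <= 1000:
        raise ValueError("maxResults must be between 1 and 1,000")

    # Assumption: a reversed year range is rejected rather than swapped.
    from_year, to_year = cleaned.get("fromYear"), cleaned.get("toYear")
    if from_year is not None and to_year is not None and from_year > to_year:
        raise ValueError("fromYear must not be later than toYear")
    return cleaned
```

For example, `validate_input({"query": "CRISPR gene editing"})` returns the input with sortBy set to "relevance" and maxResults set to 50. An input that contains only fromYear raises ValueError.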

Input examples

Basic topic search -- find the 50 most relevant papers on CRISPR gene editing:

{
  "query": "CRISPR gene editing",
  "maxResults": 50
}

Author and journal filter -- find papers on base editing by a specific author in Nature, sorted by citation count:

{
  "query": "base editing",
  "authorName": "David R. Liu",
  "containerTitle": "Nature",
  "sortBy": "is-referenced-by-count",
  "maxResults": 100
}

Publisher scan with date range -- retrieve recent Elsevier journal articles from 2023 to 2025 on climate change:

{
  "query": "climate change adaptation",
  "doiPrefix": "10.1016",
  "publicationType": "journal-article",
  "fromYear": 2023,
  "toYear": 2025,
  "sortBy": "published",
  "maxResults": 200
}

Conference proceedings search -- find the most-cited conference papers on transformer architectures published since 2020:

{
  "query": "transformer architecture attention mechanism",
  "publicationType": "proceedings-article",
  "fromYear": 2020,
  "sortBy": "is-referenced-by-count",
  "maxResults": 500
}

Tips for best results

  • Combine filters for precision. Use query together with authorName or containerTitle to narrow results. Searching "deep learning" in the journal "Nature" returns much more targeted results than a broad query alone.
  • Use DOI prefixes to target publishers. Each DOI prefix belongs to a single registrant, usually a publisher, although large publishers often hold several prefixes. For example, 10.1038 targets Nature, 10.1016 targets Elsevier, 10.1007 targets Springer, and 10.1126 targets Science (AAAS). This is a powerful way to scope searches to a specific publisher's catalog.
  • Sort by citations for high-impact papers. Choose "Most Cited" sorting to find the most influential papers on a topic, which is useful for literature reviews and identifying seminal works.
  • Use date ranges for trend analysis. Combine a topic query with fromYear and toYear to track how research output has changed over time. Run multiple queries with different year ranges to build a timeline.
  • Schedule recurring runs. Set the actor to run weekly or monthly with "Newest First" sorting to monitor new publications on topics you care about.
  • Start broad, then filter. If you are unsure which filters to use, start with a simple query and review the output. Use the type, journal, and subjects fields in the results to decide which filters to add on subsequent runs.
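The timeline tip above can be prepared as one run input per year. This is a minimal sketch. The helper name `yearly_inputs` is hypothetical, and it assumes you pass each dictionary to the actor yourself, for example through the Python client shown under Programmatic access.

```python
# Hypothetical helper: build one actor input per calendar year so each run
# covers a single-year window of the same query.
def yearly_inputs(query: str, first_year: int, last_year: int, max_results: int = 100) -> list[dict]:
    """Return run inputs for every year from first_year to last_year inclusive."""
    return [
        {
            "query": query,
            "fromYear": year,
            "toYear": year,
            "maxResults": max_results,
        }
        for year in range(first_year, last_year + 1)
    ]
```

Each run returns at most maxResults papers. The number of items per run therefore shows the top papers for that year, not the total volume of publications.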

Programmatic access

You can call Crossref Academic Paper Search programmatically using the Apify API. Here are examples in Python, JavaScript, and cURL.

Python (using the apify-client package):

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("ryanclinton/crossref-paper-search").call(run_input={
    "query": "CRISPR gene editing",
    "authorName": "Jennifer Doudna",
    "sortBy": "is-referenced-by-count",
    "maxResults": 100,
})

# Iterate over every paper stored in the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['title']} - cited {item['citationCount']} times")

JavaScript (using the apify-client package):

// Top-level await requires an ES module (e.g., a .mjs file)
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

// Start the actor and wait for the run to finish
const run = await client.actor("ryanclinton/crossref-paper-search").call({
  query: "CRISPR gene editing",
  authorName: "Jennifer Doudna",
  sortBy: "is-referenced-by-count",
  maxResults: 100,
});

// Fetch the papers from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
  console.log(`${item.title} - cited ${item.citationCount} times`);
});

cURL (start a run and retrieve results):

# Start the actor run
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~crossref-paper-search/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "CRISPR gene editing",
    "sortBy": "is-referenced-by-count",
    "maxResults": 50
  }'

# Once the run has finished, retrieve results from its dataset
# (use data.defaultDatasetId from the response above)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"

Output example

Each item in the output dataset follows this structure:

{
  "doi": "10.1038/nature17946",
  "url": "http://dx.doi.org/10.1038/nature17946",
  "title": "Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage",
  "publishedYear": 2016,
  "publishedDate": "2016-04-20",
  "type": "journal-article",
  "citationCount": 3842,
  "referencesCount": 47,
  "authors": "Alexis C. Komor, Yongjoo B. Kim, Michael S. Packer, John A. Zuris, David R. Liu",
  "authorDetails": [
    {
      "name": "Alexis C. Komor",
      "sequence": "first",
      "affiliations": ["Harvard University", "Broad Institute"],
      "orcid": "https://orcid.org/0000-0003-4884-3253"
    },
    {
      "name": "David R. Liu",
      "sequence": "additional",
      "affiliations": ["Harvard University", "Howard Hughes Medical Institute"],
      "orcid": "https://orcid.org/0000-0002-9943-7557"
    }
  ],
  "journal": "Nature",
  "publisher": "Springer Science and Business Media LLC",
  "volume": "533",
  "issue": "7603",
  "page": "420-424",
  "language": "en",
  "issn": ["0028-0836", "1476-4687"],
  "subjects": ["Multidisciplinary"],
  "funders": [
    {
      "name": "National Institutes of Health",
      "awards": ["R01 EB022376"]
    },
    {
      "name": "Howard Hughes Medical Institute",
      "awards": []
    }
  ],
  "abstract": "Current genome-editing technologies introduce double-stranded (ds) DNA breaks at a target locus as the first step to gene correction. Although most genetic diseases arise from point mutations, current approaches to point mutation correction are inefficient...",
  "license": null,
  "relevanceScore": 18.742,
  "extractedAt": "2026-02-10T14:30:00.000Z"
}

Output fields reference

Field | Type | Description
doi | String | Digital Object Identifier for the work
url | String | Canonical URL resolving to the work (typically via doi.org)
title | String | Full title of the work
publishedYear | Integer or null | Year of publication extracted from date-parts
publishedDate | String or null | Full publication date in YYYY-MM-DD format
type | String | Crossref work type (e.g., journal-article, book-chapter, proceedings-article)
citationCount | Integer | Number of times this work has been cited by other indexed works
referencesCount | Integer | Number of references this work cites
authors | String | Comma-separated list of author names
authorDetails | Array | Structured author data: name, sequence (first/additional), affiliations (array), orcid (URL or null)
journal | String or null | Journal or container title (conference name, book title, etc.)
publisher | String | Name of the publisher
volume | String or null | Volume number
issue | String or null | Issue number
page | String or null | Page range (e.g., "1423-1428")
language | String or null | ISO language code (e.g., "en")
issn | Array | List of ISSNs for the journal (print and electronic)
subjects | Array | Subject classification terms assigned by the publisher
funders | Array | Funding sources: name (funder organization) and awards (array of grant IDs)
abstract | String or null | Plain-text abstract with HTML tags stripped
license | String or null | URL to the license or full-text access link
relevanceScore | Number | Crossref relevance score for the query match
extractedAt | String | ISO 8601 timestamp of when the data was extracted
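As an illustration of working with these fields, the sketch below summarizes a list of dataset items. It computes a type breakdown, citation statistics, the most frequent journals, and the set of ORCID IDs found in authorDetails. It assumes items shaped like the output example above, and the function name `summarize` is hypothetical.

```python
from collections import Counter
from statistics import mean, median


def summarize(items: list[dict], top_n: int = 3) -> dict:
    """Summarize dataset items by type, citation count, journal, and ORCID."""
    citations = [item.get("citationCount") or 0 for item in items]
    return {
        "types": dict(Counter(item.get("type") for item in items)),
        "meanCitations": mean(citations) if citations else 0,
        "medianCitations": median(citations) if citations else 0,
        # journal may be null, so skip missing values when ranking.
        "topJournals": Counter(
            item["journal"] for item in items if item.get("journal")
        ).most_common(top_n),
        # Collect every ORCID present in authorDetails (many authors lack one).
        "orcids": sorted({
            author["orcid"]
            for item in items
            for author in item.get("authorDetails", [])
            if author.get("orcid")
        }),
    }
```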

How it works

The actor follows a straightforward pipeline to query the Crossref REST API, paginate through results, and transform raw metadata into clean output:

Crossref Academic Paper Search pipeline

  INPUT
    query, authorName, containerTitle, doiPrefix, publicationType,
    fromYear, toYear, sortBy, maxResults
        |
        v
  1. Validate input params
        |
        v
  2. Build URL + filters
        |
        v
  3. Fetch page (100 rows/request) <-----------------+
        |                                            |
        v                                            |
  4. More pages?                                     |
     offset < 10,000 and count < maxResults -- yes --+
        |
        | no (done)
        v
  5. Transform each work record
        |
        v
  6. Push to Apify dataset
        |
        v
  OUTPUT
    One JSON record per paper (see Output fields reference)
    + Run summary: type breakdown, citation stats, top journals
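Steps 3 and 4 amount to an offset-paging loop. The sketch below shows one way to write it in Python. `fetch_page` is a hypothetical stand-in for the HTTP call: it should return the `message.items` list of one Crossref /works response. It is injected so that the loop logic can run without network access. Treating a short page as the end of the results is an assumption, not documented actor behavior.

```python
from typing import Callable

PAGE_SIZE = 100       # rows per request, as shown in the diagram
MAX_OFFSET = 10_000   # Crossref's offset ceiling


def collect_works(fetch_page: Callable[[int, int], list[dict]], max_results: int) -> list[dict]:
    """Fetch pages by offset until maxResults, the offset cap, or the end of results."""
    works: list[dict] = []
    offset = 0
    while len(works) < max_results and offset < MAX_OFFSET:
        # Never ask for more rows than are still needed.
        rows = min(PAGE_SIZE, max_results - len(works))
        page = fetch_page(offset, rows)
        works.extend(page)
        if len(page) < rows:
            break  # assumption: a short page means no more matches remain
        offset += rows
    return works[:max_results]
```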

Date-parts extraction

Crossref stores publication dates as nested arrays called date-parts rather than ISO date strings. A date might look like [[2016, 4, 20]] for April 20, 2016, or just [[2016]] if only the year is known. The actor extracts dates by checking work.published first, then falling back to work.issued. Missing month and day values are padded to 01 so the output always produces a clean YYYY-MM-DD string when a year is available.
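A minimal Python sketch of this extraction follows. It assumes `work` is the parsed JSON of a single Crossref item. Skipping a date-parts entry whose year is null is a defensive assumption, not documented actor behavior.

```python
from typing import Optional, Tuple


def extract_date(work: dict) -> Tuple[Optional[int], Optional[str]]:
    """Return (publishedYear, publishedDate) from a Crossref work record."""
    for key in ("published", "issued"):  # prefer published, fall back to issued
        parts_list = (work.get(key) or {}).get("date-parts") or []
        parts = parts_list[0] if parts_list else []
        # Assumption: skip entries such as [[null]] that carry no year.
        if parts and parts[0] is not None:
            year = int(parts[0])
            # Pad a missing month or day with 01.
            month = int(parts[1]) if len(parts) > 1 and parts[1] else 1
            day = int(parts[2]) if len(parts) > 2 and parts[2] else 1
            return year, f"{year:04d}-{month:02d}-{day:02d}"
    return None, None
```

For example, [[2016, 4, 20]] yields (2016, "2016-04-20"), and a year-only [[2016]] yields (2016, "2016-01-01").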

HTML stripping

Some Crossref abstracts contain inline HTML tags (such as <jats:p>, <jats:italic>, and <sub>/<sup>) from the original JATS XML markup. The actor strips all HTML tags using a regex replacement and collapses extra whitespace, so the abstract field always contains clean plain text.
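A sketch of this cleanup in Python, using the standard `re` and `html` modules, follows. Decoding character entities such as `&amp;` with `html.unescape` is an added assumption, not something the description above states. Replacing each tag with a space keeps adjacent paragraphs from running together, at the cost of splitting markup like H<sub>2</sub>O into "H 2 O".

```python
import html
import re
from typing import Optional

TAG_RE = re.compile(r"<[^>]+>")   # any tag, e.g. <jats:p>, <jats:italic>, <sub>
SPACE_RE = re.compile(r"\s+")


def clean_abstract(raw: Optional[str]) -> Optional[str]:
    """Strip markup tags and collapse whitespace; return None if nothing remains."""
    if not raw:
        return None
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)  # assumption: also decode entities like &amp;
    text = SPACE_RE.sub(" ", text).strip()
    return text or None
```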

Query parameters vs. filter parameters

The Crossref API distinguishes between query parameters and filter parameters. The query, query.author, and query.container-title fields are query parameters that perform fuzzy matching and contribute to the relevance score. The prefix, type, from-pub-date, and until-pub-date fields are filter parameters that perform exact matching and are joined with commas in a single filter URL parameter. The actor handles this distinction automatically -- author and journal names use query-based matching for flexibility, while DOI prefix, publication type, and date range use strict filtering.
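The split between query and filter parameters can be sketched as a URL builder for the Crossref /works endpoint. The parameter names (`query`, `query.author`, `query.container-title`, `filter`, `sort`, `order`, `rows`, `offset`) are Crossref REST API parameters. The function name `build_works_url` is hypothetical. Adding `order=desc` for non-relevance sorts is an assumption about how "Most Cited" and "Newest First" are expressed. The mailto parameter described under polite pool access is omitted here for brevity.

```python
from typing import Optional
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_works_url(
    query: Optional[str] = None,
    author: Optional[str] = None,
    container_title: Optional[str] = None,
    doi_prefix: Optional[str] = None,
    work_type: Optional[str] = None,
    from_year: Optional[int] = None,
    to_year: Optional[int] = None,
    sort: str = "relevance",
    rows: int = 100,
    offset: int = 0,
) -> str:
    params = {}
    # Query parameters: fuzzy matching that feeds the relevance score.
    if query:
        params["query"] = query
    if author:
        params["query.author"] = author
    if container_title:
        params["query.container-title"] = container_title

    # Filter parameters: exact matching, joined into one comma-separated value.
    filters = []
    if doi_prefix:
        filters.append(f"prefix:{doi_prefix}")
    if work_type:
        filters.append(f"type:{work_type}")
    if from_year is not None:
        filters.append(f"from-pub-date:{from_year}")
    if to_year is not None:
        filters.append(f"until-pub-date:{to_year}")
    if filters:
        params["filter"] = ",".join(filters)

    params["sort"] = sort
    if sort != "relevance":
        params["order"] = "desc"  # assumption: most cited / newest first
    params["rows"] = rows
    params["offset"] = offset
    return f"{CROSSREF_WORKS}?{urlencode(params)}"
```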

Polite pool access

The Crossref API offers a "polite pool" for clients that identify themselves with a mailto parameter. Polite pool users receive faster, more reliable responses with higher rate limits compared to anonymous access. This actor always includes a mailto parameter in every request, so all runs benefit from polite pool access without any configuration on your part.
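A minimal Python sketch of a polite-pool request using only the standard library follows. It appends the mailto parameter described above. It also sends a User-Agent header containing the contact address, which Crossref's etiquette guidance recommends; that header is an assumption about this actor. The tool name in the header and the placeholder email are hypothetical.

```python
import json
import urllib.parse
import urllib.request

CONTACT_EMAIL = "you@example.org"  # placeholder: use a real contact address


def polite_request(url: str, email: str = CONTACT_EMAIL) -> urllib.request.Request:
    """Build a request that identifies the client for Crossref's polite pool."""
    separator = "&" if "?" in url else "?"
    return urllib.request.Request(
        f"{url}{separator}mailto={urllib.parse.quote(email)}",
        # Hypothetical tool name; the mailto in the header is what matters.
        headers={"User-Agent": f"crossref-example/0.1 (mailto:{email})"},
    )


def fetch_json(url: str) -> dict:
    """Send the polite request and parse the JSON response body."""
    with urllib.request.urlopen(polite_request(url), timeout=30) as response:
        return json.load(response)
```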

Deep paging limit

The Crossref API imposes a hard offset limit of 10,000. No matter how many matching papers exist, offset-based pagination can only reach the first 10,000 results. If your query matches more than 10,000 papers, use filters (date ranges, publication types, DOI prefixes) to narrow the result set. Because maxResults caps output at 1,000 per run, a single run never reaches this limit.

How much does it cost to run?

Crossref Academic Paper Search is very lightweight because it only makes REST API calls -- there is no browser rendering or web scraping involved. The Crossref API itself is completely free with no usage fees.

Scenario | Papers | Approx. time | Approx. cost
Quick literature check | 50 | ~10 seconds | $0.001
Author bibliography | 200 | ~20 seconds | $0.003
Full dataset extraction | 1,000 | ~60 seconds | $0.01

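The actor runs on 256 MB of memory by default, which is more than sufficient. The estimates above cover Apify platform compute units only and may vary slightly depending on network latency. They do not include the per-paper charge listed under Pricing (from $2.00 per 1,000 papers fetched). At that rate the charge is about $0.10 for 50 papers, $0.40 for 200, and $2.00 for 1,000.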

Limitations and responsible use

  • Deep paging cap. The Crossref API limits offset-based paging to the first 10,000 results, and this actor returns at most 1,000 per run. For queries that match millions of papers, use filters to narrow your search scope.
  • Rate limiting. Even with polite pool access, the Crossref API enforces rate limits. Very rapid consecutive runs may experience throttling. Space scheduled runs at least a few minutes apart.
  • Abstract availability. Not all works in Crossref include abstracts. Roughly 20-30% of records have an abstract field populated. When unavailable, the abstract field will be null.
  • Citation count lag. Citation counts (is-referenced-by-count) depend on publishers registering reference metadata with Crossref. Counts may lag behind other citation databases like Google Scholar or Semantic Scholar by weeks or months.
  • Metadata completeness. The quality and completeness of Crossref metadata varies by publisher. Some records may be missing author affiliations, ORCID identifiers, subject classifications, or funding data. The actor returns null or empty arrays for missing fields rather than guessing.
  • No full-text access. Crossref provides metadata only. The actor does not download or return the full text of papers. Use the doi or url fields to access the paper through the publisher's website.
  • Respectful use. This actor accesses a public, community-funded infrastructure. Do not schedule excessively frequent runs or extract data beyond what you need for legitimate research, analysis, or integration purposes.

FAQ

Do I need a Crossref API key to use this actor? No. The Crossref REST API is free and open. This actor uses Crossref's polite pool by including a contact email in every request, which provides better rate limits and reliability. No registration or API key is required on your part.

What types of publications can I search for? You can search across journal articles, book chapters, conference proceedings, preprints (posted content), books, datasets, and reports. Use the Publication Type filter to restrict results to a specific type, or leave it blank to search across all types.

How current is the data? Crossref metadata is updated continuously as publishers register new DOIs and update existing records. New publications typically appear within days of their official publication date. Citation counts are also updated regularly as new works reference existing publications.

What is the maximum number of results I can retrieve? You can retrieve up to 1,000 results per run, set by the maxResults cap. The Crossref API's own offset limit of 10,000 sits well above that. For very broad queries matching millions of papers, use filters (date ranges, types, DOI prefixes) to narrow the scope so that the top results are the ones you need.

How is sorting by relevance determined? When you sort by "relevance" (the default), Crossref calculates a relevance score based on how well the work's metadata matches your query terms. The score considers matches in the title, abstract, author names, and other metadata fields. The relevanceScore field in the output reflects this value.

Can I combine this actor with other academic search actors? Yes. Each academic database has different coverage and metadata strengths. You can run Crossref alongside PubMed (for biomedical focus), Semantic Scholar (for AI-generated insights), OpenAlex (for open-access metadata), or ArXiv (for preprints) and merge the results by DOI to build a comprehensive dataset.
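A merge by DOI can be sketched in a few lines of Python. DOIs are case-insensitive, so the sketch lowercases them and strips common resolver prefixes before comparing. It assumes every source exposes a `doi` field like this actor's output. Other actors may name or format their fields differently and need mapping first. The function names and the "earlier source wins" rule are hypothetical choices.

```python
def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip resolver prefixes so sources compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def merge_by_doi(*sources: list[dict]) -> dict[str, dict]:
    """Merge record lists keyed by normalized DOI; earlier sources take priority."""
    merged: dict[str, dict] = {}
    for records in sources:
        for record in records:
            doi = record.get("doi")
            if not doi:
                continue  # records without a DOI cannot be joined this way
            target = merged.setdefault(normalize_doi(doi), {})
            for field, value in record.items():
                # Later sources only fill fields that are still empty.
                if target.get(field) in (None, "", []):
                    target[field] = value
    return merged
```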

Related actors

Actor | Description | Best for
OpenAlex Research Search | Search the OpenAlex catalog of scholarly works, authors, and institutions | Open-access metadata, institutional analysis
PubMed Biomedical Literature Search | Search biomedical and life sciences literature from the U.S. National Library of Medicine | Medical and biomedical research
Semantic Scholar Paper Search | Search Semantic Scholar for papers with AI-generated abstracts and citation context | AI-powered paper recommendations, citation graphs
ArXiv Preprint Paper Search | Search preprints from arXiv in physics, math, CS, and more | Preprints, cutting-edge research before peer review
CORE Open Access Papers | Search millions of open access research papers from repositories worldwide | Full-text open access content
Europe PMC Literature Search | Search European biomedical literature including full-text open access articles | European biomedical research, open access full text